Creating ebooks from book scans …. on Linux
Refined semi-professional document scanning on Linux: here is a small collection of procedures and hints for producing e-books on your Linux system. I assume that your scanner already works with Sane (see below) and that you know how to install the suggested software packages from your repository, where available. The system used here is Debian GNU/Linux testing (“Squeeze”); I’ll give some availability information for other distributions, but not comprehensively. If in doubt, please check the relevant project homepages and/or source code hosts for more information on the applications. All the software mentioned is open source, so if nobody has provided a package for your distribution you can always compile the code yourself, although it is certainly much more convenient to install software from the repository through your package manager, or at least to get a binary from the developer or from somewhere else. By the way, most of the packages are also available for non-Linux operating systems. An easy way to get Linux software running on a Windows PC is Cygwin.
The e-book to be produced is a single-sided, black & white (b/w) djvu or pdf file containing an OCR layer (“the containers”).
1. Scanning
Fortunately, no insider knowledge is needed for scanning on Linux. Sane is the application most commonly used for that purpose (libraries in Debian testing currently 1.0.20-13, frontends 1.0.14-9, quite up to date!) and should be available on most everyday Linux systems. The frontend Xsane (0.996-3) is quite convenient for batch scanning. It lets you choose a scan area if the book is smaller than the scanner, pages can be saved rotated by 90°, and you can auto-adjust gamma/brightness/contrast after getting the first picture or the preview (the big buttons, the 2nd from the left). Other scanning solutions are certainly possible too; the command line frontend scanimage, for example, can be run from a shell loop with custom intervals, which saves you from pushing a button after every page.
I scan at 300 DPI grayscale with the file extension .pgm [1] because my scanner backend currently doesn’t support b/w scanning (I know, I know …); if your model supports it, you could scan in b/w directly and skip the conversion step after scanning. For post-processing it is best to create a sequence 001.pgm, 002.pgm etc., and Xsane takes care of that. Usually there are two book pages in one picture; we will deal with that in step 3.
On the DPI rate: when you check the 300 DPI result in your favourite image viewer, you’ll see that the scans are much bigger than necessary for reading on screen. That is intended, because the images appear scaled down in the containers, and the conversion to b/w also gives better results than it does with scans at a lower DPI rate. And unlike with grayscale images, after conversion to b/w the DPI rate has no significant effect on the file size of the final product, so there is no need to go below 300 to save resources.
[1] .pgm instead of the generic extension .pnm, to distinguish it from .pbm after the conversion to b/w (next step); and not .tif, because the post-processing tool Unpaper (see below) cannot work with that.
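The scanimage loop mentioned above could look roughly like the following sketch. The options --mode Gray and --resolution 300 depend on the scanner backend, and the function name scan_pages and the SCAN_CMD override are made up for this illustration, so treat it as a starting point rather than a finished tool.

```shell
# Scan $1 pages into 001.pgm, 002.pgm, ..., pausing $2 seconds between
# pages so that there is time to turn the leaf.
# SCAN_CMD is a hypothetical override for this sketch; the default
# scanimage options are backend-dependent assumptions.
scan_pages() {
    count=$1
    pause=$2
    n=1
    while [ "$n" -le "$count" ]; do
        out=$(printf '%03d.pgm' "$n")
        # scanimage writes the image data to standard output
        ${SCAN_CMD:-scanimage --mode Gray --resolution 300} > "$out" || return 1
        echo "scanned $out" >&2
        n=$((n + 1))
        if [ "$n" -le "$count" ]; then sleep "$pause"; fi
    done
}
```

Calling scan_pages 120 8 would then scan 120 pictures with eight seconds between them.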
2. Batch postprocessing (1): conversion and manipulation with Imagemagick
Imagemagick is a most versatile Swiss army knife for manipulating images on Linux systems. Like Sane, it should be widely available. Our scans can easily be converted to b/w using a simple shell loop:
for i in *.pgm; do convert "$i" -verbose "${i%pgm}pbm"; done
It’s also possible to adjust the threshold that decides whether a pixel becomes black or white, but the default normally works pretty well. The black stripe that often remains in the middle of the scan is going to be removed with Unpaper (next step). With Imagemagick’s convert it’s also possible to rotate the scans (-rotate 90) and to cut out a rectangular region (-crop WIDTHxHEIGHT+X+Y); many other manipulations are possible, so please check out the command line options.
A hint for batch conversion: it’s always a good idea not to overwrite the originals with the manipulated files but to write the new generation into another directory (like: … -verbose ~/foo/${i%pgm}pbm; done).
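The threshold and the separate output directory can be combined in a small helper. This is only a sketch: the function name to_bw and the CONVERT override are invented here, and 50% is just a neutral starting value for Imagemagick’s -threshold option.

```shell
# Convert every .pgm in the current directory to a b/w .pbm in directory $1.
# $2 is the cut-off handed to convert's -threshold option (default 50%).
# CONVERT is a hypothetical override so another binary can be substituted.
to_bw() {
    outdir=$1
    threshold=${2:-50%}
    mkdir -p "$outdir" || return 1
    for f in *.pgm; do
        [ -e "$f" ] || continue   # no .pgm present: the glob stays literal
        "${CONVERT:-convert}" "$f" -threshold "$threshold" "$outdir/${f%.pgm}.pbm" || return 1
    done
}
```

For example, to_bw ~/foo 45% leaves the grayscale originals untouched and writes the b/w generation to ~/foo.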
3. Batch postprocessing (2): Unpaper
Unpaper, written by Jens Gulden (currently 0.3-1, also available for Ubuntu Karmic), is a tool for post-processing scanned book pages. It can remove dark areas, correct misaligned centring and rotation of book pages, remove blur and noise, and split double-page scans into individual images. It’s made for heavy-duty tasks and copes even with scans of the most ridiculous book photocopies. Unpaper can perform batch processing jobs, and simple usage looks like this:
unpaper --layout double --output-pages 2 %03d.pbm ~/foo/%03d.pbm
--layout double declares that each input image carries two book pages, and --output-pages 2 tells Unpaper to split them into two individual files; %03d is a printf-style placeholder for zero-padded three-digit numbers, expanded by Unpaper itself rather than by the shell. Unpaper is quite versatile, and getting acquainted with everything takes some effort. While it does its job it’s a good idea to monitor the output constantly. If unwanted results appear, you can interrupt the process and change the settings. Unpaper is very sensitive; for example, when text blocks are accidentally removed on single pages, in most cases you have to adjust the mask scan setting (try a lower setting like -ms 25,25). Processing can be resumed from any file with --start-input x, but you also have to align --start-output and give --overwrite. Useful user documentation is provided on the project’s homepage (here).
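Aligning the two start options is simple arithmetic when every sheet yields two pages. The sketch below assumes that both the input and the output numbering start at 1, so that sheet x produces output pages 2x-1 and 2x; the function name resume_cmd is made up, and it only prints the command so that it can be checked before running.

```shell
# Print the unpaper command line for resuming a double-page job at
# input sheet $1 (a plain number without leading zeros).
# Assumption: numbering starts at 1 and each sheet yields two pages.
resume_cmd() {
    sheet=$1
    first_out=$((2 * sheet - 1))
    echo "unpaper --layout double --output-pages 2 --overwrite" \
         "--start-input $sheet --start-output $first_out" \
         "%03d.pbm ~/foo/%03d.pbm"
}
```

resume_cmd 12 prints a command line with --start-input 12 --start-output 23, which can then be pasted into the shell.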
4. Creating djvu (& pdf)
If you didn’t know it already: djvu is a powerful container format for digital images which is faster and compresses better than other solutions, and viewers are available for nearly all operating systems (see here, djview 4.5-3 in “Squeeze”). Although djvu reveals its full potential at killer tasks like full-resolution satellite pictures, in my experience the workflow with it is always a little smoother, even with b/w book scans. Concatenating our post-processed book pages into a djvu is no problem with the Djvulibre collection (3.5.22-7). First of all we have to convert the .pbm images into the djvu file format:
for i in *.pbm; do cjb2 "$i" "${i%pbm}djvu"; echo "$i"; done
After that we have to assemble the container:
djvm -c mydjvu.djvu *djvu
That’s it!
Creating a pdf on Linux is just as easy. First of all you have to convert the .pbms into .tifs (for i in *.pbm; do convert "$i" -verbose "${i%pbm}tif"; done), after that you create a multi-page tiff from these (tiffcp *tif bundle.tif), and finally you create a pdf from that with: tiff2pdf -o mypdf.pdf bundle.tif (Note: tiffcp and tiff2pdf are part of libtiff-tools, 3.9.2-2 in “Squeeze”. For tiff2pdf the compression method has to be given as well, -j (Jpeg) or -z (Zip), see the manpage here).
5. OCR
There are also solutions available on Linux for deriving OCR information from book scans for the text layers of djvu and pdf, and Tesseract seems to be the most mature application so far. Its development has been taken over by Google, and it is described as “probably one of the most accurate open source OCR engines available”. Tesseract-ocr is available in Debian testing in version 2.04-2, and there are a few language data files for it which have to be installed as well (tesseract-ocr-eng etc.). Playing around with it, one gets the impression that Tesseract works quite nicely, especially when the correct language is chosen. It does have problems with Sanskrit diacritics, but I’ve seen that Tesseract can also be trained (I’ll report when I have found out more). It can be applied to individual image files, also through batch processing (see some experiences here), but it is more convenient to work with a wrapper which also takes care of re-combining the OCR output with the image automatically:
Ocrodjvu (0.3.2-1, Ubuntu Lucid) by Jakub Wilk is a foolproof wrapper for working on document scans already concatenated into djvu. It depends on OCRopus (0.3.1-2), an open source OCR system under development at the German Research Center for Artificial Intelligence (DFKI). OCRopus employs Tesseract to extract the textual information from the scanned document and, and that’s the point here, also saves the page position of every word, so that a query in Djview or other viewers yields not only the relevant pages but also the highlighted word instances on these pages (layout analysis), a feature one would hardly want to miss nowadays. Ocrodjvu is easy to apply to the djvu we’ve created so far:
ocrodjvu -o mydjvu_ocr.djvu mydjvu.djvu --language=eng
or similar. Start’n’forget – life is easy.
For pdf e-books it’s a little more tricky because there isn’t a fully developed wrapper for OCRopus available for pdf so far (the little tool pdf2ocr which I found on the net I couldn’t get to work properly), so I will leave that out here for now.
6. Gscan2pdf
Gscan2pdf (0.9.29-1, Ubuntu Hardy) is a very comfortable GUI frontend for most of the tools we’ve discussed so far: Sane (scanning), Unpaper (postprocessing) and Tesseract (OCR). The whole process of producing an e-book, both djvu and pdf, can be handled with this amazing tool. Gscan2pdf has interfaces to Tesseract and also to the alternative Gocr, but as far as I’ve seen it unfortunately has no interface to OCRopus and cannot deal with layout-analysed output (hocr), so this remains a desideratum.
7. Bookmarks
The final step to refine your e-book would be to add bookmarks to the document. For djvu, custom bookmarks (called an “outline” in the djvu world) have to be in a form like:
(bookmarks
("Title" "#1")
("Main matter" "#5"
("Chapter 1" "#5")
("Chapter 2" "#15"))
)
After editing such a file (you could name it mydjvu.outline), djvused from the Djvulibre tools can apply the outline to the container:
djvused mydjvu.djvu -e 'set-outline mydjvu.outline' -s
That’s it. By the way, the djvu outline format is Unicode capable.
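For longer books it may be easier to keep the table of contents in a plain list and generate the outline file from it. The following sketch assumes a hypothetical input format of one “page, tab, title” entry per line; it produces a flat outline only (no nested chapters) and does not escape double quotes inside titles. The function name make_outline is made up.

```shell
# Read lines of the form PAGE<TAB>TITLE on standard input and print a
# flat djvu outline. Limitation of this sketch: titles must not contain
# double quotes, and nesting is not supported.
make_outline() {
    echo '(bookmarks'
    while IFS="$(printf '\t')" read -r page title; do
        [ -n "$page" ] || continue   # skip empty lines
        printf ' ("%s" "#%s")\n' "$title" "$page"
    done
    echo ')'
}
```

With the list stored in a file such as toc.tsv, make_outline < toc.tsv > mydjvu.outline writes a file that djvused can apply as described above.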
8. Miscellaneous stuff
If the book you want to scan is bigger than an affordable scanner, the way out is to scan one page at a time. If the lid of the scanner can’t be fully removed, or for whatever other reason, you may end up with a set of scans in which the pages on the even-numbered scans are rotated 180° in relation to those on the odd-numbered scans, or the other way around. There is a way to rotate either set; try:
for i in *.pbm; do p=$(echo "$i" | cut -c 3); if [ $((p % 2)) -eq 1 ]; then convert "$i" -rotate 180 -verbose "$i"; fi; done
(one line!). This rotates the odd-numbered files of a set of .pbms with three-digit names. To work on the even-numbered set, replace -eq 1 with -eq 0. But if you try it you’ll see that scanning this way takes painful Prussian discipline.
Unpaper’s batch processing relies on rising numbers. If you want to re-engineer djvu containers you have already created in this way, you can unpack them with ddjvu, which outputs a multi-page tif (usage like: ddjvu -format=tiff -pages=1-25 ~/foo/mydjvu.djvu bundle.tif). That in turn can be burst with tiffsplit, which produces a set of images aaa.tif, aab.tif etc. These can then be converted to pbm (converting and re-encoding into djvu is, as far as I’ve seen, the only way to obtain a djvu with a custom page range from a djvu), and a continuous numeric sequence of file names for processing with Unpaper can be restored through this little shell script here.
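The one-liner looks only at the third character of the file name, which sidesteps a shell pitfall: a full number like 008 would be read as an invalid octal value in shell arithmetic. If the names are not exactly three digits long, a helper along the following lines may be more robust; the name is_odd_page is made up for this sketch, and it expects a bare file name without a directory part.

```shell
# Succeed if the numeric part of a file name like 008.pbm is odd.
is_odd_page() {
    n=${1%.*}                  # drop the extension: 008.pbm -> 008
    n=${n#"${n%%[!0]*}"}       # drop leading zeros: 008 -> 8
    [ $(( ${n:-0} % 2 )) -eq 1 ]
}

# Possible use (convert is Imagemagick, as above):
# for i in *.pbm; do if is_odd_page "$i"; then convert "$i" -rotate 180 "$i"; fi; done
```

Negating the test (if ! is_odd_page "$i") handles the even-numbered set instead.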
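Restoring a continuous numeric sequence can be sketched as follows. This is not the script referred to above, just one possible approach; the function name renumber is made up. It renames its arguments in the order given, so a shell glob supplies them in alphabetical order.

```shell
# Rename the files given after the extension $1 to 001.EXT, 002.EXT, ...
# in argument order. Refuses to overwrite an existing file.
renumber() {
    ext=$1
    shift
    n=1
    for f in "$@"; do
        new=$(printf '%03d.%s' "$n" "$ext")
        if [ "$f" != "$new" ]; then
            if [ -e "$new" ]; then
                echo "refusing to overwrite $new" >&2
                return 1
            fi
            mv -- "$f" "$new" || return 1
        fi
        n=$((n + 1))
    done
}
```

In a directory holding only the converted pages, renumber pbm *.pbm yields 001.pbm, 002.pbm etc., which matches the %03d.pbm pattern used with Unpaper.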
TeX Live 2009: LuaLaTeX rolls on Debian (and the others)
Previous posting on this issue here.
1. TeX Live 2009 at Debian unstable
Since my Debian unstable (“Sid”) jumped to TeX Live 2009 a couple of days ago (after lagging behind with 2007 as the standard TeX distribution for a long time they skipped 2008, which is really pleasing, see also here), it’s now possible to run LuaLaTeX without a manual install: the relevant LaTeX format files are included in TeX Live 2009 and the lualatex executable is now available on the console (LuaTeX version 0.50). The relevant macro packages (see http://tug.ctan.org/tex-archive/macros/luatex/, github collection) are most conveniently made available through the new package texlive-luatex (2009-7). Pretty soon the whole lot is going to be available in Debian derivatives based mostly on the unstable branch, like Ubuntu. TeX Live also runs on other operating systems.
2. Hello world!
So now a rudimentary LuaLaTeX document like:
\documentclass{article}
\usepackage[utf8]{luainputenc}
\begin{document}
Hello world!
\end{document}
… runs. By the way, luainputenc (doc) calls luatextra (doc) which is also included in the texlive-luatex bundle.
3. Running Lua code from within the document
But LuaTeX is not only an alternative project which you could use to replace the other up-to-date Unicode-capable, pdf-creating LaTeX engine, XeTeX (BTW see an introduction to XeTeX in German here). The advantage of LuaTeX is that the scripting language Lua is embedded in the engine, which turns it into a kind of “eierlegende Wollmilchsau” (a jack of all trades), as one could say in German (for a collection of advantages of that approach see here). On the professional level this means that the rebuilt engine can be made much faster and more versatile than anything that was possible before (see Kastrup’s presentation at BachoTeX 2008); on the user level it simply means that it is possible to run Lua code while processing the document. Lua code can be called from a macro package but also from within the document (a somewhat comparable TeX project is Perltex). Let’s give it a try:
\documentclass{article}
\usepackage[utf8]{luainputenc}
\begin{document}
So it's time to say:
\begin{luacode}
tex.print("Hello world!")
\end{luacode}
\end{document}
Results in:
But more significant:
\documentclass{article}
\usepackage[utf8]{luainputenc}
\begin{document}
A random number:
\begin{luacode}
tex.print(math.random ())
\end{luacode}
\end{document}
Results in:
Font selection would be the next question. The available packages are not quite mature yet, but LuaTeX is under heavy development. The package luaotfload (doc) is made for font switching; it is loaded automatically by luainputenc or luatextra. The usage is the same as with the TrueType/OpenType font loader provided for the Plain/ConTeXt sister (see here). The basic usage would be something like:
\documentclass{article}
\usepackage[utf8]{luainputenc}
\font\myfont="FreeSans.ttf"
\begin{document}
\myfont Mahābhārata
\end{document}
The font file has to be in the working directory. An alternative is going to be the LuaTeX implementation of Fontspec, which is known from XeTeX (see here) and which employs a font file lookup. An experimental 2.0 version (.dtx and makefile) is provided by Khaled Hosny at his Github account. But I couldn’t get it to run so far.
New e-vedica found on the net
Awaited to appear: 2nd edition of Aufrecht’s Ṛgveda (Bonn 1877) available (here and here); great: Lubotsky’s edition of Atharvaveda-Paippalāda 5 (Cambridge 2002) available here.
The Tanjur Bodhicaryāvatāra auxiliaries: a few collected bibliographical pointers
There are several works of literature transmitted in the Tanjur related to Śāntideva’s masterpiece Bodhicaryāvatāra (Bca), and unfortunately nearly all of them have been lost in their original Sanskrit versions (on them cf. Dietz: Śāntideva’s Bca – das Weiterwirken des Werkes dargestellt anhand der Überlieferungsgeschichte des Textes und seiner Kommentare [Lecture script] {Buddhismus in Geschichte und Gegenwart 3: Śāntideva’s “Eintritt i.d. Leben zur Erleuchtung.” Hamburg 1999, p. 27-41}, p. 35 sq. [IV. Die Kommentare zum Bca]; Ejima: Nyūbodaigyōron no chūshaku bunken ni tsuite {Indogaku Bukkyōgaku Kenkyū / Journal of Indian and Buddhist Studies 14,2 (1966), p. 644-48}; Williams: On Prakṛtinirvāṇa / Prakṛtinirvṛta i.t. Bca {Altruism and reality. Studies i.t. philosophy of the Bca. Richmond 1998, p. 1-28 = *Asiatische Studien / Etudes Asiatiques 46,1 (1992), p. 516-50}, p. 3 sq. [1. Indian commentaries]). They are (in order of appearance):
1. The Bodhicaryāvatāra-Pañjikā (byaṅ chub kyi spyod pa la ‘jug pa’i dka’ ‘grel, Q 5273, 221 (la / mdo ‘grel 26), 45a7-325a5; facsimile ed. vol. 100) [1] by Prajñākaramati (Śes rab ‘byuṅ gnas blo gros) is an extensive commentary on chapters 1-9 of the Bca. Besides its Tibetan carrier the text has also survived in the original Sanskrit through a few very precious antique palm leaf manuscripts kept in Kolkata (cf. this former blog contrib), and it has been edited three times: the 9th chapter in 1898 by La Vallée Poussin (1869-1938); the full text by the same in 1901-14 for the Bibliotheca Indica series; and later again by Vaidya (1891-1978)[2] in 1960 for the Buddhist Sanskrit Texts series. Unfortunately, due to folio loss there are two large lacunae: 3,22-4,45 and 8,109-186 [end of the 8th chapter] have not survived.[3] Prajñākaramati lived at the end of the 1st millennium and was a teacher and a so-called “gatekeeper” (an examiner?) of the famous Vikramaśīla university, considered to have been located in the Bhagalpur district of modern Bihar (cf. Bose: Indian teachers of Buddhist universities. Adyar 1923, p. 50 sq. [4. Prajñākaramati]; Dutt: Buddhist monks and monasteries of India. London 1962, p. 358 sq. [b. Vikramaśīlā]. On the patronage of the Pālas in general cf. the introductory chapter of: Huntington: Leaves from the bodhi tree. The art of Pāla India (8th-12th centuries). Seattle (etc.) 1990; and most recently: Sanderson: The Śaiva age {Einoo (Ed.): Genesis and development of Tantrism. Tokyo 2009, p. 41-349 [!]}, p. 87 sq. [The Pāla emperors and the great monasteries of eastern India]). In the colophons of the surviving mss at the Asiatic Society of Bengal the author is called prajñākara (no. 3830) next to prajñākaramati (no. 3829, cf. Hara Prasada Shāstri: Descriptive catalogue of the Sanskrit mss i.t. Government Collection 1: Buddhist mss. Calcutta 1917, p. 49 sq.), which Bose claims to be common (cf. op.cit., p. 52). Furthermore he is styled paṇḍitabhikṣu, which was an academic title at the university.
2. The Bodhisattvacaryāvatāra[4]-Vivṛttipañjikā (byaṅ chub sems dpa’i spyod pa la ‘jug pa’i rnam par bśad pa’i <bka’i ] dka’> ‘grel, Q 5274, 221, 325a5-396a5) is an anonymous commentary on the shorter version of the Bca [5], and therefore has 9 chapters while also commenting on the Pariṇāmanā, the 9th chapter of the Bca in its short version.[6] Saito writes that the author comments on the text from a Yogācāra-Mādhyamika point of view (cf. Saito: Śāntideva in the history of Mādhyamika philosophy {Sankarnarayan/Yoritomi/Joshi (Eds.): Buddhism in India and abroad. Mumbai (etc.) 1996, p. 257-63}, p. 259).
3. The Bodhisattvacaryāvatāra-Saṃskāra (byaṅ chub sems dpa’i spyod pa la ‘jug pa’i legs par sbyar ba, Q 5275, 222 (śa / mdo ‘grel 27), 1-106a6) by Kalyāṇadeva (Dge ba’i lha), the “edition” of the Bca, is another commentary on the long version, but not as exhaustive and rich in citations as Prajñākaramati’s Pañjikā.
4. The Bodhisattvacaryāvatāra-Duravabodhanirṇaya-nāmagrantha (byaṅ chub sems dpa’i spyod pa la ‘jug pa’i rtogs par dka’ ba’i gnas gtan la dbab pa źes bya ba’i gźuṅ, Q 5276, 222, 106a7-112b4), “book with the name: discussion of difficult (portions)” by Kṛṣṇapāda (Kṛṣṇa ba) is, as the name suggests, a non-continuous commentary. The author was a Newar brahmin and disciple of Śāntibhadra, a Newar scholar of the 11th century (cf. Lo Bue: The role of Newar scholars in transmitting the Indian Buddhist heritage to Tibet (c. 750-c. 1200) {Karmay/Sagant: Les habitants du toit du monde (Festschrift A.W. MacDonald). Nanterre 1997, p. 629-58}, p. 639).
5. The Bodhisattvacaryāvatāra-Pañjikā (byaṅ chub sems dpa’i spyod pa la ‘jug pa’i dka’ ‘grel, Q 5277, 222, 112b4-186b7) by Vairocanarakṣita (Bai ro tsa na kṣi ta) is in its extent somewhat comparable to Kalyāṇadeva’s Saṃskāra. The author, also known as Vairocanavajra, was a contemporary of Atiśa (982-1054) and also a resident of the Vikramaśīla university (cf. Schaeffer: The religious career of Vairocanavajra – a 12th century Indian Buddhist master from Dakṣiṇa Kośala {Journal of Indian Philosophy 28 (2000), p. 361-84}).
6. The Prajñāparicchedapañjikā (śes rab le’u'i dka’ ‘grel, Q 5278, 222, 186b7-210a5) is, as the name suggests, a commentary on the 9th chapter of the Bca.
7. The Bodhisattvacaryāvatāra-Vivṛtti (byaṅ chub sems dpa’i spyod pa la ‘jug pa’i rnam par bśad pa, Q 5279, 222, 210a5-223b2) is nearly identical with the last two chapters of the Vivṛttipañjikā (Q 5274, see above), and thus also refers to the shorter version of the Bca.
8. The Bodhisattvacaryāvatāra-Ṣaṭtriṃśat-piṇḍārtha (byaṅ chub sems dpa’i spyod pa la ‘jug pa’i don sum cu rtsa drug bsdus pa, Q 5280, 222, 223b2-227b5), and the
9. Bodhisattvacaryāvatāra-Piṇḍārtha (byaṅ chub sems dpa’i spyod pa la ‘jug pa’i don bsdus pa, Q 5281, 222, 227b5-229a7) are abridgements and consist of selected stanzas of the Bca (cf. Eimer: Suvarṇadvīpa’s “commentaries” on the Bca {Bruhn/Wezler: Studien zum Jainismus und Buddhismus (Festschrift Alsdorf). Wiesbaden 1981, p. 73-78}). The author is called in the colophon “Suvarṇadvipī Lama Dharmapāla” (Gser gliṅ gi bla ma chos skyoṅ), but that just refers to Atiśa’s teacher from Sumatra, Suvarṇadvīpa Dharmakīrti (Gser gliṅ pa chos kyi grags pa, cf. Eimer: Berichte über das Leben des Atiśa (Dīpaṃkaraśrījñāna). Wiesbaden 1977, p. 14, fn. 9).
10. The Bodhicaryāvatāra-tātparyapañjikā-Viśeṣadyotanī-nāma (byaṅ chub kyi spyod pa la ‘jug pa’i dgoṅs pa’i ‘grel pa khyad par gsal byed ces bya ba, Q 5282, 222, 229a8-343a4) was written and also translated by Vibhūticandra (rnal ‘byor zla ba). Karunaratne claims that the text is a sub-commentary on Prajñākaramati’s Pañjikā (cf. Bca-tātparyapañjikā-viśeṣadyotanī-nāma {Malalasekera (Ed.): Encyclopedia of Buddhism. Vol. 3: Bhārini-deva – Caura-vidhvaṃsana-n°. Sri Lanka 1971, p. 184 sq.}). At the beginning the text contains a short hagiographic biography of Śāntideva, which has also survived in its Sanskrit original (cf. ms no. 9990, again i.t. Government Collection of the Asiatic Society of Bengal [what a precious collection they have got there; scanned and put online, it would give a glorious major impulse to studies worldwide], no. 52 i.t. catalogue (op.cit., p. 51); examined by Jong in a review of Pezzali’s Śāntideva (Firenze 1968): La légende de Śāntideva {Indo-Iranian Journal 16,3 (1975), p. 161-82 = Buddhist Studies [Collected minor works]. Berkeley 1979, p. 119-40}). Master Vibhūticandra is not a minor figure: he was an important Kālacakra scholar and lived in the monastery of Jagaddala until it was invaded in the 1190s. Then he accompanied Śākyaśrībhadra (1127-1225) to Tibet in 1204. The dates of his birth and death are unknown (cf. Das Gupta: Vibhūticandra of the Jagaddala Mahāvihāra {Indian Culture 5 (1938-39), p. 215-17}; Vogel: Lunar eclipses of the early 13th century predicted by the Buddhist master Vibhūticandra {Kollmar-Paulenz/Peter (Eds.): Tractata Tibetica et Mongolica (Festschrift Sagaster). Wiesbaden 2002, p. 305-11}; Stearns: The life and legacy of the Indian Mahāpaṇḍita Vibhūticandra {Journal of the International Association of Buddhist Studies 19,1 (1996), p. 127-71}).
These representations of lost Sanskrit works suggest that a vast literature on the Bca once existed, and maybe even this is only a fraction of it. Dietz points out that in the Duravabodhanirṇaya there are many references to unknown commentaries and subcommentaries (op.cit., p. 38). The Buddhist Sanskrit literature related to the Bca is a fascinating, challenging subject. A closer examination of the auxiliaries i.t. Tanjur remains a desideratum and could make a subject of its own.
Notes:
[1] Cf. the usual catalogues: Suzuki: The Tibetan Tripitaka. Catalogue & Index. Tokyo 1962, p. 639 sq.; *A comparative analytical catalogue of the Tanjur division of the Tibetan Tripitaka kept i.t. Otani Univ. Library. Kyoto 1965 ff. The other Tanjur editions could be located as always through the usual catalogues or a query at the general catalogue in Vienna.
[2] Obituary by B.V. Bapat i.t. Journal of the International Association of Buddhist Studies 1,1 (1978), p. 91 sq.
[3] Sometimes further Sanskrit mss which survived in Tibet are mentioned, cf. Martin: Tibskrit 2008, p. 1666: “KCDS [Microfilm catalogue of the Tibetan Cultural Research Center, Beijing], p. 150. Sanskrit palmleaf manuscript now belonging to the Potala”; Petech: Medieval history of Nepal (c. 750-1482). Roma 1984, p. 98: “23) Ms. of Prajñākaramati’s commentary on the Bca. Ṅor monastery in Tibet”, footnote: “… although the ms. seems to be listed in RS, XXI, 37 (no. 110) [refers to: Sāṅkrityāyana: Sanskrit palm-leaf mss. in Tibet {The Journal of the Bihar and Orissa Research Society 21 (1935), p. 21-43}, but no. 110 (p. 37) refers to a ms of the mūla]. From a hand-copy made by Professor G. Tucci in 1939” [not listed in: Sferra: Sanskrit mss and photos of Sanskrit mss in Giuseppe Tucci's collection {Balcerowicz / Mejor (Eds.): On the understanding of other cultures (Festschrift Schayer). Warsaw 2000, p. 397-413}].
[4] In Tibetan the title Bodhisattvacaryāvatāra (sometimes referred to as “Bsa”) appears next to Bodhicaryāvatāra. Lindtner claims that the longer version is original, cf. Rachewiltz: The Mongolian Tanjur version of the Bodhicaryāvatāra. Wiesbaden 1996 [review] {Buddhist Studies Review 15,2 (1998), p. 238-40}, p. 239: “Moreover it is more important, the full title of the poem is Bodhisattvacaryāvatāra (rather than Bodhicaryāvatāra). This form is also supported by the Mongolian … Obviously, our poem is not an introduction to the life of bodhi, but to the career of a bodhisattva. The source of the abbreviated title probably the author himself.” But it is a fact that everywhere in the Sanskrit transmission the title Bca occurs, and not a single time the longer version: India Office Library [incorporated into the British Library] ms no. 7713 colophon (cf. Keith: Catalogue of the Prākrit mss i.t. library of the India Office 2: Brahmanical and Jaina mss. Oxford 1935, p. 1394), Minaev’s own ms “M” (cf. Spasenie po učeniju pozdnějšich buddhistov [Salvation according to the teachings of the late Buddhist] {Zapiski Vostočnago Otdělenija Imperatorskago Russkago Archeologičeskago Obščestva [Memoirs of the Oriental Section of the Imperial Russian Archaeological Society] 4 (1889), p. 153-228}, p. 154); i.t. colophons of the mss no. 78 & 79 of Filliozat’s catalogue of the Sanskrit stocks of the French national library (Catalogue du fonds sanscrit 1: nos. 1 à 165. Paris 1941, p. 63); in no. 8067 i.t. Government Collection of the Asiatic Society of Bengal (op.cit., no. 19, p. 21) etc. etc. That the longer version of the title is also found in Mongolian is no argument for me; that the Mongolian version of the text (and so its title) derives from the Tibetan was already established by Weller (cf. Über den Quellenbezug eines mongolischen Tanjurtextes. Berlin 1950 [Abhandlungen d. Sächsischen Akademie d. Wissenschaften zu Leipzig, phil.-hist. Klasse; 45,2]).
[5] The shorter version of the Bca, also lost in its original Sanskrit version, is transmitted in Tibetan outside the canon and has been found in Dunhuang on the Silk Road. It has 9 chapters in 701 verses, while the longer version consists of 10 chapters and 913 verses; chapters 2 and 3 of the longer version are combined here, the verses related to the ritual of taking the bodhisattva vow (saptavidhānuttarapūjā, cf. Gómez: Bodhicitta (thought of awakening) {Buswell: Encyclopedia of Buddhism. Vol. 1: A-L. New York 2004, p. 54-56}) are still absent, and there are other interesting differences, mainly in the 5th chapter. The short version has been examined and edited by Saito in research projects of Mie University (cf. A study of Akṣayamati’s (=Śāntideva’s) Bodhisattvacaryāvatāra as found in the Tibetan mss from Tun-huang. Project no. 02801005 [Research project report, 1993]; A study of the Dūn-huáng recension of the Bodhisattvacaryāvatāra. Project no. 09610021 [Research project report, 2000]), and there is a vast (mostly Japanese) literature on that subject and its issues, like the Akṣayamati hypothesis (that the author’s original name is Akṣayamati and not Śāntideva), the Tabo ms of the Bca (cf. Saito: Remarks on the Tabo ms of the Bodhisattvacaryāvatāra {Scherrer-Schaub/Steinkellner: Tabo Studies II. Manuscripts, texts, inscriptions, and the arts. Roma 1999, p. 175-89}) etc.
[6] That the 10th chapter of the Bca (in its vulgate long version) is not original was first suggested by La Vallée Poussin in his French translation of the text (Bodhicaryāvatāra. Introduction à la pratique des futurs bouddhas. Paris 1907, p. 143 sq. [Note finale]: “Il entrait dans mon intention de publier la traduction du dixième chapitre du Bca …”). But this has been doubted by, among others, Ruegg (The literature of the Madhyamaka school of philosophy in India. Wiesbaden 1981, p. 83: “The authenticity of this final chapter has been questioned on the ground that not all commentators have commented on it, but this point does not appear to be decisive.”). La Vallée Poussin did not recognize that some of the 9-chapter commentaries refer to another version of the mūla consisting of only 9 chapters anyway, and furthermore the fact that the Pañjikā omits the 10th chapter of the mūla does not necessarily mean that Prajñākaramati also considered it not to be original (another theory is that the Prajñāpāramitā chapter of the Pañjikā was written first and the commentary on chapters 1-8 added later, cf. this previous posting here). Dietz seeks to conclude the discussion by pointing out that the Pariṇāmanā is found in all known copies and versions of the text, even the shorter Tibetan version and the shorter Chinese version (Taishō no. 1662, Putixingjing (菩提行經), in 782 verses and 8 chapters), and I think that this is a decisive argument (cf. Dietz, op.cit., p. 30).
A few e-manuscripts from the state library Munich available now
There is some progress at the Bavarian state library / Bayerische Staatsbibliothek (BSB) in the digitization of their Sanskrit manuscript holdings (“Cod.sanscr.” in the collection Südasiatische Handschriften; the first volume of their catalogue [no. 222 in Janert's Annotated bibliography, no. 693 at Biswas], Aufrecht 1909, which covers the Haug collection [predominantly Vedica], is online here; the second volume, Jolly 1912, is online here).
The items appear in the order of their shelf marks, so some continuous scanning of this collection might be going on these days. So far the numbers 328-44 (from the Jolly collection) are available; check them out here. All items are downloadable in pdf format. Additions can be tracked through this RSS feed, but unfortunately only mixed in with all the other mss scans; it doesn’t get more specific than that (cf. their RSS feed page here).
I’ve had no time to examine anything, but the scans are quite decent. Among those available so far there is a copy of Īśvarakṛṣṇa’s Sāṃkhyakārikās (342). There are also two scans of mss of Kauṭilya’s Arthaśāstra (334 & 35), which are obviously the ones in which Jolly and Hillebrandt discovered the text in or about 1908 (cf. Hillebrandt’s Das älteste Lehrbuch der indischen Politik, das in zwei Handschriften der Kgl. Hof- und Staatsbibliothek in München vorliegt und sich als der lange vermisste Text des Kauṭilya’s erweist. In: Kleine Schriften, pp. 355-84).
On the Munich collections in general cf. BSB: Das Buch im Orient. Handschriften und kostbare Drucke aus zwei Jahrtausenden. Ausstellung 16.11.1982-5.2.1983. Wiesbaden: Reichert 1982, esp. pp. 21-29: Kaltwasser: Die orientalischen Sammlungen der Bayerischen Staatsbibliothek (on the Sanskrit collections p. 25), and this handlist.
Some events in 2010
20th European Association for South Asian Archaeology and Art (EASAA) Conference, Wien, 04.-10.07.2010
3rd International Workshop on Early Tantra (IWET), Hamburg, 15.-23.07.2010 (contact) [1st workshop, 2nd]
“Indo-European verb” – Arbeitstagung der Indogermanischen Gesellschaft, Los Angeles, 13.-15.09.2010
“Spiegelungen, Projektionen, Reflexionen” – 31. Deutscher Orientalistentag (DOT), Marburg, 20.-24.09.2010
“Crossing borders in Southeast Asian archaeology” – 13th International Conference of the European Association of Southeast Asian Archaeologists (EurASEAA13), Berlin, 27.09.-01.10.2010
4th International Sanskrit Computational Linguistics Symposium (4i-SCLS, formerly: ISSCL), New Delhi, 10.-12.12.2010 [1st ISSCL, 2nd, 3rd]
Upcoming:
2nd International Indology Graduate Research Symposium (IIGRS), Cambridge [1st IIGRS]
A basic “Evaṃ mayā śrutam ….” bibliography
Some contributions towards the formulaic opening of Buddhist sutras (Sanskrit: evaṃ mayā śrutam ekasmin samaye bhagavān … viharati sma) [chronologically]:
- John Brough: “Thus have I heard …”. In: Bulletin of the School of Oriental and African Studies 13,2 (1950), 416-26 [= Collected papers. Edited by Minoru Hara and J.C. Wright. London 1996, 63-73].
- N.H. Samtani: The opening of the Buddhist sutras. In: Bhāratī – Bulletin of the College of Indology 8,2 (1964-64), 47-63.
- Yuichi Kajiyama: “Thus spoke the blessed one …”. In: L. Lancaster (Ed.): Prajñāpāramitā and related systems. Studies in honor of E. Conze. Berkeley 1977, 93-99.
- Jonathan Silk: A note on the opening of Buddhist sutras. In: Journal of the International Association of Buddhist Studies 12,1 (1989), 158-63.
- Brian Galloway: “Thus have I heard: at one time …”. In: Indo-Iranian Journal 34,2 (1991), 87-104.
- Bernhard Kölver: Das Symbol evaṃ. In: Studien zur Indologie und Iranistik 16 (1992), 101-07.
- Mark Tatz: Brief Communication. In: Indo-Iranian Journal 36,4 (1993), 335-36 [= Indo-Iranian Journal 40,2 (1997), 117-18 (mistakenly)].
- Brian Galloway: A reply to Professor Mark Tatz. In: Indo-Iranian Journal 40,4 (1997), 367-71.
- Gregory Schopen: If you can’t remember, how to make it up. Some monastic rules for redacting canonical texts. In: P. Kieffer-Pülz (Ed.): Bauddhavidyāsudhākaraḥ. Studies in honour of Heinz Bechert on the occasion of his 65th birthday. Swisttal-Odendorf 1997, 571-82 [= Buddhist monks and business matters. Still more papers on monastic Buddhism in India. Honolulu 2004, 395-408].
- Fernando Tola, Carmen Dragonetti: Ekam samayam. In: Indo-Iranian Journal 42,1 (1999), 53-55.
- Konrad Klaus: Zu der formelhaften Einleitung der buddhistischen Sūtras. In: K. Klaus, J.-U. Hartmann (Ed.): Indica et Tibetica. Festschrift für Michael Hahn. Wien 2007, 309-22.
Additions welcome!
Transferred Grub can’t find /dev/md0 (Linux)
Please excuse me, this pure Linux issue might not be very interesting for the indological audience of this blog. Please don’t be bored: this belongs on the net somewhere, since others might face the same problem, too.
Problem (recent Debian testing, Kernel 2.6.30-1, which might not play a role, Grub 1.97~beta3-1): I bought a new set of harddisks and assembled them into a brand new software raid1 array from a Linux live system on CD the usual way with Mdadm and formatted it with ext3. After that I copied my filesystem onto it, which I had tarballed beforehand from my old set of harddisks to an external harddisk; no secret involved so far. I used chroot to enter the mounted filesystem and ran update-grub and grub-install “(md0)” to reinstall the bootloader, which executed without errors. After the reboot Grub started from the MBR of the first harddisk as planned, but the raid array wasn’t found and so the boot procedure hung after Grub’s boot menu. Not very amusing. On the net I found several tips for people having the same problem: it was said that Grub can’t handle all of the superblock versions which different live systems may create; one guy said that the problem typically occurs if the order of the harddisks was changed at the BIOS level (neither was the problem here); and finally there were several suggestions that the Initramfs has to be updated for the newly created filesystem, too. I did this with update-initramfs -u. Before that, the live system’s /dev and /proc have to be made available with mount -o bind /dev /mnt/dev and mount -t proc /proc /mnt/proc [or similar] before changing to the mounted filesystem with chroot. I don’t know if it’s critical, but I also found a commented-out option raid1 in /etc/initramfs-tools/modules, which I re-enabled beforehand. All of this had no effect; the boot still hung.
The solution (simple but hard to find if you aren’t an insider of the Linux boot procedure): when the boot procedure hangs, it gives up after several minutes and the Initramfs drops into its shell (prompt: (initramfs); I was quite desperate by the time I found that out!). Just assemble the raid array manually with mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 (or similar) and return to the boot procedure by typing exit. It boots. Then, after update-initramfs, update-grub and grub-install have been executed from the natively booted system, it works again! To this day I haven’t found out what the problem was, but I’ve seen that /boot/grub/grub.cfg now contains the UUID of the raid where there was just /dev/md0 when update-grub was executed in the chroot environment. It seems that in general there is a problem when you are using a live system with a different architecture.
Epic and Purāṇic Bibliography online
The extremely comprehensive Epic and Purāṇic Bibliography (EPB) is now available online here at the Indology department in Göttingen. The database contains even more entries than the original printed version which was compiled in Tübingen (Wiesbaden: Harrassowitz 1992). The datasets include title descriptions or summaries, records of reviews, an index of quoted passages and even some library shelfmarks. The quoted passages are searchable, which makes this tool even more useful.
The LaTeX Notebook 1-3 (repost)
Document classes
The KOMA-Script (3.0) document classes and packages developed by Markus Kohm and Jens-Uwe Morawski are replacements for the standard LaTeX classes; they are widely used and very rich in features. A basic characteristic is that, unlike the standard LaTeX classes, they implement typical European typesetting defaults like the principle of the Golden Section (Der goldene Schnitt), following the highly influential 20th century typographer Jan Tschichold. The developers run a special page for documentation, the documentation is here, and a short reference is to be found here. Read the PracTeX Journal 2006,3 article Replacing LaTeX2e standard classes with KOMA-Script.
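As a minimal sketch of what the switch looks like in practice, assuming an article-type document: the standard class name is exchanged for its KOMA-Script counterpart (scrartcl for article, scrreprt for report, scrbook for book). The class options shown here are only illustrative choices, not recommendations from the KOMA-Script developers.

```latex
% Minimal KOMA-Script document; the class options are illustrative
\documentclass[paper=a4,fontsize=11pt,DIV=calc]{scrartcl}
\usepackage[T1]{fontenc}

\begin{document}
\section{Introduction}
% The type area is now calculated by KOMA-Script (package typearea)
% instead of using the fixed margins of the standard classes.
Some text.
\end{document}
```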
Confproc (0.4f) is a document class for conference proceedings created for the DAFx-06 (9th International Conference on Digital Audio Effects, Montréal). A package like this demonstrates the power of a macro-based typesetting system like LaTeX. It features its own BibTeX style and is meant to produce PDF, so it makes fully integrated use of the Hyperref package. The documentation is here, there is a Report on the making of the DAFX-06 proceedings and finally here are the proceedings. A broad use of tools like this might help speed up the publishing of conference proceedings in the future. Cf. Verfaille’s A new package for conference proceedings [Confproc] {PracTeX Journal 2007,4}.
Wordlike (1.2b) simply manipulates the standard LaTeX layout so that the output looks as if it was made with Word. For whatever reason you need it (being used to it, or the fact that in certain situations anything else would be considered backward), with Wordlike you are able to look like Word while using everything else that comes with LaTeX. Product of the year! The documentation (here) is, of course, written in Wordlike.
Papertex (1.2a) is a highly customizable class for creating little newspapers, newsletters etc. The developers say that “it is possible to change the aspect of (almost) everything”. There are special environments for news, shortnews etc. Very interesting. Package documentation, example newspaper page here. It seems that the Vidūṣaka was also made with Papertex. Cf. Tortosa/Bleda’s PaperTeX: Creating newspapers using LaTeX 2e {Tugboat 28 (2007), 20-23}.
Exam (2.3) is a class for easy typesetting of exam scripts (Klausuren). There are environments for appropriate headers and footers, fields for the student’s name, multiple choice question environments, answer fields for the master copy etc. It might be very useful for teachers (the packages Examdesign and Exams are alternatives). Documentation here.
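A small sketch of an exam sheet, assuming a reasonably current version of the Exam class; the questions themselves are of course made up. With the class option answers the master copy including the solutions is printed; without it the student version is produced.

```latex
% Sketch of an exam sheet; "answers" prints the master copy,
% remove the option for the student version
\documentclass[answers]{exam}

\begin{document}
\begin{questions}
  \question[2] Which typesetting system is the Exam class written for?
  \begin{choices}
    \choice Word
    \CorrectChoice LaTeX
    \choice Troff
  \end{choices}

  \question[5] Explain the term ``document class''.
  \begin{solution}
    A document class defines the overall layout and the
    commands available in a document.
  \end{solution}
\end{questions}
\end{document}
```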
Refman (2.0e) provides report and article-style classes for classy (technical) references and manuals with the main feature of a wide left margin for notes, inspired by manuals of Adobe (but a wide right margin would be useful, too). There is a demo document Changing the layout with LaTeX, the package documentation is here.
Some minor hacks
⚫ Setting section titles and description labels in the same font as the rest of the text:
\setkomafont{sectioning}{\normalfont}
\setkomafont{descriptionlabel}{\normalfont}
⚫ Letting every new section begin on a fresh page can be done with Titlesec:
\usepackage{titlesec}
\newcommand{\sectionbreak}{\clearpage}
⚫ Keeping the footnote counter running across chapters (in the book and report classes it is normally reset at each new chapter) is possible with Remreset:
\usepackage{remreset}
\makeatletter
\@removefromreset{footnote}{chapter}
\makeatother
⚫ \pagestyle{empty} for multi-page toc:
\makeatletter
\let\myTOC\tableofcontents
\renewcommand{\tableofcontents}{\begingroup\let\ps@plain%
\ps@empty\pagestyle{empty}\myTOC\clearpage\endgroup}
\makeatother
Footnotes
⚫ Prevent footnotes from being broken across pages (a standard hack):
\interfootnotelinepenalty=10000
⚫ Proposal for custom footnotes:
\renewcommand{\footnoterule}{\rule{0ex}{0ex}}
\setlength{\footnotesep}{2.5ex}
\deffootnote[1.5em]{0em}{1em}{\textsuperscript%
{\thefootnotemark}
\hspace{0.5em}}
⚫ Continuous (“paragraphed”) footnotes can be done with the Fnpara package, but the same code is also part of the more versatile Footmisc (option “para”). Multiple levels of footnotes can be realized with the Manyfoot package (part of the Ncctools bundle by Alexander Rozhenko), but both functions and other features like per-page numbering are also provided by the comparatively new Bigfoot package by David Kastrup. So it’s a good idea to choose Bigfoot unless you need the even fancier functionality provided only by specialized and much more complex critical edition packages like Ledmac (a post on that is coming up). For Bigfoot cf. (if available) Kastrup’s Benefits, care and feeding of the bigfoot package {TugBoat 29 (2008), 181 ff.}.
⚫ A useful collection of footnote-related packages (usually treated together with endnotes and margin notes) can be found here.
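The following is a hedged sketch of two footnote levels with Bigfoot, one of them set as a run-in paragraph. It assumes that Bigfoot accepts the \DeclareNewFootnote interface known from Manyfoot (which is my understanding of its documentation); the suffixes A and B and the sample text are arbitrary.

```latex
\documentclass{article}
\usepackage{bigfoot}

% Two independent footnote levels (suffixes are arbitrary):
\DeclareNewFootnote{A}              % numbered 1, 2, 3, ...
\DeclareNewFootnote[para]{B}[alph]  % lettered a, b, c, set as one paragraph

\begin{document}
The main text\footnoteA{An ordinary explanatory note.} may carry
variant readings\footnoteB{Ms. A omits.} in a second
layer\footnoteB{Ms. B adds a gloss.} below the first one.
\end{document}
```

If only paragraphed footnotes on a single level are needed, loading Footmisc with \usepackage[para]{footmisc} and using the ordinary \footnote command should be sufficient.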
Sloppy typesetting and hack resources
When typing a lot of Sanskrit, LaTeX usually has to deal with comparatively long words, while the system is often unable to locate hyphenation points within an English, German or other non-Sanskrit environment (not to mention that proper hyphenation patterns for romanized Sanskrit are still a desideratum). For this reason it is widespread to set the spacing tolerance to \sloppy, even though using \sloppy or sloppypar, in the preamble in particular, is considered inappropriate (cf. Trettin/Fenn, Obsolete commands and packages, 1.8: Should I use \sloppy?). But there are compromise solutions around which slightly change several linebreaking and spacing parameters in a balanced way to loosen the normally very strict specifications of LaTeX, like the hack invented by Axel Reichert:
\tolerance 1414
\hbadness 1414
\emergencystretch 1.5em
\hfuzz 0.3pt
\widowpenalty=10000
\vfuzz \hfuzz
\raggedbottom
I found that hack on Texnik.de, which is generally a very good resource for hacks and workarounds. Another very useful resource for solutions like this, or for finding the right package, is the TeX FAQ by the German user group DANTE. I also recommend Anselm Lingnau’s LaTeX Hacks (O’Reilly 2007, ISBN 978-3-89721-477-4, in German), and a title that can’t be missed is certainly Frank Mittelbach/Michel Goossens’ LaTeX Companion (2nd ed. Addison-Wesley 2004, ISBN 0-201-36299-6).
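If the relaxed values should not apply to the whole document, the linebreaking part of Reichert’s hack can also be limited to single passages. The following sketch wraps four of his parameters into an environment; the name relaxedbreaks is made up for this example, and the sample word merely stands in for a long unhyphenatable compound.

```latex
\documentclass{article}

% Hypothetical environment: applies the relaxed values only locally
\newenvironment{relaxedbreaks}{%
  \par
  \tolerance=1414
  \hbadness=1414
  \emergencystretch=1.5em
  \hfuzz=0.3pt
}{%
  \par % end the paragraph while the relaxed values are still active
}

\begin{document}
An ordinary paragraph set with the strict default values.

\begin{relaxedbreaks}
A paragraph containing long romanized compounds such as
bodhisattvacaryavataravivrttipanjika that cannot be hyphenated.
\end{relaxedbreaks}
\end{document}
```

The closing \par matters because TeX uses the parameter values that are in force when the paragraph ends, not those at its beginning.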
Parallel typesetting
The parallel typesetting of different texts, esp. of a text and its translation, is common, and in Indology there are for example the famous editions made by Ernst Waldschmidt (1897-1985). There are different packages for LaTeX to deal with parallel typesetting of text streams; basically that means providing and aligning custom boxes.
Parrun by M. Dominici (1.1) provides two environments fframe and sframe, which makes the usage a little bit complicated, I think. It seems that, unless invoked with the option multicol, the package is meant for vertical parallel typesetting (not tested).
Parcolumns (1.2) is part of the sophisticated Sauerj bundle by J. Sauer. The package provides an environment parcolumns in which the columns are generated with the command \colchunk. Even more than 2 columns are possible on the same page, it’s possible to customize column width and distance, it’s possible to leave out column fills … works fine.
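A minimal sketch of the Parcolumns usage described above, assuming two columns with a slightly narrower first column; the widths, the distance and the sample text are arbitrary choices.

```latex
\documentclass{article}
\usepackage{parcolumns}

\begin{document}
% Two columns; width of column 1 and the gap are set explicitly
\begin{parcolumns}[colwidths={1=.45\linewidth},distance=2em]{2}
  \colchunk{evam maya srutam ekasmin samaye \dots}
  \colchunk{Thus have I heard: at one time \dots}
  \colplacechunks % typeset this pair, align the next one below it

  \colchunk{Second chunk of the original.}
  \colchunk{Second chunk of the translation.}
  \colplacechunks
\end{parcolumns}
\end{document}
```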
The ‘classic’ for parallel typesetting is Parallel (beta 4) by M. Eckermann. It has basically the same usage, with an environment Parallel and the subcommands \ParallelLText and \ParallelRText, but only two columns are possible. The names are tedious to type, and even for automatic 50/50 widths the environment must be invoked with empty braces (\begin{Parallel}{}{}). A nice feature is that it’s possible to arrange the columns on different (odd/even) pages. Cf. Mittelbach/Goossens, LaTeX Companion {2nd ed., Addison-Wesley 2004}, section 3.5 seq. (3.5.2: parallel – Two text streams aligned).
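For comparison, a minimal sketch of the same kind of layout with Parallel; the sample text is arbitrary. The optional argument [p] mentioned in the comment is, as far as I know, the switch for setting the two streams on facing pages.

```latex
\documentclass{article}
\usepackage{parallel}

\begin{document}
% Empty braces: both columns get the same width.
% \begin{Parallel}[p]{}{} would use facing pages instead of columns.
\begin{Parallel}{}{}
  \ParallelLText{Original text, first paragraph.}
  \ParallelRText{Translation, first paragraph.}
  \ParallelPar % synchronize both sides before the next pair
  \ParallelLText{Original text, second paragraph.}
  \ParallelRText{Translation, second paragraph.}
\end{Parallel}
\end{document}
```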
Generally there are some peculiarities in the way the tested packages deal with footnotes. Parcolumns, as far as I can see, swallows footnotes completely. A workaround is the package Footnote (1.13), part of the fabulous Mdwtools by M. Wooding: the command \makesavenoteenv makes footnotes emerge even from trapping environments like tabular [!] and parcolumns, or one can wrap the environment savenotes around them. Parallel generates its own layer of footnotes and places them immediately after the environment ends (whether wanted or not), but offers an option SeparatedFootnotes for columnwise handling of its footnotes.
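The Footnote workaround could look like the following sketch, which simply follows the description above: \makesavenoteenv patches an environment globally, while savenotes is wrapped around a single instance. I have not checked this combination against every version of the packages involved.

```latex
\documentclass{article}
\usepackage{parcolumns}
\usepackage{footnote}      % part of mdwtools
\makesavenoteenv{tabular}  % footnotes now escape from every tabular

\begin{document}
\begin{tabular}{ll}
  Sanskrit & English\footnote{A note from inside a tabular.} \\
\end{tabular}

% Wrapping a single environment instead of patching it globally
\begin{savenotes}
\begin{parcolumns}{2}
  \colchunk{Original text.\footnote{A note from inside parcolumns.}}
  \colchunk{Translation.}
  \colplacechunks
\end{parcolumns}
\end{savenotes}
\end{document}
```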
Another solution is Ledpar (03b patch 0.4) by P.R. Wilson, which belongs to the Ledmac package for critical editions. Ledmac is one of the most versatile LaTeX packages for text editing available and will be a topic in this series in the future. If one uses Ledmac and wants additional parallel typesetting support, Ledpar is surely going to be the choice because it’s more or less guaranteed to be compatible. Ledpar uses nested environments for the columns (\begin{pairs} \begin{Leftside} \end{Leftside} \begin{Rightside} \end{Rightside} \end{pairs}), and I think that could be improved in the future, but there are a lot of options incl. setting on facing pages, line numbering, verse typesetting etc., which makes the package interesting for users who are interested in parallel typesetting but have demands going beyond what is provided by the other ones described above.
When typesetting poetry or verses, an ordinary tabular might be just enough, because there are always comparatively short single corresponding lines and not text streams which have to be aligned. Cells with custom widths spanning the line width can be set up, for example, with the tabular* environment like this:
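A hedged sketch of the nested Ledpar environments with numbered lines on both sides. It follows my recollection of the Ledpar manual, in particular the placement of \Columns inside the pairs environment, so please check it against the documentation of the installed version.

```latex
\documentclass{article}
\usepackage{ledmac}
\usepackage{ledpar}

\begin{document}
\begin{pairs}
  \begin{Leftside}
    \beginnumbering
    \pstart Original text, first paragraph. \pend
    \endnumbering
  \end{Leftside}
  \begin{Rightside}
    \beginnumbering
    \pstart Translation, first paragraph. \pend
    \endnumbering
  \end{Rightside}
  \Columns % assumption: this command typesets the collected columns
\end{pairs}
\end{document}
```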
\begin{tabular*}{\textwidth}{@{}p{0.4\textwidth}@{\extracolsep{\fill}}p{0.55\textwidth}@{}}
Test test test & Test test test \\
Test test test & Test test test
\end{tabular*}
Using a tabular for typesetting parallel verses is a highly customizable method.

