Experiences in ConTeXt (1)
I have been playing around with ConTeXt for a couple of days now, and I have to say it’s really great. ConTeXt (developer’s homepage) is, like LaTeX, a macro package built on top of Donald Knuth’s original (Plain)TeX, and so it can be seen as an alternative to LaTeX. It makes use of the PDF-producing engine pdfTeX (this pairing is designated “ConTeXt Mark II”), but it is also able to employ the enhanced, up-to-date, Unicode-capable engines XeTeX and LuaTeX. Seen from this angle, ConTeXt belongs to the category “PDF software”. In my experience ConTeXt is lean, effective and always to the point. The LuaTeX format files for LaTeX have become available lately (see here), but if you really want to run LuaTeX including working hyphenation etc., it’s better to work with ConTeXt: as a matter of fact it was developed in close cooperation with LuaTeX (this pairing is designated “Mark IV”). With the LuaTeX engine it’s possible to set up custom implementations in Lua code, which makes ConTeXt extremely versatile (see here).
MkIV is available through TeX Live, but there is also a lean distribution, ConTeXt Minimals. Except for a few third-party modules (see here) everything is integrated, which has its advantages. A challenge for switchers is the fact that ConTeXt seems to be even more arcane than LaTeX. But there is some information especially for people leaving LaTeX (see here, here, and here), the ConTeXt developers run a wiki named ConTeXt garden which helps you get established, and there is a mailing list that seems to be newbie friendly (here). There are many articles, references and manuals by the programmers to be found on the net. A useful resource is the page context-xml.
On the first run of MkIV you first have to make the font directories known (on Linux/TeX Live 2009: export OSFONTDIR="/usr/share/fonts/;/usr/local/share/fonts" or similar) and rebuild the LuaTeX cache with: luatools --generate. Now you can process a basic document test.tex like: \starttext Hello, world! \stoptext with context test.tex into a PDF document. Welcome to ConTeXt!
Next, a shopping list of basic features:
- To enable Unicode input encoding you have to give \enableregime[utf]; the right language/hyphenation option is selected with \mainlanguage[de] or [en] respectively (see options here)
- Using custom fonts has been made easy by the module Simplefonts: a simple \usemodule[simplefonts][size=10pt] or similar together with \setmainfont[xyz] switches the font. Elements like footnotes are not changed by that (could this be improved, or is there a deeper reason for it?); for these there additionally has to be a \definesimplefonttypeface[myfont][xyz] and then \setupfootnotes[bodyfont=myfont], and for the page number \setuppagenumbering[bodyfont=myfont]
- The whitespace between paragraphs can be adjusted with \setupwhitespace[1.5ex] or something like this (an “ex” is the height of a lowercase x)
- The interline space can be adjusted simply through \setupinterlinespace[line=1.5\bodyfontsize] or similar
- The PDF metadata can be passed with \setupinteraction[state=start, title={My document}, author={me}]; a clickable table of contents in blue you’ll get with \setupcombinedlist[content][interaction=all, color=blue]
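To try the settings from this list together, a small shell helper can write a test file. This is only a sketch: the file name, the section title and the use of \completecontent and \section (added so that there is a table of contents to click on) are my own assumptions, and the font setup is left out because it depends on the fonts installed on your system.

```shell
# Write a small ConTeXt MkIV test file combining the setup commands above.
# \completecontent and \section are illustrative additions, not from the list.
write_test_tex() {
    cat > "${1:-test.tex}" <<'EOF'
\enableregime[utf]
\mainlanguage[en]
\setupwhitespace[1.5ex]
\setupinterlinespace[line=1.5\bodyfontsize]
\setupinteraction[state=start, title={My document}, author={me}]
\setupcombinedlist[content][interaction=all, color=blue]
\starttext
\completecontent
\section{First section}
Hello, world!
\stoptext
EOF
}
```

Usage would be something like: write_test_tex test.tex && context test.tex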
That’s great, but what’s in here for the philologists? What first caught my attention is that with MkIV there weren’t any problems with multiple combined diacritical marks (Unicode 0300-036F) in the IndUni-H font by John Smith (Vedic stuff like r-underdot-macron-acute/grave). I don’t know where the mistake lies, but there are still some problems with XeTeX and that font (which is the only free, full-fledged Helvetica available). Being reliable with minority stuff like this inspires a lot of confidence.
Another very interesting development is CritTeXt, which is planned as a luxurious facility for the production of textual editions with ConTeXt. Idris Hamid at Colorado State University and the Oriental TeX project are the driving force behind CritTeXt. What has been presented in the TUG 2007 talk and in that outline of text-critical typesetting features (here) looks very promising. Unfortunately that’s all there is on the net about this, and so far I haven’t found out whether the project has meanwhile got far enough to plan with it or even to switch over for it. But I’ve been told that everything is going to be implemented in ConTeXt – eat this, Word! Next to the still available TeX solutions for critical editions, Ledmac and Ednotes for LaTeX and the “Stammvater” (progenitor) Edmac for PlainTeX (no, it doesn’t run with ConTeXt), this could be another chance of getting what I know is the wet dream of many of my peers: parallel typesetting in a critical edition environment. I’ll report further discoveries.
New publication on the old Sanskrit manuscripts from Tibet
Another recent publication on the Sanskrit manuscripts which survived in Tibet:
Sanskrit manuscripts in China. Proceedings of a panel at the 2008 Beijing Seminar on Tibetan Studies, October 13 to 17. Edited by Ernst Steinkellner in cooperation with Duan Qing and Helmut Krasser. Beijing: China Tibetology Publishing House 2009. ISBN 978-7-80253-226-7.
Content:
- Duan Qing: A fragment of the Bhadrakalpasūtra in Buddhist Sanskrit from Xinjiang.
- Fan Muyou: Some grammatical notes on the Advayasamatāvijayamahākalparāja.
- Pascale Hugon: Phya pa Chos kyi seng ge’s synoptic table of the Pramāṇaviniścaya.
- Harunaga Isaacson: A collection of Hevajrasādhanas and related works in Sanskrit.
- Matthew T. Kapstein: Preliminary remarks on the Grub mtha’ chen mo of Bya ’Chad kha ba Ye shes rdo rje.
- Shoryu Katsura: Rediscovering Dignāga through Jinendrabuddhi.
- Helmut Krasser: Original text and (re)translation – a critical evaluation.
- Li Xuezhu: Candrakīrti on dharmanairātmya as held by both Mahāyāna and Hīnayāna – based on Madhyamakāvatāra Chapter 1.
- 李学竹: 月称关于二乘人通达法无我的论证 – 以梵文本《入中论》第 一 章为考察中心.
- Luo Hong: A preliminary report on a newly identified Sanskrit manuscript of the Vinayasūtra from Tibet.
- Luo Zhao: The cataloguing of Sanskrit manuscripts preserved in the TAR: A complicated process that has lasted more than twenty years.
- 罗炤: 西藏梵文贝叶经的编目情况及二十余年的曲折经过.
- Saerji: Sanskrit manuscript of the Svapnādhyāya preserved in Tibet.
- Francesco Sferra: The Manuscripta Buddhica project – alphabetical list of Sanskrit manuscripts and photographs of Sanskrit manuscripts in Giuseppe Tucci’s collection.
- Ernst Steinkellner: Strategies for modes of management and scholarly treatment of the Sanskrit manuscripts in the TAR.
- 恩斯特∙斯坦因凯勒: 西藏自治区梵文手稿的管理模式及学术性处理方面的策略.
- Tsewang Gyurme: Protecting the Sanskrit palm-leaf manuscripts in the Tibetan Autonomous Region – A summary.
- Ye Shaoyong: A preliminary survey of Sanskrit manuscripts of Madhyamaka texts preserved in the Tibet Autonomous Region.
Sky’s the limit: on the background of the Digital Corpus of Sanskrit (DCS)
Oliver Hellwig’s Digital Corpus of Sanskrit (DCS) has been online for a couple of days now. At first sight this project doesn’t look different from all the other sites on the net for which Sanskrit texts have been typed in and whose lemmas or lexemes have been cross-hyperlinked to one or more online dictionaries. Such projects often result in alphabetical lists of lexemes including auto-generated text instances, as is the case here with the DCS. But the scope of the Sanskrit Tagger, of which the DCS is a somewhat unspectacular-looking result, goes much deeper, as Hellwig explained in his presentation at the 1st International Sanskrit Computational Linguistics Symposium (1st ISSCL) at INRIA in Paris in 2007 (paper see here, video stream here [great that there is one!]).
The Sanskrit Tagger is a tool which uses stochastic (statistical) methods to automatically assign grammatical descriptors to the “words” in arbitrary Sanskrit text. For this part-of-speech (POS) tagging a Hidden Markov Model (HMM) is employed, a statistical method which is used in computational linguistics but also in bioinformatics, both of which are fields of temporal pattern recognition. In plain English that just means that the computer has been taught to read Sanskrit. The individual steps of the tagging process are described at length in the paper.
Thanks to Hellwig’s diligence the database which belongs to the tagger is constantly growing, and with that, as always with statistics, the accuracy increases. I think in the not so distant future it will be possible to let the tagger process larger Sanskrit corpora, like GRETIL for example, without producing a mass of errors. Auto-generated word indexes like the ones of the Bodhicaryāvatāra and the Rāmāyaṇa, which Hellwig presented on his page a couple of weeks ago, are just one result of the project, but it goes even further than that. A large corpus of tagged text is ideal for syntactical inquiries, for example. And refining the statistical methods which are applied to Sanskrit text could also bring us research tools like generated author style fingerprints, which would deepen our insight into Sanskrit literature significantly.
Basic literature:
Baldi/Brunak: Bioinformatics – the machine learning approach. 2nd ed. Cambridge (etc.): MIT Press 2001 [165 seq.: Hidden Markov Models: the theory].
Eddy: What is a Hidden Markov Model? In: Nature Biotechnology 22,10 (2004), 1315-16 [doi 10.1038/nbt1004-1315].
Fucks/Lauter: Mathematische Analyse des literarischen Stils. In: Kreuzer/Gunzenhäuser (Ed.): Mathematik und Dichtung. Versuche zur Frage einer exakten Literaturwissenschaft. München: Nymphenburger Verlagshandlung 1969, 107-22.
Hellwig: Sanskrit Tagger – a stochastical lexical and POS tagger for Sanskrit. In: Huet/Kulkarni/Scharf (Ed.): Sanskrit computational linguistics. First and Second International Symposia. Berlin, Heidelberg: Springer 2009, 266-77.
Samuelsson: Statistical methods. In: Mitkov (Ed.): The Oxford Handbook of Computational Linguistics. Oxford University Press 2004, 358-75 [19.3: Hidden Markov Models].
Voutilainen: Part-of-speech tagging. In: Mitkov (Ed.): The Oxford Handbook of Computational Linguistics. Oxford University Press 2004, 219-32.
A few German classics of philology
In case this concerns you: these might not be available everywhere. Enjoy!
- Boeckh, August: Encyklopädie und Methodologie der philologischen Wissenschaften. 2. Aufl. Leipzig: 1886.
- Kanzog, Klaus: Variante und Textentscheidung. Über die Rolle der Textkritik im literaturwissenschaftlichen Studium. In: Jahrbuch der Deutschen Schillergesellschaft 22 (1978), 700-721.
- Kantorowicz, Hermann: Einführung in die Textkritik. Systematische Darstellung der textkritischen Grundsätze für Philologie und Juristen. Leipzig: Dieterich’sche Verlagsbuchhandlung 1921.
- Luschnat, Otto: Zur Editionstechnik der klassischen Philologen. In: Wissenschaftliche Annalen [Berlin] 1,6 (1952), 362-75.
- Maas, Paul: Textkritik. 3., verbesserte und vermehrte Auflage. Leipzig: Teubner Verlagsgesellschaft 1957 [bundle including the reviews of Pasquali (1929) & Erbse (1959)].
- Seiffert, Hans Werner: Untersuchungen zur Methode der Herausgabe deutscher Texte. 2. Auflage. Berlin: Akademie-Verlag 1969 (Veröffentlichungen des Instituts für deutsche Sprache und Literatur; 28).
- Stählin, Otto: Editionstechnik. Ratschläge für die Anlage textkritischer Ausgaben. Leipzig, Berlin: Teubner 1909 [Offprint from: Neue Jahrbücher für das klassische Altertum, Geschichte und deutsche Literatur 12].
- Vogel, Claus: On editing Indian codices unici (with special reference to the Gilgit manuscripts). In: Stietencron (Ed.): Indology in India and Germany. Problems of information, coordination and cooperation. Tübingen 1981, 59-69 [rejoinder: H. Matsumura: On editing Indian codices multi. In: Aligarh Journal of Oriental Studies 3,2 (1986), 93-100.; M Hahn: On editing codices unici. In: F. Grimal (Ed.): Les sources et le temps. Pondichéry: Institut Français 2001, 49-62].
“Thanks, but what the hack is djvu?”
Creating ebooks from book scans … on Linux
Refined semiprofessional document scanning on Linux: here is a little collection of procedures and hints for the production of e-books on your Linux system. I assume that your scanner is already running with Sane (see below) and that you know how to get the suggested software packages from your repository if available. The system used here is Debian GNU/Linux testing (“Squeeze”); I’ll give some availability information concerning other distributions (are there any?) but not in a comprehensive way. If in doubt, please check the relevant project homepages and/or the source code hosts for more information on the applications. All the software mentioned is open source, so if nobody thought of providing a package for your distribution there is always the option of compiling the code on your own, but it is certainly much more convenient to receive software from the repository through your package management, or at least to get it manually from the programmer or from somewhere else. By the way, most of the packages are also available for non-Linux operating systems. An easy way to get Linux stuff running on your Windows PC is Cygwin.
The e-book which is going to be produced is a single-sided, black & white (b/w) djvu or pdf file (“the containers”) containing an OCR layer.
1. Scanning
Fortunately no specific insider knowledge is needed for scanning on Linux. Sane is definitely the application most used for that purpose (libraries in Debian testing currently 1.0.20-13, frontends 1.0.14-9, which is quite up to date!), and it should be available on most everyday Linux systems. The frontend Xsane (0.996-3) is quite convenient for batch scanning. It allows you to choose a scan area if the book is smaller than the scanner, the pages can be saved rotated by 90°, and you can auto-adjust gamma/brightness/contrast after getting the first picture or the preview (the big buttons, the 2nd from the left). Other scanning solutions are surely also possible; the command line frontend scanimage, for example, can be run from a shell loop with custom intervals, which saves you from pushing a button to proceed after every page.
I am scanning at 300 DPI grayscale with the file extension .pgm [1] because my scanner backend doesn’t currently support b/w scanning (I know, I know …), but if it’s possible with your model you could try to scan in b/w and skip the step of converting the files after scanning. For post-processing it’s best to create a sequence 001.pgm, 002.pgm etc., and Xsane takes care of that. Usually there are two book pages in one picture; we are going to work on that next.
As for the DPI rate: when you check the 300 DPI outcome with your favourite image viewer you’ll see that the scans are much bigger than would be necessary for reading on the screen, but that’s just right, because the images will appear scaled down in the containers, and the outcome of the conversion to b/w is also better than with scans made at a lower DPI rate. And unlike with grayscale images, after converting to b/w the DPI rate doesn’t have such a significant effect on the overall file size of the final product, so there is no need to go below 300 to save resources.
[1] .pgm instead of the meta extension .pnm to distinguish it from .pbm after the conversion to b/w (next step), and not .tif because the post-processing tool Unpaper (see below) can’t work with that.
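The timed scanimage loop mentioned above could be sketched roughly as follows. The function names are hypothetical, and --mode and --resolution are backend options, so the accepted values depend on your scanner (check scanimage --help); treat this as a starting point rather than a tested recipe for every device.

```shell
# Build the zero-padded file name for scan number $1 (e.g. 7 -> 007.pgm).
page_name() {
    printf '%03d.pgm\n' "$1"
}

# Take $1 scans, waiting $2 seconds between them to turn the page.
# --mode/--resolution are backend options and may differ on your scanner.
scan_batch() {
    pages=$1
    pause=$2
    n=1
    while [ "$n" -le "$pages" ]; do
        scanimage --mode Gray --resolution 300 --format=pnm > "$(page_name "$n")" || return 1
        n=$((n + 1))
        if [ "$n" -le "$pages" ]; then
            sleep "$pause"
        fi
    done
}
```

A call like scan_batch 40 15 would then take 40 scans with 15 seconds between them.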
2. Batch postprocessing (1): conversion and manipulation with Imagemagick
Imagemagick is a most versatile Swiss army knife for manipulating images on Linux systems. Like Sane it should be widely available. Our scans can easily be converted to b/w using a simple shell loop:
for i in *.pgm; do convert "$i" -verbose "${i%pgm}pbm"; done
It’s also possible to adjust the threshold which decides whether a pixel becomes black or white, but normally the default works pretty well. The black stripe which often remains in the middle of the scan is going to be removed with Unpaper (next step). With Imagemagick’s convert it’s also possible to rotate the scans (-rotate 90) and to cut out a rectangular region (-crop WIDTHxHEIGHT+X+Y), and a lot of other manipulations are possible; please check out the command line options.
A hint for batch conversion: it’s always a good idea not to overwrite the originals with the manipulated files but to write the new generation into another directory (like: … -verbose ~/foo/${i%pgm}pbm; done).
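Both hints, the adjustable threshold and the separate output directory, can be combined in one small helper. This is a sketch under assumptions: the function name and the CONVERT variable are made up for illustration, and 50% is merely a starting value for Imagemagick’s -threshold option which you would tune by eye.

```shell
# ImageMagick's convert; set CONVERT=echo for a dry run that only prints
# the arguments instead of converting anything.
CONVERT=${CONVERT:-convert}

# Convert every .pgm in the current directory to a b/w .pbm in directory $1,
# leaving the originals untouched. $2 is the black/white threshold.
to_bw() {
    outdir=$1
    threshold=${2:-50%}
    mkdir -p "$outdir" || return 1
    for i in *.pgm; do
        [ -e "$i" ] || continue          # no .pgm files at all
        "$CONVERT" "$i" -threshold "$threshold" "$outdir/${i%.pgm}.pbm" || return 1
    done
}
```

For example, to_bw ~/foo 60% writes the b/w generation into ~/foo with a slightly higher threshold.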
3. Batch postprocessing (2): Unpaper
Unpaper, written by Jens Gulden (currently 0.3-1, also available for Ubuntu Karmic), is a tool for post-processing scanned book pages. It can remove dark areas, correct misaligned centring and rotation of book pages, remove blur and noise, and it is also able to split double-page scans into individual images. It’s made for heavy-duty tasks, dealing even with scans of the most ridiculous book photocopies. Unpaper is able to perform batch processing jobs, and a simple invocation would look like:
unpaper --layout double --output-pages 2 %03d.pbm ~/foo/%03d.pbm
--layout double declares that each input scan carries two book pages, and --output-pages 2 tells Unpaper to split them up into two individual files; %03d is a printf-style placeholder, expanded by Unpaper itself (not by the shell), for a zero-padded three-digit number. Unpaper is quite versatile, and getting acquainted with everything takes some effort. While it does its job it’s a good idea to constantly monitor the output. If unwanted results appear you can interrupt the process and change the settings. Unpaper is very sensitive; for example, in most cases where text blocks are accidentally removed on single pages you have to adjust the mask scan setting (try a lower setting like -ms 25,25). The processing can be resumed from any file with --start-input x, but then you also have to adjust --start-output accordingly and to give --overwrite. Useful user documentation is provided on the project’s homepage (here).
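Working out the matching --start-output by hand is easy to get wrong, so here is a small sketch that does the arithmetic and prints the resume command instead of running it. It assumes one input image per sheet, two output pages per sheet and counters starting at 1; the function names are hypothetical, and the paths are the ones from the example above.

```shell
# With two output pages per input sheet, sheet n starts at output page 2n-1.
first_output_page() {
    sheet=$1
    echo $(( (sheet - 1) * 2 + 1 ))
}

# Print (not run) the unpaper call that resumes processing at input sheet $1.
resume_command() {
    sheet=$1
    printf 'unpaper --layout double --output-pages 2 --start-input %s --start-output %s --overwrite %%03d.pbm ~/foo/%%03d.pbm\n' \
        "$sheet" "$(first_output_page "$sheet")"
}
```

Printing the command first makes it easy to check the numbers against the files already present in the output directory before anything gets overwritten.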
4. Creating djvu (& pdf)
If you don’t know it already: djvu is a powerful container format for digital images which is faster and compresses better than other solutions, and there are viewers available for nearly all operating systems (see here, djview 4.5-3 in “Squeeze”). Even if djvu reveals its full potential especially with killer tasks like full-resolution satellite pictures, in my experience the workflow with it is always a little bit more fluent, even with b/w book scans. Bundling our post-processed book pages into a djvu is no problem with the Djvulibre collection (3.5.22-7). First of all we have to convert the .pbm images into the djvu file format:
for i in *.pbm; do cjb2 "$i" "${i%pbm}djvu"; echo "$i"; done
After that we have to assemble the container:
djvm -c mydjvu.djvu *djvu
That’s it!
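The two steps can also be wrapped into one function. The sketch below is my own arrangement, not part of Djvulibre: it collects the page files explicitly instead of using the *djvu glob, so that a container left over from an earlier run in the same directory is not bundled into the new one. The RUN variable is a made-up dry-run switch.

```shell
# Set RUN=echo to print the commands instead of executing them.
RUN=${RUN:-}

# Encode every .pbm page with cjb2 and bundle the results into container $1.
# Assumes the page files sort in reading order (001.pbm, 002.pbm, ...).
make_djvu() {
    out=$1
    set --                               # reuse "$@" as the list of page files
    for i in *.pbm; do
        [ -e "$i" ] || continue
        page=${i%.pbm}.djvu
        $RUN cjb2 "$i" "$page" || return 1
        set -- "$@" "$page"
    done
    [ $# -gt 0 ] || { echo "no .pbm pages found" >&2; return 1; }
    $RUN djvm -c "$out" "$@"
}
```

With RUN=echo make_djvu mydjvu.djvu you can first check what would be executed.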
Creating a pdf on Linux is just as easy. First of all you have to convert the .pbms into .tifs (for i in *pbm; do convert $i -verbose ${i%pbm}tif; done), after that you have to create a multi-page tiff from these (tiffcp *tif bundle.tif), and finally you can create a pdf from that with: tiff2pdf -o mypdf.pdf bundle.tif (Note: tiffcp and tiff2pdf are part of the libtiff-tools, 3.9.2-2 in “Squeeze”. For tiff2pdf the compression method also has to be given, -j (Jpeg) or -z (Zip), see the manpage here).
5. OCR
There are also solutions available for Linux to derive OCR information from book scans for the text layers of djvu and pdf, and Tesseract seems to be the most mature application so far. Its development has been taken over by Google, and it is described as “probably one of the most accurate open source OCR engines available”. Tesseract-ocr is available for Debian testing in version 2.04-2, and there are a few language data files for the software which have to be installed as well (tesseract-ocr-eng etc.). Playing around with it, one gets the impression that Tesseract works quite nicely, especially when the correct language is chosen. It does have problems with Sanskrit diacritics, but I’ve seen that Tesseract can also be trained (I’ll report when I have found out more). It can also be applied to individual image files through batch processing (see some experiences here), but it is more convenient to work with a wrapper which also takes care of re-combining the OCR output with the image automatically:
Ocrodjvu (0.3.2-1, Ubuntu Lucid) by Jakub Wilk is a foolproof wrapper for working on document scans already bundled as djvu. It depends on OCRopus (0.3.1-2), an open source OCR system which is under development at the German Research Center for Artificial Intelligence (DFKI). OCRopus employs Tesseract to extract the textual information from the scanned document and, and that’s the highlight here, also saves the page positioning information with every word, so that a query in Djview or other viewers results not only in the relevant page but also in the highlighted word instances on these pages (layout analysis), a feature which can hardly be done without nowadays. Ocrodjvu is easy to apply to the djvu we’ve created so far:
ocrodjvu -o mydjvu_ocr.djvu mydjvu.djvu --language=eng
or similar. Start’n’forget – life is easy.
For pdf e-books it’s a little bit more tricky because so far there isn’t a fully developed wrapper for OCRopus available for pdf (I couldn’t get the little tool pdf2ocr, which I found on the net, to work properly), so I will leave that out here for now.
6. Gscan2pdf
Gscan2pdf (0.9.29-1, Ubuntu Hardy) is actually a very comfortable GUI frontend for most of the multitude of tools we’ve discussed so far, Sane (scanning), Unpaper (postprocessing) and Tesseract (OCR), and the whole process of producing an e-book, both djvu and pdf, can be carried out with this amazing tool. Gscan2pdf provides interfaces to Tesseract and also to the alternative Gocr, but as far as I’ve seen it unfortunately has no interface to OCRopus, nor can it deal with layout-analysed output (hocr), so this remains a desideratum.
7. Bookmarks
The final step to refine your e-book would be to apply bookmarks to the document. For djvu, custom bookmarks (in the djvu world this is called an “outline”) have to be in a form like:
(bookmarks
 ("Title" "#1")
 ("Main matter" "#5"
  ("Chapter 1" "#5")
  ("Chapter 2" "#15"))
)
After editing such a file (you could name it mydjvu.outline), djvused from the Djvulibre tools can apply the outline to the container:
djvused mydjvu.djvu -e 'set-outline mydjvu.outline' -s
That’s it. By the way, the djvu outline format is Unicode capable.
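For longer books it may be handier to keep the bookmarks in a simple list and generate the outline file from it. The following sketch assumes a made-up input format of one “page, tab, title” pair per line and produces a flat outline without nesting; titles containing double quotes would need escaping, which is not handled here.

```shell
# Read "page<TAB>title" lines on stdin and print a flat djvu outline.
make_outline() {
    tab=$(printf '\t')
    echo '(bookmarks'
    while IFS=$tab read -r page title; do
        [ -n "$page" ] || continue       # skip empty lines
        printf ' ("%s" "#%s")\n' "$title" "$page"
    done
    echo ')'
}
```

Usage would be something like make_outline < bookmarks.txt > mydjvu.outline, where bookmarks.txt is a hypothetical file name.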
8. Miscellaneous stuff
If the book you want to scan is bigger than the affordable scanner, there is the option of scanning single pages one at a time. If the lid of the scanner can’t be fully removed, or for whatever other reason, you may end up with a set of scans in which the pages on the odd-numbered scans are rotated by 180° in relation to the ones on the even-numbered scans, or the other way around. There is also a way to rotate either set, try:
for i in *.pbm; do p=$(echo "$i" | cut -c 3); if [ $((p % 2)) -eq 1 ]; then convert "$i" -rotate 180 -verbose "$i"; fi; done
(one line!). This rotates the set of odd-numbered .pbms with three-digit names. For working on the even-numbered set, replace -eq 1 with -eq 0. But if you try it you’ll see that scanning in such a way takes painful Prussian discipline.
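The one-liner looks only at the third character of the file name, so it depends on three-digit names. A slightly more general sketch (the function names are my own) takes the whole number before the extension and strips leading zeros first, because shell arithmetic would otherwise read something like 008 as an invalid octal number.

```shell
# Print the page number encoded in a file name like 008.pbm, without
# leading zeros, so that shell arithmetic does not treat it as octal.
page_number() {
    n=${1%.*}
    while [ "${n#0}" != "$n" ] && [ ${#n} -gt 1 ]; do
        n=${n#0}
    done
    printf '%s\n' "$n"
}

# Succeed if the file name carries an odd page number.
is_odd_page() {
    [ $(( $(page_number "$1") % 2 )) -eq 1 ]
}

# Rotate all odd-numbered .pbm pages in the current directory by 180 degrees.
rotate_odd_pages() {
    for i in *.pbm; do
        [ -e "$i" ] || continue
        if is_odd_page "$i"; then
            convert "$i" -rotate 180 "$i"
        fi
    done
}
```

Note that rotate_odd_pages overwrites the files in place, just like the one-liner, so it is best run on a copy of the scans.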
Unpaper employs batch processing with ascending numbers. If you want to re-engineer your already created djvu containers like that, you can unpack them with ddjvu, which puts out a multi-page tif (usage like: ddjvu -format=tiff -pages=1-25 ~/foo/mydjvu.djvu bundle.tif). That in turn can be burst with tiffsplit, which produces a set of images aaa.tif, aab.tif etc. These can be converted to pbm (converting and then re-encoding them into djvu is, as far as I’ve seen, the only way to obtain a djvu with a custom page range from a djvu), and a continuous numeric sequence of file names for the processing with Unpaper can be restored through this little shell script here.
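The linked script is not reproduced here. As a stand-in, a minimal sketch of such a renumbering step could look like the following; the function name and the idea of moving the files into a separate directory (to avoid clobbering files which already carry numeric names) are my own assumptions.

```shell
# Move all files with extension $1 from the current directory into directory
# $2, renamed to 001.ext, 002.ext, ... in the shell's sorted glob order.
renumber() {
    ext=$1
    outdir=$2
    n=1
    mkdir -p "$outdir" || return 1
    for f in *."$ext"; do
        [ -e "$f" ] || continue
        mv -- "$f" "$outdir/$(printf '%03d' "$n").$ext" || return 1
        n=$((n + 1))
    done
}
```

After tiffsplit, a call like renumber tif numbered would leave numbered/001.tif, numbered/002.tif and so on.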
TeX Live 2009: LuaLaTeX rolls on Debian (and the others)
Previous posting on this issue here.
1. TeX Live 2009 at Debian unstable
Since my Debian unstable (“Sid”) jumped to TeX Live 2009 a couple of days ago (after lagging behind with 2007 as the standard TeX distribution for a long time they skipped 2008, which is really pleasing, see also here), it’s now possible to run LuaLaTeX without a manual install: the relevant LaTeX format files have been included in TeX Live 2009 and the lualatex executable is now available on the console (LuaTeX version 0.50). The relevant macro packages (see http://tug.ctan.org/tex-archive/macros/luatex/, github collection) are most conveniently made available through the new package texlive-luatex (2009-7). Pretty soon the whole lot is going to be available also in Debian derivatives which are mostly based on the unstable branch, like Ubuntu. TeX Live also runs on other operating systems.
2. Hello world!
So now a rudimentary LuaLaTeX document like:
\documentclass{article}
\usepackage[utf8]{luainputenc}
\begin{document}
Hello world!
\end{document}
… runs. By the way, luainputenc (doc) calls luatextra (doc) which is also included in the texlive-luatex bundle.
3. Running Lua code from within the document
But LuaTeX is not only an alternative project which you could use to replace the other up-to-date, Unicode-capable, pdf-creating LaTeX engine XeTeX (BTW see an introduction to XeTeX in German here). The advantage of LuaTeX is that the scripting language Lua is built into the engine, which turns it into a kind of “eierlegende Wollmilchsau” (an all-in-one wonder), as one could say in German (for a collection of advantages of that approach see here). On the professional level that means that the rebuilt engine could be made much faster in processing and more versatile than everything which has been possible before (see Kastrup’s presentation at BachoTeX 2008); on the user level it just means that it is possible to run Lua code while processing the document. Lua code can be called from the macro package but also from within the document (a somewhat comparable TeX project is Perltex). Let’s give it a try:
\documentclass{article}
\usepackage[utf8]{luainputenc}
\begin{document}
So it's time to say:
\begin{luacode}
tex.print("Hello world!")
\end{luacode}
\end{document}
This typesets the introductory sentence followed by “Hello world!”.
But more significantly:
\documentclass{article}
\usepackage[utf8]{luainputenc}
\begin{document}
A random number:
\begin{luacode}
tex.print(math.random ())
\end{luacode}
\end{document}
This typesets a pseudo-random number between 0 and 1, produced by Lua’s math.random().
The next question would be the one about font selection. The available packages are not quite mature, but LuaTeX is under heavy development. The package luaotfload (doc) is made for the purpose of font switching; it is loaded automatically with luainputenc or luatextra respectively. The usage is the same as with the TrueType/OpenType font loader which is provided for the Plain/ConTeXt sister (see here). The basic usage would be something like:
\documentclass{article}
\usepackage[utf8]{luainputenc}
\font\myfont="FreeSans.ttf"
\begin{document}
\myfont Mahābhārata
\end{document}
The font file has to be in the working directory. An alternative is going to be the LuaTeX implementation of Fontspec, which is known from XeTeX (see here) and which employs a font file lookup. An experimental 2.0 version (.dtx and makefile) is provided by Khaled Hosny at his Github account. But I couldn’t get it to run so far.
New e-vedica found on the net
Long awaited: the 2nd edition of Aufrecht’s Ṛgveda (Bonn 1877) is available (here and here); great: Lubotsky’s edition of Atharvaveda-Paippalāda 5 (Cambridge 2002) is available here.
The Tanjur Bodhicaryāvatāra auxiliaries: a few collected bibliographical pointers
There are several works of literature transmitted in the Tanjur related to Śāntideva’s masterpiece Bodhicaryāvatāra (Bca), and unfortunately nearly all of them have been lost in their original Sanskrit versions (on them cf. Dietz: Śāntideva’s Bca – das Weiterwirken des Werkes dargestellt anhand der Überlieferungsgeschichte des Textes und seiner Kommentare [Lecture script] {Buddhismus in Geschichte und Gegenwart 3: Śāntideva’s “Eintritt i.d. Leben zur Erleuchtung.” Hamburg 1999, p. 27-41}, p. 35 sq. [IV. Die Kommentare zum Bca]; Ejima: Nyūbodaigyōron no chūshaku bunken ni tsuite {Indogaku Bukkyōgaku Kenkyū /Journal of Indian and Buddhist Studies 14,2 (1966), p. 644-48}; Williams: On Prakṛtinirvāṇa / Prakṛtinivṛta i.t. Bca {Altruism and reality. Studies i.t. philosophy of the Bca. Richmond 1998, p. 1-28 = *Asiatische Studien / Etudes Asiatiques 46,1 (1992), p. 516-50}, p. 3 sq. [1. Indian commentaries]). There are (in order of appearance):
1. The Bodhicaryāvatāra-Pañjikā (byaṅ chub kyi spyod pa la ‘jug pa’i dka’ ‘grel, Q 5273, 221 (la / mdo ‘grel 26), 45a7-325a5; facsimile ed. vol. 100) [1] by Prajñākaramati (Śes rab ‘byuṅ gnas blo gros) is an extensive commentary on the chapters 1-9 of the Bca. Besides its Tibetan version the text also survived in the original Sanskrit through a few very precious antique palm leaf manuscripts kept in Kolkata (cf. this former blog contrib), and it has been edited three times: the 9th chapter in 1898 by La Vallée Poussin (1869-1938); the full text by the same in 1901-14 for the Bibliotheca Indica series and later again by Vaidya (1891-1978)[2] in 1960 for the Buddhist Sanskrit Texts series; due to folio loss there are unfortunately two large lacunae: 3,22-4,45 and 8,109-186 [end of the 8th chapter] haven’t survived.[3] Prajñākaramati lived at the end of the 1st millennium and was a teacher and a so-called “gatekeeper” (an examiner?) of the famous Vikramaśīla university considered to be located in the Bhagalpur district of modern Bihar (cf. Bose: Indian teachers of Buddhist universities. Adyar 1923, p. 50 sq. [4. Prajñākaramati]; Dutt: Buddhist monks and monasteries of India. London 1962, p. 358 sq. [b. Vikramaśīlā]. On the patronage of the Pālas in general cf. the introduction chapter of: Huntington: Leaves from the bodhi tree. The art of Pāla India (8th-12th centuries). Seattle (etc.) 1990; and most recently: Sanderson: The Śaiva age {Einoo (Ed.): Genesis and development of Tantrism. Tokyo 2009, p. 41-349 [!]}, p. 87 sq. [The Pāla emperors and the great monasteries of eastern India]). In the colophons of the surviving mss at the Asiatic Society of Bengal the author is called prajñākara (no. 3830) next to prajñākaramati (no. 3829, cf. Hara Prasada Shāstri: Descriptive catalogue of the Sanskrit mss i.t. Government Collection 1: Buddhist mss. Calcutta 1917, p. 49 sq.), which Bose claims to be common (cf. op.cit., p. 52). Furthermore he is titled paṇḍitabhikṣu, which was an academic title in the university.
2. The Bodhisattvacaryāvatāra[4]-Vivṛttipañjikā (byaṅ chub sems dpa’i spyod pa la ‘jug pa’i rnam par bśad pa’i <bka’i ] dka’> ‘grel, Q 5274, 221, 325a5-396a5) is an anonymous commentary on the shorter version of the Bca [5], and therefore comprises 9 chapters while commenting also on the Pariṇāmanā, the 9th chapter of the Bca in its short version.[6] Saito writes that the author comments on the text from a Yogācāra-Mādhyamika point of view (cf. Saito: Śāntideva in the history of Mādhyamika philosophy {Sankarnarayan/Yoritomi/Joshi (Eds.): Buddhism in India and abroad. Mumbai (etc.) 1996, p. 257-63}, p. 259).
3. The Bodhisattvacaryāvatāra-Saṃskāra (byaṅ chub sems dpa’i spyod pa la ‘jug pa’i legs par sbyar ba, Q 5275, 222 (śa / mdo ‘grel 27), 1-106a6) by Kalyāṇadeva (Dge ba’i lha), the “edition” of the Bca, is another commentary on the long version, but not as exhaustive and rich in citations as Prajñākaramati’s Pañjikā.
4. The Bodhisattvacaryāvatāra-Duravabodhanirṇaya-nāmagrantha (byaṅ chub sems dpa’i spyod pa la ‘jug pa’i rtogs par dka’ ba’i gnas gtan la dbab pa źes bya ba’i gźuṅ, Q 5276, 222, 106a7-112b4), “book with the name: discussion of difficult (portions)” by Kṛṣṇapāda (Kṛṣṇa ba) is, like the name suggests, a non-continuous commentary. The author was a Newar brahmin and disciple of Śāntibhadra, a Newar scholar of the 11th century (cf. Lo Blue: The role of Newar scholars in transmitting the Indian Buddhist heritage to Tibet (c. 750-c. 1200) {Karnay/Sagant: Les habitants du toit du monde (Festschrift A.W. MacDonald). Nanterre 1997, p. 629-58}, p. 639).
5. The Bodhisattvacaryāvatāra-Pañjikā (byaṅ chub sems dpa’i spyod pa la ‘jug pa’i dka’ ‘grel, Q 5277, 222, 112b4-186b7) by Vairocanarakṣita (Bai ro tsa na kṣi ta) is in its extent somewhat comparable to Kalyāṇadeva’s Saṃskāra. The author, also known as Vairocanavajra, was a contemporary of Atiśa (982-1054) and also a resident of the Vikramaśīla university (cf. Schaeffer: The religious career of Vairocanavajra – a 12th century Indian Buddhist master from Dakṣiṇa Kośala {Journal of Indian Philosophy 28 (2000), p. 361-84}).
6. The Prajñāparicchedapañjikā (śes rab le’u'i dka’ ‘grel, Q 5278, 222, 186b7-210a5) is, as the name suggests, a commentary on the 9th chapter of the Bca.
7. The Bodhisattvacaryāvatāra-Vivṛtti (byaṅ chub sems dpa’i spyod pa la ‘jug pa’i rnam par bśad pa, Q 5279, 222, 210a5-223b2) is nearly identical with the last two chapters of the Vivṛttipañjikā (Q 5274, see above), and thus also refers to the shorter version of the Bca.
8. The Bodhisattvacaryāvatāra-Ṣaṭtriṃśat-piṇḍārtha (byaṅ chub sems dpa’i spyod pa la ‘jug pa’i don sum cu rtsa drug bsdus pa, Q 5280, 222, 223b2-227b5), and the
9. Bodhisattvacaryāvatāra-Piṇḍārtha (byaṅ chub sems dpa’i spyod pa la ‘jug pa’i don bsdus pa, Q 5281, 222, 227b5-229a7) are abridgements and consist of selected stanzas of the Bca (cf. Eimer: Suvarṇadvipa’s “commentaries” on the Bca {Bruhn/Wezler: Studien zum Jainismus und Buddhismus (Festschrift Alsdorf). Wiesbaden 1981, p. 73-78}). In the colophon the author is called “Suvarṇadvipī Lama Dharmapāla” (Gser gliṅ gi bla ma chos skyoṅ), but that simply refers to Atiśa’s teacher from Sumatra, Suvarṇadvīpa Dharmakīrti (Gser gliṅ pa chos kyi grags pa, cf. Eimer: Berichte über das Leben des Atiśa (Dīpaṃkaraśrījñāna). Wiesbaden 1977, p. 14, fn. 9).
10. The Bodhicaryāvatāra-tātparyapañjikā-Viśeṣadyotanī-nāma (byaṅ chub kyi spyod pa la ‘jug pa’i dgoṅs pa’i ‘grel pa khyad par gsal byed ces bya ba, Q 5282, 222, 229a8-343a4) was written and also translated by Vibhūticandra (rnal ‘byor zla ba). Karunaratne claims that the text is a sub-commentary on Prajñākaramati’s Pañjikā (cf. Bca-tātparyapañjikā-viśeṣadyotanī-nāma {Malalasekera (Ed.): Encyclopedia of Buddhism. Vol. 3: Bhārini-deva – Caura-vidhvaṃsana-n°. Sri Lanka 1971, p. 184 sq.}). At the beginning the text contains a short hagiographic biography of Śāntideva, which also survives in its Sanskrit original (cf. ms no. 9990 again i.t. Government Collection of the Asiatic Society of Bengal [what a precious collection they have got there; if it were scanned and put online, it would give a glorious major impulse to worldwide studies], no. 52 i.t. catalogue (op.cit., p. 51); examined by Jong as a review of Pezzali’s Śāntideva (Firenze 1968): La légende de Śāntideva {Indo-Iranian Journal 16,3 (1975), p. 161-82 = Buddhist Studies [Collected minor works]. Berkeley 1979, p. 119-40}). Master Vibhūticandra is not a minor figure: he was an important Kālacakra scholar and lived in the monastery of Jagaddala until it was invaded in the 1190s. Then he accompanied Śākyaśrībhadra (1127-1225) to Tibet in 1204. The dates of his birth and death are unknown (cf. Das Gupta: Vibhūticandra of the Jagaddala Mahāvihāra {Indian Culture 5 (1938-39), p. 215-17}; Vogel: Lunar eclipses of the early 13th century predicted by the Buddhist master Vibhūticandra {Kollmar-Paulenz/Peter (Eds.): Tractata Tibetica et Mongolica (Festschrift Sagaster). Wiesbaden 2002, p. 305-11}; Stearns: The life and legacy of the Indian Mahāpaṇḍita Vibhūticandra {Journal of the International Association of Buddhist Studies 19,1 (1996), p. 127-71}).
These Tibetan renderings of Sanskrit works that have since been lost suggest that a vast literature on the Bca once existed, and perhaps even this is only a fraction of it. Dietz points out that in the Duravabodhanirṇaya there are many references to unknown commentaries and subcommentaries (op.cit., p. 38). The Buddhist Sanskrit literature related to the Bca is a fascinating, challenging subject. A closer examination of the auxiliaries i.t. Tanjur remains a desideratum and could be a subject of its own.
Notes:
[1] Cf. the usual catalogues: Suzuki: The Tibetan Tripitaka. Catalogue & Index. Tokyo 1962, p. 639 sq.; *A comparative analytical catalogue of the Tanjur division of the Tibetan Tripitaka kept i.t. Otani Univ. Library. Kyoto 1965 ff. The other Tanjur editions could be located as always through the usual catalogues or a query at the general catalogue in Vienna.
[2] Obituary by B.V. Bapat i.t. Journal of the International Association of Buddhist Studies 1,1 (1978), p. 91 sq.
[3] Occasionally further Sanskrit mss which survived in Tibet are mentioned, cf. Martin: Tibskrit 2008, p. 1666: “KCDS [Microfilm catalogue of the Tibetan Cultural Research Center, Beijing], p. 150. Sanskrit palmleaf manuscript now belonging to the Potala”; Petech: Medieval history of Nepal (c. 750-1482). Roma 1984, p. 98: “23) Ms. of Prajñākaramati’s commentary on the Bca. Ṅor monastery in Tibet”, footnote: “… although the ms. seems to be listed in RS, XXI, 37 (no. 110) [refers to: Sāṅkrityāyana: Sanskrit palm-leaf mss. in Tibet {The Journal of the Bihar and Orissa Research Society 21 (1935), p. 21-43}, but no. 110 (p. 37) refers to a ms of the mūla]. From a hand-copy made by Professor G. Tucci in 1939” [not listed in: Sferra: Sanskrit mss and photos of Sanskrit mss in Giuseppe Tucci's collection {Balcerowicz / Mejor (Eds.): On the understanding of other cultures (Festschrift Schayer). Warsaw 2000, p. 397-413}].
[4] In Tibetan the title Bodhisattvacaryāvatāra (sometimes referred to as “Bsa”) appears next to Bodhicaryāvatāra. Lindtner claims that the longer version is original, cf. Rachewiltz: The Mongolian Tanjur version of the Bodhicaryāvatāra. Wiesbaden 1966 [review] {Buddhist Studies Review 15,2 (1998), p. 238-40}, p. 239: “Moreover it is more important, the full title of the poem is Bodhisattvacaryāvatāra (rather than Bodhicaryāvatāra). This form is also supported by the Mongolian … Obviously, our poem is not an introduction to the life of bodhi, but to the career of a bodhisattva. The source of the abbreviated title probably the author himself.” But it is a fact that the title Bca, and not even once the longer version, occurs throughout the Sanskrit transmission: India Office Library [incorporated into the British Library] ms no. 7713 colophon (cf. Keith: Catalogue of the Prākrit mss i.t. library of the India Office 2: Brahmanical and Jaina mss. Oxford 1935, p. 1394), Minaev’s own ms “M” (cf. Spasenie po učeniju pozdnějšich buddhistov [Salvation according to the teachings of the late Buddhists] {Zapiski Vostočnago Otdělenija Imperatorskago Russkago Archeologičeskago Obščestva [Memoirs of the Oriental Section of the Imperial Russian Archaeological Society] 4 (1889), p. 153-228}, p. 154); i.t. colophons of the mss no. 78 & 79 of Filliozat’s catalogue of the Sanskrit stocks of the French national library (Catalogue du fonds sanscrit 1: nos. 1 à 165. Paris 1941, p. 63); in no. 8067 i.t. Government Collection of the Asiatic Society of Bengal (op.cit., no. 19, p. 21) etc. etc. That the longer version of the title is also found in Mongolian is no argument for me: that the Mongolian version of the text (and so its title) derives from the Tibetan was already established by Weller (cf. Über den Quellenbezug eines mongolischen Tanjurtextes. Berlin 1950 [Abhandlungen d. Sächsischen Akademie d. Wissenschaften zu Leipzig, phil.-hist. Klasse; 45,2]).
[5] The shorter version of the Bca, also lost in its original Sanskrit version, is transmitted in Tibetan extra-canonically and has been found in Dunhuang on the Silk Road. It has 9 chapters in 701 verses, while the longer version consists of 10 chapters and 913 verses; chapters 2 and 3 of the longer version are combined here, the verses there which are related to the ritual of taking the bodhisattva vow (saptavidhānuttarapūjā, cf. Gómez: Bodhicitta (thought of awakening) {Buswell: Encyclopedia of Buddhism. Vol. 1: A-L. New York 2004, p. 54-56}) are still absent here, and there are other interesting differences, mainly in the 5th chapter. The short version has been examined and edited by Saito in research projects of Mie University (cf. A study of Akṣayamati’s (=Śāntideva’s) Bodhisattvacaryāvatāra as found in the Tibetan mss from Tun-huang. Project no. 02801005 [Research project report, 1993]; A study of the Dūn-huáng recension of the Bodhisattvacaryāvatāra. Project no. 09610021 [Research project report, 2000]), and there is a vast (mostly Japanese) literature on that subject and its issues, like the Akṣayamati hypothesis (that the author’s original name is Akṣayamati and not Śāntideva), the Tabo ms of the Bca (cf. Saito: Remarks on the Tabo ms of the Bodhisattvacaryāvatāra {Scherrer-Schaub/Steinkellner: Tabo Studies II. Manuscripts, texts, inscriptions, and the arts. Roma 1999, p. 175-89}) etc.
[6] That the 10th chapter of the Bca (in its vulgate long version) is not original was first put forward by La Vallée Poussin in the French translation of the text (Bodhicaryāvatāra. Introduction à la pratique des futurs bouddhas. Paris 1907, p. 143 sq. [Note finale]: “Il entrait dans mon intention de publier la traduction du dixième chapitre du Bca …”). But this has been doubted by, among others, Ruegg (The literature of the Madhyamaka school of philosophy in India. Wiesbaden 1981, p. 83: “The authenticity of this final chapter has been questioned on the ground that not all commentators have commented on it, but this point does not appear to be decisive.”). La Vallée Poussin did not recognize that some of the 9-chapter commentaries refer to another version of the mūla consisting of only 9 chapters anyway, and furthermore the fact that the Pañjikā omits the 10th chapter of the mūla does not necessarily mean that Prajñākaramati also considered it not to be original (another theory is that the Prajñāpāramitā chapter of the Pañjikā was written first and the commentary on chapters 1-8 added later, cf. this previous posting here). Dietz seeks to conclude the discussion by pointing out that the Pariṇāmanā is to be found in all known copies and versions of the text, even the shorter Tibetan version and also the shorter Chinese version (Taishō no. 1662, Putixingjing (菩提行經), in 782 verses and 8 chapters), and I think that this is a decisive argument (cf. Dietz, op.cit., p. 30).
A few e-manuscripts from the state library Munich available now
There is some progress at the Bavarian state library / Bayerische Staatsbibliothek (BSB) in the digitization of items from their Sanskrit manuscript stocks (“Cod.sanscr.” in the collection Südasiatische Handschriften). Part of their catalogue is also online: the first volume [no. 222 in Janert's Annotated bibliography, no. 693 at Biswas], Aufrecht 1909, which covers the Haug collection [predominantly Vedica], is online here; the second volume, Jolly 1912, is online here.
The items appear in order of their shelf marks, so some continuous scanning of this collection may be going on these days. So far the numbers 328-44 (from the Jolly collection) are available; check them out here. All items are downloadable in PDF format. Additions can be tracked through this RSS feed, but unfortunately only together with all the other mss scans; a more specific feed is not offered (cf. their RSS-feed page here).
I’ve got no time to examine anything, but the scans are quite decent. Among those available so far is a copy of Īśvarakṛṣṇa’s Sāṃkhyakārikās (342). There are also two scans of mss of Kauṭilya’s Arthaśāstra (334 & 35), which are obviously the ones in which Jolly and Hillebrandt discovered the text in or about 1908 (cf. Hillebrandt’s Das älteste Lehrbuch der indischen Politik, das in zwei Handschriften der Kgl. Hof- und Staatsbibliothek in München vorliegt und sich als der lange vermisste Text des Kauṭilya’s erweist. In: Kleine Schriften, pp. 355-84).
On the Munich collections in general cf. BSB: Das Buch im Orient. Handschriften und kostbare Drucke aus zwei Jahrtausenden. Ausstellung 16.11.1982-5.2.1983. Wiesbaden: Reichert 1982, esp. pp. 21-29: Kaltwasser: Die orientalischen Sammlungen der Bayerischen Staatsbibliothek (on the Sanskrit collections p. 25), and this handlist.
Some events in 2010
20th European Association for South Asian Archaeology and Art (EASAA) Conference, Wien, 04.-10.07.2010
3rd International Workshop on Early Tantra (IWET), Hamburg, 15.-23.07.2010 (contact) [1st workshop, 2nd]
“Indo-European verb” – Arbeitstagung der Indogermanischen Gesellschaft, Los Angeles, 13.-15.09.2010
“Spiegelungen, Projektionen, Reflexionen” – 31. Deutscher Orientalistentag (DOT), Marburg, 20.-24.09.2010
“Crossing borders in Southeast Asian archaeology” – 13th International Conference of the European Association of Southeast Asian Archaeologists (EurASEAA13), Berlin, 27.09.-01.10.2010
4th International Sanskrit Computational Linguistics Symposium (4i-SCLS, formerly: ISSCL), New Delhi, 10.-12.12.2010 [1st ISSCL, 2nd, 3rd]
Upcoming:
2nd International Indology Graduate Research Symposium (IIGRS), Cambridge [1st IIGRS]

