<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Sky&#8217;s the limit: on the background of the Digital Corpus of Sanskrit (DCS)</title>
	<atom:link href="http://www.danielstender.com/granthinam/704/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.danielstender.com/granthinam/704/</link>
	<description>Blog on Open source, Digital humanities, and Sanskrit philology</description>
	<lastBuildDate>Tue, 15 Nov 2011 15:57:16 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Aleix Ruiz Falqués</title>
		<link>http://www.danielstender.com/granthinam/704/comment-page-1/#comment-4568</link>
		<dc:creator>Aleix Ruiz Falqués</dc:creator>
		<pubDate>Wed, 26 Jan 2011 19:02:33 +0000</pubDate>
		<guid isPermaLink="false">http://www.danielstender.com/granthinam/?p=704#comment-4568</guid>
		<description>Thanks for the post. And a very interesting discussion indeed. I think the best way to check out any &quot;style analysis&quot; machine is to feed it with modern literature whose authorship is known, and see whether the software is able to detect real differences between authors and literary movements or not. Has it already been done?
Besides, I do think it has some usefulness. There&#039;s no question about that. The problem could be just the same: it is useful, utility-focused software, like any other method of analysis. Maybe not suitable for literary studies, but very useful for statistical purposes, indexes, etc., which are themselves tools for literary criticism.
The fact that this is not the only way or the only approach doesn&#039;t make it useless.
Thank you all.</description>
		<content:encoded><![CDATA[<p>Thanks for the post. And a very interesting discussion indeed. I think the best way to check out any &#8220;style analysis&#8221; machine is to feed it with modern literature whose authorship is known, and see whether the software is able to detect real differences between authors and literary movements or not. Has it already been done?<br />
Besides, I do think it has some usefulness. There&#8217;s no question about that. The problem could be just the same: it is useful, utility-focused software, like any other method of analysis. Maybe not suitable for literary studies, but very useful for statistical purposes, indexes, etc., which are themselves tools for literary criticism.<br />
The fact that this is not the only way or the only approach doesn&#8217;t make it useless.<br />
Thank you all.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Oliver Hellwig</title>
		<link>http://www.danielstender.com/granthinam/704/comment-page-1/#comment-546</link>
		<dc:creator>Oliver Hellwig</dc:creator>
		<pubDate>Fri, 26 Feb 2010 10:03:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.danielstender.com/granthinam/?p=704#comment-546</guid>
		<description>I may add some comments to the ongoing discussion:

(1) I completely agree with Daniel: tagged texts offer completely new opportunities when compared with simple e-texts because methods implemented in GREP and other programs can now be applied on a lexical/semantic level and not only on the phonetic one.
(2) Of course, the computer will not find out anything new &quot;on its own&quot;. You have to tell the computer which pattern you are interested in - but as soon as you have formulated this question, its capacities are clearly superior. Basically, this is not different from what every scholar is doing: formalize ideas, develop (philological, mathematical, statistical, ...) methods to prove them, and then delegate the unpleasant part of the work to some other instance = the computer!
(3) As I mentioned on the website of the DCS, the corpus is only a rather small extract from the data collected in the SanskritTagger database. In addition, it does not have any of the functionalities of the tagging program (except for searching for single words). So, one should distinguish between the view of the data = DCS and the underlying tagging program.
(4) A final point: The data presented in DCS are not &quot;computer-generated&quot; as Michael suggests. Instead, the computer only proposes solutions, and I select the most appropriate one according to my understanding of the passage. Of course, errors occur, especially depending on the time of the day! And some composites may be analyzed in a different way by other scholars. However, I would like to emphasize that the basic data of DCS are checked by a human philologist and not only processed by a tagging algorithm.

Best, Oliver</description>
		<content:encoded><![CDATA[<p>I may add some comments to the ongoing discussion:</p>
<p>(1) I completely agree with Daniel: tagged texts offer completely new opportunities when compared with simple e-texts because methods implemented in GREP and other programs can now be applied on a lexical/semantic level and not only on the phonetic one.<br />
(2) Of course, the computer will not find out anything new &#8220;on its own&#8221;. You have to tell the computer which pattern you are interested in &#8211; but as soon as you have formulated this question, its capacities are clearly superior. Basically, this is not different from what every scholar is doing: formalize ideas, develop (philological, mathematical, statistical, &#8230;) methods to prove them, and then delegate the unpleasant part of the work to some other instance = the computer!<br />
(3) As I mentioned on the website of the DCS, the corpus is only a rather small extract from the data collected in the SanskritTagger database. In addition, it does not have any of the functionalities of the tagging program (except for searching for single words). So, one should distinguish between the view of the data = DCS and the underlying tagging program.<br />
(4) A final point: The data presented in DCS are not &#8220;computer-generated&#8221; as Michael suggests. Instead, the computer only proposes solutions, and I select the most appropriate one according to my understanding of the passage. Of course, errors occur, especially depending on the time of the day! And some composites may be analyzed in a different way by other scholars. However, I would like to emphasize that the basic data of DCS are checked by a human philologist and not only processed by a tagging algorithm.</p>
<p>Best, Oliver</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael Slouber</title>
		<link>http://www.danielstender.com/granthinam/704/comment-page-1/#comment-540</link>
		<dc:creator>Michael Slouber</dc:creator>
		<pubDate>Tue, 23 Feb 2010 08:37:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.danielstender.com/granthinam/?p=704#comment-540</guid>
		<description>Thanks for your reply.  I am glad you brought this topic up and welcome the discussion, because I haven&#039;t seen many people really talking about this.  I admit that I haven&#039;t read any of the basic sources you mentioned either; I have tried reading some of this type of thing and I get bogged down by the jargon.  I&#039;m happy we agree about the problems of author fingerprinting. I see what you mean now about those specific types of syntactical inquiries, and you are right about that.  I still have to disagree about the tagger &quot;reading&quot;, but perhaps this is just a semantic issue of what it means to read something.  To me reading means understanding and reflecting, comparing to a life of experience.  For example, I read a love poem and my understanding of the sense the author is trying to convey is filtered through my own experiences of love or loss.  I know that the tagger can be fed a word and based on a database of past human choices make a &quot;decision&quot; about whether it&#039;s a compound and what the sandhi is, and so on.  I don&#039;t consider that reading.  I also worry about it, because many times there is a difficult decision about how to split a compound or dissolve a sandhi and either option gives a feasible but different result.  I wouldn&#039;t want future Sanskritists to trust the computer-generated decision more than their own critical thinking skills, or worse, let their own skills atrophy because of reliance on the computer.</description>
		<content:encoded><![CDATA[<p>Thanks for your reply.  I am glad you brought this topic up and welcome the discussion, because I haven&#8217;t seen many people really talking about this.  I admit that I haven&#8217;t read any of the basic sources you mentioned either; I have tried reading some of this type of thing and I get bogged down by the jargon.  I&#8217;m happy we agree about the problems of author fingerprinting. I see what you mean now about those specific types of syntactical inquiries, and you are right about that.  I still have to disagree about the tagger &#8220;reading&#8221;, but perhaps this is just a semantic issue of what it means to read something.  To me reading means understanding and reflecting, comparing to a life of experience.  For example, I read a love poem and my understanding of the sense the author is trying to convey is filtered through my own experiences of love or loss.  I know that the tagger can be fed a word and based on a database of past human choices make a &#8220;decision&#8221; about whether it&#8217;s a compound and what the sandhi is, and so on.  I don&#8217;t consider that reading.  I also worry about it, because many times there is a difficult decision about how to split a compound or dissolve a sandhi and either option gives a feasible but different result.  I wouldn&#8217;t want future Sanskritists to trust the computer-generated decision more than their own critical thinking skills, or worse, let their own skills atrophy because of reliance on the computer.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Stender</title>
		<link>http://www.danielstender.com/granthinam/704/comment-page-1/#comment-539</link>
		<dc:creator>Daniel Stender</dc:creator>
		<pubDate>Tue, 23 Feb 2010 08:18:22 +0000</pubDate>
		<guid isPermaLink="false">http://www.danielstender.com/granthinam/?p=704#comment-539</guid>
		<description>A bucketful of critique, and not a single one of my opinions survives. I am not sure whether it is aimed at me or at the tagger (thanks for the disclaimer at the end), but I am glad to have sparked a discussion, which is the mother of science.

My replies:

(1) The Tagger *does* read Sanskrit, though on a very basic level.

(2) Yes, we can do syntactical inquiries with grep even without tagging, but we couldn&#039;t do things like &quot;give me all sentences where the adjective precedes its referent with 2-3 words in between&quot;. That&#039;s the difference between a tagged text and an e-text; maybe I should have explained that a little more clearly.

(3) I know that the tagger and software like it work on the &quot;edition level&quot;, which is a little bit problematic, but I think it&#039;s open to alterations when the text improves. But projects like this naturally work with a &quot;simple concept&quot; of what the text is (haha, really a Hamburg-style discussion!).

(4) On the &quot;author style fingerprint&quot;, where I find your arguments solid: if we are going to probe in a direction like this someday, I know we have to be very careful. I wouldn&#039;t feel good, you are right, if we left everything to the computer, which can count and calculate like nothing else but is indeed stupid in a certain way. The computer can&#039;t be left to judge too freely because very quickly, you are right, we get into simple &quot;garbage in - garbage out&quot; situations and other loops like this (as was the case in the last attempt to decipher, or rather establish, an Indus culture script: Rao et al., Entropic evidence for linguistic structure in the Indus script, doi 10.1126/science.1170391). Inquiries like this are great for discussion but basically without substance. No, I just meant that developments could lead to a situation in which we would be able to get *additional data* for reasoning. I didn&#039;t mean (and haven&#039;t stated anywhere) that we should work towards someday handing everything over to the computer (which, o.k., a perhaps overblown catchphrase like &quot;sky&#039;s the limit&quot; may have implied in a too uncritical manner, in the light of today&#039;s one-dimensional &quot;emulate the human&quot; computational linguistics).

Thanks for your annotations, Michael!
</description>
		<content:encoded><![CDATA[<p>A bucketful of critique, and not a single one of my opinions survives. I am not sure whether it is aimed at me or at the tagger (thanks for the disclaimer at the end), but I am glad to have sparked a discussion, which is the mother of science.</p>
<p>My replies:</p>
<p>(1) The Tagger *does* read Sanskrit, though on a very basic level.</p>
<p>(2) Yes, we can do syntactical inquiries with grep even without tagging, but we couldn&#8217;t do things like &#8220;give me all sentences where the adjective precedes its referent with 2-3 words in between&#8221;. That&#8217;s the difference between a tagged text and an e-text; maybe I should have explained that a little more clearly.</p>
<p>(3) I know that the tagger and software like it work on the &#8220;edition level&#8221;, which is a little bit problematic, but I think it&#8217;s open to alterations when the text improves. But projects like this naturally work with a &#8220;simple concept&#8221; of what the text is (haha, really a Hamburg-style discussion!).</p>
<p>(4) On the &#8220;author style fingerprint&#8221;, where I find your arguments solid: if we are going to probe in a direction like this someday, I know we have to be very careful. I wouldn&#8217;t feel good, you are right, if we left everything to the computer, which can count and calculate like nothing else but is indeed stupid in a certain way. The computer can&#8217;t be left to judge too freely because very quickly, you are right, we get into simple &#8220;garbage in &#8211; garbage out&#8221; situations and other loops like this (as was the case in the last attempt to decipher, or rather establish, an Indus culture script: Rao et al., Entropic evidence for linguistic structure in the Indus script, doi 10.1126/science.1170391). Inquiries like this are great for discussion but basically without substance. No, I just meant that developments could lead to a situation in which we would be able to get *additional data* for reasoning. I didn&#8217;t mean (and haven&#8217;t stated anywhere) that we should work towards someday handing everything over to the computer (which, o.k., a perhaps overblown catchphrase like &#8220;sky&#8217;s the limit&#8221; may have implied in a too uncritical manner, in the light of today&#8217;s one-dimensional &#8220;emulate the human&#8221; computational linguistics).</p>
<p>Thanks for your annotations, Michael!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael Slouber</title>
		<link>http://www.danielstender.com/granthinam/704/comment-page-1/#comment-537</link>
		<dc:creator>Michael Slouber</dc:creator>
		<pubDate>Mon, 22 Feb 2010 21:34:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.danielstender.com/granthinam/?p=704#comment-537</guid>
		<description>I have to say that I am skeptical about the scope of its usefulness.  &quot;The sky is the limit&quot; has been said to me before when I asked the director of another project like this what specific uses it will have.  It seems to me that that is not a very convincing answer.  As far as I can see, you&#039;ve given us two potential uses on the horizon: syntactical inquiries and &quot;generated author style fingerprints.&quot;  We can do syntactical inquiries quite easily with GREP searching.  I&#039;m not sure exactly what is meant by the author-style fingerprint, but I think you mean the computer would be able to tell you whether a given piece of text is authorial or interpolated.  For that I am extremely skeptical, for many reasons.  One, and this is fundamental in my mind, is that we need to stop saying that the computer can read or the computer can judge.  The computer only does what it is designed to do within the limits created for it.  This matters for what you said because a human will have to decide when a given set of characteristics equals the text of an author, right?  Only when that is fed to the program will it apply it to the text you feed it.  Now how is one going to decide what criteria to use to determine authorship?  I think it is safe to say that any given Sanskrit text from the classical period is the work of many hands copying, corrupting, correcting over and over and over again down the centuries.  How is that going to be untangled?  You can&#039;t just say the computer will do it, because again the computer only does what you tell it to do.

(no offense intended, Dan, I&#039;m just venting my feelings about the topic!)</description>
		<content:encoded><![CDATA[<p>I have to say that I am skeptical about the scope of its usefulness.  &#8220;The sky is the limit&#8221; has been said to me before when I asked the director of another project like this what specific uses it will have.  It seems to me that that is not a very convincing answer.  As far as I can see, you&#8217;ve given us two potential uses on the horizon: syntactical inquiries and &#8220;generated author style fingerprints.&#8221;  We can do syntactical inquiries quite easily with GREP searching.  I&#8217;m not sure exactly what is meant by the author-style fingerprint, but I think you mean the computer would be able to tell you whether a given piece of text is authorial or interpolated.  For that I am extremely skeptical, for many reasons.  One, and this is fundamental in my mind, is that we need to stop saying that the computer can read or the computer can judge.  The computer only does what it is designed to do within the limits created for it.  This matters for what you said because a human will have to decide when a given set of characteristics equals the text of an author, right?  Only when that is fed to the program will it apply it to the text you feed it.  Now how is one going to decide what criteria to use to determine authorship?  I think it is safe to say that any given Sanskrit text from the classical period is the work of many hands copying, corrupting, correcting over and over and over again down the centuries.  How is that going to be untangled?  You can&#8217;t just say the computer will do it, because again the computer only does what you tell it to do.</p>
<p>(no offense intended, Dan, I&#8217;m just venting my feelings about the topic!)</p>
]]></content:encoded>
	</item>
</channel>
</rss>

