अनिल एकलव्य ⇔ Anil Eklavya

April 16, 2009

Accepted, but not Published

Academicians or researchers list their publications prominently on their home pages. After all, these are supposed to represent the best of their work. They also quite often (especially those who have a large number of publications) categorize them according to some criteria like the venue (workshop, conference, journal or book: in the reverse order of prominence) or peer review (unrefereed and refereed).

In this post we propose that there should be a new category of publications. This category is needed because a lot of researchers (for good or for bad) now come from underprivileged countries. For most of these researchers, traveling abroad to attend a conference, even if their paper has been accepted, is very hard to do. In some sense it is even harder than getting a paper accepted, which is itself relatively harder for them, given the lack of certain privileges (whether you like the word or not) such as generous research grants, infrastructure and language resources, combined with the prejudice (it is there: I am not inventing it, whoever might be blamed for it). To these problems can be added the problem of compulsory attendance at a conference or a workshop. It is partly these conditions which have prompted suggestions from certain quarters that researchers from these countries should concentrate on journal papers (never mind the delay and difficulties involved or the unfairness of the proposition, even though it has some practical justification).

But you can never be sure while submitting that you certainly won’t be able to attend. Also, hope is said to be a good thing. Therefore, the event of a researcher submitting a paper and hoping to attend but not being able to attend cannot be ruled out.

This brings us to the proposal mentioned earlier. One solution to this problem is that there should be another category of papers: accepted but not published, because the author couldn’t afford to attend the conference or the workshop. (By the way, workshops are the most happening places nowadays: more on that later).

The author of this post must know because he has authored more than one such publication.

Of course, the condition will be that if and when such a paper is resubmitted (with or without modifications, but without any substantial new work), accepted again and finally published, the entry marked as ‘accepted’ should be removed and replaced by an entry marked as ‘published’.

After all, if we are serious about research, then the work (which has been peer reviewed and accepted) should be given somewhat more importance than some pages printed in some proceedings (or attendance in a conference for that matter).

This, of course, doesn’t mean that you can get basically the same thing published (or accepted) in more than one place.

(Sorry for the Gory Details)

P.S.: Maybe there is no need for the above apology as the depiction of the Gory Details of the Indian Reality is now getting multiple Oscars (The Academy Awards: the keyword is Academy). But maybe there is because some researchers have a more (metaphorically) delicate constitution which can be hurt by the Gory Details.

Queen’s P.S.: Off with his head!

February 22, 2009

Splitting Hairs

The work of the super-specialist
engaged in advancing knowledge and science
is to peel the skin off a hair
This can have many benefits
but only so long as,
having peeled off the skin
and being immersed in the study
of a cell inside the hair,
it is not forgotten
that this very hair has
many such cells
that over these cells
there was also a skin
which was peeled off
and only together with which
a whole hair is made
that the roots of hundreds of thousands of such hairs
sit on one head
and this head,
together with many other organs,
makes up a body
and billions of such bodies exist
Not only that, there are other bodies of many kinds too
each of which
can be found
(apart from the vanishing species)
in large numbers

All these bodies
live on a largish (or smallish) sphere
on which there is much else besides bodies
and countless such spheres
are going around in circles here and there
On many of them
there may be bodies
on which there may be heads
on the heads there may be hair
in the hairs (after peeling off the skin)
cells may be found too
which are perhaps just like
the ones being studied
or perhaps not

It is all very well not to make the mistake
of forgetting everything
while immersed in the study
of a cell inside a hair
but it is not without danger
to forget this either
that in the universe
of countless spheres
being talked about
only on some of them
are bodies found
which may have heads,
or may not
and in the hair on the head (if there is any)
minute cells
may be found
from whose study
conclusions may emerge
that can prove wrong
the verdicts and fatwas
being pronounced about
the universe (or some part of it)

 

[1997 or 1998]

October 28, 2008

Computational Linguistics

As I wrote in the previous प्रविष्टी (can this word be used for ‘post’?), over the next few weeks I am going to write about Sanchay.

But since Sanchay has been built (apart from general users) especially for researchers in computational linguistics or linguistics, it is best to make clear what computational linguistics or linguistics means or, even if you already know what they mean, what I mean by them. The second point matters because, while there are all kinds of misunderstandings among the general public about the meaning of these subjects (computational linguistics or linguistics), even the researchers in these subjects do not agree on their definition.

The truth is that in the Hindi world most people still take linguistics to mean the kind of study that it was taken to mean at the beginning of the last century. But I would not like to go in that direction of the debate just now, because there is so much to say about it that the present purpose would be left behind.

There is also a great deal to say about the definition of computational linguistics or linguistics and about their boundaries, but for now a little will have to do.

So, put briefly, linguistics is the field of research or study in which one does not study just the grammar of some one language, but studies natural or human (that is, not artificial) language scientifically. It is now a widely accepted notion that the structure of the human brain is directly related to the structure of language and that, since the structure of the brain is basically the same in all humans, all natural or human languages are also the same in everything except surface features. That is why, as is well known in the modern literature of these subjects, if an American’s infant is adopted right after birth by a Chinese family and the child grows up in China, it will learn to speak Chinese as easily as a child of any Chinese family. There are lots of other such things, but the main point is that linguistics is the scientific study of natural or human language.

At least the attempt is for the study to remain scientific; whether it actually manages to is a matter of debate.

Coming now to computational linguistics, in this subject our attention shifts from humans to the computer, but the earlier condition still applies: the scientific study of natural or human language. The difference is that our aim now becomes to make the computer capable of understanding natural or human language and using it. Obviously this is still a very distant prospect, and that should be no surprise, because in linguistics itself (even after the extraordinary achievements of the last century) scientists are stuck with plenty of obstacles.

Even so, a good deal has already become possible in computational linguistics, and a good deal more may become possible (in the near future). But this does not include the computer speaking and understanding language the way a human does. What it does include are technologies that can search documents better, summarize them, translate them to some extent, and so on.

But the trouble in the Indian context is that we have not yet reached even the stage where we can easily use the computer as just a better typewriter. There have been some achievements in this direction, but compared to English or the major European languages we are nowhere. As most of you already know, this is a long story which is best left aside for now.

But Sanchay has been developed in this very context, which we will talk about later.

June 27, 2008

Evolution Doesn’t Like Music

 

 

But we do.

 

 

May 3, 2008

Evolution Doesn’t Have Nuclear Weapons

 

 

But we do.

 

 

April 13, 2008

Two Laws of Reviewing

After a few years in research, I have discovered two laws which the process of reviewing (of research papers) follows. Not very original, but here they are:

  1. You can always find some reasons for accepting any paper.
  2. You can always find some reasons for rejecting any paper.

April 11, 2008

Patent Madness

So we have one more reason in support of the idea that patents are a bad idea. The latest is the news that a company called Digital Reasoning has been awarded a patent on what looks like contextual similarity. What the ‘news report’ says includes:


This breakthrough patent grants broad protection for how artificial intelligence, including neural networks, genetic algorithms, and vector space models can be used to learn the meanings of symbols – such as words, categories, or numerical values. Understanding the subtle meaning of terms in context has been one of the “Holy Grails” of artificial intelligence. Not only is Digital Reasoning® fully able to accomplish this feat, it is now patented.

Here is one comment about this:

Anyone from the ACL/ML/AI community can immediately recognize this and start citing their favorite papers on these topics starting from at least a decade ago. A promotional video from the company on YouTube can be found here. Excerpt from the video: “… We treat the text representation of human language as a signal … “.

I think everyone should stop taking patents seriously. Wishful thinking?

Here is another:

Do the people ‘in-charge’ have any clue about the previous/current research done in the related field? How can they accept such stuff? Doesn’t make any sense, whatsoever.

But then they had accepted patents on haldi, neem and basmati. I am worried about jal jeera and pani poori.

Also, ganne ka ras.

Madness.

No need for me to say more as so many others have already talked about this:

In August last year there was a news item about Yoga devices being patented in the US. Small mercy that the Government of India succeeded in cautioning the U.S. Government against granting patents to Yoga postures (asanas).

There was a time (in India) when patents were awarded on processes, not products. That meant that even if some company had patented a method for producing a particular medicine, someone else could come along and find a better way and sell the medicine cheaper. Now, since the patents are granted on products, under orders from the empire that rules the world, that kind of thing can’t happen.

It can be a matter of life and death for millions of people.

I look forward to the day when self-respecting researchers won’t proudly list the patents they have been able to obtain.

Patents are among the most evil inventions of humankind.

March 26, 2008

Evolution Doesn’t Have a Conscience

 

 

But we do.

 

 

March 23, 2008

Mythical Pretensions of Originality (1)

[Disclaimer: This is not a scientific article. It is based on partly objective and partly subjective, but in any case sincere, analysis of the author’s knowledge of and experience in the world of research. No empirical evidence is presented as, in the author’s belief, enough empirical evidence can be presented about this topic to prove whatever you want. This is just a request to look at research honestly and sincerely without self-deception and pretensions.]

There is a very old and much discussed question which has been bothering me for a long time. Like in many other cases, so far I avoided writing about this because:

  1. I didn’t want to repeat things which have already been said.
  2. To say something new on this topic requires a lot of leisure, which I don’t have.
  3. The problem with saying something new about this is the topic itself.
    • What is original and what is not?
    • What is innovation and what is not?
    • What is creativity and what is not?
    • Is there anything in this world which is really original?

But, again like many other things, I have been provoked enough to write this post. I will try to do my best. As much as can be done in a single blog post.

What is the provocation? The provocation is the intensely irritating pretensions of originality from ‘researchers’ who have happened to review my or some others’ papers. They write as if every paper selected in every conference, journal and workshop is a completely original work. This, frankly, has started to get on my nerves. Because I know very well that this is simply not true.

The truth is not that every paper selected in every conference, journal and workshop is worthless or mere repetition of old things. The truth, as usual, lies somewhere between these two extremes.

However, I am quite sure that it lies much nearer to the second extreme than to the first. Even for the top ‘first class’ conferences and journals.

To quote from the article How to do Research At the MIT AI Lab, 1998 by David Chapman (Editor):

At some point you’ll start going to scientific conferences. When you do, you will discover the fact that almost all the papers presented at any conference are boring or silly. (There are interesting reasons for this that aren’t relevant here.)

I will go on to say that most of them have hardly any originality (that’s partly why they are boring). If you have sufficient resources, you can almost follow a recipe to write a paper which will get selected at a conference, workshop or journal. And this is exactly what is done. And it works too. One of the reasons is that it is easier this way for the reviewers. They don’t have to think hard about the originality of the paper. Because, of course, it is very hard to decide whether something shrewdly written and well presented is original or not. Quite often there may not be a clear-cut answer at all.

One of the essential elements of the most popular recipe is to work on problems which are currently in fashion and do some experiments, any experiments, on that problem and present the results. If you practice enough, it can hardly go wrong. That’s how a great number of papers get published. No originality needed. Just be fast enough to do the experiments (which someone else would anyway have done in the near future) and write a paper. It’s somewhat like buying stock. Beat others by being the first to buy the stock as soon as it comes out. You just have to know how to fill up the form and complete the transactions. This applies even more to top conferences than to workshops.

If you think I am talking nonsense, I would request you to refresh your Chomsky (in case you are a linguist) or refresh your Jurafsky-Martin (in case you are, as the term goes, an NLPer or a computational linguist).

If you do the above carefully, you will find that almost all the elements of Chomskian Linguistics can be traced back to some linguist, writer, philosopher or thinker of the past. (By the way, this applies to the ‘Theory of Evolution’ too). Similarly, you will find in Jurafsky-Martin that almost every discovery has been made by more than one scientist or thinker, including this one.

And if you go back to the top conference and journal papers, you will again find that most of the papers don’t really have anything really new to say.

So do I mean that all research is nonsense and useless? Certainly not. Why would I be in research if that was so? What’s the catch? The catch is that the emphasis on originality is highly misplaced.

What I am saying certainly doesn’t imply that there is nothing ‘original’ in the Chomskian Linguistics. But it does probably mean that we are looking for originality in the wrong place. I hope some day I will be able to say this with more clarity and precision.

But we would definitely be much better off if we dropped the mythical pretensions of the originality of every published paper. Originality is just one of the goals of research. Most of the research is routine research. Incremental research. That doesn’t make it useless. Really original papers can be expected only once in a long while. The rest should be seen as attempts to advance the state of the art marginally. Without much originality. Most of research is plain hard work. Rigorous work. Results of experiments which by themselves do not really matter much, but a small fraction of them could, just could, provide some insight for someone else to come up with something which is ‘original’. This (at best) is the purpose which more than 99 percent of the published papers serve and we had better realize this instead of indulging in rampant self-deception about originality.

Coming to NLP and CL or even Linguistics, it is even more important to realize and accept the above mentioned fact. The reason is that research in these disciplines depends to a great extent on creation of resources (language resources as well as tools) which may not be very ‘original’ in nature as the word is usually understood. A lot of papers should and do report just the development of these resources and they are published. The trouble is that everyone is forced to create a false facade of originality and creativity which is not really there. You have to falsely claim the worth of your papers in terms of originality and ‘novelty’ when actually the worth is just in plain hard work. But if you don’t put up that facade, you are out.

Have you considered the fact that a lot of the Great Discoveries were accidental discoveries? Was there so much originality in those discoveries? I don’t know. It may sound cliched, but it does depend on how you define originality. Perhaps the better way is to put less emphasis on (true or false or anything in between) originality and more on usefulness. At least in disciplines like NLP and CL where, if you ask most researchers, they won’t even be able to give a coherent answer about what exactly they are trying to achieve through their research. And where we don’t even know for sure whether there is anything really scientific to be achieved. Even after the great linguistic revolution, we hardly know anything about language that can be termed as scientific as the laws of Physics or the theorems of Mathematics. At most we can say that we are trying to build machines which can give better practical results. We need a LOT of hard work and only a little bit of originality. And this originality, like in other disciplines, is hard to come by.

I, for one, am not going to insist on a facade of ‘originality’ for the description of the hard work to be accepted for publication. Of course, there should not be verbatim repetition, but I don’t have any illusions about the originality of papers published anywhere. Further, I am going to prefer papers describing intelligent hard work over almost worthless but seemingly innovative cooked-to-recipe papers.

Maybe this is an empty declaration because I may not get to be in a position to insist or not to insist, but I can still make the statement at least.

It is my informal personal blog after all. I can afford to be as honest and direct here as I want.

That doesn’t mean I am not aware of the possible consequences.

February 29, 2008

English is Language Independent

It’s the Global Language, right? So how can it be language dependent? You propose a theory based on English. It has to apply to all languages. You propose a Natural Language Processing (NLP) or Computational Linguistics (CL) technique for a particular problem. For English. It applies to all languages. You build a piece of software for some purpose. For English. It has to be useful for all languages. You build a dictionary…

Never mind.

But the converse is not true. You propose a theory based on Hindi. It is language specific. It doesn’t count for much. You propose an NLP technique for a particular problem. For Hindi. It is language specific. It doesn’t count for much. You build a piece of software for some purpose. For Hindi. It is language specific. It doesn’t count for much.

That’s how it works in practice, if not in theory. Or maybe even in theory, with some help from the (very valid) idea of Universal Grammar (except that the UG may be the UG of English).

Even today I have got a review of a paper on a problem which is like one of the holy grails of NLP or CL. One of the comments is that the approach has been evaluated on Hindi so it can’t be compared to other techniques that already exist. True. But what is the number of papers published in the ‘first class’ NLP/CL conferences and journals in which the approach has been tried only on English? Doesn’t matter, because English is language independent. If you only evaluate your technique on English, that’s OK. But if you evaluate on only Hindi, that’s not acceptable. Because Hindi is language specific.

We know this very well in India. The Elite talks about (Indian) literature. And sometimes the Elite magnanimously (or dismissively) talks about (Indian) literature in languages. The first, of course, refers to literature in English. The second refers to literature in other languages. Indian languages.

The Elite talks of media. And the Elite (rarely and mostly negatively) talks of language media.

Hindi is a language. English is not a language.

Pardon me.

Hindi is a language. English is the language.

English is above being merely a language.

That’s why all the work done in English is language independent. Not just research. Not just in NLP/CL. Anything. Movies, literature, music.

I am guilty of the sin of indulging too much in mere languages. I should be working mostly on English. Not just writing blog posts in English. Sometimes, of course, I can bestow a bit of my attention on languages. Like Hindi.

But I won’t do that. I will do the opposite. I am incurable.
