अनिल एकलव्य ⇔ Anil Eklavya

April 16, 2009

Accepted, but not Published

Academicians or researchers list their publications prominently on their home pages. After all, these are supposed to represent the best of their work. They also quite often (especially those who have a large number of publications) categorize them according to some criteria like the venue (workshop, conference, journal or book: in the reverse order of prominence) or peer review (unrefereed and refereed).

In this post we propose that there should be a new category of publications. This category is needed because a lot of researchers (for good or for bad) now come from underprivileged countries. For most of these researchers, traveling abroad to attend a conference, even if their paper has been accepted, is very hard to do. In some sense it is even harder than getting a paper accepted, which is itself relatively harder for them, given the lack of certain privileges (whether you like the word or not) such as generous research grants, infrastructure and language resources, combined with the prejudice (it is there: I am not inventing it, whoever might be blamed for it). To these problems can be added the problem of compulsory attendance at a conference or a workshop. It is partly these conditions which have prompted suggestions from certain quarters that researchers from these countries should concentrate on journal papers (never mind the delay and difficulties involved or the unfairness of the proposition, even though it has some practical justification).

But you can never be sure while submitting that you certainly won’t be able to attend. Also, hope is said to be a good thing. Therefore, the event of a researcher submitting a paper and hoping to attend but not being able to attend cannot be ruled out.

This brings us to the proposal mentioned earlier. One solution to this problem is that there should be another category of papers: accepted but not published, because the author couldn’t afford to attend the conference or the workshop. (By the way, workshops are the most happening places nowadays: more on that later).

The author of this post must know because he has authored more than one such publication.

Of course, the condition will be that if and when such a paper is resubmitted (with or without modifications, but without any substantial new work), accepted again and finally published, the entry marked as ‘accepted’ should be removed and replaced by an entry marked as ‘published’.

After all, if we are serious about research, then the work (which has been peer reviewed and accepted) should be given somewhat more importance than some pages printed in some proceedings (or attendance at a conference for that matter).

This, of course, doesn’t mean that you can get basically the same thing published (or accepted) in more than one place.

(Sorry for the Gory Details)

P.S.: Maybe there is no need for the above apology as the depiction of the Gory Details of the Indian Reality is now getting multiple Oscars (The Academy Awards: the keyword is Academy). But maybe there is because some researchers have a more (metaphorically) delicate constitution which can be hurt by the Gory Details.

Queen’s P.S.: Off with his head!

October 5, 2008

Good News and Bad News on the CL Front

First, as the saying goes, the bad news. We had submitted a proposal for the Second Workshop on NLP for Less Privileged Languages for the ACL-affiliated conferences. That proposal has not been accepted. A total of 41 proposals were submitted and 34 of them were accepted. Ours was among the not-accepted seven (euphemisms can be consoling).

Was it that bad? I hope not.

Don’t those capital letters look silly in the name of a rejected proposal?

Now the good news. The long awaited new version of Sanchay has been released on SourceForge. (Well, at least I was awaiting it). This version has been named (or numbered?) 0.3.0.

The new Sanchay is a significant improvement over the last public version (0.2). It now has one main GUI from which all the applications can be controlled. There are twelve (GUI based) applications which have been included in this version. These are:

  • Sanchay Text Editor that is connected to some other NLP/CL components of Sanchay.
  • Table Editor with all the usual facilities.
  • A more intelligent Find-Replace-Extract Tool (can search over annotated data and allows you to see the matching files in the annotation interface).
  • Word List Builder.
  • Word List FST (Finite State Transducer) Visualizer that can be useful for anyone working with morphological analysis etc.
  • One of the most accurate Language and Encoding Identifier that is currently trained for 54 language-encoding pairs, including most of the major Indian languages. (Yes, I know there is a number agreement problem in the previous sentence).
  • A user friendly Syntactic Annotation Interface that is perhaps the most heavily used part of Sanchay till now. Hopefully there will be an even more user friendly version soon.
  • A Parallel Corpus Annotation Interface, which is another heavily used component. (Don’t take that ‘heavily’ too seriously).
  • An N-gram Language Modeling Tool that allows you to compile models in terms of bytes, letters and words.
  • A Discourse Annotation Interface that is yet to be actually used.
  • A more intelligent File Splitter.
  • An Automatic Annotation tool for POS (Part Of Speech) tagging, chunking and Named Entity Recognition. The first two should work reasonably well, but the last one may not be that useful for practical purposes. This is a CRF (Conditional Random Fields) based tool and it has been trained for Hindi for these three purposes. If you have annotated data, you can use it to train your own taggers and chunkers.

All these components use the customizable language-encoding support, especially useful for South Asian languages, that doesn’t need any support from the operating system or even the installation of any fonts, although these can still be used inside Sanchay if they are there.

More information is available at the Sanchay Home.

The capitals don’t look so bad for a released version.

The downside of even this good news is that my other urgent (to me) work has got delayed as I was working almost exclusively on bringing out this version for the last two weeks or so.

But then you need a reason to wake up and Sanchay is one of my reasons. And I can proudly say that a half-hearted attempt to generate funding for this project by posting it on Micropledge has generated $0.

Sanchay is still alive as a single-parent child without any welfare but with a lot of responsibilities.

Now I can have nightmares about the bugs.

March 23, 2008

An Example of Gory Details

I have been familiar with the phrase ‘gory details’, as has anyone who has read newspapers or watched TV.

However, today I saw this phrase with a completely new meaning. It was quite a revelation. This is how it goes:

Even if you have severe constraints on resources due to funding (I sympathize…), I recommend not discussing them in quite as gory detail as you do. A very brief mention of the amount of effort invested to date is sufficient.

Gee, thanks for the sympathy. Now I will be able to run my next project on this great resource.

And these are the gory details (complete and unabridged) to which the above quote refers:

Since x has so far mostly been the result of individual effort and it is a non-funded project being undertaken on part-time basis, there were the most stringent resource (financial, temporal, etc.) constraints.

(Only the names have been changed).

Quite a lesson in Semantics. Or is it Pragmatics? Perhaps both. Great. Very original.

By the way, another lesson I have learnt over the years is that your project is not a project unless it is funded.

Without funding, your work is illegitimate, at least in the research community.

Oops! Sorry for the gory details. Obscene. Vulgar. Indecent. Pervert. Lewd. Salacious. Detestable. Repulsive. Repugnant. Abhorrent.

(I will add more context for this post later).
