अनिल एकलव्य ⇔ Anil Eklavya

September 29, 2007

When Encoded Convenience Gets Decoded as Frustration

Filed under: Linguistics et al.,Things As They Are,Work — anileklavya @ 8:32 am

Almost since I first got long-term access to a computer (that would be around a decade and a half ago), I have been struggling to use computers for Indian languages. That was long before I managed to reach a place where I could do research in language+computers. There are many who say that India is an ‘IT super power’ and I am living in a city which has almost become the IT capital of this ‘super power’. But I still can’t easily use computers for Indian languages for all the purposes for which I can, very easily, use them for English.

Much of this has to do with the way language and encoding support is provided on computers. A lot also has to do with the simple fact that somewhere, someone (should I say manywhere, manyone?) preferred a convenient method over a much much better one. That convenience got encoded into something which was to be a solution to some problem or some information which I needed. When I tried to decode that, I got enough frustration to make me think about doing something.

Yes, this post is provoked by a fresh downpour of decoded frustration.

So I have been trying to do something to reduce the amount of such encoded convenience in the Universe, but I have trouble even in convincing others that there is a problem. (Digression: You could say that one of the ways I am using to prolong the heat life of the Universe is entropy itself. Food for thought. But the Second Law of Thermodynamics ensures that even as I try this, I add my own share of, what else, entropy.)

If there is in fact a problem, many others might also be facing the same problem, right? Then why am I unable to convince others that there is a problem? Simply because the size of the intersection of the sets of people who I have to convince to address this problem and of those who face this problem is very small. These are different people. Those who can address the problem don’t face the problem because they don’t really want (or need?) to be able to use computers for Indian languages with the same ease with which they can use computers for English. They, at most, use computers for Indian languages for very limited purposes and are quite content with ad-hoc solutions. On the other hand, those who want the Indian languages* to be equally privileged with some other languages spoken by the same number of people, are usually not the ones who can address this problem.

* I will repeat here for the Nth time that MANY of these languages are natively spoken by HUNDREDS OF MILLIONS of people. They may be less privileged languages, but it is not quite appropriate to call them ‘minority languages’. Of course, there are also real minority languages in India…

More coming…

September 27, 2007

The Relevance of ‘Shared Tasks’ in NLP

Filed under: Articles,Linguistics et al.,NLP,Work — anileklavya @ 2:14 pm

Even after centuries of study, we still have very little hard scientific knowledge about natural languages (NLs). Unlike in other branches of engineering, we don’t know the exact physical or mathematical laws which NLs follow, or even whether they follow any. So, at least for the time being, we can only rely on empirical techniques for solving practical problems in Natural Language Processing (NLP). Even after some general approach seems to hold promise for solving a problem, a lot of practical work remains to be done in refining the methods and in tuning the systems for the best possible performance. This is why, once some initial breakthrough has been made, a lot of people have to try the techniques under different conditions to figure out what the best setup is, i.e., the best selection of parameter values, features, etc. What has come to be called a ‘shared task’ is one way of ensuring that this gets done.

Shared tasks are contest-like events where many researchers or even developers working on a particular problem or a set of similar problems try to come up with the best systems. All the systems are evaluated on the same data to provide a fair, competition-like setting. All the participants also have to submit papers describing their systems. The major goals of a shared task are:

  • To find out what is the state of the art in a specific area
  • To simultaneously advance the state of the art, even if slightly
  • To bring together researchers so that they can interact and perhaps argue and discuss
  • To act as an incentive for the researchers to build proper systems, some of which may become available for use by others
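
To make the "evaluated on the same data" part concrete, here is a minimal sketch in Python of how a shared task might score and rank submissions. Everything in it is an assumption made for illustration: the label names, the system names, the toy data and the choice of token-level F1 (real shared tasks often score whole chunks or entities instead). It is not the scoring procedure of any particular contest.

```python
from typing import Dict, List, Tuple


def precision_recall_f1(
    gold: List[str], predicted: List[str], outside: str = "O"
) -> Tuple[float, float, float]:
    """Token-level precision, recall and F1, ignoring the 'outside' label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    # A prediction counts as correct only if it matches a non-outside gold label.
    correct = sum(1 for g, p in zip(gold, predicted) if g == p and g != outside)
    n_predicted = sum(1 for p in predicted if p != outside)
    n_gold = sum(1 for g in gold if g != outside)
    precision = correct / n_predicted if n_predicted else 0.0
    recall = correct / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def rank_systems(
    gold: List[str], submissions: Dict[str, List[str]]
) -> List[Tuple[str, float]]:
    """Score every submission against the same gold labels, best F1 first."""
    scores = {
        name: precision_recall_f1(gold, labels)[2]
        for name, labels in submissions.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical reference annotation and two hypothetical submissions.
gold = ["PER", "O", "LOC", "O", "ORG"]
submissions = {
    "system_a": ["PER", "O", "LOC", "O", "O"],
    "system_b": ["PER", "LOC", "O", "O", "ORG"],
}

for name, f1 in rank_systems(gold, submissions):
    print(f"{name}: F1 = {f1:.3f}")
```

The point of the sketch is only that every system is compared against one fixed reference annotation, so that the resulting ranking is comparable across participants.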

It was in view of this that the NLP Association of India (NLPAI) started conducting an annual event called the NLPAI Machine Learning Contest in which researchers, including students, are invited to participate and compete in solving a specific problem which is considered relevant. Last year the topic of the shared task was Shallow Parsing for South Asian languages. A workshop was also organized as an extension of this event as part of the IJCAI conference, which was held in Hyderabad, India. The topic this year was Named Entity Recognition for South and South East Asian languages. This year’s event will also have an extended version in the form of a workshop as part of the IJCNLP conference, which is also going to be held in Hyderabad, India.

In the context of South Asian languages, conducting a shared task has its own problems. This is because funding for such tasks is hard to come by. Without funding it is difficult to prepare the reference data which is usually essential for a shared task. Those who have annotated data are often unwilling to share it with others. IIIT has taken a lead in preparing annotated data for various purposes and also sharing it with others. Since the data is prepared under difficult conditions, sometimes there are problems with the data, but let’s hope things will improve. In any case, data with some errors is better than no data.

Another problem is that the number of full-time researchers in NLP is quite small in South Asia, which affects the quality of submissions, but the shared tasks are meant to overcome this situation by creating awareness and interest.

It needs to be emphasized that the goal is not just to show good performance on the data provided but also to build practically usable systems that perform well in general. This implies that the participants are supposed to go beyond being mere competitors in a contest. And the idea is to go further than just being the first in the race. Participation in a shared task should be a milestone, not the final destination.

I feel compelled to end this write-up by saying that shared tasks with a focus on South Asia can only succeed if there is collaboration and sharing of resources by researchers working in South Asia. We are still far from that situation.

The IJCNLP NER workshop site is located here.

(This write up was originally written for the NLPAI newsletter called Spandan, but it was taking a lo…ng time, I became impatient and so you find it here)

September 11, 2007

Faces of Dignity (Contd.)

Filed under: Movies,Things As They Are — anileklavya @ 11:52 am

I talked about how dignity can be ‘maintained’ in the face of two different extreme conditions which can easily destroy the kind of dignity I am referring to: extreme wealth and power as well as extreme deprivation.

This is true, of course (that’s why I said it: I wouldn’t lie, would I?). However, what I called extremes are not really extremes. We Indians shouldn’t have difficulty in understanding it. Rosetta’s poverty is relatively much better (what a word to use!) than that of tens (hundreds?) of millions of Indians. Rural as well as urban. It seems to me that Life in a Metro should also include life in metropolitan, even somewhat cosmopolitan, dwellings of the poor called slums. I may be wrong. Anyone can be wrong. Nothing is absolutely right or wrong. We all know that, of course.

Still, since it’s beyond my capabilities to rise above the notions of right and wrong, I do wonder how hard it can be for a person in a typical Indian slum to maintain (as in maintain lifestyle?) the basic human dignity I am so stuck up on. Sounds incredible to me, but it may just be true that Rosetta is lucky. And the princess is, of course, much luckier. Not just because she falls in safe hands.

Just to make it simple to understand, and it is amazing how difficult it can be to understand such things, I can cite the example of a man (leave aside women) being tortured in a police station. Any police station anywhere in the world where torture is still an acceptable method of ‘interrogation’. Can a man being tortured ‘maintain’ his dignity? Fantasies will tell you that he can. Perhaps that’s true. But as people from George Orwell to Khushwant Singh (not to mention our actual executioners, beg pardon, executives of the Law and Justice System) have pointed out, everyone has a limit. Where you lose your capacity to retain (that sounds better) your basic human dignity. Because it is snatched away from you and you can’t even fight back. You are trapped. (Is anyone calling a psychoanalyst to ask why I use this word so much?)

When I originally planned to write this post, I had thought I would write about the characters of Ann and Rosetta. About the technical aspects of the movie. About acting and direction. And, most importantly, about specific incidents in the movies which are not talked about by your usual reviewers. Like the scene of the princess doing her shopping with precious little (borrowed) money. Or like the scene in the barber shop. Or even about the dignity of the (comic) photographer: an unlikely candidate. Or why Rosetta leaves her job which she got after doing something which cost her the favor of many viewers and reviewers. Or about her apparent stomach aches. Or about the only time in the movie when she has an (awkwardly) good time.

It has turned out differently because what I wrote today is what I wanted to write today. No stylistic effect intended. No explanation intended. No protestation intended. No apologies intended. No pun intended either. Sometimes simple truth alone can be quite stylistic. I hope (or fear?) it often is.

So what’s the point? Well, the point can’t always be expressed in a punch line. You can know it if you want to.

Enough! No more waste of my philosophical profundities on a mere blog post.

September 6, 2007

Faces of Dignity at Two Extremes

Filed under: Movies,Things As They Are — anileklavya @ 5:27 pm

For me, the single most important thing for acceptable human life is basic human dignity. Animals also have their own kind of dignity, but since they, perhaps, don’t have any self-consciousness, they automatically get all the dignity they need. Of course, humans have changed this situation, but that is a different story. For humans, on the other hand, dignity – basic human dignity, not the dignity associated with power, rank etc. – is something very hard to get or maintain. This is partly because it depends to a great extent on what is outside you: other people, the society you live in, the environment.

This may all be true, but it’s very abstract. What do I mean by ‘basic human dignity’? I can either give an academic sounding definition, or I can explain by example. I will do the latter here. The former can be reserved for some later academic work :-)

So if you want to see what basic human dignity means, you can watch two movies. You can see what dignity means at two ends of the socio-economic spectrum. The two movies are Roman Holiday and Rosetta.

The first shows you how dignity can be maintained even when you are deluged by wealth, a kind of power, pretentiousness and all the masked menace it means. How a princess can be so dignified that a down-and-out journalist looking for a scoop that will allow him to escape the situation he is trapped in, is moved to drop the scoop and his chances of escape, even when the princess is an easy ‘fair game’.

The other movie shows you how a down-and-out very poor girl trapped in a hell because of her poverty can still be so dignified that you can’t help feeling respect and awe for human life. And, not quite incidentally, disgust for the system that has created her hell and forces her to live in it with hardly any chances of escape.

Wait for more…

September 1, 2007

Bollywood Growing Up

Filed under: Movies — anileklavya @ 1:43 am

I would never have seen this movie had Kalpana Sharma not written an article about it. That’s because it has one of the worst names a movie can have. Believe it or not, this movie is a good one, even though it’s called (Ugh!) Chak De India.

As they say in such situations, no points for guessing. That there are several servings of plenty of patriotism. There are some other usual Bollywood ingredients too, but not too many. The movie is of a surprisingly grown up kind for hard core mainstream Bollywood fare.

What did I like in the movie? For one thing, as the title of Kalpana Sharma’s article says, the celebration of difference. The run of the mill reviews may tell you that the star of this movie is Shahrukh Khan, but actually there are many stars. All the girls who played the roles of hockey players. That’s right, the film is about women’s hockey, in a country which is mad about cricket, but whose national game is (men’s) hockey. We are after all very good at such, well, duplicities.

So the movie is about how a rag-tag team of real (mostly) desi girls from all corners of India is inspired to win the (women’s) hockey World Cup. By a coach who is a former disgraced (men’s) hockey star. The fact of his disgrace is closely bound to the fact that he is a Muslim who was the captain of a team which lost, no points again, to Pakistan when his deciding penalty stroke ended up being a missed chance.

But the above summary doesn’t do justice to the film, because there are many other things which I liked. One being the language(s) used by the players from (desi) ‘states’. Another one is that there is no girl who is shown to be the Hero’s girl: quite a bold thing for a Bollywood movie which has been made for one of the most male chauvinistic societies in the world. Can you imagine a Hero who is without a girl, even a bewafa one? And that too when he is surrounded by girls all day. Who are almost at his mercy. Amazing! How would an Indian male be able to digest this fact? Crazy! (Is someone asking whether he is …?)

So, the movie is bold about representation of the minorities, stereotypes of tribals from Jharkhand (‘junglees’), girls from the North East (‘chinks’), cricket being a real career and hockey being ‘just a stupid game’, women’s career versus men’s career etc.

The scenes of games also look quite authentic, perhaps with some help from the CGI people. The director sure seems to know something about hockey. May I say that it is one of the best sports movies made in India, including Iqbal.

There is one thing though which is very odious about the film: the coach seems to be acting like a gentler version of the trainer in Full Metal Jacket. And this brings us back to the overdose of patriotism, which often threatens to make the movie unwatchable. Perhaps the director thought that the bitter anti-stereotypic medicine can only be given with the sugar coating of patriotism. The coating has become quite thick. Perhaps we will have to wait for some more time to have movies without such things. Till the protesters on the margins have struggled enough and sacrificed enough and till what they struggled for becomes mainstream and can be openly accepted by the Yash Chopras of the world.

If it ever does.