अनिल एकलव्य ⇔ Anil Eklavya

December 31, 2011

A Challenge for RTI Activists in India

There is a major issue that most people, including activists in India, have not given as much attention as it merits. That issue is the surveillance of ordinary people, especially within offices, gated societies, campuses and in some cases even independent houses. The use of electronic devices for surveillance is far more widespread than the occasionally reported phone tapping cases. Potentially, and I think in reality too, this is hampering all kinds of normal activities that people can engage in, including acts of dissidence and protest, which I think are the special target of such practices. It has come to the point where any kind of protest activity in India is being ‘nipped in the bud’, at least in urban areas. This is making all the talk about there being democracy in India a joke.

Whether or not I am wrong in saying the above, there is sufficient evidence about the potential and real misuse of surveillance devices. This is part of a worldwide trend that has intensified in the last ten years and many such cases have been reported in various countries, including by the mainstream media, which usually avoids such topics these days. One concrete, practical action that can be taken in this regard is to demand information about this under the Right to Information Act. Since I am not competent enough to do this on my own and I have no contacts of any sort whose help I can take, I challenge (or appeal, whichever way you like to see it) the RTI activists to demand this information from the government as well as corporations.

I list below some specific points which I think should form the basis for such a demand. I only write them down here as rough indicators.

  1. Has the government sanctioned the use of electronic surveillance devices against ordinary people? If yes, who gives authorisation in specific cases and on what basis? What guidelines are followed? Who verifies that these guidelines are followed? Is there any mechanism through which the targeted person can ask for justification for any such surveillance?
  2. Are these devices being used in hotels, hostels, campuses and offices? What safeguards are there against their misuse? Who looks after this? On what basis are these places identified? Are they also being used in independent houses? If yes, what are the details?
  3. Are local administrators or managers or private security agencies allowed to make their own policies regarding this, ignoring any consideration for privacy of individuals? What is the mechanism through which information can be obtained about this and how can any redressal be sought?
  4. Are there any constraints about sharing the information collected through these means? Who decides about such things? Has it become a complete free for all where any administrator or manager or private security company can collect and disseminate such information?
  5. What is the role of IT companies in this, especially outsourcing companies such as TCS, Wipro and Infosys, who have huge numbers of employees, many of whom at any given time are not engaged in productive work? Are these employees being involved in unauthorised and illegal surveillance of ordinary people? What are the details about this, and how can they be obtained? If this is happening, does the government know about it and was this officially sanctioned by the government?
  6. Is the information (or any falsified/distorted version of it) collected through surveillance (by whichever agency) being used for punitive purposes against people who are seen to be (rightly or wrongly, with justification or without justification) indulging in some kind of dissidence activity such as opposing the policies of privatisation and corporatisation of everything? If yes, what is the legal basis for this?
  7. Is such information being used to disrupt services such as Internet access and electricity supply for people who are being targeted by the surveillance policies?
  8. Is such information being used to launch smear campaigns against people seen as opposed to the official or corporate policies?
  9. Is such information being used to generally “make life impossible” (as one think tank writer proudly mentioned in one of his articles: on a dissident media website, no less) for the targeted people?
  10. Is such information being given to shopkeepers, hair dressers etc., with the instructions to not provide proper services (or deny providing services) to the targeted people?
  11. Is such information being used to ensure that the targeted people are denied jobs that they apply for? Is it being used to form a kind of (formal or informal) blacklist for employment and related purposes? Is it also being used to create hindrances in the work of these people, if they do get a job?
  12. What is the extent of the use of surveillance of any kind in academics? What is the purpose of such surveillance? Are students being involved in such activities as developers, system administrators and informers in general? What are the details of surveillance related projects sanctioned by the government specifically for academic institutions?
  13. To what extent are the communications service providers being used for surveillance, whether for the government or for corporations or for any other organisations?
  14. Does the government know about the use of surveillance devices by the large right-wing organisations and corporations/institutions sympathetic to them? If yes, have any steps been taken to stop this? Has there been any investigation into this?
  15. In case the answer to most of the questions above is negative, is there any mechanism to take action in case evidence is made available that would indicate that the answer to at least a few of these questions may be affirmative?

I have written the above only as initial notes. These can be refined and improved and extended. I would welcome any suggestions.

Full Disclosure: I am writing this as a person who believes that he has been a target of such practices for the last many years, although I don’t even claim to have indulged in much protest of any major significance. I am writing this almost as a last resort, having tried to ignore this issue for a long time, hoping that it would cease in due course. I don’t know what else I can do about this. Please note that being part of the ‘IT community’ in India, I am both more prone to it and also more likely to notice it.

I know how some people are going to react to it, but unless I thought it absolutely necessary (a matter of life and death), I wouldn’t have written it. I am generally not given to stick my head out easily, though I do try to call a spade a spade. I am no Bradley Manning. But I guess my head is already out.

September 5, 2011

The Missing Clause

There is a legal agreement written in very legal language that I had to read today. It’s called Mutual Confidentiality Agreement and is required to be signed by two parties who plan to collaborate on some commercial product or service.

After having plodded through the legalese and having understood most of it (I have an advantage in this regard), I found that there was one clause that was glaringly missing from it.

The document lists all the conditions that apply when the Disclosing Party discloses something to the Recipient. It has a section euphemistically titled ‘Injunctive Relief’ that might send the shivers down the Recipient’s spine, depending on the power balance. It also lists all the exceptions under which these conditions may not apply. Such conditions include “court order” and “as required by law”.

What is missing is something that should be included in all such documents post-9/11, in all countries that went for the security Gold Rush, which practically means all countries, (almost) period.

That missing clause should go something like this:

An (unintended) disclosure by the Recipient to any number of third parties of any of the Disclosing Party’s Confidential Information will not be considered a breach of the agreement if it happens under any of the following conditions:

  1. As part of surveillance operations carried out by the State and any of its agencies, the institution in which the Recipient works or any part thereof, the Local Version of the Truman Show, the Connectivity Service Providers, the Private Security Companies, the Local Quasi-authorised Vigilante Organisations or any other such agencies added to the list till the eve of the day the breach is considered for scrutiny.
  2. [Talking of eve] As a result of eavesdropping by the agencies and organisations listed in 1.
  3. As a result of disclosure by the people involved in (a) surveillance and (b) eavesdropping by the agencies and organisations listed in 1 to any of their superiors, colleagues, sub-ordinates, business associates, friends, relatives, family members or strangers.

The clause sounds very reasonable in the post-9/11 world and makes perfect legal sense. After all, any disclosure made (unintentionally) under conditions listed in this clause would not be the fault of the Recipient and it would only be for The Good of The Country and The World and The Humanity (as everyone knows and agrees to).

I have one doubt, however. Won’t the addition of this clause almost nullify everything else in this agreement to mutual confidentiality?

But the clause is required. Isn’t it?

And what about that poor thing, The Market?

Is it already being forgotten in favour of other things?

May 19, 2011

Sicilian Grand Prix Attack?

Filed under: Absurd,Logging,Network,So It Goes,Technology,Work — anileklavya @ 4:24 pm

There is a website that I have, which has been inoperative for some time. There was not much content on it anyway. However, while working on another site located on the same server, I noticed that the site was being accessed heavily, but since it is inoperative, the web server is logging the errors.

This started on May 11th, 2011. The error log has become huge by the standards of any website that I maintain. Its size is 8 MB. It has more than 60000 entries, most of them being for the inoperative site I mentioned. And the total number of *distinct* IPs from which the site has been accessed is nearly 20000: way beyond the traffic that I get for even those sites which are operative and regularly used.
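For anyone who wants to check similar numbers on their own server, here is a rough sketch of how the entry count and the distinct-IP count can be pulled out of an error log. It assumes Apache 2.2-style error log lines with a `[client <ip>]` field; the function name, the sample lines and the addresses are made up for illustration, and other servers or log formats would need a different pattern.

```python
import re
from collections import Counter

# Assumed line format (Apache 2.2-style error log); adjust the pattern otherwise:
# [Wed May 11 10:00:01 2011] [error] [client 203.0.113.7] File does not exist: /path
CLIENT_RE = re.compile(r"\[client (\d{1,3}(?:\.\d{1,3}){3})(?::\d+)?\]")


def summarize_error_log(lines):
    """Return (number of non-empty entries, Counter of hits per client IP)."""
    total = 0
    per_ip = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        total += 1
        match = CLIENT_RE.search(line)
        if match:  # server notices have no client field
            per_ip[match.group(1)] += 1
    return total, per_ip


# Hypothetical sample entries (documentation-range IP addresses).
SAMPLE_LOG = [
    "[Wed May 11 10:00:01 2011] [error] [client 203.0.113.7] File does not exist: /var/www/old-site/index.php",
    "[Wed May 11 10:00:02 2011] [error] [client 198.51.100.23] File does not exist: /var/www/old-site/index.php",
    "[Wed May 11 10:00:05 2011] [error] [client 203.0.113.7] File does not exist: /var/www/old-site/feed",
    "[Wed May 11 10:00:09 2011] [notice] caught SIGTERM, shutting down",
]

if __name__ == "__main__":
    total, per_ip = summarize_error_log(SAMPLE_LOG)
    print("entries:", total)
    print("distinct client IPs:", len(per_ip))
    print("busiest clients:", per_ip.most_common(3))
```

For a real log, the same function can be fed an open file object instead of the sample list, since it only iterates over lines.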

Two of the entries in the log file indicated that someone had posted a link to download a free book called ‘Starting Out: The Sicilian Grand Prix Attack’. But there has been no facility to add comments for this site on this server, although there was on the server where the site was earlier installed. So perhaps the cached post was from the earlier server.

The important thing is, there were only two requests for this post or this link to the book.

But then I searched for it on Google and saw the cached post about an hour ago.

From a few minutes after that, there is a flood of requests for the same link to the book on Sicilian Grand Prix Attack, even though the site is still inoperative. There are also more attempts to add new comments.

The ‘attack’ seems to continue and the size of the error log file is growing even now.

Meaning what? You tell me.

[Some information that might perhaps be relevant: The site was about a query language that I have designed. I had submitted a paper about this language to a workshop at a very prestigious conference. The paper was rejected. I received the notification on the 7th. Over the next two days, I had an exchange of emails with the PC chairs and the organizers of the workshop about my dissatisfaction with the reviews and the reviewing process. I also asked them to forward my comments to the reviewers. I could be identified from my comments, even if my name had been removed.]

(There are many simple explanations of the above. One of them is that the writer is a moron.)

November 25, 2010

Drones, Aerial and Otherwise

[This was meant to be a comment in reply to an article on the ZNet by Pervez Hoodbhoy about aerial drones and what he calls ‘human drones’.]

I feel very strange, in fact disturbed, to have to make this comment, as this comment is critical of the ideas of someone with whom I have a lot in common, whereas I have almost nothing in common with those he proposes should be killed by any means possible. The strangeness also comes from the fact that the author not only recognizes but has actually been writing about the grounds on which I will put forward my criticism.

I am not sure whether Pervez Hoodbhoy is one or not, but I am an unapologetic atheist and have almost the worst possible opinion about religious fundamentalism of any kind, especially when it is of the organised kind or has organisational support. I also have no hesitation in stating that there IS something that can be called Islamic Fascism and it should be called by its proper name. But I also recognize that often things get mixed up and we can have a resistance movement that is also a Fascist movement. That makes it difficult to analyze them, let alone judge them. We can, however, still analyze and judge specific facts and events and be mostly right about them if we have sufficient evidence and we make sure that we keep our intellectual integrity intact.

Thus, for example, the people who are being targeted by the American drones (excluding those caught in the ‘collateral damage’) have been doing things which no sensible human being can support. These include the horrible terrorist acts, but more importantly (as the author rightly points out) they include their atrocities on their own people: women, protesters of any kind, ‘blasphemers’ etc. I can very well see what would happen to me if I were living in that kind of society.

I also share most of what the author has been saying. The trouble is that he also makes some leaps of logic or conclusions which seem patently wrong to me, and I think I have to register my disagreement with them, because they are far too important to be ignored.

I could, perhaps, write a longer article about it, but for now I will try to say a few things which matter more to me.

The first problem is that the author mixes up the literal and the metaphorical and this logical error leads him to atrocious conclusions. We can surely talk about ‘human drones’ where we are using the word drone metaphorically and the usage is justified as he has eloquently explained by comparing them with the non-human aerial drones. But the comparison itself is metaphorical. And the justification does not remain valid when he goes on to establish a straightforward literal equivalence. The ‘human drones’ might be brutal, unthinking, destructive, (metaphorical) killing machines and so on. They might be, in a sense, inhuman or anti-human, but they still are not non-human. They do have bodies, minds and thoughts. To say otherwise is to abandon one’s thinking in a fit of rage. What they deserve or not may be a matter of debate, but it has to be based on a vision that does not ignore the fact that they still are human beings, however detestable and dangerous they may be.

I am sure the author is aware of some of the history of the world which seems to indicate that there were, and still are, a lot of other people who might also be justifiably called ‘human drones’ and who might be considered as bad as the ones he is talking about. That definitely can’t justify their actions, but it has a bearing on what those taking up the task of judging them should think and do.

If you agree with my contention here, then the analysis will lead to different directions. What those directions exactly should be, I won’t go into, because I don’t claim to have the answers, but they would lead to conclusions different from those of the author.

Even the metaphorical comparison here has some problems, which can, as I said, be guessed from what the author himself has been writing. There are some similarities, but there are also many differences. The ‘human drones’ still come from a certain society and they are part of it. The aerial drones are just machines, they don’t come from any society. The ‘human drones’ come from societies which have seen destruction of the worst kind for ages, whereas the aerial drones are (literally) remote controlled by those who played the primary role in bringing about this destruction, as the author himself has written and said elsewhere. If you ignore these facts, you will again be led to very risky (and I would say immoral and unfair) conclusions.

With just a little dilution of the metaphor, haven’t most of the weapon laden humans (soldiers, commandos etc.) been kinds of ‘human drones’? The ones the author talks about may be deadlier, but the situation is more drastic too. On the one hand you have an empire that is more powerful than any in history and on the other you have an almost primitive society that thinks it is defending itself, just as the empire says it is defending itself. Will it be improper to ask who has got more people killed? What about the ‘human drones’ of the empire: thinking of, say Iraq?

As far as I can see, the use of aerial drones to kill people, whoever they may be, is simply indefensible. Because if their use is justified on the grounds of the monstrosities of the Taliban ideologues and operators, what about chemicals? If some people were to form an anti-Taliban group and they were to infiltrate the ‘affected’ villages and towns and if they were to use poisoning of the water supply or something similar to kill people in the areas where these monsters are suspected to be, would that be justifiable? The aerial drones are, after all, just a technological device. There can be other such technologies and devices.

There must have been some very solid reasons why the whole world agreed to ban the use of chemical and biological weapons after the first world war and stuck to that ban (with a few universally condemned exceptions), though they were very effective and the Nazis were very evil.

The other big problem I have with the author’s opinions on this matter is that he suggests that the American aerial drones are one of the unsavoury weapons we should use against the fundamentalist Islamic militants. This is a logical error as well as a moral one. The logical error is that ‘we’ are not using the weapons at all, the empire is using them. And it is the same empire that created the problem in the first place, once again as the author himself has said. We have no control over how these drones are or will be used and who they will be used against in the future. Can’t they, some day in the future, be used against ‘us’? Why not? Perhaps the empire won’t use them directly, but it can always outsource their use: think again of Iraq. Iraq of the past and Iraq of the present. The author, in fact, knows very well the other examples that I could give.

To put it in Orwell’s words, make a habit of imprisoning Fascists without trial, and perhaps the process won’t stop at Fascists.

The use of aerial drones, they being just a technological device, might perhaps be justifiable for certain purposes, for example in managing relief work during large scale natural disasters, e.g. the wild fires in Russia or the frequent floods in India and China (but not as just a cover for their more sinister use). Their use for killing humans is, however, of a completely different nature.

The moral error is that the author’s conclusions unambiguously imply that ends justify the means. As long as these monsters producing (or becoming) ‘human drones’ are killed, it doesn’t matter whether the weapons are, to use the author’s word, unsavory. It also doesn’t matter that they are being used by an empire ‘we’ are opposed to and which started the mess. (Actually, the mess was started long ago by another empire, but then we could say there were even older empires who played a role in creating this mess, so let’s not go into that).

I even sort of agree with the author’s idea that recommending the standard left meta-technique of “mobilizing” people (actually, it is not just leftists who use such techniques) may not be very practical under the conditions prevalent in this case. But, as I said, though I understand the severity of the problems, I don’t have the solutions. I only want to say that the kind of errors that the author makes can lead us to a worse situation. We should not forget (I am sure the author knows this too) that it is not just a case of some bad apples. Even if these were to be removed by using ‘unsavory’ forces and weapons, the problems are not going to be solved so easily. Because there is not just one clearcut problem but many problems which are all meshed together and the meshing is too complex and barely visible.

At the risk of making an unpalatable statement, I would say that if any party in conflicts like this has to be excused for using unsavory weapons or tactics, it will have to be the much weaker party, not the strongest party in history. But I don’t think I would include suicide bombing among those weapons or tactics. And I also realize the limits to which I can be entitled to sit in judgement over people living under such conditions.

The author need not offer me (business class or mere economy) tickets to Waziristan. I am scared to even go to places in India.

One more problem that I have with the author’s writings is that he seems to have assigned blame to most parties involved in the conflict: the Army, the militants, the Taliban, the Americans etc., but has he (I haven’t read everything written by him) considered, equally critically, the role of the Pakistani elite (not just the leftists) and the somewhat ‘secular’ middle class? He seems to have hinted at their role, but it seems to me that their role was, is and will be far more critical in determining what is happening and what will happen. After all, the rise (if we can call it that) of the Taliban closely parallels the Islamisation of the Pakistani society in general. How did the Pakistani elite (intellectual, feudal and official) help in this and what can they do to solve this problem?

That, it seems to me, is the crucial question to ask (though it won’t lead to a quick fix), apart from what people around the world can do about those controlling the aerial drones, towards whom, as the author earlier wrote, “we still dare not point a finger at”. After going on to point a finger at them, the author seems to have now moved to the position of accepting their support in terms of killings by the aerial drones in order to contain the ‘human drones’, which (to be a bit harsh) doesn’t make sense to me.

Related to this is another question: does the natural antipathy of the Pakistani elite towards these ‘primitive’ tribal communities have something to do with the position that the author has taken and which he says many others (‘educated people’) share?

There are, of course, other actors. The author has mentioned Saudi Arabia, but Iran has a role. Even India has (or at least wants to have) a role.

But I want to end on a positive note. It’s heartening to see that the ZNet allows this kind of a dissenting view to be presented on its platform. That should be a good sign for the discussion.

[Unfortunately, I have to end on a slightly negative note. As I was going to add the comment to the article, I realized that I have to be a ‘sustainer’ even to post a comment. And I have not been able to become a sustainer for reasons I won’t go into here. Hence I post it here.]

October 28, 2008

Computational Linguistics

As I wrote in the previous entry (can the Hindi word ‘pravishti’ be used for ‘post’?), I am going to write about Sanchay over the next few weeks.

But since Sanchay has been built especially (apart from ordinary users) for researchers in computational linguistics or linguistics, it would be good to make clear what computational linguistics or linguistics means or, if you already know what they mean, what I mean by them. I add this second point because ordinary people have all sorts of misconceptions about what these subjects (computational linguistics or linguistics) mean, and even researchers in these subjects do not agree on their definitions.

The truth is that in the Hindi world most people still take linguistics to mean the kind of study it was taken to mean at the beginning of the last century. But I would not like to go in that direction of the debate right now, because there is so much to say about it that the present purpose would be left behind.

There is, likewise, a great deal to say about the definition of computational linguistics or linguistics and about their boundaries, but for now a little will do.

So, in short, linguistics is the field of research or study in which one does not study just the grammar of one particular language, but studies natural or human (that is, not artificial) language scientifically. It is now widely accepted that the structure of the human brain is directly related to the structure of language, and since the structure of the brain is fundamentally the same for all humans, all natural or human languages are also the same except for superficial features. That is why, as is well known in the modern literature of these subjects, if an American’s infant is adopted right after birth by a Chinese family and the child grows up in China, it will learn to speak Chinese as easily as a child of any Chinese family. There are many more such things, but the main point is that linguistics is the scientific study of natural or human language.

At least the attempt is for the study to remain scientific; whether it actually manages to do so is a matter of debate.

Coming now to computational linguistics, in this subject our attention shifts from humans to the computer (‘sanganak’ in Hindi), but the earlier condition still applies: the scientific study of natural or human language. The difference is that our aim now becomes to make the computer capable of understanding natural or human language and of using it. Obviously this is still a very distant prospect, and that should be no surprise, because in linguistics itself (even after the extraordinary achievements of the last century) scientists are stuck with a great many obstacles.

Still, a good deal has become possible in computational linguistics, and a good deal more may become possible ahead (in the near future). But this does not include the computer speaking and understanding language like a human. What it does include are techniques that can find documents better, summarise them, translate them to some extent, and so on.

But in the Indian context the trouble is that we have not yet reached even the point where we can easily use the computer as just a better typewriter. There have been some achievements in this direction, but compared to English or the major European languages we are nowhere. As most of you already know, this is a long story that is best left alone for now.

But Sanchay has been developed in this very context, which we will talk about later.

October 26, 2008

An Introduction to Sanchay

In the previous post (I have to say with shame that I cannot find a suitable Hindi word for ‘post’) I wrote (in English) about the new version of Sanchay. The funny thing is that I have hardly written anything about Sanchay in Hindi so far. In an attempt to correct this lapse, I have decided to write something about Sanchay over the next few weeks.

So who is Sanchay? Or what is Sanchay?

The answer to the first question (in American terminology) is that Sanchay is a single parent child who gets no welfare benefits but has a lot of responsibilities.

The answer to the second question is that Sanchay is a free (as in freedom, though you could also say free of charge) and open source collection of computational tools useful for researchers working in computational linguistics or linguistics. But in particular it can be useful to anyone who uses Indian languages on a computer. One of its features is that new languages and encodings can be added to it easily. Almost all the major Indian languages are already included, and to use them in Sanchay you do not depend on the operating system, although if the operating system does support any such language, you can use that facility in Sanchay as well. Not only that, the same version of Sanchay works on both Windows and Linux/Unix, provided you have the JDK (Java Development Kit) installed. Even the font for your language need not be installed in the operating system.

The current version of Sanchay is 0.3.0. The biggest difference from the previous version is that all of Sanchay’s tools can now be used from one place; there is no need to remember the names of separate scripts. Twelve tools (applications) are included in all, which are:

  1. Sanchay Text Editor
  2. Table Editor
  3. Find-Replace-Extract Tool
  4. Word List Builder
  5. Word List Analyzer and Visualizer
  6. Language and Encoding Identification tool
  7. Syntactic Annotation Interface
  8. Parallel Corpus Annotation Interface
  9. N-gram Language Modeling Tool
  10. Discourse Annotation Interface
  11. File Splitter
  12. Automatic Annotation Tool

If you cannot make head or tail of most of these, wait a little. I will try to give more information about them later.

Perhaps there is no harm in adding that Sanchay is the result of this humble writer’s stubborn resolve over the last few years, with the cooperation of some other people too, even if only a little from each. The names of all those people will soon be visible on the Sanchay website. Almost all of them are (or were) students who worked or are working on some project under my ‘guidance’.

I hope that the next version of Sanchay after this one will come in a few months and will have even more tools and features.

October 5, 2008

Good News and Bad News on the CL Front

First, as the saying goes, the bad news. We had submitted a proposal for the Second Workshop on NLP for Less Privileged Languages for the ACL-affiliated conferences. That proposal has not been accepted. Total proposals submitted were 41 and 34 out of them were accepted. Ours was among the not-accepted seven (euphemisms can be consoling).

Was it that bad? I hope not.

Don’t those capital letters look silly in the name of a rejected proposal?

Now the good news. The long awaited new version of Sanchay has been released on Sourceforge. (Well, at least I was awaiting). This version has been named (or numbered?) 0.3.0.

The new Sanchay is a significant improvement over the last public version (0.2). It now has one main GUI from which all the applications can be controlled. There are twelve (GUI based) applications which have been included in this version. These are:

  • Sanchay Text Editor that is connected to some other NLP/CL components of Sanchay.
  • Table Editor with all the usual facilities.
  • A more intelligent Find-Replace-Extract Tool (can search over annotated data and allows you to see the matching files in the annotation interface).
  • Word List Builder.
  • Word List FST (Finite State Transducer) Visualizer that can be useful for anyone working with morphological analysis etc.
  • One of the most accurate Language and Encoding Identifier that is currently trained for 54 language-encoding pairs, including most of the major Indian languages. (Yes, I know there is a number agreement problem in the previous sentence).
  • A user friendly Syntactic Annotation Interface that is perhaps the most heavily used part of Sanchay till now. Hopefully there will be an even more user friendly version soon.
  • A Parallel Corpus Annotation Interface, which is another heavily used component. (Don’t take that ‘heavily’ too seriously).
  • An N-gram Language Modeling Tool that allows you to compile models in terms of bytes, letters and words.
  • A Discourse Annotation Interface that is yet to be actually used.
  • A more intelligent File Splitter.
  • An Automatic Annotation tool for POS (Part Of Speech) tagging, chunking and Named Entity Recognition. The first two should work reasonably well, but the last one may not be that useful for practical purposes. This is a CRF (Conditional Random Fields) based tool and it has been trained for Hindi for these three purposes. If you have annotated data, you can use it to train your own taggers and chunkers.

All these components use the customizable language-encoding support, especially useful for South Asian languages, that doesn’t need any support from the operating system or even the installation of any fonts, although these can still be used inside Sanchay if they are there.
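To give an idea of what compiling an n-gram model “in terms of bytes, letters and words” can mean, here is a minimal sketch in Python. It is not Sanchay’s code (Sanchay runs on Java and its internals are not shown here); the function names are made up for illustration, and ‘letter’ is simplified to a Unicode code point, which for Indian scripts is not the same thing as an akshara.

```python
from collections import Counter


def ngrams(units, n):
    """Return an iterator over successive n-grams (tuples) of a list of units."""
    return zip(*(units[i:] for i in range(n)))


def count_ngrams(text, n, level="word"):
    """Count n-grams of `text` at the byte, letter or word level.

    Illustrative only: 'letter' means one Unicode code point and
    'word' means a whitespace-separated token.
    """
    if level == "byte":
        units = list(text.encode("utf-8"))
    elif level == "letter":
        units = list(text)
    elif level == "word":
        units = text.split()
    else:
        raise ValueError("level must be 'byte', 'letter' or 'word'")
    return Counter(ngrams(units, n))


if __name__ == "__main__":
    text = "नम"  # two Devanagari code points, six UTF-8 bytes
    print(count_ngrams(text, 2, "letter"))
    print(count_ngrams(text, 2, "byte"))
    print(count_ngrams("a b a b", 2, "word"))
```

The byte level needs no knowledge of the encoding at all, which is one reason byte n-grams are a common choice for language and encoding identification; turning these raw counts into an actual model would still need smoothing and normalisation, which this sketch leaves out.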

More information is available at the Sanchay Home.

The capitals don’t look so bad for a released version.

The downside of even this good news is that my other urgent (to me) work has got delayed as I was working almost exclusively on bringing out this version for the last two weeks or so.

But then you need a reason to wake up and Sanchay is one of my reasons. And I can proudly say that a half-hearted attempt to generate funding for this project by posting it on Micropledge has generated $0.

Sanchay is still alive as a single parent child without any welfare but with a lot of responsibilities.

Now I can have nightmares about the bugs.

September 16, 2008

Dried Access Denied after Dinner

Filed under: Access Denied,Dried,Network,Technology,Things As They Are,Work — anileklavya @ 3:32 am
Access Denied to WordPress

September 2, 2008

Fried Access Denied at Breakfast

Filed under: Access Denied,Fried,Network,Technology,Things As They Are,Work — anileklavya @ 9:14 am
Access Denied to IIIT
Access Denied to LTRC
Access Denied to The Hindu

July 29, 2008

V for Vodot’s Vendetta

Vodot hasn’t arrived, but I seem to have got (t)his vindictive message:

V for Vodot’s Vendetta

But I thought Vodot was being held by someone.

Is this some kind of self-sponsored hijacking? Or is the world not so bad (or not so good) and someone is telling me that he (or they) has (or have) hijacked Vodot and I will be getting some message for ransom?

But why should I pay ransom for Vodot?

My shoes don’t have strings.
