There is a legal agreement written in very legal language that I had to read today. It’s called a Mutual Confidentiality Agreement and must be signed by two parties who plan to collaborate on some commercial product or service.
After having plodded through the legalese and having understood most of it (I have an advantage in this regard), I found that there was one clause that was glaringly missing from it.
The document lists all the conditions that apply when the Disclosing Party discloses something to the Recipient. It has a section euphemistically titled ‘Injunctive Relief’ that might send shivers down the Recipient’s spine, depending on the power balance. It also lists all the exceptions under which these conditions may not apply. Such exceptions include “court order” and “as required by law”.
What is missing is something that should be included in all such documents post-9/11, in all countries that went for the security Gold Rush, which practically means all countries, (almost) period.
That missing clause should go something like this:
An (unintended) disclosure by the Recipient to any number of third parties of any of the Disclosing Party’s Confidential Information will not be considered a breach of the agreement if it happens under any of the following conditions:
- As part of surveillance operations carried out by the State and any of its agencies, the institution in which the Recipient works or any part thereof, the Local Version of the Truman Show, the Connectivity Service Providers, the Private Security Companies, the Local Quasi-authorised Vigilante Organisations or any other such agencies added to the list till the eve of the day the breach is considered for scrutiny.
- [Talking of eve] As a result of eavesdropping by the agencies and organisations listed in the first item.
- As a result of disclosure by the people involved in (a) surveillance and (b) eavesdropping by the agencies and organisations listed in the first item to any of their superiors, colleagues, subordinates, business associates, friends, relatives, family members or strangers.
The clause sounds very reasonable in the post-9/11 world and makes perfect legal sense. After all, any disclosure made (unintentionally) under conditions listed in this clause would not be the fault of the Recipient and it would only be for The Good of The Country and The World and The Humanity (as everyone knows and agrees to).
I have one doubt, however. Won’t the addition of this clause almost nullify everything else in this agreement to mutual confidentiality?
But the clause is required. Isn’t it?
And what about that poor thing, The Market?
Is it already being forgotten in favour of other things?
First, as the saying goes, the bad news. We had submitted a proposal for the Second Workshop on NLP for Less Privileged Languages for the ACL-affiliated conferences. That proposal has not been accepted. A total of 41 proposals were submitted and 34 of them were accepted. Ours was among the not-accepted seven (euphemisms can be consoling).
Was it that bad? I hope not.
Don’t those capital letters look silly in the name of a rejected proposal?
Now the good news. The long-awaited new version of Sanchay has been released on Sourceforge. (Well, at least I was awaiting it.) This version has been named (or numbered?) 0.3.0.
The new Sanchay is a significant improvement over the last public version (0.2). It now has one main GUI from which all the applications can be controlled. There are twelve (GUI based) applications which have been included in this version. These are:
- Sanchay Text Editor that is connected to some other NLP/CL components of Sanchay.
- Table Editor with all the usual facilities.
- A more intelligent Find-Replace-Extract Tool (can search over annotated data and allows you to see the matching files in the annotation interface).
- Word List Builder.
- Word List FST (Finite State Transducer) Visualizer that can be useful for anyone working with morphological analysis etc.
- One of the most accurate Language and Encoding Identifier that is currently trained for 54 language-encoding pairs, including most of the major Indian languages. (Yes, I know there is a number agreement problem in the previous sentence).
- A user friendly Syntactic Annotation Interface that is perhaps the most heavily used part of Sanchay till now. Hopefully there will be an even more user friendly version soon.
- A Parallel Corpus Annotation Interface, which is another heavily used component. (Don’t take that ‘heavily’ too seriously).
- An N-gram Language Modeling Tool that allows you to compile models in terms of bytes, letters and words.
- A Discourse Annotation Interface that is yet to be actually used.
- A more intelligent File Splitter.
- An Automatic Annotation tool for POS (Part Of Speech) tagging, chunking and Named Entity Recognition. The first two should work reasonably well, but the last one may not be that useful for practical purposes. This is a CRF (Conditional Random Fields) based tool and it has been trained for Hindi for these three purposes. If you have annotated data, you can use it to train your own taggers and chunkers.
All these components use the customizable language-encoding support, especially useful for South Asian languages, that doesn’t need any support from the operating system or even the installation of any fonts, although these can still be used inside Sanchay if they are there.
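For anyone wondering what compiling an n-gram model "in terms of bytes, letters and words" might look like, here is a minimal sketch in Python. It is not Sanchay's code (Sanchay's own implementation is not shown here); the function names `ngram_counts` and `compile_model` and the `unit` parameter are made up for illustration, and the "model" is just raw counts with no smoothing.

```python
from collections import Counter


def ngram_counts(units, n):
    """Count all contiguous n-grams in a sequence of units."""
    return Counter(tuple(units[i:i + n]) for i in range(len(units) - n + 1))


def compile_model(text, n, unit="word"):
    """Build raw n-gram counts over bytes, letters or words.

    Hypothetical helper for illustration only; the unit names are assumptions.
    """
    if unit == "byte":
        units = list(text.encode("utf-8"))   # integers 0..255
    elif unit == "letter":
        units = list(text)                   # Unicode code points
    elif unit == "word":
        units = text.split()                 # whitespace-separated tokens
    else:
        raise ValueError("unit must be 'byte', 'letter' or 'word'")
    return ngram_counts(units, n)


if __name__ == "__main__":
    sample = "a b a b"
    print(compile_model(sample, 2, unit="word"))
    print(compile_model(sample, 2, unit="letter"))
    print(compile_model(sample, 2, unit="byte"))
```

The byte and letter views differ as soon as the text leaves ASCII: one letter in an Indian script typically takes several bytes in UTF-8, which is one reason a byte-level model can be handy when the encoding itself is in question.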
More information is available at the Sanchay Home.
The capitals don’t look so bad for a released version.
The downside of even this good news is that my other urgent (to me) work has got delayed as I was working almost exclusively on bringing out this version for the last two weeks or so.
But then you need a reason to wake up and Sanchay is one of my reasons. And I can proudly say that a half-hearted attempt to generate funding for this project by posting it on Micropledge has generated 0$.
Sanchay is still alive as a single parent child without any welfare but with a lot of responsibilities.
Now I can have nightmares about the bugs.
I have been familiar with the phrase ‘gory details’, as has anyone who has read newspapers or watched TV.
However, today I saw this phrase with a completely new meaning. It was quite a revelation. This is how it goes:
Even if you have severe constraints on resources due to funding (I sympathize…), I recommend not discussing them in quite as gory detail as you do. A very brief mention of the amount of effort invested to date is sufficient.
Gee, thanks for the sympathy. Now I will be able to run my next project on this great resource.
And these are the gory details (complete and unabridged) to which the above quote refers:
Since x has so far mostly been the result of individual effort and it is a non-funded project being undertaken on part-time basis, there were the most stringent resource (financial, temporal, etc.) constraints.
(Only the names have been changed).
Quite a lesson in Semantics. Or is it Pragmatics? Perhaps both. Great. Very original.
By the way, another lesson I have learnt over the years is that your project is not a project unless it is funded.
Without funding, your work is illegitimate, at least in the research community.
Oops! Sorry for the gory details. Obscene. Vulgar. Indecent. Pervert. Lewd. Salacious. Detestable. Repulsive. Repugnant. Abhorrent.
(I will add more context for this post later).