December 1, 2011

The Original Mark Twain

A day or two ago Google put what they call a doodle on its search page, celebrating the 176th birthday of Samuel Langhorne Clemens, otherwise known as Mark Twain. I used to have trouble recalling his real name, so well known has his pen name become, rather as with George Orwell, who, by the way, wrote an essay about him titled ‘The Licensed Jester’ (note this down as evidence of contradiction).

I had read Huckleberry Finn while studying for my first college degree. At that time I knew that Mark Twain was a famous writer. I had read a few short pieces by him in English textbooks. I had also read part of Tom Sawyer, but couldn’t finish it because the book had to be returned. But I did not know about Huck Finn, nor that it was considered the first Great American Novel. Yet even before finishing that shortish novel, I had no doubt that it was one of the best American novels ever written.

Note the self-referentiality and pomposity and keep it in mind while reading the rest of this article.

But this article is going to be more of a cut-and-paste (copy-and-paste, to be exact) job, because that is the only way to do justice to what I want to say here. And there is no editor or board of reviewers looking over my shoulder, which makes it easy. The source is also in the public domain, so there are no legal problems. If you are a fair use fanatic, go read something else.

If even people like me have trouble recalling his real name, it can be expected that few people (other than literary scholars and maybe a few other literary geeks) know the story of the origin of his pen name. Those who do know it usually know only part of it, and the less interesting part at that.

Now I can add here that there is a theory among scholars that this story is perhaps not factual. I am not aware of their arguments, and since Mark Twain himself explained in detail why he became Mark Twain, and I also know him to be one of the most honest people in literature or elsewhere, I will ignore that theory and get on with the one that I like.

In fact, when I first read this story it made such a great impression on me that I have been aching to write about it ever since. The story forms Chapter 50 of another of his great books, Life on the Mississippi. I read it some years after I had read Huck Finn, and this time I had borrowed the book (from the British Library, if I remember correctly: note this down for your later judgement). Since it was issued in my own name and I was ready to pay the late fines (which I did very frequently, and they were substantial sums for me at that time), I was able to finish this much longer book (I was as busy as anyone can be in those days: note it down). I liked it almost as much as Huck Finn. For the record, I finished reading Tom Sawyer much later and didn’t like it that much. No match for Huck Finn.

The story, or the part of the story that is commonly presented and known, is also given on the Wikipedia page about Mark Twain:

He maintained that his primary pen name came from his years working on Mississippi riverboats, where two fathoms, a depth indicating safe water for passage of boat, was measured on the sounding line. A fathom is a maritime unit of depth, equivalent to two yards (1.8 m); twain is an archaic term for “two.” The riverboatman’s cry was mark twain or, more fully, by the mark twain, meaning “according to the mark [on the line], [the depth is] two [fathoms],” that is, “The water is 12 feet (3.7 m) deep and it is safe to pass.”

The Wikipedia page goes on to say that he “claimed that his famous pen name was not entirely his invention” and that “In Life on the Mississippi, he wrote:”

Captain Isaiah Sellers was not of literary turn or capacity, but he used to jot down brief paragraphs of plain practical information about the river, and sign them “MARK TWAIN,” and give them to the New Orleans Picayune. They related to the stage and condition of the river, and were accurate and valuable; … At the time that the telegraph brought the news of his death, I was on the Pacific coast. I was a fresh new journalist, and needed a nom de guerre; so I confiscated the ancient mariner’s discarded one, and have done my best to make it remain what it was in his hands – a sign and symbol and warrant that whatever is found in its company may be gambled on as being the petrified truth; how I have succeeded, it would not be modest in me to say.

As I said, the complete story forms a full chapter of the said book. The title of the chapter is “The ‘Original Jacobs’”.

Mark Twain was not faultless, of course, but neither was he one of those who seem faultless only because they have adopted the current orthodoxy of political and social correctness, taking no risks of their own, and who, having done so, feel entitled to judge and sentence anyone, present or past, for, say, having shown a few racist tendencies in the seventeenth century or having been a little sexist in the first half of the twentieth century.

That is not to say that he did not do some nasty things in his time. In fact, the interesting part of the story is about just that. Then there is also the fact that he displayed considerable literary/stylistic prescriptivism in blasting some writers and critics of his time, but I am not going to go into that.

The introduction to the story is that there was another man who had used the pen name Mark Twain. He wasn’t a literary writer, but he was something impressive: impressive enough for Mark Twain to say that it was an honor to be the only one hated by him.

So here comes the copy-and-paste of the 50th chapter of Life on the Mississippi (I have left out the final paragraph, which is not relevant to the story):

Chapter 50: The ‘Original Jacobs’

WE had some talk about Captain Isaiah Sellers, now many years dead. He
was a fine man, a high-minded man, and greatly respected both ashore and
on the river. He was very tall, well built, and handsome; and in his old
age–as I remember him–his hair was as black as an Indian’s, and his
eye and hand were as strong and steady and his nerve and judgment as
firm and clear as anybody’s, young or old, among the fraternity of
pilots. He was the patriarch of the craft; he had been a keelboat pilot
before the day of steamboats; and a steamboat pilot before any other
steamboat pilot, still surviving at the time I speak of, had ever turned
a wheel. Consequently his brethren held him in the sort of awe in
which illustrious survivors of a bygone age are always held by their
associates. He knew how he was regarded, and perhaps this fact added
some trifle of stiffening to his natural dignity, which had been
sufficiently stiff in its original state.

He left a diary behind him; but apparently it did not date back to his
first steamboat trip, which was said to be 1811, the year the first
steamboat disturbed the waters of the Mississippi. At the time of his
death a correspondent of the ‘St. Louis Republican’ culled the following
items from the diary–

‘In February, 1825, he shipped on board the steamer “Rambler,” at
Florence, Ala., and made during that year three trips to New Orleans and
back–this on the “Gen. Carrol,” between Nashville and New Orleans. It
was during his stay on this boat that Captain Sellers introduced the tap
of the bell as a signal to heave the lead, previous to which time it was
the custom for the pilot to speak to the men below when soundings were
wanted. The proximity of the forecastle to the pilot-house, no doubt,
rendered this an easy matter; but how different on one of our palaces of
the present day.

‘In 1827 we find him on board the “President,” a boat of two hundred and
eighty-five tons burden, and plying between Smithland and New Orleans.
Thence he joined the “Jubilee” in 1828, and on this boat he did his
first piloting in the St. Louis trade; his first watch extending from
Herculaneum to St. Genevieve. On May 26, 1836, he completed and left
Pittsburgh in charge of the steamer “Prairie,” a boat of four hundred
tons, and the first steamer with a STATE-ROOM CABIN ever seen at St.
Louis. In 1857 he introduced the signal for meeting boats, and which
has, with some slight change, been the universal custom of this day; in
fact, is rendered obligatory by act of Congress.

‘As general items of river history, we quote the following marginal
notes from his general log–

‘In March, 1825, Gen. Lafayette left New Orleans for St. Louis on the
low-pressure steamer “Natchez.”

‘In January, 1828, twenty-one steamers left the New Orleans wharf to
celebrate the occasion of Gen. Jackson’s visit to that city.

‘In 1830 the “North American” made the run from New Orleans to Memphis
in six days–best time on record to that date. It has since been made in
two days and ten hours.

‘In 1831 the Red River cut-off formed.

‘In 1832 steamer “Hudson” made the run from White River to Helena, a
distance of seventy-five miles, in twelve hours. This was the source of
much talk and speculation among parties directly interested.

‘In 1839 Great Horseshoe cut-off formed.

‘Up to the present time, a term of thirty-five years, we ascertain, by
reference to the diary, he has made four hundred and sixty round trips
to New Orleans, which gives a distance of one million one hundred and
four thousand miles, or an average of eighty-six miles a day.’

Whenever Captain Sellers approached a body of gossiping pilots, a chill
fell there, and talking ceased. For this reason: whenever six pilots
were gathered together, there would always be one or two newly fledged
ones in the lot, and the elder ones would be always ‘showing off’ before
these poor fellows; making them sorrowfully feel how callow they were,
how recent their nobility, and how humble their degree, by talking
largely and vaporously of old-time experiences on the river; always
making it a point to date everything back as far as they could, so as to
make the new men feel their newness to the sharpest degree possible,
and envy the old stagers in the like degree. And how these complacent
baldheads WOULD swell, and brag, and lie, and date back–ten, fifteen,
twenty years,–and how they did enjoy the effect produced upon the
marveling and envying youngsters!

And perhaps just at this happy stage of the proceedings, the stately
figure of Captain Isaiah Sellers, that real and only genuine Son of
Antiquity, would drift solemnly into the midst. Imagine the size of the
silence that would result on the instant. And imagine the feelings of
those bald-heads, and the exultation of their recent audience when the
ancient captain would begin to drop casual and indifferent remarks of a
reminiscent nature–about islands that had disappeared, and cutoffs that
had been made, a generation before the oldest bald-head in the company
had ever set his foot in a pilot-house!

Many and many a time did this ancient mariner appear on the scene in the
above fashion, and spread disaster and humiliation around him. If one
might believe the pilots, he always dated his islands back to the misty
dawn of river history; and he never used the same island twice; and
never did he employ an island that still existed, or give one a name
which anybody present was old enough to have heard of before. If you
might believe the pilots, he was always conscientiously particular about
little details; never spoke of ‘the State of Mississippi,’ for instance
–no, he would say, ‘When the State of Mississippi was where Arkansas
now is,’ and would never speak of Louisiana or Missouri in a general
way, and leave an incorrect impression on your mind–no, he would say,
‘When Louisiana was up the river farther,’ or ‘When Missouri was on the
Illinois side.’

The old gentleman was not of literary turn or capacity, but he used
to jot down brief paragraphs of plain practical information about the
river, and sign them ‘MARK TWAIN,’ and give them to the ‘New Orleans
Picayune.’ They related to the stage and condition of the river, and
were accurate and valuable; and thus far, they contained no poison.
But in speaking of the stage of the river to-day, at a given point, the
captain was pretty apt to drop in a little remark about this being the
first time he had seen the water so high or so low at that particular
point for forty-nine years; and now and then he would mention Island
So-and-so, and follow it, in parentheses, with some such observation
as ‘disappeared in 1807, if I remember rightly.’ In these antique
interjections lay poison and bitterness for the other old pilots, and
they used to chaff the ‘Mark Twain’ paragraphs with unsparing mockery.

It so chanced that one of these paragraphs–[Footnote: The original MS.
of it, in the captain’s own hand, has been sent to me from New Orleans.
It reads as follows–

VICKSBURG May 4, 1859.

‘My opinion for the benefit of the citizens of New Orleans: The water
is higher this far up than it has been since 8. My opinion is that the
water will be feet deep in Canal street before the first of next June.
Mrs. Turner’s plantation at the head of Big Black Island is all under
water, and it has not been since 1815.

‘I. Sellers.’]

became the text for my first newspaper article. I burlesqued it broadly,
very broadly, stringing my fantastics out to the extent of eight hundred
or a thousand words. I was a ‘cub’ at the time. I showed my performance
to some pilots, and they eagerly rushed it into print in the ‘New
Orleans True Delta.’ It was a great pity; for it did nobody any worthy
service, and it sent a pang deep into a good man’s heart. There was no
malice in my rubbish; but it laughed at the captain. It laughed at a man
to whom such a thing was new and strange and dreadful. I did not know
then, though I do now, that there is no suffering comparable with that
which a private person feels when he is for the first time pilloried in
print.

Captain Sellers did me the honor to profoundly detest me from that day
forth. When I say he did me the honor, I am not using empty words. It
was a very real honor to be in the thoughts of so great a man as Captain
Sellers, and I had wit enough to appreciate it and be proud of it. It
was distinction to be loved by such a man; but it was a much greater
distinction to be hated by him, because he loved scores of people; but
he didn’t sit up nights to hate anybody but me.

He never printed another paragraph while he lived, and he never again
signed ‘Mark Twain’ to anything. At the time that the telegraph brought
the news of his death, I was on the Pacific coast. I was a fresh new
journalist, and needed a nom de guerre; so I confiscated the ancient
mariner’s discarded one, and have done my best to make it remain what it
was in his hands–a sign and symbol and warrant that whatever is found
in its company may be gambled on as being the petrified truth; how I
have succeeded, it would not be modest in me to say.

The captain had an honorable pride in his profession and an abiding love
for it. He ordered his monument before he died, and kept it near
him until he did die. It stands over his grave now, in Bellefontaine
cemetery, St. Louis. It is his image, in marble, standing on duty at
the pilot wheel; and worthy to stand and confront criticism, for it
represents a man who in life would have stayed there till he burned to a
cinder, if duty required it.

I find it interesting that the part this chapter focuses on is always left out of the usual accounts, as far as I know (I am not a Mark Twain scholar, so I am only talking about what I have read).

I also feel that there is a lesson somewhere in this story for those who are receptive. How many would be receptive to such a lesson is something depressing to think about these days.

As a bonus for having read this far, I invite you to read this, which was not published in his lifetime and about which he said, “I don’t think the prayer will be published in my time. None but the dead are permitted to tell the truth.”

May 19, 2011

Sicilian Grand Prix Attack?

I have a website that has been inoperative for some time. There was not much content on it anyway. However, while working on another site on the same server, I noticed that this site was being accessed heavily and, since it is inoperative, the web server was logging the requests as errors.

This started on May 11th, 2011. The error log has become huge by the standards of any website that I maintain. Its size is 8 MB. It has more than 60,000 entries, most of them for the inoperative site I mentioned. And the total number of *distinct* IPs from which the site has been accessed is nearly 20,000: way beyond the traffic that I get for even those sites which are operative and regularly used.
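Counts like these can be reproduced with a few lines of code. A minimal sketch, assuming an Apache-style error log in which each entry names the requester as `[client A.B.C.D]` (or `[client A.B.C.D:port]`); only IPv4 addresses are matched, and the sample lines are made up, not taken from the actual log:

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Assumption: Apache-style error log lines that name the requester as
# "[client A.B.C.D]" or "[client A.B.C.D:port]". Only IPv4 is matched here.
CLIENT_RE = re.compile(r"\[client (\d{1,3}(?:\.\d{1,3}){3})(?::\d+)?\]")


def summarize_error_log(lines: Iterable[str]) -> Tuple[int, int, Counter]:
    """Return (number of entries, number of distinct client IPs, per-IP counts)."""
    entries = 0
    per_ip: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue  # blank lines are not log entries
        entries += 1
        match = CLIENT_RE.search(line)
        if match:
            per_ip[match.group(1)] += 1
    return entries, len(per_ip), per_ip


if __name__ == "__main__":
    # Made-up sample lines for illustration only.
    sample = [
        "[Wed May 11 10:23:45 2011] [error] [client 203.0.113.5] File does not exist: /var/www/site/book",
        "[Wed May 11 10:23:46 2011] [error] [client 198.51.100.7] File does not exist: /var/www/site/book",
        "[Wed May 11 10:23:47 2011] [error] [client 203.0.113.5:51234] File does not exist: /var/www/site/comment",
    ]
    entries, distinct, per_ip = summarize_error_log(sample)
    print(entries, distinct, per_ip.most_common(1))  # 3 2 [('203.0.113.5', 2)]
```

On a real server the function can be given an open file directly, e.g. `with open("error_log", encoding="utf-8", errors="replace") as f: summarize_error_log(f)`; the file name here is hypothetical and depends on the server configuration.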

Two of the entries in the log file indicated that someone had posted a link to download a free book called ‘Starting Out: The Sicilian Grand Prix Attack’. But there has been no facility to add comments for this site on this server, although there was one on the server where the site was earlier installed. So perhaps the cached post was from the earlier server.

The important thing is, there were only two requests for this post or this link to the book.

But then, about an hour ago, I searched for it on Google and saw the cached post.

Starting a few minutes after that, there has been a flood of requests for the same link to the book on the Sicilian Grand Prix Attack, even though the site is still inoperative. There are also more attempts to add new comments.

The ‘attack’ seems to continue and the size of the error log file is growing even now.

Meaning what? You tell me.

[Some information that might perhaps be relevant: The site was about a query language that I have designed. I had submitted a paper about this language to a workshop at a very prestigious conference. The paper was rejected. I received the notification on the 7th. Over the next two days, I had an exchange of emails with the PC chairs and the organizers of the workshop about my dissatisfaction with the reviews and the reviewing process. I also asked them to forward my comments to the reviewers. I could be identified from my comments, even if my name had been removed.]

(There are many simple explanations of the above. One of them is that the writer is a moron.)

June 16, 2009

Walls have Fears

On walls live creatures
They don’t just have ears
They have eyes and they have teeth
And they sure don’t have tears

What adds to their terrors
Is that they can’t be easily seen
But you can feel their presence
If you are one of their victims

They can communicate with each other
With a system more sophisticated
Than that of elephants or whales
It’s so sophisticated that only
Intelligent Design can explain them

They have concrete manifestations
But they are mostly abstract
No wonder so is their food
They don’t eat your meat
They eat your lives and your work and your protestations

You can be safe from them if you want
It’s all a matter of belief and loyalty and obedience
As it has always been through the ages
With other kinds of fearsome creatures

The question is whether you accept
The benevolent supremacy of the Intelligent Designer
Who put them there to watch over you

Just believe and abide and salvation can be yours
Don’t and you, with your work and your life
Can be completely mucked up, inside and outdoors

May 3, 2009

Rhetorical Questions on Ownership

If I compose a poem
While visiting your home
And having a post-meal nap
In your home
Does the poem belong to you?

If I write a poem
On the last page of the notebook
That you gave me and
Which contains the addresses
Of the people to whom I deliver
Items of furniture
As a means of survival
Does the poem belong to you?

If I live in a small room
Crammed with all my current
And parts of my old life
And I pay the standard rent
Regularly for the room
Like everyone else
Does a poem written in that room
Belong to you
Because I used a room owned by you?

If I burn my blood
Day and night, apart from
Doing my work under your pay
And manage to finish
A life-sapping and lifespan-reducing epic
Does the epic belong to you
Because I wrote it while working for you
And sometimes using your pen and ink?

But you didn’t pay me for writing it
You didn’t even ask me to write it
Most probably you didn’t even want me to
Because you don’t care for things
Written by nobodies who are working for you
And which are not worth much in the market

It may be a two penny epic
But does it belong to you?

If it happens to become a million dollar one
Does it then belong to you?

If I sit on the railway station
While waiting for a train
In the station restaurant
And write a poem on the tissue paper
Provided to me by the restaurant owner
Does the poem belong to the restaurant?

If my laptop is not working
And I borrow yours
And while I am using it
I write a poem using your laptop
Does the poem belong to you?

What if I even used
One or two words written
On the calendar hanging on your wall
Written on the cover of the notebook of addresses
Or on the hoarding visible
Only from the window of the room
Rented by me and owned by you
What if I referred to images
I see on the railway station
Or flashing on the T.V. in the restaurant
Something on the screensaver of your laptop
Or a line written on the notes
With which you paid me
Does the poem belong to you?

The poem that you keep reading
And maybe keep damning
But don’t have to pay me extra for
Does it belong to you?

It does, does it?
Well, as a reader
Or as a property owner?

April 16, 2009

Accepted, but not Published

Academics and researchers list their publications prominently on their home pages. After all, these are supposed to represent the best of their work. They also quite often (especially those who have a large number of publications) categorize them according to criteria like the venue (workshop, conference, journal or book, in increasing order of prominence) or peer review (unrefereed and refereed).

In this post we propose that there should be a new category of publications. This category is needed because a lot of researchers (for good or for bad) now come from underprivileged countries. For most of these researchers, traveling abroad to attend a conference, even after their paper has been accepted, is very hard to do. In some sense it is harder than getting the paper accepted, which is itself relatively harder for them, given the lack of certain privileges (whether you like the word or not) such as generous research grants, infrastructure and language resources, combined with prejudice (it is there: I am not inventing it, whoever might be blamed for it). To these problems can be added compulsory attendance at a conference or a workshop. It is partly these conditions that have prompted suggestions from certain quarters that researchers from these countries should concentrate on journal papers (never mind the delay and difficulties involved or the unfairness of the proposition, even though it has some practical justification).

But you can never be sure while submitting that you certainly won’t be able to attend. Also, hope is said to be a good thing. Therefore, the event of a researcher submitting a paper and hoping to attend but not being able to attend cannot be ruled out.

This brings us to the proposal mentioned earlier. One solution to this problem is that there should be another category of papers: accepted but not published, because the author couldn’t afford to attend the conference or the workshop. (By the way, workshops are the most happening places nowadays: more on that later.)

The author of this post must know, because he has authored more than one such publication.

Of course, the condition will be that if and when such a paper is resubmitted (with or without modifications, but without any substantial new work), accepted again and finally published, the entry marked as ‘accepted’ should be removed and replaced by an entry marked as ‘published’.

After all, if we are serious about research, then the work (which has been peer reviewed and accepted) should be given somewhat more importance than some pages printed in some proceedings (or attendance in a conference for that matter).

This, of course, doesn’t mean that you can get basically the same thing published (or accepted) in more than one place.

(Sorry for the Gory Details)

P.S.: Maybe there is no need for the above apology, as the depiction of the Gory Details of the Indian Reality is now getting multiple Oscars (The Academy Awards: the keyword is Academy). But maybe there is, because some researchers have a more (metaphorically) delicate constitution which can be hurt by the Gory Details.

Queen’s P.S.: Off with his head!

October 28, 2008

Computational Linguistics

As I wrote in the previous pravishti (‘entry’: can this word be used for ‘post’?), over the next few weeks I am going to write about Sanchay.

But since Sanchay has been made especially (apart from general users) for researchers in computational linguistics or linguistics, it would be good to clarify what computational linguistics or linguistics mean, or, if you already know what they mean, what I mean by them. I add the second point because not only does the general public have all kinds of misconceptions about the meaning of these subjects (computational linguistics and linguistics), but even researchers in these subjects do not agree on how to define them.

The truth is that in the Hindi world most people still take linguistics to mean the kind of study it meant at the beginning of the last century. But I would rather not take the discussion in that direction right now, because there is so much to say about it that the present purpose would be left behind.

There is also a great deal to say about the definitions of computational linguistics and linguistics, or about their boundaries, but for now a little will do.

So, in short, linguistics is the field of research or study in which one does not merely study the grammar of one particular language, but studies natural or human (that is, not artificial) language scientifically. It is now widely accepted that the structure of the human brain is directly related to the structure of language, and since the brains of all humans are basically alike in structure, all natural or human languages are also alike in everything except surface features. That is why, as is well known in the modern literature of these subjects, if a Chinese family adopted an American couple’s baby immediately after birth and the child grew up in China, it would learn to speak Chinese as easily as a child born into a Chinese family. There are many more such points, but the main point is that linguistics is the scientific study of natural or human language.

At least the attempt is to keep the study scientific, but whether it actually stays scientific is a matter of debate.

Coming now to computational linguistics, here our attention shifts from humans to the sanganak, that is, the computer, but the earlier condition still applies: the scientific study of natural or human language. The difference is that our goal now becomes making the computer capable of understanding and using natural or human language. Obviously this is still a distant prospect, and that should come as no surprise, because even in linguistics itself (despite the extraordinary achievements of the last century) scientists are stuck among a great many obstacles.

Still, quite a lot has already become possible in computational linguistics, and quite a lot more may become possible (in the near future). But this does not include a computer speaking and understanding language the way a human does. What it does include are techniques that can find documents better, summarize them, translate them to some extent, and so on.

But the problem in the Indian context is that we have not even reached the point where we can easily use the computer as just a better typewriter. Some progress has been made in this direction, but compared with English or the major European languages we are nowhere. As most of you already know, this is a long story that is best left aside for now.

But it is precisely in this context that Sanchay has been developed, and we will talk about it next.

October 26, 2008

An Introduction to Sanchay

In the last post (I am ashamed to say that I cannot find a suitable word for ‘post’), I wrote (in English) about the new version of Sanchay. Funnily enough, I have hardly written anything about Sanchay in Hindi so far. To correct this lapse, I plan to write something about Sanchay over the next few weeks.

So who is Sanchay? Or what is Sanchay?

The answer to the first question (in American terminology) is that Sanchay is a single parent child who gets no welfare but has a lot of responsibilities.

The answer to the second question is that Sanchay is a free (you could also say free of cost) and open source collection of computational tools useful for researchers working in computational linguistics (कंप्यूटेशनल लिंग्विस्टिक्स) or linguistics. But in particular, it can be useful to anyone who uses Indian languages on a computer. One of its features is that new languages and encodings can easily be added to it. Almost all the major Indian languages are already included, and to use them in Sanchay you do not depend on the operating system, although if the operating system supports any such language, you can use that support in Sanchay as well. Moreover, the same version of Sanchay works on both Windows and Linux/Unix, provided you have installed the JDK (Java Development Kit). Your language’s font does not even need to be installed in the operating system.

The current version of Sanchay is 0.3.0. The biggest difference from the previous version is that all of Sanchay’s tools can now be used from one place; there is no need to remember the names of separate scripts. Altogether twelve tools (applications) have been included, namely:

  1. Sanchay Text Editor
  2. Table Editor
  3. Find-Replace-Extract Tool
  4. Word List Builder
  5. Word List Analyzer and Visualizer
  6. Language and Encoding Identification
  7. Syntactic Annotation Interface
  8. Parallel Corpus Annotation Interface
  9. N-gram Language Modeling Tool
  10. Discourse Annotation Interface
  11. File Splitter
  12. Automatic Annotation Tool

If most of these make no sense to you, wait a little. I will try to give more information about them later.

Perhaps there is no harm in adding that Sanchay is the result of this humble person’s stubborn resolve over the last few years, with some help from a few other people, even if only a little from each. All their names will soon be listed on the Sanchay website. Almost all of them are (or were) students who worked, or are working, on some project under my ‘guidance’.

Hopefully the version after this one will come out in a few months, with even more tools and features.

October 5, 2008

Good News and Bad News on the CL Front

First, as the saying goes, the bad news. We had submitted a proposal for the Second Workshop on NLP for Less Privileged Languages for the ACL-affiliated conferences. That proposal has not been accepted. In all, 41 proposals were submitted and 34 of them were accepted. Ours was among the not-accepted seven (euphemisms can be consoling).

Was it that bad? I hope not.

Don’t those capital letters look silly in the name of a rejected proposal?

Now the good news. The long-awaited new version of Sanchay has been released on SourceForge. (Well, at least I was awaiting it.) This version has been named (or numbered?) 0.3.0.

The new Sanchay is a significant improvement over the last public version (0.2). It now has one main GUI from which all the applications can be controlled. Twelve (GUI-based) applications are included in this version. These are:

  • Sanchay Text Editor that is connected to some other NLP/CL components of Sanchay.
  • Table Editor with all the usual facilities.
  • A more intelligent Find-Replace-Extract Tool (can search over annotated data and allows you to see the matching files in the annotation interface).
  • Word List Builder.
  • Word List FST (Finite State Transducer) Visualizer that can be useful for anyone working with morphological analysis etc.
  • One of the most accurate Language and Encoding Identifier that is currently trained for 54 language-encoding pairs, including most of the major Indian languages. (Yes, I know there is a number agreement problem in the previous sentence).
  • A user friendly Syntactic Annotation Interface that is perhaps the most heavily used part of Sanchay till now. Hopefully there will be an even more user friendly version soon.
  • A Parallel Corpus Annotation Interface, which is another heavily used component. (Don’t take that ‘heavily’ too seriously).
  • An N-gram Language Modeling Tool that allows you to compile models in terms of bytes, letters and words.
  • A Discourse Annotation Interface that is yet to be actually used.
  • A more intelligent File Splitter.
  • An Automatic Annotation tool for POS (Part Of Speech) tagging, chunking and Named Entity Recognition. The first two should work reasonably well, but the last one may not be that useful for practical purposes. This is a CRF (Conditional Random Fields) based tool and it has been trained for Hindi for these three purposes. If you have annotated data, you can use it to train your own taggers and chunkers.
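For readers wondering how byte-, letter- and word-level n-gram models relate to language and encoding identification, here is a minimal sketch in Python. It is not Sanchay’s code (Sanchay itself runs on Java); it is a common textbook approach that builds byte n-gram frequency profiles and picks the training profile with the highest cosine similarity. The function names, labels and toy training strings are all hypothetical:

```python
import math
from collections import Counter


def tokens(data: bytes, unit: str) -> list:
    """Split raw data into bytes, letters or whitespace-separated words."""
    if unit == "byte":
        return list(data)  # integers 0..255, no decoding needed
    text = data.decode("utf-8", errors="replace")  # assumption: UTF-8 text
    if unit == "letter":
        return list(text)  # Unicode code points
    if unit == "word":
        return text.split()
    raise ValueError(f"unknown unit: {unit!r}")


def ngram_counts(seq: list, n: int) -> Counter:
    """Count contiguous n-grams (as tuples) in a token sequence."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def profile(data: bytes, unit: str = "byte", max_n: int = 3) -> Counter:
    """Merge the 1-gram to max_n-gram counts into one frequency profile."""
    seq = tokens(data, unit)
    merged: Counter = Counter()
    for n in range(1, max_n + 1):
        merged.update(ngram_counts(seq, n))
    return merged


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(sample: bytes, profiles: dict) -> str:
    """Return the label whose training profile is most similar to the sample."""
    sample_profile = profile(sample)
    return max(profiles, key=lambda label: cosine(sample_profile, profiles[label]))


# Toy training data (hypothetical labels). Byte n-grams make the same Hindi
# text produce two distinct profiles in two different encodings.
HINDI = "भाषा का वैज्ञानिक अध्ययन"
TRAINING = {
    "hindi/utf-8": HINDI.encode("utf-8"),
    "hindi/utf-16-le": HINDI.encode("utf-16-le"),
    "english/ascii": b"the scientific study of language",
}
PROFILES = {label: profile(data) for label, data in TRAINING.items()}

if __name__ == "__main__":
    print(identify("भाषा विज्ञान".encode("utf-16-le"), PROFILES))  # hindi/utf-16-le
    print(ngram_counts(tokens(b"to be or not to be", "word"), 2).most_common(1))
```

Byte n-grams are used for identification because the encoding is unknown before identification; letter and word n-grams only make sense once the bytes have been decoded. Practical identifiers are trained on far larger samples and usually prune each profile to its most frequent n-grams; this toy version only shows the idea.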

All these components use the customizable language-encoding support, especially useful for South Asian languages, that doesn’t need any support from the operating system or even the installation of any fonts, although these can still be used inside Sanchay if they are there.

More information is available at the Sanchay Home.

The capitals don’t look so bad for a released version.

The downside of even this good news is that my other urgent (to me) work has got delayed as I was working almost exclusively on bringing out this version for the last two weeks or so.

But then you need a reason to wake up, and Sanchay is one of my reasons. And I can proudly say that a half-hearted attempt to generate funding for this project by posting it on Micropledge has generated $0.

Sanchay is still alive as a single parent child without any welfare but with a lot of responsibilities.

Now I can have nightmares about the bugs.

September 16, 2008

Dried Access Denied after Dinner

Access Denied to WordPress

September 2, 2008

Fried Access Denied at Breakfast

Access Denied to IIIT
Access Denied to LTRC
Access Denied to The Hindu
