May 22, 2009

How Many Grams?

There is an automatically (intelligently) generated blog which I have read recently.

It appears to be (let’s give ‘seems’ some rest) quite a popular one in a certain section.

I know the corpus on which it was trained.

And the corpus on which it was retrained.

(Including most of the quotes and the comments, especially the long ones).

But I wonder whether the order of n-grams was five or six.

It is definitely better than four grams.

It could even be Se7en.

This brings up a new idea.

What about writing a paper on automatically guessing the order of n-grams, given some generated text?

It may be difficult in the general case, but in our case we know the corpus on which it was trained.
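A naive starting point, offered only as a sketch and not as the method the blog's generator actually used: assume the generator is a plain, unsmoothed n-gram model over whitespace-separated tokens. Every n-gram in its output must then also occur in the training corpus, while longer grams need not. So one can measure, for each k, what fraction of the generated k-grams are attested in the corpus, and take the largest k that is still (almost) fully covered. The function names, the `max_k` limit and the `threshold` value below are all my own assumptions.

```python
from typing import Dict, List, Tuple


def ngrams(tokens: List[str], k: int) -> List[Tuple[str, ...]]:
    """Return all contiguous k-grams of a token list."""
    return [tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]


def coverage_by_order(corpus: List[str], generated: List[str],
                      max_k: int = 8) -> Dict[int, float]:
    """For each k, the fraction of generated k-grams that occur in the corpus."""
    coverage: Dict[int, float] = {}
    for k in range(1, max_k + 1):
        grams = ngrams(generated, k)
        if not grams:  # generated text is shorter than k tokens
            break
        seen = set(ngrams(corpus, k))
        coverage[k] = sum(g in seen for g in grams) / len(grams)
    return coverage


def guess_order(corpus: List[str], generated: List[str],
                max_k: int = 8, threshold: float = 0.99) -> int:
    """Largest k such that coverage stays at or above the threshold up to k."""
    best = 0
    for k, cov in sorted(coverage_by_order(corpus, generated, max_k).items()):
        if cov < threshold:
            break
        best = k
    return best


# Toy data, invented purely for illustration.
corpus = "the cat sat on the mat and the dog sat on the rug".split()
generated = "the cat sat on the rug".split()
print(guess_order(corpus, generated))  # 4
```

With a small corpus, or a generator that uses smoothing or back-off, the coverage will not drop off this cleanly, so the result is better read as a rough bound on the order than as an answer.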

Any takers?

December 9, 2008

Talking about The Invaders

I am down with (relatively high) fever after a long time. This blog (before this one) had 99 posts. It seems nice to have the 100th. Round figures. The Decimal System of Indian Origin. A milestone. You get the picture. The number. The destination.

Or maybe you don’t. What can I do about that?

I am still not sure why Catch 22. Or why Room 101 for that matter.

But I don’t feel like writing a post. So what I will do is, I will reproduce (with some proofreading of my comments) a post by someone else to which I had made many comments. Why do I reproduce? Can’t I simply provide a link to it (I already have)? Well, the reason is that I had a long exchange of comments on the same blog earlier on a matter that seemed important to me. But the post as well as the comments are now gone from that blog.

So, just in case something like that happens again, the exchange can be available here.

The Invaders

By Arfi

They met deep in the jungle almost every other weekend.

They were a motley group of men and women of all ages and professions who had found each other over the Internet. And over time, through discussions in forums or by way of certain books they had all been drawn to the movement. A movement that promised to restore to them, what they believed, had once rightfully been theirs. They met in the small forested area that lay on the outskirts of the city – away from prying eyes and curious onlookers.

That particular morning, at the edge of the forest, about 15 of them had turned up. They had all been sent e-mails in advance intimating them about the time and place of the meet. For some in the group this was their very first meeting and it showed in the nervous twitches that afflicted their fingers. Looking at them, you would not be wrong if you concluded that they seemed overtly secretive about these gatherings. The leader of the group – a gaunt bearded man somewhere in his fifties and clad in old jeans and a khadi kurta – had the air of the old revolutionary about him. He carefully scanned their faces, perhaps looking for signs that could tell him which of them would make good foot-soldiers for the movement. But even he seemed jumpy and constantly looked over his shoulders, as if he couldn’t wait to get out of the open and into the woods.

After waiting a few more minutes for any stragglers that might still show up, he signalled them and they all filed silently behind him. The group started moving into the forest. He had asked them to walk in silence and make as little noise as possible. The morning mist hung in the canopy of trees and the whole atmosphere oozed, of mystery – if not of revolution, as yet. Bird calls could be heard, and now and then, the sharp sound of a dry twig snapping under a pair of purposeful feet would pierce the morning air.

After a fifteen-minute trek, they reached a natural clearing in the forest encircled by trees on all sides. The filtered morning light that fell into this clearing had a strange ethereal quality to it. The more spiritually inclined amongst them took it as a sign that their cause was just. In the middle of the clearing, the remains of a dead fire could be seen. The leader, deep in thought and running his hand through his beard, circled it a few times and poked at the ashes with a twig. He then looked around and suspiciously sniffed at the winter air. The others looked at each other in turn. A mixture of fear and excitement played on their faces.

The leader motioned them to form a circle around him. He then pulled out a sheaf of papers from the jhola he was carrying and was engrossed in them for a few minutes. They all waited in silence, nervously shifting on their feet. The leader then stepped onto the small mound of charred wood and ash which had inadvertently become the centre of this human circle, and though hardly a few inches overground, now acted as his pedestal.

He waved the sheets of paper in his hand and addressed them in an impassioned voice.

“Do you know what this tells me? It tells me we have been invaded. If you read this, you would realise the level of threat we are under. And I am not talking about something that can be left for the government to deal with. They would never acknowledge this and they have already branded us as troublemakers anyway. We need affirmative action and we need it now because what I am talking about is nothing less than the threat of extinction. Extinction from our land. The invasion of our country. And it is time that those of us who understand this, step up and deal with it. Let me read out to you.”

He then read out the summary of the report to them. Having finished, he put the papers back into his jhola and picked up the twig instead. He started waving it around like a conductor, to the ebb and flow of his own rage and continued.

“They came to this country in waves. You could say they were even brought here by our own people in some cases and now look all around you. They have taken over this land, have pushed back the natives. These aliens, aggressive by nature and forever sucking the earth dry, have spread and multiplied right under our noses and what have we ever done about it. Nothing. They are vicious and cunning, quick to adapt and blend in, but do not be fooled because with every passing moment they are forcing the natives out. They breed – if I can even call it breeding – like rats and change the entire balance of the place they show up in, in a few years. They have polluted our environment and now even threaten our backyards. But it’s still not too late. Because now we have awakened. Now we know. And now is the time that we push them out and reclaim and replant what is ours. Trees like Acacia farnesiana and Acacia mearnsii have no place in our ecosystem. We must correct the past mistakes or these alien species; not only of trees, but herbs and shrubs too, would irreversibly change the climate and environment of our land. We must at once begin the process of eco-restoration. We must secure this land for our children and for our future generations.”

The leader stepped down from the mound of ash to a round of applause. The gathering then broke into smaller groups and started studying the flora around them.

Comment by Banno:

The language of inclusion and exclusion remains the same whatever one is talking about, isn’t it? Liked it much.

Comment by Arfi:

Yes, strange but true. Was reading an obscure report on this and later something about Krishnen and it was the language that struck me – the way it was used.

Glad you like it.

Comment by me:

Do you actually realize what you are talking about?

You are in serious danger of becoming something like a Madhur Bhandarkar.

Comment by Arfi:

Hmm.. Madhur Bhandarkar – I hope not. Though in serious danger does sound almost irrevocable.

If you have read the labels with the post you would have noticed that I have labeled it as a caricature.

The point I wanted to make was about the use of language – which is so malleable that it can lend itself to any ideology, even if they stand at opposite ends. The entirely exaggerated narrative, at least to me, clearly reads as such.

Comment by me:

I saw the caricature label, but I would still say that what you have written translates simply as this:

‘Left is equal to right and both are equally bad. Therefore centre is the best.’

And what is not stated but is usually the de facto meaning in such cases is that whatever is the status quo is the centre. Therefore whatever is, let it be, because that’s the best you can get.

This is the fashionable view in these days of clearly visible across-the-spectrum right-shift. In fact, this view (intentionally or unintentionally) serves to mask the shift.

The problem is that you can only write as well as you can read and, to be a bit harsh again, you don’t seem to read so well. But you are not alone in this. People who are really good at reading are much rarer than is usually assumed. Most people (and here I only talk about the intellectual type) are bad readers.

This is criticism. But it can be taken as advice because reading skills can be improved. And I am sure you anyway didn’t expect a false pat on the back from me.

It would be a sad thing if, in spite of your writing skills, your writing doesn’t go where you wanted it to go because you can’t clearly see where you are going.

Comment by Arfi:

I welcome criticism, even more so coming from you. It helps unravel the thinking process – possibly at both ends.

The way one reads anything, as you correctly point out, reflects in our writing. And when we approach a text we bring to it our own world-view and politics which act as a sort of filtering mechanism or a highlighter – depending on whether you are trying to avoid or enforce certain beliefs – so one ends up glossing over some things and re-enforcing others. But this too is an evolving process, as we know from reading good literature – as to how it reads differently and leaves you with more each time you revisit it. This tells me that all hope is not yet lost and I might still become a good reader.

Now coming to the post itself, I don’t know what exactly disappoints you. Is it that it does not take any stand – as I see it; or that it advocates maintaining a status quo – as you seem to have read it? It surely cannot be that I invoked Krishnen’s name :) (nothing and no one should be sacred, right?)

Now why I wrote it the way I did was because of certain things coming together. I had gone on a nature walk in Uttaranchal with some local people, who are doing some really wonderful work related to eco-restoration and self-management of forested areas, and the politics of that movement would (and has) greatly stretched the right-centre-left spectrum that you have talked about. It’s quite obvious to which end and to whose discomfort.

But again like I said earlier what I found deeply ironical was the use of language when I was talking about some of those issues with them. It made me smile not in a derisive way but the way we smile when we realise, that strange though it is, the joke somehow is upon us. And that’s where this post comes from.

I cannot go ahead and declare – even though I would like to – that this here is my political stand; simply because I don’t have a one-word label to express it. The labeling of views as centrist, rightist and left-leaning doesn’t help because even the connotations of these labels change depending on the platform and the issues under discussion. Yes, right is centre now and forever pushing across, and yet the left doesn’t move away? Old story.

But in the end, the fault perhaps lies in the post itself if it translates, for you, to the one-line falsehood of ‘Left is equal to right and both are equally bad. Therefore centre is the best.’

So I guess, it’s time for me to start reading in earnest, though even then I suspect that it would be difficult to know for sure, as to where everything is headed. :)

Good to have you here after a long gap.


Comment by me:

I knew what you were trying to say and also the fact that you were interested in the language (so am I).

The process of writing indeed evolves. But the problem is that once you write something, there is unconscious pressure on you (from yourself, your ego etc., if not from others) to then defend and stand by what you have written. This can come in the way of evolution, especially when your writing gets ahead of your reading, as I think is happening in your case.

I am glad that you are prepared to consider my suggestion. Actually, for people who restrict themselves to very narrow domains, this is less of a problem, but for people like you and me who want to write about almost everything, there is a serious risk of getting trapped in a net of our own making. (To digress, that is what seems to have happened with Ram Guha, among others). That’s why it’s very important to be a good reader so that you can read your own writing and decide whether it is expressing just what you wanted to say.

About the language, it is important to note that you can’t really look at such language of politics in isolation and ‘impartially’. Even if you explicitly don’t side with anyone, you are actually siding with the currently dominant party and, in a way, you are supporting the status quo. That the ‘language of inclusion or exclusion’ remains the same doesn’t change the fact that inclusion and exclusion can be very real. Therefore, the use of the same language can be valid in some cases and completely invalid in some other cases. To complicate this, there is the fact that there may be gray areas and partially valid cases or even cases where more than one party has valid grievances with respect to inclusion or exclusion. Treating the language in isolation and supposedly impartially is thus a very political statement itself (whether you intend it to be or not).

But anyway, since you got my meaning, I hope I will have less (or no) reason for complaint in future.

And, no, Pradip Krishnen is not the issue. I am not even sure which Pradip Krishnen you mean. Perhaps you mean Pradip Krishen the movie maker and of the Trees of Delhi fame. I don’t know much about him. And I don’t think we should treat him or anyone else as too sacred to be criticized.

My main concern is that you have potential for good writing, so you should be writing in a way to realize that potential. You know that I don’t comment too often or at too many places.

Comment by Arfi:

I do concede the point that the use of language does not stand in isolation. In fact, a writer steps into a virtual minefield, especially in the realm of fiction, when he dares to venture beyond the traditional fault-lines. He goes there because those spaces – the gray areas – need to be addressed, but at the same time, also require an extremely nuanced handling.

What also interests me is the unraveling and composition of layers, and the ambiguity that a well written text offers; where the reader shapes the meaning which entirely depends on what he brings to it. His interpretation says a lot – both about himself and the writer – and this ambiguity is quite difficult to achieve.

Guha’s is an interesting case. He is currently being heckled down by both sides. It would be amusing to see how it all unfolds.

Yes, I meant Pradip Krishen, not Krishnen. And I do realize that re-reading and re-drafting one’s work is almost a never ending process.

Comment by me:

>> “the ambiguity that a well written text offers; where the reader shapes the meaning which entirely depends on what he brings to it. His interpretation says a lot – both about himself and the writer – and this ambiguity is quite difficult to achieve.”

Is it true?

Partly true, but the meaning can’t completely depend on the reader, can it? And yes, the interpretation says a lot about the reader as much as the writer. That’s part of the reason why I talked about a good reader. The writer is, in fact, the first reader.

Also, what the interpretation can say about the reader includes the fact that the reader correctly understood the meaning. Or one or more of the meanings. After all there are people who know more and who understand more and there are those who know less and understand less, even if there is no objective way of finding out who is which in what case. But over a period of time, once you know someone well enough you might be able to decide whether to rely on someone’s judgment or not. We all rely more on the judgment of some people and less on others’.

About Ram Guha’s article, what he writes there is almost word for word what I used to secretly (as I had no one to actually say things to and I didn’t, of course, have a blog then) argue with the ‘left intellectuals’ about 10-15 years ago (perhaps influenced by the writings of people like Ram Guha who are given very generous space in the mainstream media and who, by the way, don’t talk nonsense most of the time: they are good enough writers). For example, I would (silently) say at that time that it is wrong to call the BJP or Shiv Sena etc. fascists. And I would give the same reasons as he has given in this article. It may be new to you, but it’s pretty stale stuff for me (I can’t help it if it sounds arrogant).

Now I know better. BJP may not be technically a classical fascist organization, but it is definitely a part of a network which has very strong fascistic tendencies. What we are seeing right now is corrupt fascism in somewhat slow motion. Whether it is better or worse than classical pure fascism is a matter of debate.

As for the oft-repeated diatribe by Ram Guha against the communist faction headed by Ranadive, how many people today know that the Nehru government had carried out systematic atrocities in suppressing these communists who believed that the independence that we had got was fake? In one of his articles in the Hindu as well as in a long article in the Outlook, Ram Guha ridiculed Ranadive for saying roughly ‘yeh aazaadi jhooti hai’. But was he the only lunatic extremist to say that? Do you remember the most famous poem by Faiz? And all this is very well documented and portrayed in the post-independence Indian literature (in Indian languages, perhaps that’s why the need to keep the literature in these languages down), though not much known to the general public. Just to give one example, Manohar Shyam Joshi, the writer of Hum Log and Buniyad etc., who was also a great writer in the true literary sense, wrote one novel which describes this in quite some detail, as an allegory of modern India.

I like to read articles by Ram Guha, but be sure that I know perfectly well where he stands. At the very centre of centre (as the author of that article about Bhimsen Joshi said). He has no problem in saying that the left exactly equals the right. The funny thing is that he seems to be claiming that he is a leftist. And many people do think he is a leftist.

And, as I said earlier, the centre is shifting to the right. Hardly an original observation.

But I still like his articles most of the time. He is not very boring and he does give you a lot of background information about certain things and I want to read about everything. At least so far he doesn’t support the far right.

Comment by me:

And as for saying that Ram Guha is being ‘heckled down’, I don’t think you need to worry about him. He is a very privileged and respected person right in the middle of the mainstream.

It was he who had started the attack against Arundhati Roy, not vice-versa. I just hope that it was a misguided venture, not something deeper.

I don’t have much patience for card-carrying communists, what with their rigid ideology, but I do know that, on the whole, they fare better than most of the others.

Comment by Arfi:

>> “Is it true?

Partly true, but the meaning can’t completely depend on the reader, can it?”

Yes, I think it is true and something I am interested in exploring further. Of course I don’t claim that an entire text (any piece of fiction) can be that ambiguous. But, for example, the use of pronouns or initials (like Roberto Bolano’s B.) instead of a name in a third person narrative might go some way towards achieving that ambiguity, if one consciously leaves open the narrative by not establishing the background or cultural influences of a particular character. There would still be other clues for the reader but what would be interesting is how he ‘fleshes out’ the character based on his own views when he reads the text.

Like I said, difficult but something worth experimenting with.

As for Guha’s article, I find his entire logic convoluted. First he applies certain ‘tests of fascism’ to the BJP, to let it off on a technicality and then later advises caution when borrowing terms generated from a different historical context – the very terms that he himself used to argue otherwise. Does he not realize that he cannot have it both ways?

I, for sure, am not going to worry about him anytime soon.

Comment by J.:

Today we were reading Derrida in class. Last couple of weeks Foucault. This in-depth discussion is very funny in this light. Funny in the sense that any talk of meaning is, post poststructuralist deconstruction.

Don’t read Derrida if you’ve managed to avoid him in your (lack of) reading so far. He may put you off reading forever.

Tongue firmly in cheek,

Your ardent fan,



Comment by me:

So much for Ram Guha. There is something very ugly about discussing individuals. The only time it can be necessary is with respect to their public, professional or political stances, which is what I hopefully focused on. As an individual, I am sure he is a great guy.

I have read tid-bits of Derrida and am familiar with his general ideas, but I most surely don’t apply his ideas because I wouldn’t know how to (TFIC).

For me, reading well is very much like appreciating music or appreciating cinema. It’s a mix of nature and nurture. The latter can often compensate for the former to a great extent. And if there is one thing I am very confident of, that is to differentiate good writing from bad writing, and good music from bad music and good cinema from bad cinema etc. So, though I can’t explain exactly why I think Madhur Bhandarkar is a classic pseudo, I am sure he is by watching several of his movies. Similarly, I know who to rely on more if I am in doubt. For example, I would rely on Orwell much more than I would rely on, say, Dan Brown. And I have been proved right innumerable times (sometimes wrong also, as No-One-Is-Perfect).

About your pronoun example, of course, that is true. You must be knowing that I know that much, don’t you? What I said was about the text as a whole, with the help of ‘clues’ in the text.

So I don’t have any objective arguments in support of my evaluation of your article, but you can either rely on me or not, depending on whether you place me nearer (in terms of my examples) to Orwell or to Dan Brown.

It has turned out to be an interesting discussion. I don’t even mind it being funny.

Comment by Arfi:


I have not read any Derrida except what surfaced in his obituary. (To be honest even that proved too dense for me.)

And I really have no idea what is meant by post post-structuralist deconstruction (you lost me after post-structuralism) but it does sound funny. ;)

But to be serious, what I am worried about is becoming overly conscious when writing if I venture too deep into literary theory. There is a long way to go and I am not even sure if I really want to or can go there.


Yes, I am sure you know about the use of pronouns and initials in a narrative. I was only trying to further elaborate on the point I made earlier.

I rely on your judgement and look forward to further criticism. Indeed it has been an interesting discussion.

Comment by me:

To end on a lighter note, here are two excerpts from the book I mentioned:

(Caution: Hindi text ahead).

A kind of prologue

A popular hilarious passage

His writings, in general, are also very interesting from the language (if not linguistic) point of view.

By the way, I have left out one comment by someone because it was completely unrelated to my comments.

July 29, 2008

V for Vodot’s Vendetta

Vodot hasn’t arrived, but I seem to have got (t)his vindictive message:

[Image: V for Vodot’s Vendetta]

But I thought Vodot was being held by someone.

Is this some kind of self-sponsored hijacking? Or is the world not so bad (or not so good) and someone is telling me that he (or they) has (or have) hijacked Vodot and I will be getting some message for ransom?

But why should I pay ransom for Vodot?

My shoes don’t have strings.

Waiting for Vodot

Instead of Access Denied, I often get these:

[Image: Waiting for Vodot]

Vodot doesn’t arrive, of course. He is held up somewhere by someone.

(The hourglass cursor is not visible in the image).

January 20, 2008

On Blind Reviewing

This is something about which I have wanted to write for a long time. Since, like many other things about which I want to write, it is quite an important matter, I didn’t want to write in a hurry. Which meant that I had to wait for a time when I could write at enough leisure to be able to write at enough length with enough time for making it rigorous enough. Now, since it is very difficult (for me at least) to get enough of all these, this effectively meant that writing about this topic was postponed indefinitely.

But I don’t want this to be postponed indefinitely. I want to write about this now. So, I would just write and try to be as rigorous as it is possible to be in a blog post written in one or two short sittings. This applies to many other posts, whether written already or to be written in future. You can take it as an apology or you can take it as a disclaimer.

What is the problem? Well, the problem, or rather the question, is whether what is called ‘blind reviewing’ is a good thing or not. And, of course, this is in the context of peer reviewing of scientific (or claimed to be scientific) research papers or articles for the purpose of selection for inclusion in the proceedings of a conference or workshop or for inclusion in a journal.

Excuse the legal sounding language.

First of all, let me list all the reasons in favour (‘favor’ for the dominant party) of the so-called ‘blind reviewing’ process, so that no one can jump and dismiss the whole affair as trivialization by saying you don’t know what you are talking about:

  1. Human beings can be biased. So, if a reviewer knows that a research paper is written by a person she doesn’t like or has strong disagreement with, she can get biased against the paper and will not be able to review the paper fairly.
  2. Apart from the above kind of biases, there can be the bias in terms of the weights associated with the names of the authors, their institutions, their countries, their group, even their academic background. Most of the people who have been working in NLP/CL[1] for some time know about the linguistics vs. statistics or machine learning bias. This kind of bias increases the chance of your paper being rejected or accepted depending on whether you seem to be in favour (or favor) of a linguistics heavy approach to NLP/CL or of a statistics (or machine learning) heavy approach. There are variants of this bias in other fields too. For the closest example, we can consider Linguistics. Where your paper is perceived to be situated along the Chomskyan or Empiricist or Cognitive or Computational axes with respect to the chosen position of the reviewer can have a large impact on the decision about your paper, irrespective of what else your paper says. And the chances of such a perception can be increased if the identities are known.
  3. Human beings can be unduly confrontational and they can also be unduly wary of confrontation. So, if the identity of the reviewer is not withheld, the author(s) may be offended by the reviewer and they may also become confrontational and carry on this confrontation with the reviewer, thus making the process of reviewing difficult and something which a lot of people would like to run away from. Also, the reviewer may avoid making adverse comments, especially if the reviewer doesn’t want to offend the author(s).
  4. If the author(s) don’t know who the reviewer is and vice versa, the whole reviewing process may be more fair for the above specified reasons and because of the general association between anonymity and fairness. If you don’t know who is criticizing and the person criticizing also doesn’t know who is being criticized, then you can expect more fairness.
  5. If the Program Committee (PC) chair(s) also don’t know who the authors are and who the reviewers are, then they can assign equal weight to all the reviews for making the final decision about a paper.
  6. If the author(s) don’t know who the reviewer is, then they won’t have any reason to attribute bias or prejudice to the comments made or ratings given by the reviewer.
  7. Peer reviewing of research papers, like the administration of justice, should not just be fair, but seen to be fair. And this can only happen with blind reviewing.
  8. Blind reviewing, through the use of the device of anonymity, gives a true meaning to the idea of ‘peer reviewing’, because if the identities are not known, all the people involved can be treated as peers, even if some of them are senior-most pioneering researchers or Directors of first class institutions in first world countries, while some others are graduate students in second class institutions in third world countries.
  9. If the identities are not known, both the reviewer and the author can focus on the content of the paper and the review, respectively.
  10. Finally, there is the very practical reason that blind reviewing provides a reasonably fair mechanism to ensure the selection of the best research papers such that everyone can be more or less satisfied with the outcome and no one will have valid reasons to complain.

I think the above list makes as strong a case for blind reviewing as can be made. I mean in a blog post, not in a book.

Now, in the next post (that means in some future post) I will discuss what is or can be wrong with blind reviewing and will try to draw some conclusions. You must have guessed that the reason I am writing all this is that I am not sure whether blind reviewing is the best thing possible. But by writing all this, I am also trying to get things straight in my own mind.

[1]: With apologies to Martin Kay and others, I am using NLP and CL as interchangeable terms because I think my arguments in this matter are not affected by the distinction between the two, a distinction which may be important in many but not all contexts (i.e., in my opinion).

January 3, 2008

