[Disclaimer: This is not a scientific article. It is based on partly objective and partly subjective, but in any case sincere, analysis of the author’s knowledge of and experience in the world of research. No empirical evidence is presented as, in the author’s belief, enough empirical evidence can be presented about this topic to prove whatever you want. This is just a request to look at research honestly and sincerely without self-deception and pretensions.]
There is a very old and much discussed question which has been bothering me for a long time. As in many other cases, I have so far avoided writing about it because:
- I didn’t want to repeat things which have already been said.
- To say something new on this topic requires a lot of leisure, which I don’t have.
- The problem with saying something new about this is the topic itself.
- What is original and what is not?
- What is innovation and what is not?
- What is creativity and what is not?
- Is there anything in this world which is really original?
But, again like many other things, I have been provoked enough to write this post. I will try to do my best. As much as can be done in a single blog post.
What is the provocation? The provocation is the intensely irritating pretension of originality from ‘researchers’ who have happened to review my papers or those of others. They write as if every paper selected at every conference, journal and workshop is a completely original work. This, frankly, has started to get on my nerves. Because I know very well that this is simply not true.
The truth is not that every paper selected in every conference, journal and workshop is worthless or mere repetition of old things. The truth, as usual, lies somewhere between these two extremes.
However, I am quite sure that it lies much nearer to the second extreme than to the first. Even for the top ‘first class’ conferences and journals.
To quote from the article How to do Research At the MIT AI Lab (1988), edited by David Chapman:
At some point you’ll start going to scientific conferences. When you do, you will discover the fact that almost all the papers presented at any conference are boring or silly. (There are interesting reasons for this that aren’t relevant here.)
I will go on to say that most of them have hardly any originality (that’s partly why they are boring). If you have sufficient resources, you can almost follow a recipe to write a paper which will get selected at a conference, workshop or journal. And this is exactly what is done. And it works too. One of the reasons is that it is easier this way for the reviewers. They don’t have to think hard about the originality of the paper. Because, of course, it is very hard to decide whether something shrewdly written and well presented is original or not. Quite often there may not be a clear-cut answer at all.
One of the essential elements of the most popular recipe is to work on problems which are currently in fashion, do some experiments, any experiments, on that problem and present the results. If you practice enough, it can hardly go wrong. That’s how a great number of papers get published. No originality needed. Just be fast enough to do the experiments (which someone else would anyway have done in the near future) and write a paper. It’s somewhat like buying stock. Beat others by being the first to buy the stock as soon as it comes out. You just have to know how to fill up the form and complete the transactions. This applies even more to top conferences than to workshops.
If you think I am talking nonsense, I would request that you refresh your Chomsky (in case you are a linguist) or refresh your Jurafsky-Martin (in case you are, as the term goes, an NLPer or a computational linguist).
If you do the above carefully, you will find that almost all the elements of Chomskian Linguistics can be traced back to some linguist, writer, philosopher or thinker of the past. (By the way, this applies to the ‘Theory of Evolution’ too). Similarly, you will find in Jurafsky-Martin that almost every discovery has been made by more than one scientist or thinker, including this one.
And if you go back to the top conference and journal papers, you will again find that most of the papers don’t have anything really new to say.
So do I mean that all research is nonsense and useless? Certainly not. Why would I be in research if that was so? What’s the catch? The catch is that the emphasis on originality is highly misplaced.
What I am saying certainly doesn’t imply that there is nothing ‘original’ in Chomskian Linguistics. But it does probably mean that we are looking for originality in the wrong place. I hope some day I will be able to say this with more clarity and precision.
But we would definitely be much better off if we dropped the mythical pretensions of the originality of every published paper. Originality is just one of the goals of research. Most research is routine research. Incremental research. That doesn’t make it useless. Really original papers can be expected only once in a long while. The rest should be seen as attempts to advance the state of the art marginally. Without much originality. Most research is plain hard work. Rigorous work. Results of experiments which by themselves do not really matter much, but a small fraction of them could, just could, provide some insight for someone else to come up with something which is ‘original’. This (at best) is the purpose which more than 99 percent of the published papers serve and we had better realize this instead of indulging in rampant self-deception about originality.
Coming to NLP and CL or even Linguistics, it is even more important to realize and accept the above-mentioned fact. The reason is that research in these disciplines depends to a great extent on the creation of resources (language resources as well as tools) which may not be very ‘original’ in nature as the word is usually understood. A lot of papers should and do report just the development of these resources and they are published. The trouble is that everyone is forced to create a false facade of originality and creativity which is not really there. You have to falsely claim the worth of your papers in terms of originality and ‘novelty’ when actually the worth is just in plain hard work. But if you don’t put up that facade, you are out.
Have you considered the fact that a lot of the Great Discoveries were accidental discoveries? Was there so much originality in those discoveries? I don’t know. It may sound clichéd, but it does depend on how you define originality. Perhaps the better way is to put less emphasis on (true or false or anything in between) originality and more on usefulness. At least in disciplines like NLP and CL where, if you ask most researchers, they won’t even be able to give a coherent answer about what exactly they are trying to achieve through their research. And where we don’t even know for sure whether there is anything really scientific to be achieved. Even after the great linguistic revolution, we hardly know anything about language that can be termed as scientific as the laws of Physics or the theorems of Mathematics. At most we can say that we are trying to build machines which can give better practical results. We need a LOT of hard work and only a little bit of originality. And this originality, as in other disciplines, is hard to come by.
I, for one, am not going to insist on a facade of ‘originality’ for the description of the hard work to be accepted for publication. Of course, there should not be verbatim repetition, but I don’t have any illusions about the originality of papers published anywhere. Further, I am going to prefer papers describing intelligent hard work over almost worthless but seemingly innovative cooked-to-recipe papers.
Maybe this is an empty declaration because I may not get to be in a position to insist or not to insist, but I can still make the statement at least.
It is my informal personal blog after all. I can afford to be as honest and direct here as I want.
That doesn’t mean I am not aware of the possible consequences.