IILC_jan03

Language and Inference Technology Group analyses Dutch election campaign.

Ivar Vermeulen, Jaap Kamps and Michael Masuch are interviewed by Reinder Rustema about a pilot project that has already served to analyse political discussion on the internet for the general public.

http://www.science.uva.nl/~kamps/stemming/

What do you do?

Jaap: “We are working on a project to infer views and opinions from texts.”

How does it work?

Jaap: “We are using a database called ‘WordNet’, which is basically a dictionary with about 5000 words. Unlike an ordinary dictionary, it is not sorted alphabetically. Because it is on a computer, you are free to sort it in all kinds of ways. WordNet records the relations between the concepts that words represent. These can be vertical relations, from specific concepts to more general ones or the other way round, or cross-relations such as synonyms and antonyms. We focus on the shortest route between words, which on its own can already say a lot about the texts. Adjectives have very few ‘vertical’ relations, while nouns often have many. Take, for example, the category of words related to ‘car’: from the tiniest screw in the car to the roads cars drive on, there are vertical relations. For adjectives like good or bad there is not really a category, but there are many synonyms. Words with no relation to each other have value zero.”
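To make the ‘shortest route’ idea concrete, here is a minimal sketch in Python. It assumes a small, hypothetical synonym graph in place of WordNet (the SYNONYMS table and all function names are invented for illustration) and finds the shortest route between two words with breadth-first search. The orientation score, which compares a word's distance to ‘good’ with its distance to ‘bad’, is an assumption about how such distances could become a sentiment value, not a statement of the group's exact formula. As in the interview, a word with no route to either reference word gets value zero.

```python
from collections import deque

# Hypothetical toy synonym graph (undirected); a real system would load WordNet's adjectives.
SYNONYMS = {
    "good": ["nice", "fine"],
    "nice": ["good", "pleasant"],
    "pleasant": ["nice"],
    "fine": ["good", "decent"],
    "decent": ["fine", "mediocre"],
    "mediocre": ["decent", "poor"],
    "poor": ["mediocre", "bad"],
    "bad": ["poor", "awful"],
    "awful": ["bad"],
    "blue": [],  # a word with no synonym links at all
}


def shortest_path_length(graph, start, goal):
    """Breadth-first search: number of synonym links on the shortest route, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        word, dist = queue.popleft()
        for neighbour in graph.get(word, []):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def orientation(graph, word, pos="good", neg="bad"):
    """Assumed score in [-1, 1]: positive if closer to `pos` than to `neg`, 0 if unrelated to either."""
    d_pos = shortest_path_length(graph, word, pos)
    d_neg = shortest_path_length(graph, word, neg)
    d_ref = shortest_path_length(graph, pos, neg)
    if d_pos is None or d_neg is None or not d_ref:
        return 0.0  # no route: the word carries no sentiment value
    return (d_neg - d_pos) / d_ref


print(orientation(SYNONYMS, "decent"))  # prints 0.2
```

Dividing by the distance between ‘good’ and ‘bad’ keeps every score between -1 and 1, since no word can be further from ‘bad’ than its own distance to ‘good’ plus the good-to-bad distance.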

How did you get there?

Ivar: "With Michael I was working on a program to analyse stocks. Stock prices tend to behave erratically, and the program failed to predict them. If a stock was very low, for example, we knew why, because we had read news articles about the company that the program knew nothing about. We wanted to feed these texts to the program so it could take that information into account when calculating ‘the mood’ for, in this case, stocks. At first we just looked for adjectives with a simple kind of thesaurus, a sort of amateur WordNet. Then Jaap suggested using WordNet, which allows much more complex relations.”

What do you do with it now?

Jaap: “In a pilot project we are now using WordNet to look for sentiments in texts. We first analysed English political discussions around the UK parliamentary elections in 2001. Later, after the elections in the Netherlands, we translated the words into Dutch and did the same with Dutch political discussion on the internet. There are only 5000 words, ranked from positive to negative sentiment, so odd translations are very easy to spot. We have had it running on internet discussions about politics since last August. Then our cabinet fell, and suddenly you could see very interesting and credible patterns in the sentiment and the intensity.

This little project we had running on the side is now growing bigger, and a lot of interesting things are happening. Rob Mokken has been a driving force throughout the project for us, and luckily Maarten de Rijke has always been very encouraging; otherwise this would not have come this far."
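The interview does not say how sentiment and intensity are measured over time. The following Python sketch shows one assumed reading: each day's sentiment is the mean score of the lexicon words found in that day's postings, and intensity is simply how many lexicon words were found. The LEXICON scores, the POSTINGS, their dates and the function name are all invented for illustration; a real run would use the scored word list and harvested discussion texts.

```python
import re
from collections import defaultdict
from datetime import date

# Hypothetical lexicon: word -> sentiment score in [-1, 1].
LEXICON = {"good": 1.0, "great": 0.8, "decent": 0.2, "poor": -0.6, "bad": -1.0}

# Hypothetical postings: (day, text); real input would be harvested discussion messages.
POSTINGS = [
    (date(2002, 9, 1), "A good debate, decent arguments."),
    (date(2002, 9, 1), "Poor leadership."),
    (date(2002, 9, 2), "Bad news today. Bad, bad day."),
]


def daily_sentiment(postings, lexicon):
    """Return {day: (mean score of matched words, number of matched words)}, ordered by day."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, text in postings:
        # Lower-case and split into alphabetic tokens before looking them up.
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in lexicon:
                totals[day] += lexicon[token]
                counts[day] += 1
    return {day: (totals[day] / counts[day], counts[day]) for day in sorted(counts)}


for day, (sentiment, intensity) in daily_sentiment(POSTINGS, LEXICON).items():
    print(day, round(sentiment, 2), intensity)
```

Separating the two numbers matters: a day can be very intense (many sentiment words) while its average mood stays neutral, which is why both series are worth plotting.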

What will be the future for your project?

Jaap: "We are trying to obtain funding from NWO to continue this project on a larger scale, focusing on a scientific approach to evaluation and further development. Unfortunately, it is always a struggle to get funding for multidisciplinary research. One way or another, though, this research will go ahead."

Ivar: “This is really the kind of research that people assume has existed for decades, while in fact there is still very little of it. It is very much applied research; its use is evident. On a dedicated page on the website of the national daily ‘NRC Handelsblad’ I show and explain the results: what is being talked about in internet discussions. A claim in that same newspaper a month earlier, that it is impossible to take internet discussions into account in politics, has already been falsified. You can see what people are talking about, and in what way, without wading through thousands of postings every day. It even gives extremely fast, near-instant insight into current sentiment, without polls, surveys or any of the existing instruments. The data is supplied spontaneously by the people, without them even being aware of it. Instead of asking specific questions in a telephone survey or a panel, you just look at what people come up with themselves on a topic you choose.”

Jaap: “We should take the lead in this now. The computing power you need is readily available today, and so is the data. The digital texts on the internet that we use grow daily without us having to do anything. It is right there on the internet, waiting for us to tap into. This prototype seems to work, and it is exciting to find out for sure. To be able to ask more complicated, far-reaching questions, it needs to be transformed into a new, though perhaps less applied, research design.”

ReindeR Rustema