



My teammates and I have a very challenging new project to do, and we are supposed to submit it next week. We don't have a single clue about how to do it, and really need help. We are undergraduate students, new to Information Retrieval and AI, and really need your ideas.

The project is roughly:

When an expert is cited in a document, find an expert with an opposing opinion & find out what he/she says about that topic.

We are free to use any programming language, but we are not concerned with the programming. We would like help to get us started. Please give us a rough idea on how to design such a system and how to retrieve information on the internet. How should we get his opinion, then find an opposite opinion?


Sounds like an NLP problem to me. As for the information about documents and cites, should be a good starting point.

For each paper, there are several citations which refer to the paper. At the very minimum, you have to scan the abstract of the paper and the abstracts of the citing papers, and run your own algorithm to figure out whether any citation holds the opposing opinion. Maybe your professor can give you hints on some approximate heuristic, but as far as I know it is a really hard problem.
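To make "run your own algorithm" a bit more concrete, here is a very naive sketch of one possible heuristic: count disagreement cue phrases in each citing abstract. The cue list, the function names, and the threshold are all my own assumptions for illustration, not an established method, and this will miss most real disagreement (and flag plenty of false positives).

```python
# Hypothetical cue phrases; a real system would need a far better signal.
DISAGREEMENT_CUES = (
    "however",
    "in contrast",
    "contrary to",
    "fails to",
    "we dispute",
    "we disagree",
)


def disagreement_score(abstract):
    """Count how many times any cue phrase occurs in the abstract."""
    text = abstract.lower()
    return sum(text.count(cue) for cue in DISAGREEMENT_CUES)


def likely_opposing(citing_abstracts, threshold=1):
    """Return titles of citing papers whose abstract reaches the threshold.

    citing_abstracts maps a paper title to its abstract text.
    """
    return [
        title
        for title, abstract in citing_abstracts.items()
        if disagreement_score(abstract) >= threshold
    ]


if __name__ == "__main__":
    sample = {
        "Paper A": "Contrary to Smith, we find no measurable effect.",
        "Paper B": "We extend Smith's method to larger corpora.",
    }
    print(likely_opposing(sample))
```

Treat it as a baseline to beat rather than a solution.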

I would be watching this thread for more interesting approaches.

Joy Dutta

Automatically submit a Google search request similar to "*expert_name* sucks", "*expert_name* wrong", or something like that. Find the first result that has "PhD" with a document link in the same sentence and return the link.
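A rough sketch of that idea, with the actual search call left out. I'm assuming you get results back from whatever search API you use as a list of dicts with "snippet" and "url" keys (that shape is hypothetical), and I've simplified "same sentence" to "same snippet". The function names are made up for illustration.

```python
import re

# Words appended to the expert's name, taken from the suggestion above.
NEGATIVE_WORDS = ("sucks", "wrong")

# Matches "PhD", "Ph.D" and "Ph.D." in any letter case.
PHD_PATTERN = re.compile(r"\bPh\.?D\b", re.IGNORECASE)


def build_queries(expert_name):
    """Build one search query per negative word."""
    return [f"{expert_name} {word}" for word in NEGATIVE_WORDS]


def first_phd_link(results):
    """Return the URL of the first result whose snippet mentions a PhD.

    results is assumed to be a list of {"snippet": str, "url": str} dicts;
    returns None when no snippet matches.
    """
    for result in results:
        if PHD_PATTERN.search(result["snippet"]):
            return result["url"]
    return None


if __name__ == "__main__":
    print(build_queries("Jane Doe"))
    fake_results = [
        {"snippet": "Random blog rant.", "url": "http://example.com/rant"},
        {"snippet": "John Roe, Ph.D., argues otherwise.", "url": "http://example.com/rebuttal"},
    ]
    print(first_phd_link(fake_results))
```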

Doug Knesek

Simple: use Amazon's Mechanical Turk.

Without that (or an equivalent) you're in trouble. If there are no further constraints on the problem then you will need a full-blown AI, the kind that doesn't yet exist. If there are severe constraints then you might have a chance of doing this in a week. If the expert can be in any field (medicine, politics, history, fashion, science, comic books, etc.) then there will be no single, well-organized repository of essays. You'll have to use Google to find Dr. X's opinion. Once you find Dr. X's writing (and let's pray it's text, not audio) you'll have to do some kind of natural language processing to get the thrust of it, even if you're lucky enough to find a descriptive title ("Digital Photography Is Absolutely Great"). Then you have to figure out its opposite. What's the opposite of "Neil Gaiman draws on folklore for his story ideas"? Figuring out what opinion you're looking for will be a serious problem. After that, things actually get easier: you can google for the subject and use the same magic tools to find the one you're looking for.

So what do you have a chance of solving? A search for opinions that someone else has already organised into "pro" and "con". Some online political forums are organised that way. Wikipedia cites opposing views in a special section in some of its articles. Science journals print letters of rebuttal. Look around, you might find a site even more cut-and-dried. Choose a small enough arena and you'll have a tractable problem.

EDIT: Damn, Ben Dunlap beat me to all my major points in a comment. Sigh


I think you might be blowing this up a little too big... as an undergraduate project, I would approach it a little more small scale.

Unless your specification says you must use actual internet resources, you would be better off creating your own database of custom short documents. Add metadata to each document stating the points they make about certain topics.

Next, I would create a list of citations which link to each document and add some metadata representing that expert's stance on the topic. When someone reads a document, I would augment the list of citations with lists of links to documents which have alternative views on that topic.

Basically it would consist of these tables:

Document (id, data)
DocumentPoints (documentId, topic, stance)
Citation (documentId, topic, stance)

And when someone loads up a document, the citations are pulled up as well. For each citation, you search DocumentPoints for the same topics with different stances. The most difficult part of this project would be creating the 5 or 6 documents you need to have data in your database. After that the solution is trivial.
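Here is a minimal sketch of those three tables and the lookup, using SQLite in memory. The table and column names come from the list above; the column types, the sample rows, and the function names (build_db, opposing_documents) are assumptions I made up for illustration, and stances are plain strings like "pro" and "con".

```python
import sqlite3


def build_db():
    """Create the three tables in memory and load a few hand-written rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Document (id INTEGER PRIMARY KEY, data TEXT);
        CREATE TABLE DocumentPoints (documentId INTEGER, topic TEXT, stance TEXT);
        CREATE TABLE Citation (documentId INTEGER, topic TEXT, stance TEXT);
        """
    )
    # Hypothetical sample data: three short documents on one topic.
    conn.executemany(
        "INSERT INTO Document VALUES (?, ?)",
        [
            (1, "Article citing Dr. X, who praises digital photography."),
            (2, "Essay arguing that digital photography is overrated."),
            (3, "Review agreeing that digital photography is great."),
        ],
    )
    conn.executemany(
        "INSERT INTO DocumentPoints VALUES (?, ?, ?)",
        [
            (1, "digital photography", "pro"),
            (2, "digital photography", "con"),
            (3, "digital photography", "pro"),
        ],
    )
    conn.execute("INSERT INTO Citation VALUES (1, 'digital photography', 'pro')")
    return conn


def opposing_documents(conn, document_id):
    """For each citation in the document, find other documents that cover
    the same topic with a different stance."""
    return conn.execute(
        """
        SELECT c.topic, c.stance, p.documentId, p.stance
        FROM Citation AS c
        JOIN DocumentPoints AS p
          ON p.topic = c.topic AND p.stance <> c.stance
        WHERE c.documentId = ? AND p.documentId <> c.documentId
        ORDER BY c.topic, p.documentId
        """,
        (document_id,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_db()
    for row in opposing_documents(conn, 1):
        print(row)
```

Each returned row is (topic, cited stance, opposing document id, opposing stance), so the page for document 1 can link straight to the documents that disagree.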

On a side note, most of these other answers are telling you to use some existing solution... don't do that unless the assignment tells you to. You'll be much better off understanding the problem and various ways to solve it (this is definitely not the only/best one) if you work through the entire problem yourself. When the teacher asks you to do something not supported by whatever product you chose to build your solution on, you won't be able to fix it. If you had written it yourself, you could just as easily implement the new spec as well.
