I find Google's In Quotes a really nifty application, and as a CS guy, I have to understand how it works. How do you think it turns news articles into a list of quotes attributed to specific persons? Sure, there are some mistakes, but their algorithm seems to be smarter than just a simple heuristic or multiple regular expressions. For example, a quote can be attributed to someone even though his/her name was only mentioned in the last paragraph.

Any ideas? Any known paper on the subject?

A: 

I don't have a paper, but I do have an idea. Google collects quotes from a set of people, which is easy for them given Google News and their access to other media.

They have another set, of subjects. Google matches the set of subjects against the set of people (both sets are finite). The last set is all of the quotes.

If you look closely, each subject contains one word that is highlighted in the quote. So there is a relation between the set of subjects and the set of quotes for each person. Since Google is the master of information, it should be easy for them to link all of these sets.
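
To make the idea concrete, here is a minimal sketch of linking the three sets, assuming quotes have already been attributed to a person and that a subject matches a quote when its keyword appears in the quote text. The function name `link_quotes` and the sample data are hypothetical, and this is only a guess at the general shape, not how Google actually does it.

```python
from collections import defaultdict

def link_quotes(quotes, subjects):
    """Group quotes by person, then by each subject keyword found in the quote.

    quotes:   list of (person, quote_text) pairs (attribution assumed done already)
    subjects: iterable of single-word subject keywords
    Returns a nested mapping {person: {subject: [quote_text, ...]}}.
    """
    index = defaultdict(lambda: defaultdict(list))
    for person, text in quotes:
        # Lowercase the words and strip surrounding punctuation before comparing.
        words = {w.strip('.,!?;:"\'').lower() for w in text.split()}
        for subject in subjects:
            if subject.lower() in words:
                index[person][subject].append(text)
    return index

# Hypothetical sample data, for illustration only.
quotes = [
    ("Person A", "The economy needs a new plan."),
    ("Person B", "Energy policy must change."),
    ("Person A", "Our energy bills are too high."),
]
subjects = ["economy", "energy"]

index = link_quotes(quotes, subjects)
for person, by_subject in index.items():
    for subject, texts in by_subject.items():
        print(person, subject, texts)
```

The hard part, attributing each quote to the right person in the first place, is not covered by this sketch.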

Daok
A: 

I don't have an answer to your question, but my suggestion is that you ask a Google engineer directly through Google Moderator. You may not get an answer quickly (or at all), but any answer you do get there will be accurate.

BlueGene
+1  A: 

It's simple: it checks for the words, but there can be anything in between them, as long as they're still in order. "Hello world!" would become the regex /hello.*world/i
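
As a rough sketch of that idea in Python: split the phrase into words and join them with a "match anything" gap, ignoring case. The helper name `ordered_words_pattern` is made up for illustration, and the word boundaries are an extra assumption to avoid matching inside longer words.

```python
import re

def ordered_words_pattern(phrase):
    """Compile a regex that matches the phrase's words in order,
    allowing any text between consecutive words."""
    words = re.findall(r"\w+", phrase)
    if not words:
        raise ValueError("phrase contains no words")
    # \b keeps each word whole; .*? allows anything (lazily) in between.
    body = r"\b.*?\b".join(re.escape(w) for w in words)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE | re.DOTALL)

pattern = ordered_words_pattern("Hello world!")

print(bool(pattern.search("He said hello to the whole world.")))  # words in order
print(bool(pattern.search("world, hello")))                       # words out of order
```

This only finds text containing the words in order; it does nothing to decide who said the quote.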

FrozenFire