There's a movie whose name I can't remember. It's about a carnival or amusement park with a horror house and a bunch of teens who are murdered one by one by something wearing a clown mask. I saw this movie, and its sequel, about 20 years ago, but can't remember it exactly. (And I've also forgotten its title.) As a result, I started wondering how to solve something technical.

Assume that I have a database with the story plot and other data of every movie ever released (something like IMDb), and an edit field where a user can simply enter a description in plain text. The system would then analyse this text to find the movie(s) that match this description.

For example (a different movie), I enter this in the edit field: "Some movie about an Egyptian king who attacks a bunch of Indians on horseback, but he's badly wounded and his horse dies while he lost this battle." The system should then report the movie "Alexander" from 2004 as the answer, and possibly a few more. (Even allowing for a few errors in the description.)
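One common baseline for this kind of free-text matching is TF-IDF weighting with cosine similarity: each plot summary and the user's description become weighted word vectors, and movies are ranked by how closely the vectors point in the same direction. The sketch below is a minimal, stdlib-only illustration under stated assumptions: the three-entry `PLOTS` dictionary is a hypothetical stand-in for the plot database (short illustrative summaries, not IMDb data), and `tokenize`, `tfidf_vectors`, `cosine` and `rank` are illustrative helpers, not part of any existing system.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for the plot database; summaries are illustrative only.
PLOTS = {
    "Alexander (2004)": "A Macedonian king conquers Persia and leads his army into India, "
                        "where he is badly wounded in battle and his horse dies.",
    "The Funhouse (1981)": "Teenagers spend the night in a carnival funhouse and are "
                           "murdered one by one by a killer in a mask.",
    "The Graduate (1967)": "A young graduate is seduced by Mrs. Robinson and falls "
                           "for her daughter.",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    # +1 keeps terms that occur in every document from getting weight zero.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vecs = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    return vecs, idf

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank(description, plots=PLOTS):
    """Return (score, title) pairs, best match first."""
    titles = list(plots)
    vecs, idf = tfidf_vectors([plots[t] for t in titles])
    query = Counter(tokenize(description))
    # Words that never occur in the database carry no information and are ignored.
    qvec = {t: c * idf[t] for t, c in query.items() if t in idf}
    return sorted(((cosine(qvec, v), title) for title, v in zip(titles, vecs)), reverse=True)
```

Fed the example description above, this toy version should rank "Alexander (2004)" first even though "Egyptian" and "Indians" never appear in its summary: unknown words are simply dropped, which gives a little tolerance for errors. A real system would more likely use a library implementation such as scikit-learn's `TfidfVectorizer`.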

To create such a system, where a description is analysed to find a matching record by searching through descriptions, what techniques would I need for something this complex? Not that I want to build something like that right now; I'm asking more out of curiosity, in case I ever want to pick up an interesting new project.

(I wanted to award extra points to whoever recognised the movie I mentioned at the beginning. But one Google search later, I found it myself!)

Btw, it's not the search engine itself that interests me, but analysing the description to turn it into something a search engine will understand! With the example movie, it was human logic that helped me find the title. (And it's annoying that this movie isn't for sale in the Netherlands.) Human logic will always be a requirement, but my question is about analysing the user input, which takes the form of a story or description, possibly with errors.
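A small piece of that "analyse the input" step can be sketched: drop filler words, then snap each remaining word to the closest term the database actually knows, so that a misspelling such as "carnaval" still becomes a usable search term. This sketch assumes a hypothetical `VOCABULARY` list and `STOP_WORDS` set (in practice both would be derived from the plot data), and `to_query` is an illustrative name; the fuzzy lookup uses Python's standard `difflib.get_close_matches`.

```python
import difflib
import re

# Hypothetical vocabulary; a real system would build it from the plot database.
VOCABULARY = ["carnival", "clown", "mask", "murder", "teenagers", "horror", "house",
              "amusement", "park", "king", "horse", "battle", "wounded"]

# Hypothetical filler words that say nothing about the plot.
STOP_WORDS = {"a", "an", "the", "about", "with", "of", "in", "on", "and", "who", "by",
              "some", "movie", "bunch"}

def to_query(description, vocabulary=VOCABULARY, cutoff=0.8):
    """Turn free text into known keywords, correcting near-miss spellings."""
    terms = []
    for word in re.findall(r"[a-z]+", description.lower()):
        if word in STOP_WORDS:
            continue
        # Best vocabulary entry whose similarity ratio is at least `cutoff`, if any.
        match = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        if match and match[0] not in terms:
            terms.append(match[0])
    return terms
```

For example, `to_query("Some movie about a carnaval with a clowns mask")` should yield `["carnival", "clown", "mask"]`. The `cutoff` is a trade-off: lower values forgive more typos but also map unrelated words onto vocabulary terms.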

+1  A: 

For what I can tell by your own comments, Google is the technique to be used. ;-) But, honestly, I think more or less any search engine would do.

Edit: heh, you removed your comment, but I do remember you mentioned Google as the one deserving extra points.

Edit+: well, you mentioned Google again, but I don't want to remove my first edit. ;-)

Michael Krelin - hacker
True. I found it within minutes of asking the question. I googled "horror movie clown amusement park" and found a link with several related movies. But what if you want to build something yourself? Where does the power of a good search engine lie?
Workshop Alex
Yeah, removed the second comment since it was incorrect. I think there's a movie similar to Funhouse from the 80's but it wasn't that one. So, extra points for the person who knows which movie I mean. :-)
Workshop Alex
Well, in this realm I'm not really prepared to give anything but a jocular answer without research. As for the power of a good search engine: given the one-step-forward, two-steps-back stance all search engines seem to take, I'd venture to doubt that the power of a good search engine is stored in a well-known place ;)
Michael Krelin - hacker
+1  A: 

Pure speculation: would something trivial work, such as taking every word of more than 4 letters in the description ("Egyptian", "Indian", "horse", "battle", etc.) and fuzzy matching them against a database of such summaries? Perhaps with some normalisation, e.g. king == leader == emperor?

Hmmm... young man, girlfriend, swimming pool, mother, wedding: does that get us to The Graduate? Well, I guess with a few specifics ("Robinson") it might.
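A minimal sketch of that speculation, assuming a hypothetical hand-made `SYNONYMS` table and plain-text summaries. Words are first mapped to a canonical term (here `king`, `leader` and `emperor` all become `ruler`), then only terms longer than 4 letters are kept, and a summary is scored by the fraction of description keywords that fuzzily match one of its keywords via the standard `difflib.get_close_matches`. The helper names and the 0.8 cutoff are illustrative choices, not tuned values.

```python
import difflib
import re

# Hypothetical synonym table: each variant maps to one canonical term.
SYNONYMS = {"king": "ruler", "leader": "ruler", "emperor": "ruler",
            "indian": "india", "indians": "india", "horseback": "horse"}

def keywords(text):
    """Canonicalise each word, then keep only terms longer than 4 letters."""
    words = (SYNONYMS.get(w, w) for w in re.findall(r"[a-z]+", text.lower()))
    return {w for w in words if len(w) > 4}

def score(description, summary, cutoff=0.8):
    """Fraction of description keywords that fuzzily match some summary keyword."""
    query = keywords(description)
    candidates = sorted(keywords(summary))
    if not query:
        return 0.0
    hits = sum(1 for w in query
               if difflib.get_close_matches(w, candidates, n=1, cutoff=cutoff))
    return hits / len(query)
```

The canonicalise-then-filter order is deliberate: filtering first would throw away the 4-letter "king" before the synonym table ever saw it.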

djna
Maybe. But what if the description says something like "no mother"? It would then search for movies with mothers. :-) At least some analysis would be required.
Workshop Alex
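A deliberately naive way to add "some analysis": treat words such as "no", "not", "without" and "never" as negators that move the next word into an exclusion set, then drop any movie tagged with an excluded word. The sketch assumes each movie already has a set of keyword tags; the `NEGATORS` set, the one-word negation scope and the helper names are all assumptions for illustration. Phrases like "no one" or "not only" would fool it, so real negation handling needs proper parsing.

```python
import re

NEGATORS = {"no", "not", "without", "never"}

def split_terms(description):
    """Split a description into (wanted, unwanted) word sets.

    Naive on purpose: a negator only negates the single word that follows it.
    """
    wanted, unwanted = set(), set()
    negate = False
    for word in re.findall(r"[a-z]+", description.lower()):
        if word in NEGATORS:
            negate = True
            continue
        (unwanted if negate else wanted).add(word)
        negate = False
    return wanted, unwanted

def filter_movies(description, movies):
    """Drop titles tagged with an unwanted word; rank the rest by wanted-word overlap."""
    wanted, unwanted = split_terms(description)
    kept = [title for title, tags in movies.items() if not (unwanted & tags)]
    return sorted(kept, key=lambda title: len(wanted & movies[title]), reverse=True)
```

With hypothetical tags such as `{"The Graduate": {"young", "man", "pool", "mother", "wedding"}}`, the description "young man, swimming pool, no mother" would exclude The Graduate instead of favouring it.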
Ah yes, so much for simplicity. It would be interesting to find out how far we'd get with the trivial approach: how many films would actually be misidentified if we don't differentiate "Mother" from "No Mother"? Apparently you can identify tunes by reducing them to "up down up same same down" (see http://knowbodies.blogspot.com/2008/03/identifying-tune.html). I wonder how effective film identification might be even if we elided such seemingly crucial modifiers as "No"; with music, it seems the difference between a semitone and an octave doesn't matter!
djna
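The tune trick reduces a melody to the direction of each step between consecutive notes and throws away the size of the interval; this contour representation is usually called the Parsons code. A minimal sketch, assuming notes are given as MIDI note numbers (an assumption for illustration; `contour` is a hypothetical helper):

```python
def contour(pitches):
    """Reduce a melody (e.g. MIDI note numbers) to its up/down/same contour."""
    steps = []
    for prev, cur in zip(pitches, pitches[1:]):
        if cur > prev:
            steps.append("up")
        elif cur < prev:
            steps.append("down")
        else:
            steps.append("same")
    return steps
```

For example, `contour([60, 62, 60, 72, 72, 72, 71])` gives the "up down up same same down" pattern, and `[60, 61]` (a semitone) and `[60, 72]` (an octave) both reduce to `["up"]`, which is exactly the insensitivity described above.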
+1  A: 

You should check out document classification.

A few document classification techniques
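As one concrete member of that family (not necessarily one of the techniques behind the link), here is a minimal multinomial naive Bayes classifier with add-one (Laplace) smoothing. A hypothetical use here would be to classify a description into a genre first and then search only within that genre. The training sentences, the "horror"/"war" labels and the `NaiveBayes` class are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of training docs
        self.vocab = set()

    def train(self, text, label):
        words = tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Log prior plus sum of log likelihoods avoids floating-point underflow.
            score = math.log(self.doc_counts[label] / total_docs)
            for w in tokenize(text):
                if w not in self.vocab:
                    continue  # words never seen in training carry no evidence
                score += math.log((counts[w] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

After training on a handful of labelled summaries, e.g. `train("teens murdered by a masked killer in a carnival", "horror")` and `train("a king leads his army into battle", "war")`, a call like `predict("killer clown murders teens at an amusement park")` should favour "horror". With only a few training sentences the result is fragile; real classifiers need far more labelled data.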

Nick D
Ah! It would definitely be an ambitious plan if I ever make this a new project. :-)
Workshop Alex
yes but it's a fascinating topic ;-)
Nick D
+1  A: 

You can do lots of interesting stuff with the imdb keyword search:

http://akas.imdb.com/keyword/carnival/clown/murder/

You can specify multiple keywords, it suggests movies and more keywords which are in similar context with your given keywords.

The data contained in IMDb is publicly available for non-commercial use and can be downloaded as text files. You could build a database from it.
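The "all these keywords, plus related keywords" behaviour can be mimicked on a local copy of the data. A minimal sketch, assuming a hypothetical `KEYWORDS` index that maps each title to its set of keywords (the titles below are placeholders, not real IMDb entries, and no IMDb file format is assumed): a movie matches when every requested keyword is among its keywords, and the remaining keywords of the matching movies, counted by how often they co-occur, become suggestions.

```python
from collections import Counter

# Hypothetical local keyword index (title -> keywords); titles are placeholders.
KEYWORDS = {
    "Movie A": {"carnival", "clown", "murder", "teenager"},
    "Movie B": {"carnival", "clown", "circus"},
    "Movie C": {"carnival", "clown", "murder", "funhouse"},
    "Movie D": {"king", "battle", "horse"},
}

def search(wanted, index=KEYWORDS):
    """Return titles tagged with every wanted keyword, plus co-occurring suggestions."""
    wanted = set(wanted)
    hits = sorted(title for title, kws in index.items() if wanted <= kws)
    # Count the other keywords of the matching titles; frequent ones are good refinements.
    related = Counter(k for title in hits for k in index[title] - wanted)
    return hits, [k for k, _ in related.most_common()]
```

Here `search(["carnival", "clown", "murder"])` returns Movie A and Movie C and suggests "teenager" and "funhouse" as follow-up keywords. An inverted index (keyword to titles) would avoid scanning every title on each query.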

codymanix
True, but the question isn't about building the database; it's about how to translate a description into something that can be used for searching. (Something I did manually to remember the movie again.)
Workshop Alex