Is there a library that can determine whether a given string of characters contains a "real sentence" in English, meaning that it consists of English words? (The sentence need not make sense, but it should contain real English words.)


For example, the following is not a sentence (at least not in English):

hsgdhjf asdf dsusdf udfhpiew
+2  A: 

You can check that every word is spelled correctly using a spelling checker (there are a number of libraries for this, none of which I have used), but that still won't tell you whether the sentence is grammatical. Furthermore, an English speaker would probably consider a sentence "real" even if it had some errors, and some legitimate words aren't in any dictionary.
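To make the word-level check concrete, here is a minimal sketch in Python. It is not any particular library's API: `ENGLISH_WORDS`, `known_word_ratio`, `looks_like_english`, and the 0.8 threshold are all made up for illustration, and a real check would load a full word list instead of the toy set below. The threshold is there so that a sentence with one typo or one unusual word can still pass.

```python
import re

# Hypothetical toy word list; a real check would load a full dictionary file.
ENGLISH_WORDS = {
    "the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog",
    "colorless", "green", "ideas", "sleep", "furiously",
}

def known_word_ratio(text, words=ENGLISH_WORDS):
    """Return the fraction of tokens in `text` that appear in `words`."""
    # Lowercase the text and keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in words for token in tokens) / len(tokens)

def looks_like_english(text, threshold=0.8, words=ENGLISH_WORDS):
    """Accept the text if enough of its tokens are known words.

    The threshold is an arbitrary assumption, not a tuned value.
    """
    return known_word_ratio(text, words) >= threshold
```

With this word list, `looks_like_english("hsgdhjf asdf dsusdf udfhpiew")` is rejected, while "Colorless green ideas sleep furiously" is accepted, which shows that the check says nothing about grammar or meaning.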

The best way to do this remains to have your program show the alleged sentence to a human being who speaks English, and ask them if it is a "real sentence."

kindall
+2  A: 

This is an unsolved problem, as computers have no idea of what "makes sense". Even if a program tries to parse a sentence by detecting nouns, verbs, etc., there are still phrases like "colorless green ideas sleep furiously" or "Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo" that would get through. I doubt many people would say those are sentences.

There are also multiple ways of parsing a sentence. For example, "Time flies like an arrow; fruit flies like a banana" can be parsed as:

  • adjective noun verb article noun; noun verb preposition article noun
  • noun verb preposition article noun; adjective noun verb article noun

to take just two ways.

The bottom line: parsing natural language is hard, and making sense of it is even harder.

Dave
Side note: of the two parsings listed above (there are others), the first doesn't make sense because, as far as we know, there are no such things as "time flies". The second half of the first parsing does make sense, of course.
Dave
I think all he really cares about is something like putting all the words of a sentence into an array and checking them one by one against a dictionary database, which would of course be slow, but would do what he wanted.
MaQleod
Yes -- since the question was updated :-) The original question was ambiguous, so I assumed the most difficult thing was being asked. As a side note, a dictionary lookup shouldn't necessarily be that slow, assuming the dictionary is stored well (e.g. in a DAWG, a directed acyclic word graph).
Dave
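As a rough illustration of why the lookup need not be slow, here is a sketch of a plain trie in Python. Note that this is a trie, not a DAWG: a DAWG additionally merges shared suffixes to save memory, which this sketch does not attempt. The class and function names (`Trie`, `build_trie`) are hypothetical. Looking up a word takes time proportional to the word's length, regardless of how many words are stored.

```python
class Trie:
    """A plain prefix tree (a simpler relative of a DAWG)."""

    def __init__(self):
        self.children = {}    # maps a character to the child node
        self.is_word = False  # True if a stored word ends at this node

    def insert(self, word):
        node = self
        for ch in word:
            # Create the child node on first use, then descend into it.
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def contains(self, word):
        node = self
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False  # no stored word has this prefix
        return node.is_word   # a bare prefix of a word does not count


def build_trie(words):
    """Build a trie from an iterable of words."""
    trie = Trie()
    for word in words:
        trie.insert(word)
    return trie
```

For example, after `trie = build_trie(["time", "flies", "fly"])`, `trie.contains("flies")` is true, while `trie.contains("tim")` is false because "tim" is only a prefix of a stored word.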
Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo, right?
Juliet