I'm trying to isolate the individual words in a PDF file, but when I read the file with the pdf-reader gem, the text arrives fractured, like this:
"A lit"
"tle "
"bit of tex"
"t"
So I'm planning to put these fragments back together using some heuristics. For this, I need a library that checks whether a given string is a valid English word, like this:
"tree".is_english? # => true
"askdjfah".is_english? # => false
Does this exist? Ideally, it would also work with German text.
If not, is there a freely available dictionary online? I guess I could write my own tree structure to do the lookup if I had to.
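To make the idea concrete, here is a rough sketch of what I have in mind if no library exists. It assumes a plain-text word list with one word per line (for example /usr/share/dict/words on many Unix systems, which is an assumption about the machine, not something pdf-reader provides). WordList and rejoin are names I made up for illustration. A Set is used instead of a hand-written tree, since it already gives fast membership checks.

```ruby
require "set"

# Hypothetical helper: a word list backed by a Set for fast membership checks.
class WordList
  def initialize(words)
    @words = Set.new(words.map { |w| w.strip.downcase }.reject(&:empty?))
  end

  # Assumes a plain-text file with one word per line.
  def self.from_file(path)
    new(File.readlines(path, chomp: true))
  end

  def include?(word)
    @words.include?(word.downcase)
  end
end

# Hypothetical heuristic: glue fragments together, and only insert a space at
# a fragment boundary when both sides are known words but their
# concatenation is not. Punctuation is not handled here.
def rejoin(fragments, dict)
  fragments.reduce("") do |text, frag|
    next frag if text.empty?
    # Whitespace at the boundary already marks a word break.
    next text + frag if text.end_with?(" ") || frag.start_with?(" ")

    left  = text[/\S+\z/].to_s   # last token before the boundary
    right = frag[/\A\S+/].to_s   # first token after the boundary
    split_ok  = dict.include?(left) && dict.include?(right)
    joined_ok = dict.include?(left + right)
    split_ok && !joined_ok ? "#{text} #{frag}" : text + frag
  end
end

dict = WordList.new(%w[a little bit of text tree])
rejoin(["A lit", "tle ", "bit of tex", "t"], dict) # => "A little bit of text"
```

For German, the same approach should work with a German word list swapped in, although compound words would probably need extra handling.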