ansaurus

Question

Is there a way to check the spelling of words in a character vector?

Answer 1

+2 A:

Aspell is an open-source GNU spell checker with support for various languages. It is a command-line program, which I mainly use to scan batches of text files at once (the output is simply printed to the console).
There is also a C API and, perhaps more interesting for you, a pipe mode that accepts streams of text and writes its results to standard output.
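
From R, one way to reach the command-line program is to run it with `system2()` and feed it the words on standard input. The following is only a minimal sketch, assuming an `aspell` executable is on the PATH with an English dictionary installed. `misspelled_aspell` is a hypothetical helper name, and the simpler `list` command (which prints the words Aspell does not recognise) is used here instead of the full pipe protocol.

```r
# Hypothetical helper: returns the words that Aspell does not recognise.
# Assumes an `aspell` executable on the PATH and an installed "en" dictionary.
misspelled_aspell <- function(words, lang = "en") {
  exe <- Sys.which("aspell")
  if (!nzchar(exe)) {
    warning("aspell executable not found; returning NA")
    return(NA_character_)
  }
  # `aspell list` reads text from stdin and prints each unrecognised word;
  # system2() writes `words` one per line to the program's standard input
  bad <- system2(exe, c(paste0("--lang=", lang), "list"),
                 input = words, stdout = TRUE)
  unique(bad)
}

# Example call; the result depends on the installed dictionary
# misspelled_aspell(c("persnickety", "sqwrzib"))
```

If Aspell is not installed, the helper warns and returns `NA` rather than failing, so it can be tried safely on any machine.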

Hope this helps.

Henrik 2010-09-02 10:00:36

Thank you. Is there a Windows binary for Aspell?

gd047 2010-09-02 10:59:15

Yes, there is, and the Windows binary is what I am using: http://aspell.net/win32/

Henrik 2010-09-02 11:08:25

Is there a way to use it from R? I saw this http://www.omegahat.org/Aspell/ but I read that `There is currently no binary version for Windows`

gd047 2010-09-02 11:36:21

I think Hunspell should be used instead of Aspell today; it certainly works on Windows, but you may need to compile it yourself.

mbq 2010-09-02 11:49:45

Sorry, but I haven't heard of any R version. And truly, Hunspell is the more up-to-date option, but since you just need a spell check, Aspell is probably enough, if you can get it to work for your problem.

Henrik 2010-09-02 13:49:55

Answer 2

+4 A:

Alternatively, OpenOffice ships with a dictionary whose entries are stored in a text file. You can read that file and strip out the word definitions to create your word list.

This was tested on v3.0; the file location may have changed since, and the filename depends on which dictionary you want.

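library(stringr)
# Path as found on an OpenOffice.org 3.0 install; adjust for your setup
dict <- readLines("C:/Program Files/OpenOffice.org 3/share/uno_packages/cache/uno_packages/174.tmp_/dict-en.oxt/th_en_US_v2.dat")
# Entry lines hold the word; lines starting with "(" hold its definitions
is_word <- str_detect(dict, "^[^(]")
# Keep only the text before the first "|" on each entry line
words <- str_split_fixed(dict[is_word], "\\|", 2)
words <- words[,1]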

This list contains some multi-word phrases. You may prefer to split on the first space and take the unique values. You will probably also want to write words to a file, to save repeating this step.
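
The clean-up described above could look roughly like this. It is a sketch in base R on a small made-up `words` vector, not the real dictionary; the sample entries and the `word_file` location are assumptions chosen for illustration.

```r
# Toy stand-in for the `words` vector built from the dictionary (assumed sample data)
words <- c("persnickety", "a cappella", "a la carte", "zebra", "zebra crossing")

# Keep only the text before the first space, then drop duplicates
single_words <- unique(sub(" .*$", "", words))

# Save the list so the dictionary only has to be parsed once
word_file <- file.path(tempdir(), "words.txt")   # hypothetical location
writeLines(single_words, word_file)

# Later sessions can reload the list directly
words_reloaded <- readLines(word_file)
```

For a permanent word list you would write to a fixed path instead of `tempdir()`.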

Once this is done, checking a word is as easy as

c("persnickety", "sqwrzib") %in% words      # TRUE FALSE
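
To pull the misspelled entries out of a longer character vector, the same `%in%` test can be wrapped in a small function. This is only a sketch: `find_misspelled` is a hypothetical helper, the comparison is made case-insensitive as a design choice, and `toy_words` is made-up sample data standing in for the real word list.

```r
# Hypothetical helper: returns the elements of `x` not found in `word_list`.
# The comparison ignores case; stripping punctuation is left to the caller.
find_misspelled <- function(x, word_list) {
  x[!(tolower(x) %in% tolower(word_list))]
}

# Toy word list standing in for `words` (assumed sample data)
toy_words <- c("persnickety", "zebra", "Monday")
result <- find_misspelled(c("Persnickety", "sqwrzib", "monday"), toy_words)
print(result)
```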

Richie Cotton 2010-09-02 12:55:34
