The text to be checked is in Greek, but I would like to know if it can be done for English words too. My initial idea is described here, and I have already found a way to do it using VBA. But I wonder if there's a way to do it using R. If there isn't a way in R, do you think of something better than Excel-vba?
There exists an open source GNU spell checker called Aspell with suppot for various languages. This is a command line program which I basically use for scanning bunches of text files at once (then the output is just given to the console).
But there also exists a C API and perhaps more interesting for you a Pipe mode which accepts streams of texts and outputs to the standard output.
Hope this helps.
Alternatively, OpenOffice ships with a dictionary that entries stored in a text file. You can read that and remove the word definitions to create your word list.
This was tested on v3.0; the file location may have shifted, and the filename will change depending on which dictionary you want.
library(stringr)
dict <- readLines("C:/Program Files/OpenOffice.org 3/share/uno_packages/cache/uno_packages/174.tmp_/dict-en.oxt/th_en_US_v2.dat")
is_word <- str_detect(dict, "^[^(]")
words <- str_split_fixed(dict[is_word], "\\|", 2)
words <- words[,1]
This list contains some multi-word phrases. You may prefer to split on the first space, and take unique values. You probably also want to write words
to file, to save repeating yourself.
Once this is done, checking a word is as easy as
c("persnickety", "sqwrzib") %in% words # TRUE FALSE