This is not exactly easy to do. The way I would do it is with some simple statistical analysis.
Start off by downloading a dictionary of English words (or any language, really; you just need a list of words that are "pronounceable"). Then take each word in the dictionary and break it up into overlapping 3-letter blocks. So given the word "dictionary", you'd break it up into "dic", "ict", "cti", "tio", "ion", "ona", "nar", and "ary". Then add each three-letter block from all the words in the dictionary to a collection that maps the block to the number of words it appears in. Something like this:
"dic" -> 36365
"ict" -> 2721
"cti" -> 532
And so on... Next, normalize the counts by dividing each one by the total number of words in the dictionary. That way, you have a mapping from each three-letter combination to the fraction of words in the dictionary that contain it.
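Here is a minimal sketch of the counting and normalization steps as a C# 9+ top-level program. The helper names `BreakIntoBlocks` and `BuildBlockFrequencies` are hypothetical, and the three-word list stands in for a real dictionary file. The sketch assumes each block should be counted at most once per word (via `Distinct()`), so that the normalized value really is "the fraction of words containing the block" rather than a raw occurrence count.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: break a word into overlapping blocks of `size` letters,
// e.g. "dictionary" -> "dic", "ict", "cti", "tio", "ion", "ona", "nar", "ary".
string[] BreakIntoBlocks(string word, int size)
{
    if (word.Length < size)
        return Array.Empty<string>();
    return Enumerable.Range(0, word.Length - size + 1)
                     .Select(i => word.Substring(i, size))
                     .ToArray();
}

// Hypothetical helper: map each block to the fraction of words that contain it.
Dictionary<string, double> BuildBlockFrequencies(IEnumerable<string> words, int size)
{
    var wordCounts = new Dictionary<string, int>();
    int totalWords = 0;
    foreach (string rawWord in words)
    {
        totalWords++;
        string word = rawWord.ToLowerInvariant();
        // Distinct() so a block repeated inside one word ("ana" in "banana")
        // is counted once for that word.
        foreach (string block in BreakIntoBlocks(word, size).Distinct())
        {
            wordCounts.TryGetValue(block, out int count); // count is 0 if absent
            wordCounts[block] = count + 1;
        }
    }
    // Normalize: number of words containing the block / total number of words.
    return wordCounts.ToDictionary(pair => pair.Key, pair => (double)pair.Value / totalWords);
}

// Hypothetical tiny word list; in practice, load a full dictionary file.
var frequencies = BuildBlockFrequencies(new[] { "nation", "onion", "dog" }, 3);
foreach (var entry in frequencies.OrderBy(e => e.Key, StringComparer.Ordinal))
    Console.WriteLine($"\"{entry.Key}\" -> {entry.Value:F3}");
```

With this list, "ion" maps to 2/3 because it occurs in two of the three words, while every other block maps to 1/3.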
Finally, implement your `IsWordPronounceable` method something like this:
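```csharp
bool IsWordPronounceable(string word)
{
    // Split the (lower-cased) word into overlapping three-letter blocks.
    string[] threeLetterBlocks = BreakIntoThreeLetterBlocks(word.ToLowerInvariant());
    foreach (string block in threeLetterBlocks)
    {
        // A block never seen in the dictionary effectively has frequency 0;
        // indexing blockFrequency[block] directly would throw KeyNotFoundException.
        if (!blockFrequency.TryGetValue(block, out double frequency) || frequency < THRESHOLD)
            return false;
    }
    return true;
}
```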
Obviously, there are a few parameters you'll want to tune. The `THRESHOLD` parameter is one; the block size is another, since blocks of 2 or 4 letters might work better than 3. It'll take a bit of experimentation to get it right, I think.
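To experiment with both knobs, it helps to pass the block size and threshold in as parameters instead of hard-coding them. The sketch below is a hypothetical C# 9+ top-level program: the five-word "dictionary" and the candidate words are made up purely to show how the verdict shifts. With 2-letter blocks "cabal" passes, because "ca", "ab", "ba" and "al" all occur in the list, but with 3-letter blocks it fails because "bal" never appears.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Overlapping blocks of `size` letters; words shorter than `size` yield none.
string[] BreakIntoBlocks(string word, int size) =>
    word.Length < size
        ? Array.Empty<string>()
        : Enumerable.Range(0, word.Length - size + 1).Select(i => word.Substring(i, size)).ToArray();

// Fraction of dictionary words containing each block (counted once per word).
Dictionary<string, double> BuildBlockFrequencies(IReadOnlyCollection<string> words, int size) =>
    words.SelectMany(w => BreakIntoBlocks(w.ToLowerInvariant(), size).Distinct())
         .GroupBy(block => block)
         .ToDictionary(g => g.Key, g => (double)g.Count() / words.Count);

// A word passes only if every block was seen with frequency >= threshold;
// unseen blocks fail via TryGetValue instead of throwing.
bool IsWordPronounceable(string word, Dictionary<string, double> blockFrequency, int size, double threshold) =>
    BreakIntoBlocks(word.ToLowerInvariant(), size)
        .All(block => blockFrequency.TryGetValue(block, out double f) && f >= threshold);

// Hypothetical toy dictionary; a real run would load thousands of words.
var dictionaryWords = new[] { "banana", "bandana", "cabana", "canal", "panama" };

// Sweep both parameters and compare the verdicts side by side.
foreach (int blockSize in new[] { 2, 3 })
{
    var frequencies = BuildBlockFrequencies(dictionaryWords, blockSize);
    foreach (double cutoff in new[] { 0.2, 0.4 })
        foreach (string candidate in new[] { "cabal", "nana", "xkcd" })
            Console.WriteLine($"size={blockSize} threshold={cutoff} {candidate}: " +
                              IsWordPronounceable(candidate, frequencies, blockSize, cutoff));
}
```

In practice you would score a handful of words you already consider pronounceable (and a few you don't) for each combination, then keep the block size and threshold that separate them best.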