views: 772
answers: 13
How do you create words which are not part of the English language, but sound English? For example: janertice, bellagom

+2  A: 

You might be interested in How do I determine if a random string sounds like English?

cxfx
+8  A: 

Consider this algorithm, which is really just a degenerate case of a Markov chain.

JSBangs
A: 

A common practice is to build a Markov chain based on the letter transitions in a "training set" made of several words (nouns?) from an English lexicon, and then let this chain produce "random" words for you.

mjv
+1  A: 

One approach that's relatively easy and effective is to run a Markov chain generator per-character instead of per-word, using a large corpus of English words as source material.

camccann
+3  A: 

Here's an example of somebody doing it. They talk about Markov chains and Dissociated Press.

Here's some code I found. You can run it online at codepad.

import random

vowels = ["a", "e", "i", "o", "u"]
consonants = ['b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'p', 'q', 
              'r', 's', 't', 'v', 'w', 'x', 'y', 'z']

def _vowel():
    return random.choice(vowels)

def _consonant():
    return random.choice(consonants)

def _cv():
    return _consonant() + _vowel()

def _cvc():
    return _cv() + _consonant()

def _syllable():
    return random.choice([_vowel, _cv, _cvc])()

def create_fake_word():
    """ Generate a fake word by creating two or three random
        syllables and joining them together.
    """
    syllables = []
    for _ in range(random.randint(2, 3)):
        syllables.append(_syllable())
    return "".join(syllables)

if __name__ == "__main__":
    print(create_fake_word())
Andy West
This post reminds me of Raymond Che's blog posts (with all the links) ;)
RCIX
Oops, Raymond *Chen*...
RCIX
+1  A: 

Note: Linguistics is a hobby, but I am in no way an expert at it.

First you need to get a "dictionary," so to speak, of English phonemes.

Then you simply string them together.

While not the most complex or accurate solution, it should give you a generally acceptable result, and it is far simpler to implement if you don't want to tackle the complexities of the other solutions mentioned.
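As a rough sketch of this approach (the phoneme spellings below are a small hand-picked sample, not a real phoneme inventory; something like CMUdict would give you a proper one):

```python
import random

# Hand-picked sample of common English phoneme spellings; a real
# implementation would use a fuller inventory (e.g. from CMUdict).
onsets = ["b", "ch", "d", "fl", "gr", "k", "m", "pl", "s", "st", "th", "tr"]
nuclei = ["a", "e", "i", "o", "u", "ai", "ea", "oo"]
codas = ["", "ck", "l", "n", "nd", "r", "st", "t"]

def fake_word(syllables=2):
    """String onset + nucleus + coda units together to form a word."""
    return "".join(
        random.choice(onsets) + random.choice(nuclei) + random.choice(codas)
        for _ in range(syllables)
    )
```

Each call produces words like "cheastoon" or "granded"; the quality depends entirely on how good your phoneme list is.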

Aequitarum Custos
+3  A: 

Using Markov chains is an easy way, as already pointed out. Just be careful that you don't end up with an Automated Curse Generator.

Tim Sylvester
+3  A: 

I think this story will answer your question quite nicely.

It describes the development of a Markov chain algorithm, including the pitfalls that come up along the way.

abelenky
+8  A: 

Take the start of one English word and the end of another and concatenate.

E.g.

Fortune + totality = fortality

You might want to add some more rules like only cutting your words on consonant-vowel boundaries and so on.

Artelius
Upvoted for simplicity.
esac
I agree. People rearrange prefixes/infixes/suffixes all the time subconsciously to create new English words. It's an exceptionally simple algorithm (heuristic?) in the mind, so it wouldn't be hard to implement in code. I'm happy to contribute to this post's upvotedness =)
Repo Man
A: 

A Markov chain is the way to go, as others have already posted. Here is an overview of the algorithm:

  • Let H be a dictionary mapping letters to another dictionary mapping letters to the frequency they occur with.
  • Initialize H by scanning through a corpus of text (for example, the Bible, or the Stack Overflow public data). This is a simple frequency count. An example entry might be H['t'] = {'t': 23, 'h': 300, 'a': 50}. Also create a special "start" symbol indicating the beginning of a word, and an "end" symbol for the end.
  • Generate a word by starting with the "start" symbol, and then randomly picking a next letter based on the frequency counts. Generate each additional letter based on the last letter. For example, if the last letter is 't', then you will pick 'h' with probability 300/373, 't' with probability 23/373, and 'a' with probability 50/373. Stop when you hit the "end" symbol.

To make your algorithm more accurate, instead of mapping one letter to the next letters, you could map two letters to the next letter.
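A sketch of the single-letter version of this algorithm, using `^` and `$` as the start and end symbols and a toy word list in place of a real corpus:

```python
import random
from collections import defaultdict

START, END = "^", "$"  # special symbols marking word boundaries

def train(words):
    """Build H: H[letter][next_letter] = frequency, as described above."""
    H = defaultdict(lambda: defaultdict(int))
    for word in words:
        chars = [START] + list(word) + [END]
        for a, b in zip(chars, chars[1:]):
            H[a][b] += 1
    return H

def generate(H):
    """Walk the chain from START, picking each next letter with
    probability proportional to its frequency, until END is hit."""
    out, cur = [], START
    while True:
        nexts = H[cur]
        cur = random.choices(list(nexts), weights=list(nexts.values()))[0]
        if cur == END:
            return "".join(out)
        out.append(cur)
```

For the two-letter refinement, you would key `H` on pairs of letters instead of single letters; the training and generation loops stay essentially the same.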

Claudiu
+1  A: 

Use n-grams built from an English corpus with n > 3; that will get you a reasonable approximation.

Paul Nathan
+2  A: 

I can't think of any cromulent ways of doing this.

Dan Lorenc
;-) This kind of humorous tidbit is most welcome on SO. (It helps us cope with otherwise terse material and also stops us from taking ourselves too seriously.) That said, this kind of line should be posted as a comment to the question, not as an answer! Thanks.
mjv
A: 

If you decide to go with a simple approach like the code Andy West suggested, you might get even better results by weighting the frequencies of vowels and consonants to correspond with those occurring normally in the English language: Wikipedia: Letter Frequency

You could even go as far as looking at the frequencies of paired letters or sequences of three letters, but at that point you're actually implementing the same idea as the Markov chain others have suggested. Is it more important that the "fake words" look potentially authentic to humans, or are the statistical properties of the words more important, such as in cryptographic applications?
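For instance, the `_vowel` helper from Andy West's example could be weighted by approximate English letter frequencies (the numbers below are rounded from the Wikipedia table and are only illustrative):

```python
import random

# Approximate relative frequencies of English vowels, in percent
# (rounded from the Wikipedia letter-frequency table; illustrative only).
VOWEL_WEIGHTS = {"a": 8.2, "e": 12.7, "i": 7.0, "o": 7.5, "u": 2.8}

def weighted_vowel():
    """Pick a vowel with probability proportional to its English frequency."""
    return random.choices(list(VOWEL_WEIGHTS),
                          weights=list(VOWEL_WEIGHTS.values()))[0]
```

The same idea applies to the consonant list; 'e' then shows up far more often than 'u', which already makes the output look noticeably more English.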

JohnE