How do you create words which are not part of the English language, but sound English? For example: janertice, bellagom
You might be interested in "How do I determine if a random string sounds like English?"
Consider this algorithm, which is really just a degenerate case of a Markov chain.
A common practice is to build a Markov chain based on the letter transitions in a "training set" made of several words (nouns?) from an English lexicon, and then let this chain produce "random" words for you.
One approach that's relatively easy and effective is to run a Markov chain generator per-character instead of per-word, using a large corpus of English words as source material.
Here's an example of somebody doing it. They talk about Markov chains and dissociated press.
Here's some code I found. You can run it online at codepad.
import random

vowels = ["a", "e", "i", "o", "u"]
consonants = ['b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'p', 'q',
              'r', 's', 't', 'v', 'w', 'x', 'y', 'z']

def _vowel():
    return random.choice(vowels)

def _consonant():
    return random.choice(consonants)

def _cv():
    return _consonant() + _vowel()

def _cvc():
    return _cv() + _consonant()

def _syllable():
    return random.choice([_vowel, _cv, _cvc])()

def create_fake_word():
    """ This function generates a fake word by creating between two and three
    random syllables and then joining them together.
    """
    syllables = []
    for x in range(random.randint(2, 3)):
        syllables.append(_syllable())
    return "".join(syllables)

if __name__ == "__main__":
    print(create_fake_word())
Note: Linguistics is a hobby of mine, but I am in no way an expert in it.
First you need a "dictionary", so to speak, of English phonemes.
Then you simply string them together.
While not the most sophisticated or accurate solution, it should lead you to a generally acceptable outcome, and it's far simpler to implement if the complexities of the other solutions mentioned are more than you need.
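A minimal sketch of that phoneme idea, with a tiny hand-picked list of letter clusters standing in for a real phoneme dictionary (a serious version would pull them from a pronunciation lexicon instead):

import random

# Tiny, hand-picked onset/nucleus/coda clusters standing in for real English phonemes.
onsets = ["b", "br", "ch", "d", "fl", "g", "k", "m", "pr", "s", "st", "th", "tr"]
nuclei = ["a", "e", "i", "o", "u", "ai", "ea", "oo"]
codas = ["", "", "n", "m", "r", "l", "s", "t", "ck", "sh"]

def phoneme_word(syllables=2):
    """String onset + nucleus + coda clusters together to form a word."""
    return "".join(random.choice(onsets) + random.choice(nuclei) + random.choice(codas)
                   for _ in range(syllables))

print(phoneme_word())  # prints a random two-syllable word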
Using Markov chains is an easy way, as already pointed out. Just be careful that you don't end up with an Automated Curse Generator.
I think this story will answer your question quite nicely.
It describes the development of a Markov chain algorithm, including the pitfalls that come up along the way.
Take the start of one English word and the end of another and concatenate.
E.g.
Fortune + totality = fortality
You might want to add some more rules like only cutting your words on consonant-vowel boundaries and so on.
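A quick sketch of that splice approach; the word list here is just a stand-in, and the halfway cut is the crudest possible rule (the consonant-vowel boundary refinement mentioned above would give nicer results):

import random

words = ["fortune", "totality", "janitor", "bellows", "marble", "lantern"]  # stand-in word list

def splice():
    """Take the first part of one word and the last part of another and join them."""
    a, b = random.sample(words, 2)
    return a[:len(a) // 2 + 1] + b[len(b) // 2:]

print(splice())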
A Markov chain is the way to go, as others have already posted. Here is an overview of the algorithm:
- Let H be a dictionary mapping each letter to another dictionary that maps following letters to the frequency with which they occur.
- Initialize H by scanning through a corpus of text (for example, the Bible, or the Stack Overflow public data). This is a simple frequency count. An example entry might be H['t'] = {'t': 23, 'h': 300, 'a': 50}. Also create a special "start" symbol indicating the beginning of a word, and an "end" symbol for the end.
- Generate a word by starting with the "start" symbol, and then randomly picking a next letter based on the frequency counts. Generate each additional letter based on the last letter. For example, if the last letter is 't', then you will pick 'h' with probability 300/373, 't' with probability 23/373, and 'a' with probability 50/373. Stop when you hit the "end" symbol.
To make your algorithm more accurate, instead of mapping one letter to the next letters, you could map two letters to the next letter.
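Here is a compact sketch of that algorithm in Python (first-order, letter to letter); the corpus is a short stand-in list rather than a real lexicon, and the two-letter refinement suggested above would only require using two-character strings as the keys of H:

import random
from collections import defaultdict

START, END = "^", "$"  # special symbols marking the start and end of a word

def train(words):
    """Build H, where H[prev][next] is the number of times 'next' follows 'prev'."""
    H = defaultdict(lambda: defaultdict(int))
    for w in words:
        chars = [START] + list(w.lower()) + [END]
        for prev, nxt in zip(chars, chars[1:]):
            H[prev][nxt] += 1
    return H

def generate(H):
    """Walk the chain from START, picking each letter with probability proportional to its count."""
    out, prev = [], START
    while True:
        letters, counts = zip(*H[prev].items())
        prev = random.choices(letters, weights=counts)[0]
        if prev == END:
            return "".join(out)
        out.append(prev)

corpus = ["fortune", "totality", "janitor", "bellows", "random", "generate"]  # stand-in corpus
H = train(corpus)
print(generate(H))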
Use n-grams based on an English corpus with n > 3; that gets you a reasonable approximation.
If you decide to go with a simple approach like the code Andy West suggested, you might get even better results by weighting the frequencies of vowels and consonants to correspond with those occurring normally in the English language: Wikipedia: Letter Frequency
You could even go as far as looking at the frequencies of paired letters or sequences of three letters, but at that point you're actually implementing the same idea as the Markov chain others have suggested. Is it more important that the "fake words" look potentially authentic to humans, or are the statistical properties of the words more important, such as in cryptographic applications?
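If you do weight by letter frequency, something like the following could stand in for the uniform _vowel() in the earlier code; the numbers are approximate percentages rounded from the Wikipedia letter-frequency table and should be treated as illustrative:

import random

# Approximate relative frequencies of English vowels (rounded figures from the
# Wikipedia "Letter frequency" article); illustrative, not authoritative.
vowel_weights = {"e": 12.7, "a": 8.2, "o": 7.5, "i": 7.0, "u": 2.8}

def _vowel():
    letters, weights = zip(*vowel_weights.items())
    return random.choices(letters, weights=weights)[0]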