I just began reading more about Markov chain generators today, and am really intrigued by the whole process of building one. From my understanding, the future state depends statistically on the states leading up to the present.

Example:

Hello World. Hello Dolly. Hello World.

"World" follows "Hello" ~66% of the time in that source.

If that is always the case, how then do you avoid outputting the same results each time? The statistical occurrences won't change with a static string, so am I right to assume that no variants will ever be generated unless the source data is modified in some way?

How could I get variations from a static source, honoring the statistical values yet allowing some flexibility? Using my example above, how do I allow my generator to follow "Hello" with "Dolly" when "Dolly" only follows "Hello" 33% of the time?

I guess what I'm asking is: how do I base the probability of my next selection on the statistical presence of the words that follow my present selection? That way, "Dolly" shows up 33% of the time and "World" shows up 66% of the time - or am I completely lost here?

+3  A: 

You use a random number generator to pick which path to go down. You have to save each state (which is really a history of the N previous items) along with the probabilities for that state. Then you pick a random number and use it to decide which state to transition to next.

In your example, with a Markov chain of order N=1, you would have a chain structure that looks something like this:

<start> -> Hello : 1.0

Hello -> World. : 0.66666
Hello -> Dolly. : 0.33333

Dolly. -> Hello : 1.0

World. -> <end> : 0.5
World. -> Hello : 0.5

If your current state is Hello, then your next possible states are "World." and "Dolly.". Generate a random number between 0 and 1 and choose "World." if it's less than 0.66666; otherwise choose "Dolly.".
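
Here's a minimal sketch of that in Python (my own, assuming whitespace tokenization and leaving out the <start>/<end> markers). Storing every observed successor, repeats included, makes a uniform random.choice equivalent to the weighted pick: "World." appears twice under "Hello", so it gets chosen about 66% of the time.

import random
from collections import defaultdict

def build_chain(text):
    # Map each word to the list of words observed to follow it.
    # Repeats are kept deliberately: chain["Hello"] ends up as
    # ["World.", "Dolly.", "World."].
    words = text.split()
    chain = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return chain

def generate(chain, start, max_words=10):
    # Walk the chain, sampling each successor uniformly from the
    # (repeated) follower list, which yields the weighted behavior.
    word, output = start, [start]
    for _ in range(max_words):
        followers = chain.get(word)
        if not followers:
            break
        word = random.choice(followers)
        output.append(word)
    return " ".join(output)

chain = build_chain("Hello World. Hello Dolly. Hello World.")
print(generate(chain, "Hello"))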

With an N=2 Markov chain, you get almost deterministic behavior with that input:

<start> <start> -> <start> Hello : 1.0

<start> Hello -> Hello World. : 1.0

Hello World. -> World. Hello : 0.5
Hello World. -> World. <end> : 0.5

World. Hello -> Hello Dolly. : 1.0

Hello Dolly. -> Dolly. Hello : 1.0

Dolly. Hello -> Hello World. : 1.0
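
The same sketch generalizes to order N by keying the table on a tuple of the last N words (again ignoring the <start>/<end> markers):

from collections import defaultdict

def build_chain(text, n=2):
    # Order-n chain: the state is the tuple of the last n words.
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - n):
        chain[tuple(words[i:i + n])].append(words[i + n])
    return chain

chain = build_chain("Hello World. Hello Dolly. Hello World.", n=2)
# chain[("Hello", "World.")] == ["Hello"] -- every state here has exactly
# one observed successor, which is why the order-2 model is almost
# deterministic on this input.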
Omnifarious
A: 

Two comments:

1) To generate samples from a random process, whether one choice is very likely (>50%) and others less so, you just need a weighted "coin flip": generate a random real number uniformly on [0,1), consider the possibilities in a fixed order, and keep a running sum of their probabilities. As soon as that sum exceeds your randomly chosen number, select that choice. If your choices have unnormalized probabilities (not summing to 1), first compute the sum of the probabilities s, then either divide them all by s or choose your random number on [0,s).
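
A sketch of that weighted flip (the helper name weighted_choice is my own, not from any library), working directly with unnormalized weights by drawing on [0,s):

import random

def weighted_choice(weights):
    # weights: dict of {choice: unnormalized probability}.
    s = sum(weights.values())
    r = random.uniform(0, s)  # random real on [0, s)
    running = 0.0
    for choice, w in weights.items():
        running += w
        if r < running:
            return choice
    return choice  # float rounding can leave r at the very top; fall back to the last choice

print(weighted_choice({"World.": 2, "Dolly.": 1}))  # "World." about 2/3 of the time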

2) To prevent overfitting when estimating your model from a small amount of sample training data (compared to the number of parameters), use Bayesian priors on the model parameters. For a really cool example of this, where the number of model parameters (history size) isn't fixed to any finite number in advance, see the Infinite HMM. If you don't use Bayesian methods, then you'll want to choose the history length appropriately for the amount of training data you have, and/or implement some ad-hoc smoothing (e.g. linear interpolation between an order-2 and order-1 model).
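
For the linear-interpolation option, a sketch (the table names p2 and p1 and the mixing weight lam are hypothetical; lam would normally be tuned on held-out data):

def interpolated_prob(word, history, p2, p1, lam=0.7):
    # history: tuple of the last two words.
    # p2[(w1, w2)] and p1[w2] each map a next word to its estimated probability.
    # Mixing the order-2 and order-1 estimates keeps an unseen two-word
    # history from forcing the probability to zero.
    bigram = p2.get(history, {}).get(word, 0.0)
    unigram = p1.get(history[-1], {}).get(word, 0.0)
    return lam * bigram + (1 - lam) * unigram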

wrang-wrang
