views:

163

answers:

2

I need to generate three lines of text (essentially jibberish) that are each 60 characters long, including a hard return at the end of each line. The lines are generated from a dictionary of words of various lengths (typically 1-8 characters). No word may be used more than once, and words must be separated by spaces. I think this is essentially a bin-packing problem.

The approach I've taken so far is to create a hashMap of the words, grouped by their lengths. I then choose a random length, pull a word of that length from the map, and append it to the end of the line I'm currently generating, accounting for spaces or a hard return. It works about half the time, but the other half of the time I'm getting stuck in an infinite loop and my program crashes.

One problem I'm running into is this: as I add random words to the lines, groups of words of a given length may become depleted. This is because there are not necessarily the same number of words of each length in the dictionary, e.g., there may only be one word with a length of 1. So, I might need a word of a given length, but there are no longer any words of that length available.

Below is a summary of what I have so far. I'm working in ActionScript, but would appreciate insight into this problem in any language. Many thanks in advance.

dictionary // map of words with word lengths as keys and arrays of corresponding words as values
lengths // array of word lengths, sorted numerically
min = lengths[0] // minimum word length
max = lengths[lengths.length - 1] // maximum word length
line = ""
while ( line.length < 60 ) {
    len = lengths[round( rand() * ( lengths.length - 1 ) )]
    if ( dictionary[len] != null && dictionary[len].length > 0 ) {
        diff = 60 - line.length // number of characters needed to complete the line

        if ( line.length + len + 1 == 60 ) {
            // this word will complete the line exactly
            line += dictionary[len].splice(0, 1) + "\n"
        }
        else if ( min + max + 2 >= diff ) {
            // find the two word lengths that will complete the line
            // ==> this is where I'm having trouble
        }
        else if ( line.length + len + 1 < 60 - max ) {
            // this word will fit safely, so just add it
            line += dictionary[len].splice(0, 1) + " "
        }

        if ( dictionary[len].length == 0 ) {
            // delete any empty arrays and update min and max lengths accordingly
            dictionary[len] = null
            delete dictionary[len]

            i = lengths.indexOf( len )
            if ( i >= 0 ) {
                // words of this length have been depleted, so
                // update lengths array to ensure that next random
                // length is valid
                lengths.splice( i, 1 )
            }
            if ( lengths.indexOf( min ) == -1 ) {
                // update the min
                min = lengths[0]
            }
            if ( lengths.indexOf( max ) == -1 ) {
                // update the max
                max = lengths[lengths.length - 1]
            }
        }
    }
}
+1  A: 

  1. You should think of an n-letter word as being n+1 letters, because each word has either a space or return after it.
  2. Since all your words are at least 2 characters long, you don't ever want to get to a point where you have 59 characters filled in. If you get to 57, you need to pick something that is 2 letters plus the return. If you get to 58, you need a 1-letter word plus the return.
  3. Are you trying to optimize for time? Can you have the same word multiple times? Multiple times in one line? Does it matter if your words are not uniformly distributed, e.g. a lot of lines contain "a" or "I" because those are the only one-letter words in English?

Here's the basic idea. For each line, start choosing word lengths, and keep track of the word lengths and total character count so far. As you get toward the end of the line, choose word lengths less than the number of characters you have left. (e.g. if you have 5 characters left, choose words in the range of 2-5 characters, counting the space.) If you get to 57 characters, pick a 3-letter word (counting return). If you get to 58 characters, pick a 2-letter word (counting return.)

If you want, you can shuffle the word lengths at this point, so all your lines don't end with short words. Then for each word length, pick a word of that length and plug it in.

Vanessa MacDougal
Thank you for your response. I updated my question above with more details. I can't use any word more than once, so I'm deleting words as I use them. As a result, there's no guarantee that there will be a word of the exact length needed when I reach the end of the line.
Bryan
A: 
dictionnary = Group your words by lengths (like you already do)
total_length = 0
phrase = ""

while (total_length < 60){

random_length = generate_random_number(1,8)

if (total_length + random_length > 60)
{
  random_length = 60 - total_length // possibly - 1 if you cound \n and -2 if you 
                                    // append a blank anyway at the end
}

phrase += dictionnary.get_random_word_of_length(random_length) + " "
total_length += random_length + 1 

}
Eric