views:

599

answers:

4

Given a list of common words, sorted in order of prevalence of use, is it possible to form word combinations of an arbitrary length (any desired number of words) in order of the 'most common' sequences. For example,if the most common words are 'a, b, c' then for combinations of length two, the following would be generated:

aa
ab
ba
bb
ac
bc
ca
cb
cc

Here is the correct list for length 3:

aaa
aab
aba
abb
baa
bab
bba
bbb
aac
abc
bac
bbc
aca
acb
bca
bcb
acc
bcc
caa
cab
cba
cbb
cac
cbc
cca
ccb
ccc

This is simple to implement for combinations of 2 or 3 words (set length) for any number of elements, but can this be done for arbitrary lengths? I want to implement this in PHP, but pseudocode or even a summary of the algorithm would be much appreciated!

A: 

I googled for php permutations and got: http://www.php.happycodings.com/Algorithms/code21.html

I haven't looked into the code if it is good or not. But it seems to do what you want.

WizKid
What I need to do is not actually permutation, which does not allow for the repetition of values. Interestingly, although the solution you provided calls itself a "Permutation Generator" it appears to do exactly what I need. Thanks very much for your response!
Dustin Fineout
Upon closer inspection of the code (actually reading it), the solution provided actually does not do what I need. I understand that my example looks very similar, but what I need to combine is words, not just letters, and the solution provided only operates on single letters.
Dustin Fineout
A: 

I don't know what the term is for what you're trying to calculate, but it's not combinations or even permutations, it's some sort of permutations-with-repetition.

Below I've enclosed some slightly-adapted code from the nearest thing I have lying around that does something like this, a string permutation generator in LPC. For a, b, c it generates

abc
bac
bca
acb
cab
cba

Probably it can be tweaked to enable the repetition behavior you want.

varargs mixed array permutations(mixed array list, int num) {
    mixed array out = ({});
    foreach(mixed item : permutations(list[1..], num - 1))
        for(int i = 0, int j = sizeof(item); i <= j; i++)
            out += ({ implode(item[0 .. i - 1] + ({ list[0] }) + item[i..], "") });
    if(num < sizeof(list))
        out += permutations(list[1..], num);
    return out;
}

FWIW, another way of stating your problem is that, for an input of N elements, you want the set of all paths of length N in a fully-connected, self-connected graph with the input elements as nodes.

chaos
You're right, what I am doing isn't permutation and isn't exactly combination either, it's a sort of ordered combination. In any case, the order and repetition are the two central necessities of what I'm attempting to do.
Dustin Fineout
A: 

Hi, I'm assuming that when saying it's easy for fixed length, you're using m nested loops, where m is the lenght of the sequence (2 and 3 in your examples).

You could use recursion like this:

Your words are numbered 0, 1, .. n, you need to generate all sequences of length m:

generate all sequences of length m:
{
    start with 0, and generate all sequences of length m-1
    start with 1, and generate all sequences of length m-1
    ...
    start with n, and generate all sequences of length m-1 
}

generate all sequences of length 0
{
    // nothing to do
}

How to implement this? Well, in each call you can push one more element to the end of the array, and when you hit the end of the recursion, print out array's contents:

// m is remaining length of sequence, elements is array with numbers so far
generate(m, elements)
{
    if (m == 0)
    {
        for j = 0 to elements.length print(words[j]);
    }
    else
    {
        for i = 0 to n - 1
        {
            generate(m-1, elements.push(i));
        }   
    }
}

And finally, call it like this: generate(6, array())

Martin Konicek
This seems *close* to what I need. My implementation of this in PHP renders different results than what I need though - view source at http://stage.dustinfineout.com/stackoverflow/20090621/word_combinations_a.phps or view result by changing the extension 'phps' to 'php'. Let me know if you see any bugs in my implementation or if you have an idea of what needs to change. Thanks!
Dustin Fineout
Hi, sorry for late response, replace the line"for ($i = 0; $i <= ($m-1); $i++)" with "for ($i = 0; $i < count($words); $i++)". I did a typo, corrected it now.
Martin Konicek
Getting closer! This generates all the combinations I need. However, I need the order to be a bit different; see my updated list in the question - for length 3 and words 'a, b, c' the list must exactly match that shown. For example, 'aba' must come before 'aac'. You can view my new source for version b (different link than last one) at http://stage.dustinfineout.com/stackoverflow/20090621/word_combinations_b.phps - thanks again for your help!
Dustin Fineout
Hmm, that's interesting.. I can't find any order in that sequence. Why is aac less common than baa? Maybe because 0+0+2 > 1+0+0 (taking the sum of importances). But then, the first 4 elements should be: aaa, aab, aba, baa. To me, it seems that 'c' is somehow special, because 'a' and 'b' are alternating at the first position, but sequences starting with 'c' come at the very end. Maybe if you could post the code that generates the sequence for fixed length = 3..
Martin Konicek
I will admit it's a difficult ordering to grasp, it took me 5 minutes to do that by hand. You cannot generate the sum of weights to find the order. There is a very strict and methodical ordering. The first letter can't be C until all combinations using A and B in the first position and having a C in either second or third positions have already been generated. This is also why AAC comes after BBB. We don't use C at all until all combinations using only A and B are exhausted.
Dustin Fineout
I've posted source code for my script to generate combinations of length 2. http://stage.dustinfineout.com/stackoverflow/20090622/word_combinations_ex-2.phps - this works for any list of words. Here is another version using five "words" a-e, it may help understand the ordering: http://stage.dustinfineout.com/stackoverflow/20090622/word_combinations_ex-2_b.phps - again, both of those scripts can be run by replacing the '.phps' extension with '.php'. Thanks again for your help!
Dustin Fineout
I'm thinking I may just need to add a conditional or two to your algorithm to get it to respect the order I need.
Dustin Fineout
+1  A: 

Here's a recursive function that might be what you need. The idea is, when given a length and a letter, to first generate all sequences that are one letter shorter that don't include that letter. Add the new letter to the end and you have the first part of the sequence that involves that letter. Then move the new letter to the left. Cycle through each sequence of letters including the new one to the right.

So if you had gen(5, d) It would start with

(aaaa)d
(aaab)d
...
(cccc)d

then when it got done with the a-c combinations it would do

(aaa)d(a)
...
(aaa)d(d)
(aab)d(d)
... 
(ccc)d(d)

then when it got done with d as the 4th letter it would move it to the 3rd

(aa)d(aa)

etc., etc.

<?php 
/** 
 * Word Combinations (version c) 6/22/2009 1:20:14 PM 
 * 
 * Based on pseudocode in answer provided by Erika: 
 *   http://stackoverflow.com/questions/1024471/generating-ordered-weighted-combinations-of-arbitrary-length-in-php/1028356#1028356 
 *   (direct link to Erika's answer) 
 * 
 * To see the results of this script, run it: 
 *   http://stage.dustinfineout.com/stackoverflow/20090622/word_combinations_c.php 
**/ 

init_generator(); 

function init_generator() { 
    global $words; 
    $words = array('a','b','c'); 
    generate_all(5);


} 

function generate_all($len){
    global $words;
    for($i = 0; $i < count($words); $i++){
        $res = generate($len, $i); 

     echo join("<br />", $res);  
     echo("<br/>");
    } 
}

function generate($len, $max_index = -1){ 
    global $words; 

    // WHEN max_index IS NEGATIVE, STARTING POSITION 
    if ($max_index < 0) { 
        $max_index = count($words) - 1; 
    } 

    $list = array(); 


    if ($len <= 0) { 
     $list[] = "";
        return $list; 
    } 

    if ($len == 1) { 

        if ($max_index >= 1) { 
      $add = generate(1, ($max_index - 1));
      foreach ($add as $addit) { 
                $list[] = $addit; 
            } 


        } 
     $list[] = $words[$max_index]; 
        return $list; 
    } 

    if($max_index == 0) { 
        $list[] = str_repeat($words[$max_index], $len); 
        return $list; 
    } 

    for ($i = 1; $i <= $len; $i++){ 
        $prefixes = generate(($len - $i), ($max_index - 1)); 
        $postfixes = generate(($i - 1), $max_index); 
        foreach ($prefixes as $pre){ 
      //print "prefix = $pre<br/>";
            foreach ($postfixes as $post){ 
       //print "postfix = $post<br/>";
                $list[] = ($pre . $words[$max_index] . $post); 
            } 
        } 
    } 
    return $list; 
} 

?>
Erika
Yes this is along the lines of what I've been working on. I'm going to mod your pseudocode to respect a max_index as, again, I'm working with an array of *words*, not letters. I'll let you know how it turns out and post the links to my implementation source and results. Thanks Erika!
Dustin Fineout
Hm, not quite there. Check out the source at http://stage.dustinfineout.com/stackoverflow/20090622/word_combinations_c.phps or change 'phps' extension to 'php' to see results. Please advise if you see any errors in my implementation or if this gives you a clue as to what's happening wrong. Thanks again Erika!
Dustin Fineout
I modified a few things in your code. This version works (as far as I can tell). I forgot to mention that the generate function I gave you only gives the output that does include the last character. So to get them all you have to run for each character.
Erika
Just tested for length 3. Getting very close but there are still 6 values excluded in the output (baa, aac, caa, cab, cba, cbb). Also some issues with ordering, for example acc should come *after* bcb, not before bca (never use 2 of c until all combos with one c and any combination of a/b are generated).
Dustin Fineout
Forgot to post links. Source for new version (same as yours except run for length 3 rather than 5) is at http://stage.dustinfineout.com/stackoverflow/20090622/word_combinations_d.phps - to view result, replace .phps extension with .php.
Dustin Fineout
Oh, of course. You'll need to push the logic for generate_all down into generate, so that it really generates all combinations for len-1 (when it starts with c, it's now going back to the b's, but not including the a-only ones). This is left as an exercise for the reader. Good luck!
Erika
Thanks for the assignment and all your help! I'll let you know how it works out for me :)
Dustin Fineout