views:

16952

answers:

17

How would I go about generating a list of all possible permutations of a string between x and y characters in length, containing a variable list of characters.

Any language would work but it should be portable.

+6  A: 

There are several ways to do this. Common methods use recursion, memoization, or dynamic programming. The basic idea is that you produce a list of all strings of length 1, then in each iteration, for all strings produced in the last iteration, add that string concatenated with each character in the string individually. (the variable index in the code below keeps track of the start of the last and the next iteration)

Some pseudocode:

list = originalString.split('')
index = (0,0)
list = [""]
for iteration n in 1 to y:
index = (index[1], len(list))
for string s in list.subset(index[0] to end):
for character c in originalString:
list.add(s + c)

you'd then need to remove all strings less than x in length, they'll be the first (x-1) * len(originalString) entries in the list.

Cheers,
Andy

Mike Stone wrote:

FYI, alumb's pseudocode has a bug...

Good call Mike. I have amended my solution to fix this issue and an initialization bug.

alumb
+3  A: 

I just whipped this up quick in Ruby:

def perms(x, y, possible_characters)
all = [""]
current_array = all.clone
1.upto(y) { |iteration|
next_array = []
current_array.each { |string|
possible_characters.each { |c|
value = string + c
next_array.insert next_array.length, value
all.insert all.length, value
}
}
current_array = next_array
}
all.delete_if { |string| string.length < x }
end

You might look into language API for built in permutation type functions, and you might be able to write more optimized code, but if the numbers are all that high, I'm not sure there is much of a way around having a lot of results.

Anyways, the idea behind the code is start with string of length 0, then keep track of all the strings of length Z where Z is the current size in the iteration. Then, go through each string and append each character onto each string. Finally at the end, remove any that were below the x threshold and return the result.

I didn't test it with potentially meaningless input (null character list, weird values of x and y, etc).

Mike Stone
This code is WRONG. It will generate invalid permutations, such as ones with repeated characters. For example, for string "abc", it generates these permutations of size 3: ["aaa", "aab", "aac", "aba", "abb", "abc", "aca", "acb", "acc", "baa", "bab", "bac", "bba", "bbb", "bbc", "bca", "bcb", "bcc", "caa", "cab", "cac", "cba", "cbb", "cbc", "cca", "ccb", "ccc"]. This is incorrect.
pmc255
A: 

FYI, alumb's pseudocode has a bug... you can't simply iterate on each string in the list and add the character to all... you have to just use all the items from the previous iteration such as in my solution. For example, in his pseudocode, with originalString being "ab", the first iteration will loop over ["a", "b"] and produce ["a", "b", "aa", "ab", "ba", "bb"]. The second iteration will produce ["a", "b", "aa", "ab", "ba", "bb", "aa", "ab", "ba", "bb", "aaa", "aab", "aba", "abb", "baa", "bab", "bba", "bbb"]. Note the duplicates of the size 2 strings because of iterating over the size 1 strings twice. The duplicate count will only get worse as the iterations progress!

Mike Stone
A: 

I'm not sure why you would want to do this in the first place. The resulting set for any moderately large values of x and y will be huge, and will grow exponentially as x and/or y get bigger.

Lets say your set of possible characters is the 26 lowercase letters of the alphabet, and you ask your application to generate all permutations where length = 5. Assuming you don't run out of memory you'll get 11,881,376 (i.e. 26 to the power of 5) strings back. Bump that length up to 6, and you'll get 308,915,776 strings back. These numbers get painfully large, very quickly.

Here's a solution I put together in Java. You'll need to provide two runtime arguments (corresponding to x and y). Have fun.

public class GeneratePermutations {
public static void main(String[] args) {
int lower = Integer.parseInt(args[0]);
int upper = Integer.parseInt(args[1]);

if (upper < lower || upper == 0 || lower == 0) {
System.exit(0);
}

for (int length = lower; length <= upper; length++) {
generate(length, "");
}
}

private static void generate(int length, String partial) {
if (length <= 0) {
System.out.println(partial);
} else {
for (char c = 'a'; c <= 'z'; c++) {
generate(length - 1, partial + c);
}
}
}
}
Brian
A: 

Why generate another dictionary when there are so many freely available? Not a sarcastic or condescending question, but a genuine enquiry as to why you needed a custom one?

Rob Burke
A: 

@Rob Burke Flexibility, the ease to change the available characters, length min and max.

Unkwntech
+5  A: 

You are going to get a lot of strings, that's for sure...

\sum_{i=x}^y{\frac{r!}{{(r-i)}!}}
Where, x and y is how you define them and r is the number of characters we are selecting from --if I am understanding you correctly. You should definitely generate these as needed and not get sloppy and say, generate a powerset and then filter the length of strings.

The following definitely isn't the best way to generate these, but it's an interesting aside, none-the-less.

Knuth (volume 4, fascicle 2, 7.2.1.3) tells us that (s,t)-combination is equivalent to s+1 things taken t at a time with repetition -- an (s,t)-combination is notation used by Knuth that is equal to {t \choose {s+t}. We can figure this out by first generating each (s,t)-combination in binary form (so, of length (s+t)) and counting the number of 0's to the left of each 1.

10001000011101 --> becomes the permutation: {0, 3, 4, 4, 4, 1}

nlucaroni
A: 

In ruby:

str = "a"
100_000_000.times {puts str.next!}

It is quite fast, but it is going to take some time =). Of course, you can start at "aaaaaaaa" if the short strings aren't interesting to you.

I might have misinterpreted the actual question though - in one of the posts it sounded as if you just needed a bruteforce library of strings, but in the main question it sounds like you need to permutate a particular string.

Your problem is somewhat similar to this one: http://beust.com/weblog/archives/000491.html (list all integers in which none of the digits repeat themselves, which resulted in a whole lot of languages solving it, with the ocaml guy using permutations, and some java guy using yet another solution).

An issue with your proposal is that str.next! won't iterate over all printable characters. Your example will only generate lowercase letters - no punctuation, or capital letters.
Jarsen
+1  A: 

This is a translation of Mike's Ruby version, into Common Lisp:

(defun perms (x y original-string)
  (loop with all = (list "")
        with current-array = (list "")
        for iteration from 1 to y
        do (loop with next-array = nil
                 for string in current-array
                 do (loop for c across original-string
                          for value = (concatenate 'string string (string c))
                          do (push value next-array)
                             (push value all))
                    (setf current-array (reverse next-array)))
        finally (return (nreverse (delete-if #'(lambda (el) (< (length el) x)) all)))))

And another version, slightly shorter and using more loop facility features:

(defun perms (x y original-string)
  (loop repeat y
        collect (loop for string in (or (car (last sets)) (list ""))
                      append (loop for c across original-string
                                   collect (concatenate 'string string (string c)))) into sets
        finally (return (loop for set in sets
                              append (loop for el in set when (>= (length el) x) collect el)))))
A: 

Though this doesn't answer your question exactly, here's one way to generate every permutation of the letters from a number of strings of the same length: eg, if your words were "coffee", "joomla" and "moodle", you can expect output like "coodle", "joodee", "joffle", etc.

Basically, the number of combinations is the (number of words) to the power of (number of letters per word). So, choose a random number between 0 and the number of combinations - 1, convert that number to base (number of words), then use each digit of that number as the indicator for which word to take the next letter from.

eg: in the above example. 3 words, 6 letters = 729 combinations. Choose a random number: 465. Convert to base 3: 122020. Take the first letter from word 1, 2nd from word 2, 3rd from word 2, 4th from word 0... and you get... "joofle".

If you wanted all the permutations, just loop from 0 to 728. Of course, if you're just choosing one random value, a much simpler less-confusing way would be to loop over the letters. This method lets you avoid recursion, should you want all the permutations, plus it makes you look like you know Maths(tm)!

If the number of combinations is excessive, you can break it up into a series of smaller words and concatenate them at the end.

nickf
+2  A: 

Recursive solution in C++

int main (int argc, char * const argv[]) {
     string s = "sarp";
     bool used [4];
     permute(0, "", used, s);
}

void permute(int level, string permuted, bool used [], string &original) {
 int length = original.length();

 if(level == length) { // permutation complete, display
  cout << str << endl;
 } else {
  for(int i=0; i<length; i++) { // try to add an unused character
   if(!used[i]) {
    used[i] = true;
    permute(level+1, original[i] + permuted, used, original); // find the permutations starting with this string
    used[i] = false;
   }
  }
}
Sarp Centel
@Sarp Centel: does this code work in C++? <`string`>
Lazer
A: 

Here is a simple word C# recursive solution:

Method:

public ArrayList CalculateWordPermutations(string[] letters, ArrayList words, int index)
        {
            bool finished = true;
            ArrayList newWords = new ArrayList();
            if (words.Count == 0)
            {
                foreach (string letter in letters)
                {
                    words.Add(letter);
                }
            }

            for(int j=index; j<words.Count; j++)
            {
                string word = (string)words[j];
                for(int i =0; i<letters.Length; i++)
                {
                    if(!word.Contains(letters[i]))
                    {
                        finished = false;
                        string newWord = (string)word.Clone();
                        newWord += letters[i];
                        newWords.Add(newWord);
                    }
                }
            }

            foreach (string newWord in newWords)
            {   
                words.Add(newWord);
            }

            if(finished  == false)
            {
                CalculateWordPermutations(letters, words, words.Count - newWords.Count);
            }
            return words;
        }

Calling:

string[] letters = new string[]{"a","b","c"};
ArrayList words = CalculateWordPermutations(letters, new ArrayList(), 0);
Crackerjack
+1  A: 

You might look at "Efficiently Enumerating the Subsets of a Set", which describes an algorithm to do part of what you want - quickly generate all subsets of N characters from length x to y. It contains an implementation in C.

For each subset, you'd still have to generate all the permutations. For instance if you wanted 3 characters from "abcde", this algorithm would give you "abc","abd", "abe"... but you'd have to permute each one to get "acb", "bac", "bca", etc.

AShelly
A: 

In Perl, if you want to restrict yourself to the lowercase alphabet, you can do this:

my @result = ("a" .. "zzzz");

This gives all possible strings between 1 and 4 characters using lowercase characters. For uppercase, change "a" to "A" and "zzzz" to "ZZZZ".

For mixed-case it gets much harder, and probably not doable with one of Perl's builtin operators like that.

Chris Lutz
+2  A: 

Some working Java code based on Sarp's answer:

public class permute {

    static void permute(int level, String permuted,
                    boolean used[], String original) {
        int length = original.length();
        if (level == length) {
            System.out.println(permuted);
        } else {
            for (int i = 0; i < length; i++) {
                if (!used[i]) {
                    used[i] = true;
                    permute(level + 1, permuted + original.charAt(i),
                       used, original);
                    used[i] = false;
                }
            }
        }
    }

    public static void main(String[] args) {
        String s = "hello";
        boolean used[] = {false, false, false, false, false};
        permute(0, "", used, s);
    }
}
Lazer
As a comment, note that for a string with repeated characters this will not produce the unique permutations. This could be solved with a hash, but that might be a problem with long strings.
Glenn
A: 

Here is a simple solution in C#.

It generates only the distinct permutations of a given string.

    static public IEnumerable<string> permute(string word)
    {
        if (word.Length > 1)
        {

            char character = word[0];
            foreach (string subPermute in permute(word.Substring(1)))
            {

                for (int index = 0; index <= subPermute.Length; index++)
                {
                    string pre = subPermute.Substring(0, index);
                    string post = subPermute.Substring(index);

                    if (post.Contains(character))
                            continue;                       

                    yield return pre + character + post;
                }

            }
        }
        else
        {
            yield return word;
        }
    }
Prakhar Gupta
A: 
import java.util.*;

public class all_subsets {
    public static void main(String[] args) {
        String a = "abcd";
        for(String s: all_perm(a)) {
            System.out.println(s);
        }
    }

    public static Set<String> concat(String c, Set<String> lst) {
        HashSet<String> ret_set = new HashSet<String>();
        for(String s: lst) {
            ret_set.add(c+s);
        }
        return ret_set;
    }

    public static HashSet<String> all_perm(String a) {
        HashSet<String> set = new HashSet<String>();
        if(a.length() == 1) {
            set.add(a);
        } else {
            for(int i=0; i<a.length(); i++) {
                set.addAll(concat(a.charAt(i)+"", all_perm(a.substring(0, i)+a.substring(i+1, a.length()))));
            }
        }
        return set;
    }
}
Swapneel Patil