views:

988

answers:

9

What's the best algorithm to find all binary strings of length n that contain k bits set? For example, if n=4 and k=3, there are...

0111
1011
1101
1110

I need a good way to generate these given any n and any k so I'd prefer it to be done with strings.

+3  A: 

Python

import itertools

def kbits(n, k):
    result = []
    for bits in itertools.combinations(range(n), k):
        s = ['0'] * n
        for bit in bits:
            s[bit] = '1'
        result.append(''.join(s))
    return result

print kbits(4, 3)

Output: ['1110', '1101', '1011', '0111']

Explanation:

Essentially we need to choose the positions of the 1-bits. There are n choose k ways of choosing k bits among n total bits. itertools is a nice module that does this for us. itertools.combinations(range(n), k) will choose k bits from [0, 1, 2 ... n-1] and then it's just a matter of building the string given those bit indexes.

Since you aren't using Python, look at the pseudo-code for itertools.combinations here:

http://docs.python.org/library/itertools.html#itertools.combinations

Should be easy to implement in any language.

Here's a Java combination generator:

http://www.merriampark.com/comb.htm

FogleBird
Do you know of a language independent solution? This depends on python's itertools but my program is not written in python.
kevmo314
See my edit. The docs show how itertools.combinations is implemented. You can easily port it to whatever language you're using.
FogleBird
I added a link to a Java combination generator.
FogleBird
+1  A: 

One algorithm that should work:

generate-strings(prefix, len, numBits) -> String:
    if (len == 0):
        print prefix
        return
    if (len == numBits):
        print prefix + (len x "1")
    generate-strings(prefix + "0", len-1, numBits)
    generate-strings(prefix + "1", len-1, numBits)

Good luck!

Chip Uni
Ah, with a little modification, this algorithm works. Thanks! I'll post the modification in the original question.
kevmo314
Though, after consideration, this does produce a lot of dead branches in the tree. I'll have to test this with larger n values.
kevmo314
Ah, yeah it looks like the runtime for this algorithm will take too long for the data sets I need tested. I'm looking at n=32 and k=7 specifically, but I need the flexibility of scale for future tests.
kevmo314
FWIW, my algorithm runs in about 5 seconds for (32, 7)... 3.3 million combinations. And that's Python, Java will be faster.
FogleBird
A: 

One possible 1.5-liner:

$ python -c 'import itertools; \
             print set([ n for n in itertools.permutations("0111", 4)])'

set([('1', '1', '1', '0'), ('0', '1', '1', '1'), ..., ('1', '0', '1', '1')])

.. where k is the number of 1s in "0111".

The itertools module explains equivalents for its methods; see the equivalent for the permutation method.

The MYYN
Nice, but won't scale as well, especially when n is large and k is small-ish.
FogleBird
+2  A: 

Forget about implementation ("be it done with strings" is obviously an implementation issue!) -- think about the algorithm, for Pete's sake... just as in, your very first TAG, man!

What you're looking for is all combinations of K items out of a set of N (the indices, 0 to N-1 , of the set bits). That's obviously simplest to express recursively, e.g., pseudocode:

combinations(K, setN):
  if k > length(setN): return "no combinations possible"
  if k == 0: return "empty combination"
  # combinations INcluding the first item:
  return (first-item-of setN) combined combinations(K-1, all-but-first-of setN)
   union combinations(K-1, all-but-first-of setN)

i.e., the first item is either present or absent: if present, you have K-1 left to go (from the tail aka all-but-firs), if absent, still K left to go.

Pattern-matching functional languages like SML or Haskell may be best to express this pseudocode (procedural ones, like my big love Python, may actually mask the problem too deeply by including too-rich functionality, such as itertools.combinations, which does all the hard work for you and therefore HIDES it from you!).

What are you most familiar with, for this purpose -- Scheme, SML, Haskell, ...? I'll be happy to translate the above pseudocode for you. I can do it in languages such as Python too, of course -- but since the point is getting you to understand the mechanics for this homework assignment, I won't use too-rich functionality such as itertools.combinations, but rather recursion (and recursion-elimination, if needed) on more obvious primitives (such as head, tail, and concatenation). But please DO let us know what pseudocode-like language you're most familiar with! (You DO understand that the problem you state is identically equipotent to "get all combinations of K items out or range(N)", right?).

Alex Martelli
Dang, Alex -- you scared **me** with your vehemence. Would you like some chamomile tea?
Chip Uni
@Chip, "vehemence"?! You ain't seen nuttin' yet -- remember, I started out designing (digital) chips, so this kind of problems really stirs my Italian blood!-)
Alex Martelli
You love itertools and you know it.
FogleBird
Uh, first of all, this isn't a homework assignment. Second, I'm using Java but that really shouldn't matter. While itertools.combinations is a python specific solution, I suppose I can implement it in Java, but that's another potential source of redundancy in a program that already runs slower than I had intended. The execution time for the program is already in the range of days but I'm able to find the computing power to brute force it as this is an NP-complete problem. I just don't need it to be any longer.
kevmo314
The problem I'm referring to as NP-complete isn't this binary string problem, rather the matching preclusion problem that I'm trying to solve that requires this algorithm. Sorry.
kevmo314
+1  A: 

This C# method returns an enumerator that creates all combinations. As it creates the combinations as you enumerate them it only uses stack space, so it's not limited by memory space in the number of combinations that it can create.

This is the first version that I came up with. It's limited by the stack space to a length of about 2700:

static IEnumerable<string> BinStrings(int length, int bits) {
  if (length == 1) {
    yield return bits.ToString();
  } else {
    if (length > bits) {
      foreach (string s in BinStrings(length - 1, bits)) {
        yield return "0" + s;
      }
    }
    if (bits > 0) {
      foreach (string s in BinStrings(length - 1, bits - 1)) {
        yield return "1" + s;
      }
    }
  }
}

This is the second version, that uses a binary split rather than splitting off the first character, so it uses the stack much more efficiently. It's only limited by the memory space for the string that it creates in each iteration, and I have tested it up to a length of 10000000:

static IEnumerable<string> BinStrings(int length, int bits) {
  if (length == 1) {
    yield return bits.ToString();
  } else {
    int first = length / 2;
    int last = length - first;
    int low = Math.Max(0, bits - last);
    int high = Math.Min(bits, first);
    for (int i = low; i <= high; i++) {
      foreach (string f in BinStrings(first, i)) {
        foreach (string l in BinStrings(last, bits - i)) {
          yield return f + l;
        }
      }
    }
  }
}
Guffa
A: 

I would try recursion.

There are n digits with k of them 1s. Another way to view this is sequence of k+1 slots with n-k 0s distributed among them. That is, (a run of 0s followed by a 1) k times, then followed by another run of 0s. Any of these runs can be of length zero, but the total length needs to be n-k.

Represent this as an array of k+1 integers. Convert to a string at the bottom of the recursion.

Recursively call to depth n-k, a method that increments one element of the array before a recursive call and then decrements it, k+1 times.

At the depth of n-k, output the string.

int[] run = new int[k+1];

void recur(int depth) {
    if(depth == 0){
        output();
        return;
    }

    for(int i = 0; i < k + 1; ++i){
        ++run[i];
        recur(depth - 1);
        --run[i];
    }

public static void main(string[] arrrgghhs) {
    recur(n - k);
}

It's been a while since I have done Java, so there are probably some errors in this code, but the idea should work.

UncleO
A: 

Are strings faster than an array of ints? All the solutions prepending to strings probably result in a copy of the string at each iteration.

So probably the most efficient way would be an array of int or char that you append to. Java has efficient growable containers, right? Use that, if it's faster than string. Or if BigInteger is efficient, it's certainly compact, since each bit only takes a bit, not a whole byte or int. But then to iterate over the bits you need to & mask a bit, and bitshift the mask to the next bit position. So probably slower, unless JIT compilers are good at that these days.

I would post this a a comment on the original question, but my karma isn't high enough. Sorry.

Peter Cordes
+1  A: 

This method will generate all integers with exactly N '1' bits

http://www-graphics.stanford.edu/~seander/bithacks.html#NextBitPermutation

e.g. all pairs of bits in positions 0-4 (N=2):

[00011b, 00101b, 00110b, 01001b, 01010b, 01100b, 10001b, 10010b, 10100b, 11000b]
= [3, 5, 6, 9, 10, 12, 17, 18, 20, 24]

finnw
A: 

One problem with many of the standard solutions to this problem is that the entire set of strings is generated and then those are iterated through, which may exhaust the stack. It quickly becomes unwieldy for any but the smallest sets. In addition, in many instances, only a partial sampling is needed, but the standard (recursive) solutions generally chop the problem into pieces that are heavily biased to one direction (eg. consider all the solutions with a zero starting bit, and then all the solutions with a one starting bit).

In many cases, it would be more desireable to be able to pass a bit string (specifying element selection) to a function and have it return the next bit string in such a way as to have a minimal change (this is known as a Gray Code) and to have a representation of all the elements.

Donald Knuth covers a whole host of algorithms for this in his Fascicle 3A, section 7.2.1.3: Generating all Combinations.

There is an approach for tackling the iterative Gray Code algorithm for all ways of choosing k elements from n at http://answers.yahoo.com/question/index?qid=20081208224633AA0gdMl with a link to final PHP code listed in the comment (click to expand it) at the bottom of the page.

Csaba Gabor