views: 5877

answers: 10

I am looking to refactor a C# method into a C function in an attempt to gain some speed, and then call the C DLL from C# to allow my program to use the functionality.

Currently the C# method takes a list of integers and returns a list of lists of integers. The method calculates the power set of the integers, so an input of 3 ints would produce the following output (at this stage the values of the ints are not important, as they are used as an internal weighting value):

1
2
3
1,2
1,3
2,3
1,2,3

Where each line represents a list of integers. The output indicates the indices (with an offset of 1) into the input list, not the values. So 1,2 indicates that the elements at index 0 and 1 form one member of the power set.

I am unfamiliar with C, so what are my best options for data structures that will allow the C# code to access the returned data?

Thanks in advance

Update

Thank you all for your comments so far. Here is a bit of a background to the nature of the problem.

The iterative method for calculating the power set of a set is fairly straightforward: two loops and a bit of bit manipulation is all there is to it, really. It just gets called... a lot (in fact billions of times if the size of the set is big enough).

My thoughts around using C (or C++, as people have pointed out) are that it gives more scope for performance tuning. A direct port may not offer any increase, but it opens the way for more involved methods to get a bit more speed out of it. Even a small increase per iteration would equate to a measurable increase.

My idea was to port a direct version and then work to increase its speed, refactoring it over time (with help from everyone here at SO).

Update 2

Another fair point from jalf: I don't have to use List or equivalent. If there is a better way, I am open to suggestions. The only reason for the list was that each set of results is not the same size.

The code so far...

public List<List<int>> powerset(List<int> currentGroupList)
{
    //Count the objects in the group
    int count = currentGroupList.Count;
    int max = (int)Math.Pow(2, count);

    List<List<int>> outputList = new List<List<int>>();

    //outer loop: one iteration per subset
    for (int i = 0; i < max; i++)
    {
        List<int> currentSetList = new List<int>();

        //inner loop: bit j of i decides whether element j is included
        for (int j = 0; j < count; j++)
        {
            if ((i & (1 << j)) == 0)
            {
                currentSetList.Add(currentGroupList[j]);
            }
        }

        outputList.Add(currentSetList);
    }

    return outputList;
}

As you can see, not a lot to it. It just goes round and round a lot!

I accept that creating and building lists may not be the most efficient approach, but I need some way of providing the results back in a manageable form.

Update 3

Thanks for all the input and implementation work. Just to clarify a couple of points raised: I don't need the output to be in 'natural order', and I am not that interested in the empty set being returned.

hughdbrown's implementation is interesting, but I think that I will need to store the results (or at least a subset of them) at some point. It sounds like memory limitations will apply long before running time becomes a real issue. Partly because of this, I think I can get away with using bytes instead of integers, giving more potential storage.

The question really is then: have we reached the maximum speed for this calculation in C#? Does the option of unmanaged code provide any more scope? I know in many respects the question is somewhat futile, as even if we halved the running time, it would only allow an extra value or two in the original set.

+1  A: 

Does it have to be C, or is C++ an option too? If C++, you can just use the container types from the STL. Otherwise, you'll have to implement your own list - look up linked lists or dynamically sized arrays for pointers on how to do this.

Smokey Bacon Esq.
Don't use C++'s list - it is completely different. In C++, std::list is a linked list. The equivalent of C#'s List<T> is std::vector.
jalf
+7  A: 

If you are looking to use C for a performance gain, most likely you are planning to do so through the use of pointers. C# does allow for the use of pointers via the unsafe keyword. Have you considered that?
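To illustrate the kind of thing that's possible, here's a minimal sketch of my own (not the OP's method; requires compiling with /unsafe):

static unsafe int SumUnsafe(int[] values)
{
    int sum = 0;
    fixed (int* p = values)   // pin the array so the GC can't move it
    {
        int* end = p + values.Length;
        // Raw pointer walk: no per-element bounds checks
        for (int* cur = p; cur != end; cur++)
            sum += *cur;
    }
    return sum;
}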

Also, how will you be calling this code? Will it be called often (e.g. in a loop)? If so, marshalling the data back and forth may more than offset any performance gains.


Follow Up

Take a look at Native code without sacrificing .NET performance for some interop options. There are ways to interop without too much of a performance loss, but those interops can only happen with the simplest of data types.
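For what it's worth, the flat-array flavor of interop those options describe might look something like this on the C# side (a sketch only - the DLL name, entry point, and parameters are invented for illustration):

// Requires "using System.Runtime.InteropServices;".  The matching C
// prototype would be something like:
//   int fill_powerset(const int* input, int count, int* output, int capacity);
[DllImport("powerset.dll", CallingConvention = CallingConvention.Cdecl)]
static extern int fill_powerset(int[] input, int count, int[] output, int capacity);

Arrays of blittable types like int marshal as a pointer to the first element, so a call like this avoids any per-element copying.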

Though I still think that you should investigate speeding up your code using straight .NET.


Follow Up 2

Also, may I suggest that if you have your heart set on mixing native code and managed code, you create your library using C++/CLI. Below is a simple example. Note that I am not a C++/CLI guy, and this code doesn't do anything useful - it's just meant to show how easily you can mix native and managed code.

#include "stdafx.h"

using namespace System;

System::Collections::Generic::List<int> ^MyAlgorithm(System::Collections::Generic::List<int> ^sourceList);


int main(array<System::String ^> ^args)
{
    System::Collections::Generic::List<int> ^intList = gcnew System::Collections::Generic::List<int>();

    intList->Add(1);
    intList->Add(2);
    intList->Add(3);
    intList->Add(4);
    intList->Add(5);

    Console::WriteLine("Before Call");
    for each(int i in intList)
    {
        Console::WriteLine(i);
    }

    System::Collections::Generic::List<int> ^modifiedList = MyAlgorithm(intList);

    Console::WriteLine("After Call");
    for each(int i in modifiedList)
    {
        Console::WriteLine(i);
    }
}


System::Collections::Generic::List<int> ^MyAlgorithm(System::Collections::Generic::List<int> ^sourceList)
{
    int nativeIntArraySize = sourceList->Count;
    int* nativeInts = new int[nativeIntArraySize];

    //Managed to Native
    for(int i=0; i<nativeIntArraySize; i++)
    {
        nativeInts[i] = sourceList[i];
    }

    //Do Something to native ints
    for(int i=0; i<nativeIntArraySize; i++)
    {
        nativeInts[i]++;
    }

    //Native to Managed
    System::Collections::Generic::List<int> ^returnList = gcnew System::Collections::Generic::List<int>();
    for(int i=0; i<nativeIntArraySize; i++)
    {
        returnList->Add(nativeInts[i]);
    }

    //Don't leak the native allocation
    delete[] nativeInts;

    return returnList;
}
Giovanni Galbo
+9  A: 

Also, be sure that moving to C/C++ is really what you need to do for speed to begin with. Instrument the original C# method (standalone, executed via unit tests), instrument the new C/C++ method (again, standalone via unit tests) and see what the real world difference is.

The reason I bring this up is that I fear it may be a Pyrrhic victory -- using Smokey Bacon's advice, you get your list class, you're in "faster" C++, but there's still a cost to calling that DLL: bouncing out of the runtime with P/Invoke or COM interop carries a fairly substantial performance cost.

Be sure you're getting your "money's worth" out of that jump before you do it.

Update based on the OP's Update

If you're calling this loop repeatedly, you absolutely need to make sure that the entire loop logic is encapsulated in a single interop call -- otherwise the overhead of marshalling (as others here have mentioned) will definitely kill you.

I do think, given the description of the problem, that the issue isn't that C#/.NET is "slower" than C, but more likely that the code needs to be optimized. As another poster here mentioned, you can use pointers in C# to seriously boost performance in this kind of loop, without the need for marshalling. I'd look into that first, before jumping into a complex interop world, for this scenario.

John Rudy
+7  A: 

What makes you think you'll gain speed by calling into C code? C isn't magically faster than C#. It can be, of course, but it can also easily be slower (and buggier). Especially when you factor in the P/Invoke calls into native code, it's far from certain that this approach will speed up anything.

In any case, C doesn't have anything like List. It has raw arrays and pointers (and you could argue that int** is more or less equivalent), but you're probably better off using C++, which does have equivalent data structures. In particular, std::vector. There are no simple ways to expose this data to C#, however, since it will be scattered pretty much randomly (each list is a pointer to some dynamically allocated memory somewhere).

However, I suspect the biggest performance improvement comes from improving the algorithm in C#.

Edit:

I can see several things in your algorithm that seem suboptimal. Constructing a list of lists isn't free. Perhaps you can create a single list and use different offsets to represent each sublist. Or perhaps using 'yield return' and IEnumerable instead of explicitly constructing lists might be faster.
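To make the single-list idea concrete, here's a rough sketch (my illustration, not tested for speed; assumes count < 31):

// One flat buffer for all subset members, plus an offsets table.
// Subset k occupies items[offsets[k]] through items[offsets[k + 1]] - 1.
static void Powerset(List<int> input, out List<int> items, out List<int> offsets)
{
    int count = input.Count;
    int max = 1 << count;
    items = new List<int>();
    offsets = new List<int>();
    for (int i = 0; i < max; i++)
    {
        offsets.Add(items.Count);        // start of subset i
        for (int j = 0; j < count; j++)
            if ((i & (1 << j)) != 0)
                items.Add(input[j]);
    }
    offsets.Add(items.Count);            // sentinel: end of the last subset
}

A flat layout like this also happens to be trivial to marshal across an interop boundary, unlike a list of lists.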

Have you profiled your code, found out where the time is being spent?

jalf
+5  A: 

I'm also going to put in a vote for tuning up your C#, particularly by going to 'unsafe' code and losing what might be a lot of bounds-checking overhead.

Even though it's 'unsafe', it's no less 'safe' than C/C++, and it's dramatically easier to get right.

Will Dean
+2  A: 

Your list of results does not match the results your code would produce. In particular, you do not show generating the empty set.

If I were producing powersets that could have a few billion subsets, then generating each subset separately rather than all at once might cut down on your memory requirements, improving your code's speed. How about this:

static class PowerSet<T>
{
    // Precomputed single-bit masks: mask[m] == 1L << m
    static long[] mask = { 1L << 0, 1L << 1, 1L << 2, 1L << 3,
                           1L << 4, 1L << 5, 1L << 6, 1L << 7,
                           1L << 8, 1L << 9, 1L << 10, 1L << 11,
                           1L << 12, 1L << 13, 1L << 14, 1L << 15,
                           1L << 16, 1L << 17, 1L << 18, 1L << 19,
                           1L << 20, 1L << 21, 1L << 22, 1L << 23,
                           1L << 24, 1L << 25, 1L << 26, 1L << 27,
                           1L << 28, 1L << 29, 1L << 30, 1L << 31 };

    static public IEnumerable<IList<T>> powerset(T[] currentGroupList)
    {
        int count = currentGroupList.Length;
        long max = 1L << count;
        for (long iter = 0; iter < max; iter++)
        {
            // Size the subset by the number of bits set in iter
            int size = 0;
            for (long bits = iter; bits != 0; bits &= (bits - 1))
                size++;

            T[] list = new T[size];
            int k = 0, m = -1;
            // i &= (i - 1) clears the lowest set bit on each pass
            for (long i = iter; i != 0; i &= (i - 1))
            {
                while ((mask[++m] & i) == 0)
                    ;
                list[k++] = currentGroupList[m];
            }
            yield return list;
        }
    }
}

Then your client code looks like this:

static void Main(string[] args)
{
    int[] intList = { 1, 2, 3, 4 };
    foreach (IList<int> set in PowerSet<int>.powerset(intList))
    {
        foreach (int i in set)
            Console.Write("{0} ", i);
        Console.WriteLine();
    }
}

I'll even throw in a bit-twiddling algorithm with generic arguments for free. For added speed, you can wrap the powerset() inner loop in an unsafe block. It doesn't make much difference.

On my machine, this code is slightly slower than the OP's code until the sets are 16 or larger. However, all times to 16 elements are less than 0.15 seconds. At 23 elements, it runs in 64% of the time. The original algorithm does not run on my machine for 24 or more elements -- it runs out of memory.

This code takes 12 seconds to generate the power set for the numbers 1 to 24, omitting screen I/O time. That's 16 million-ish subsets in 12 seconds, or about 1.4 million per second. For a billion (which is what you quoted earlier), that would be about 760 seconds. How long do you think this should take?

hughdbrown
+2  A: 

Below is a C# algorithm that should be much faster (and use less memory) than the algorithm you posted. It doesn't use the neat binary trick yours uses, and as a result, the code is a good bit longer. It has a few more for loops than yours, and it might take stepping through it with the debugger a time or two to fully grok it. But it's actually a simpler approach, once you understand what it's doing.

As a bonus, the returned sets are in a more "natural" order. It would return subsets of the set {1 2 3} in the same order you listed them in your question. That wasn't a focus, but is a side effect of the algorithm used.

In my tests, I found this algorithm to be approximately 4 times faster than the algorithm you posted for a large set of 22 items (which was as large as I could go on my machine without excessive disk-thrashing skewing the results too much). One run of yours took about 15.5 seconds, and mine took about 3.6 seconds.

For smaller lists, the difference is less pronounced. For a set of only 10 items, yours ran 10,000 times in about 7.8 seconds, and mine took about 3.2 seconds. For sets with 5 or fewer items, they run close to the same time. With many iterations, yours runs a little faster.

Anyway, here's the code. Sorry it's so long; I tried to make sure I commented it well.

/* 
 * Made it static, because it shouldn't really use or modify state data.
 * Making it static also saves a tiny bit of call time, because it doesn't
 * have to receive an extra "this" pointer.  Also, accessing a local
 * parameter is a tiny bit faster than accessing a class member, because
 * dereferencing the "this" pointer is not free.
 * 
 * Made it generic so that the same code can handle sets of any type.
 */
static IList<IList<T>> PowerSet<T>(IList<T> set){
    if(set == null)
        throw new ArgumentNullException("set");

    /*
     * Caveat:
     * If set.Count > 30, this function pukes all over itself without so
     * much as wiping up afterwards.  Even for 30 elements, though, the
     * result set is about 68 GB (if "set" is comprised of ints).  24 or
     * 25 elements is a practical limit for current hardware.
     */
    int   setSize     = set.Count;
    int   subsetCount = 1 << setSize; // MUCH faster than (int)Math.Pow(2, setSize)
    T[][] rtn         = new T[subsetCount][];
    /* 
     * We don't really need dynamic list allocation.  We can calculate
     * in advance the number of subsets ("subsetCount" above), and
     * the size of each subset (0 through setSize).  The performance
     * of List<> is pretty horrible when the initial size is not
     * guessed well.
     */

    int subsetIndex = 0;
    for(int subsetSize = 0; subsetSize <= setSize; subsetSize++){
        /*
         * The "indices" array below is part of how we implement the
         * "natural" ordering of the subsets.  For a subset of size 3,
         * for example, we initialize the indices array with {0, 1, 2};
         * Later, we'll increment each index until we reach setSize,
         * then carry over to the next index.  So, assuming a set size
         * of 5, the second iteration will have indices {0, 1, 3}, the
         * third will have {0, 1, 4}, and the fifth will involve a carry,
         * so we'll have {0, 2, 3}.
         */
        int[] indices = new int[subsetSize];
        for(int i = 1; i < subsetSize; i++)
            indices[i] = i;

        /*
         * Now we'll iterate over all the subsets we need to make for the
         * current subset size.  The number of subsets of a given size
         * is easily determined with combination (nCr).  In other words,
         * if I have 5 items in my set and I want all subsets of size 3,
         * I need 5-pick-3, or 5C3 = 5! / 3!(5 - 3)! = 10.
         */
        for(int i = Combination(setSize, subsetSize); i > 0; i--){
            /*
             * Copy the items from the input set according to the
             * indices we've already set up.  Alternatively, if you
             * just wanted the indices in your output, you could
             * just dup the index array here (but make sure you dup!
             * Otherwise the setup step at the bottom of this for
             * loop will mess up your output list!  You'll also want
             * to change the function's return type to
             * IList<IList<int>> in that case.
             */
            T[] subset = new T[subsetSize];
            for(int j = 0; j < subsetSize; j++)
                subset[j] = set[indices[j]];

            /* Add the subset to the return */
            rtn[subsetIndex++] = subset;

            /*
             * Set up indices for next subset.  This looks a lot
             * messier than it is.  It simply increments the
             * right-most index until it overflows, then carries
             * over left as far as it needs to.  I've made the
             * logic as fast as I could, which is why it's hairy-
             * looking.  Note that the inner for loop won't
             * actually run as long as a carry isn't required,
             * and will run at most once in any case.  The outer
             * loop will go through as few iterations as required.
             * 
             * You may notice that this logic doesn't check the
             * end case (when the left-most digit overflows).  It
             * doesn't need to, since the loop up above won't
             * execute again in that case, anyway.  There's no
             * reason to waste time checking that here.
             */
            for(int j = subsetSize - 1; j >= 0; j--)
                if(++indices[j] <= setSize - subsetSize + j){
                    for(int k = j + 1; k < subsetSize; k++)
                        indices[k] = indices[k - 1] + 1;
                    break;
                }
        }
    }
    return rtn;
}

static int Combination(int n, int r){
    if(r == 0 || r == n)
        return 1;

    /*
     * The formula for combination is:
     *
     *       n!
     *   ----------
     *   r!(n - r)!
     *
     * We'll actually use a slightly modified version here.  The above
     * formula forces us to calculate (n - r)! twice.  Instead, we only
     * multiply for the numerator the factors of n! that aren't canceled
     * out by (n - r)! in the denominator.
     */

    /*
     * nCr == nC(n - r)
     * We can use this fact to reduce the number of multiplications we
     * perform, as well as the incidence of overflow, where r > n / 2
     */
    if(r > n / 2) /* We DO want integer truncation here (7 / 2 = 3) */
        r = n - r;

    /*
     * I originally used all integer math below, with some complicated
     * logic and another function to handle cases where the intermediate
     * results overflowed a 32-bit int.  It was pretty ugly.  In later
     * testing, I found that the more generalized double-precision
     * floating-point approach was actually *faster*, so there was no
     * need for the ugly code.  But if you want to see a giant WTF, look
     * at the edit history for this post!
     */

    double denominator = Factorial(r);
    double numerator   = n;
    while(--r > 0)
        numerator *= --n;

    return (int)(numerator / denominator + 0.1/* Deal with rounding errors. */);
}

/*
 * The archetypical factorial implementation is recursive, and is perhaps
 * the most often used demonstration of recursion in text books and other
 * materials.  It's unfortunate, however, that few texts point out that
 * it's nearly as simple to write an iterative factorial function that
 * will perform better (although tail-end recursion, if implemented by
 * the compiler, will help to close the gap).
 */
static double Factorial(int x){
    /*
     * An all-purpose factorial function would handle negative numbers
     * correctly - the result should be Sign(x) * Factorial(Abs(x)) -
     * but since we don't need that functionality, we're better off
     * saving the few extra clock cycles it would take.
     */

    /*
     * I originally used all integer math below, but found that the
     * double-precision floating-point version is not only more
     * general, but also *faster*!
     */

    if(x < 2)
        return 1;

    double rtn = x;
    while(--x > 1)
        rtn *= x;

    return rtn;
}
P Daddy
Okay, so that is some pretty good code. Here's what I don't understand: when you increase the problem size from n to n+1, the amount of data should double and the running time should double, too. That's not what I see: 18 takes 0.16 sec and 24 takes 0.97 sec. 64x the data takes 6 times as long.
hughdbrown
The data more than doubles. The number of subsets doubles, and the total number of members of those sets increases by (n + 1) * 2 / n, which means it quadruples as n goes from 1 to 2, triples as n goes from 2 to 3, and approaches a factor of two as n goes to infinity. (continued...)
P Daddy
Crap! I found a bug... Looks like my numerator in Combination is overflowing. I'll update in a while.
P Daddy
Okay, fixed the bug. I now get about an 80x speed difference between 16 and 22 (24 on my machine is too high), which is about the same as the difference in data size. Good catch, by the way.
P Daddy
Uh, P Daddy, your code continues to be wrong. You have not tested your Factorial() function since modifying it. Here are the results from 10! to 13!: 10: 3628800, 11: 39916800, 12: 479001600, 13: 1932053504. Notice that 13! should have at least as many 0's as 12!.
hughdbrown
Try your current code on set of size 26. It blows up because 26C13 is incorrect.
hughdbrown
Use this, eliminate Factorial(): static long Combination(long n, long r) { r = (r > n - r) ? (n - r) : r; if (r == 0) return 1; long result = 1; long k = 1; while (r-- > 0) { result *= n--; result /= k++; } return result; }
hughdbrown
??? Factorial(13) returns 6227020800. Combination(26, 13) returns 10400600. Both are correct values. What code are you testing?
P Daddy
I retried your float-based Factorial code. It generates the results you said. I have no idea how I was getting the results I cited earlier. Still, the Combination function I offered is faster. Please look at: http://www.iwebthereforeiam.com/files/TestYieldReturn.zip - yours is Powerset3.cs.
hughdbrown
Your latest code is fast and elegant. You should post it as a new answer. I want to point out a few things, though. (1) You changed my implementation in your tests. Your PowerSet3 is slower than my code above. My original is still slower than your PowerSet4, but only just. (continued...)
P Daddy
... (2) Your implementation seems to benefit most from not allocating all the memory for the entire power set. I changed yours to return IList<IList<T>> and it ran significantly slower. I haven't figured out exactly why, yet. (Note that it's still much faster than the OP's code.) (continued...)
P Daddy
... (3) In your output, not only are the sets not in "natural" order, but the order of the elements within each set is unpredictable (i.e., the set elements are not sorted). I'm not sure if this is important to the OP or not. (A true set has no order, so maybe not.) (continued...)
P Daddy
I must reiterate, though, your algorithm is really neat, and you should post it. Your idea of using yield return is really good and will save a lot of time and memory if the OP needs to visit each subset only once. My implementation requires 2**n * (8 + n * b / 2) storage (continued...)
P Daddy
...where n is the number of elements in the original set and b is the size in bytes of each element in the set (the 8 is an approximation of the overhead required for each array: 4 bytes for the count value and 4 bytes for the pointer to the array itself) (continued...)
P Daddy
...whereas yours peaks out at only about n * (20 + b * 2) (the 20 + b is approximate storage for your dictionary), so while mine would require about 68 GB to store an entire power set of 30 items 4 bytes large, yours would use less than 1 KB at any given time.
P Daddy
I changed your implementation to (1) use yield return for memory and (2) cached/reduced calculation of Combination. When I ran it, both my implementations were faster than yours when n>12 or so. By 26, it is about 40% faster than yours.
hughdbrown
A: 

I concur with the "optimize .NET first" opinion. It's the most painless. I imagine that if you wrote some unsafe .NET code using C# pointers, it'd be identical to C execution, except for the VM overhead.

Paul Nathan
A: 

P Daddy:

You could change your Combination() code to this:

 static long Combination(long n, long r)
 {
     r = (r > n - r) ? (n - r) : r;
     if (r == 0)
         return 1;
     long result = 1;
     long k = 1;
     while (r-- > 0)
     {
         result *= n--;
         result /= k++;
     }

     return result;
 }

This will reduce the multiplications and the chance of overflow to a minimum. (The running division is always exact: after the k-th pass, result is the product of k consecutive integers divided by k!, which is itself a binomial coefficient and hence an integer.)

hughdbrown
That's an interesting approach. It kind of misses the point, though. Performance is the main goal, not generality, and the performance of this is rather abysmal, I'm sorry to say. Also, it suffers from the same overflow problems with large inputs (it appears at n = 31 and r = 13). (... continued)
P Daddy
In testing this, though, I discovered that a much simpler implementation that uses doubles to store intermediate results, which I knew would be more general, is also faster! I'm going to change mine again.
P Daddy
Your latest revision is *much* better, but still about 3-4 times slower than the floating-point version in my post. You've traded multiplications for divisions, but divisions (even integer) are significantly slower than multiplications.
P Daddy
Okay, generate a table of combinations and put it in your code for lookup at runtime. See the Python code at http://www.iwebthereforeiam.com/files/combination_table.py
hughdbrown
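That table-lookup idea is easy to sketch in C# with Pascal's rule (an illustration only, sized by assumption for the n <= 32 range the long-based code handles):

// Precompute nCr with Pascal's rule so Combination(n, r) becomes an
// array lookup.  C(32, 16) is about 6.0e8, so long is ample here.
static readonly long[,] nCr = BuildCombinationTable(33);

static long[,] BuildCombinationTable(int size)
{
    long[,] table = new long[size, size];
    for (int n = 0; n < size; n++)
    {
        table[n, 0] = 1;
        for (int r = 1; r <= n; r++)
            table[n, r] = table[n - 1, r - 1] + table[n - 1, r];
    }
    return table;
}

A call such as Combination(26, 13) then becomes nCr[26, 13], with no multiplications or divisions at runtime.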
+3  A: 

This returns one set of a power set at a time. It is based on the Python code here. It works for power sets of over 32 elements. If you need fewer than 32, you can change long to int. It is pretty fast -- faster than my previous algorithm and faster than (my modified-to-use-yield-return version of) P Daddy's code.

static class PowerSet4<T>
{
    // Enumerates subsets in Gray code order: between i-1 and i, the only
    // bit that changes in the Gray code is the lowest set bit of i, so
    // each step adds or removes exactly one element.
    // Requires "using System.Linq;" for ToArray().
    static public IEnumerable<IList<T>> powerset(T[] currentGroupList)
    {
        int count = currentGroupList.Length;
        Dictionary<long, T> powerToIndex = new Dictionary<long, T>();
        long mask = 1L;
        for (int i = 0; i < count; i++)
        {
            powerToIndex[mask] = currentGroupList[i];
            mask <<= 1;
        }

        Dictionary<long, T> result = new Dictionary<long, T>();
        yield return result.Values.ToArray();   // the empty set

        long max = 1L << count;
        for (long i = 1L; i < max; i++)
        {
            long key = i & -i;   // lowest set bit of i
            if (result.ContainsKey(key))
                result.Remove(key);
            else
                result[key] = powerToIndex[key];
            yield return result.Values.ToArray();
        }
    }
}

You can download all the fastest versions I have tested here.

I really think that using yield return is the change that makes calculating large power sets possible. Allocating large amounts of memory upfront increases runtime dramatically and causes algorithms to fail for lack of memory very early on. The Original Poster should figure out how many sets of the power set he needs at once; holding all of them is not really an option with >24 elements.
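For example, a caller that only needs to visit each subset once can stream them without ever materializing the whole power set (Process() here is a placeholder for whatever the real per-subset work is):

foreach (IList<int> subset in PowerSet4<int>.powerset(new[] { 1, 2, 3 }))
    Process(subset);   // placeholder for the real per-subset work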

hughdbrown
I think you are correct that it is the yield return that makes the difference. I am very impressed by your tenacity in providing the fastest possible implementation, as well as the several very valid points made. Honourable mention to P Daddy as well, and to all who have provided input.
jheppinstall
P Daddy had a cool idea that I have not called out yet -- the allocation of fixed-size arrays that are the right length. In contrast, resizing Lists is expensive. My other code makes Lists of maximum length so as not to reallocate.
hughdbrown
If I were doing this I'd write a custom IList<T> class that wraps around an array whose size is fixed at currentGroupList.Length. You could easily get better performance than the dictionary-based implementation here.
Qwertie
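To make that suggestion concrete, here is a rough sketch of such a wrapper (an illustration, not Qwertie's code; members the power set loop never calls are stubbed out):

// Requires: using System; using System.Collections; using System.Collections.Generic;
class FixedSubset<T> : IList<T>
{
    readonly T[] items;   // capacity fixed up front, e.g. at currentGroupList.Length
    int count;

    public FixedSubset(int capacity) { items = new T[capacity]; }

    public void Add(T item) { items[count++] = item; }   // never reallocates
    public void Clear() { count = 0; }                   // reuse without reallocating
    public int Count { get { return count; } }
    public bool IsReadOnly { get { return false; } }

    public T this[int index]
    {
        get { return items[index]; }
        set { items[index] = value; }
    }

    public IEnumerator<T> GetEnumerator()
    {
        for (int i = 0; i < count; i++)
            yield return items[i];
    }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    // The power set code never calls these, so they are stubbed out.
    public int IndexOf(T item) { throw new NotSupportedException(); }
    public void Insert(int index, T item) { throw new NotSupportedException(); }
    public void RemoveAt(int index) { throw new NotSupportedException(); }
    public bool Contains(T item) { throw new NotSupportedException(); }
    public void CopyTo(T[] array, int arrayIndex) { Array.Copy(items, 0, array, arrayIndex, count); }
    public bool Remove(T item) { throw new NotSupportedException(); }
}

A producer could Clear() and refill one of these per subset (or allocate a fresh one per subset, still with no resizing), and consumers see an ordinary IList<T>.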