ansaurus

Question

C data structure to mimic C#'s List<List<int>>?

Answer 1

+1 A:

Does it have to be C, or is C++ an option too? If C++, you can just its own list type from the STL. Otherwise, you'll have to implement your own list - look up linked lists or dynamically sized arrays for pointers on how to do this.

Smokey Bacon Esq. 2008-12-05 12:29:20

Don't use C++ list. It is completely different. In C++, list is a linked list. The equivalent of C#'s List<T> is std::vector.

jalf 2008-12-05 12:55:15

Answer 2

+7 A:

If you are looking to use C for a performance gain, most likely you are planning to do so through the use of pointers. C# does allow for use of pointers, using the unsafe keyword. Have you considered that?

Also how will you be calling this code.. will it be called often (e.g. in a loop?) If so, marshalling the data back and forth may more than offset any performance gains.

Follow Up

Take a look at Native code without sacrificing .NET performance for some interop options. There are ways to interop without too much of a performance loss, but those interops can only happen with the simplest of data types.

Though I still think that you should investigate speeding up your code using straight .NET.

Follow Up 2

Also, may I suggest that if you have your heart set on mixing native code and managed code, that you create your library using c++/cli. Below is a simple example. Note that I am not a c++/cli guy, and this code doesn't do anything useful...its just meant to show how easily you can mix native and managed code.

#include "stdafx.h"

using namespace System;

System::Collections::Generic::List<int> ^MyAlgorithm(System::Collections::Generic::List<int> ^sourceList);


int main(array<System::String ^> ^args)
{
    System::Collections::Generic::List<int> ^intList = gcnew System::Collections::Generic::List<int>();

    intList->Add(1);
    intList->Add(2);
    intList->Add(3);
    intList->Add(4);
    intList->Add(5);

    Console::WriteLine("Before Call");
    for each(int i in intList)
    {
     Console::WriteLine(i);
    }

    System::Collections::Generic::List<int> ^modifiedList = MyAlgorithm(intList);

    Console::WriteLine("After Call");
    for each(int i in modifiedList)
    {
     Console::WriteLine(i);
    }
}


System::Collections::Generic::List<int> ^MyAlgorithm(System::Collections::Generic::List<int> ^sourceList)
{
    int* nativeInts = new int[sourceList->Count];

    int nativeIntArraySize = sourceList->Count;

    //Managed to Native
    for(int i=0; i<sourceList->Count; i++)
    {
     nativeInts[i] = sourceList[i];
    }

    //Do Something to native ints
    for(int i=0; i<nativeIntArraySize; i++)
    {
     nativeInts[i]++;
    }


    //Native to Managed
    System::Collections::Generic::List<int> ^returnList = gcnew System::Collections::Generic::List<int>();
    for(int i=0; i<nativeIntArraySize; i++)
    {
     returnList->Add(nativeInts[i]);
    }


    return returnList;
}

Giovanni Galbo 2008-12-05 12:35:34

Answer 3

+9 A:

Also, be sure that moving to C/C++ is really what you need to do for speed to begin with. Instrument the original C# method (standalone, executed via unit tests), instrument the new C/C++ method (again, standalone via unit tests) and see what the real world difference is.

The reason I bring this up is that I fear it may be a pyrhhic victory -- using Smokey Bacon's advice, you get your list class, you're in "faster" C++, but there's still a cost to calling that DLL: Bouncing out of the runtime with P/Invoke or COM interop carries a fairly substantial performance cost.

Be sure you're getting your "money's worth" out of that jump before you do it.

Update based on the OP's Update

If you're calling this loop repeatedly, you need to absolutely make sure that the entire loop logic is encapsulated in a single interop call -- otherwise the overhead of marshalling (as others here have mentioned) will definitely kill you.

I do think, given the description of the problem, that the issue isn't that C#/.NET is "slower" than C, but more likely that the code needs to be optimized. As another poster here mentioned, you can use pointers in C# to seriously boost performance in this kind of loop, without the need for marshalling. I'd look into that first, before jumping into a complex interop world, for this scenario.

John Rudy 2008-12-05 12:36:17

Answer 4

+7 A:

What makes you think you'll gain speed by calling into C code? C isn't magically faster than C#. It can be, of course, but it can also easily be slower (and buggier). Especially when you factor in the p/invoke calls into native code, it's far from certain that this approach will speed up anything.

In any case, C doesn't have anything like List. It has raw arrays and pointers (and you could argue that int** is more or less equivalent), but you're probably better off using C++, which does have equivalent datastructures. In particular, std::vector. There are no simple ways to expose this data to C# however, since it will be scattered pretty much randomly (each list is a pointer to some dynamically allocated memory somewhere)

However, I suspect the biggest performance improvement comes from improving the algorithm in C#.

Edit:

I can see several things in your algorithm that seem suboptimal. Constructing a list of lists isn't free. Perhaps you can create a single list and use different offsets to represent each sublist. Or perhaps using 'yield return' and IEnumerable instead of explicitly constructing lists might be faster.

Have you profiled your code, found out where the time is being spent?

jalf 2008-12-05 13:08:02

Answer 5

+5 A:

I'm also going to put in a vote for tuning-up your C#, particularly by going to 'unsafe' code and losing what might be a lot of bounds-checking overhead.

Even though it's 'unsafe', it's no less 'safe' than C/C++, and it's dramatically easier to get right.

Will Dean 2008-12-05 13:17:07

Answer 6

+2 A:

Your list of results does not match the results your code would produce. In particular, you do not show generating the empty set.

If I were producing powersets that could have a few billion subsets, then generating each subset separately rather than all at once might cut down on your memory requirements, improving your code's speed. How about this:

static class PowerSet<T>
{
 static long[] mask = { 1L << 0, 1L << 1, 1L << 2, 1L << 3, 
         1L << 4, 1L << 5, 1L << 6, 1L << 7, 
         1L << 8, 1L << 9, 1L << 10, 1L << 11, 
         1L << 12, 1L << 13, 1L << 14, 1L << 15, 
         1L << 16, 1L << 17, 1L << 18, 1L << 19, 
         1L << 20, 1L << 21, 1L << 22, 1L << 23, 
         1L << 24, 1L << 25, 1L << 26, 1L << 27, 
         1L << 28, 1L << 29, 1L << 30, 1L << 31};
 static public IEnumerable<IList<T>> powerset(T[] currentGroupList)
 {
  int count = currentGroupList.Length;
  long max = 1L << count;
  for (long iter = 0; iter < max; iter++)
  {
   T[] list = new T[count];
   int k = 0, m = -1;
   for (long i = iter; i != 0; i &= (i - 1))
   {
    while ((mask[++m] & i) == 0)
     ;
    list[k++] = currentGroupList[m];
   }
   yield return list;
  }
 }
}

Then your client code looks like this:

 static void Main(string[] args)
 {
  int[] intList = { 1, 2, 3, 4 };
  foreach (IList<int> set in PowerSet<int>.powerset(intList))
  {
   foreach (int i in set)
    Console.Write("{0} ", i);
   Console.WriteLine();
  }
 }

I'll even throw in a bit-twiddling algorithm with templated arguments for free. For added speed, you can wrap the powerlist() inner loop in an unsafe block. It doesn't make much difference.

On my machine, this code is slightly slower than the OP's code until the sets are 16 or larger. However, all times to 16 elements are less than 0.15 seconds. At 23 elements, it runs in 64% of the time. The original algorithm does not run on my machine for 24 or more elements -- it runs out of memory.

This code takes 12 seconds to generate the power set for the numbers 1 to 24, omitting screen I/O time. That's 16 million-ish in 12 seconds, or about 1400K per second. For a billion (which is what you quoted earlier), that would be about 760 seconds. How long do you think this should take?

hughdbrown 2008-12-07 00:56:53

Jimmy 2008-12-08 17:21:14

hughdbrown 2008-12-08 21:19:44

Answer 7

+2 A:

Below is a C# algorithm that should be much faster (and use less memory) than the algorithm you posted. It doesn't use the neat binary trick yours uses, and as a result, the code is a good bit longer. It has a few more for loops than yours, and might take a time or two stepping through it with the debugger to fully grok it. But it's actually a simpler approach, once you understand what it's doing.

As a bonus, the returned sets are in a more "natural" order. It would return subsets of the set {1 2 3} in the same order you listed them in your question. That wasn't a focus, but is a side effect of the algorithm used.

In my tests, I found this algorithm to be approximately 4 times faster than the algorithm you posted for a large set of 22 items (which was as large as I could go on my machine without excessive disk-thrashing skewing the results too much). One run of yours took about 15.5 seconds, and mine took about 3.6 seconds.

For smaller lists, the difference is less pronounced. For a set of only 10 items, yours ran 10,000 times in about 7.8 seconds, and mine took about 3.2 seconds. For sets with 5 or fewer items, they run close to the same time. With many iterations, yours runs a little faster.

Anyway, here's the code. Sorry it's so long; I tried to make sure I commented it well.

/* 
 * Made it static, because it shouldn't really use or modify state data.
 * Making it static also saves a tiny bit of call time, because it doesn't
 * have to receive an extra "this" pointer.  Also, accessing a local
 * parameter is a tiny bit faster than accessing a class member, because
 * dereferencing the "this" pointer is not free.
 * 
 * Made it generic so that the same code can handle sets of any type.
 */
static IList<IList<T>> PowerSet<T>(IList<T> set){
    if(set == null)
        throw new ArgumentNullException("set");

    /*
     * Caveat:
     * If set.Count > 30, this function pukes all over itself without so
     * much as wiping up afterwards.  Even for 30 elements, though, the
     * result set is about 68 GB (if "set" is comprised of ints).  24 or
     * 25 elements is a practical limit for current hardware.
     */
    int   setSize     = set.Count;
    int   subsetCount = 1 << setSize; // MUCH faster than (int)Math.Pow(2, setSize)
    T[][] rtn         = new T[subsetCount][];
    /* 
     * We don't really need dynamic list allocation.  We can calculate
     * in advance the number of subsets ("subsetCount" above), and
     * the size of each subset (0 through setSize).  The performance
     * of List<> is pretty horrible when the initial size is not
     * guessed well.
     */

    int subsetIndex = 0;
    for(int subsetSize = 0; subsetSize <= setSize; subsetSize++){
        /*
         * The "indices" array below is part of how we implement the
         * "natural" ordering of the subsets.  For a subset of size 3,
         * for example, we initialize the indices array with {0, 1, 2};
         * Later, we'll increment each index until we reach setSize,
         * then carry over to the next index.  So, assuming a set size
         * of 5, the second iteration will have indices {0, 1, 3}, the
         * third will have {0, 1, 4}, and the fifth will involve a carry,
         * so we'll have {0, 2, 3}.
         */
        int[] indices = new int[subsetSize];
        for(int i = 1; i < subsetSize; i++)
            indices[i] = i;

        /*
         * Now we'll iterate over all the subsets we need to make for the
         * current subset size.  The number of subsets of a given size
         * is easily determined with combination (nCr).  In other words,
         * if I have 5 items in my set and I want all subsets of size 3,
         * I need 5-pick-3, or 5C3 = 5! / 3!(5 - 3)! = 10.
         */
        for(int i = Combination(setSize, subsetSize); i > 0; i--){
            /*
             * Copy the items from the input set according to the
             * indices we've already set up.  Alternatively, if you
             * just wanted the indices in your output, you could
             * just dup the index array here (but make sure you dup!
             * Otherwise the setup step at the bottom of this for
             * loop will mess up your output list!  You'll also want
             * to change the function's return type to
             * IList<IList<int>> in that case.
             */
            T[] subset = new T[subsetSize];
            for(int j = 0; j < subsetSize; j++)
                subset[j] = set[indices[j]];

            /* Add the subset to the return */
            rtn[subsetIndex++] = subset;

            /*
             * Set up indices for next subset.  This looks a lot
             * messier than it is.  It simply increments the
             * right-most index until it overflows, then carries
             * over left as far as it needs to.  I've made the
             * logic as fast as I could, which is why it's hairy-
             * looking.  Note that the inner for loop won't
             * actually run as long as a carry isn't required,
             * and will run at most once in any case.  The outer
             * loop will go through as few iterations as required.
             * 
             * You may notice that this logic doesn't check the
             * end case (when the left-most digit overflows).  It
             * doesn't need to, since the loop up above won't
             * execute again in that case, anyway.  There's no
             * reason to waste time checking that here.
             */
            for(int j = subsetSize - 1; j >= 0; j--)
                if(++indices[j] <= setSize - subsetSize + j){
                    for(int k = j + 1; k < subsetSize; k++)
                        indices[k] = indices[k - 1] + 1;
                    break;
                }
        }
    }
    return rtn;
}

static int Combination(int n, int r){
    if(r == 0 || r == n)
        return 1;

    /*
     * The formula for combination is:
     *
     *       n!
     *   ----------
     *   r!(n - r)!
     *
     * We'll actually use a slightly modified version here.  The above
     * formula forces us to calculate (n - r)! twice.  Instead, we only
     * multiply for the numerator the factors of n! that aren't canceled
     * out by (n - r)! in the denominator.
     */

    /*
     * nCr == nC(n - r)
     * We can use this fact to reduce the number of multiplications we
     * perform, as well as the incidence of overflow, where r > n / 2
     */
    if(r > n / 2) /* We DO want integer truncation here (7 / 2 = 3) */
        r = n - r;

    /*
     * I originally used all integer math below, with some complicated
     * logic and another function to handle cases where the intermediate
     * results overflowed a 32-bit int.  It was pretty ugly.  In later
     * testing, I found that the more generalized double-precision
     * floating-point approach was actually *faster*, so there was no
     * need for the ugly code.  But if you want to see a giant WTF, look
     * at the edit history for this post!
     */

    double denominator = Factorial(r);
    double numerator   = n;
    while(--r > 0)
        numerator *= --n;

    return (int)(numerator / denominator + 0.1/* Deal with rounding errors. */);
}

/*
 * The archetypical factorial implementation is recursive, and is perhaps
 * the most often used demonstration of recursion in text books and other
 * materials.  It's unfortunate, however, that few texts point out that
 * it's nearly as simple to write an iterative factorial function that
 * will perform better (although tail-end recursion, if implemented by
 * the compiler, will help to close the gap).
 */
static double Factorial(int x){
    /*
     * An all-purpose factorial function would handle negative numbers
     * correctly - the result should be Sign(x) * Factorial(Abs(x)) -
     * but since we don't need that functionality, we're better off
     * saving the few extra clock cycles it would take.
     */

    /*
     * I originally used all integer math below, but found that the
     * double-precision floating-point version is not only more
     * general, but also *faster*!
     */

    if(x < 2)
        return 1;

    double rtn = x;
    while(--x > 1)
        rtn *= x;

    return rtn;
}

P Daddy 2008-12-07 01:29:38

Okay, so that is some pretty good code.Here's what I don't understand: when you increase the problem size from n to n+1, the amount of data should double and the running time should double, too. That's not what I see: 18 takes 0.16 sec and 24 takes 0.97 sec. 64x the data takes 6 times as long.

hughdbrown 2008-12-07 06:03:23

The data more than doubles. The number of subsets doubles, and the total number of members of those sets increases by (n + 1) * 2 / n, which means it quadruples as n goes from 1 to 2, triples as n goes from 2 to 3, and approaches a factor of two as n goes to infinity. (continued...)

P Daddy 2008-12-07 06:36:11

Crap! I found a bug... Looks like my numerator in Combination is overflowing. I'll update in a while.

P Daddy 2008-12-07 07:04:57

Okay, fixed the bug. I now get about an 80x speed difference between 16 and 22 (24 on my machine is too high), which is about the same as the difference in data size. Good catch, by the way.

P Daddy 2008-12-07 10:06:39

Uh, P Daddy, your code continues to be wrong. You have not tested your Factorial() function since modifying it. Here are the results from 10! to 13!:10: 362880011: 3991680012: 47900160013: 1932053504Notice that 13! should have at least as many 0's as 12!.

hughdbrown 2008-12-07 20:29:59

Try your current code on set of size 26. It blows up because 26C13 is incorrect.

hughdbrown 2008-12-07 20:32:53

Use this, eliminate Factorial(): static long Combination(long n, long r) { r = (r > n - r) ? (n - r) : r; if (r == 0) return 1; long result = 1; long k = 1; while (r-- > 0) { result *= n--; result /= k++; } return result; }

hughdbrown 2008-12-07 20:48:42

??? Factorial(13) returns 6227020800. Combination(26, 13) returns 10400600. Both are correct values. What code are you testing?

P Daddy 2008-12-07 21:01:17

I retried your float-based Factorial code. It generates the results you said. I have no idea how I was getting the results I cited earlier. Still, the Combination function I offered is faster.Please look at:http://www.iwebthereforeiam.com/files/TestYieldReturn.zipYours is Powerset3.cs.

hughdbrown 2008-12-08 02:59:22

Your latest code is fast and elegant. You should post it as a new answer. I want to point out a few things, though. (1) You changed my implementation in your tests. Your PowerSet3 is slower than my code above. My original is still slower than your PowerSet4, but only just. (continued...)

P Daddy 2008-12-08 06:05:30

... (2) Your implementation seems to benefit most from not allocating all the memory for the entire power set. I changed yours to return IList<IList<T>> and it ran significantly slower. I haven't figured out exactly why, yet. (Note that it's still much faster than the OP's code.) (continued...)

P Daddy 2008-12-08 06:10:46

... (3) In your output, not only are the sets not in "natural" order, but the order of the elements within each set is unpredictable (i.e., the set elements are not sorted). I'm not sure if this is important to the OP or not. (A true set has no order, so maybe not.) (continued...)

P Daddy 2008-12-08 06:15:22

I must reiterate, though, your algorithm is really neat, and you should post it. Your idea of using yield return is really good and will save a lot of time and memory if the OP needs to visit each subset only once. My implementation requires 2**n * (8 + n * b / 2) storage (continued...)

P Daddy 2008-12-08 06:20:40

...where n is the number of elements in the original set and b is the size in bytes of each element in the set (the 8 is an approximation of the overhead required for each array: 4 bytes for the count value and 4 bytes for the pointer to the array itself) (continued...)

P Daddy 2008-12-08 06:23:35

...whereas yours peaks out at only about n * (20 + b * 2) (the 20 + b is approximate storage for your dictionary), so while mine would require about 68 GB to store an entire power set of 30 items 4 bytes large, yours would use less than 1 KB at any given time.

P Daddy 2008-12-08 06:47:08

I changed your implementation to (1) use yield return for memory and (2) cached/reduced calculation of Combination. When I ran it, both my implementations were faster than yours when n>12 or so. By 26, it is about 40% faster than yours.

hughdbrown 2008-12-08 15:37:42

Answer 8

A:

I concur with the "optimize .NET first" opinion. It's the most painless. I imagine that if you wrote some unmanaged .NET code using C# pointers, it'd be identical to C execution, except for the VM overhead.

Paul Nathan 2008-12-07 02:03:47

Answer 9

A:

P Daddy:

You could change your Combination() code to this:

 static long Combination(long n, long r)
 {
     r = (r > n - r) ? (n - r) : r;
     if (r == 0)
         return 1;
     long result = 1;
     long k = 1;
     while (r-- > 0)
     {
         result *= n--;
         result /= k++;
     }

     return result;
 }

This will reduce the multiplications and the chance of overflow to a minimum.

hughdbrown 2008-12-07 16:27:30

That's an interesting approach. It kind of misses the point, though. Performance is the main goal, not generality, and the performance of this is rather abysmal, I'm sorry to say. Also, it suffers from the same overflow problems with large inputs (It appears at n = 31 and r = 13). (... continued)

P Daddy 2008-12-07 18:27:15

In testing this, though, I discovered that a much simpler implementation that uses doubles to store intermediate results, which I knew would be more general, is also faster! I'm going to change mine again.

P Daddy 2008-12-07 18:30:08

Your latest revision is *much* better, but a still about 3-4 times slower than floating-point version in my post. You've traded multiplications for divisions, but divisions (even integer) are significantly slower than multiplications.

P Daddy 2008-12-08 05:52:07

Okay, generate a table of combinations and put it in your code for lookup at runtime.See python code http://www.iwebthereforeiam.com/files/combination_table.py

hughdbrown 2008-12-08 22:20:20

Answer 10

+3 A:

This returns one set of a powerset at a time. It is based on python code here. It works for powersets of over 32 elements. If you need fewer than 32, you can change long to int. It is pretty fast -- faster than my previous algorithm and faster than (my modified to use yield return version of) P Daddy's code.

static class PowerSet4<T>
{
 static public IEnumerable<IList<T>> powerset(T[] currentGroupList)
 {
  int count = currentGroupList.Length;
  Dictionary<long, T> powerToIndex = new Dictionary<long, T>();
  long mask = 1L;
  for (int i = 0; i < count; i++)
  {
   powerToIndex[mask] = currentGroupList[i];
   mask <<= 1;
  }

  Dictionary<long, T> result = new Dictionary<long, T>();
  yield return result.Values.ToArray();

  long max = 1L << count;
  for (long i = 1L; i < max; i++)
  {
   long key = i & -i;
   if (result.ContainsKey(key))
    result.Remove(key);
   else
    result[key] = powerToIndex[key];
   yield return result.Values.ToArray();
  }
 }
}

You can download all the fastest versions I have tested here.

I really think that using yield return is the change that makes calculating large powersets possible. Allocating large amounts of memory upfront increases runtime dramatically and causes algorithms to fail for lack of memory very early on. Original Poster should figure out how many sets of a powerset he needs at once. Holding all of them is not really an option with >24 elements.

hughdbrown 2008-12-08 15:49:04

I think you are correct that it is the yeild that makes the difference. I am very impressed by your tenacity to provide the fastest possible implementation, as well as several very valid points made. Honourable mention to P Daddy as well, and to all who have provided input.

jheppinstall 2008-12-08 21:01:34

P Daddy had a cool idea that I have not called out yet -- the allocation of fixed-size arrays that are the right length. In contrast, resizing Lists is expensive. My other code makes Lists of maximum length so as not to reallocate.

hughdbrown 2008-12-08 21:13:55

If I were doing this I'd write a custom IList<T> class that wraps around a array whose size is fixed at currentGroupList.Length. You could easily get better performance than the dictionary-based implementation here.

Qwertie 2008-12-24 16:33:51

ansaurus

tags:

views:

answers:

C data structure to mimic C#'s List<List<int>>?

related questions