views:

1060

answers:

19

This is a part of a program that analyzes the odds of poker, specifically Texas Hold'em. I have a program I'm happy with, but it needs some small optimizations to be perfect.

I use this type (among others, of course):

  type
    T7Cards = array[0..6] of integer;

There are two things about this array that may be important when deciding how to sort it:

  1. Every item is a value from 0 to 51. No other values are possible.
  2. There are no duplicates. Never.

With this information, what is the absolutely fastest way to sort this array? I use Delphi, so pascal code would be the best, but I can read C and pseudo, albeit a bit more slowly :-)

At the moment I use quicksort, but the funny thing is that this is almost no faster than bubblesort! Possible because of the small number of items. The sorting counts for almost 50% of the total running time of the method.

EDIT:

Mason Wheeler asked why it's necessary to optimize. One reason is that the method will be called 2118760 times.

Basic poker information: All players are dealt two cards (the pocket) and then five cards are dealt to the table (the 3 first are called the flop, the next is the turn and the last is the river. Each player picks the five best cards to make up their hand)

If I have two cards in the pocket, P1 and P2, I will use the following loops to generate all possible combinations:

for C1 := 0 to 51-4 do
  if (C1<>P1) and (C1<>P2) then
     for C2 := C1+1 to 51-3 do
       if (C2<>P1) and (C2<>P2) then
         for C3 := C2+1 to 51-2 do
           if (C3<>P1) and (C3<>P2) then
             for C4 := C3+1 to 51-1 do
               if (C4<>P1) and (C4<>P2) then
                 for C5 := C4+1 to 51 do
                   if (C5<>P1) and (C5<>P2) then
                   begin
                     //This code will be executed 2 118 760 times
                     inc(ComboCounter[GetComboFromCards([P1,P2,C1,C2,C3,C4,C5])]);
                   end;

As I write this I notice one thing more: The last five elements of the array will always be sorted, so it's just a question of putting the first two elements in the right position in the array. That should simplify matters a bit.

So, the new question is: What is the fastest possible way to sort an array of 7 integers when the last 5 elements are already sorted. I believe this could be solved with a couple (?) of if's and swaps :-)

+13  A: 

For a very small set, insertion sort can usually beat quicksort because it has very low overhead.

WRT your edit, if you're already mostly in sort order (last 5 elements are already sorted), insertion sort is definitely the way to go. In an almost-sorted set of data, it'll beat quicksort every time, even for large sets. (Especially for large sets! This is insertion sort's best-case scenario and quicksort's worst case.)

Mason Wheeler
+7  A: 

Don't know how you are implementing this, but what you could do is have an array of 52 instead of 7, and just insert the card in its slot directly when you get it since there can never be duplicates, that way you never have to sort the array. This might be faster depending on how its used.

JRL
In fact it could even be an array of bits, which would fit in single 64-bit integer and consume less memory.
Filip Navara
Not really. Instead of trying to look through a 7-element array, he'll instead look through 52-element array. Same for bits within a single integer.
Pavel Shved
@Pavel Shved: yes it would mean looking at a 52 element array - but this array would never have to be sorted, you would get that for free. That's why I said it *might* be faster depending on how the code is.
JRL
Avoiding the sort may be the best way to speed up the program.
Nosredna
(+1) Just to be fancy. Array implementation of binary tree that can be traversed in in-order can be used. http://en.wikipedia.org/wiki/Binary_tree
TheMachineCharmer
@Pavel Shved, storing it as bits will not make the sort faster, but it may be better representation overall for the program. Bit operations are generally much faster than accessing the 7 element array and it can easily fit into processor registers. -- Actually on second thought, it may make the sort faster since "find first set" is natively implemented on almost any processor, so even creating the bit array and reconstructing the normal array could be VERY FAST.
Filip Navara
Actually, if you do have to sort all 7 then this is the fastest way, particulary if you know how to use scatter-gather bit-masks to do the re-collection (reduces the scan order from 52 to 7+7). However, if 5 are already sorted then the fastest is to sort the other two (O(2.5)) and do a merge on the two lists (O(k*(5+2)) where k is usually around 4 or 5.
RBarryYoung
A: 

Take a look at this:

http://en.wikipedia.org/wiki/Sorting%5Falgorithm

You would need to pick one that will have a stable worst case cost...

Another option could be to keep the array sorted the whole time, so an addition of a card would keep the array sorted automatically, that way you could skip to sorting...

Heiko Hatzfeld
+1  A: 

For 7 elements, there are only few options. You can easily write a generator that produces method to sort all possible combinations of 7 elements. Something like this method for 3 elements:

if a[1] < a[2] {
    if a[2] < a[3] {
        // nothing to do, a[1] < a[2] < a[3]
    } else {
         if a[1] < a[3] {
             // correct order should be a[1], a[3], a[2]
             swap a[2], a[3]
         } else {
             // correct order should be a[3], a[1], a[2]
             swap a[2], a[3]
             swap a[1], a[3]
         }
    }
} else {
    // here we know that a[1] >= a[2]
    ...
}

Of course method for 7 elements will be bigger, but it's not that hard to generate.

Peter Štibraný
"will be bigger". Hmm. 3! is 6, 7! is 5040.
Steve Jessop
That's why I suggested generating it, and not writing manually :-) And it will always use minimum number of comparisons and swap operations.
Peter Štibraný
...it will be very bad on processor cache.
Filip Navara
I don't know much about processor instruction cache to argue :-), but as you can see, deeper you go, more local the jumps are (at least in pseudocode ;-)), so after few cache misses (few 'else' branches), rest should be cached already. But it depends on how code is compiled down to instructions.
Peter Štibraný
5040 is smaller than 2^13, so you should ideally be able to find the right permutation with 13 binary branches. And 5040 elements should easily fit into a modern CPU's cache.
nikie
A: 

What JRL is referring to is a bucket sort. Since you have a finite discrete set of possible values, you can declare 52 buckets and just drop each element in a bucket in O(1) time. Hence bucket sort is O(n). Without the guarantee of a finite number of different elements, the fastest theoretical sort is O(n log n) which things like merge sort an quick sort are. It's just a balance of best and worst case scenarios then.

But long answer short, use bucket sort.

Tesserex
As I already noted, bucket sort is not O(n), but O(n+m), where m is the number of values the elements in the array given can have. That's not applicable in this case.
Pavel Shved
Asymptotic performance is utterly irrelevant to the question, which fixes n at 7.
Steve Jessop
And if you represent your buckets using bits, you've the same sort as some of the other answers here.
Pete Kirkham
+2  A: 

Use min-sort. Search for minimal and maximal element at once and place them into resultant array. Repeat three times. (EDIT: No, I won't try to measure the speed theoretically :_))

var
cards,result: array[0..6] of integer;
i,min,max: integer;

begin
   n=0;
   while (n<3) do begin
      min:=-1;
      max:=52;
      for i from 0 to 6 do begin
          if cards[i]<min then min:=cards[i]
          else if cards[i]>max then max:=cards[i]
      end
      result[n]:=min;
      result[6-n]:=max;
      inc(n);
   end
   for i from 0 to 6 do 
       if (cards[i]<52) and (cards[i]>=0) then begin
           result[3] := cards[i];
           break;
       end
    { Result is sorted here! }
end
Pavel Shved
A: 

If you like the above mentioned suggestion to keep a 52 element array which always keeps your array sorted, then may be you could keep another list of 7 elements which would reference the 7 valid elements in the 52 element array. This way we can even avoid parsing the 52 element array.

I guess for this to be really efficient, we would need to have a linked list type of structure which be supports operations: InsertAtPosition() and DeleteAtPosition() and be efficient at that.

Trainee4Life
+4  A: 

There are only 5040 permutations of 7 elements. You can programmaticaly generate a program that finds the one represented by your input in a minimal number of comparisons. It will be a big tree of if-then-else instructions, each comparing a fixed pair of nodes, for example if (a[3]<=a[6]).

The tricky part is deciding which 2 elements to compare in a particular internal node. For this, you have to take into account the results of comparisons in the ancestor nodes from root to the particular node (for example a[0]<=a[1], not a[2]<=a[7], a[2]<=a[5]) and the set of possible permutations that satisfy the comparisons. Compare the pair of elements that splits the set into as equal parts as possible (minimize the size of the larger part).

Once you have the permutation, it is trivial to sort it in a minimal set of swaps.

Rafał Dowgird
+1  A: 

The code below is close to optimal. It could be made better by composing a list to be traversed while making the tree, but I'm out of time right now. Cheers!

object Sort7 {
  def left(i: Int) = i * 4
  def right(i: Int) = i * 4 + 1
  def up(i: Int) = i * 4 + 2
  def value(i: Int) = i * 4 + 3

  val a = new Array[Int](7 * 4)
  def reset = {
    0 until 7 foreach { 
      i => {
        a(left(i)) = -1
        a(right(i)) = -1
        a(up(i)) = -1
        a(value(i)) = scala.util.Random.nextInt(52)
      }
    }
  }

  def sortN(i : Int) {
    var index = 0
    def getNext = if (a(value(i)) < a(value(index))) left(index) else right(index)
    var next = getNext
    while(a(next) != -1) {
      index = a(next)
      next = getNext
    }
    a(next) = i
    a(up(i)) = index
  }

  def sort = 1 until 7 foreach (sortN(_))

  def print {
    traverse(0)
    def traverse(i: Int): Unit = {
      if (i != -1) {
        traverse(a(left(i)))
        println(a(value(i)))
        traverse(a(right(i)))
      }
    }
  }
}
Daniel
+1  A: 

In pseudo code:

int64 temp = 0;
int index, bit_position;

for index := 0 to 6 do
   temp |= 1 << cards[index];

for index := 0 to 6 do
begin
   bit_position = find_first_set(temp);
   temp &= ~(1 << bit_position);
   cards[index] = bit_position;
end;

It's an application of bucket sort, which should generally be faster than any of the comparison sorts that were suggested.

Note: The second part could also be implemented by iterating over bits in linear time, but in practice it may not be faster:

index = 0;
for bit_position := 0 to 51 do
begin
   if (temp & (1 << bit_position)) > 0 then
   begin
      cards[index] = bit_position;
      index++;
   end;
end;
Filip Navara
A: 

There are a lot of loops in the answers. Given his speed requirement and the tiny size of the data set I would not do ANY loops.

I have not tried it but I suspect the best answer is a fully unrolled bubble sort. It would also probably gain a fair amount of advantage from being done in assembly.

I wonder if this is the right approach, though. How are you going to analyze a 7 card hand?? I think you're going to end up converting it to some other representation for analysis anyway. Would not a 4x13 array be a more useful representation? (And it would render the sorting issue moot, anyway.)

Loren Pechtel
A: 

Considering that last 5 elements are always sorted:


for i := 0 to 1 do begin
  j := i;
  x := array[j];
  while (j+1 <= 6) and (array[j+1] < x) do begin
    array[j] := array[j+1];
    inc(j);
  end;
  array[j] := X;
end;
A: 

bubble sort is your friend. Other sorts have too many overhead codes and not suitable for small number of elements

Cheers

+4  A: 

Since the last 5 items are already sorted, the code can be written just to reposition the first 2 items. Since you're using Pascal, I've written and tested a sorting algorithm that can execute 2,118,760 times in about 62 milliseconds.

procedure SortT7Cards(var Cards: T7Cards);
const
  CardsLength = Length(Cards);
var
  I, J, V: Integer;
  V1, V2: Integer;
begin
  // Last 5 items will always be sorted, so we want to place the first two into
  // the right location.
  V1 := Cards[0];
  V2 := Cards[1];
  if V2 < V1 then
  begin
    I := V1;
    V1 := V2;
    V2 := I;
  end;

  J := 0;
  I := 2;
  while I < CardsLength do
  begin
    V := Cards[I];
    if V1 < V then
    begin
      Cards[J] := V1;
      Inc(J);
      Break;
    end;
    Cards[J] := V;
    Inc(J);
    Inc(I);
  end;
  while I < CardsLength do
  begin
    V := Cards[I];
    if V2 < V then
    begin
      Cards[J] := V2;
      Break;
    end;
    Cards[J] := V;
    Inc(J);
    Inc(I);
  end;
  if J = (CardsLength - 2) then
  begin
    Cards[J] := V1;
    Cards[J + 1] := V2;
  end
  else if J = (CardsLength - 1) then
  begin
    Cards[J] := V2;
  end;
end;
Jon Benedicto
+2  A: 

This is the fastest method: since the 5-card list is already sorted, sort the two-card list (a compare & swap), and then merge the two lists, which is O(k * (5+2). In this case (k) will normally be 5: the loop test(1), the compare(2), the copy(3), the input-list increment(4) and the output list increment(5). That's 35 + 2.5. Throw in loop initialization and you get 41.5 statements, total.

You could also unroll the loops which would save you maybe 8 statements or execution, but make the whole routine about 4-5 times longer which may mess with your instruction cache hit ratio.

Given P(0 to 2), C(0 to 5) and copying to H(0 to 6) with C() already sorted (ascending):

If P(0) > P(1) Then
    // Swap:
    T = P(0)
    P(0) = P(1)
    P(1) = T
    // 1stmt + (3stmt * 50%) = 2.5stmt
End

P(2), C(5) = 53    \\ Note these are end-of-list flags
k = 0     \\ P() index
J = 0     \\ H() index
i = 0     \\ C() index
// 4 stmt

Do While (j) < 7 
    If P(k) < C(I) then
        H(j) = P(k)
        k = k+1
    Else
        H(j) = C(i)
        j = j+1
    End if
    j = j+1
    // 5stmt * 7loops = 35stmt
Loop

And note that this is faster than the other algorithm that would be "fastest" if you had to truly sort all 7 cards: use a bit-mask(52) to map & bit-set all 7 cards into that range of all possible 52 cards (the bit-mask), and then scan the bit-mask in order looking for the 7 bits that are set. That takes 60-120 statements at best (but is still faster than any other sorting approach).

RBarryYoung
+1  A: 

Assuming that you need an array of cards at the end of it.

Map the original cards to bits in a 64 bit integer ( or any integer with >= 52 bits ).

If during the initial mapping the array is sorted, don't change it.

Partition the integer into nibbles - each will correspond to values 0x0 to 0xf.

Use the nibbles as indices to corresponding sorted sub-arrays. You'll need 13 sets of 16 sub-arrays ( or just 16 sub-arrays and use a second indirection, or do the bit ops rather than looking the answer up; what is faster will vary by platform ).

Concatenate the non-empty sub-arrays into the final array.

You could use larger than nibbles if you want; bytes would give 7 sets of 256 arrays and make it more likely that the non-empty arrays require concatenating.

This assumes that branches are expensive and cached array accesses cheap.

#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

// for general case of 7 from 52, rather than assuming last 5 sorted
uint32_t card_masks[16][5] = {
    { 0, 0, 0, 0, 0 },
    { 1, 0, 0, 0, 0 },
    { 2, 0, 0, 0, 0 },
    { 1, 2, 0, 0, 0 },
    { 3, 0, 0, 0, 0 },
    { 1, 3, 0, 0, 0 },
    { 2, 3, 0, 0, 0 },
    { 1, 2, 3, 0, 0 },
    { 4, 0, 0, 0, 0 },
    { 1, 4, 0, 0, 0 },
    { 2, 4, 0, 0, 0 },
    { 1, 2, 4, 0, 0 },
    { 3, 4, 0, 0, 0 },
    { 1, 3, 4, 0, 0 },
    { 2, 3, 4, 0, 0 },
    { 1, 2, 3, 4, 0 },
};

void sort7 ( uint32_t* cards) {
    uint64_t bitset = ( ( 1LL << cards[ 0 ] ) | ( 1LL << cards[ 1LL ] ) | ( 1LL << cards[ 2 ] ) | ( 1LL << cards[ 3 ] ) | ( 1LL << cards[ 4 ] ) | ( 1LL << cards[ 5 ] ) | ( 1LL << cards[ 6 ] ) ) >> 1;

    uint32_t*   p    = cards;
    uint32_t    base = 0;

    do {
        uint32_t* card_mask = card_masks[ bitset & 0xf ];

        // you might remove this test somehow, as well as unrolling the outer loop
        // having separate arrays for each nibble would save 7 additions and the increment of base
        while ( *card_mask )
            *(p++) = base + *(card_mask++);

        bitset >>= 4;
        base += 4;
    } while ( bitset );
}

void print_cards ( uint32_t* cards ) {
    printf ( "[ %d %d %d %d %d %d %d ]\n", cards[0], cards[1], cards[2], cards[3], cards[4], cards[5], cards[6] );
}

int main ( void ) {
    uint32_t cards[7] = { 3, 9, 23, 17, 2, 42, 52 };

    print_cards ( cards );
    sort7 ( cards );
    print_cards ( cards );

    return 0;
}
Pete Kirkham
+4  A: 

I don't know that much about Texas Hold'em: Does it matter what suit P1 and P2 are, or does it only matter if they are of the same suit or not? If only suit(P1)==suit(P2) matters, then you could separate the two cases, you have only 13x12/2 different possibilities for P1/P2, and you can easily precalculate a table for the two cases.

Otherwise, I would suggest something like this:

(* C1 < C2 < P1 *)
for C1:=0 to P1-2 do 
   for C2:=C1+1 to P1-1 do 
      Cards[0] = C1;
      Cards[1] = C2;
      Cards[2] = P1;
      (* generate C3...C7 *)

(* C1 < P1 < C2 *)
for C1:=0 to P1-1 do 
   for C2:=P1+1 to 51 do 
      Cards[0] = C1;
      Cards[1] = P1;
      Cards[2] = C2;
      (* generate C3...C7 *)

(* P1 < C1 < C2 *)
for C1:=P1+1 to 51 do 
   for C2:=C1+1 to 51 do 
      Cards[0] = P1;
      Cards[1] = C1;
      Cards[2] = C2;
      (* generate C3...C7 *)

(this is just a demonstration for one card P1, you would have to expand that for P2, but I think that's straightforward. Although it'll be a lot of typing...) That way, the sorting doesn't take any time at all. The generated permutations are already ordered.

nikie
By far the most useful answer, even though it doesn't address the actual question.By implementing two tables (one for flush combos, on for other combos), each taking about 8Mb of memory, the procedure went fram about 16 seconds to 3,5. Way to go! And the best part is that the initialization of the tables was also very fast, since I could discard information about suit.(for the curious, the tables are gPokerCombosWithFlush : array[0..12,0..12,0..12,0..12,0..12] of TPokerCombo; gPokerCombosWithoutFlush : array[0..12,0..12,0..12,0..12,0..12] of TPokerCombo;)
Svein Bringsli
Did you even try insertion sort?
FogleBird
@FogleBird: If you create the combinations in ordered in the first place, you don't have to sort at all. Even insertion sort can't beat O(0) ;-)
nikie
A: 

Here is your basic O(n) sort. I'm not sure how it compares to the others. It uses unrolled loops.

char card[7]; // the original table of 7 numbers in range 0..51
char table[52]; // workspace

// clear the workspace
memset(table, 0, sizeof(table));

// set the 7 bits corresponding to the 7 cards
table[card[0]] = 1;
table[card[1]] = 1;
...
table[card[6]] = 1;

// read the cards back out
int j = 0;
if (table[0]) card[j++] = 0;
if (table[1]) card[j++] = 1;
...
if (table[51]) card[j++] = 51;
Mike Dunlavey
A: 

For seven numbers, the most efficient algorithm that exists with regards to the number of comparisons is Ford-Johnson's. In fact, wikipedia references a paper, easily found on google, that claims Ford-Johnson's the best for up to 47 numbers. Unfortunately, references to Ford-Johnson's aren't all that easy to found, and the algorithm uses some complex data structures.

It appears on The Art Of Computer Programming, Volume 3, by Donald Knuth, if you have access to that book.

There's a paper which describes FJ and a more memory efficient version here.

At any rate, because of the memory overhead of that algorithm, I doubt it would be worth your while for integers, as the cost of comparing two integers is rather cheap compared to the cost of allocating memory and manipulating pointers.

Now, you mentioned that 5 cards are already sorted, and you just need to insert two. You can do this with insertion sort most efficiently like this:

Order the two cards so that P1 > P2
Insert P1 going from the high end to the low end
(list) Insert P2 going from after P1 to the low end
(array) Insert P2 going from the low end to the high end

How you do that will depend on the data structure. With an array you'll be swapping each element, so place P1 at 1st, P2 and 7th (ordered high to low), and then swap P1 up, and then P2 down. With a list, you just need to fix the pointers as appropriate.

However once more, because of the particularity of your code, it really is best if you follow nikie suggestion and just generate the for loops apropriately for every variation in which P1 and P2 can appear in the list.

For example, sort P1 and P2 so that P1 < P2. Let's make Po1 and Po2 the position from 0 to 6, of P1 and P2 on the list. Then do this:

Loop Po1 from 0 to 5
Loop Po2 from Po1 + 1 to 6
If (Po2 == 1) C1start := P2 + 1; C1end := 51 - 4
If (Po1 == 0 && Po2 == 2) C1start := P1+1; C1end := P2 - 1
If (Po1 == 0 && Po2 > 2) C1start := P1+1; C1end := 51 - 5
If (Po1 > 0) C1start := 0; C1end := 51 - 6
for C1 := C1start to C1end
  // Repeat logic to compute C2start and C2end
  // C2 can begin at C1+1, P1+1 or P2+1
  // C2 can finish at P1-1, P2-1, 51 - 3, 51 - 4 or 51 -5
  etc

You then call a function passing Po1, Po2, P1, P2, C1, C2, C3, C4, C5, and have this function return all possible permutations based on Po1 and Po2 (that's 36 combinations).

Personally, I think that's the fastest you can get. You completely avoid having to order anything, because the data will be pre-ordered. You incur in some comparisons anyway to compute the starts and ends, but their cost is minimized as most of them will be on the outermost loops, so they won't be repeated much. And they can even be more optimized at the cost of more code duplication.

Daniel