views:

813

answers:

5

I just finished a homework problem for Computer Science 1 (yes, it's homework, but hear me out!). Now, the assignment is 100% complete and working, so I don't need help on it. My question involves the efficiency of an algorithm I'm using (we aren't graded on algorithmic efficiency yet, I'm just really curious).

The function I'm about to present currently uses a modified version of the linear search algorithm (that I came up with, all by myself!) in order to check how many numbers on a given lottery ticket match the winning numbers, assuming that both the numbers on the ticket and the numbers drawn are in ascending order. I was wondering, is there any way to make this algorithm more efficient?

/*
 * Function: ticketCheck
 *
 * @param struct ticket
 * @param array winningNums[6]
 *
 * Takes in a ticket, counts how many numbers
 * in the ticket match, and returns the number
 * of matches.
 *
 * Uses a modified linear search algorithm,
 * in which the index of the successor to the
 * last matched number is used as the index of
 * the first number tested for the next ticket value.
 *
 * @return int numMatches
 */
int ticketCheck( struct ticket ticket, int winningNums[6] )
{
    int numMatches = 0;
    int offset = 0;
    int i;
    int j;

    for( i = 0; i < 6; i++ )
    {
        for( j = 0 + offset; j < 6; j++ )
        {
            if( ticket.ticketNum[i] == winningNums[j] )
            {
                numMatches++;
                offset = j + 1;
                break;
            }
            if( ticket.ticketNum[i] < winningNums[j] )
            {
                i++;
                j--;
                continue;
            }
        }
    }

    return numMatches;
}
+16  A: 

It's more or less there, but not quite. In most situations, it's O(n), but it's O(n^2) if every ticketNum is greater than every winningNum. (This is because the inner j loop doesn't break when j==6 like it should, but runs the next i iteration instead.)

You want your algorithm to increment either i or j at each step, and to terminate when i==6 or j==6. [Your algorithm almost satisfies this, as stated above.] As a result, you only need one loop:

for (i=0,j=0; i<6 && j<6; /* no increment step here */) {
    if (ticketNum[i] == winningNum[j]) {
        numMatches++;
        i++;
        j++;
    }
    else if (ticketNum[i] < winningNum[j]) {
        /* ticketNum[i] won't match any winningNum, discard it */
        i++;
    }
    else { /* ticketNum[i] > winningNum[j] */
        /* discard winningNum[j] similarly */
        j++;
    }
}

Clearly this is O(n); at each stage, it either increments i or j, so the most steps it can do is 2*n-1. This has almost the same behaviour as your algorithm, but is easier to follow and easier to see that it's correct.

Philip Potter
Your answer had better comments, so I removed mine.
Pascal Cuoq
There is an anecdote about two programmers — I don't remember which but it's probably two among Pike, Ritchie, Thomson and Kernighan — independently writing the same function to the comma. I always thought it was not so extraordinary.
Pascal Cuoq
@Pascal: I quite agree. If you are thinking the same algorithm, you'll write similar code. If you are on the same team, you'll use the same style. Same algorithm + same style = same code.
Philip Potter
+1 That's shows a clear understanding of the problem instead of concentrating on the given solution.
Praveen S
You could remove the reptition of the `i++;` and `j++;` by removing them from the first `if()`, then removing the `else`s and changing the other `if`s to `<=` and `>=`.
caf
+7  A: 

You're basically looking for the size of the intersection of two sets. Given that most lottos use around 50 balls (or so), you could store the numbers as bits that are set in an unsigned long long. Finding the common numbers is then a simple matter of ANDing the two together: commonNums = TicketNums & winningNums;.

Finding the size of the intersection is a matter of counting the one bits in the resulting number, a subject that's been covered previously (though in this case, you'd use 64-bit numbers, or a pair of 32-bit numbers, instead of a single 32-bit number).

Jerry Coffin
Jeffry, this is smart but i personally do not like this style of programming. in my opinion code must be readable and operate on arrays if arrays are needed. using integers as arrays is not a good style i think. but it is just me. your solution would be great in an embedded system with limited memory and a slow CPU though. :)edit: but your response actually gives the general approach. iterate over the both sequences in parallel and emit 1 for a match and a zero for a mismatch. this i think is how the AND is implemented in the hardware anyway.
akonsu
@akonsu it's basically the same algo I proposed but using a bitfield instead of an array. If the sets are big it is a proven and widely used technique (`select` syscall on Unix for example).
tristopia
@Jerry, sorry i called you Jeffry. lack of sleep today...
akonsu
@Akonsu: no problem about the name (as the old line goes, just don't call me late for dinner). Actually, in most hardware the AND is going to be executed bit-parallel (i.e., the CPU will have 32 or 64 (or whatever the word size size is) separate 2-input AND gates, all producing outputs at the same time). As for "arrays if arrays are needed", I tend to agree -- but this is a case where (IMO) arrays are *not* needed (nor probably even preferred).
Jerry Coffin
@Jerry: Of course *counting* the set bits is not going to be parallel...
caf
@caf: not entirely parallel anyway, that's entirely true. OTOH, I'm pretty sure you can execute parts of it in parallel, so the ten (or so) total operations can happen in something like four clock cycles, with ~3 operations in parallel for each of the first three clocks, then a single operation to combine them all at the end.
Jerry Coffin
A: 

This is really only a minor change on a scale like this, but if the second loop reaches a number bigger than the current ticket number, it is already allowed to brake. Furthermore, if your seconds traverses numbers lower than your ticket number, it may update the offset even if no match is found within that iteration.

PS: Not to forget, general results on efficiency make more sense, if we take the number of balls or the size of the ticket to be variable. Otherwise it is too much dependent of the machine.

+2  A: 

Yes, there is something faster, but probably using more memory. Make an array full of 0 in the size of the possible numbers, put a 1 on every drawn number. For every ticket number add the value at the index of that number.

 int NumsArray[MAX_NUMBER+1];
 memset(NumsArray, 0, sizeof NumsArray);

 for( i = 0; i < 6; i++ )
   NumsArray[winningNums[i]] = 1;

 for( i = 0; i < 6; i++ )
   numMatches += NumsArray[ticket.ticketNum[i]];

12 loop rounds instead of up to 36 The surrounding code left as an exercise.

EDIT: It also has the advantage of not needing to sort both set of values.

tristopia
This is the same speed as my solution (~2n loop iterations) but uses more memory. It's faster in the unsorted case, however, because my answer depends on sorted data while yours doesn't.
Philip Potter
This method is also easy to extend to correctly handle duplicate numbers in input by using NumsArray[winningNums[i]] +=1 instead.
Christoffer
Yes, mine has also the advantage if you want to check more than one loterie ticket against the winning numbers, to need only the first loop once.
tristopia
A: 

If instead of comparing the arrays of lottery numbers you were to create two bit arrays of flags -- each flag being set if it's index is in that array -- then you could perform a bitwise and on the two bit arrays (the lottery ticket and the winning number sets) and produce another bit array whose bits were flags for matching numbers only. Then count the bits set.

For many lotteries 64 bits would be enough, so a uint64_t should be big enough to cover this. Also, some architectures have instructions to count the bits set in a register, which some compilers might be able to recognize and optimize for.

The efficiency of this algorithm is based both on the range of lottery numbers (M) and the number of lottery numbers per ticket (N). The setting if the flags is O(N), while the and-ing of the two bit arrays and counting of the bits could be O(M), depending on if your M (lotto number range) is larger than the size that the target cpu can preform these operations on directly. Most likely, though, M will be small and its impact will likely be less than that of N on the performance.

nategoose