tags:

views:

169

answers:

6

I need to write a program to find the mode. Or the most occurrence of an integer or integers. So,

1,2,3,4,1,10,4,23,12,4,1 would have mode of 1 and 4.

I'm not really sure what kind of algorithm i should use. I'm having a hard time trying to think of something that would work.

I was thinking of a frequency table of some sort maybe where i could go through array and then go through and create a linked list maybe. If the linked doesn't contain that value add it to the linked, if it does then add 1 to the value.

So if i had the same thing from above. loop through 1,2,3,4,1,10,4,23,12,4,1

Then list is empty so add node with number = 1 and value = 1. 2 doesnt exist so add node with number = 2 and value = 1 and so on. Get to the 1 and 1 already exists so value = 2 now.

I would have to loop through the array and then loop through linked list everytime to find that value.

Once i am done then go through the linked list and create a new linked list that will hold the modes. So i set the head to the first element which is 1. Then i go through the linked list that contains the occurences and compare the values. If the occurences of the current node is > the current highest then i set the head to this node. If its = to the highest then i add the node to the mode linked list.

Once i am done i loop through the mode list and print the values.

Not sure if this would work. Does anyone see anything wrong with this? Is there an easier way to do this? I was thinking a hash table too, but not really sure how to do that in C.

Thanks.

+2  A: 

The algorithm you have is fine for a homework assignment. There are all sorts of things you could do to optimise the code, such as:

  • use a binary tree for efficiency,
  • use an array of counts where the index is the number (assuming the number range is limited).

But I think you'll find they're not necessary in this case. For homework, the intent is just to show that you understand how to program, not that you know all sorts of tricks for wringing out the last ounce of performance. Your educator will be looking far more for readable, structured, code than tricky optimisations.

I'll describe below what I'd do. You're obviously free to use my advice as much or as little as you wish, depending on how much satisfaction you want to gain at doing it yourself. I'll provide pseudo-code only, which is my standard practice for homework questions.

I would start with a structure holding a number, a count and next pointer (for your linked list) and the global pointer to the first one:

typedef struct sElement {
    int number;
    int count;
    struct sElement *next;
} tElement;
tElement first = NULL;

Then create some functions for creating and using the list:

tElement *incrementElement (int number);
tElement *getMaxCountElement (void);
tElement *getNextMatching (tElement *ptr, int count);

Those functions will, respectively:

  • Increment the count for an element (or create it and set count to 1).
  • Scan all the elements returning the maximum count.
  • Get the next element pointer matching the count, starting at a given point, or NULL if no more.

The pseudo-code for each:

def incrementElement (number):
    # Find matching number in list or NULL.

    set ptr to first
    while ptr is not NULL:
        if ptr->number is equal to number:
            return ptr
        set ptr to ptr->next

    # If not found, add one at start with zero count.

    if ptr is NULL:
        set ptr to newly allocated element
        set ptr->number to number
        set ptr->count to 0
        set ptr->next to first
        set first to ptr            

    # Increment count.

    set ptr->count to ptr->count + 1

 

def getMaxCountElement (number):
    # List empty, no mode.

    if first is NULL:
        return NULL

    # Assume first element is mode to start with.

    set retptr to first

    # Process all other elements.

    set ptr to first->next
    while ptr is not NULL:
        # Save new mode if you find one.

        if ptr->count is greater than retptr->count:
            set retptr to ptr
        set ptr to ptr->next

    # Return actual mode element pointer.

    return retptr

 

def getNextMatching (ptr, number):
    # Process all elements.

    while ptr is not NULL:
        # If match on count, return it.

        if ptr->number is equal to number:
            return ptr
        set ptr to ptr->next

    # Went through whole list with no match, return NULL.

    return NULL

Then your main program becomes:

# Process all the numbers, adding to (or incrementing in) list .

for each n in numbers to process:
    incrementElement (n)

# Get the mode quantity, only look for modes if list was non-empty.

maxElem = getMaxCountElement ()
if maxElem is not NULL:
    # Find the first one, whil exists, print and find the next one.

    ptr = getNextMatching (first, maxElem->count)
    while ptr is not NULL:
        print ptr->number
        ptr = getNextMatching (ptr->next, maxElem->count)
paxdiablo
Yea, i like to try to challenge myself. I don't like programming badly, although in some cases i just do it to get it done for homework. I like to optimize my code as much as i possibly could. Im not gonna go insane with it, but i like to do the best optimization possible with time given. Thanks though, i think i got a lot from everyone here to work with this.
Matt
It's often difficult to tell at first glance if your pseudocode is really pseudocode or if it is Python... :-D
James McNellis
That's why I love Python. It's great for educating youngsters on how to program. At least it was before they introduced all that lambda rubbish :-) Still, I can just use the easy bit for teaching. A lot of times, I'll actually code it up in Python, check it, then go back and change lines from (for example) `el = arr[7]` to `set el to the eighth element of arr` to make it more pseudo.
paxdiablo
A: 

You're definitely on the right track. However, since C doesn't have a built-in hashtable type, here's how I'd recommend proceeding. I'm going to assume you're working with unsigned ints, but it's not hard to extend the algorithm to handle negative numbers as well. Instead of implementing a linked list for your frequency table, you can get away with a(n admittedly much more boring) array.

  1. Iterate over your array (or is it a list? -whatever) of integers to find the largest value in the input, maxInt.
  2. Malloc a new array of size (sizeof int)*maxInt and initialize it to all zeros. This will be your lookup table; the index into the array is the number you're counting, and the value at that index is the count. (This is why I made the initial simplifying assumption of inputs > 0.)
  3. Now you can proceed as you described, counting up the occurrences. Instead of having to deal with a linked list, you're just using a fixed-size array.
Matt Ball
I had thought about this, but wasn't sure as it might be possible to have a huge array. So if i had an array of ints and say i had 999,999,999 in there. I would have to create an array that is that big which seems like a waste of memory, no? Unless you could do something like php where you could have your own key like arr["happy"] = "gilmore" and i could call it from arr["happy"]. Thanks.
Matt
It's a tradeoff of memory consumption for simplicity. I agree, it does suffer from the large number problem, but it is a _lot_ simpler than implementing your own linked list (yes, I also know that's a very basic task in C) let alone a hash table.
Matt Ball
+5  A: 

If you can keep the entire list of integers in memory, you could sort the list first, which will make repeated values adjacent to each other. Then you can do a single pass over the sorted list to look for the mode. That way, you only need to keep track of the best candidate(s) for the mode seen up until now, along with how many times the current value has been seen so far.

Paul Kuliniewicz
You can use the stdlib function [`qsort`](http://www.manpagez.com/man/3/qsort/) to perform the sorting.
strager
yea, that would work. Basically what i have above. Sort list. Set mode to first element. Start scanning for different int and count it. If count is higher then set mode to new int, if its equal add it to the linked list of modes. This could probably be easier since i could use qsort like strager mentioned. Thanks.
Matt
Sorting takes the whole computation to `O( N * lgN )` which is *unnecessary*; mode can be computed in `O( N )`.
ArunSaha
+1  A: 

If the range of numbers is known in advance, and is a reasonable number, you can allocate a sufficiently large array for the counters and just do count[i] += 1.

If the range of numbers is not known in advance, or is too large for the naive use of an array, you could instead maintain a binary tree of values to maintain your counters. This will give you far less searching than a linked list would. Either way you'd have to traverse the array or tree and build an ordering of highest to lowest counts. Again I'd recommend a tree for that, but your list solution could work as well.

Another interesting option could be the use of a priority queue for your extraction phase. Once you have your list of counters completed, walk your tree and insert each value at a priority equal to its count. Then you just pull values from the priority queue until the count goes down.

JUST MY correct OPINION
binary tree could work for sure. Never heard of priority queue. I honestly don't know what my teacher will do for test cases. For all i know he may randomize a list of 10,000 integers and pass them into the program.
Matt
If you don't know the potential range, then yes a binary tree would likely work. As for priority queues, follow the link and down at the bottom, if memory serves, you can find some implementations that you can hastily file the serial numbers off of for your homework.
JUST MY correct OPINION
A: 

This response is a sample for the idea of Paul Kuliniewicz:

int CompInt(const void* ptr1, const void* ptr2) {
  const int a = *(int*)ptr1;
  const int b = *(int*)ptr2;
  if (a < b) return -1;
  if (a > b) return +1;
  return 0;
}

// This function leave the modes in output and return the number
// of modes in output. The output pointer should be available to
// hold at least n integers.
int GetModes(const int* v, int n, int* output) {
  // Sort the data and initialize the best result.
  qsort(v, v + n, CompInt);
  int outputSize = 0;

  // Loop through elements while there are not exhausted.
  // (look there is no ++i after each iteration).
  for (int i = 0; i < n;) {
    // This is the begin of the new group.
    const int begin = i;

    // Move the pointer until there are no more equal elements.
    for (; i < n && v[i] == v[begin]; ++i);

    // This is one-past the last element in the current group.
    const int end = i;

    // Update the best mode found until now.
    if (end - begin > best) {
      best = end - begin;
      outputSize = 0;
    }
    if (end - begin == best)
      output[outputSize++] = v[begin];
  }
  return outputSize;
}
jbernadas
The question is about C, not C++.
strager
Done, changed the example to use C instead of C++.
jbernadas
Sorting takes the whole computation to `O( N * lgN )` which is unnecessary; mode can be computed in `O( N )`
ArunSaha
I guess it is using hashing. While I know hashing gives really good results, I prefer to not use hashing in homeworks or really simple stuff, the ideas look simpler and does not depend on amortized analysis.
jbernadas
+1  A: 

I would go for a simple hash table based solution.

A structure for hash table containing a number and corresponding frequency. Plus a pointer to the next element for chaining in the hash bucket.

struct ItemFreq {
    struct ItemFreq * next_;
    int    number_;
    int    frequency_;
};

The processing starts with

max_freq_so_far = 0;

It goes through the list of numbers. For each number, the hash table is looked up for a ItemFreq element x such that x.number_ == number.

  • If no such x is found, then a ItemFreq element is created as { number_ = number, frequency_ = 1} and inserted into the hash table.
  • If some x was found then its frequency_ is incremented.
  • If frequency_ > max_freq_so_far then max_freq_so_far = frequency

Once traversing through the list of numbers of complete, we traverse through the hash table and print the ItemFreq items whose frequency_ == max_freq_so_far

The complexity of the algorithm is O(N) where N is the number of items in the input list.

For a simple and elegant construction of hash table, see section 6.6 of K&R (The C Programming Language).

ArunSaha
looks like a linked list to me.
Matt
@Matt: Yes, thats right, the `number`s that hash into the same bucket are linked in to a list.
ArunSaha
not sure i understand. As far as i know or at least i thought hash tables are suppose to be O(1). i did do a freq table using a linked list though. So it is on the order of O(N). still quite fast though.
Matt
@Matt: Yes, `lookup` and `insert` in hash table is "expected" O(1). The `O(N)` is coming from the fact that each input number (there are N numbers) has to be examined. For each input number, we do a hash-table lookup/insert, which is O(1). Hence, the entire procedure is `O(N)`.
ArunSaha
Yea thats what i thought. Ill have to leave the hash table for now and come back to it later. At least i learned linked list in C. Yay!
Matt