views:

385

answers:

7

I am looking for an idea, concept or proven datastructure that would be very efficient when accessing a collection that keeps cumulative values.

Example may shed more light on my need:

I have a list of values (2,3,5). This list when looking at the cumulative values would be (2,5,10).

I will now add 1 at the start of the list and get (1,2,3,5) and in cumulative terms (1,3,6,11).

I only need to look at the cumulative values, I am not at all interested in the 1,2,3,5. I need to be able to quickly insert at position, remove a position and all this should quickly update the cumulative array (ideally without iterating throughout the whole array and recalculating the values.

Any ideas or hints ?

@Kristo (too long to put in the commentary): To clarify why negative numbers would make the total sum value meaningless please follow this example.

Insert 1 followed by -1. Sum is 1 than 0. (1,-1) // (1,0) Insert 3 followed by insert -3. Sum is 3 then 0. (1,3,-1,-3) // (1,4,3,0) Insert 2 followed by insert -2. Sum is 2 then 0. (1,3,2,-1,-2,-3) // (1,4,6,5,3,0)

If my "magic number" were 4 total sum wouldn't tell me whether I've exceeded it.

PS: The main reason for this is to be able to tell if I went over a certain value and where in the chain.

+1  A: 

Check the cumulative frequency table.

Igor Krivokon
+1  A: 

There are two simple ways that I see, both use a basic data types - lists.

  1. Keep the original list, and recalculate the cumulatives on every change.

  2. Only keep the cumulative list, and only add to it or delete by using functions like:

    • Add(item,position default is end of list) would add the value of the item starting from position-1.
    • Delete(position) would calculate the original value be subtracting two numbers, then decreases this number from the rest of the list before deleting the item.

    Add 2 : (2) adds 2 to the empty list.

    Add 3 : (2,5) adds 3 at the end of the list to the prior element (2).

    Add 5 : (2,5,10) adds 5 at the end of the list to the prior element (5).

    Add 1 at start: (1,3,6,11) adds 1 at the start of the list, and increments by 1 till the end (no prior elements).

    Add 7 at 2nd position: (1,8,11,14,19) adds 7 and increments by 7 till the end (no prior elements).

    Delete 3rd position (The 11) : (1,8,3,8) get the value, delete it, add the value to the rest.

This way would keep it all synchronized without keeping the original values.

Osama ALASSIRY
Thank you for your answer. This would still basically optimize the memory space, but you still have to recalculate the whole list after the change. I am wondering whether I could partially avoid it.
Tomas Pajonk
I think you made a few mistakes at the end there. Adding 7 at position 2 should yield a cumulative list of (1, 8, 10, 13, 18), the actual list would be (1, 7, 2, 3, 5). So then removing position 3 should yield (1, 8, 11, 16), from an actual list of (1, 7, 3, 5).
SergioL
I don't think you can add numbers without adding them.How big do you expect the list to be?? would this be on a micro-controller? Addition fixed.
Osama ALASSIRY
You're still a little off, I think. Plus removing the 3rd position shouldn't be construed as adding a negative number to the list...your cumulative list should be growing here.
SergioL
+1  A: 

Using C++ terms, could you get away with a std::list (easy inserts/removes in the middle) or std::set (always sorted) for the data and one variable to hold the sum? On every insert/remove, you modify the sum as appropriate. The sum represents the highest number in your would-be cumulative list. Only when you bust your magic number do you need to do some algorithmic work to figure out where you busted.

Update:

Based on your new information, I don't see many shortcuts available. You need to insert or remove frequently from the middle, so that suggests some sort of linked list approach. You can save a little bit of calculation by only updating the portions of the list that have changed. Let L be the list of values and n be the desired location in the list. To insert value x at location n:

  • Insert the value x + L(n-1) at location n
  • Add x to all elements after this new n
  • Stop if you bust your magic number

The procedure is the same for a removal, except you subtract from all subsequent values. This way, you're only doing a lot of work if you insert near the beginning.

Kristo
Hmmm. Interesting.
Tomas Pajonk
The only problem I can see is when negative values would be in place.
Tomas Pajonk
What's wrong with negative numbers?
Kristo
Ammended the question to show example where negative numbers make the total values less usefull.
Tomas Pajonk
Kristo, I think the algorithm to keep track of this would end up being equivalent to just recalculating (a part of) the list every time.
dss539
+3  A: 

In C#, I would keep all the actual values in a list, and use a custom iterator to loop through the cumulative values.

You'll only recalculate up to the point where the iterator tells you that you've past your limit (obviously, you'd have to code for that).

I think the value is that you can add/remove without doing any calculations until it's time to iterate over the list (which I think you need to do anyway to find the cut-off number).

SergioL
+4  A: 

The only optimization I can think of is to do a 'lazy' evaluation of the cumulative list.

in addition to your list of source values, keep track of a number of the highest position in the cumulative list that is accurate. if you need a number higher than that then you walk up the list, updating the values, and the index.

idx  values       cumulative    operation
 3   (2,3,5)      (2, 5, 10)
 0   (1,2,3,5)    (X,X,X,X)     insert 1 at 0 
 3   (1,2,3,5)    (1,3,6,X)     look for value over 5     
 3   (1,2,3,5,4)  (1,3,6,X,X)   insert 4 at 4 

if course, this wont do you a whole lot of good if you are usually adding items early in the list....

Dolphin
This does look very reasonable.
Tomas Pajonk
+3  A: 

Use a binary search tree with the additional property that nodes contain the sum of their subtree. All the operations are still O(lg n). To insert or remove a value you do the normal procedure, and also update the sums of all parents. Getting the sum is as easy as finding the node containing the element and returning its sum minus its right child's sum.

Strilanc
+1 this sounds like the most reasonable algorithm so far.
Zifre
A: 
  • You can look at a data structure for cumulative frequency the Binary Indexed Trees
  • You can break your value range into fixed bit ranges. Ex. 3 intervals:

    #define NUM (1<<24)  // max value in your data set
    #define BITS0 8
    #define BITS1 8
    int cum0[NUM >> (BITS0+BITS1)]; // sum of cum1
    int cum1[NUM >> BITS1]; // sum of count
    int count[NUM];
    
    
    int add(id, val) { // add a value
      cum0[id >> (BITS0+BITS1)] += val;
      cum1[id >> BITS1] += val; 
      count[id] += val;      
    }
    
    
    int cumvalue(int id) { int cum = 0; // return cum value at index id   
      for(i = 0; i < (id >> (BITS0+BITS1));i++) cum += cum0[i]; i <<= BITS0;
      for(i = (id & ~((1 << (BITS0+BITS1))-1)) >> BITS1; i < (id >> BITS1); i++) cum+= cum1[i]; i <<= BITS1;
      for(i = id & ~((1 << BITS1) -1); i < id; i++) cum += count[i];      
      return cum;
    }
    
bill