Hi there,
I subscribe to a data feed and from its INSERT/DELETE messages create and maintain an array-backed structure, using the index values carried on those messages. I would like to ask the assembled cognoscenti whether they know of any algorithm that can deal with these piecemeal updates efficiently - a batch update generally contains between two and six such messages.
The estimated size of the array is around 1000 elements.
Batch updates arrive as a list of messages ordered by index, which stipulate the insertion or deletion of an item at a given index. I expect most of the churn in the array to be nearer its start than its end.
It occurs to me that with some basic processing I can determine the range affected by the batch and the overall size-delta, and therefore move an unaffected tail section of the array just once.
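A rough sketch of that single-tail-move idea, written as a left-to-right merge into a destination array (the Msg record and method names are mine, and I'm assuming each message's index refers to the array as it stands when that message is applied):

```java
/**
 * Sketch: apply an index-ordered batch of INSERT/DELETE messages in one
 * left-to-right merge, so the unaffected tail is copied exactly once.
 * The Msg record and "apply" are illustrative names, not the feed's API.
 */
public final class BatchMerge {
    public record Msg(boolean insert, int index, int value) {}

    public static int[] apply(int[] src, Msg[] batch) {
        int delta = 0;
        for (Msg m : batch) delta += m.insert() ? 1 : -1;
        int[] dst = new int[src.length + delta];
        int s = 0, d = 0;                        // read and write cursors
        for (Msg m : batch) {
            // Copy the untouched run up to this message's index.
            int run = m.index() - d;
            System.arraycopy(src, s, dst, d, run);
            s += run; d += run;
            if (m.insert()) dst[d++] = m.value(); // insert: write, don't consume
            else            s++;                  // delete: consume, don't write
        }
        // One copy for the entire unaffected tail.
        System.arraycopy(src, s, dst, d, src.length - s);
        return dst;
    }
}
```

Allocating a fresh destination each batch is wasteful at scale, but at around 1000 elements it keeps the merge simple and still gives the single tail copy.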
Similarly, I could maintain a certain amount of free space before the first element and after the final element, so that each update copies as little as possible.
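Something like the following is what I have in mind - a backing array with slack at both ends, shifting whichever side of the edit point is shorter (all the names here are illustrative):

```java
/**
 * Sketch: backing array with slack before and after the live elements,
 * so an edit shifts whichever side is shorter. Illustrative names only.
 */
public final class SlackBuffer {
    private int[] buf;
    private int head;   // index of the first live element
    private int size;

    public SlackBuffer(int capacity) {
        buf = new int[capacity];
        head = capacity / 3;          // leave slack before the first element
    }

    public int get(int i) { return buf[head + i]; }
    public int size()     { return size; }

    public void insert(int i, int value) {
        if (head == 0 || head + size == buf.length) grow();
        if (i < size - i) {           // front half shorter: shift it left
            System.arraycopy(buf, head, buf, head - 1, i);
            head--;
        } else {                      // back half shorter: shift it right
            System.arraycopy(buf, head + i, buf, head + i + 1, size - i);
        }
        buf[head + i] = value;
        size++;
    }

    public void delete(int i) {
        if (i < size - i) {           // close the gap from the shorter side
            System.arraycopy(buf, head, buf, head + 1, i);
            head++;
        } else {
            System.arraycopy(buf, head + i + 1, buf, head + i, size - i - 1);
        }
        size--;
    }

    private void grow() {             // recentre into a larger array
        int[] next = new int[Math.max(4, buf.length * 2)];
        int nextHead = (next.length - size) / 2;
        System.arraycopy(buf, head, next, nextHead, size);
        buf = next;
        head = nextHead;
    }
}
```

With most churn near the start of the array, the front slack means those edits shift the short prefix rather than the long tail.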
Other optimisations include recognising updates such as the following:
DELETE 10, INSERT 10 - effectively a replace which requires no copying
INSERT 10, DELETE 11 - as above
DELETE 10, DELETE 10, DELETE 10 - bulk deletion can be optimised into one copy operation
INSERT 11, INSERT 12, INSERT 13 - bulk insertion can be optimised into one copy operation
and so on.
However, I'm wary of the overhead of the recognition step - it smacks of look-ahead and backtracking, which may well take more time than simply performing the copy.
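That said, I suspect the recognition could be done in a single linear pass with only one message of look-ahead and no backtracking - a sketch, with the Msg/Op types invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: fold an index-ordered batch into coalesced ops in one linear
 * pass with a single message of look-ahead. Types are illustrative.
 */
public final class Coalesce {
    public record Msg(boolean insert, int index) {}
    public record Op(String kind, int index, int count) {} // REPLACE, INSERT_RUN, DELETE_RUN

    public static List<Op> fold(List<Msg> batch) {
        List<Op> ops = new ArrayList<>();
        for (int i = 0; i < batch.size(); i++) {
            Msg m = batch.get(i);
            Msg next = i + 1 < batch.size() ? batch.get(i + 1) : null;
            // DELETE n, INSERT n  or  INSERT n, DELETE n+1  ->  in-place replace
            if (next != null && m.insert() != next.insert()
                    && next.index() == (m.insert() ? m.index() + 1 : m.index())) {
                ops.add(new Op("REPLACE", m.index(), 1));
                i++;                  // consume the look-ahead message
                continue;
            }
            // Run of deletes at the same index, or inserts at consecutive indices.
            int count = 1;
            while (i + 1 < batch.size()) {
                Msg n = batch.get(i + 1);
                boolean sameRun = n.insert() == m.insert()
                        && n.index() == (m.insert() ? m.index() + count : m.index());
                if (!sameRun) break;
                count++;
                i++;
            }
            ops.add(new Op(m.insert() ? "INSERT_RUN" : "DELETE_RUN", m.index(), count));
        }
        return ops;
    }
}
```

Since this never revisits a message, the recognition cost stays proportional to the batch size (two to six messages), which seems cheap next to even one array copy.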
Given the expected size of the array, tree structures seem heavyweight: some basic performance testing suggests that binary or self-balancing trees (in this case a list implementation backed by a red-black tree) only begin to show performance advantages at around 15K-20K elements; array copies are significantly faster below that. I should probably add that I'm using Java for this implementation.
Any hints, tips or suggestions would be welcomed.
Cheers
Mike