views:

357

answers:

12

I am currently trying to implement an algorithm to select a unique (16-bit) identifier. The challenge is to do this in an fast way that doesn't use too much memory. The list of currently used identifiers is determined through scanning an external Flash device via a sequence of SPI transactions and is therefore a relatively slow process. Also, the algorithm will be running on a small-ish microcontroller, so I can't really just read all the entries into RAM and process them there.

The thoughts I've had so far are:

  1. Pick a number, then scan through the list and see if it's used. Rinse and repeat. Suffers from being rather slow (particularly if there are a lot of files).
  2. As above, but pick the number using a pseudo-random number generator with an appropriate seed. This has the advantage that it's less likely that there will be such large numbers of iterations.
  3. Scan through the list and populate an array with all the entries found. Sort it and then it becomes trivial. This could use an enormous amount of memory.
  4. Use an enormous (okay, ridiculously enormous) bit mask. Not really practical.
  5. Accept that the life-time of the tool is such that it will be thrown away or 'formatted' long before it has written 65534 files to the Flash, so just store the highest value used so far in the Flash or Backup memory and keep incrementing. In all honesty, this would probably work quite well for this specific application.

At the moment, I'm verging towards using either the second one or the fifth, but I'd be interested to know if anyone has any other thoughts. I'd like to think that there's an algorithm similar in form to a CRC that could be used to process each number in turn and give a fair idea of a number that hasn't been used, but I've no idea how this might work.

A: 

I will go with 2. Be careful however how you pick the generator and the seed. All pseudo-number sequences repeat them selves after some number of iterations. So you need to test yours that it won't repeat itself too soon.

kgiannakakis
+1  A: 

I'm guessing that the FLASH device is not removable due to the mention of SPI, but IIRC SD cards have an SPI access mode, so that may not be true.

If the FLASH is permanent and you have a robust, non-volatile place to remember the last ID issued, then that is probably the thing to do. It is certainly fast and low memory at run time. It should be easy to explain, implement and test.

If the FLASH is removable, then using a pseudo-random number generator and testing for collisions is probably the way to go. Assuming your numbers are well distributed, your chances of a collision are easy to predict from the total in use. Just pick a generator with a decently long repeat interval. It may be a good idea to mock this up in a simulation as an acceptance test for the selected algorithm.

RBerteig
The Flash is indeed permanent.
Al
A: 

I would try a variant of 3. Instead of storing a sorted array of values, I would store a sorted array of ranges.

mouviciel
+4  A: 

I think you have some options here, but one more to consider is a Bloom Filter. This has a chance of false positives (i.e. you may rule out an ID as already used even though it hasn't been) but it can allow you to choose the exact amount of space you can dedicate to this data.

Kevin Peterson
+1  A: 

I'm wondering why you don't simply store the last ID and increment it. Is there a reason why you hesitate? You don't give one on your question, just a general uneasiness.

If you need the ID to be somewhat random for security reasons, then use a random number generator and save the current register values of the generator in the flash memory. This way, you can load them for the next ID which makes sure you'll get the full cycle length without repeats if you choose your algorithm carefully.

[EDIT] Since you're concerned with collisions, there must be some data where the collision can occur, for example in file names or some such. If the obvious approach (create a filename and check whether it exists) is too slow and you have huge gaps in the "allocation map", then generate a random ID and check with that. This should allow you to find an unused ID with just a few iterations.

Aaron Digulla
There's no security need for the ID; the only reason I'm uneasy with storing the last ID is in the situation that I've reached ID number 65535. It's assumed that there will be gaps as IDs will have been deleted (there wouldn't be space in Flash memory for 65535 entries), so I then have to find which ones are unused, which brings me back to the same problem.
Al
This solution is fast, simple and predictable. If you're concerned that you may one day overflow, perhaps implement searching for an available ID as a background task (only after overflowing of course).
Peter Gibson
A: 

How much RAM do you have? It's a bit hard to tell, "embedded" can mean so much these days. :) A bitmap would require 8192 bytes, for the duration of the generation, and guarantee perfect results every time.

I also considered some kind of "sparse" bitmap, but I don't know a suitable data structure off hand, but might be worth investigating.

unwind
The total RAM on the microcontroller is 10k, but there's quite a lot going on (including a fair few circular buffers), so 8k is a bit much, even temporarily.
Al
+4  A: 

If there isn't enough RAM to implement a bitmap large enough for 64K entries, the number of scans through FLASH to find an unused ID could be reduced by using a smaller, temporary bitmap for each scan. A 16-byte bitmap could record found IDs in the range 0-255 on the first pass, 256-511 on the second scan, etc., at the end of each scan if there is at least one unmarked bit in the bitmap you're done. I believe this would work well in combination with the use of a random starting range.

On the other hand, if I had high confidence in option 5 I might just go with that.

Lance Richardson
A: 

Keep a running count a sequential ID.
Run the ID through MD5.
Use the lowest 16-bits.

The theory is that MD5 creates a different hash for each input. The lowest 16-bits should be just as "random" as the whole hash.

Robert
A: 

If you could use bigger ids then 5 would be a no-brainer.

starblue
A: 

About your CRC like algorithm interest...

It appears that you are looking for an algorithm that will run through a random list of less than 64K 16-bit numbers and generate a new 16-bit number not already in the list; preferably doing this in a single pass through the given list.

If the sequence in which IDs are freed has no relation to the sequence in which they are allocated (like i feel is your case), there is nothing you can do with generation or allocation of the IDs to get your algorithm.

The best bet then seems to be 5 from your list.

If you are adventurous...

and, if you can re-number your IDs (that is change an allocated ID to another unallocated ID), you could run a 'defrag' kind of iteration once in a while to move all allocated IDs to lower values and find the highest free ID number for the next allocation. It would help to remember the total number of allocations and frees done since the last such 'defrag' run. Allocate incrementing sequentially starting from 0.

This way you need to remember just 3 unsigned short integers in memory. And yes, take a slightly costly re-allocation iteration once in a while based on their values.

nik
A: 

Another option would be to keep a file of valid ids on the flash drive. This way, you aren't querying all the possibilities every time. If you want random IDs, then randomize the file. Store the offset to the final one as the first number in the file. When you need one, remove the last from the file, and when you free one, add it back to the file. With the offset and a flash drive, it should be a nearly constant-time operation no matter the number of IDs left. As a bonus, the offset at the beginning will tell you how many IDs you have left, if you need to know that at any point. The downside would be that you do have to access the flash memory for each ID (constant-time, but still an access), and how to handle the error case if the file is not present.

Caleb Huitt - cjhuitt
+1  A: 

Use a Maximal Linear Feedback Shift Register and store the last value you handed out. An LFSR will, given a particular starting point (not including zero) give you all the numbers in the sequence 1..2^n, in a pseudorandom order. If you start with the kth element, you will always get the same k+1th element. The implementation is tiny:

if (IsEven(sequence)) {
    sequence /= 2;
}
else {
    sequence = (sequence / 2) ^ feedback;
}

where feedback is a bit pattern from a table of maximal feedbacks for the number of bits you want to generate. This means that to generate the next number, you read the last number handed out, run it through the above code and then use it.

Alternately, why aren't you just counting up and storing the last number given out?

plinth