views:

68

answers:

4

My process runs multiple instances (processes) and multiple threads and all of them write to the same database. As soon as the request is placed, a unique req id is generated for the record to be added to the proprietary db. Here are our limitations: It cannot be more than 9 char length, needs to have hhmmss as the first 6 chars. We decided to use ms for the last 3 digits to complete the 9 chars and we are doing all this using gettimeofday() . However, with increased traffic, there are now instances of collisions when multiple requests are placed with in a ms period. This combined with the fact that gettimeofday() itself is not accurate is causing an increased number of collissions. I tried to use clock_gettime but when tested, it is also not that accurate as I observed from the following test program:

  • We couldn't use static or global variables due to threading issues
  • Unable to use random numbers as they need to be sequential

Appreciate any help.

#include <time.h>

int main( int argc, char **argv )
{
    long i;
    struct timespec start, stop;
    double gap;

    clock_gettime( CLOCK_REALTIME, &start);

    for (i =0; i< 123456789 ; i++);

    clock_gettime( CLOCK_REALTIME, &stop);

    gap = ( stop.tv_sec - start.tv_sec ) + ( stop.tv_nsec - start.tv_nsec ) / 1000000;
    printf( "%lf ms\n", gap );
    return 0;
}
A: 

Using a time stamp as a unique ID will never work reliably unless you limit yourself to only one transaction per lowest clock tick (1 millisecond in this case).

Since you are stuck using a time value for the first 6 of 9 bytes you need to try to fit as much range into the last 3 bytes as possible.

If you can get away with not using ASCII characters in the last 3 bytes then you should avoid it since that will limit the values it can have a great deal. If possible you should try to use these bytes as a 24 bit integer (range of 16777216) and just have each transaction increment the counter. You could then set it back to 0 each time that gettimeofday let you know that the time had changed. (or you could set up an repeating SIGALRM to let you know when to call gettimeofday again to update your time and 0 the 24 bit integer).

If you are forced to use ASCII printable characters for these bytes then things are a little bit more difficult. The easiest way to extend the range of this would be to use hexadecimal rather than decimal numbers. This grows your representable range from 1000 to 4096. You can do better if you use an even broader number base, though. If you tacked on the first 22 characters of the alphabet (the same way that tacking on the first 6 letters is done for hex) then you can represent 32x32x32 values, which is 32768. That would be a lot of transactions per second. You can do even better if you extend your numeric alphabet even further, but it will become more piecemeal as you do since you will probably want to restrict some characters from being appearing in the value. Using a representation that strtol or strtoul can easily work with will likely be easier to program.

If your application is multithreaded then you may want to consider taking up part of your numeric range as a thread ID and let each thread keep its own transaction counter. This will make determining the relative time between two transactions processed by different threads more difficult to calculate, but it will keep the threads from all wanting to increment the same memory location (which may require a mutex or semaphore).

nategoose
It won't even work if you limit yourself to one transaction per clock tick, because there's no guarantee that the clock doesn't occasionally run a little bit slow (e.g. when being slewed by ntpd, if it got a little ahead) and hence stay apparently "the same" for longer than that.
MarkR
If the clock is running slow then it doesn't tick, does it? I was speaking with a monotonic clock in mind. One with no actual period -- just an ever increasing value. If the value moves backwards or held in the same state for a _long_ time (ntpd adjustment) then you're pretty much screwed if you are using this clock as any part of your timestamp.
nategoose
Although the explanation doesn't give a direct solution due to my legacy limitations and limited problem description on my part, it opened up many avenues that I am yet to explore. Thank you.
Kiran
A: 

Generally using clock time on a heavy loaded system like this with a resolution under a second is a bad idea anyhow. Threads will take their timestamp and then be descheduled in the middle of the operation, so you will see things arriving out of order.

Three characters left to encode things uniquely is not much. Try at least to use some different encoding such as base64.

If you use gcc as a compiler you have thread local storage (TLS) as an extension that is quite efficient. Just prefix your static variable with __thread (or so). If you are restricted to phtreads, there are means to get thread specific keys, too, pthread_get_key. But better would be to have the information as long as possible on the stack of the thread.

To obtain a per thread counter that makes a serial number for your request use

  • your hhmmss timestamp as so far
  • as much bits that you need to identify your threads
  • the last bits for the per thread serial number as above that should only wrap round after more than a second

You could even be cheating and yield a thread that fires too many requests within the same second.

Jens Gustedt
A: 

I guess you could give each thread of each process a unique ID at startup, I guess this would take only one of the 3 available characters unless you have hundreds of threads. You can then use a local counter per-thread to set the last two characters (using base64 or even more, depending on what characters are allowed, to get enough amplitude).

In this situation the only case where a collision may happen is if the counter of a thread wraps during the same second.

Of course, this is a dirty hack. The Right Way would be to share a ressource amongst the threads/processes. It might be the simplest solution in your case tho.

simias
+1  A: 

The type of problem you are describing has already been more-or-less solved by issuing a UUID. This is a system that is designed to solve all the problems you mention and some more.

A linux library: http://linux.die.net/man/3/uuid

More information is available here: http://en.wikipedia.org/wiki/Universally_unique_identifier

teambob
Thanks, this would be helpful in the future for me.
Kiran