To tag the data we create, I'm considering using UUIDs. Security is not an issue, so I was going to use version 1 (based on the timestamp and MAC address). The only concern is that each user may be creating multiple data files at once from different processes, each with multiple threads. Even assuming Python's uuid library is thread-safe (though it doesn't look like it), that still leaves the multiple-process issue. I'm considering suffixing the UUID with a dash and the process number.

Since our group has little experience with UUIDs, are there any issues I need to keep in mind? How is the multiple-process issue usually handled?
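For concreteness, the suffixing scheme I have in mind could be sketched roughly like this. The function name `tagged_id` is made up for illustration, and I'm assuming "process number" means the OS process ID from `os.getpid()`:

```python
import os
import uuid


def tagged_id() -> str:
    """Return a version 1 UUID with '-<pid>' appended (hypothetical helper).

    uuid.uuid1() encodes a timestamp and a node (usually MAC-derived) value;
    the suffix is meant to distinguish processes on the same machine.
    """
    return f"{uuid.uuid1()}-{os.getpid()}"


if __name__ == "__main__":
    print(tagged_id())
```

Note that the result is no longer a valid UUID string, so anything that parses it has to strip the suffix first.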

+1  A: 

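Just use uuid4 for completely random UUIDs. Each one carries 122 random bits, so there is no practical need to worry about collisions, whether the IDs come from different threads or different processes.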
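A minimal sketch of what that looks like with several threads generating IDs at once. The helper names `new_tag` and `make_tags` are invented for this example; the only library calls are `uuid.uuid4()` and the standard `ThreadPoolExecutor`:

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


def new_tag() -> str:
    """Return a random (version 4) UUID as a string."""
    return str(uuid.uuid4())


def make_tags(n: int, workers: int = 8) -> list[str]:
    """Generate n tags concurrently from a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda _: new_tag(), range(n)))


if __name__ == "__main__":
    tags = make_tags(1000)
    print(len(tags), "tags,", len(set(tags)), "distinct")
```

Since uuid4 draws from the operating system's random source and keeps no shared counter or clock state, the same call should work unchanged from separate processes without any suffix.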

Edit in response to comment: In my experience, redundant data leads to inconsistencies sooner or later. There is a reason that avoiding redundancy is a dogma of relational database design.

So don't use the UUID as a "redundancy backup" for the actual "originating computer" and "timestamp" data. Either use it as a pure unique ID carrying no other information, or don't use it at all.

Wim Coenen
I would prefer to have the (weak) traceability of uuid1. That metadata (creation time, computer) should be written into the files themselves, but like most coding standards, there's no guarantee.
AFoglia
Trust me, we have plenty of inconsistencies now, and we ain't writing nearly enough. I'd rather have it as a backup and never use it than rely on all our programs to do the right thing. The latter is an impossibility. This at least gives me a small chance when someone screws up the data. (I don't mean to use it programmatically, just every now and then when we need to track our mistakes.) If I were going to use it regularly, I'd use something easier to read than a UUID.
AFoglia
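For the occasional manual trace-back described in the comments, the timestamp and node can be read back out of a version 1 UUID roughly like this. This is only a sketch: `decode_uuid1` is a made-up helper, and it relies on the `time` and `node` attributes of `uuid.UUID`, where `time` counts 100-nanosecond intervals since 1582-10-15:

```python
import uuid
from datetime import datetime, timedelta, timezone

# Start of the Gregorian calendar, the epoch used by version 1 UUID timestamps.
UUID_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def decode_uuid1(u: uuid.UUID) -> tuple[datetime, int]:
    """Return (creation time in UTC, 48-bit node value) for a version 1 UUID."""
    if u.version != 1:
        raise ValueError("not a version 1 UUID")
    # u.time is in 100 ns units; integer-divide by 10 to get microseconds.
    created = UUID_EPOCH + timedelta(microseconds=u.time // 10)
    return created, u.node


if __name__ == "__main__":
    created, node = decode_uuid1(uuid.uuid1())
    print(created.isoformat(), f"{node:012x}")
```

The node is only the MAC address when the library could find one; otherwise it may be a random value, so the traceability really is weak.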