ansaurus

Question

Avoiding duplicate identifiers in the database

Answer 1

+1 A:

Use a UNIQUE INDEX on that column, or a PRIMARY KEY!

Benoit 2010-10-14 13:32:04

+1 Can't agree more

m.edmondson 2010-10-14 13:32:49

we already have a primary key... This identifier is very customized, and it's determined based on many factors... I would say we have about 200 different tracks that are counting up (200 different strings+####) The strings are for the users to distinguish their items based on certain creteria.. Therefore can't use a primary key for this as it would just gimme a single number.. that counts up... I'm looking into the Unique Index though

Wonder 2010-10-14 13:38:23

Can't use unique index as some of the number need to be duplicates.. See EDIT2

Wonder 2010-10-14 13:42:14

Answer 2

+5 A:

Look up the command CREATE SEQUENCE. Oracle can handle the unique number generation for you. (Virtually) Every database has a way of handling this issue although they differ somewhat in how it's implemented.

[In response to the new requirements edited into the question]

You can still use a SEQUENCE to produce the counting-up part, then combine that with the prefix and save it back to the database. And, in those rare cases in which you need a duplicate ID, don't get a number from the SEQUENCE, just use the one you already have. But SEQUENCE will solve your problem of creating a "well" from which you can draw a guaranteed unique number when needed

Larry Lustig 2010-10-14 13:34:33

like i mentioned in the edit.. This isn't a single number that just counts up.. Our identifiers contain different characters plus a number that counts up.

Wonder 2010-10-14 13:39:29

See edited answer, above.

Larry Lustig 2010-10-14 13:44:53

the counting up number doesn't count up system wide but only for that 'STR-STR-####' combination.. Do you mean that i should create a sequence for each 'STR-STR-####' combination and grab the number for each one of these STR-STR sequences.. That will mean we'll have like 3-400 sequences in our database which will be growing fast...

Wonder 2010-10-14 13:48:45

Yes, you use a separate SEQUENCE for each sequence you need. However, if you need to create new sequences based on changes to the _data_ in the system (eg, each category added to the categories table requires a new sequence) you might reconsider your design. Doing the unique-enforcement in code will be somewhat complex and probably require table-locking at some point. If you MUST do it that way, the appropriate place for the code is the INSERT trigger for the table.

Larry Lustig 2010-10-14 13:55:42

Answer 3

+1 A:

Now there is some issues with this, there is about 1000 lines of codes between getting the next number and the commit to the database. The issue is when 2 people create an item with the same criteria within seconds of each other, sometimes the same number is issued.

This is worrying. You should really be using a stored procedure for this and wrap it all in a transaction. You got nothing guaranteeing that any two records will have a different number and will cause problems. But I agree - you need to set a column as a unique identifer and use primary keys - it is a relational database I assume!

m.edmondson 2010-10-14 13:37:26

it's actually like 350 lines :) lol

Wonder 2010-10-14 13:51:46

lol any amount of lines is still going to be a delay of some sort...

m.edmondson 2010-10-14 13:56:59

Answer 4

A:

Create a PRIMARY KEY column or a UNIQUE KEY column that is auto incremented (either with an AUTOINCREMENT keyword or via SEQUENCE (as Oracle does)). That way, if 2 people add 2 exact data row at the exact same time, the database will add 2 exact values but each with their own unique ID.

The Elite Gentleman 2010-10-14 13:38:13

Answer 5

A:

Yuor code should definitely be replaced with a SEQUENCE. Oracle manages this concurrency very well.

Also, this sequence should be saved in a column that is constrained as UNIQUE.

Randy 2010-10-14 13:40:27

Answer 6

+1 A:

Hi Wonder,

You might need some sort of serialization process to go through this. One path I would recommend is leaving this field blank during the initial insert, wait for the procedure to commit and then have another process (a recurring job for example) fill this column.

This other process would fill all rows according to your business rules. This job would be the only one able to touch this column and you would therefore avoid any concurrency problem.

You could set the job to run every X seconds, this would imply a small delay during which the column would be empty in this setup. Or you could let the initial session launch the update process after the commit with some sort of serialization (a preemptive lock on a row so that no two update process can run at the same time).

Vincent Malgrat 2010-10-14 14:08:59

Hi Vincent, thanks for the reply but after the saving, email notifications go out and the number has to be included in the email notifiations, thefore even a few seconds of delay will effect the email notifications

Wonder 2010-10-14 15:21:06

@Wonder: the naming process could send the email once it has attributed the correct id :)

Vincent Malgrat 2010-10-14 15:53:35

Answer 7

+5 A:

It sounds like you've denormalised 3 pieces of data into 1 field. This is what your string field currently contains:

StringField char (12): STR-STR-####

This is what you really should have (example field names only; it would help to give these meaningful names, but I don't know what your data represents):

Str1 char (3): STR
Str2 char (3): STR
ID int: ####

You can now use a sequence in the ID field.

When you want to get your original string back, you concatenate the contents of the Str, Str2 and ID fields (with dividing hyphens).

In short: Your database design is broken, and you're now paying the price for it. My advice would be to fix the design by normalising your identity field into 3 separate fields. Alternatively, you'll find that you spend hours recreating built-in functionality of the database and end up with a solution that is buggy and has horrible problems with race conditions.

Alternatively, if PL/SQL allows all of this functionality:

Create a table that contains the following fields:

Str1 char (3)
Str2 char (3)
CurrentID int

Then:

For each possible combination of STR-STR identifiers, add an entry into the database, with the "CurrentID" value set to 0.
Write a stored procedure for retrieving the next ID. It will lock the relevant row based on the STR-STR pair passed in, get the value in CurrentID, increase the value, unlock the row and return the increased value.
Call the procedure whenever you need to generate a new ID.

You'll have no concurrency problems, as each attempt to get an ID will have to wait for any other attempts to finish. Each STR-STR pair will have its own counter.

Ant 2010-10-14 14:30:19

the second idea is not bad at all... So when i call the stored procedure, it locks that row by default but the changes that i've done to that table doesn't get reflected on it until the calling procedure finishes running, correct? So this might create some errors? How can i commit only changes to that table and not commit the changes i've made in the calling procedure... (we can't commit until the very end because sometimes if an error occurs, we need to roll back all the changes.. Although it will be acceptable not to rollback the changes to the CurredntID table)

Wonder 2010-10-14 15:19:34

If you lock a row then any other procedure attempting to read the row waits until the lock is released. So, if Joe and Bob try to get a new ID at the same time: - Joe locks the row; - Bob tries to read the row but can't, so waits; - Joe reads the row; - Joe increases the ID; - Joe unlocks the row; - Bob locks the row; - Bob reads the row; - Bob increases the ID. This will no doubt cause a bottleneck if you have a lot of simultaneous users, but you've really painted yourself into a corner with the current design.

Ant 2010-10-14 16:56:49

"If you lock a row then any other procedure attempting to read the row waits until the lock is released" I don't think that is correct, Oracle doesn't have read locks. If the readers are trying to talk their own lock (`select for update`) then they would wait.

Shannon Severance 2010-10-14 17:28:18

Yep, you're right. You'd need to use SELECT FOR UPDATE (which is what we're trying to do anyway - select the value and update it), and I think other readers attempting to use SELECT FOR UPDATE would need to wait for the lock to be released before they could continue. I could be wrong; MSSQL and MySQL are my DBs of choice, not Oracle.

Ant 2010-10-14 20:08:12

Answer 8

A:

You could add a table to hold all the identifiers that have been used and need to be unique like this:

create table unique_identifiers (id varchar2(12) primary key);

Then whenever in your 350-line logic you generate a 'STR-STR=####' value that needs to be unique, insert it into that table. If the insert fails, get another number and try again.

Tony Andrews 2010-10-14 14:45:29

Answer 9

+1 A:

This could be done using a function-based unique index. First you'd need to add a column to the table to specify whether the identifier needs to be unique for each row:

alter table ... add (identifier_must_be_unique varchar2(1)
    check (identifier_must_be_unique='Y'));

Now create a unique FBI that only holds the identifiers that need to be unique:

create unique index xxx on yyy 
(case when identifier_must_be_unique='Y' then identifier end);

Finally, in your identifier generation logic, whenever you need the identifier to be unique, set identifier_must_be_unique='Y'; otherwise leave it as null). The FBI will then implement the conditional constraint.

Tony Andrews 2010-10-14 14:51:17

Answer 10

A:

You have a broken data model. But, i'm assuming fixing it isn't a viable option. If you biggest problem is the duplicate keys when you run two quick selects against the table that contains the sequence values (you should use sequences here... but you know that)... then you can try to use the "select ... for update" which will lock the row for the session accessing it. Be careful, this logic may introduce latency into the application.

erbsock 2010-10-14 16:28:05

Answer 11

A:

I may be missing something obvious, but I'd suggest something similar to what someone else suggested earlier. Create a unique constraint on the column such that when you try to persist the record, it will fail if some other process has already used that custom generated ID. You then can trap that exception in the PL/SQL and implement some sort of retry logic until you are able to successfully insert a unique generated ID into the database.

Dave LeJeune 2010-10-14 16:59:01

Answer 12

A:

Well noone of the answers really solved the problem, mostly because i didn't convey the whole situtation properly..

But basically i put my check just before the commit into a loop and updated and committed and checked again within the same loop.. and if still existed, the loop ran again, in which case the check happened.. It's kinda hard to explain but the chances of a duplicate is extremly low (the loop has a limit of 100)

Wonder 2010-10-14 20:05:36

This is the "close your eyes and hope for the best" approach to concurrency issues...

Ant 2010-10-14 20:24:02

the only way that this would actually produce a duplicate is when the loop hits the limit of 100 because there is like 20 ppl creating an item for that particular STR-STR combo (which is almost impossible) then our support team would get the error email, and correct the duplicate by assigning a new time but this is very highly unlikely

Wonder 2010-10-16 05:13:24

Answer 13

+1 A:

I almost hate to suggest this, but since the approach that you settled on was pretty lousy then the one I'm going to show here is pretty good in comparison. Forgive the sarcasm, but when you have this many users telling you that your approach in general is wrong you ought to take that into consideration.

Somewhere you are probably running a SQL like this:

SELECT MAX(DENORMALIZED_FIELD)
INTO   BADLY_NAMED_VARIABLE
FROM   POORLY_ORGANIZED_TABLE
WHERE  DENORMALIZED_FIELD LIKE 'ABC-XYZ%';

Then later you're probably split the variable using SUBSTR, parse the second part into a NUMBER, increment it, and build a new variable with the new code.

What you can do is add the FOR UPDATE clause to the SELECT statement, placing a lock on the records in question. Sure, you're not actually updating them, but by your definition you have to serialize the operation. This is a non-performant, unscalable, and dirty way to get what you want, but it should work. Check the Oracle docs to see all the implications.

Adam Hawkes 2010-10-20 02:16:53

thanks man, no worries about the sarcasm, it's forgiven, i'm on my coop term (finished 2nd year) and i've been dealing with databases for about 5 months now.. I will use this though..

Wonder 2010-10-27 02:55:45

Also one question i have is, once we lock the table where I select the max(denormalized_field), and the second request by a user is made, what happens when it tries to select from there? Is it going to give it the MAX(Denormalized_Field) - 1 because it doesn't see the actual MAX row which is locked. Sorry still fairly new to the concept of locking as well

Wonder 2010-10-27 02:57:40

Oh, man, a student! Now I feel bad. I hope that once you've graduated you'll end up in a place where schemas like this get refactored into something reasonable. To address your concern, the second user should be waiting on the lock imposed by the `FOR UPDATE`. This is one of those cases where Oracle will wait on a read! That said, you really ought to give this a shot first to confirm. I've not recreated your entire example!

Adam Hawkes 2010-10-27 03:00:43

lol it's all good,no need to feel bad..I love my job, i get so involved.. even though i've never thought that i would be working on databases.. Regardless, it would be great if you could answer my previous question, and also I was wondering why you would 'hate to suggest' this

Wonder 2010-10-27 03:03:18

I hate to suggest this because, as demonstrated in the other answers, there are a lot of things wrong here in the data model. Fixing these things would make this particular problem simpler / non-existant. Also, locking rows is something to be avoided. Serialization like this is bad for scalability and performance (generally speaking).

Adam Hawkes 2010-10-27 03:08:03

I will test it first, database is shitty but we do still have dev schemas :) Anyways I think this should work if oracle will wait on a read and not throw and error

Wonder 2010-10-27 03:13:46

ansaurus

tags:

views:

answers:

Avoiding duplicate identifiers in the database

related questions