I need to generate an encoded string for each item I insert into the database. For example:

x00001 for the first item
x00002 for the second item
x00003 for the third item

The way I chose to do this is to count the rows. Before I insert the third item, I count the rows in the table; there are already 2, so the next code ends with 3. But there is a problem: if I delete the second item, the fourth item will get x00003 instead of x00004.

I could add an additional column to the table to store the next code, but I don't know whether there is a better solution.

+3  A: 

Most databases support some sort of auto-incrementing identity field. This field is normally also set up to be unique, so duplicate IDs do not occur.

Consult your database documentation to see how it is done in your database and use that - don't reinvent the wheel when you have a good mechanism in place already.
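As a rough illustration of the idea, here is a minimal sketch using Python's built-in sqlite3 module. SQLite, the `item` table, and the `insert_item` helper are assumptions chosen for the example, not something from the question; other databases have their own identity or sequence syntax. The code is derived from the generated id, so deleting a row does not cause a number to be reused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# AUTOINCREMENT tells SQLite never to reuse an id, even after deletes.
conn.execute(
    "CREATE TABLE item (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
)

def insert_item(name):
    """Insert a row and return a code built from the generated id (hypothetical helper)."""
    cur = conn.execute("INSERT INTO item (name) VALUES (?)", (name,))
    return f"x{cur.lastrowid:05d}"

codes = [insert_item(n) for n in ("first", "second", "third")]
conn.execute("DELETE FROM item WHERE id = 2")  # remove the second item
codes.append(insert_item("fourth"))
conn.commit()

print(codes)  # ['x00001', 'x00002', 'x00003', 'x00004']
```

Note that this only covers a single prefix; it does not by itself give each prefix its own counter.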

Oded
An ID does not work for this. There are various prefixes, so if the fourth item's prefix is Y, its code should be Y00001.
ZZcat
@Tony - in SQL Server, you can specify an algorithm to generate the ID. It does not _have_ to be integers. Like I said, look up your database documentation.
Oded
A: 

What you want is SELECT MAX(id) or SELECT MAX(some_function(id)), run inside the same transaction as the insert.
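A minimal sketch of that approach, assuming SQLite via Python's sqlite3 module, a one-character prefix, and a hypothetical `item` table and `insert_with_next_code` helper. The `substr`/`CAST` expression plays the role of some_function(id). `BEGIN IMMEDIATE` is SQLite's way of taking the write lock before the read, so two writers cannot compute the same next number.

```python
import sqlite3

# isolation_level=None: transactions are managed explicitly with BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE item (code TEXT PRIMARY KEY, name TEXT)")

def insert_with_next_code(prefix, name):
    """Compute MAX(numeric part) + 1 for this prefix and insert, in one transaction."""
    conn.execute("BEGIN IMMEDIATE")  # lock before reading MAX
    try:
        row = conn.execute(
            "SELECT MAX(CAST(substr(code, 2) AS INTEGER)) "
            "FROM item WHERE substr(code, 1, 1) = ?",
            (prefix,),
        ).fetchone()
        next_n = (row[0] or 0) + 1  # MAX is NULL when the prefix has no rows yet
        code = f"{prefix}{next_n:05d}"
        conn.execute("INSERT INTO item (code, name) VALUES (?, ?)", (code, name))
        conn.execute("COMMIT")
        return code
    except Exception:
        conn.execute("ROLLBACK")
        raise

a = insert_with_next_code("x", "first")
b = insert_with_next_code("x", "second")
c = insert_with_next_code("x", "third")
conn.execute("DELETE FROM item WHERE code = ?", (b,))  # delete the second item
d = insert_with_next_code("x", "fourth")
e = insert_with_next_code("Y", "other prefix")
print(a, b, c, d, e)  # x00001 x00002 x00003 x00004 Y00001
```

One caveat with this sketch: if the row with the highest number is deleted, that number is handed out again on the next insert.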

As suggested in Oded's answer, many databases have their own methods of providing sequences, which are more efficient and, depending on the DBMS, might support non-numeric IDs.

You could also split the id into Y and 00001 as separate columns and have both columns make up the primary key; most databases would then be able to provide the sequence.

However, this raises the question of whether your primary key should carry meaning at all; the Y suggests that part of the key is meaningful (otherwise you would be content with a plain integer id).
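Here is one way the split-column layout could look, again as a sketch in Python with sqlite3. SQLite has no built-in per-prefix sequence, so this example swaps in a small counter table instead; the `prefix_counter` table, the column names, and the `insert_item` helper are all assumptions made for illustration. Because the counter remembers the last number issued, deleting rows (even the latest one) never causes a number to be reused.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE item (
    prefix TEXT    NOT NULL,
    seq    INTEGER NOT NULL,
    name   TEXT,
    PRIMARY KEY (prefix, seq)          -- both columns make up the key
);
CREATE TABLE prefix_counter (          -- hypothetical helper table
    prefix   TEXT PRIMARY KEY,
    last_seq INTEGER NOT NULL
);
""")

def insert_item(prefix, name):
    """Take the next number for this prefix and insert the row atomically."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        # Create the counter row on first use, then bump it.
        conn.execute(
            "INSERT OR IGNORE INTO prefix_counter (prefix, last_seq) VALUES (?, 0)",
            (prefix,),
        )
        conn.execute(
            "UPDATE prefix_counter SET last_seq = last_seq + 1 WHERE prefix = ?",
            (prefix,),
        )
        (seq,) = conn.execute(
            "SELECT last_seq FROM prefix_counter WHERE prefix = ?", (prefix,)
        ).fetchone()
        conn.execute(
            "INSERT INTO item (prefix, seq, name) VALUES (?, ?, ?)",
            (prefix, seq, name),
        )
        conn.execute("COMMIT")
        return f"{prefix}{seq:05d}"  # display form, e.g. x00001
    except Exception:
        conn.execute("ROLLBACK")
        raise

codes = [insert_item("x", n) for n in ("first", "second", "third")]
conn.execute("DELETE FROM item WHERE prefix = 'x' AND seq = 3")  # delete the latest
codes.append(insert_item("x", "fourth"))
codes.append(insert_item("Y", "other prefix"))
print(codes)  # ['x00001', 'x00002', 'x00003', 'x00004', 'Y00001']
```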

Unreason
O(n^2) to insert n records is not performance to be proud of.
Nick Johnson
@Nick, O(n^2)? I explicitly stated that using DBMS facilities is more efficient. But if you assume MAX(id) is O(n), then think again: these are RDBMSs. With id unique and indexed, acquiring the MAX(id) value requires only a single I/O (reading the last entry in the index), and you are done. Not that I would suggest using MAX(id) if you are inserting any number of sequential records, but the principle should be clear.
Unreason
Really? Use your DB's explain statement and see, because many DBs can't calculate max without scanning the entire result set, even with an index. Postgres, for example, can't do it because aggregate functions are implemented using a generic map/reduce style API.
Nick Johnson
My postgres, when asked to explain select max(c) from test2, says `Index Scan Backward using test2_c on test2 (cost=0.00..19831.80 rows=200000 width=13)`. Even though there are 200,000 rows to scan, the query returns immediately, because the max value is at the end of the index and the planner chose a backward index scan. I assure you this is not done in linear time.
Unreason
@Nick, When I said not done in linear time I meant to imply that it is done in constant time.
Unreason
@Unreason Going by the postgres docs, they 'fixed' this in 8.1. Nevertheless, my original point stands: You can't simply assume that an aggregate such as max() can be executed in O(1) time, because frequently it can't.
Nick Johnson
@Nick, so postgres has done it since 2005; mysql does it too. I would assume proprietary engines do it as well. So I would say that O(1) is more the rule (and it is logical: even when reading an index backwards from disk, it takes a constant number of steps). Still, I will grant you that when it comes to performance in an RDBMS many things are possible, and that everyone should test and compare different approaches in real situations. Hence I provided the alternative answer.
Unreason