I'm not getting everything I want from Google Analytics, so I'm building my own simple tracking system to fill in some of the gaps.

I have a session key that I send to the client as a cookie. This is a GUID. I also have a surrogate IDENTITY int column.

I will frequently have to access the session row to make updates to it during the life of the client. Finding this session row to make updates is where my concern lies.
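As a concrete sketch of the setup described above (all table and column names here are my own invention, not from the question), the schema might look like this in SQL Server:

```sql
-- Hypothetical session table: a surrogate int IDENTITY key for internal use,
-- plus a GUID that is the only value ever sent to the client as a cookie.
CREATE TABLE Sessions (
    SessionId   int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    SessionGuid uniqueidentifier  NOT NULL DEFAULT NEWID(),
    StartedAt   datetime          NOT NULL DEFAULT GETDATE(),
    PageViews   int               NOT NULL DEFAULT 0
);
```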

I only send the GUID to the client browser:

a) I don't want my technical 'hacker' users to be able to gauge what 'user id' they are - i.e. to work out how many visitors the site has had in total

b) I want to make sure no one messes with data maliciously - nobody can guess a GUID

I know GUID indexes are inefficient, but I'm not sure exactly how inefficient. I'm also not clear on how to maximize the efficiency of multiple updates to the same row.

I don't know which of the following I should do :

  • Index the GUID column and always use that to find the row
  • Do a table scan to find the row based on the GUID (assuming recent sessions are easy to find), scanning in reverse date order (if that's even possible!)
  • Avoid a GUID index and keep a hashtable of active sessions in my application tier: IDictionary<Guid, int>, to allow the 'secret' IDENTITY surrogate key to be found from the 'non-secret' GUID key.
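For reference, the first option boils down to a query like this (assuming a hypothetical Sessions table whose SessionGuid column carries an index; the names are illustrative, not from the question):

```sql
-- With an index on SessionGuid, each mid-session update is an index seek,
-- not a scan, regardless of how many historical sessions the table holds.
UPDATE Sessions
SET PageViews = PageViews + 1
WHERE SessionGuid = @sessionGuid;
```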

There may be several thousand sessions a day.

PS. I am just trying to better understand the SQL aspects of this. I know I can do other clever things like only writing to the table on session expiration, but please keep answers SQL/index related.

+3  A: 

In this case, I'd just create an index on the GUID. Thousands of sessions a day is a completely trivial load for a modern database.

Some notes:

  • If you create the GUID index as non-clustered, the index will be small and probably be cached in memory. By default most databases cluster on primary key.
  • A GUID column is larger than an integer. But this is hardly a big issue nowadays. And you need a GUID for the application.
  • An index on a GUID is just like an index on a string, for example Last Name. That works efficiently.
  • The B-tree of an index on a GUID is harder to balance than an index on an identity column. (But not harder than an index on Last Name.) This effect can be countered by starting with a low fill factor and reorganizing the index in a weekly job. That is a micro-optimization for databases that handle a million inserts an hour or more.
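The fill-factor and weekly-reorganization idea in the last point could be sketched as follows (index and table names are placeholders I've assumed, not from the answer):

```sql
-- A low fill factor leaves free space in each index page, so the random
-- insert order of NEWID() values causes fewer page splits.
CREATE UNIQUE NONCLUSTERED INDEX IX_Sessions_SessionGuid
    ON Sessions (SessionGuid)
    WITH (FILLFACTOR = 70);

-- Scheduled weekly, this defragments the index in place.
ALTER INDEX IX_Sessions_SessionGuid ON Sessions REORGANIZE;
```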
Andomar
So the fact that there's no 'order' to the GUIDs shouldn't concern me? The index knows that 123456 is newer than 123455, but loses that knowledge for a GUID. I guess the index behaves like a hashtable whether you're indexing on a GUID or an integer column anyway? I.e. find the 'block' or whatever it is, and scan for the specific item requested. Is that a correct assumption?
Simon_Weaver
An index is a b-tree, not a hashtable. The order does not matter; SQL Server would "walk" the b-tree for both a GUID and an identity column. At the end of the walk it finds a reference to the row (a lookup). So your assumption is correct. I've edited the post after thinking it over again. :)
Andomar
@Andomar The implementation of indexes changes from database to database, column to column, and table to table. Usually it's a B-tree, but it might also be an R-tree, a hash, or something else. The database will try to guess which is best, or you can often tell it which to use. Postgres even has something called GiST: http://www.postgresql.org/docs/8.1/static/gist.html
Schwern
+1  A: 

If I understand what you're asking, you're worried that indexing and looking up your users by their GUID might slow your application down? I'm with Andomar; this is unlikely to matter unless you're inserting rows so fast that updating the index slows things down. Only on something like a logging table might that happen, and then only for complicated indices.

More importantly, did you profile it first? You don't have to guess why your program is slow, you can find out which bits are slow with a profiler. Otherwise you'll waste hours optimizing bits of code that are either A) never used or B) already fast enough.

Schwern
+2  A: 

Assuming you are using SQL Server 2005 or above, your scenario might benefit from NEWSEQUENTIALID(), the function that gives you ordered GUIDs.

Consider this quote from the article Performance Comparison - Identity() x NewId() x NewSequentialId

"The NEWSEQUENTIALID system function is an addition to SQL Server 2005. It seeks to bring together, what used to be, conflicting requirements in SQL Server 2000; namely identity-level insert performance, and globally unique values."

Declare your table as

create table MyTable(
    id uniqueidentifier default newsequentialid() not null primary key clustered
);

However, keep in mind, as Andomar noted, that the sequential nature of the GUIDs produced also makes them easy to predict. There are ways to make this harder, but none that would make this better than applying the same techniques to sequential integer keys.

Like the other authors, I seriously doubt that the overhead of using straight NEWID() GUIDs would be significant enough for your application to notice. You would be better off focusing on minimizing roundtrips to your DB than on implementing custom caching scenarios such as the dictionary you propose.
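As one way of trimming roundtrips, SQL Server 2005's OUTPUT clause lets you fold the write and the read-back into a single statement. A sketch, again with assumed table and column names:

```sql
-- Update the session and return the columns the application needs
-- in the same roundtrip, instead of issuing a separate SELECT.
UPDATE Sessions
SET PageViews = PageViews + 1
OUTPUT inserted.SessionId, inserted.PageViews
WHERE SessionGuid = @sessionGuid;
```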

Peter Stuer
+1 Interesting though probably not applicable here, because sequential guids would be easy to guess?
Andomar
@Andomar: True. http://blogs.msdn.com/sqlprogrammability/archive/2006/03/23/559061.aspx explains how NEWSEQUENTIALID is implemented. You could make it less predictable by running random-interval calls to UuidCreateSequential through the system, tuning the mean frequency to the desired dispersal, but that would be a kludge, and impractical depending on the desired level of unguessability.
Peter Stuer