views:

376

answers:

9

The reason I want to use a Guid is because in the event that I have to split the database into I won't have primary keys that overlap on both databases. So if I use a Guid there won't be any overlapping. I also want to use the GUID in the url also, so the Guid will need to be Indexed.

I will be using ASP.NET C# as my web server.

+3  A: 

Most any RDBMS you will use can take any number and type of columns as a PK. So, if you're storing the GUID as a CHAR(n) for some length n, you should be fine. Now, I'm not sure if this is advisable, as I'm guessing indexing on CHARs is not as efficient as on integers.

Hope that helps.

pbh101
Your guess is right. It's not even near efficient.
rogeriopvl
Also, a database that doesn't natively support GUIDs will probably not support generation of sequential GUIDs, and the "normal" random GUIDs have distribution that negatively affects indexing performance. That said, I've used a `CHAR(16)` field storing GUID values in binary as PK in a Firebird database without much trouble, so it is definitely a valid technique, if you mind the consequences.
Pavel Minaev
+1  A: 

I suppose you could store a GUID as an int128 as well.

Both mySQL and postgres are known to support GUID datatypes (I believe it's called UUID but it's the same thing).

Joshua
+12  A: 

Postgres has a UUID type. MySQL has a UUID function. Oracle has a SYS_GUID function.

Martin v. Löwis
are they all identical implementations of the same algorithm? can you mix MS-sourced GUIDs and Oracle-sourced SYS_GUIDs without fear of conflicts?
pbh101
Not at all; these are all different features. Read the references I provided.
Martin v. Löwis
A: 

Unless I have completely lost my memory, a properly designed 3rd+ normal form database schema does not rely on unique ints, or by extension GUIDs or UUIDs for primary keys. Nor does it use intermediate lookup tables of ints/GUIDS/UUIDS to relate the tables containing the data.

You should grind your schema until it expresses the relations amongst tables of data in terms of the data in the tables, not auto-generated identifiers that have no intrinsic relationship to the data.

I freely grant that you may just possibly be doing something that really really requires GUIDs (or auto-increment integers) for primary keys. But I seriously doubt that is the case - it almost never is.

Eric M
Further, most DBMS's offer ways to distribute a table across multiple physical disk volumes or across multiple physical servers - you keep the table integrity qua SQL statements but leverage more or better hardware. Splitting the database, if I understand your meaning, requires look ups against multiple distinct tables and that is a quick way to grief.
Eric M
I am using the asp.net membership provider. The provider requirs guids?
Ie multiple distinct tables that collectively represent some single table in the original design. I will stop here as I fear the hair in question is getting too slender to split further.
Eric M
@luke101 - I apologize, I was wrong in my assumptions. Please disregard my comments.
Eric M
Don't mix Conceptual model, logical model and physical model. Your conceptual model must be as high as possible in normal form. Your physical model must represent the conceptual model but the implementation can be even in 1 normal form and still be a good representation.
Rodrigo
The pros and cons of surrogate keys have been debated endlessly elsewhere. That debate doesn't belong here.
Aheho
+4  A: 

As others have said you can use GUIDs/UUIDs in pretty much any modern DB. The algorithm for generating a GUID is pretty straitforward and you can be reasonably sure that you won't get dupes however there are some considerations.

+) Although GUIDs are generally representations of 128 Bit values the actual format used differs from implementation to implemenation - you may want to consider normalizing them by removing non-significant characters (usually dashes or spaces).

+) To absolutely ensure uniqueness you can also append a value to the guid. For example if you're worried about MS and Oracle guids colliding add "MS" to the former and "Or" to the latter - now even if the guids themselves do collide they keys won't.

As others have mentioned however there is a potentially severe price to pay here: your keys will be large (128 bits) and won't index very well (although this is somewhat dependent on the implementation).

The techique works very well for small databases (especially those where the entire dataset can fit in memory) but as DBs grow you'll definately have to accept a performance trade-off.

One thing you might consider is a hybrid approach. Without more information it's hard to really know what you're trying to do so these might not help:

1) Remember that primary keys don't have to be a single column - you can have a simple numeric key to identify your rows and another row, containing a single value, that identifies the database that hosts the data or created the key. Creating the primary key as aggregate of both columns allows indexing to index fewer complex values and should be significantly faster.

2) You can "fake it" by constructing the key as a concatenated field (as in the above idea to append a DB identifier to the key). So your key would be a simple number followed by some DB identifier (perhaps a guid for each DB).

Indexing such a value (since the values would still be sequential) should be much faster.

In both cases you'll have some manual work to do if you ever do split the DB(s) - you'll have to update some keys with a new DB ID, but this would be a one-time,infrequent event. In exchange you can tune your DB much better.

There are definately other ways to ensure data integrity across mutiple databases. Many enterprise DBMSs have tools built-in for clustering data across multiple servers or databases, some have special tools or design patterns that make it easier, etc.

In short I would say that guids are nice and simple and do what you want, but that you should only consider them if either a) the dataset is small or b) the DBMS has specific features to optimize their use as keys (for example sequential guids). If the datasets are going to be very large or if you're trying to limit DBMS-specific dependencies I would play around more with optimizing a "key + identifier" strategy.

Jim Davis
I am using asp.net membership provider which requires a guid. Is there any way around this. I want to use a free/open sorce database if possible. i will be using windows
Is there a link to documentation for this you might share?I can't see how a security provider (which I assume is some kind of remote service?) would force the schema of your database to change: it seems more likely that you would just have to store that value IN your DB.You can have columns whose values are required to be unique WITHOUT making those columns primary keys. I would suggest setting up your data however best works for your data and storing the information for this provider as related information rather than foundational information.I could be missing something however.
Jim Davis
A: 

You can implement your own membership provider based on whatever database schema you choose to design. It's nowhere near as tricky as it may look at first.

google "roll your own membership provider" for plenty of pointers.

DanB
Do I need to use a guid if I roll my own provider?
The idea of the provider model is you can swap out the asp.net sql membership provider (and it's associated datastore) with one of your own choosing, maybe a database with your own design where the PK is an int, string, username field, whatever... Maybe you don't even bother with a database and store the data in xml files, or a spreadsheet if you really want to.
DanB
A: 

In my theoretical little world, you'd be able to do this with SQLite. You'd generate the Guid from .Net and write it to the SQLite database as a string. You could also index that field.

You do loose some of the index benefits because it'd be stored as a string but it should be fully backwards compatible so that you could import/export to/from SQL Server.

Frank V
I am not sure that sqllite can handle a database over 100K rows or can it?
I see no mention for a requirement of more than 100k rows but I believe it can. It may not be as efficient as client/server RDBS though...
Frank V
A: 

From looking through the comments it looks like you are trying to use a different database to MS SQL with the ASP.net membership provider - as others have mentioned you could roll your own provider to use a different DB however a quick Google search turned up a few ready made options:

MySQL Provider

MySQL Provider 2

SqlLite Provider

Hope these help

Macros
A: 

If you are using other MS technologies already you should consider Sql Server Express. http://www.microsoft.com/express/sql/default.aspx It is a real implementation of MS Sql Server and it is free. It does have significant limitations as you might imagine, but if your product can fit inside those you get the support, developer community and stability of Sql Server and a clear upgrade path if you need to grow.

Patrick Taylor