Should I use strings as primary keys for a Web site that's big on SEO?

+4 A:

The URL makes a bad primary key for a few reasons - one being that every time you insert a new record, your pages will fragment because it has to physically reorganize the entire table and stick the new record where it belongs alphabetically. Index the URL column and you'll get all the lookup performance you need. Rebuilding an index every time you insert is preferable to what is in some ways rebuilding the table.

Another (as Josh pointed out) is that URLs are highly similar, so comparisons will be extremely slow (compared to an int that is) - for two items - 2007 and 2008 inside website.com/Type/Car/Country/Usa/Manufacturer/Ford/Year/ you have to cover a lot of ground - 56 characters (plus the three for 200) before you hit a unique.

Rex M 2009-08-11 02:28:45

Indeed, URLs are particularly problematic because they always cause a large minimum number of comparisons. The first few characters ("http://www") will almost always match.

JoshJordan 2009-08-11 02:30:51

+5 A:

No, string comparisons will be significantly slower than integer comparisons (which is the major performance impact considered when choosing a primary key). What you want is an index on strings, which increases performance for searching on string columns. Primary keys are not the only way to store precomputed data for searching a column.

JoshJordan 2009-08-11 02:29:38

+3 A:

Please, no.

If you're searching on the string, index that column. But honestly, it's much better to keep an integer primary key. You have no idea whether or not you'll need to change the string.

Moreover, strings are a lot slower to compare and join. Across multiple tables and a bunch of rows, this gets painful.

I've never, ever, ever seen an implementation of string PKs that I liked. Please, please use integers and index your columns. Life will be so much better for you.

Eric 2009-08-11 02:30:27

A:

You might want to consider using a hash of the string as an index/key, instead of the string itself. That will help avoid costly string comparisons in queries.
Could the data outlive the URL scheme? You might want a simple, integer, primary key, and a secondary key/index for the string. That way, if you have to retrace your steps with the string index, foreign key relationships still hold.
Have you thought through how to normalize the URL string to ensure correct mapping to records? e.g. is Year==year and Ford/2007==2007/Ford? Should such paths map to the same record?

Oren Trutner 2009-08-11 02:38:17

Hashing is very original solution. I'm afraid it would defeat the benefits of using a standalone integer. Since inserts might come in the middle of a table, and also because if the string changes, the integer the hashing function outputs would change.About the different routes all pointing to the same location, I have thought about it and it is indeed a pain. I can handle it in my app but I know Google penalizes sites where multiple addresses point to similar content.

Abu Wan Kenobi 2009-08-11 04:12:52

+1 A:

I'd like to point out that there is a difference between Primary Keys and Clustered Indexes. A PK can be maintained by a Non-Clustered Index, in which case the whole table is NOT rearranged when a new item is inserted.

I would recommend having a clustered index on a surrogate key, but if you want to have a unique index to use as your primary key, then that's fine. You can have foreign keys on any unique index, whether it's clustered or not, and whether it's the primary key or not.

There are fairly strong arguments on using Natural keys (like what you've described) for PKs. Just don't have a clustered index underneath it, since a CIX should almost never be on a string like that.

Rob

Rob Farley 2009-08-11 02:39:27

+1 A:

If you look at the SO URLs you'll notice that they do not actually store a string for the key, but the item (question) ID. The URL is the decorated with a human readable description, but that is totally irrelevant to the mapping. So opening the url http://stackoverflow.com/questions/1258198/a-very-different-url is the same as opening http://stackoverflow.com/questions/1258198/should-i-use-strings-as-primary-keys-for-a-web-site-thats-big-on-seo which is the same as opening http://stackoverflow.com/questions/1258198

So you see, what happens in the http request and url mapping is not what is stored in the database. The database uses int IDs, for obvious performance and storage reasons that were already addressed in previous answers.

Remus Rusanu 2009-08-11 02:45:32

I did not notice that and that's a very interesting solution I would keep in mind. Thank you.However, in this case I don't think an approach like this could be used. The URL I described in my question would return a set of records, not a single record. It would return all cars available manufactured by Ford that year. I'm sorry if I was unclear about this when phrasing my question.

Abu Wan Kenobi 2009-08-11 03:05:23

A:

I think its ok to have a string as PK here because the list of car manufacturers is small (200?). Even the list of carmodels is not that large (20000?). If you drill further into details then you wont be happy with string keys anymore. The list of rims/tyres per car is realy large.

The problem you will have is that you can not have the Name in the URL "Alfa Romeo". To have it human friendly I would change it to "Alfa-Romeo" and have that as PK and URL, but keep the original name as readable name. The Problem gets even worse with car models.

Especially the french are pretty bad at this "Renault Mégane". You need to look into "Unicode decomposition" for that

The think I dont like about a surrogate key for this is, that the names of the cars will not change that often, but your surrogate key will.

Malcolm Frexner 2009-08-11 09:24:04

A:

I disagree that an integer key is ALWAYS best. Sure, it's faster to look up by an integer. But if, in fact, the accesses that you have to do will always or almost always be a text value, then the fact that if you had the record id to look up by, it would be much faster, is pretty much irrelevant. On the order of, If only you knew the winning lottery number in advance, you could buy the ticket with that number and be rich. An unquestionably true statement, but not useful if you don't happen to have the winning lottery number in advance.

So the real question is: What do you need to store in YOUR database, and how do YOU have to access it? If 99% of your accesses will be, "take a URL and look up a record", then using the URL or something you derive from the URL is arguably a good idea.

My main argument against it is not that it is a string, but that it is a string that combines many different facts. Do you ever care about the pieces? Like, would you ever want to say, "Find me all the Fords"? If so, then having "Ford" stuck in the middle of the primary key is a very very bad idea. The only way to find all Fords would then be a full-file sequential search looking for the characters "Ford" in the middle of the key. Ugly. Much better to have a separate "make" field that you can search on.

I don't know your application, but I suspect that going from a URL to a record is NOT the only access. Isn't there some sort of browse or search function, where a user might say, "Find me all the convertibles that are less then 10 years old" or some such? If so, you really need to break data out into individual fields to be able to search.

Also, what data do you get when you retrieve this URL? Are you getting just one record and displaying it, or are there many records hanging off it? If there are related records, then if the URL is the primary key of the "starting" record, then all these related records will also have to hold that big URL as a foreign key. This could get messy. You should consider the overall structure of your database -- what tables you need and how they are related -- before you decide on indexes. (Hey, this sounds like a good place to add a shameless plug for my book, "A Sane Approach to Database Design", where I discuss design considerations and the order in which you should make design decisions.)

A detail, but potentially a big one: Do you really need the subdivision names in there along with the values? That is, instead of making the URL "website.com/Type/Car/Country/Usa/Manufacturer/Ford/Year/2007", couldn't it just be "website.com/Car/Usa/Ford/2007" ? That would eliminate a lot of redundant text. And by the way, if you're only dealing with one website, so that all your URLs begin "website.com", then surely you don't need to store this in every record. Oh, and is the order significant? Can someone give the URL "webiste.com/Year/2007/Type/Car/Manufacturer/Ford/Country/Usa" and get the same information? If so, things get much more complicated.

Are there things other than motor vehicles in there? Like, can there be "website.com/Type/Pet/Kind/Dog/Breed/Poodle" or some such? (Or leaving out the labels, "/Pet/Dog/Poodle".) If so, a generic scheme of using the URL looks a little better than a more specific scheme that tries to break it out into individual fields. Maybe.

Jay 2009-08-11 16:54:39

ansaurus

tags:

views:

answers:

Should I use strings as primary keys for a Web site that's big on SEO?

related questions