Within my application I want to avoid ID numbers in the URLs if possible, so the best way to do this would be to create a unique version of the title that is valid in a URL.

SO does something similar, but because it allows duplicate question titles, it keeps the ID in the URI:

http://stackoverflow.com/questions/3637971/how-to-edit-onchange-attribute-in-a-select-tag-using-jquery

WordPress has implemented a similar feature as well.

My question is:

What's the best way to accomplish this while sticking to the URI RFC and keeping search engines happy?

+1  A: 

To keep search engines happy

You should put this in your page's <head>:

<link rel="canonical" href="http://yoursite.com/page/uniqueTitle/"/>

This tells search engines that all pages declaring that canonical URL are the same page.

For example, this page has the following line:

<link rel="canonical" href="http://stackoverflow.com/questions/3637990/foolproof-unique-title-for-urls">

If you change the title, that value stays the same. This is how search engines know it's all the same page.

How to generate

As for how those URLs are generated, you should stick to lowercase alphanumeric characters ([a-z0-9]) and replace spaces with "-".
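A minimal sketch of that rule in Python, in case it helps. The function name slugify is made up for illustration, and collapsing every run of characters outside [a-z0-9] (not only spaces) into a single "-" is an assumption on top of the advice above:

```python
import re

def slugify(title: str) -> str:
    """Lowercase the title and turn each run of characters outside [a-z0-9] into one hyphen."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    # Drop hyphens left over from leading/trailing punctuation or spaces.
    return slug.strip("-")

print(slugify("How to edit onchange attribute in a select tag using jQuery?"))
# how-to-edit-onchange-attribute-in-a-select-tag-using-jquery
```

Note that this simply drops non-ASCII letters; transliterating them first would be a separate step.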

HoLyVieR
Thanks for the tip, but my question is more about the actual unique URL itself: which characters are valid, what's the best method to keep URLs unique if there's a dupe, etc.
RobertPitt
You shouldn't worry about possibly duplicate URLs; as long as you have the rel canonical link, it won't be a problem.
HoLyVieR
A: 

"Friendly URLs — Possibly all of what makes a good URL structure" is a nice article about that topic, and it includes a short example implementation in Python.

To make the URLs really unique without having to use a numeric ID everywhere, I'd try to generate my new URL, see if it already exists (shouldn't occur very often), and only if it does, append a short sequence number at the end.
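Roughly, that check-then-append idea could look like the sketch below. The name unique_slug is hypothetical, the in-memory set stands in for whatever lookup your storage provides, and starting the counter at 2 is an arbitrary choice:

```python
def unique_slug(base: str, existing: set) -> str:
    """Return base if it is unused, otherwise base-2, base-3, ... until a free one is found."""
    if base not in existing:
        return base
    n = 2  # assumption: the first duplicate gets the suffix -2
    while f"{base}-{n}" in existing:
        n += 1
    return f"{base}-{n}"

taken = {"my-first-post"}            # stand-in for slugs already stored
slug = unique_slug("my-first-post", taken)
taken.add(slug)
print(slug)  # my-first-post-2
```

With a real database you would probably also want a unique constraint on the slug column, since two requests could pass the check at the same time.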

Chris Lercher
+3  A: 

The Drupal Path/Pathauto modules do this, so I'd check that implementation. For a quick hit, if there are titles that reduce to duplicates:

CaseySoftware is awesome
CaseySoftware is awesome!

They would become:

caseysoftware-is-awesome
caseysoftware-is-awesome-0

You will definitely need to scrub out punctuation, but you may want to do the same to common words like "a", "the", and "is".
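As a rough illustration of the scrubbing step (this is not how Pathauto does it; the function name and the fallback for all-stop-word titles are assumptions):

```python
import re

STOP_WORDS = {"a", "the", "is"}  # the words named above; extend as needed

def slug_without_stop_words(title: str) -> str:
    """Strip punctuation, drop common words, and join the rest with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    kept = [w for w in words if w not in STOP_WORDS]
    # Fall back to all words if the title consisted only of stop words.
    return "-".join(kept or words)

print(slug_without_stop_words("CaseySoftware is awesome!"))  # caseysoftware-awesome
```

Dropping words makes collisions a bit more likely, so this works best combined with a suffix for duplicates.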

CaseySoftware
+1 I'd also add you can see which characters are valid by looking at RFC 3986. Follow the definition of `segment`.
Artefacto
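For reference, the `segment` rule in RFC 3986 is `*pchar`, where `pchar = unreserved / pct-encoded / sub-delims / ":" / "@"`. A small sketch of a check against that grammar (the names SEGMENT_RE and is_valid_segment are made up here):

```python
import re

# RFC 3986:
#   unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
#   sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
#   pct-encoded = "%" HEXDIG HEXDIG
SEGMENT_RE = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})*")

def is_valid_segment(segment: str) -> bool:
    """True if the string matches the RFC 3986 `segment` production."""
    return SEGMENT_RE.fullmatch(segment) is not None

print(is_valid_segment("caseysoftware-is-awesome"))  # True
print(is_valid_segment("hello world"))               # False (raw space)
```

A slug restricted to [a-z0-9] and "-" is well inside this set, so it never needs percent-encoding.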