How can I go about generating a Friendly URL in C#? Currently I simply replace spaces with an underscore, but how would I go about generating URLs like Stack Overflow's?

For example, how can I convert:

How do I generate a Friendly URL in C#?

Into

how-do-i-generate-a-friendly-url-in-c

A: 

All you have to do to replace " " with "-" is:

a = a.Replace(' ', '-');
Armadillo
+7  A: 

Here's how we do it. Note that there are probably more edge conditions than you realize at first glance.

if (String.IsNullOrEmpty(title)) return "";

// remove entities
title = Regex.Replace(title, @"&\w+;", "");
// remove anything that is not letters, numbers, dash, or space
title = Regex.Replace(title, @"[^A-Za-z0-9\-\s]", "");
// remove any leading or trailing spaces left over
title = title.Trim();
// replace spaces with single dash
title = Regex.Replace(title, @"\s+", "-");
// if we end up with multiple dashes, collapse to single dash            
title = Regex.Replace(title, @"\-{2,}", "-");
// make it all lower case
title = title.ToLowerInvariant();
// if it's too long, clip it
if (title.Length > 80)
    title = title.Substring(0, 80);
// remove trailing dash, if there is one
if (title.EndsWith("-"))
    title = title.Substring(0, title.Length - 1);
return title;
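A minimal sketch of the steps above as a complete, compilable method. The names `TitleSlug` and `ToSlug` are hypothetical. It lower-cases with `ToLowerInvariant()` so the result does not depend on the current culture (with plain `ToLower()` under a Turkish culture, "I" becomes a dotless "ı", which is not a URL-safe ASCII letter), and it clips to at most 80 characters.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical names (TitleSlug, ToSlug); the steps mirror the snippet above.
public static class TitleSlug
{
    public static string ToSlug(string title)
    {
        if (string.IsNullOrEmpty(title)) return "";

        title = Regex.Replace(title, @"&\w+;", "");            // remove entities such as &amp;
        title = Regex.Replace(title, @"[^A-Za-z0-9\-\s]", ""); // keep ASCII letters, digits, dashes, whitespace
        title = title.Trim();                                  // drop leading/trailing whitespace
        title = Regex.Replace(title, @"\s+", "-");             // each whitespace run becomes one dash
        title = Regex.Replace(title, @"-{2,}", "-");           // collapse repeated dashes
        title = title.ToLowerInvariant();                      // culture-independent lower-casing

        if (title.Length > 80)
            title = title.Substring(0, 80);                    // clip to at most 80 characters
        if (title.EndsWith("-", StringComparison.Ordinal))
            title = title.Substring(0, title.Length - 1);      // clipping may leave a trailing dash
        return title;
    }
}
```

With this sketch, `TitleSlug.ToSlug("How do I generate a Friendly URL in C#?")` returns `"how-do-i-generate-a-friendly-url-in-c"`.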
Jeff Atwood
This answer is outdated. Jeff has since updated his original answer with a new version of the code: http://stackoverflow.com/questions/25259/how-do-you-include-a-webpage-title-as-part-of-a-webpage-url/25486#25486
Tom Lokhorst
+3  A: 

This gets part of the way there (using a whitelist of valid characters):

new Regex("[^a-zA-Z-_]").Replace(s, "-")

It does, however, give you a string that ends with "--". So you would probably want a second regex to trim those dashes from the beginning and end of the string, and perhaps to replace any internal "--" with "-".
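A minimal sketch of that two-step approach, assuming a hypothetical `WhitelistSlug.ToSlug` helper. Two details are additions to the one-liner above: digits are added to the whitelist, and the result is lower-cased to match the question's example.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (WhitelistSlug) following the two-step idea above.
public static class WhitelistSlug
{
    public static string ToSlug(string s)
    {
        // Step 1: every character outside the whitelist becomes a dash.
        // Digits are an addition; the dash sits last in the class so it is literal.
        s = Regex.Replace(s, "[^a-zA-Z0-9_-]", "-");
        // Step 2: collapse internal runs such as "--" into a single dash...
        s = Regex.Replace(s, "-{2,}", "-");
        // ...then trim dashes from both ends. Lower-casing is an extra, assumed step.
        return s.Trim('-').ToLowerInvariant();
    }
}
```

For the question's title this yields `"how-do-i-generate-a-friendly-url-in-c"`: the trailing `"--"` produced by `#?` is first collapsed to one dash and then trimmed.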

Matt Hamilton
A: 

Thanks for the answer, Jeff :)

GateKiller
+10  A: 

There are several things that could be improved in Jeff's solution, though.

if (String.IsNullOrEmpty(title)) return "";

IMHO, this is not the place to test for it. If the function is passed an empty string, something has already gone seriously wrong. Either throw an error or don't react at all.
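Taking the "throw an error" option, a minimal sketch of such a guard could look like this. The names `SlugGuard` and `RequireTitle` are hypothetical; `ArgumentNullException` and `ArgumentException` are the standard .NET exception types for invalid arguments.

```csharp
using System;

// Hypothetical helper: call it at the top of a slug method so that a
// null or empty title fails loudly instead of yielding an empty slug.
public static class SlugGuard
{
    public static void RequireTitle(string title)
    {
        if (title is null)
            throw new ArgumentNullException(nameof(title));
        if (title.Length == 0)
            throw new ArgumentException("Title must not be empty.", nameof(title));
    }
}
```

A slug method would then call `SlugGuard.RequireTitle(title);` as its first statement instead of returning `""`.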

// remove any leading or trailing spaces left over
… muuuch later:
// remove trailing dash, if there is one

Twice the work. Considering that each operation creates a whole new string, this is bad, even if performance is not an issue.

// replace spaces with single dash
title = Regex.Replace(title, @"\s+", "-");
// if we end up with multiple dashes, collapse to single dash            
title = Regex.Replace(title, @"\-{2,}", "-");

Again, basically twice the work: first, one regex replaces runs of spaces; then a second regex replaces runs of dashes. That means two expressions to parse, two automata to construct in memory, two iterations over the string and two new strings, when all of these operations could be collapsed into a single one.
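As a sketch of that point, the two quoted `Regex.Replace` calls can be merged into one pattern that turns every run of whitespace and/or dashes into a single dash. The class name `CollapseSeparators` is hypothetical, and this is only one way to merge the two steps; it is not the approach used in the code that follows.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: one regex replaces the separate @"\s+" and @"\-{2,}" passes.
// Any maximal run of whitespace and/or dashes becomes exactly one dash.
public static class CollapseSeparators
{
    public static string Collapse(string title) => Regex.Replace(title, @"[\s-]+", "-");
}
```

For example, `"a  -- b"` becomes `"a-b"`, the same result the two-step version produces.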

Off the top of my head, without any testing whatsoever, this would be an equivalent solution:

// make it all lower case
title = title.ToLowerInvariant();
// remove entities
title = Regex.Replace(title, @"&\w+;", "");
// remove anything that is not letters, numbers, dash, or space
title = Regex.Replace(title, @"[^a-z0-9\- ]", "");
// replace spaces
title = title.Replace(' ', '-');
// collapse dashes
title = Regex.Replace(title, @"-{2,}", "-");
// trim excessive dashes at the beginning
title = title.TrimStart(new [] {'-'});
// if it's too long, clip it
if (title.Length > 80)
    title = title.Substring(0, 80);
// remove trailing dashes
title = title.TrimEnd(new [] {'-'});
return title;

Notice that this method uses string functions instead of regex functions and char functions instead of string functions whenever possible.
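Pushing the "char functions instead of string functions" idea to its limit, the whole slug can be built in one pass over the characters with a `StringBuilder`, with no regex and no intermediate strings. This is a different technique from the code above, sketched under some assumptions: the class name `SinglePassSlug` and the `maxLength` parameter (default 80) are hypothetical, only ASCII letters and digits are kept, and, unlike the regex versions, HTML entities such as `&amp;` are not stripped (their letters survive as `amp`).

```csharp
using System.Text;

// Hypothetical single-pass variant (SinglePassSlug, maxLength are illustrative):
// walks the characters once instead of chaining regex/string operations.
// Note: HTML entities are NOT removed here, unlike the regex versions.
public static class SinglePassSlug
{
    public static string ToSlug(string title, int maxLength = 80)
    {
        var sb = new StringBuilder(title.Length);
        bool pendingDash = false; // a separator was seen since the last kept character

        foreach (char raw in title)
        {
            char c = char.ToLowerInvariant(raw);
            if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
            {
                if (pendingDash && sb.Length > 0) // never emit a leading dash
                    sb.Append('-');
                pendingDash = false;
                sb.Append(c);
            }
            else if (c == '-' || char.IsWhiteSpace(c))
            {
                pendingDash = true; // runs of separators collapse to one dash
            }
            // any other character is dropped
        }

        string slug = sb.ToString();
        if (slug.Length > maxLength)
            slug = slug.Substring(0, maxLength).TrimEnd('-'); // clipping may expose a dash
        return slug;
    }
}
```

With this sketch, `SinglePassSlug.ToSlug("How do I generate a Friendly URL in C#?")` returns `"how-do-i-generate-a-friendly-url-in-c"`. Leading and trailing dashes cannot appear, because a dash is only written immediately before a kept character.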

Konrad Rudolph