What is the best way to clean a string so it can be used in a URL? I am looking for a URL segment like this:

what_is_the_best_headache_medication

My current code:

public string CleanURL(string str)
{
    str = str.Replace("!", "");
    str = str.Replace("@", "");
    str = str.Replace("#", "");
    str = str.Replace("$", "");
    str = str.Replace("%", "");
    str = str.Replace("^", "");
    str = str.Replace("&", "");
    str = str.Replace("*", "");
    str = str.Replace("(", "");
    str = str.Replace(")", "");
    str = str.Replace("-", "");
    str = str.Replace("_", "");
    str = str.Replace("+", "");
    str = str.Replace("=", "");
    str = str.Replace("{", "");
    str = str.Replace("[", "");
    str = str.Replace("]", "");
    str = str.Replace("}", "");
    str = str.Replace("|", "");
    str = str.Replace(@"\", "");
    str = str.Replace(":", "");
    str = str.Replace(";", "");
    str = str.Replace("\"", "");
    str = str.Replace("'", "");
    str = str.Replace("<", "");
    str = str.Replace(">", "");
    str = str.Replace(",", "");
    str = str.Replace(".", "");
    str = str.Replace("`", "");
    str = str.Replace("~", "");
    str = str.Replace("/", "");
    str = str.Replace("?", "");
    str = str.Replace("  ", " ");
    str = str.Replace("   ", " ");
    str = str.Replace("    ", " ");
    str = str.Replace("     ", " ");
    str = str.Replace("      ", " ");
    str = str.Replace("       ", " ");
    str = str.Replace("        ", " ");
    str = str.Replace("         ", " ");
    str = str.Replace("          ", " ");
    str = str.Replace("           ", " ");
    str = str.Replace("            ", " ");
    str = str.Replace("             ", " ");
    str = str.Replace("              ", " ");
    str = str.Replace(" ", "_");
    return str;
}
A: 
  1. How do you define a "friendly" URL? I'm assuming you mean removing characters such as _ and so on.
  2. I'd look into a regular expression here.

If you want to stick with the method above, I would suggest using a StringBuilder instead of a string, because each of your Replace calls creates a new string.
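A minimal sketch of the StringBuilder suggestion, assuming the same character list as the question plus the double quote; the class name UrlCleaner is hypothetical. It also swaps the fixed list of multi-space replacements for a loop that repeats until the length stops changing, because one pass of Replace("  ", " ") only halves a run of spaces rather than collapsing it.

```csharp
using System;
using System.Text;

public static class UrlCleaner
{
    // Characters the question strips out (plus '"').
    private static readonly string[] Unwanted =
    {
        "!", "@", "#", "$", "%", "^", "&", "*", "(", ")", "-", "_", "+", "=",
        "{", "[", "]", "}", "|", "\\", ":", ";", "\"", "'", "<", ">", ",", ".",
        "`", "~", "/", "?"
    };

    public static string CleanURL(string str)
    {
        // StringBuilder.Replace edits one buffer in place instead of
        // allocating a new string for every call.
        var sb = new StringBuilder(str);
        foreach (string s in Unwanted)
            sb.Replace(s, "");

        // Collapse runs of spaces: repeat until a pass no longer shortens the buffer.
        int before;
        do
        {
            before = sb.Length;
            sb.Replace("  ", " ");
        } while (sb.Length != before);

        sb.Replace(" ", "_");
        return sb.ToString();
    }
}
```

Like the original method, this keeps the input's casing; call ToLowerInvariant() on the result if you want the all-lowercase form shown in the question.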

Martin Clarke
+2  A: 

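You should consider using a regular expression instead. One pattern can describe the whole set of characters to remove, which is shorter and easier to maintain than the chain of Replace calls above, and it avoids building a new intermediate string for every single character you strip.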

More on Regular Expressions here.
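As an illustration, here is a sketch that translates the question's character list directly into one regex character class, keeping the question's blacklist semantics; the class name RegexUrlCleaner is hypothetical. The Regex objects are built once with RegexOptions.Compiled and reused, which is one common way to avoid re-parsing the pattern on every call.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexUrlCleaner
{
    // The punctuation the question strips, as one character class.
    // Backslash, brackets and hyphen are escaped; the rest are literal inside [...].
    // In a verbatim string, "" stands for a single double quote.
    private static readonly Regex Punctuation = new Regex(
        @"[!@#$%^&*()\-_+={}\[\]|\\:;""'<>,.`~/?]", RegexOptions.Compiled);

    // One or more consecutive spaces.
    private static readonly Regex Spaces = new Regex(" +", RegexOptions.Compiled);

    public static string CleanURL(string str)
    {
        str = Punctuation.Replace(str, "");
        return Spaces.Replace(str, "_");
    }
}
```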

thinkzig
A: 

I can tighten up one piece of that:

while (str.Contains("  "))
    str = str.Replace("  ", " ");

...instead of your long list of multiple-space replacements. But you almost certainly want a regular expression instead.
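With a regular expression, the whole whitespace-collapsing step becomes a single call. A minimal sketch, where the class and method names are hypothetical: the pattern " {2,}" matches any run of two or more spaces, so every run is replaced in one pass with no loop.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpaceCollapser
{
    // Replace every run of two or more spaces with a single space in one pass.
    public static string CollapseSpaces(string str) =>
        Regex.Replace(str, " {2,}", " ");
}
```

Use @"\s+" instead if tabs and newlines should be collapsed as well.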

dnord
+2  A: 

Regular expressions for sure:

using System.Text.RegularExpressions;

public string CleanURL(string str)
{
    str = Regex.Replace(str, "[^a-zA-Z0-9 ]", "");
    str = Regex.Replace(str, " +", "_");
    return str;
}

(Not actually tested, off the top of my head.)

Let me explain:

The first line removes everything that's not an alphanumeric character (upper or lowercase) or a space. The second line replaces each run of one or more consecutive spaces with a single underscore.
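One possible extension of these two regexes, not part of the original answer: trim the string first so leading or trailing spaces don't turn into separators, lowercase the result to match the example in the question, and make the separator a parameter. The names Slug, ToSlug and separator are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Slug
{
    // Same two regexes as the answer, plus Trim() and ToLowerInvariant().
    public static string ToSlug(string str, string separator = "_")
    {
        str = Regex.Replace(str, "[^a-zA-Z0-9 ]", "");
        str = Regex.Replace(str.Trim(), " +", separator);
        return str.ToLowerInvariant();
    }
}
```

For example, ToSlug("What is the best headache medication?") gives what_is_the_best_headache_medication, and passing "-" as the separator gives the hyphenated form.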

CaptainKeytar
Your first regex eats the spaces.
Thomas G. Mayfield
Fixed, thanks :)
CaptainKeytar
Cool. This looks like what I have, except I prefer to replace spaces with hyphens rather than underscores. For SEO, I think there is no difference.
James Lawruk
The accepted answer above is even more elaborate than what I have here and includes ample explanation.
CaptainKeytar
A: 

Or, a bit more verbose: this keeps only alphanumeric characters and spaces, and then replaces the spaces with '-'.

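using System.Text;

// Keep only ASCII letters, digits and spaces. A StringBuilder avoids
// allocating a new string on every append, unlike Cleaned += c.
var sb = new StringBuilder(Dirty.Length);
foreach (char c in Dirty)
{
    if ((c >= 'a' && c <= 'z') ||
        (c >= 'A' && c <= 'Z') ||
        (c >= '0' && c <= '9') ||
        c == ' ')
        sb.Append(c);
}
string Cleaned = sb.ToString().Replace(" ", "-");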
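The loop above turns every space into its own '-', so "a  b" becomes "a--b". A hypothetical single-pass variant (the names CharSlug and Clean are illustrative) that still avoids regular expressions: it lowercases kept characters and emits one '-' per run of spaces, only between kept characters, so no leading, trailing or doubled hyphens appear.

```csharp
using System;
using System.Text;

public static class CharSlug
{
    public static string Clean(string dirty)
    {
        var sb = new StringBuilder(dirty.Length);
        bool pendingSeparator = false; // saw a space since the last kept char

        foreach (char c in dirty)
        {
            bool isAsciiLetterOrDigit =
                (c >= 'a' && c <= 'z') ||
                (c >= 'A' && c <= 'Z') ||
                (c >= '0' && c <= '9');

            if (isAsciiLetterOrDigit)
            {
                // Only emit a separator if something was already written.
                if (pendingSeparator && sb.Length > 0)
                    sb.Append('-');
                pendingSeparator = false;
                sb.Append(char.ToLowerInvariant(c));
            }
            else if (c == ' ')
            {
                pendingSeparator = true;
            }
            // Any other character is dropped.
        }
        return sb.ToString();
    }
}
```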
Peter Phillips