I have strings like this:

http://localhost:2055/web-site-2009/paginas/noticias/**IGP-M recua 0,36% em agosto, aponta FGV**-46.aspx

I'd like to remove all characters that could cause trouble in a URL (like ?, |, &, etc.), as well as the hyphen (-) in the bold part of the string. It's important that I keep the hyphen next to the 46.aspx.

What is the regex for that?

+10  A: 

Another approach would be to simply URL-encode the string. If you need to use a regex for some other reason, I think this would get the characters you're asking about:

Regex.Replace(stringToCleanUp, @"[^a-zA-Z0-9/;\-%:]", string.Empty);

Regex explanation:

  • [^...] matches any single character that is NOT in the list - [] means list, a leading ^ means negation
  • List of characters: a-z (all lowercase characters between a and z)
  • List of characters: A-Z (all uppercase characters between A and Z)
  • All numbers: 0-9 (all digits)
  • After that, I've included a list of other characters to allow: / ; - % : (the - has to be escaped with \ since an unescaped - inside [] denotes a range)

You can add to or remove from that final list - anything in this list will be ALLOWED in your final URL since it will not be replaced.
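One thing to watch for: the pattern as written also strips spaces, commas and the "." in ".aspx", because none of them are in the allowed list. It also keeps every hyphen, including the ones in the title. A rough sketch of one way around both issues is to clean only the title and append the "-46.aspx" part afterwards, so that hyphen and dot never go through the regex. The names SlugCleaner, BuildFileName and BuildEncodedFileName are made up for this example, and the letters-and-digits-only list is my assumption (stricter than the list above), so adjust it to taste.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SlugCleaner
{
    // Assumption: only ASCII letters and digits are kept in the title.
    // The hyphen is deliberately NOT in the list, so it gets removed too.
    private static readonly Regex Disallowed = new Regex(@"[^a-zA-Z0-9]");

    // Cleans the title only, then appends "-<id>.aspx" untouched.
    public static string BuildFileName(string title, int id)
    {
        string cleaned = Disallowed.Replace(title, string.Empty);
        return cleaned + "-" + id + ".aspx";
    }

    // Alternative mentioned above: percent-encode the title instead of
    // deleting characters. Note that this keeps the hyphen in the title.
    public static string BuildEncodedFileName(string title, int id)
    {
        return Uri.EscapeDataString(title) + "-" + id + ".aspx";
    }
}
```

With the title from the question, SlugCleaner.BuildFileName("IGP-M recua 0,36% em agosto, aponta FGV", 46) should give "IGPMrecua036emagostoapontaFGV-46.aspx". If you want to keep word boundaries, you could replace spaces with another allowed character before running the regex.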

I recommend using an interactive RegEx tool if you need to tweak this, like RegExr.

Jon Galloway
Very nice tool you've pointed out there. But the problem is, I don't understand regex well enough to tweak it the way I want. Thank you.
EduardoMello