+2  Q: 

Regex replace help

Using the .NET framework, I'm trying to replace double slash characters in a string with a single slash, but it seems to be removing an extra character and I don't know why.

I have a string:

http://localhost:4170/RCRSelfRegistration//Default.aspx

My regex is:

[^(://|:\\\\)](\\\\|//|\\/|/\\)

And the return value is:

http://localhost:4170/RCRSelfRegistratio/Default.aspx

You can see that the n in RCRSelfRegistration has been removed. I am not sure why.

/// <summary>
/// Match on double slashes (//, \\, /\, \/) but do not match :// or :\\
/// </summary>
private const string strMATCH = @"[^(://|:\\\\)](\\\\|//|\\/|/\\)";

/// <summary>
/// Replace double slashes with single slash
/// </summary>
/// <param name="strUrl"></param>
/// <returns></returns>
public static string GetUrl(string strUrl)
{
    string strNewUrl;
    System.Text.RegularExpressions.Regex rxReplace =
      new System.Text.RegularExpressions.Regex(strMATCH);

    strNewUrl = rxReplace.Replace(strUrl, "/");

    return strNewUrl;
}
+5  A: 

[^(://|:\\\\)] doesn't work the way you think it does.

[] is a character class - it matches a single character that is contained in the set.

[^:] will match any character other than a colon. This might be closer to what you want.

What you probably really want is a zero-width lookbehind assertion: (?<!:)
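
To make the difference visible, here is a minimal sketch that prints what each pattern actually matches on the URL from the question. The class and method names (SlashPatternDemo, FirstMatch) are made up for illustration; only Regex.Match comes from the framework.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SlashPatternDemo
{
    // Pattern from the question: the leading [...] is a negated character class,
    // so it consumes one character in front of the slashes.
    public const string Original = @"[^(://|:\\\\)](\\\\|//|\\/|/\\)";

    // Same alternation, guarded by a zero-width negative lookbehind instead.
    public const string WithLookbehind = @"(?<!:)(\\\\|//|\\/|/\\)";

    // Returns the text of the first match so the consumed characters can be inspected.
    public static string FirstMatch(string pattern, string input)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        string url = "http://localhost:4170/RCRSelfRegistration//Default.aspx";
        Console.WriteLine(FirstMatch(Original, url));       // n//
        Console.WriteLine(FirstMatch(WithLookbehind, url)); // //
    }
}
```

The first match includes the n, which is why the replacement swallows it; the lookbehind version matches only the two slashes.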

Douglas Leeder
+1  A: 

The negation part [^(://|:\\)] of your regex matches the n and thus removes it.

Gavin Miller
+4  A: 

The first part of your regex "[^(://|:\\)]" matches any single character which is not one of "(:/|\)" (as tomalak points out, the negated set treats everything inside it as a literal character, with no further processing logic), which includes the "n" immediately before "//Default.aspx" - it's not a zero-width assertion.

What you probably want to do is change that part of the pattern to a zero-width lookbehind to make sure the slash character is not preceded by a colon.

annakata
I see what you are saying. I've simplified the string to "[^:](\\\\|//|\\/|/\\)" but can you tell me the syntax for a zero width lookbehind?
Jeremy
It's all over the place now. =)
Instantsoup
In your case: "(?<!:)", the "?<!" part meaning negative lookbehind.
annakata
In fact, "[^(://|:\\)]" matches anything but these characters: "(|:/\)" - it is a character class, not an alternation, even if it looks like one.
Tomalak
That did it! Thanks.
Jeremy
yeah quite right tomalak, edited
annakata
The possibly minimal regex to solve the given problem would be: "(?<!:)[/\\]{2}", replaced by "/" globally.
Tomalak
@Tomalak, that was slick, good catch.
Jeremy
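
For reference, a sketch of how the GetUrl method from the question might look with the lookbehind pattern suggested in the comments. The class name UrlSlashes and the field name DoubleSlash are assumptions for illustration; note the pattern is written as a verbatim string (@"...") so the backslashes reach the regex engine unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlSlashes
{
    // (?<!:)   zero-width check: the previous character is not a colon
    // [/\\]{2} exactly two characters, each a forward slash or a backslash
    private static readonly Regex DoubleSlash = new Regex(@"(?<!:)[/\\]{2}");

    // Replace each double slash with a single forward slash,
    // leaving a pair that directly follows a colon (as in http://) alone.
    public static string GetUrl(string strUrl)
    {
        return DoubleSlash.Replace(strUrl, "/");
    }

    public static void Main()
    {
        Console.WriteLine(GetUrl("http://localhost:4170/RCRSelfRegistration//Default.aspx"));
        // http://localhost:4170/RCRSelfRegistration/Default.aspx
    }
}
```

With {2} only pairs are replaced, so a run of three slashes would still leave two behind; {2,} would collapse a whole run in one go, if that behaviour is wanted.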
+1  A: 

Have you tried using the Replace method of string? It's not as elegant as a regex replace, but so long as you aren't doing it on huge strings hundreds of times in a loop it should serve your purpose:

string myString = oldString.Replace(@"\\", @"\").Replace("//", "/");

Otherwise you could spend ages fiddling with Regex.
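
One thing to watch with plain Replace: it has no notion of "not after a colon", so the // in http:// is collapsed as well. The sketch below shows that, plus one possible workaround that leaves everything up to the first "://" untouched. CollapseAfterScheme is a hypothetical helper, not something from the framework, and it assumes the string contains at most one scheme separator.

```csharp
using System;

public static class PlainReplaceDemo
{
    // Chained String.Replace: collapses every "\\" and every "//" pair.
    public static string Collapse(string s)
    {
        return s.Replace(@"\\", @"\").Replace("//", "/");
    }

    // Hypothetical helper: keep the text up to and including the first "://"
    // as it is, and only collapse slashes in the remainder.
    public static string CollapseAfterScheme(string s)
    {
        int i = s.IndexOf("://", StringComparison.Ordinal);
        if (i < 0) return Collapse(s);
        int start = i + 3;
        return s.Substring(0, start) + Collapse(s.Substring(start));
    }

    public static void Main()
    {
        string url = "http://localhost:4170/RCRSelfRegistration//Default.aspx";
        Console.WriteLine(Collapse(url));            // http:/localhost:4170/RCRSelfRegistration/Default.aspx
        Console.WriteLine(CollapseAfterScheme(url)); // http://localhost:4170/RCRSelfRegistration/Default.aspx
    }
}
```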

Omar Kooheji
+2  A: 

What you need is a negative lookbehind group like this:

(?<!:)(\\\\|//|\\/|/\\)
Martin Brown
A: 

I think you just need a simple string replace with a loop. Replace all "//" with "/". You need a function that saves the search position and lets you walk through the string. Once you've reached the end of the string do it again, until you don't make any replacements on a pass.

e.g.:

///a//a/a////

pass 1

//a/a/a//

pass 2

/a/a/a/
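
The passes above could be sketched roughly like this, using String.Replace for each pass and stopping once a pass changes nothing. The names RepeatedReplaceDemo and CollapseSlashes are made up for illustration. Note that this collapses every run of forward slashes, including the one in http://, so it does not address the "not after a colon" part of the question.

```csharp
using System;

public static class RepeatedReplaceDemo
{
    // Repeat the "//" -> "/" replacement until a pass leaves the string unchanged.
    public static string CollapseSlashes(string s)
    {
        string previous;
        do
        {
            previous = s;
            s = s.Replace("//", "/");
        } while (s != previous);
        return s;
    }

    public static void Main()
    {
        Console.WriteLine("///a//a/a////".Replace("//", "/")); // //a/a/a//  (one pass)
        Console.WriteLine(CollapseSlashes("///a//a/a////"));   // /a/a/a/
    }
}
```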

justinhj
this would be brutally slow, and this is the kind of task regex is meant for
annakata
I'll shut up now, I read the question properly. Need more coffee.
justinhj