tags:

views:

72

answers:

4

I have strings in the form: "[user:fred][priority:3]Lorem ipsum dolor sit amet." where the area enclosed in square brackets is a tag (in the format [key:value]). I need to be able to remove a specific tag given it's key with the following extension method:

public static void RemoveTagWithKey(this string message, string tagKey) {
    if (message.ContainsTagWithKey(tagKey)) {
        var regex = new Regex(@"\[" + tagKey + @":[^\]]");
        message = regex.Replace(message , string.Empty);
    }
}
public static bool ContainsTagWithKey(this string message, string tagKey) {
    return message.Contains(string.Format("[{0}:", tagKey));
}

Only the tag with the specified key should be removed from the string. My regex doesn't work because it's daft. I need help to write it properly. Alternatively, an implementation without regex is welcome.

+5  A: 
Dale Halliwell
+1  A: 

Try this instead:

new Regex(@"\[" + tagKey + @":[^\]+]");

The only thing I changed was to add + to the [^\] pattern, meaning that you match one or more characters that are not a backslash.

Drew Noakes
+2  A: 

If you want to do this without a Regex it isn't difficult. You're already searching for a specific tag key, so you can just search for "[" + tagKey, then search from there for the closing "]", and remove everything between those offsets. Something like...

int posStart = message.IndexOf("[" + tagKey + ":");
if(posStart >= 0)
{
    int posEnd = message.IndexOf("]", posStart);
    if(posEnd > posStart)
    {
        message = message.Remove(posStart, posEnd - posStart);
    }
}

Is that better than a Regex solution? Since you're only looking for a specific key I think it probably is, on the grounds of simplicity. I love Regexes but they're not always the clearest answer.

Edit: Another reason the IndexOf() solution could be seen as better is that it means there is only one rule for finding the start of the tag, whereas the original code uses a Contains() which searches for something like '[tag:' and then uses a regex which uses a slightly different expression to do the substitution / removal. In theory you could have text which matches one criterion but not the other.

AAT
It looks like the original question is trying to cater for escaped `[` or `]` within the tag too.
Drew Noakes
I ended up with a variation of this because it turned out I needed more extension methods for other stuff (eg: Dictionary<string, string> GetTags(this message m), string GetTagValue(this message m, string tagKey), etc...). The refactoring meant that there was no need for regexes.
grenade
@Drew Noakes: I don't think that 'Regex(@"\[" + tagKey + @":[^\]]");' is doing quite what you thought (maybe not what grenade thought!). If you imagine that tagKey is say "Zippy" then the regex becomes '\[Zippy:[^\]]' which means "match Zippy followed by a colon followed by one character which isn't a "]".' (Or as you noted in your answer you can put a + on the end to make it match one or more characters which aren't a "]".) But it doesn't do anything clever about avoiding escaped brackets as far as I can work out (using my mental regex parser...).
AAT
+1  A: 

I think this is the regex you're looking for:

string regex = @"\[" + tag + @":[^\]+]\]";

Also, you don't need to do a separate check to see if there are tags of that type. Just do a regex replace; if there are no matches, the original string is returned.

public static string RemoveTagWithKey(string message, string tagKey) {
    string regex = @"\[" + tag + @":[^\]+]\]";
    return Regex.Replace(message, regex, string.Empty);
}

You seem to be writing an extension method, but I wrote this as a static utility method to keep things simple.

Alan Moore