I'm looking for a regular expression to remove a single parameter from a query string, and I want to do it in a single regular expression if possible.

Say I want to remove the foo parameter. Right now I use this:

/&?foo\=[^&]+/

That works as long as foo is not the first parameter in the query string. If it is, the new query string starts with an ampersand. (For example, "foo=123&bar=456" gives a result of "&bar=456".) Right now, I check after the regex whether the query string starts with an ampersand, and chop it off if it does.

Example edge cases:

Input                    |  Output
-------------------------+-----------------
foo=123                  |  (empty string)
foo=123&bar=456          |  bar=456
bar=456&foo=123          |  bar=456
abc=789&foo=123&bar=456  |  abc=789&bar=456
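
To make the current two-step workaround concrete, here is a minimal sketch in Python (chosen only for illustration; the function name `remove_foo_naive` is hypothetical and not part of the original question). It applies the pattern above without the `/` delimiters and then chops a leading ampersand if one is left over.

```python
import re

# The original pattern from the question, without the Perl-style / delimiters.
NAIVE = re.compile(r"&?foo=[^&]+")

def remove_foo_naive(query: str) -> str:
    """Remove the first foo=value pair, then chop a leading '&' if one remains."""
    out = NAIVE.sub("", query, count=1)
    return out[1:] if out.startswith("&") else out

for q in ["foo=123", "foo=123&bar=456", "bar=456&foo=123", "abc=789&foo=123&bar=456"]:
    print(repr(q), "->", repr(remove_foo_naive(q)))
```

The regex alone turns "foo=123&bar=456" into "&bar=456"; the extra `startswith` check is the separate step the question would like to fold into the pattern.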


Edit

OK, as pointed out in the comments, there are way more edge cases than I originally considered. I got the following regex to work with all of them:

/&foo(\=[^&]*)?(?=&|$)|^foo(\=[^&]*)?(&|$)/

This is modified from Mark Byers's answer, which is why I'm accepting that one, but Roger Pate's input helped a lot too.
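
As a reading aid, the two branches of that regex can be sketched in Python (an illustrative port, not the Perl used in the test script; `PATTERN` and `remove_foo` are hypothetical names). The first branch handles foo after another parameter and uses a lookahead so the following ampersand is kept; the second handles foo at the start of the string and consumes the ampersand after it.

```python
import re

# Branch 1: "&foo", an optional "=value", then a lookahead for "&" or end of
#           string, so the "&" that follows (if any) is left in place.
# Branch 2: "foo" at the very start, an optional "=value", and the "&" (or end
#           of string) right after it, which is consumed.
PATTERN = re.compile(r"&foo(=[^&]*)?(?=&|$)|^foo(=[^&]*)?(&|$)")

def remove_foo(query: str) -> str:
    """Remove the first foo parameter, with or without a value."""
    return PATTERN.sub("", query, count=1)

# A few rows taken from the test table in the question.
cases = {
    "foo": "",
    "foo=&bar=456": "bar=456",
    "bar=456&foo=123": "bar=456",
    "abc=789&foo&bar=456": "abc=789&bar=456",
    "xfoo=123&bar=456": "xfoo=123&bar=456",
    "bar=456&foox": "bar=456&foox",
}
for query, expected in cases.items():
    status = "PASS" if remove_foo(query) == expected else "FAIL"
    print(status, repr(query), "->", repr(remove_foo(query)))
```

Requiring "&" or the end of the string after the key is what keeps foox from matching, and anchoring the second branch with `^` is what keeps xfoo from matching.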

Here is the full suite of test cases I'm using, and a Perl script which tests them.

Input                    | Output
-------------------------+-------------------
foo                      | 
foo&bar=456              | bar=456
bar=456&foo              | bar=456
abc=789&foo&bar=456      | abc=789&bar=456
foo=                     | 
foo=&bar=456             | bar=456
bar=456&foo=             | bar=456
abc=789&foo=&bar=456     | abc=789&bar=456
foo=123                  | 
foo=123&bar=456          | bar=456
bar=456&foo=123          | bar=456
abc=789&foo=123&bar=456  | abc=789&bar=456
xfoo                     | xfoo
xfoo&bar=456             | xfoo&bar=456
bar=456&xfoo             | bar=456&xfoo
abc=789&xfoo&bar=456     | abc=789&xfoo&bar=456
xfoo=                    | xfoo=
xfoo=&bar=456            | xfoo=&bar=456
bar=456&xfoo=            | bar=456&xfoo=
abc=789&xfoo=&bar=456    | abc=789&xfoo=&bar=456
xfoo=123                 | xfoo=123
xfoo=123&bar=456         | xfoo=123&bar=456
bar=456&xfoo=123         | bar=456&xfoo=123
abc=789&xfoo=123&bar=456 | abc=789&xfoo=123&bar=456
foox                     | foox
foox&bar=456             | foox&bar=456
bar=456&foox             | bar=456&foox
abc=789&foox&bar=456     | abc=789&foox&bar=456
foox=                    | foox=
foox=&bar=456            | foox=&bar=456
bar=456&foox=            | bar=456&foox=
abc=789&foox=&bar=456    | abc=789&foox=&bar=456
foox=123                 | foox=123
foox=123&bar=456         | foox=123&bar=456
bar=456&foox=123         | bar=456&foox=123
abc=789&foox=123&bar=456 | abc=789&foox=123&bar=456

Test script (Perl)

@in = ('foo'     , 'foo&bar=456'     , 'bar=456&foo'     , 'abc=789&foo&bar=456'
      ,'foo='    , 'foo=&bar=456'    , 'bar=456&foo='    , 'abc=789&foo=&bar=456'
      ,'foo=123' , 'foo=123&bar=456' , 'bar=456&foo=123' , 'abc=789&foo=123&bar=456'
      ,'xfoo'    , 'xfoo&bar=456'    , 'bar=456&xfoo'    , 'abc=789&xfoo&bar=456'
      ,'xfoo='   , 'xfoo=&bar=456'   , 'bar=456&xfoo='   , 'abc=789&xfoo=&bar=456'
      ,'xfoo=123', 'xfoo=123&bar=456', 'bar=456&xfoo=123', 'abc=789&xfoo=123&bar=456'
      ,'foox'    , 'foox&bar=456'    , 'bar=456&foox'    , 'abc=789&foox&bar=456'
      ,'foox='   , 'foox=&bar=456'   , 'bar=456&foox='   , 'abc=789&foox=&bar=456'
      ,'foox=123', 'foox=123&bar=456', 'bar=456&foox=123', 'abc=789&foox=123&bar=456'
      );

@exp = (''        , 'bar=456'         , 'bar=456'         , 'abc=789&bar=456'
       ,''        , 'bar=456'         , 'bar=456'         , 'abc=789&bar=456'
       ,''        , 'bar=456'         , 'bar=456'         , 'abc=789&bar=456'
       ,'xfoo'    , 'xfoo&bar=456'    , 'bar=456&xfoo'    , 'abc=789&xfoo&bar=456'
       ,'xfoo='   , 'xfoo=&bar=456'   , 'bar=456&xfoo='   , 'abc=789&xfoo=&bar=456'
       ,'xfoo=123', 'xfoo=123&bar=456', 'bar=456&xfoo=123', 'abc=789&xfoo=123&bar=456'
       ,'foox'    , 'foox&bar=456'    , 'bar=456&foox'    , 'abc=789&foox&bar=456'
       ,'foox='   , 'foox=&bar=456'   , 'bar=456&foox='   , 'abc=789&foox=&bar=456'
       ,'foox=123', 'foox=123&bar=456', 'bar=456&foox=123', 'abc=789&foox=123&bar=456'
       );

print "Succ | Input                    | Output                   | Expected                \n";
print "-----+--------------------------+--------------------------+-------------------------\n";

for($i=0; $i <= $#in; $i++)
{
  $out = $in[$i];
  $out =~ s/_PUT_REGEX_HERE_//;

  $succ = ($out eq $exp[$i] ? 'PASS' : 'FAIL');
  #if($succ eq 'FAIL')
  #{
    printf("%s | %- 24s | %- 24s | %- 24s\n", $succ, $in[$i], $out, $exp[$i]);
  #}
}
+1  A: 

Having a query string that starts with & is harmless--why not leave it that way? In any case, I suggest that you search for the trailing ampersand and use \b to match the beginning of foo without taking in a previous character:

 /\bfoo\=[^&]+&?/
JSBangs
Using a trailing ampersand will give a problem with the third example.
catchmeifyoutry
Note that the trailing ampersand is optional in the regex that I gave.
JSBangs
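
For anyone following this exchange, a quick Python check of the \b pattern against the question's four examples (an illustrative sketch; the names `WORD_BOUNDARY` and `results` are hypothetical) shows what the optional trailing ampersand does in each position.

```python
import re

# The pattern from this answer, without the / delimiters.
WORD_BOUNDARY = re.compile(r"\bfoo=[^&]+&?")

inputs = ["foo=123", "foo=123&bar=456", "bar=456&foo=123", "abc=789&foo=123&bar=456"]
results = {q: WORD_BOUNDARY.sub("", q, count=1) for q in inputs}

for q, out in results.items():
    print(repr(q), "->", repr(out))
```

When foo is the last parameter (the third example), the ampersand in front of it is never consumed, so the result ends with "&". The stray ampersand moves from the front of the string to the back.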
Kip
+2  A: 
Roger Pate
Having some problems with this one, but I'm working on it. Yes, there is no \?; my string is only the query string.
Kip
Yes, I know, that's why I said my approach fails. :)
Roger Pate
+1 for providing test code. Even though your solution didn't quite work, the test code is useful.
Mark Byers
+3  A: 

If you want to do this in just one regular expression, you could do this:

/&foo(=[^&]*)?|^foo(=[^&]*)?&?/

This is because you need to match either an ampersand before the foo=..., or one after, or neither, but not both.

To be honest, I think it's better the way you did it: removing the leftover leading ampersand in a separate step.
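
One caveat, sketched here in Python under the assumption that keys such as foox can appear (as they do in the question's extended test suite): without a guard after the key, the pattern also matches a longer key that merely starts with foo. The names `NO_GUARD` and `GUARDED` are hypothetical; `GUARDED` is the regex from the question's edit.

```python
import re

# Pattern from this answer: nothing constrains what follows "foo".
NO_GUARD = re.compile(r"&foo(=[^&]*)?|^foo(=[^&]*)?&?")
# Pattern from the question's edit: "foo" must be followed by "=value", "&" or end of string.
GUARDED = re.compile(r"&foo(=[^&]*)?(?=&|$)|^foo(=[^&]*)?(&|$)")

for q in ["bar=456&foo=123", "foo&bar=456", "bar=456&foox=123", "foox"]:
    print(repr(q),
          "| no guard:", repr(NO_GUARD.sub("", q, count=1)),
          "| guarded:", repr(GUARDED.sub("", q, count=1)))
```

Both patterns agree on the plain foo inputs. On the foox inputs the unguarded pattern strips the leading "foo" (or "&foo") and leaves the rest of the key behind, while the guarded one leaves the string untouched.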

Mark Byers
Roger Pate
@Roger Pate: both are valid input, but you only want to match exactly one of them (because I'm replacing whatever is matched with an empty string).
Kip
Try running this pattern against Roger's test cases.
Greg Bacon
Kip
gbacon: the only cases it failed on were those containing 'foo' without a value. I've updated the regex to handle this, and it passes all cases now.
Mark Byers
Kip
+1  A: 

It's a bit silly but I started trying to solve this with a regexp and wanted to finally get it working :)

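$str[] = 'foo=123';
$str[] = 'foo=123&bar=456';
$str[] = 'bar=456&foo=123';
$str[] = 'abc=789&foo=123&bar=456';

// The /e (eval) modifier was removed in PHP 7, so a callback does the same job.
foreach ($str as $string) {
 echo preg_replace_callback(
  '#(?:^|\b)(&?)foo=[^&]+(&?)#',
  // Keep a single & only when the match had one on both sides.
  function ($m) { return ($m[1] === '&' && $m[2] === '&') ? '&' : ''; },
  $string
 ), "\n";
}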

The replace part is messy because apparently it gets confused if the captured characters are '&'s.

Also, it doesn't match afoo and the like.

kemp
A: 

How would one get this to work in ASP.NET?

The following is not working:

string pattern = "/&foo(\=[^&]*)?(?=&|$)|^foo(\=[^&]*)?(&|$)/";
return Regex.Replace(queryString, pattern, "");

I get "unrecognized escape sequence". If I escape the backslashes it compiles but fails to remove the parameter.

Thanks

Adeel
Kip
You may not need the leading and trailing `/` either.
Kip
A: 

Thanks. Yes it uses backslashes for escaping, and you're right, I don't need the /'s.

This seems to work, though it doesn't do it in one line as requested in the original question.

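    // Requires: using System.Text.RegularExpressions;
    public static string RemoveQueryStringParameter(string url, string keyToRemove)
    {
        // Escape the key so any regex metacharacters in it are matched literally.
        string key = Regex.Escape(keyToRemove);
        // If it is the first parameter, keep the ? and take away the trailing &.
        // Requiring & or end of string after the key stops "foo" from matching "foox".
        string pattern = @"\?" + key + @"(=[^&]*)?(&|$)";
        url = Regex.Replace(url, pattern, "?");
        // If it is a subsequent parameter, take away the leading &.
        pattern = "&" + key + @"(=[^&]*)?(?=&|$)";
        url = Regex.Replace(url, pattern, "");
        return url;
    }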
Adeel
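
For comparison, the single-regex version from the question can also be parameterised by key. The following Python sketch is only an illustration (the function name `remove_query_param` is hypothetical, and it assumes the input is the bare query string without the leading ?, as in the question). The key is escaped before it is spliced into the pattern, so a key containing regex metacharacters is matched literally.

```python
import re

def remove_query_param(query: str, key: str) -> str:
    """Remove the first occurrence of `key` (with or without a value) from a bare query string."""
    k = re.escape(key)  # e.g. "a.b" becomes "a\.b", so "." cannot match any character
    pattern = rf"&{k}(=[^&]*)?(?=&|$)|^{k}(=[^&]*)?(&|$)"
    return re.sub(pattern, "", query, count=1)

print(repr(remove_query_param("abc=789&foo=123&bar=456", "foo")))
print(repr(remove_query_param("a.b=1&axb=2", "a.b")))
```

Without the escaping step, a key such as "a.b" would also match "axb", which is the same kind of over-matching the foox test cases guard against.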