About to work through this one, but thought someone may have already had to tackle it, so...

I'm looking for an elegant (and ISAPI_Rewrite-compatible) regular expression that finds three known parameter/value pairs in a query string, regardless of order, and extracts all the other parameters while stripping out those three.

abc=123, def=456, and ghi=789 are all known, fixed strings. They may appear in any order in the query string, may or may not be the only parameters, and may or may not be adjacent. The regex should be smart enough not to match aaabc=123 or abc=1234, so each searched parameter must be bounded by &, ?, #, or end of string. The output I want is a new query string containing the remaining params, with those three stripped out.

I'll probably take a stab at the logic in the morning, so bonus points if you can solve it before then.

+1  A: 
s/(\?|\#|\&)(abc=123|def=456|ghi=789)(\&|\#|$)//g

This is approximate and untested, but presents a working (I think) concept. Basically, look for a starting border, then the literal string, then an ending border, and replace each match with the empty string, globally, using | to give the alternatives at each position.
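For illustration, here is a minimal Python sketch of that border/literal/border idea. It is not ISAPI_Rewrite syntax, and the names `KNOWN`, `PATTERN`, and `strip_known` are made up for the example. It assumes the borders are checked with lookarounds rather than consumed, since consuming both borders would glue the neighbouring parameters together (x=1&abc=123&y=2 would become x=1y=2).

```python
import re

# The three known, fixed pairs from the question.
KNOWN = ("abc=123", "def=456", "ghi=789")

# Start border: beginning of string, or a preceding "?" or "&" (not consumed).
# End border: "&", "#", or end of string (checked by lookahead).
# One trailing "&" is then consumed so no doubled separators remain.
PATTERN = re.compile(
    r"(?:^|(?<=[?&]))(?:%s)(?=&|#|$)&?" % "|".join(re.escape(p) for p in KNOWN)
)


def strip_known(query: str) -> str:
    """Remove the known pairs from a query string (hypothetical helper)."""
    stripped = PATTERN.sub("", query)
    # If the last remaining parameter was followed by a removed pair,
    # a dangling "&" is left before the end or before "#"; drop it.
    return re.sub(r"&(?=#|$)", "", stripped)
```

Called as `strip_known("x=1&abc=123&def=456&y=2&ghi=789")`, this should leave only the x and y parameters, while aaabc=123 and abc=1234 are left alone because one of their borders fails.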

jess
Stephan202
+2  A: 

I think regexes shouldn't be used for problems of this type. Just tokenize the string, and compare every parameter's name to what you are looking for.
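A rough sketch of the tokenizing approach in Python, for comparison. The function name `strip_known_pairs` is hypothetical, and it assumes the input is only the part after the ?. Since the question fixes both name and value, this version compares whole name=value tokens rather than names alone.

```python
# The three known, fixed pairs from the question.
KNOWN = {"abc=123", "def=456", "ghi=789"}


def strip_known_pairs(query: str) -> str:
    """Drop the known pairs by splitting on "&" (hypothetical helper)."""
    # Split off any "#fragment" so it is carried over untouched.
    body, sep, fragment = query.partition("#")
    # Keep every non-empty token that is not exactly one of the known pairs.
    kept = [pair for pair in body.split("&") if pair and pair not in KNOWN]
    return "&".join(kept) + sep + fragment
```

Because each token is compared as a whole, near-misses such as aaabc=123 or abc=1234 are kept without any special border handling.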

Igor Oks
Sure, and that's what I do in other places, but I'm looking for something I can plug into rewrite rules (ISAPI_Rewrite 2, to be specific).
entropi
A: 

Here's what I've come up with:

RewriteRule ^/oldpage\.htm\?(.*)(?<=\?|&)(?:abc=123&|def=456&|ghi=789&)(.*)(?<=&)(?:abc=123&|def=456&|ghi=789&)(.*)(?<=&)(?:(?:abc=123|def=456|ghi=789)(?:&|#|$))(.*) /newpage.htm?$1$2$3$4 [I,RP,L]

I think this works. The lookahead/lookbehind assertions, (?<= and (?=, seem to be the key: they let me check for the enclosing & or ? without "consuming" it, which would otherwise mess up the next match.

One gotcha: if the old page URL has only the three params, I still end up with a trailing ? with no parameters on the redirected URL, "/newpage.htm?". I'm currently planning to avoid that by using a RewriteCond so that this rule only fires for URLs with 4+ params, plus a simpler match regex for URLs with exactly three. The full ruleset comes out to:

RewriteCond URL ^/oldpage\.htm\?([^#]*=[^#]*&){3,}[^#]*=[^#]*.*

RewriteRule ^/oldpage\.htm\?(.*)(?<=\?|&)(?:abc=123&|def=456&|ghi=789&)(.*)(?<=&)(?:abc=123&|def=456&|ghi=789&)(.*)(?<=&)(?:(?:abc=123|def=456|ghi=789)(?:&|#|$))(.*) /newpage.htm?$1$2$3$4 [I,RP,L]

RewriteRule ^/oldpage\.htm\?(?:abc=123|def=456|ghi=789)&(?:abc=123|def=456|ghi=789)&(?:abc=123|def=456|ghi=789)(.*) /newpage.htm$1 [I,RP,L]

(The $1 at the end of the last rule is for # additions to the URL. Do I really need it?) The other issue is that a URL such as /oldpage.htm?abc=123&abc=123&abc=123 would also trigger this, but I don't see any easy way around that, and am not too worried about it.
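One possible way around the repeated-parameter case, sketched in Python rather than in rewrite syntax: require each of the three pairs separately with its own lookahead anchored at the start of the query string. The names `ALL_THREE` and `has_all_three` are made up for the example, and whether the same lookaheads can be used in a RewriteCond under ISAPI_Rewrite 2 is an assumption I have not verified.

```python
import re

# Each lookahead starts at the beginning of the query string and demands one
# specific pair, either at the very start or right after an "&", followed by
# "&", "#", or end of string. [^#]* keeps the search out of the fragment.
ALL_THREE = re.compile(
    r"^(?=(?:[^#]*&)?abc=123(?:&|#|$))"
    r"(?=(?:[^#]*&)?def=456(?:&|#|$))"
    r"(?=(?:[^#]*&)?ghi=789(?:&|#|$))"
)


def has_all_three(query: str) -> bool:
    """True only if all three distinct pairs are present (hypothetical helper)."""
    return ALL_THREE.match(query) is not None
```

With this check, abc=123&abc=123&abc=123 is rejected because the def and ghi lookaheads fail, while the three pairs in any order, with or without other parameters between them, are accepted.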

Can anyone think of a better way to approach this, or see any other issues?

entropi