Many URL rewriting utilities allow regex matching. I need some URLs to be matched against a couple of main querystring parameter values, no matter what order they appear in. For example, consider a URL with two key parameters, ID= and Lang=, in no specific order, possibly with other non-key params interspersed.

An Example URL to be matched with key params in any order:

Maybe with some interspersed non-key params:

Is there a good regex pattern to match querystring parameter values in any order, is it best to duplicate some rules, or should I look to other means in general?

Note: The main querystring values will also be captured using brackets i.e. ID=(3)&Lang=(500) and substituted into the destination URL, but that's not the focus of the question.

A: 

Regex matching depends highly on the sequential nature of a string. Position of the match is not important, but order definitely is.

This means you cannot write a regex pattern that matches its different parts in any arbitrary order. You can write a pattern that matches its parts in any pre-defined order, though - you would have to include every possible permutation in the pattern. This gets inconvenient very fast:

  • to match (a,b) you would need a,b|b,a
  • to match (a,b,c) you would need a,b,c|a,c,b|b,a,c|b,c,a|c,a,b|c,b,a
  • and so on
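
To see how quickly this grows, here is a small Python sketch that builds the alternation for any list of parts. The function name `any_order_pattern` is made up for illustration, and the parts are inserted as-is, so they are assumed to be valid regex fragments already.

```python
from itertools import permutations

def any_order_pattern(parts, sep=","):
    """Build one alternation that spells out every ordering of `parts`."""
    orderings = (sep.join(p) for p in permutations(parts))
    return "|".join(orderings)

print(any_order_pattern(["a", "b"]))
print(any_order_pattern(["a", "b", "c"]))

# The number of alternatives is n!, so five parts already need 120 of them.
print(len(any_order_pattern(list("abcde")).split("|")))
```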

So you are best off approaching the problem sequentially, matching one parameter at a time. How this would work depends on the capabilities of your rewriting engine.
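
Outside a rewriting engine, matching one parameter at a time could be sketched in Python as below. The names `KEY_PATTERNS` and `extract_keys` are hypothetical, and the values are assumed to be numeric, as in the question.

```python
import re

# One small pattern per key parameter (hypothetical setup).
# (?:^|[?&]) keeps "ID=" from matching inside a longer name such as "UserID=".
KEY_PATTERNS = {
    "ID": re.compile(r"(?:^|[?&])ID=(\d+)(?=&|$)"),
    "Lang": re.compile(r"(?:^|[?&])Lang=(\d+)(?=&|$)"),
}

def extract_keys(url):
    """Return {name: value} if every key parameter is present, else None."""
    found = {}
    for name, pattern in KEY_PATTERNS.items():
        m = pattern.search(url)  # each search is independent of the others
        if m is None:
            return None
        found[name] = m.group(1)
    return found

print(extract_keys("SurveyController.aspx?Lang=4&x=1&ID=500"))
```

Because each pattern is searched separately, the order of the parameters in the URL no longer matters, and adding a third key means adding one pattern rather than four more permutations.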

Tomalak
Indeed that seems to be what I'm facing. I could start using OR `|` conditions but that is effectively inlining all sequential options and starts becoming a PITA. Hmmm... trying to stick in the config file and not go programmatic, but might have to consider more options and change my mind on this one.
John K
A: 

I would suggest parsing the query string into a dictionary and working from there, but if you want regex, you can use alternation+repetition to match in any order (without inlining all possible sequences). Python example:

>>> import re
>>> p = re.compile(r'(?:[?&](?:abc=([^&]*)|xyz=([^&]*)|[^&]*))+$')
>>> p.findall('x?abc=1&jjj=2&xyz=3')
[('1', '3')]
>>> p.findall('x?abc=1&xyz=3&jjj=2')
[('1', '3')]
>>> p.findall('x?xyz=3&abc=1&jjj=2')
[('1', '3')]
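
The dictionary route mentioned at the top of this answer could look roughly like this, using the standard library's `urllib.parse`. The `rewrite` function and the `/survey/<ID>/<Lang>` destination are invented for illustration; substitute whatever your real target URL is.

```python
from urllib.parse import urlsplit, parse_qs

def rewrite(url):
    """Map ?ID=..&Lang=.. (in any order) to a hypothetical /survey/<ID>/<Lang> path."""
    params = parse_qs(urlsplit(url).query)
    try:
        # parse_qs returns a list of values per key; take the first of each.
        return "/survey/{}/{}".format(params["ID"][0], params["Lang"][0])
    except KeyError:
        return None  # a key parameter is missing, so leave the URL alone

print(rewrite("http://www.example.com/SurveyController.aspx?Lang=4&foo=bar&ID=500"))
```
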
Max Shawabkeh
A: 

This is outside of the capabilities of (most flavours of) regex. You would indeed need to duplicate each rewrite rule for every possible order of parameters, which is practical for two and... less practical for ten.

Also, regexes wouldn't do the kind of parsing you'd need to handle all possible parameter inputs. For example:

http://www.example.com/SurveyController.aspx?ID=500&L%61ng=4

would normally be a valid synonym, and

http://www.example.com/SurveyController.aspx?Hello=3&ID=400&Lang=4&ID=500

might be a synonym for ID 400 or for ID 500, depending on the parser. Simple regex matches might be OK if you only want to 301 a load of deprecated old-format addresses to the shiny new ones, but they are not enough if they have to catch all possible inputs.

So for more complex cases like this, you'd be better off having a real SurveyController.aspx that looks at its parameters and redirects you where you need to go.
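
As an illustration of what a real parser does with those two URLs, here is how Python's `urllib.parse` treats them. This is only one parser's behaviour; the ASP.NET side may resolve the repeated ID differently.

```python
from urllib.parse import urlsplit, parse_qs, parse_qsl

encoded = "http://www.example.com/SurveyController.aspx?ID=500&L%61ng=4"
repeated = "http://www.example.com/SurveyController.aspx?Hello=3&ID=400&Lang=4&ID=500"

# Percent-escapes are decoded in names as well as values, so L%61ng becomes Lang.
params = parse_qs(urlsplit(encoded).query)
print(params)

# Repeated keys are all kept, in order; the caller decides which one wins.
ids = parse_qs(urlsplit(repeated).query)["ID"]
print(ids)

# Building a plain dict from the pairs makes the last occurrence win.
last_wins = dict(parse_qsl(urlsplit(repeated).query))
print(last_wins["ID"])
```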

bobince
A: 

If the underlying regular expression implementation understands both named groups and zero-width look-aheads, you may be able to make something work. Both look-aheads start from the same position, so each one has to scan forward for its own parameter, using something like aspx\?(?=(?:.*&)?ID=(?<ID>\d+))(?=(?:.*&)?Lang=(?<Lang>\d+)) (this is untested speculation). The result is likely to be unmaintainable, and it will probably under-perform even a naive implementation that uses multiple regexes to parse the string.
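
For what it is worth, here is a look-ahead version as a Python sketch. Python spells named groups `(?P<name>...)`, and rewriting engines may differ in syntax and feature support. Each look-ahead scans forward for its own parameter, because all look-aheads start from the same position right after the `?`. The name `PATTERN` is just for illustration.

```python
import re

# (?:[^#]*&)? optionally skips earlier parameters, so each key may appear
# anywhere in the query string; [^#] stops the scan at a fragment.
PATTERN = re.compile(
    r"aspx\?"
    r"(?=(?:[^#]*&)?ID=(?P<ID>\d+))"
    r"(?=(?:[^#]*&)?Lang=(?P<Lang>\d+))"
)

for url in (
    "SurveyController.aspx?ID=500&Lang=4",
    "SurveyController.aspx?Lang=4&foo=1&ID=500",
    "SurveyController.aspx?UserID=5&Lang=4",
):
    m = PATTERN.search(url)
    print(url, m.groupdict() if m else None)
```

The first two URLs match with the same captured values and the third does not match at all. The pattern still does not decode percent-escapes, and with a repeated key it picks whichever occurrence the greedy scan reaches first.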

I would suggest that query strings are best parsed with a simple tokenizer, or even just a couple of split operations.

ig0774