ansaurus

Question

Regex to parse querystring values to named groups

Answer 1

A:

What language, JavaScript?

2008-11-23 21:37:25

I need it as a regex, so any flavor will do... .NET would be best (C#, VB.NET, ...)

2008-11-23 21:40:05

Answer 2

+2 A:

Why use regex to split it out?

You could first extract the query string, split it on &, and then build a map by splitting each resulting pair on =.
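A minimal Python sketch of that split-based approach, for illustration only. The helper name `query_to_map` is hypothetical. It does not percent-decode anything; the standard library's `urllib.parse.parse_qs` does that and also collects repeated keys into lists.

```python
from urllib.parse import urlsplit


def query_to_map(url: str) -> dict[str, str]:
    """Split a URL's query string on '&', then each pair on '='."""
    query = urlsplit(url).query              # text after '?' and before any '#'
    result: dict[str, str] = {}
    for pair in query.split("&"):
        if not pair:
            continue                         # skip empty segments such as 'a=1&&b=2'
        key, _, value = pair.partition("=")  # a missing '=' yields an empty value
        result[key] = value
    return result


print(query_to_map("file.aspx?userId=123&section=2"))
# {'userId': '123', 'section': '2'}
```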

John Nilsson 2008-11-23 22:13:39

Answer 3

A:

Using a regex to first find the key/value pairs and then doing splits... doesn't seem right.

I'm interested in a complete regex solution.

Anyone?

2008-11-23 23:14:20

Answer 4

A:

Check this out

\<a\s+href\s*=\s*["'](?<baseUri>.+?)\?(?:(?<key>.+?)=(?<value>.+?)[&"'])*\s*\>

You can get the pairs with something like Groups["key"].Captures[i] and Groups["value"].Captures[i], since .NET keeps every repetition of a group in its Captures collection.
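For comparison, Python's `re` module has no Captures collection: a repeated group keeps only its last iteration. The sketch below adapts the pattern above to Python (named groups are spelled `(?P<name>...)`, and negated character classes replace the lazy `.+?`), then collects the pairs with `finditer` instead. The variable names are illustrative.

```python
import re

SAMPLE_LINK = '<a href="file.aspx?userId=123&section=2">link</a>'

# Same idea as the .NET pattern: a repeated (key=value) group after the '?'.
REPEATED = re.compile(
    r'<a\s+href\s*=\s*["\'](?P<baseUri>[^?"\']+)\?'
    r'(?:(?P<key>[^=&"\']+)=(?P<value>[^&"\']*)[&"\'])*'
)

m = REPEATED.search(SAMPLE_LINK)
print(m.group("baseUri"), m.group("key"), m.group("value"))
# file.aspx section 2
# (key/value hold only the last repetition; userId=123 is lost)

# Without Captures, iterate over the pairs instead: each key=value that
# directly follows a '?' or '&' (this scans the whole string, not just the href).
PAIR = re.compile(r'(?<=[?&])(?P<key>[^=&"\']+)=(?P<value>[^&"\']*)')
pairs = [(p.group("key"), p.group("value")) for p in PAIR.finditer(SAMPLE_LINK)]
print(pairs)
# [('userId', '123'), ('section', '2')]
```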

2008-11-23 23:17:33

Answer 5

A:

Perhaps something like this (I am rusty on regex, and wasn't good at them in the first place anyway. Untested):

/href="[^?]*(?:[?&](?:userId=(?<user>\d+)|section=(?<section>\d+)))*"/

(By the way, the XHTML is malformed; & should be &amp; in the attributes.)
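A quick Python check of this idea, with the alternation grouped inside the repetition so that `[?&]` precedes either parameter. This is an adaptation, not the exact regex above: Python spells named groups `(?P<name>...)`, and `[^?"]` keeps the path inside the attribute. Because the repetition must be followed by `"`, any other parameter in the query makes the whole match fail.

```python
import re

HREF = re.compile(
    r'href="[^?"]*'                        # path up to the '?'
    r'(?:[?&](?:userId=(?P<user>\d+)'      # each parameter is preceded by ? or &
    r'|section=(?P<section>\d+)))*"'       # ...and must be one of these two
)

for text in ('href="file.aspx?userId=123&section=2"',
             'href="file.aspx?section=2&userId=123"'):
    m = HREF.search(text)
    # A group keeps its value from an earlier repetition of the loop.
    print(m.group("user"), m.group("section"))
# 123 2
# 123 2
```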

strager 2008-11-23 23:23:49

Answer 6

+1 A:

You didn't specify what language you are working in, but this should do the trick in C#:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace RegexTest
{
    class Program
    {
        static void Main(string[] args)
        {
            string subjectString = @"... some text ...
                <a href=""file.aspx?userId=123&section=2"">link</a> ... some text ...
... some text ...
<a href=""file.aspx?section=5&user=678"">link</a> ... some text ...
... some text ...";
            Regex regexObj = 
               new Regex(@"<a href=""file\.aspx\?(?:(?:userId=(?<user>.+?)&section=(?<section>.+?)"")|(?:section=(?<section>.+?)&user=(?<user>.+?)""))");
            Match matchResults = regexObj.Match(subjectString);
            while (matchResults.Success)
            {
                string user = matchResults.Groups["user"].Value;
                string section = matchResults.Groups["section"].Value;
                Console.WriteLine("User = {0}, Section = {1}", user, section);
                matchResults = matchResults.NextMatch();
            }
            Console.ReadKey();
        }
    }
}
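A rough Python counterpart of this program, as a sketch. Python's `re` rejects two groups with the same name, so each ordering gets its own (hypothetical) group names, `user_a`/`user_b` and `section_a`/`section_b`, merged after the match; `finditer` stands in for the Match/NextMatch loop. Negated classes `[^&"]+` are used instead of `.+?`.

```python
import re

SUBJECT = '''... some text ...
<a href="file.aspx?userId=123&section=2">link</a> ... some text ...
<a href="file.aspx?section=5&user=678">link</a> ... some text ...'''

# One branch per parameter order, each with its own group names.
LINK = re.compile(
    r'<a href="file\.aspx\?(?:'
    r'userId=(?P<user_a>[^&"]+)&section=(?P<section_a>[^&"]+)'
    r'|section=(?P<section_b>[^&"]+)&user=(?P<user_b>[^&"]+)'
    r')"'
)

# finditer plays the role of the Match/NextMatch loop in the C# code;
# groups in the branch that did not match are None, so 'or' picks the other.
for m in LINK.finditer(SUBJECT):
    user = m.group("user_a") or m.group("user_b")
    section = m.group("section_a") or m.group("section_b")
    print(f"User = {user}, Section = {section}")
# User = 123, Section = 2
# User = 678, Section = 5
```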

Mike Moore 2008-11-24 00:00:23

Answer 7

A:

Another approach is to put the capturing groups inside lookaheads:

Regex r = new Regex(@"<a href=""file\.aspx\?" +
                    @"(?=[^""<>]*?user=(?<user>\w+))" +
                    @"(?=[^""<>]*?section=(?<section>\w+))");

If there are only two parameters, there's no reason to prefer this approach over the alternation-based ones suggested by Mike and strager. But if you needed to match three parameters, the other regexes would grow to several times their current length, while this one would only need another lookahead just like the two existing ones.
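The lookahead technique carries over to Python almost unchanged. This sketch assumes the parameter is spelled `userId` (as in the other answers) and adds a made-up third parameter, `page`, to show that it costs only one more lookahead. All three lookaheads are required, so a link missing any of them does not match.

```python
import re

# One lookahead per parameter: parameter order in the URL no longer matters.
# 'page' is a hypothetical third parameter used only for illustration.
LINK_RE = re.compile(
    r'<a href="file\.aspx\?'
    r'(?=[^"<>]*?\buserId=(?P<user>\w+))'
    r'(?=[^"<>]*?\bsection=(?P<section>\w+))'
    r'(?=[^"<>]*?\bpage=(?P<page>\w+))'
)

for text in ('<a href="file.aspx?userId=123&section=2&page=7">',
             '<a href="file.aspx?page=7&section=2&userId=123">'):
    m = LINK_RE.search(text)
    print(m.group("user"), m.group("section"), m.group("page"))
# 123 2 7
# 123 2 7
```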

By the way, contrary to your response to Claus, it matters quite a bit which language you're working in. There's a huge variation in capabilities, syntax, and API from one language to the next.

Alan Moore 2008-11-24 05:34:16

Answer 8

A:

You did not say which regex flavor you are using. Since your sample URL links to an .aspx file, I'll assume .NET. In .NET, a single regex can have multiple named capturing groups with the same name, and .NET will treat them as if they were one group. Thus you can use the regex

userId=(?<user>\d+)&section=(?<section>\d+)|section=(?<section>\d+)&userId=(?<user>\d+)

This simple regex with alternation will be far more efficient than any tricks with lookaround. You can easily expand it if your requirements include matching the parameters only if they're in a link.
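As an illustration of how much the flavor matters here, Python's `re` module does not accept this trick: compiling essentially the same alternation with `(?P<name>...)` groups raises `re.error`, so in Python each branch needs distinct group names that you merge after matching. A small check (the variable name is illustrative):

```python
import re

# The same alternation in Python's named-group syntax; 'user' and 'section'
# each appear twice, which .NET allows but Python's re does not.
DOTNET_STYLE = (r'userId=(?P<user>\d+)&section=(?P<section>\d+)'
                r'|section=(?P<section>\d+)&userId=(?P<user>\d+)')

try:
    re.compile(DOTNET_STYLE)
except re.error as exc:
    # Python reports the reused group name as a pattern error.
    print("re.error:", exc)
```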

Jan Goyvaerts 2008-11-24 09:36:19

Answer 9

A:

A simple Python implementation that overcomes the ordering problem:

In [1]: import re

In [2]: x = re.compile(r'(?:(userId|section)=(\d+))+')

In [3]: t = 'href="file.aspx?section=2&userId=123"'

In [4]: x.findall(t)
Out[4]: [('section', '2'), ('userId', '123')]

In [5]: t = 'href="file.aspx?userId=123&section=2"'

In [6]: x.findall(t)
Out[6]: [('userId', '123'), ('section', '2')]
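Since `findall` returns (name, value) tuples, passing them to `dict` gives lookup by parameter name regardless of order. A small sketch; the helper `params` is hypothetical, and the `(?:...)+` wrapper is dropped because `findall` already returns each pair as a separate match.

```python
import re

# Raw string so '\d' reaches the regex engine unchanged.
PAIR = re.compile(r'(userId|section)=(\d+)')


def params(href: str) -> dict[str, str]:
    """Map each recognised parameter name to its value, whatever the order."""
    return dict(PAIR.findall(href))


print(params('href="file.aspx?section=2&userId=123"'))
# {'section': '2', 'userId': '123'}
```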

hinoglu 2008-11-24 14:18:03
