I'm pretty new to regular expressions, but I'm sure a regex would give me something much more elegant than anything I could do with string methods.

I've got a hyperlink in the form of:

<a href="http://server.com/default.aspx?abc=123">hello</a>

I want to yank out just the querystring portion. Also, what's a good reference for .net regular expressions (sheepish grin)? I find the MSDN reference very hard to follow.

A: 

Here's the best regex reference out there.

roman m
+1  A: 

For regex development, I recommend Expresso. As for the regex itself, find the ? and match everything up to the next double quote (").

Scoregraphic
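A minimal C# sketch of that description, assuming the link is already in a string. The pattern `\?([^"]*)"` and the helper name `QueryAfterQuestionMark` are my own illustration, not part of the answer: `[^"]*` cannot cross a double quote, so the capture stops where the href value ends.

```csharp
#nullable enable
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: capture everything after the first '?' up to the next '"'.
// [^"]* cannot cross a quote, so the capture ends where the href value ends.
static string? QueryAfterQuestionMark(string tag)
{
    Match m = Regex.Match(tag, "\\?([^\"]*)\"");
    return m.Success ? m.Groups[1].Value : null;
}

string link = "<a href=\"http://server.com/default.aspx?abc=123\">hello</a>";
Console.WriteLine(QueryAfterQuestionMark(link)); // abc=123
```

It returns null when the string has no ? followed somewhere by a quote.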
+1  A: 
/<a\s+href="[^?]+\?(.*)">/

Or even this should work:

/\?(.+)"/

Edit: watch out for greediness. For a lazy match (in case there are other attributes after href), use this:

/\?(.+?)"/

Thanks @Guffa

Amarghosh
Make it /\?(.+?)"/ so that it stops at the first quotation mark, in case there are more attributes after the href.
Guffa
Oh, the greedy regex. Good point, thanks Guffa.
Amarghosh
I would further modify it to /\?([^\s]*?)"/, because http://server.com/default.aspx? (with an empty query string) is still a valid URL.
Amarghosh
/href=".*\?([^\s]*?)"/ -- modified to accommodate ? in previous attributes as in '<a title="click?sure" href="server.com/default.aspx?asd=e3" id="asd">hello</a>'
Amarghosh
Better use `/<a\s+href="[^"?]*\?([^"]*)">/`.
Gumbo
@Gumbo wouldn't that fail if href is not the first attribute?
Amarghosh
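To illustrate the greediness point from the comments, here is a small C# sketch that runs the greedy /\?(.+)"/ and the lazy /\?(.+?)"/ against a link with an extra attribute after href. The test string, including `id="home"`, is made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up test string: an extra attribute follows href.
string html = "<a href=\"http://server.com/default.aspx?abc=123\" id=\"home\">hello</a>";

// Greedy: .+ runs to the end of the string, then backs off only as far as
// the LAST double quote that lets the pattern finish.
string greedy = Regex.Match(html, "\\?(.+)\"").Groups[1].Value;

// Lazy: .+? grows one character at a time and stops at the FIRST quote after '?'.
string lazy = Regex.Match(html, "\\?(.+?)\"").Groups[1].Value;

Console.WriteLine(greedy); // abc=123" id="home
Console.WriteLine(lazy);   // abc=123
```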
A: 

Perhaps something simple?

/\?([^\"]+)\"/

It'd match abc=123.

Tordek
Don't you need a + after the character class []?
Amarghosh
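The quantifier choice shows up in the empty-query case Amarghosh mentioned (http://server.com/default.aspx?). A short C# sketch, with a made-up test string, comparing Tordek's `[^"]+` with a `[^"]*` variant:

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up test string: the href ends in '?' with nothing after it.
string empty = "<a href=\"http://server.com/default.aspx?\">hello</a>";

// '+' requires at least one non-quote character between '?' and the quote.
Match plus = Regex.Match(empty, "\\?([^\"]+)\"");
// '*' also accepts zero characters, so an empty query string still matches.
Match star = Regex.Match(empty, "\\?([^\"]*)\"");

Console.WriteLine(plus.Success);                // False
Console.WriteLine(star.Success);                // True
Console.WriteLine(star.Groups[1].Value.Length); // 0
```

So + is fine if you only care about links that actually carry a query string; * is the safer choice if an empty one should still count as a match.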
+2  A: 

The following code will extract the query string:

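using System.Text.RegularExpressions;

string html = "<a href=\"http://server.com/default.aspx?abc=123\">hello</a>";
// Group 1 lazily captures the text between the first ? after href=" and the closing ">
Match m = Regex.Match(html, "<a[^>]+href=\".*?\\?(.*?)\">");
string querystring = m.Groups[1].Value; // "abc=123" ("" if there is no match)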

Regex explained:

Match only tags that start with <a and contain href="
Anything between the a and href, such as other attributes and spaces, is skipped
Capture everything from the first question mark up to the closing quote; that is your query string
Am
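As a quick check of the "other attributes" claim, a C# sketch running the same pattern against a tag where a title containing a ? comes before href. The sample string is modeled on the one in Amarghosh's comment above.

```csharp
using System;
using System.Text.RegularExpressions;

// Am's pattern: <a, then any non-'>' characters (other attributes), then href="
string pattern = "<a[^>]+href=\".*?\\?(.*?)\">";

// Sample: a title containing '?' precedes href.
string html = "<a title=\"click?sure\" href=\"http://server.com/default.aspx?abc=123\">hello</a>";

Match m = Regex.Match(html, pattern);
Console.WriteLine(m.Groups[1].Value); // abc=123
```

Because [^>]+ is greedy and then backtracks to the href=" it needs, the ? inside the title is consumed as part of "other attributes" rather than being treated as the start of the query.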
A: 

This regex should work:

(?<=href="[^"]+\?)[^"]+

Here it is with test cases in Regex Hero. (You can also use this tool to generate .NET code for you.)

And as for a reference, I'm working on a clean & concise .NET regex reference. It's not quite done but it's pretty close. You can also click a regular expression to see an example slide down with a neat jQuery animation, but I digress.

And then there's a more detailed reference site at http://www.regular-expressions.info/ with quick references as well as a narrative style to explain everything.

Steve Wortham
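For reference, a minimal C# sketch of using this pattern with `Regex.Match`, on the link from the question. Since the lookbehind is not part of the match, `Match.Value` is already the query string and no capture group is needed. This relies on .NET accepting a variable-length lookbehind (`[^"]+` inside `(?<=...)`); some other flavors, such as Python's `re`, only accept fixed-width lookbehind.

```csharp
using System;
using System.Text.RegularExpressions;

string html = "<a href=\"http://server.com/default.aspx?abc=123\">hello</a>";

// (?<=href="[^"]+\?) must match the text just before the current position,
// but it is not included in the result, so the match is only [^"]+ (the query).
Match m = Regex.Match(html, "(?<=href=\"[^\"]+\\?)[^\"]+");

Console.WriteLine(m.Value); // abc=123
```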
A: 

Instead of a regular expression, couldn't you just use the Uri class, specifically the Uri.Query property?

Example:

using System;

Uri uri = new Uri("http://server.com/default.aspx?abc=123");
Console.WriteLine(uri.Query);

Prints:

?abc=123

hmemcpy
Good suggestion. This should be doable too. But it sounds like he still needs to parse the HTML file to get the link to begin with.
Steve Wortham
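Steve's point can be handled by splitting the work: a regex only to pull the href value out of the tag, then `Uri` for the URL itself. A minimal C# sketch under that assumption; `ExtractHref` is a hypothetical helper, and its pattern assumes double-quoted attributes.

```csharp
#nullable enable
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: return the value of the first double-quoted href in an <a> tag.
static string? ExtractHref(string tag)
{
    Match m = Regex.Match(tag, "<a\\s[^>]*?href=\"([^\"]*)\"");
    return m.Success ? m.Groups[1].Value : null;
}

string html = "<a href=\"http://server.com/default.aspx?abc=123\">hello</a>";
string? href = ExtractHref(html);

// Let Uri do the URL parsing; Uri.Query includes the leading '?'.
if (href != null && Uri.TryCreate(href, UriKind.Absolute, out Uri? uri))
{
    Console.WriteLine(uri.Query);                // ?abc=123
    Console.WriteLine(uri.Query.TrimStart('?')); // abc=123
}
```

Keeping the regex to "find the attribute" and leaving URL parsing to `Uri` means edge cases such as fragments (#...) are handled by `Uri` rather than by hand-written patterns.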