ansaurus

Question

regexp should match site category URL, but matches /

Answer 1

+1 A:

That is because the second group in your regex matches reluctantly (a.k.a. lazy or ungreedy matching). For more on this, see http://www.regular-expressions.info/repeat.html, especially the paragraph "Laziness Instead of Greediness".

That's why it doesn't work as you expected it to.
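To make the greedy/reluctant difference concrete, here is a minimal Python sketch. The sample markup and the `href="..."` pattern are assumptions for illustration only, since the original question's regex and HTML are not shown here.

```python
import re

# Hypothetical sample markup; the asker's real HTML is not shown.
html = '<a href="/">Home</a> <a href="/category/books">Books</a>'

# Greedy: .* takes as much as it can, then backtracks to the LAST quote.
greedy = re.search(r'href="(.*)"', html).group(1)

# Reluctant (lazy): .*? takes as little as it can, stopping at the FIRST quote.
lazy = re.search(r'href="(.*?)"', html).group(1)

print(greedy)  # /">Home</a> <a href="/category/books
print(lazy)    # /
```

With this input the reluctant group stops at the very first closing quote, so it captures only `/`, while the greedy group overshoots into the next tag.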

Now, as to fixing your problem: use a proper parser for this or some existing tool to get attributes from html (jQuery can do this quite nicely, I heard). Don't try to do this with regex: you may get it working for this case, but next week you'll be here again because something else broke.
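As a rough sketch of the parser route, the following uses Python's standard-library `html.parser` instead of jQuery. The `HrefCollector` class name and the sample markup are hypothetical, made up for this illustration.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


# Hypothetical sample markup; the asker's real HTML is not shown.
html = '<a href="/">Home</a> <a href="/category/books">Books</a>'

collector = HrefCollector()
collector.feed(html)
collector.close()
hrefs = collector.hrefs

print(hrefs)  # ['/', '/category/books']
```

From the collected list you can then pick the category URL with ordinary string checks rather than a more elaborate pattern.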

Best of luck!

Bart Kiers 2009-11-30 14:54:21

Answer 2

+1 A:

I'm definitely not one of those "omg, you said HTML and regex in the same sentence, you must die" -types, but this is clearly not a situation where regex is the best tool for the job. (Nor is it even a good tool, nor a functioning tool here).

Parse it with an XML/HTML parser, and save yourself a lot of hassle and abuse from your colleagues.

nickf 2009-11-30 14:58:33

Exactly my thoughts. I am not one to jump on the bandwagon of the *Parser Police* (the name is not my invention!), but this is definitely not suited for a regex.

Bart Kiers 2009-11-30 15:07:31

Answer 3

+1 A:

The problem is this...

(.*?)

Why are you placing a question mark here? After the *, the ? makes the quantifier lazy, so it matches as few characters as possible, which is why you're only getting the '/' in your search. If you replace it with the following...

([^"]+)

That matches one or more characters that aren't a double quotation mark, so you should be getting everything: the stackoverflow href and the other href you mentioned.
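A small Python sketch of that suggestion, assuming the group sits inside an `href="..."` pattern (the surrounding pattern and the sample markup are guesses, not taken from the question):

```python
import re

# Hypothetical sample markup; the asker's real HTML is not shown.
html = '<a href="/">Home</a> <a href="/category/books">Books</a>'

# [^"]+ matches one or more characters that are not a double quote,
# so each capture ends exactly at its own closing quote.
hrefs = re.findall(r'href="([^"]+)"', html)

print(hrefs)  # ['/', '/category/books']
```

Because the character class can never cross a quote, `findall` returns every href, and the category URL can be selected from the list afterwards.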

I'm not entirely sure why you're doing this. It's possible that you're using regular expressions when you don't have to. What is the purpose of this regular expression? It seems like overkill.

MillsJROSS 2009-11-30 15:04:55

I think you meant (.+?). Now, it works without html parser. Thanks!

Delirium tremens 2009-11-30 15:42:57
