I have an HTML page with links of this form:

<a class="development" href="[variable content]">X</a>

The [variable content] is different in each link; the rest is the same.
What regular expression will match all of those links? (I did try, although I am not including my attempts here.)

+3  A: 

Try this regular expression:

<a class="development" href="[^"]*">X</a>
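
As a rough illustration of how this pattern behaves, here is a minimal Python sketch. The sample markup and href values are made up, and a capture group has been added around the href value (an assumption about what the asker wants to extract); it only works while the markup keeps exactly this attribute order, quoting, and spacing.

```python
import re

# Hypothetical sample page; the href values are invented for illustration.
html = (
    '<p><a class="development" href="/dev/one">X</a></p>\n'
    '<p><a class="production" href="/prod">X</a></p>\n'
    '<p><a class="development" href="/dev/two?id=2">X</a></p>\n'
)

# Same pattern as above, plus a capture group around the href value.
# [^"]* stops at the first double quote, so it cannot run past the attribute.
pattern = re.compile(r'<a class="development" href="([^"]*)">X</a>')

hrefs = pattern.findall(html)
print(hrefs)
```

Only the two links with class "development" are picked up; the "production" link is skipped.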
Gumbo
Single-quoted attributes are also valid HTML. And, depending on the source, you may even have invalid HTML, at which point you're screwed.
kch
+1  A: 

What about the non-greedy version:

<a class="development" href="(.*?)">X</a>
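
One case where the non-greedy version and the negated character class may differ is when a link with different text precedes a matching one. The Python sketch below uses a made-up input to show this; it is an illustration of regex backtracking in general, not a claim about the asker's actual page.

```python
import re

# Hypothetical input: a link with text "Y" followed by one with text "X".
html = ('<a class="development" href="a">Y</a> '
        '<a class="development" href="b">X</a>')

# Lazy dot: keeps expanding past the first closing quote until ">X</a> fits.
lazy = re.search(r'<a class="development" href="(.*?)">X</a>', html)

# Negated class: can never cross a double quote, so the first link is rejected
# and the search restarts at the second link.
negated = re.search(r'<a class="development" href="([^"]*)">X</a>', html)

lazy_href = lazy.group(1)
negated_href = negated.group(1)
print(repr(lazy_href))
print(repr(negated_href))
```

Here the lazy pattern captures text spanning both links, while the negated class captures only the href of the second one.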
vrish88
You're doing a capture that likely won't be used. Other than that, I don't see much difference between this and Gumbo's version.
kch
+1  A: 

Regex is generally a bad solution for HTML parsing, a topic which gets discussed every time a question like this is asked. For example, the element could wrap onto another line, either as

<a class="development" 
  href="[variable content]">X</a>

or

<a class="development" href="[variable content]">X
</a>

What are you trying to achieve?

Using jQuery you could disable the links with:

$("a.development").on("click", function() { return false; });

or

$("a.development").attr("href", "#");
CoverosGene
This solution assumes that Itay Moav is using the jQuery library and that it's client-side parsing he wishes to achieve.
vrish88
@vrish88: Correct. Thus the question "What are you trying to achieve?" and the comment "Using JQuery you could..."
CoverosGene
+1  A: 

Regexes are fundamentally bad at parsing HTML (see "Can you provide some examples of why it is hard to parse XML and HTML with a regex?" for why). What you need is an HTML parser. See "Can you provide an example of parsing HTML with your favorite parser?" for examples using a variety of parsers.
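
As one possible sketch of the parser approach, the following uses `html.parser` from the Python standard library. The class name `DevelopmentLinks` and the sample markup are made up for illustration, and the sketch only checks the class attribute, not the link text.

```python
from html.parser import HTMLParser


class DevelopmentLinks(HTMLParser):
    """Collect href values of <a> tags whose class list includes 'development'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # The class attribute may hold several space-separated names.
        classes = (attr_map.get("class") or "").split()
        if "development" in classes:
            self.hrefs.append(attr_map.get("href"))


# Hypothetical markup: wrapped attributes, single quotes, reordered attributes.
html = """
<a class="development"
   href="/dev/one">X</a>
<a href='/dev/two' class='development'>X
</a>
<a class="production" href="/prod">X</a>
"""

parser = DevelopmentLinks()
parser.feed(html)
print(parser.hrefs)
```

The parser copes with line wrapping, quote style, and attribute order without any change to the matching logic, which is the main argument for using one here.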

Chas. Owens
+1  A: 

Here's a version that'll allow all sorts of evil to be put in the href attribute.

/<a class="development" href=(?:"[^"]*"|'[^']*'|[^\s<>]+)>.*?<\/a>/m

I'm also assuming X is going to be variable, so I added a non-greedy match there to handle it. The /m means . matches line breaks too (that is what /m does in Ruby; Perl, PCRE and JavaScript use /s for this instead).
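
For readers outside Ruby, here is a hedged Python translation of the same pattern, where `re.DOTALL` plays the role of the dot-matches-newline flag. The sample links are invented, and the sketch still assumes the class attribute comes first and is double-quoted.

```python
import re

# Same alternation as above: double-quoted, single-quoted, or unquoted href.
# re.DOTALL lets "." match newlines, so link text may wrap across lines.
pattern = re.compile(
    r'''<a class="development" href=(?:"[^"]*"|'[^']*'|[^\s<>]+)>.*?</a>''',
    re.DOTALL,
)

# Hypothetical input covering the three quoting styles.
html = (
    '<a class="development" href="/one">X</a>\n'
    "<a class=\"development\" href='/two'>Y\n</a>\n"
    '<a class="development" href=/three>Z</a>\n'
)

matches = pattern.findall(html)
for m in matches:
    print(repr(m))
```

Without `re.DOTALL`, the second link would be missed because its text contains a line break.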

kch