I want to capture only the first match of the expression

<p>.*?</p>

I have tried <p>.*?</p>{1}, but it is not working: it returns all the p tags in the HTML document. Please help.
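For illustration, here is a minimal C# sketch of why {1} does not help. A quantifier applies only to the single token right before it, here the final >, so <p>.*?</p>{1} matches exactly the same text as <p>.*?</p>. The sample HTML string is a made-up assumption.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input with two paragraphs.
string html = "<p>first</p><p>second</p>";

// {1} binds only to the token right before it (the final '>'),
// so "</p>{1}" means exactly "</p>" and the pattern is effectively unchanged.
int withQuantifier = Regex.Matches(html, "<p>.*?</p>{1}").Count;
int withoutQuantifier = Regex.Matches(html, "<p>.*?</p>").Count;

Console.WriteLine($"{withQuantifier} {withoutQuantifier}"); // 2 2
```

Both calls find both paragraphs, so limiting the result has to come from the API you call (or from anchoring), not from a quantifier.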

+4  A: 

It looks like you are using a method that returns every match of the regex in the string. In that case, you need to anchor the regex to the beginning of the string so that it returns only the first match instead of every match:

^.*?<p>.*?</p>

Use parentheses (a capturing group) around the part you want to capture, for example ^.*?<p>(.*?)</p> to get only the text inside the first paragraph.
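A minimal C# sketch of the anchored pattern, assuming the input is a single line (by default . does not match a newline) and using a made-up HTML string. Because ^ without RegexOptions.Multiline matches only at the very start of the input, even Regex.Matches can succeed at most once.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input.
string html = "<div><p>first</p><p>second</p></div>";

// ^ anchors the pattern to the start of the input, so Matches finds at most one match;
// (.*?) captures only the text between the first <p> and the following </p>.
MatchCollection matches = Regex.Matches(html, "^.*?<p>(.*?)</p>");

Console.WriteLine(matches.Count);              // 1
Console.WriteLine(matches[0].Groups[1].Value); // first
```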

PS: Here goes the standard advice: avoid using regex to parse HTML and use a proper HTML parser instead. This simple regex will fail for nested <p> elements (which are not valid HTML, but you can probably still run into them in real-world markup).
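To make the nested-<p> failure concrete, here is a small C# sketch with a hypothetical (invalid) input. The lazy .*? stops at the first </p> it sees, which belongs to the inner paragraph, so the captured text is neither the outer nor the inner paragraph.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: a <p> nested inside another <p> (invalid HTML,
// but the kind of markup a regex may still be fed).
string html = "<p>outer <p>inner</p> tail</p>";

Match match = Regex.Match(html, "<p>(.*?)</p>");

// The lazy .*? stops at the first </p>, which closes the inner paragraph,
// so the capture mixes the outer text with the inner opening tag.
Console.WriteLine(match.Groups[1].Value); // outer <p>inner
```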

Vinko Vrsalovic
I tried it; it's not working. Thanks anyway.
shabby
Add more context to the question, then: what language are you using, what code are you trying, and what is your input data?
Vinko Vrsalovic
I can only echo Vinko's warning, but it may be possible to fine-tune the regex. Which language are you using?
pavium
C#. I'm using regex in C#.
shabby
Then you'll need someone with C# knowledge (not me)
pavium
Use what Lukáš Lalinský said: you are probably using Matches() (which returns every match) instead of Match() (which returns only the first match).
Vinko Vrsalovic
Great, Vinko, you're right! I tried Match() and it's exactly what I was looking for.
shabby
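A minimal C# sketch contrasting the two methods discussed in the comments, using a made-up input with three paragraphs: Matches() collects every non-overlapping match, while Match() stops at the first one.

```csharp
using System;
using System.Text.RegularExpressions;

var regex = new Regex("<p>(.*?)</p>");
string html = "<p>1</p><p>2</p><p>3</p>"; // hypothetical sample input

// Matches scans the whole input and returns every non-overlapping match.
MatchCollection all = regex.Matches(html);

// Match returns only the first successful match.
Match first = regex.Match(html);

Console.WriteLine(all.Count);             // 3
Console.WriteLine(first.Groups[1].Value); // 1
```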
+2  A: 

The Regex.Match method returns only the first match by default, and the regular expression is correct.

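using System;
using System.Text.RegularExpressions;

Regex regex = new Regex("<p>(.*?)</p>");
Match match = regex.Match("<p>1</p><p>2</p>");
Console.WriteLine("{0}", match.Groups[1].Value); // group 1 holds the text between the tags

Running this program will print 1. (Printing match.Value instead would print the whole first match, <p>1</p>.)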

Lukáš Lalinský
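One detail worth handling when using Regex.Match: if the input has no paragraph at all, it does not return null but an unsuccessful Match whose Success is false and whose Value is empty. A minimal C# sketch, where FirstParagraph is a hypothetical helper name and the static Regex.Match(input, pattern) overload is used instead of a Regex instance:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the text of the first <p>...</p>, or null if there is none.
static string? FirstParagraph(string html)
{
    Match match = Regex.Match(html, "<p>(.*?)</p>");
    // Regex.Match never returns null; check Success before reading the group.
    return match.Success ? match.Groups[1].Value : null;
}

Console.WriteLine(FirstParagraph("<p>1</p><p>2</p>") ?? "(none)");         // 1
Console.WriteLine(FirstParagraph("<div>no paragraphs</div>") ?? "(none)"); // (none)
```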