I have searched and searched for help with regex, but I can't find anything that lets me do this.

I need to get the 12.32, 2,300, 4.644 M and 12,444.12 from the following strings in C#:

<td class="c-ob-j1a" property="c-value">12.32</td>
<td class="c-ob-j1a" property="c-value">2,300</td>
<td class="c-ob-j1a" property="c-value">4.644 M</td>
<td class="c-ob-j1a" property="c-value">12,444.12 M</td>

I got up to this:

MatchCollection valueCollection = Regex.Matches(html, @"<td class=""c-ob-j1a"" property=""c-value"">(?<Value>P{</td>})</td>");

Thanks!

A: 
c-value">(.*?)<\/td>

should do it for you. The value you need is held in the capturing group denoted by the parentheses.
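
A minimal sketch of reading that capturing group in C#, assuming the HTML is already in a string and the pattern is anchored on the `c-value">` ending of the opening tag from the question. The class name `CaptureDemo` and the helper `ExtractValues` are made up for illustration:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Hypothetical helper: returns the text captured by group 1 of every match.
    public static List<string> ExtractValues(string html)
    {
        var values = new List<string>();
        // The lazy (.*?) stops at the first </td> that follows c-value">
        foreach (Match m in Regex.Matches(html, @"c-value"">(.*?)<\/td>"))
        {
            // Groups[0] is the whole match; Groups[1] is the parenthesized part.
            values.Add(m.Groups[1].Value);
        }
        return values;
    }
}
```

For the four rows in the question, this returns 12.32, 2,300, 4.644 M and 12,444.12 M.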

ennuikiller
Thanks a lot! For some reason that is not in RegEx tutorials.
A: 

Something like this should work:

/<td[^>]*?>(.+?)<\/td>/

Regarding your code sample, this would probably be more maintainable:

MatchCollection valueCollection = Regex.Matches(html, @"<td[^>]*?>(?<Value>.*?)</td>");

If your HTML contains other td elements that you don't want to extract data from, keep the explicit class and property attributes from your original regex.
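
A short sketch of how the named group `Value` could be read back, assuming the loose `<td[^>]*?>` pattern above. The class name `NamedGroupDemo` and the helper `ExtractCells` are hypothetical:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Hypothetical helper: collects the text of the named group "Value" from every td.
    public static List<string> ExtractCells(string html)
    {
        var cells = new List<string>();
        foreach (Match m in Regex.Matches(html, @"<td[^>]*?>(?<Value>.*?)</td>"))
        {
            // Named groups are looked up by name instead of by position.
            cells.Add(m.Groups["Value"].Value);
        }
        return cells;
    }
}
```

Because this pattern ignores the attributes, it returns the contents of every td on the page, not only the `c-value` cells.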

Sune Rievers
+1  A: 

You should not use regex to parse HTML. See this post on how to parse HTML in C#: http://stackoverflow.com/questions/56107/what-is-the-best-way-to-parse-html-in-c Alternatively, you could use HtmlAgilityPack: http://www.codeplex.com/htmlagilitypack

But if you really want to use regex, this should work:

<td[^>]*>(.+?)<\/td>
Tjofras
What is better for parsing HTML?
An HTML parser. Duh.
Anon.
A: 

I'd probably start with a very strict match to avoid accidentally capturing other parts of the document:

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main(string[] args)
        {
            // Sample input: the four rows from the question.
            string html = @"<td class=""c-ob-j1a"" property=""c-value"">12.32</td>
<td class=""c-ob-j1a"" property=""c-value"">2,300</td>
<td class=""c-ob-j1a"" property=""c-value"">4.644 M</td>
<td class=""c-ob-j1a"" property=""c-value"">12,444.12 M</td>";

            // Match only cells with this exact class and property; ([^<]*) captures the cell text.
            var matches = Regex.Matches(html, @"<td class=""c-ob-j1a"" property=""c-value"">([^<]*)</td>");
            foreach (Match match in matches)
                Console.WriteLine(match.Groups[1].Value);
        }
    }

(And I would also like to take this opportunity to recommend the Html Agility Pack if you haven't tried it yet.)

Mark Byers
A: 

If all you need is to parse the td tag in the formats you presented, you might get away with a regex.

In general, parsing HTML with regex does not work. You can find many questions here on SO explaining why.
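
As one small illustration of the brittleness, here is a sketch that counts matches for the strict pattern used earlier in this thread. The class name `BrittleDemo` and the helper `CountMatches` are hypothetical, and the variant markup mentioned below is invented for the example:

```csharp
using System.Text.RegularExpressions;

public static class BrittleDemo
{
    // Strict pattern tied to one exact attribute order and quoting style.
    private const string Pattern =
        @"<td class=""c-ob-j1a"" property=""c-value"">([^<]*)</td>";

    // Returns how many cells the strict pattern finds in the given markup.
    public static int CountMatches(string html)
    {
        return Regex.Matches(html, Pattern).Count;
    }
}
```

A row written exactly as in the question gives one match. The same row with the two attributes swapped, or with the number wrapped in `<b>`, gives zero matches, even though a browser renders all three the same way.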

mfeingold