My HTML looks like:

<td class="price" valign="top"><font color= "blue">&nbsp;&nbsp;$&nbsp;      5.93&nbsp;</font></td>

I tried:

String result = "";
Pattern p = Pattern.compile("\"blue\">&nbsp;&nbsp;$&nbsp;(.*)&nbsp;</font></td>");

Matcher m = p.matcher(text);

if (m.find())
    result = m.group(1).trim();

It doesn't seem to match.

Am I missing an escape character?

+1  A: 

Maybe you need to escape the $ (with two backslashes, I think)?

ZyX
+1  A: 

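Unless it is escaped at the regex level, $ is an anchor that matches the end of the input (or the end of a line in MULTILINE mode), not a literal dollar sign. And to get the single \ needed to escape the $, the backslash itself has to be escaped in the Java String literal; i.e. you write two \ characters. So ...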

... Pattern.compile("\"blue\">&nbsp;&nbsp;\\$&nbsp;(.*)&nbsp;</font></td>");
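For illustration, here is a minimal runnable sketch of that escaped pattern applied to the HTML from the question. The class name PriceExtractor and the method extractPrice are hypothetical names made up for this example; only the pattern and the sample markup come from the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this example.
class PriceExtractor {
    // "\\$" in the Java literal becomes the regex \$, which matches a literal dollar sign.
    private static final Pattern PRICE = Pattern.compile(
            "\"blue\">&nbsp;&nbsp;\\$&nbsp;(.*)&nbsp;</font></td>");

    // Returns the trimmed text captured after the dollar sign, or "" if nothing matches.
    static String extractPrice(String text) {
        Matcher m = PRICE.matcher(text);
        return m.find() ? m.group(1).trim() : "";
    }

    public static void main(String[] args) {
        // The sample cell from the question, including the run of spaces before the number.
        String html = "<td class=\"price\" valign=\"top\"><font color= \"blue\">"
                + "&nbsp;&nbsp;$&nbsp;      5.93&nbsp;</font></td>";
        System.out.println(extractPrice(html)); // prints 5.93
    }
}
```

If you would rather not count backslashes, Pattern.quote("$") produces an equivalent literal match for the dollar sign. Note that this still depends on the exact spacing and attribute order of the markup, so it remains fragile.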

But the folks who commented that you shouldn't use regexes to parse HTML are absolutely right! Unless you want chronically fragile code, you should use a strict or non-strict HTML parser.

Stephen C
I tried using HtmlParser, but got stuck so I am going the regex route!
Blankman
@Blankman - I think you should go back to HtmlParser. Or if the problem is that you have malformed HTML, switch to a non-strict parser like HtmlCleaner.
Stephen C
Here is the HtmlParser question: http://stackoverflow.com/questions/2660866/parsing-html-using-htmlparser Thanks!
Blankman