ansaurus

Question

Java Regexp question

Answer 1

+2 A:

Parsing HTML with a regex is difficult; you might have better luck using an XML parser such as SAX.
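As a rough illustration of the SAX approach, here is a minimal sketch that collects the text of every table cell. It assumes the markup is well-formed XML (for example XHTML, with `<img/>` closed), since a SAX parser rejects ordinary tag-soup HTML. The class name `CellCollector` and the method `cellsOf` are hypothetical, and nested tables are not handled.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: gathers the trimmed text content of each <td> element.
class CellCollector extends DefaultHandler {
    private final List<String> cells = new ArrayList<>();
    private StringBuilder current; // non-null only while inside a <td>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("td".equals(qName)) {
            current = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text inside nested elements such as <a> or <b> is also appended.
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("td".equals(qName) && current != null) {
            cells.add(current.toString().trim());
            current = null;
        }
    }

    // Assumes the input is well-formed XML; throws IllegalArgumentException otherwise.
    static List<String> cellsOf(String xhtml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            CellCollector handler = new CellCollector();
            parser.parse(new InputSource(new StringReader(xhtml)), handler);
            return handler.cells;
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }
}
```

Calling `CellCollector.cellsOf(row)` on a single well-formed `<tr>...</tr>` fragment returns one string per cell, which can then be joined with whatever separator is needed.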

Nick 2010-10-12 21:56:09

Answer 2

+1 A:

Don't try to use regexps: HTML isn't a regular language, and the number of edge cases makes a reliable regexp impractical. You'll get a more reliable solution from an HTML parser such as JTidy.

Brian Agnew 2010-10-12 22:49:12

Answer 3

A:

If you insist on using a regex, I made this one for you:

Search for:

  <tr\b[^><]*>\s*<td\b[^><]*>\s*<a\b[^><]*>\s*<b>\s*(WNVZ)\s*<\/b>\s*<\/a>\s*-\s*(\w+)<\/td>\s*<td\b[^><]*>\s*(Norfolk)\s*<\/td>\s*<td\b[^><]*>\s*(Virginia)\s*</td>\s*<td\b[^><]*>\s*<img\b[^><]*>\s*</td>\s*<td\b[^><]*>\s*<a\b[^><]*href\s*=\s*["']([^"'><]+)["'][^><]*>[^><]*<\/a>\s*<\/td>\s*<td\b[^><]*>([^><]*)</td>

Replace with:

  $1 - $2#$3#$4#$5#$6
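In Java, the search and replace above might be applied roughly as follows. This is a sketch only: the class name `StationRowExtractor`, the method `extract`, and the sample row used to try it are assumptions, not taken from the question. The pattern is the one above with backslashes doubled for a Java string literal, and the result is assembled from the capture groups in the same shape as the replacement template.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the regex from this answer.
class StationRowExtractor {
    // Same pattern as above, split by table cell for readability.
    private static final Pattern ROW = Pattern.compile(
          "<tr\\b[^><]*>\\s*"
        + "<td\\b[^><]*>\\s*<a\\b[^><]*>\\s*<b>\\s*(WNVZ)\\s*<\\/b>\\s*<\\/a>\\s*-\\s*(\\w+)<\\/td>\\s*"
        + "<td\\b[^><]*>\\s*(Norfolk)\\s*<\\/td>\\s*"
        + "<td\\b[^><]*>\\s*(Virginia)\\s*</td>\\s*"
        + "<td\\b[^><]*>\\s*<img\\b[^><]*>\\s*</td>\\s*"
        + "<td\\b[^><]*>\\s*<a\\b[^><]*href\\s*=\\s*[\"']([^\"'><]+)[\"'][^><]*>[^><]*<\\/a>\\s*<\\/td>\\s*"
        + "<td\\b[^><]*>([^><]*)</td>");

    // Returns "$1 - $2#$3#$4#$5#$6" for the first matching row, or empty if none matches.
    static Optional<String> extract(String html) {
        Matcher m = ROW.matcher(html);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(m.group(1) + " - " + m.group(2) + "#" + m.group(3) + "#"
                + m.group(4) + "#" + m.group(5) + "#" + m.group(6));
    }
}
```

Using `find()` with explicit groups, rather than `replaceAll`, keeps unmatched markup around the row (such as a trailing `</tr>`) out of the result. The pattern is tied to this exact cell layout, so any change to the page structure will make it stop matching.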

Vantomex 2010-10-13 04:24:56
