ansaurus

Question

Extract a certain div in HTML using Java+RegEx

Answer 1

+4 A:

Just use find() in a while loop:

while (matcher.find()) {
    System.out.println("Group found:\n"+matcher.group(1));
}

It's the matches you need to iterate through, not the capture groups.
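For context, here is a minimal self-contained sketch of that loop. The question's own pattern is not shown in this thread, so the `class="item"` pattern, the `DivFinder` class, and the `findAll` helper are all assumptions made for illustration. Note that the reluctant `.*?` stops at the first `</div>`, so this only suits divs that contain no nested divs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DivFinder {
    // Hypothetical pattern: captures the body of each <div class="item">.
    // DOTALL lets '.' match newlines; '.*?' stops at the first </div>.
    private static final Pattern ITEM_DIV = Pattern.compile(
            "<div\\s+class=\"item\"\\s*>(.*?)</div>", Pattern.DOTALL);

    // Collects group(1) of every match, calling find() once per match.
    public static List<String> findAll(String html) {
        List<String> results = new ArrayList<>();
        Matcher matcher = ITEM_DIV.matcher(html);
        while (matcher.find()) {
            results.add(matcher.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        String html = "<div class=\"item\">first</div>\n"
                + "<p>skipped</p>\n"
                + "<div class=\"item\">second</div>";
        for (String body : findAll(html)) {
            System.out.println("Group found:\n" + body);
        }
    }
}
```

Each call to find() advances to the next match, and group(1) always refers to the same capture group within whichever match is current.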

Alan Moore 2009-09-08 12:26:20

Answer 2

+4 A:

Are you sure that you do not want to use an XML parser? Regular expressions are really not suitable for non-regular languages like XML.
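As a rough sketch of the parser route, the following uses only the JDK's built-in DOM and XPath classes. It assumes the input is well-formed XML (for example XHTML); ordinary HTML with unclosed tags will be rejected by the parser. The `XhtmlDivExtractor` class, the `divText` helper, and the sample markup are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class XhtmlDivExtractor {
    // Returns the text content of the div with the given id, or null if absent.
    // Assumes 'id' contains no quote characters (it is pasted into the XPath).
    public static String divText(String xhtml, String id) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node div = (Node) xpath.evaluate(
                    "//div[@id='" + id + "']", doc, XPathConstants.NODE);
            return div == null ? null : div.getTextContent();
        } catch (Exception e) {
            // Input that is not well-formed XML ends up here.
            throw new IllegalArgumentException("Could not parse input as XML", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><body>"
                + "<div id=\"outer\">A<div id=\"inner\">B</div>C</div>"
                + "</body></html>";
        System.out.println(divText(xhtml, "outer"));
        System.out.println(divText(xhtml, "inner"));
    }
}
```

Because the parser builds a tree, nested divs are matched to their own closing tags, which is the case a regular expression cannot handle reliably.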

soulmerge 2009-09-08 12:38:51

That would only work if the document was XHTML.

JG 2009-09-08 12:54:33

There are also plenty of HTML parsers: http://stackoverflow.com/search?q=java+html+parser

Adam Paynter 2009-09-08 12:55:20

Answer 3

+1 A:

I would strongly recommend against using regexps for all but the simplest cases, since HTML is not regular and there are numerous edge cases to trip up your expressions (see numerous answers passim).

Take a look at JTidy, which will parse the HTML and present a DOM interface for you to interrogate.

Brian Agnew 2009-09-08 13:46:10

Answer 4

A:

How can I parse nested div tags? Using this code I can parse an HTML page and extract the content of a single div tag, but not when the page contains nested div tags. Can anyone please tell me how to solve this problem?

shyam 2009-10-21 13:40:54
