I have an HTML file with a single <pre>...</pre> element. What regex do I need to match all the content inside the <pre> tags?

QString pattern = "<pre>(.*)</pre>";
QRegExp rx(pattern);
rx.setCaseSensitivity(cs);

int pos = 0;
QStringList list;
while ((pos = rx.indexIn(clipBoardData, pos)) != -1) {
  list << rx.cap(1);
  pos += rx.matchedLength();
}

But list.count() is always 0.

+1  A: 

DO NOT PARSE HTML USING Regular Expressions!

Instead, use a real HTML parser, such as this one

SLaks
+3  A: 

HTML is not a regular language, so regular expressions are the wrong tool for parsing it.

Instead, use QXmlSimpleReader to load the XML, then QXmlQuery to find the PRE node and then extract its contents.

Juliano
<html> <body> <pre>some text to excerpt</pre> </body> </html> For such a simple file? Maybe it's simpler to just take a substring of the content...
tfl
A: 

I did it using substrings:

int begin = clipBoardData.indexOf("<pre");
int end = clipBoardData.indexOf("</body>");

QString result = clipBoardData.mid(begin, end - begin);

The result includes the <pre> tags, but I found out that's even better ;)
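
For anyone who wants only the text inside the tags, here is a rough sketch of the same substring idea. It is not the code from this thread: it uses std::string instead of QString so it compiles without Qt (C++17), and extractPre is a made-up helper name. It assumes a single, non-nested, lowercase pre element, and it checks for missing tags instead of passing a "not found" index on.

```cpp
#include <optional>
#include <string>

// Hypothetical helper: returns the text between the first "<pre ...>" opening
// tag and the next "</pre>", or std::nullopt if either tag is missing.
// The search is case-sensitive and does not handle nested <pre> elements.
std::optional<std::string> extractPre(const std::string& html) {
    // Search for "<pre" rather than "<pre>" so an opening tag with
    // attributes (e.g. <pre style="...">) is found as well.
    const std::size_t open = html.find("<pre");
    if (open == std::string::npos) return std::nullopt;

    // The content starts right after the '>' that ends the opening tag.
    const std::size_t openEnd = html.find('>', open);
    if (openEnd == std::string::npos) return std::nullopt;

    // The content ends where the closing tag begins.
    const std::size_t close = html.find("</pre>", openEnd + 1);
    if (close == std::string::npos) return std::nullopt;

    return html.substr(openEnd + 1, close - openEnd - 1);
}
```

Calling extractPre on the sample file from the comment above yields "some text to excerpt". The same steps should carry over to QString with indexOf and mid, where a failed search returns -1 instead of std::string::npos.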

tfl