ansaurus

Question

Little Regular Expression (against HTML) help

Answer 1

+1 A:

I assume that you have special knowledge about the application which generated the HTML you are trying to parse; otherwise you would not even be considering regular expressions for the task. (Part of that is also, I assume, knowledge that  tags always appear after a newline and that  closing tags always appear before a newline.)

That said, you cannot easily or efficiently achieve what you are trying to achieve with regular expressions alone. You would have to use complex nested look-behind and look-ahead assertions to verify that your ... occurrence is not inside a [code]...[/code] block, and non-fixed-length look-behind assertions are particularly limited (and, IIRC, plain buggy prior to JDK 1.6).

You should first iterate over the input sequence, breaking it down into code and non-code chunks, and then transfer the chunks into the output sequence: code chunks unchanged, and non-code chunks with ...-substitution applied via regex or simple string replacement.
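The chunking approach described above can be sketched in Java roughly as follows. This is only a minimal illustration under stated assumptions: the class name `CodeAwareReplacer` and the method `transformOutsideCode` are hypothetical, the code blocks are assumed to be delimited by literal `[code]` and `[/code]` markers, and the actual substitution is passed in by the caller because it is not spelled out in this answer.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class CodeAwareReplacer {

    // One [code]...[/code] block, matched reluctantly; DOTALL lets '.' span newlines.
    // Assumption: blocks are not nested. An unclosed [code] is treated as non-code.
    private static final Pattern CODE_BLOCK =
            Pattern.compile("\\[code\\].*?\\[/code\\]", Pattern.DOTALL);

    private CodeAwareReplacer() {
    }

    // Applies 'transform' to every chunk outside a code block and
    // copies every code block to the output unchanged.
    static String transformOutsideCode(String input, UnaryOperator<String> transform) {
        StringBuilder out = new StringBuilder();
        Matcher m = CODE_BLOCK.matcher(input);
        int last = 0; // end of the previously copied code block
        while (m.find()) {
            // Non-code chunk between the previous block and this one.
            out.append(transform.apply(input.substring(last, m.start())));
            // Code chunk, copied verbatim.
            out.append(m.group());
            last = m.end();
        }
        // Trailing non-code chunk after the last block (or the whole input).
        out.append(transform.apply(input.substring(last)));
        return out.toString();
    }
}
```

A call such as `CodeAwareReplacer.transformOutsideCode(text, s -> s.replace("foo", "bar"))` would then rewrite `foo` only outside the code blocks; the `foo`/`bar` pair is a placeholder for whatever substitution you actually need.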

It is up to you whether (and how) to deal with nested or mismatched code chunks.

Cheers, V.

vladr 2010-04-17 20:25:57

Answer 2

A:

The syntax for negative lookahead is (?!X), where X is the pattern that must not match at the current position.

(?![code.*?]([^\[]|\[\/[^c]|\[\/c[^o]|\[\/co[^d]|\[\/cod[^e]|\[\/code[^\]])*).*?
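As a small, self-contained illustration of the negative lookahead syntax itself (not a fix for the expression above), the following Java sketch checks whether a string contains a `q` that is not immediately followed by a `u`. The class and method names are made up for the example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, only meant to show the lookahead syntax.
final class LookaheadDemo {

    // 'q' followed by a zero-width negative lookahead: the next character must not be 'u'.
    private static final Pattern Q_NOT_U = Pattern.compile("q(?!u)");

    private LookaheadDemo() {
    }

    // True if the input contains at least one 'q' not directly followed by 'u'.
    static boolean hasQNotFollowedByU(String input) {
        return Q_NOT_U.matcher(input).find();
    }
}
```

The lookahead consumes no characters, so a `q` at the very end of the input also matches: there is no `u` after it.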

SHiNKiROU 2010-04-17 20:27:07

With this, only the first line of my code block keeps the tags; the rest of the block doesn't have them, and it only works for one code block on the page.

Marcos Placona 2010-04-17 20:33:09

The code you posted doesn't seem to do anything now

Marcos Placona 2010-04-17 20:43:36

Answer 3

+1 A:

I know I shouldn't be using regular expressions to parse HTML at all. I'm fully aware of that, but still, for this specific case, I'd like to use regex.

Can you explain this a bit more?

Will 2010-04-17 23:38:57

"I know I shouldn't pound nails with a screwdriver, but this time, I'd like to use a screwdriver." Just Say No!

TrueWill 2010-04-17 23:49:14
