ansaurus

Question

Multi-line regex with overlapping matches

Answer 1

+1 A:

Depending on deep nuances of your regex engine, you may be able to do this by embedding capturing parens in lookaheads, i.e. something like:

\.(\w+)(?=.*?{([^}]*)})

I'd expect figuring out the meaning of the match groups to be quite an exercise.

chaos 2009-04-17 21:16:48

Match groups can be named, and that's exactly what I'm already doing. I can match and extract the class name and the class body; that's not an issue. My issue is that I was looking for a way to match multiple class names that share a common body. I probably won't be able to, so I'll just match the whole line and split at the commas.

Rich 2009-04-17 21:41:41

Right, what I'm saying is that a pattern like the one above should do that for you, *if* capturing inside a lookahead works in your engine. The lookahead lets you scan forward to the class body and (if the capture works) extract it, *without* advancing the regex's current position, so it can go on to match the remaining class names.
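To make the idea concrete, here is a minimal sketch in Python (not the asker's C#). The sample stylesheet and the function name `selector_body_pairs` are made up for illustration, and the pattern is a slightly tightened variant of the one above (it uses `[^{}]*` instead of `.*?` so the lookahead cannot cross a closing brace or get stuck at a newline). It only shows that a group captured inside a lookahead is still available on each match:

```python
import re

# Assumed variant of the pattern above: the class name is consumed,
# the body is captured inside a lookahead and therefore NOT consumed.
PATTERN = re.compile(r"\.(\w+)(?=[^{}]*\{([^}]*)\})")

def selector_body_pairs(css):
    """Return (class_name, body) for every class name, repeating shared bodies."""
    return [(m.group(1), m.group(2).strip()) for m in PATTERN.finditer(css)]

# Hypothetical sample data: two class names sharing one body.
sample = ".class1, .class2 { color: red; }\n.other { margin: 0; }"

for name, body in selector_body_pairs(sample):
    print(name, "->", body)
```

Because the lookahead consumes nothing, the second match starts right after `class1` and still sees the same `{ ... }` block.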

chaos 2009-04-17 21:46:16

Your pattern didn't match exactly, but the idea of capturing inside the lookahead was what eventually worked. I didn't get it at first. +1

Rich 2009-04-20 15:53:16

Answer 2

+2 A:

You need a CSS parser, not a regex. You should probably read "Is there a CSS Parser for C#".

Chas. Owens 2009-04-17 21:24:24

Answer 3

A:

This is not a good problem for regexes.

On the other hand, you only need a couple of passes to write a basic CSS parser, surely.

CSS syntax is just [some stuff], [open curly bracket], [some other stuff], [close curly bracket] after all.

You find those two chunks of stuff, you split the first one on commas and the second one on semicolons and you're pretty much done.
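A rough sketch of that split-based approach, written in Python for brevity. The function name `parse_basic_css` and the sample input are hypothetical, and it assumes comments are already stripped and that there are no @-rules, nested blocks, or braces/semicolons inside strings:

```python
def parse_basic_css(css):
    """Split flat CSS into (selectors, declarations) pairs, one per rule."""
    rules = []
    # Everything up to each closing brace is one "selectors { body" chunk.
    for chunk in css.split("}"):
        if "{" not in chunk:
            continue  # trailing whitespace or text with no rule in it
        selectors_part, body_part = chunk.split("{", 1)
        # First chunk: split on commas. Second chunk: split on semicolons.
        selectors = [s.strip() for s in selectors_part.split(",") if s.strip()]
        declarations = [d.strip() for d in body_part.split(";") if d.strip()]
        rules.append((selectors, declarations))
    return rules

# Hypothetical sample data.
sample = "td.class1, td.class2 { color: red; font-weight: bold; }\np { margin: 0 }"
for selectors, declarations in parse_basic_css(sample):
    print(selectors, declarations)
```

Anything beyond flat rules like these is where a real CSS parser starts to pay off.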

AmbroseChapel 2009-04-18 04:18:56

Answer 4

+2 A:

Here's a regex that works with your sample data:

@"([^,{}\s]+(?:\s+[^,{}\s]+)*)(?=[^{}]*(\{[^{}]+\}))"

The first part matches and captures a selector (td.class1) in group #1, then the lookahead skips over any remaining selectors and captures the associated style rules in group #2. The next match attempt starts where the lookahead started the previous time, so it matches the next selector (td.class2) and the lookahead captures the same block of rules again.
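For anyone who wants to try it outside .NET, here is a small sketch running the same pattern through Python's `re` module. The sample stylesheet and the name `selector_rule_pairs` are assumptions for illustration, not the data from the question:

```python
import re

# Same pattern as above, written as a Python raw string.
RULE = re.compile(r"([^,{}\s]+(?:\s+[^,{}\s]+)*)(?=[^{}]*(\{[^{}]+\}))")

def selector_rule_pairs(css):
    """Return (selector, rule_block) pairs; selectors in one rule share a block."""
    return [(m.group(1), m.group(2)) for m in RULE.finditer(css)]

# Hypothetical sample data, comments already stripped.
sample = "td.class1, td.class2 { color: red; }\np { margin: 0; }"

for selector, block in selector_rule_pairs(sample):
    print(selector, "=>", block)
```

Text inside a rule block never matches as a selector, because the lookahead's `[^{}]*` cannot get past the closing brace to reach another `{`.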

This won't handle @-rules or comments, but it works fine on the sample data you provided. I even checked it out on some real-world stylesheets and it did remarkably well.

Alan Moore 2009-04-18 06:04:29

Thanks. Similar to chaos' answer, capturing inside of the lookahead was the solution. I gave you the accepted answer because your regex actually works on all sorts of sample data that I threw at it (and GREATLY simplified the way I was doing it). I'm stripping comments out before processing anyway, so it seems to be all good now.

Rich 2009-04-20 15:56:56
