I am trying to create a .NET Regex to parse a CSS font declaration, which takes the following form:

font: italic small-caps bold xx-small 3.0em "Times New Roman", Times, serif;

According to the CSS specification, all of the elements of the declared value are optional, and I have successfully created Regexes that match the first five elements (in all their different permitted forms), but I am having trouble creating a Regex that matches the list of font names, which is always the last element in the property value. I don't need to identify the individual elements in the font names list; I just want to match the list as a whole.

The font names list is a comma-separated list (with optional whitespace between elements), and each member of the list is either a single-word name or multiple words enclosed in quotes.

So far, I have come up with the following expression ...

(?<NAME_LIST>(?<QUOTED_NAME>"[\w ]+")|(?<SIMPLE_NAME>\w+)(?:,\s*(?<QUOTED_NAME>"[\w ]+")|(?<SIMPLE_NAME>\w+))*)

... but it matches each member of the list individually, instead of matching the entire list.

Any ideas would be appreciated.

Thanks,

Tim

+1  A: 

Perhaps something like this (assuming you already have some regex in place before this bit to match the stuff before the font list)?

(?<FONTS>(?:['"]?(?:\w+\s*)+["']?(?:,\s*|\s*;))+)

Note that this matches the semicolon at the end as well, but that can easily be trimmed off using string operations.
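As a rough illustration of the trimming step, here is a minimal C# sketch that applies the pattern above to a full declaration and strips the trailing semicolon with `TrimEnd`. The class name `TrimSemicolonDemo` and the method name `ExtractFontList` are made up for this example; only `Regex.Match`, `Match.Groups` and `String.TrimEnd` are real .NET APIs.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrimSemicolonDemo
{
    // The pattern from this answer as a C# verbatim string
    // (a literal double quote is written as "" inside @"...").
    private const string Pattern =
        @"(?<FONTS>(?:['""]?(?:\w+\s*)+[""']?(?:,\s*|\s*;))+)";

    // Hypothetical helper: returns the font list without the trailing
    // semicolon, or an empty string if the pattern does not match.
    public static string ExtractFontList(string declaration)
    {
        Match match = Regex.Match(declaration, Pattern);
        if (!match.Success)
        {
            return string.Empty;
        }

        // Remove the semicolon that the pattern captures, then any
        // whitespace that was sitting in front of it.
        return match.Groups["FONTS"].Value.TrimEnd(';').TrimEnd();
    }

    public static void Main()
    {
        string declaration =
            "font: italic small-caps bold xx-small 3.0em \"Times New Roman\", Times, serif;";
        Console.WriteLine(ExtractFontList(declaration));
        // prints: "Times New Roman", Times, serif
    }
}
```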

EDIT: Since you're only evaluating the value part of the declaration, you'll want this regex instead; it also fixes a couple of other problems I noticed in my original pattern.

(?<FONTS>(?:\s*(?:(?:['"](?:\w|\s)+["'])|\w+)\s*(?:,|$))+)
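
For anyone who wants to try it, here is a minimal C# sketch of how the updated pattern might be applied to the value part of a declaration. The class name `FontListDemo` and the method name `ExtractFontList` are hypothetical, chosen only for this example. Because the pattern permits whitespace before each name, the capture can begin with the space that precedes the first font name, so the sketch trims the result.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FontListDemo
{
    // The updated pattern as a C# verbatim string
    // (a literal double quote is written as "" inside @"...").
    private const string FontsPattern =
        @"(?<FONTS>(?:\s*(?:(?:['""](?:\w|\s)+[""'])|\w+)\s*(?:,|$))+)";

    // Hypothetical helper: returns the matched font list, or an empty
    // string if the pattern does not match anywhere in the value.
    public static string ExtractFontList(string value)
    {
        Match match = Regex.Match(value, FontsPattern);
        if (!match.Success)
        {
            return string.Empty;
        }

        // The capture may start with leading whitespace, so trim it.
        return match.Groups["FONTS"].Value.Trim();
    }

    public static void Main()
    {
        string value =
            "italic small-caps bold xx-small 3.0em \"Times New Roman\", Times, serif";
        Console.WriteLine(ExtractFontList(value));
        // prints: "Times New Roman", Times, serif
    }
}
```

Note that the pattern is not anchored to the start of the input, so it relies on the earlier elements (such as `3.0em`) not being followed directly by a comma or the end of the string.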
JAB
@JAB: Thanks for your answer. I've accepted it, because it certainly matches the complete list of font names, which is more than I have achieved myself. However, I misled you a little ... the semicolon is not present in my input string because I am actually just parsing the value part of the declaration. I have removed the semicolon from your pattern and it still matches correctly. Am I likely to see any side effects from this usage? Thanks again.
Tim Coulter
@Tim: Yeah, with the semicolon removed it won't match the trailing font name. I also just discovered another problem with my regex that I hadn't noticed before (it matches extra text at the beginning if the first font name isn't quoted), so I'm fixing it up now.
JAB
Okay, updated. (Had to step away from my computer for a bit while working on it.)
JAB
Thanks for the update - it passes my 49 unit tests so I guess it's a pretty robust solution! Thanks again for your help.
Tim Coulter
You're quite welcome.
JAB