I'm creating a CSS editor and am trying to create a regular expression that can get data from a CSS document. This regex works if I have one property but I can't get it to work for all properties. I'm using preg/perl syntax in PHP.

Regex

(?<selector>[A-Za-z]+[\s]*)[\s]*{[\s]*((?<properties>[A-Za-z0-9-_]+)[\s]*:[\s]*(?<values>[A-Za-z0-9#, ]+);[\s]*)*[\s]*}

Test case

body { background: #f00; font: 12px Arial; }

Expected Outcome

Array(
    [0] => Array(
            [0] => body { background: #f00; font: 12px Arial; }
            [selector] => Array(
                [0] => body
            )
            [1] => Array(
                [0] => body
            )
            [2] => font: 12px Arial; 
            [properties] => Array(
                [0] => font
            )
            [3] => Array(
                [0] => font
            )
            [values] => Array(
                [0] => 12px Arial
                [1] => background: #f00
            )
            [4] => Array(
                [0] => 12px Arial
                [1] => background: #f00
            )
        )
)

Real Outcome

Array(
    [0] => Array
        (
            [0] => body { background: #f00; font: 12px Arial; }
            [selector] => body 
            [1] => body 
            [2] => font: 12px Arial; 
            [properties] => font
            [3] => font
            [values] => 12px Arial
            [4] => 12px Arial
        )
    )

Thanks in advance for any help - this has been confusing me all afternoon!

+11  A: 

That just seems too convoluted for a single regular expression. I'm sure that, with the right extensions, an advanced user could create the right regex. But then you'd need an even more advanced user to debug it.

Instead, I'd suggest using a regex to pull out the pieces, and then tokenising each piece separately. e.g.,

/([^{]+)\s*\{\s*([^}]+)\s*}/

Then you end up with the selector and the attributes in separate fields, and then you split those up. (Even the selector will be fun to parse.) Note that even this will cause pain if a } can appear inside quotes or something similar. You could, again, convolute the heck out of the regex to avoid that, but it's probably better to avoid regexes altogether here and parse one field at a time, perhaps with a recursive-descent parser or yacc/bison or whatever.
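A minimal sketch of that two-step approach, assuming simple CSS with no comments, no braces or semicolons inside quoted strings, and no nested blocks such as @media. The function name parseSimpleCss is made up for illustration:

```php
<?php
// Hypothetical helper: splits simple CSS into selector => [property => value].
// Assumes no comments, no '{', '}' or ';' inside strings, and no nested blocks.
function parseSimpleCss($css)
{
    $rules = array();
    // Step 1: one regex pulls out the "selector { body }" pieces.
    preg_match_all('/([^{}]+)\{([^}]*)\}/', $css, $blocks, PREG_SET_ORDER);
    foreach ($blocks as $block) {
        $selector = trim($block[1]);
        $declarations = array();
        // Step 2: tokenise the body separately, one declaration per ';'.
        foreach (explode(';', $block[2]) as $declaration) {
            // A limit of 2 keeps any later ':' (e.g. in a URL) inside the value.
            $parts = explode(':', $declaration, 2);
            if (count($parts) === 2 && trim($parts[0]) !== '') {
                $declarations[trim($parts[0])] = trim($parts[1]);
            }
        }
        $rules[$selector] = $declarations;
    }
    return $rules;
}

print_r(parseSimpleCss('body { background: #f00; font: 12px Arial; }'));
```

The selector is kept as one trimmed string here; splitting grouped selectors such as "h1, h2" would be a further step.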

Tanktalus
Agreed on the first part -- break down the problem. But I don't see why this can't be a job for regex.
harpo
"Any sufficiently advanced regex is indistinguishable from magic" - Misquote of Arthur C. Clarke (I think). It could be a job for regex - the question is if regex is the right tool.
Ken Gentle
@harpo It is a job for a parser, the parser may use regexes to help it identify tokens, but you need more than just regexes to implement a parser.
Chas. Owens
+8  A: 

You are trying to pull structure out of the data, and not just individual values. Regular expressions could perhaps be painfully stretched to do the job, but you are really entering parser territory and should be pulling out the big guns, namely parsers.

I have never used the PHP parser generating tools, but they look okay after a light scan of the docs. Check out LexerGenerator and ParserGenerator. LexerGenerator will take a bunch of regular expressions describing the different types of tokens in a language (in this case, CSS) and spit out some code that recognizes the individual tokens. ParserGenerator will take a grammar, a description of what things in a language are made up of what other things, and spit out a parser: code that takes a bunch of tokens and returns a syntax tree (the data structure that you are after).
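To make the lexer/parser split concrete, here is a hand-written sketch of the two stages. It does not use LexerGenerator or ParserGenerator; the function names, token names and the tiny grammar are assumptions for illustration, and it only handles simple CSS (no comments, strings, at-rules, or pseudo-selectors such as a:hover):

```php
<?php
// Hand-written stand-in for a generated lexer: each token type is described by a regex.
function tokenizeCss($css)
{
    // The A modifier anchors each match at the current offset.
    $pattern = '/\s*(?:(?<lbrace>\{)|(?<rbrace>\})|(?<colon>:)|(?<semi>;)|(?<text>[^{}:;]+))/A';
    $tokens = array();
    $offset = 0;
    $length = strlen($css);
    while ($offset < $length && preg_match($pattern, $css, $m, 0, $offset)) {
        $offset += strlen($m[0]);
        foreach (array('lbrace', 'rbrace', 'colon', 'semi', 'text') as $type) {
            // Whitespace-only text (e.g. trailing blanks) produces no token.
            if (isset($m[$type]) && trim($m[$type]) !== '') {
                $tokens[] = array($type, trim($m[$type]));
                break;
            }
        }
    }
    return $tokens;
}

// Hand-written stand-in for a generated parser, for the assumed grammar:
//   stylesheet  := rule*
//   rule        := text '{' declaration* '}'
//   declaration := text ':' text ';'?
function parseCssTokens(array $tokens)
{
    $tree = array();
    $i = 0;
    $n = count($tokens);
    // Returns the next token's value if it has the wanted type, otherwise throws.
    $expect = function ($type) use ($tokens, &$i, $n) {
        if ($i >= $n || $tokens[$i][0] !== $type) {
            throw new UnexpectedValueException("Expected $type at token $i");
        }
        return $tokens[$i++][1];
    };
    while ($i < $n) {
        $selector = $expect('text');
        $expect('lbrace');
        $declarations = array();
        while ($i < $n && $tokens[$i][0] !== 'rbrace') {
            $property = $expect('text');
            $expect('colon');
            $declarations[$property] = $expect('text');
            if ($i < $n && $tokens[$i][0] === 'semi') {
                $i++; // the ';' after the last declaration is optional
            }
        }
        $expect('rbrace');
        $tree[] = array('selector' => $selector, 'declarations' => $declarations);
    }
    return $tree;
}

print_r(parseCssTokens(tokenizeCss('body { background: #f00; font: 12px Arial; }')));
```

Unlike a single regex, the parser stage reports where malformed input stops making sense, which is the main practical gain of the split.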

Glomek
+6  A: 

I would recommend against using regexes to parse CSS - especially in a single regex!

If you insist on doing the parsing with regexes, split it up into sensible sections - use one regex to split out all the body{..} blocks, then another to parse the color:rgb(1,2,3); attributes.
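The reason the split helps is that a capture group inside a repeated (...)* keeps only its last iteration, which is exactly what the question's "Real Outcome" shows. Running preg_match_all over the body of one block returns every declaration instead. A rough sketch of that second regex, with a made-up function name and assuming no ';' appears inside a value:

```php
<?php
// Hypothetical helper: parses the inside of one { ... } block into property => value.
// Assumes no ';' inside values (e.g. inside quoted strings or data URLs).
function parseDeclarations($body)
{
    $declarations = array();
    // preg_match_all returns every match, unlike a repeated group in a single pattern.
    preg_match_all('/([\w-]+)\s*:\s*([^;]+)/', $body, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        $declarations[$match[1]] = trim($match[2]);
    }
    return $declarations;
}

print_r(parseDeclarations('color:rgb(1,2,3); font: 12px Arial;'));
```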

If you are actually trying to write something "useful" (not trying to learn regular expressions), look for a prewritten CSS parser.

I found this cssparser.php which seems to work very well:

$cssp = new cssparser;
$cssp -> ParseStr("body { background: #f00;font: 12px Arial; }");
print_r($cssp->css);

..which outputs the following:

Array
(
    [body] => Array
        (
            [background] => #f00
            [font] => 12px arial
        )
)

The parser is pretty simple, so should be easy to work out what it's doing. Oh, I had to remove the lines that read if($this->html) {$this->Add("VAR", "");} (it seems to be a debugging thing that was left in)

I've mirrored the script here, with the above changes in

dbr
+5  A: 

Do not use your own regex for parsing CSS. Why reinvent the wheel while there is code waiting for you, ready to use and (hopefully) bug-free?

There are two generally available classes that can parse CSS for you:

HTML_CSS PEAR package at pear.php.net

and

CSS Parser class at PHPClasses:

http://www.phpclasses.org/browse/package/1289.html

Jacek Lange
A: 

I am using the regex below and it pretty much works... of course this question is old now and I see that you've abandoned your efforts... but in case someone else runs across it:

(?<selector>(?:(?:[^,{]+),?)*?)\{(?:(?<name>[^}:]+):?(?<value>[^};]+);?)*?\}

(You have to remove all of the /* comments */ from your CSS first to be safe.)
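One possible way to do that comment stripping, as a sketch: the function name is made up, and it assumes "/*" never appears inside a quoted string or url(), which this would mangle.

```php
<?php
// Hypothetical helper: removes /* ... */ comments before any regex-based parsing.
// Assumption: no "/*" inside quoted strings or url(...) values.
function stripCssComments($css)
{
    // The s modifier lets '.' span newlines; '.*?' stops at the first closing "*/".
    return preg_replace('~/\*.*?\*/~s', '', $css);
}

echo stripCssComments("body { /* red */ background: #f00; }");
```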

Nick Franceschina
A: 

Try this

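// Trim every string in the array and drop the ones that end up empty.
function trimStringArray($stringArray){
    $result = array();
    foreach($stringArray as $string){
        $trimmed = trim($string);
        if($trimmed != '') $result[] = $trimmed;
    }
    return $result;
}

// $style is assumed to hold the raw CSS text.
// Splitting on braces gives alternating chunks: selectors, attributes, selectors, ...
$regExp = '/\{|\}/';
$rawCssData = preg_split($regExp, $style);

$cssArray = array();
$cssStyle = array();
for($i=0; $i < count($rawCssData); $i++){
    if($i % 2 == 0){
        // Even chunks are comma-separated selector lists.
        // explode() replaces split(), which was removed in PHP 7.
        $selectors = explode(',', $rawCssData[$i]);
        $cssStyle['selectors'] = trimStringArray($selectors);
    }
    if($i % 2 == 1){
        // Odd chunks are the semicolon-separated attributes of the preceding selectors.
        $attributes = explode(';', $rawCssData[$i]);
        $cssStyle['attributes'] = trimStringArray($attributes);
        $cssArray[] = $cssStyle;
    }
}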
echo '<pre>'."\n";
print_r($cssArray);
echo '</pre>'."\n";
Poseidon