Given a string with attribute/value pairs such as

attr1="some text" attr2 = "some other text" attr3= "some weird !@'#$\"=+ text"

the goal is to parse it and output an associative array, in this case:

array('attr1' => 'some text',
      'attr2' => 'some other text',
      'attr3' => 'some weird !@\'#$\"=+ text')

Note the inconsistent spacing around the equal signs, the escaped double quote in the input, and the escaped single quote in the output.

+6  A: 

Try something like this:

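$text = "attr1=\"some text\" attr2 = \"some other text\" attr3= \"some weird !@'#$\\\"=+ text\"";
echo $text, "\n";
// A literal backslash in the regex is written \\\\ inside a single-quoted PHP string,
// so the character class needs four backslashes to exclude the backslash itself.
preg_match_all('/(\S+)\s*=\s*"((?:\\\\.|[^\\\\"])*)"/', $text, $matches, PREG_SET_ORDER);
print_r($matches);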

which produces:

attr1="some text" attr2 = "some other text" attr3= "some weird !@'#$\"=+ text"

Array
(
    [0] => Array
        (
            [0] => attr1="some text"
            [1] => attr1
            [2] => some text
        )

    [1] => Array
        (
            [0] => attr2 = "some other text"
            [1] => attr2
            [2] => some other text
        )

    [2] => Array
        (
            [0] => attr3= "some weird !@'#$\"=+ text"
            [1] => attr3
            [2] => some weird !@'#$\"=+ text
        )

)

And a short explanation (the pattern is shown as the regex engine sees it; inside the single-quoted PHP string each literal backslash is written twice):

(\S+)               // match one or more non-whitespace characters
                    // > and store them in group 1
\s*=\s*             // match a '=' surrounded by zero or more whitespace characters
"                   // match a double quote
(                   // open group 2
  (?:\\.|[^\\"])*   //   match zero or more substrings that are either a backslash
                    //   > followed by any character, or a single character other
                    //   > than a backslash or a double quote
)                   // close group 2
"                   // match a double quote
Bart Kiers
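The answer stops at the raw match groups, while the question asks for a key/value map. Here is a minimal sketch of that last step, written in JavaScript because the comments on this answer already port the pattern to that flavour. `parseAttributes` is a hypothetical helper name, not something from the answer. Values are kept raw (the backslash before an escaped quote is left in place), which appears to be what the expected output in the question shows.

```javascript
// Hypothetical helper: folds every attr="value" match into a plain object.
// Assumption: values stay raw, i.e. backslash escapes are NOT removed.
function parseAttributes(text) {
  // Same pattern as in the answer, written as a JavaScript regex literal.
  const pattern = /(\S+)\s*=\s*"((?:\\.|[^\\"])*)"/g;
  const attrs = {};
  for (const match of text.matchAll(pattern)) {
    attrs[match[1]] = match[2]; // group 1 = name, group 2 = raw value
  }
  return attrs;
}

const input = 'attr1="some text" attr2 = "some other text" attr3= "some weird !@\'#$\\"=+ text"';
console.log(parseAttributes(input));
```

If a later attribute repeats an earlier name, this sketch lets the last one win; whether that is the right policy depends on the input format.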
What about the third example?
Gumbo
Yes, I forgot to double escape the backslash (and double check the output). I'm afraid I am sometimes too confident in myself. Thanks.
Bart Kiers
Is there any difference in how PHP and ActionScript (which is ECMAScript/JS, by the way) handle regex? This regex matched only the first two attributes in ActionScript.
Amarghosh
I have next to no experience with ECMAScript-style regex flavours, but you might want to try `var regex = /(\S+)\s*=\s*"((?:\\.|[^\\"])*)"/g;`, or even `var regex = /(\S+)\s*=\s*"((?:\\.|[^\"])*)"/g;` (not tested!).
Bart Kiers
Both work well with all three cases given by the OP, but not with a trailing backslash :(
Amarghosh
A quick JS test produced the desired result with a (quoted) trailing backslash: `document.write("attr4=\"\\\\\"".match(/(\S+)\s*=\s*"((?:\\.|[^\\"])*)"/i));`
Bart Kiers
Thanks so much, Bart! Is it much of a monkey wrench if I want the quotes to be optional in the case that the value has no spaces?
dreeves
You're welcome, Dreeves. No, it shouldn't be much hassle to include that. Try: `(\S+)\s*=\s*([^"\s]+|"(?:\\.|[^\\"])*")`. If you need clarification, just say so and I'll include a brief explanation in my original post.
Bart Kiers
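One detail of the optional-quote pattern from the last comment is easy to miss: its second group wraps the whole alternation, so a quoted value comes back with its quotes still attached. A small JavaScript sketch of one way to handle that follows; `parseLooseAttributes` is a hypothetical name, and stripping the quotes after matching is just one possible choice.

```javascript
// Hypothetical helper built on the optional-quote pattern from the comment above.
function parseLooseAttributes(text) {
  // Group 2 is either a bare token (no quotes, no whitespace) or a quoted string.
  const pattern = /(\S+)\s*=\s*([^"\s]+|"(?:\\.|[^\\"])*")/g;
  const attrs = {};
  for (const [, name, value] of text.matchAll(pattern)) {
    // A bare token can never contain a quote, so a leading quote means the
    // quoted alternative matched and both surrounding quotes must be dropped.
    attrs[name] = value.startsWith('"') ? value.slice(1, -1) : value;
  }
  return attrs;
}

console.log(parseLooseAttributes('width=100 title = "some text" lang=en'));
```

An alternative would be to give each alternative its own capture group and pick whichever one matched; the post-processing version above keeps the pattern identical to the one in the comment.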
+2  A: 

EDIT: This regex fails if the value ends in a backslash like attr4="something\\"

I don't know PHP, but since the regex would be essentially the same in any language, this is how I did it in ActionScript:

var text:String = "attr1=\"some text\" attr2 = \"some other text\" attr3= \"some weird !@'#$\\\"=+ text\"";

var regex:RegExp = /\s*(\w+)\s*=\s*(?:"(.*?)(?<!\\)")\s*/g;

var result:Object;
while(result = regex.exec(text))
    trace(result[1] + " is " + result[2]);

And I got the following output:

attr1 is some text
attr2 is some other text
attr3 is some weird !@'#$\"=+ text

Amarghosh
Just a small nitpick: if the value itself contains a backslash, like `attr3 = "\\"` (which will likely need escaping too), it won't work with a negative lookbehind. Of course, that might never happen; the OP didn't mention such corner cases.
Bart Kiers
Yeah, you are right. And that's not a nitpick - apparently this fails if the string ends with a backslash - like `attr4="something\\"`
Amarghosh
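The trailing-backslash case discussed in these comments can be reproduced side by side. The sketch below is in JavaScript, on the assumption that the engine supports lookbehind (modern V8/Node does). It uses this answer's pattern without the `g` flag, and the escape-aware pattern from the first answer for comparison. The variable names are illustrative only.

```javascript
// Lookbehind version from this answer: a quote closes the value unless the
// character immediately before it is a backslash.
const lookbehind = /\s*(\w+)\s*=\s*(?:"(.*?)(?<!\\)")\s*/;
// Escape-aware version from the first answer: a backslash always consumes
// the character that follows it, so an escaped backslash is handled as a pair.
const escapeAware = /(\S+)\s*=\s*"((?:\\.|[^\\"])*)"/;

// String.raw keeps both backslashes: the value ends in an escaped backslash.
const sample = String.raw`attr4="something\\"`;

console.log(lookbehind.exec(sample));      // null: the closing quote is rejected
console.log(escapeAware.exec(sample)[2]);  // something\\
```

The lookbehind only inspects one character, so it cannot tell an escaped quote from a quote that follows an escaped backslash; consuming escapes pairwise avoids that ambiguity.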