What's the best way to extract the key and value from a string like this:

var myString = 'A1234=B1234';

I originally had something like this:

myString.split('=');

And that works fine, BUT an equals sign (=) could appear inside the key or the value, and the string could also contain quotes, like this:

var myString = '"A123=1=2=3=4"="B1234"';

The string could also only have one pair of quotes and spaces:

var myString = ' "A123=1=2=3=4" = B1234 ';

I'm not very good at regular expressions but I'm guessing that's the way forward?

What I want to end up with is two variables, key and value. In the case above, key would be A123=1=2=3=4 and value would be B1234.

If there is no value present, for example if this were the original string:

var myString = 'A1234';

Then I would want the key variable to be 'A1234' and for the value variable to be null or false - or something I can test against.

Any help is appreciated.

+4  A: 

can't help with a one-liner, but I'll suggest the naive way:

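var key = str, value = null;   // str is the input; value stays null if no '=' is found
var inQuote = false;
for (var i = 0; i < str.length; i++) {
   var ch = str.charAt(i);
   if (ch == '"') {
      inQuote = !inQuote;       // entering or leaving a quoted section
   }
   if (!inQuote && ch == '=') {
      // the first '=' outside quotes separates key from value
      key = str.slice(0, i);
      value = str.slice(i + 1);
      break;
   }
}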
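
The loop leaves the surrounding spaces and quotes in place, while the question asks for them to be removed. A minimal sketch of one way to wrap the same scan in a function follows; the names parsePair and stripQuotes are made up for illustration, and it assumes at most one pair of enclosing double quotes per side.

```javascript
// Hypothetical helper: trims whitespace, then drops one pair of enclosing quotes.
function stripQuotes(s) {
  s = s.replace(/^\s+|\s+$/g, '');
  if (s.length >= 2 && s.charAt(0) === '"' && s.charAt(s.length - 1) === '"') {
    s = s.slice(1, -1);
  }
  return s;
}

// Hypothetical wrapper around the quote-aware scan shown in the answer.
function parsePair(str) {
  var inQuote = false;
  for (var i = 0; i < str.length; i++) {
    var ch = str.charAt(i);
    if (ch === '"') {
      inQuote = !inQuote;                 // toggle on every double quote
    } else if (ch === '=' && !inQuote) {
      // first '=' outside quotes: split here
      return {
        key: stripQuotes(str.slice(0, i)),
        value: stripQuotes(str.slice(i + 1))
      };
    }
  }
  // no separator found: the whole string is the key
  return { key: stripQuotes(str), value: null };
}
```

The null value gives the caller something to test against when the string has no '=' outside quotes.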
Jimmy
Don't forget backslashes for escaping enclosed quotes! But this is pretty much the same approach I'd take. Regular expressions are not the right tool here. This is a job for a parser.
benjismith
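
For what it's worth, here is a rough sketch of the backslash point. It assumes a convention where a backslash escapes whatever character follows it; findSeparator is a made-up name, and the function only locates the separator rather than unescaping anything.

```javascript
// Hypothetical helper: index of the first '=' outside quotes, or -1 if none.
// Assumption: a backslash escapes the next character, so \" does not toggle quoting.
function findSeparator(str) {
  var inQuote = false;
  for (var i = 0; i < str.length; i++) {
    var ch = str.charAt(i);
    if (ch === '\\') {
      i++;                       // skip the escaped character
    } else if (ch === '"') {
      inQuote = !inQuote;
    } else if (ch === '=' && !inQuote) {
      return i;
    }
  }
  return -1;
}
```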
Thanks for this, I've saved it for the future - for this particular issue I'm gonna just ignore those 'equal' signs and thinking about it, there is no real need for the user to have the opportunity to include quotes. - I'll strip them at user entry instead.
J-P
In older versions of Internet Explorer you can't use str[x], but you can use str.charAt(x)
some
Damn stupid browser incompatibilities :-) - this is why I love jQuery with a passion.
paxdiablo
good call @some... I actually asked a SO question about IE and str[x] before, but I keep forgetting :)
Jimmy
+2  A: 

What I've tended to do in config files is ensure that there's no possibility that the separator character can get into either the key or value.

Sometimes that's easy if you can just say "no '=' characters allowed" but I've had to resort to encoding those characters in some places.

I generally hex them up so that if you wanted a '=' character, you would have to put in %3d (and %25 for the '%' character so you don't think it's a hex-starter character). You can also use %xx for any character but it's only required for those two.

That way you can check the line to ensure it has one and only one '=' character then post-process the key and value to turn the hex'ed characters back into real ones.
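
A small sketch of that scheme, assuming only '%' and '=' are encoded on the way in and any %xx sequence is decoded on the way out. The function names encodePart, decodePart and parseLine are made up for the example.

```javascript
// Hypothetical helpers illustrating the %xx scheme described above.
function encodePart(s) {
  // Encode '%' first so the '%' introduced for '=' is not encoded again.
  return s.replace(/%/g, '%25').replace(/=/g, '%3d');
}

function decodePart(s) {
  // Single pass: each %xx becomes the character with that hex code.
  return s.replace(/%([0-9a-fA-F]{2})/g, function (match, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
}

function parseLine(line) {
  var parts = line.split('=');
  if (parts.length !== 2) {
    return null;                 // not exactly one '=': reject the line
  }
  return { key: decodePart(parts[0]), value: decodePart(parts[1]) };
}
```

With this, a key such as A123=1=2 would be stored as A123%3d1%3d2, and a plain split on '=' is safe again.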

paxdiablo
You're right - it's too much of a hassle to cater to the people who will want to use the equal sign. I ended up just using the split thing - I might come up with some unicode thing in the future. Thanks for the advice! :)
J-P
+3  A: 
/^(\"[^"]*\"|.*?)=(\"[^"]*\"|.*?)$/
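
A sketch of how the pattern might be applied. The captured groups still include any enclosing quotes, so this strips them afterwards; unquote and matchPair are made-up names. It assumes there is no whitespace around the key, the '=' or the value, since the pattern as written does not allow for it.

```javascript
var pairPattern = /^(\"[^"]*\"|.*?)=(\"[^"]*\"|.*?)$/;

// Hypothetical helper: removes one pair of enclosing double quotes, if present.
function unquote(s) {
  return s.replace(/^"([^"]*)"$/, '$1');
}

// Hypothetical wrapper: value is null when the string contains no '='.
function matchPair(str) {
  var m = pairPattern.exec(str);
  if (!m) {
    return { key: unquote(str), value: null };
  }
  return { key: unquote(m[1]), value: unquote(m[2]) };
}
```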
Adrian
Put a $ at the end of that, otherwise the lazy matcher lives up to its name and matches nothing.
nickf
+2  A: 

If we make a rule that all keys containing equals signs must be enclosed in quotes, then this works well (I can't imagine any good reason for allowing escaped quotes within a key):

/ ^               # Beginning of line
  \s*             # Any number of spaces
  ( " ( [^"]+) "  # A quote followed by one or more non-quotes,
                  # and a closing quote
  | [^=]*         # OR any number of characters that are not equals signs
    [^ =]         # and at least one character that is not an equals sign or a space
  )               
  \s*             # any number of spaces between the key and the operator
  =               # the assignment operator
  \s*             # Any number of spaces 
  (.*?\S)         # Then any number of any characters, stopping at the last non-space
  \s*             # Before spaces and...
  $               # The end of line.

/
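
JavaScript regular expressions have no free-spacing mode, so the commented pattern has to be collapsed onto one line before it can be used there. A sketch of that, with parseWithPattern as a made-up name:

```javascript
// The commented pattern above with whitespace and comments removed.
var pattern = /^\s*("([^"]+)"|[^=]*[^ =])\s*=\s*(.*?\S)\s*$/;

// Hypothetical wrapper: returns null when the string does not match at all.
function parseWithPattern(str) {
  var m = pattern.exec(str);
  if (!m) {
    return null;
  }
  // m[2] is set only when the key was quoted; otherwise fall back to m[1].
  var key = m[2] !== undefined ? m[2] : m[1];
  return { key: key, value: m[3] };
}
```

The value keeps any quotes it had, since the pattern only looks inside quotes on the key side. A string with no '=' does not match, so the caller would need to treat that case separately.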

Java properties files (which split at the first ':' or '=', though) also let a property span multiple lines by putting '\' at the end of a line, so handling that would be a little trickier.

Axeman