I want to match the following pattern:

key="value" key="value" key="value" key="value" ...

where key and value are both [a-z0-9]+, and each should be captured in its own group (two groups; the " characters can be matched or skipped).

Input that should not be matched:
key="value"key="value" (no space between the pairs)

So far I have this (not .NET syntax):

([a-z0-9]+)=(\"[a-z0-9]+\")(?=\s|$)

The problem is that it matches key4="value4" in the input:

 key3="value3"key4="value4"
A: 

The spec isn't very clear, but you can try:

(?<!\S)([a-z0-9]+)=("[a-z0-9]+")(?!\S)

Or, as a C# string literal:

"(?<!\\S)([a-z0-9]+)=(\"[a-z0-9]+\")(?!\\S)"

This uses negative lookarounds to ensure that the key-value pair is neither preceded nor followed by a non-whitespace character.

Here's an example snippet (as seen on ideone.com):

   var input = "key1=\"value1\" key2=\"value2\"key3=\"value3\" key4=\"value4\"";
   Console.WriteLine(input);
   // key1="value1" key2="value2"key3="value3" key4="value4"

   Regex r = new Regex("(?<!\\S)([a-z0-9]+)=(\"[a-z0-9]+\")(?!\\S)");
   foreach (Match m in r.Matches(input)) {
     Console.WriteLine(m);
   }
   // key1="value1"
   // key4="value4"
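
Since the question asks for the key and the value as separate groups, here is a minimal sketch of reading them from each match. The class and method names (PairExtractor, Extract) are made up for illustration. It also assumes you want the value without its quotes, so the quotes are moved outside group 2; with the pattern above, group 2 would include them.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PairExtractor
{
    // Same lookarounds as above, but the quotes sit outside group 2
    // so that the group holds only the value (an assumption of this sketch).
    private static readonly Regex PairPattern =
        new Regex("(?<!\\S)([a-z0-9]+)=\"([a-z0-9]+)\"(?!\\S)");

    // Returns the (key, value) pairs that are delimited by whitespace
    // or by the start/end of the input; other pairs are skipped.
    public static List<KeyValuePair<string, string>> Extract(string input)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (Match m in PairPattern.Matches(input))
        {
            pairs.Add(new KeyValuePair<string, string>(
                m.Groups[1].Value, m.Groups[2].Value));
        }
        return pairs;
    }
}
```

Note that this only skips the badly delimited pairs; it does not reject the input as a whole.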

On validating the entire input

You can use Regex.IsMatch to check whether the whole input string matches the expected pattern. You can also use the same pattern to extract the keys and values, because .NET regex keeps every capture made by a repeated group (available through Group.Captures), not just the last one.

   string[] inputs = {
      "k1=\"v1\" k2=\"v2\" k3=\"v3\" k4=\"v4\"",
      "k1=\"v1\" k2=\"v2\"k3=\"v3\" k4=\"v4\"",
      "    k1=\"v1\"      k2=\"v2\"     k3=\"v3\"     k4=\"v4\"     ",
      "     ",
      " what is this? "
   };

   Regex r = new Regex("^\\s*(?:([a-z0-9]+)=\"([a-z0-9]+)\"(?:\\s+|$))+$");
   foreach (string input in inputs) {
     Console.Write(input);
     if (r.IsMatch(input)) {
        Console.WriteLine(": MATCH!");
        Match m = r.Match(input);
        CaptureCollection keys   = m.Groups[1].Captures;
        CaptureCollection values = m.Groups[2].Captures;
        int N = keys.Count;
        for (int i = 0; i < N; i++) {
           Console.WriteLine(i + "[" + keys[i] + "]=>[" + values[i] + "]");
        }
     } else {
        Console.WriteLine(": NO MATCH!");
     }
   }

The above prints (as seen on ideone.com):

k1="v1" k2="v2" k3="v3" k4="v4": MATCH!
0[k1]=>[v1]
1[k2]=>[v2]
2[k3]=>[v3]
3[k4]=>[v4]
k1="v1" k2="v2"k3="v3" k4="v4": NO MATCH!
    k1="v1"      k2="v2"     k3="v3"     k4="v4"     : MATCH!
0[k1]=>[v1]
1[k2]=>[v2]
2[k3]=>[v3]
3[k4]=>[v4]
     : NO MATCH!
 what is this? : NO MATCH!

Explanation of the pattern

The pattern to validate the entire input is essentially:

maybe leading
spaces       ___ end of string anchor
  |         /
^\s*(entry)+$
|          \
beginning   \__ one or more entries
of string
anchor

Where each entry is:

key=value(\s+|$)

That is, a key/value pair followed by either spaces or the end of the string.
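
To make the decomposition concrete, here is a small sketch that assembles the same validation pattern from its two parts. The names (WholeInputValidator, Entry, IsValid) are hypothetical and only serve the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeInputValidator
{
    // One entry: key="value" followed by whitespace or the end of the string.
    private const string Entry =
        "(?:([a-z0-9]+)=\"([a-z0-9]+)\"(?:\\s+|$))";

    // Optional leading whitespace, then one or more entries,
    // anchored at both ends of the input.
    private static readonly Regex Whole = new Regex("^\\s*" + Entry + "+$");

    // True only if the entire input consists of well-separated entries.
    public static bool IsValid(string input)
    {
        return Whole.IsMatch(input);
    }
}
```

Because the entry is repeated with +, an input that contains only whitespace is rejected, which matches the output shown earlier.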

polygenelubricants
This is good. A small question: is it possible for the regex to match nothing at all if there is a mismatch anywhere in the input? Or must I validate the input after the regex has run?
ilann
@ilann: see latest update.
polygenelubricants
+1  A: 

Use a lookbehind like you used your lookahead:

(?<=\s|^)([a-z0-9]+)=(\"[a-z0-9]+\")(?=\s|$) 
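
As a rough sketch of how this pattern could be used from C# (the class and method names are invented for the example, and it relies on .NET accepting an alternation inside a lookbehind):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Collects the text of every pair that has whitespace or a string
    // boundary on both sides.
    public static List<string> MatchAll(string input)
    {
        var r = new Regex("(?<=\\s|^)([a-z0-9]+)=(\"[a-z0-9]+\")(?=\\s|$)");
        var found = new List<string>();
        foreach (Match m in r.Matches(input))
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```

With the input from the question, key3="value3"key4="value4" preceded by a space, neither pair is returned, because key3 is not followed by whitespace and key4 is not preceded by it.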
Jens
+1  A: 

I second Jens' answer (but am still too puny to comment on others' answers).

Also, I've found this Regular Expressions Reference site to be quite awesome. There's a section on Lookaround about halfway down on the Advanced page, and some further notes about Lookbehind.

KlaymenDK
+2  A: 

I think SilentGhost's proposal is about using String.Split()

Like this:

String keyValues = "...";

foreach(String keyValuePair in keyValues.Split(' '))
    Console.WriteLine(keyValuePair);

This is definitely faster and simpler.
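
Splitting alone does not reject input such as key="value"key="value", so here is a hedged sketch that splits on whitespace and then checks each token with a small anchored regex. SplitParser and Parse are hypothetical names, and returning null for invalid input is just one possible convention.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SplitParser
{
    // Matches exactly one key="value" token and nothing else.
    private static readonly Regex Token =
        new Regex("^([a-z0-9]+)=\"([a-z0-9]+)\"$");

    // Returns the pairs in input order, or null if any token is malformed.
    // An input with no tokens at all yields an empty list.
    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        // A null separator array splits on any whitespace; empty entries
        // caused by repeated spaces are dropped.
        string[] tokens = input.Split(
            (char[])null, StringSplitOptions.RemoveEmptyEntries);

        var pairs = new List<KeyValuePair<string, string>>();
        foreach (string token in tokens)
        {
            Match m = Token.Match(token);
            if (!m.Success)
            {
                return null;
            }
            pairs.Add(new KeyValuePair<string, string>(
                m.Groups[1].Value, m.Groups[2].Value));
        }
        return pairs;
    }
}
```

A token such as k2="v2"k3="v3" fails the anchored check, so the whole input is rejected rather than partially parsed.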

kbok