Hello,

I have a regex that includes a list of commands. I don't know what kind of parameter follows each command: it can be a string, a number, or nothing.
It is also possible that I don't know the command.

In my first version there weren't any strings, so (abc|def|[a-z]+)([0-9]*) worked fine. But now I want to allow strings, too, and (abc|def|[a-z]+)([0-9]*|[a-z]*) doesn't work.

String 1: abc20def20ghi20
String 2: abcdddef20ghi20
String 3: abcdddef2d0ghi20abcdd

String 1:
Example with regex 1: abc20def20ghi20
Example with regex 2: abc20def20ghi20

String 2:
Example with regex 1: abcdddef20ghi20
Example with regex 2: abcdddef20ghi20

I want to get the following result: abc20def20ghi20 and abcdddef20ghi20

Thanks for your help.

A: 

Do you always want to capture strings whose length is 5? If so, you can do this:

([a-z]{3})([0-9a-z]{2})

If not, maybe you can clarify what exactly the criterion is for "cutting" the string between "abcdd" and "def20"?
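As a rough illustration of the fixed-length assumption above, the following sketch runs that pattern over String 1 and String 2 from the question. The variable names are chosen for illustration only, and the pattern is only suitable if every command really is three letters followed by a two-character parameter.

```php
<?php
// Assumes every command is exactly 3 letters and every parameter exactly
// 2 alphanumeric characters, which holds for the first two sample strings.
$r = '#([a-z]{3})([0-9a-z]{2})#';

preg_match_all($r, 'abc20def20ghi20', $m1);
preg_match_all($r, 'abcdddef20ghi20', $m2);

var_dump($m1[0]); // full matches for String 1
var_dump($m2[0]); // full matches for String 2
```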

reko_t
No, I'm sorry, but the commands and parameters don't have any fixed length.
H3llGhost
A: 

Based on your latest comment, maybe this will do the trick for you:

(abc|def)(\d+|(?:(?!(?1))[a-z])+)?|((?:(?!(?1))[a-z])+)((?2))?

EDIT: Oops, I meant to edit my previous answer instead of posting a new one.

TEST CASE:

<?php

$r = '#(abc|def)(\d+|(?:(?!(?1))[a-z])+)?|((?:(?!(?1))[a-z])+)((?2))?#';
$s1 = 'abc20def20ghi20';
$s2 = 'abcdddef20ghi20';
$s3 = 'abcdddef2d0ghi20abcdd';

preg_match_all($r, $s1, $m1);
preg_match_all($r, $s2, $m2);
preg_match_all($r, $s3, $m3);
var_dump($m1[0], $m2[0], $m3[0]);

Output:

array(3) {
  [0]=>
  string(5) "abc20"
  [1]=>
  string(5) "def20"
  [2]=>
  string(5) "ghi20"
}
array(3) {
  [0]=>
  string(5) "abcdd"
  [1]=>
  string(5) "def20"
  [2]=>
  string(5) "ghi20"
}
array(5) {
  [0]=>
  string(5) "abcdd"
  [1]=>
  string(4) "def2"
  [2]=>
  string(2) "d0"
  [3]=>
  string(5) "ghi20"
  [4]=>
  string(5) "abcdd"
}

As you can see, it catches all parts from all three strings correctly.
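If the command and its parameter are needed separately, one possible approach (a sketch, not part of the original test case) is to read the capture groups per match with PREG_SET_ORDER. Known commands land in groups 1 and 2, unknown ones in groups 3 and 4. The names `$pairs`, `$command` and `$param` are hypothetical.

```php
<?php
$r = '#(abc|def)(\d+|(?:(?!(?1))[a-z])+)?|((?:(?!(?1))[a-z])+)((?2))?#';

preg_match_all($r, 'abcdddef2d0ghi20abcdd', $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $m) {
    // Group 1 is non-empty only when a known command (abc or def) matched.
    $known   = ($m[1] ?? '') !== '';
    $command = $known ? $m[1] : ($m[3] ?? '');
    // Trailing groups that did not participate are missing, hence the ??.
    $param   = $known ? ($m[2] ?? '') : ($m[4] ?? '');
    $pairs[] = [$command, $param];
}

var_dump($pairs);
```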

reko_t
Nearly. :) I had thought about the lookahead, but I don't understand it. The only problem is that it doesn't work completely: with String 2 the dd after abc is ignored, and the ghi is ignored in both strings.
H3llGhost
Okay, updated the regexp to take those cases into account, should work now. I also changed the look-ahead to use `(?1)` so you only need to edit the first command list if you want to add new commands.
reko_t
Very nice work, but my RegexBuddy says that there are some errors and I don't know how to fix them. In the meantime I reworked your regex into a very simple version `(abc|def|[a-z]*).+?(?!(abc|def))`. :D
H3llGhost
What regex flavor are you using? I tried with PCRE and it works. It might be that `(?1)` isn't supported in RegexBuddy; you can try replacing it with `(abc|def)`.
reko_t
I use PCRE with JGsoft program version 3.4.2. I don't know why it doesn't work.
H3llGhost
Here you can see the output of my RegexBuddy http://a.imageshack.us/img25/7753/regex.jpg
H3llGhost
If `(?1)` is your syntax for backreference 1, I have fixed that, but it still doesn't work. The regex drops the 20 at the end of String 1, and in String 2 all numbers are ignored and the first two commands with their parameters are matched together. Sorry for all my comments, and thanks for your help so far!
H3llGhost
You said in your comment on your original question that you don't need the parameter for "ghi", so I left that part off. If you want to capture the 20 after ghi as well, just change the last `[a-z]` to `.` and it should match all the parts correctly. I'll edit my answer with a test case and the output.
reko_t
Thanks very much! I found the solution `(abc|def)([0-9]+|(?:(?!(abc|def))[a-z])+)?|(?:(?!(abc|def)).)+`. I replaced the last `[a-z]` with `([a-z]|[0-9])` to capture alphanumeric characters.
H3llGhost
Yeah, changing to `([a-z]|[0-9])` does the same thing as changing it to just `.`. Btw you can just do `[a-z0-9]` instead of having two separate character classes.
reko_t
The short edit time for comments is annoying! My solution `(abc|def)([0-9]+|(?:(?!(abc|def))[a-z])+)?|(?:(?!(abc|def))[a-z]+[0-9]*)+` has a problem. It recognizes abcdd def2 d0ghi20 abcdd, and that isn't correct: d0ghi20 must be split. I do this to force the order of the characters: first the letters, then the numbers. ;)
H3llGhost
I have a new testcase. String 3: abcdddef2d0ghi20abcdd
H3llGhost
What should be the correct captures of string 3?
reko_t
I want it to be def2 d0 ghi20 abcdd, if that is possible. :)
H3llGhost
Updated the regexp and the test case. You might need to change the `(?1)` to `(abc|def)` and the `(?2)` to `(\d+|(?:(?!(?1))[a-z])+)` if you try it with RegexBuddy.
reko_t
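A minimal sketch of that manual expansion, assuming the engine in use has no subroutine calls: `(?1)` and `(?2)` are written out by hand, here assembled from two helper strings. The names `$cmd` and `$param` are hypothetical, and the lookaheads are left non-capturing so that the numbering of the four main groups stays the same.

```php
<?php
// Command list and parameter sub-pattern, spelled out once and reused,
// so no (?1) or (?2) subroutine call is needed.
$cmd   = 'abc|def';
$param = '\d+|(?:(?!' . $cmd . ')[a-z])+';

$r = '#(' . $cmd . ')(' . $param . ')?'
   . '|((?:(?!' . $cmd . ')[a-z])+)(' . $param . ')?#';

preg_match_all($r, 'abc20def20ghi20', $m1);
preg_match_all($r, 'abcdddef20ghi20', $m2);
preg_match_all($r, 'abcdddef2d0ghi20abcdd', $m3);

var_dump($m1[0], $m2[0], $m3[0]);
```

Adding a new command then means editing only `$cmd`, which is roughly what `(?1)` achieves inside a single pattern.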
Perfect! It works :) I haven't created such a complex regex pattern before! But I don't understand what the (?1) does. When I replaced it with \1 for a backreference, it didn't work.
H3llGhost
(?1) is basically like a function call: using (?1) is the same as writing the corresponding subpattern in its place. \1, by contrast, refers to the text that has already been captured.
reko_t
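The difference can be seen in a small sketch with PHP's PCRE functions. The anchored patterns and variable names here are made up for illustration: `(?1)` re-runs the pattern of group 1 and so may match different text, while `\1` must repeat exactly what group 1 captured.

```php
<?php
// (?1) applies the sub-pattern (abc|def) again, so "abc" followed by "def" is fine.
$subroutine = preg_match('#^(abc|def)(?1)$#', 'abcdef');

// \1 requires the same text that group 1 captured ("abc"), so "abcdef" fails...
$backrefDifferent = preg_match('#^(abc|def)\1$#', 'abcdef');

// ...while "abcabc" succeeds.
$backrefSame = preg_match('#^(abc|def)\1$#', 'abcabc');

var_dump($subroutine, $backrefDifferent, $backrefSame);
```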
Interesting. Perhaps it isn't implemented in Java. I don't know.
H3llGhost