
Hi,

I have some strings to match with a regex. We have a Java application that reads the regex from a config file and extracts two capture groups, whose group numbers are specified in the same config.

E.g.

CustomAction.523274ca945f.dialogLabel=Executing Custom Code...

will be matched with

(?m)^(?!#)\s*(\S*)\s*=\s*(\S*.*)

What I need is to take the first group, "CustomAction.523274ca945f.dialogLabel", and exclude the random string in the middle, so that I end up with something like "CustomAction.dialogLabel" or "CustomAction..dialogLabel", or any other combination without the random string.

I don't have the source for the Java application I am using. The app lets me create a config file in which I specify a pattern and two group numbers, and the app extracts those groups:

pattern: (?m)^(?!#)\\s*([^.=\\s]*)\\.(?:[^.=\\s]*\\.)?([^.=\\s]*)\\s*=\\s*(.*?)\\s*$
key_group:  1
value_group:    2

I can specify only one group for the key and one for the value. With this config, the app uses key_group as the key and value_group as its value.
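To make the constraint concrete, here is a minimal sketch of how an app like this plausibly uses such a config. We don't have its source, so the class name, the `extract` method and the map-based behavior are assumptions: it compiles the pattern and stores group(key_group) -> group(value_group) for every match. With the original pattern, the random string stays in the key.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of the closed-source app: one configured pattern,
// one group number for the key and one for the value.
public class ConfigDrivenExtractor {

    static Map<String, String> extract(String text, String regex,
                                       int keyGroup, int valueGroup) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            // Exactly one group becomes the key; there is no way to join groups.
            result.put(m.group(keyGroup), m.group(valueGroup));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "CustomAction.523274ca945f.dialogLabel=Executing Custom Code...";
        // The question's original pattern, written as a Java string literal.
        String regex = "(?m)^(?!#)\\s*(\\S*)\\s*=\\s*(\\S*.*)";
        // Prints {CustomAction.523274ca945f.dialogLabel=Executing Custom Code...}
        System.out.println(extract(text, regex, 1, 2));
    }
}
```

Because only one group number is accepted for the key, whatever that group spans ends up in the key verbatim.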

I don't want the garbage in the middle: it is random, so it changes the key every time.

Thanks

A:

Two approaches. First, assuming your property always has exactly three parts, simply replace your first (\S*) with:

(\S+?)\.\S+?\.(\S+)

Note that I also changed * to +, since it doesn't make sense to have ".." in a property. I've also used non-greedy quantifiers, but it should work fine without them. You can then use the appropriate group numbers to reconstruct the adjusted property.

A second approach assumes that your random string is a hex number (which it appears to be) and that the non-random parts of the property do not contain digits:

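((?:[^.\s=\d]+\.)*)(?:[0-9A-Fa-f]*\d[0-9A-Fa-f]*\.)?([^.\s=]+(?:\.[^.\s=]+)*)

The first group picks up everything before the random number (including the trailing dot), the non-capturing middle group consumes the random number (it must contain at least one digit, so an ordinary word made only of hex letters, such as "face", is not mistaken for it), and the second group matches the rest of the key. If there is no random part, the two groups together hold the whole key. The dots are escaped because an unescaped . matches any character.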
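Both approaches can be tried in a small sketch (class and method names are illustrative). For the first, it uses the question's pattern with (\S+?)\.\S+?\.(\S+) in place of the first (\S*) and joins the two key parts with a dot. For the second, it uses a version of the hex-segment idea with the dots escaped, assuming the random segment contains at least one digit and the segments before it contain none; that pattern is applied to the key alone.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StripRandomSegment {

    // Approach 1: group 1 = first key part, group 2 = last key part,
    // group 3 = value. Assumes every key has exactly three parts.
    static final Pattern THREE_PART = Pattern.compile(
        "(?m)^(?!#)\\s*(\\S+?)\\.\\S+?\\.(\\S+)\\s*=\\s*(\\S*.*)");

    // Approach 2 (key only): group 1 = leading digit-free segments with their
    // dots, then an optional hex segment containing at least one digit
    // (not captured), then group 2 = the rest of the key.
    static final Pattern HEX_SEGMENT = Pattern.compile(
        "((?:[^.\\s=\\d]+\\.)*)(?:[0-9A-Fa-f]*\\d[0-9A-Fa-f]*\\.)?([^.\\s=]+(?:\\.[^.\\s=]+)*)");

    // Joins the first and last key parts with a dot; null if the line does not match.
    static String threePartKey(String line) {
        Matcher m = THREE_PART.matcher(line);
        return m.find() ? m.group(1) + "." + m.group(2) : null;
    }

    // Drops the random hex segment, or returns the key unchanged if it has none;
    // null if the key does not fit the pattern at all.
    static String hexStrippedKey(String key) {
        Matcher m = HEX_SEGMENT.matcher(key);
        return m.matches() ? m.group(1) + m.group(2) : null;
    }

    public static void main(String[] args) {
        String line = "CustomAction.523274ca945f.dialogLabel=Executing Custom Code...";
        System.out.println(threePartKey(line));                                      // CustomAction.dialogLabel
        System.out.println(hexStrippedKey("CustomAction.523274ca945f.dialogLabel")); // CustomAction.dialogLabel
        System.out.println(hexStrippedKey("some.stuff.here"));                       // some.stuff.here
    }
}
```

Note that both methods build the final key by concatenating two groups in Java code, which is exactly the step the asker's app does not allow.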

EDIT

Given the updated description of the problem, and that only two groups can be used, my answer is that this is not possible. A regular expression has no mechanism to "erase" part of a match. From the problem definition, the part of the key to be excluded sits in the middle of other text, i.e. the general pattern to match is:

((a)(?:b)(c))

Since we cannot pre- or post-process the match, "b" will always be part of any group that includes both a and c; making b a non-capturing group does not affect the enclosing group.
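A small Java check of this point (class and method names are illustrative): the outer group of ((a)(?:b)(c)) still contains b, and only code outside the regex can join a and c.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonCapturingDemo {

    static final Pattern P = Pattern.compile("((a)(?:b)(c))");

    // The outer group spans from a to c, so it still contains b.
    static String outer(String s) {
        Matcher m = P.matcher(s);
        return m.matches() ? m.group(1) : null;
    }

    // Joining a and c requires post-processing outside the regex.
    static String innerJoined(String s) {
        Matcher m = P.matcher(s);
        return m.matches() ? m.group(2) + m.group(3) : null;
    }

    public static void main(String[] args) {
        System.out.println(outer("abc"));       // abc
        System.out.println(innerJoined("abc")); // ac
        // (?:b) creates no group, so there are exactly three groups.
        System.out.println(P.matcher("abc").groupCount()); // 3
    }
}
```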

M. Jessup
Hi, I need the result to be the first and third groups concatenated, not in separate groups. In the app I have, I can specify only two groups: one for the key and one for the value.
rojanu
Then I will change my answer to "not possible" (see the edit).
M. Jessup
A: 

The specification isn't very clear, but here's what I'm going to assume:

  • # at the beginning of the line is a comment
  • The "key" has two or three parts, separated by a literal .
    • The middle part is an optional "garbage"
  • The "key" is followed by =, then the "value"
  • . and = are special delimiters in the "key" part; in the "value" part anything goes
  • Whitespace is allowed around each part

Then perhaps a pattern like this works:

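    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class KeyValueDemo {
        public static void main(String[] args) {
            String text =
                "  some.stuff.here  =    blah blah  \n" +
                "  awesome.key  =    { level = 10 }  \n" +
                "# awesome.key  =    { level = 11 }  \n" +   // comment line, skipped
                "  awesome..key =    { level = 12 }  \n" +   // empty middle part
                "  !@#$.)(*&.$%& =   a=b=c.d=f ";

            // Build the regex from a readable template:
            // "@" -> literal dot, " " -> optional whitespace,
            // "key" -> one key part (no '.', '=' or whitespace), "value" -> lazy rest of line
            Pattern p = Pattern.compile(
                "(?m)^(?!#) (key)@(?:key@)?(key) = (value) $"
                    .replace("@", "\\.")
                    .replace(" ", "\\s*")
                    .replace("key", "[^.=\\s]*")
                    .replace("value", ".*?")
            );

            Matcher m = p.matcher(text);
            while (m.find()) {
                // group 1 = first key part, group 2 = last key part, group 3 = value
                System.out.printf("%s.%s => [%s]%n",
                    m.group(1),
                    m.group(2),
                    m.group(3)
                );
            }
        }
    }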

This prints:

some.here => [blah blah]
awesome.key => [{ level = 10 }]
awesome.key => [{ level = 12 }]
!@#$.$%& => [a=b=c.d=f]

Note the use of replace to generate the final regex pattern; it keeps the overall "big picture" pattern readable.
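If you want to see (or copy into a config file) the regex that the template expands to, a small sketch like this prints it; the class and method names are ours. String.replace performs literal, non-regex replacement, so the backslashes in the replacement strings are kept as-is. The expanded pattern is the same one shown in the updated question, where it is written with doubled backslashes.

```java
import java.util.regex.Pattern;

public class ReadablePattern {

    // Same template and replacements as in the answer's code.
    static String build() {
        return "(?m)^(?!#) (key)@(?:key@)?(key) = (value) $"
            .replace("@", "\\.")
            .replace(" ", "\\s*")
            .replace("key", "[^.=\\s]*")
            .replace("value", ".*?");
    }

    public static void main(String[] args) {
        String regex = build();
        Pattern.compile(regex); // throws PatternSyntaxException if the result is malformed
        System.out.println(regex);
    }
}
```

This prints (?m)^(?!#)\s*([^.=\s]*)\.(?:[^.=\s]*\.)?([^.=\s]*)\s*=\s*(.*?)\s*$.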

polygenelubricants
I am sorry, I should have been more precise. I am updating the question.
rojanu