ansaurus

Question

RegEx to match a pattern and exclude a part of the string

Answer 1

+1 A:

Two approaches. First, assuming your property is always three items long, simply replace your first (\S*) with:

(\S+?)\.\S+?\.(\S+)

Note that I also changed the * to +, since it doesn't make sense to have ".." as part of a property. I've also used non-greedy quantifiers, but it should still work fine without them. You can then use the appropriate group numbers to reconstruct the adjusted property.

The second approach assumes that your random string is a hex number (which it appears to be) and that the non-random portions of the property do not include numbers:

((?:\S+\.)*)(?:[0-9A-Fa-f]+\.)?((?:\S+\.?)+)

So the first capturing group should pick up everything before the random number (including a trailing dot), the non-capturing group will eat the random number, and the second capturing group will match the remaining string (or the whole thing if there is no random number portion).
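As a minimal sketch of the first approach, the two captured groups can be glued back together in code. The class name, method name, and the sample key "app.4f3a9c.name" are made up for illustration; only the pattern itself comes from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreePartKey {
    // Pattern from the first approach: first part, discarded middle part, last part.
    private static final Pattern KEY = Pattern.compile("(\\S+?)\\.\\S+?\\.(\\S+)");

    // Returns the key with its middle part removed, or the key unchanged
    // when it does not have at least three dot-separated parts.
    static String stripMiddle(String key) {
        Matcher m = KEY.matcher(key);
        if (!m.matches()) {
            return key;
        }
        return m.group(1) + "." + m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(stripMiddle("app.4f3a9c.name")); // prints "app.name"
        System.out.println(stripMiddle("awesome.key"));     // prints "awesome.key"
    }
}
```

Note that the concatenation happens outside the regex, so this only helps where the surrounding program can do that extra step.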

EDIT

With the updated description of the problem, where only two groups can be matched, my answer is that this is not possible. A regular expression has no mechanism to "erase" part of a match. From the problem definition, the part of the key that is to be excluded sits in the middle of other text, i.e. the general pattern to match is:

((a)(?:b)(c))

Since we cannot pre- or post-process, "b" will always be part of the larger group that includes both a and c; the fact that it sits in a non-capturing group does not affect the enclosing group.
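A small sketch that illustrates the point, using the literal pattern above against the input "abc". The class and helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonCapturingDemo {
    // Returns group 0..n of a full match, or an empty array if there is no match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] g = groups("((a)(?:b)(c))", "abc");
        System.out.println(g[1]); // prints "abc": the outer group still contains "b"
        System.out.println(g[2]); // prints "a"
        System.out.println(g[3]); // prints "c"
    }
}
```

The non-capturing group only means that "b" gets no group number of its own; the text it consumed is still part of every group that encloses it.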

M. Jessup 2010-06-17 11:35:22

Hi, I need the result to be the first and third groups concatenated, not in different groups. In the app I'm using I can specify only two groups, one for the key and one for the value.

rojanu 2010-06-18 14:20:00

Then I will change my answer to "not possible" (see edit)

M. Jessup 2010-06-21 12:01:54

Answer 2

A:

The specification isn't very clear, but here's what I'm going to assume:

- # at the beginning of the line is a comment
- The "key" can have up to 3 parts, separated by a literal .
  - The middle part is optional "garbage"
- The "key" is followed by =, then the "value"
- . and = are special markers at least until the "value" part, where anything goes
- Whitespace is allowed

Then perhaps a pattern like this works:

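    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class KeyValueDemo {
        public static void main(String[] args) {
            String text =
                "  some.stuff.here  =    blah blah  \n" +
                "  awesome.key  =    { level = 10 }  \n" +
                "# awesome.key  =    { level = 11 }  \n" +
                "  awesome..key =    { level = 12 }  \n" +
                "  !@#$.)(*&.$%& =   a=b=c.d=f ";

            // Readable template; each placeholder is expanded into real regex syntax.
            Pattern p = Pattern.compile(
                "(?m)^(?!#) (key)@(?:key@)?(key) = (value) $"
                    .replace("@", "\\.")            // literal dot separator
                    .replace(" ", "\\s*")           // optional whitespace
                    .replace("key", "[^.=\\s]*")    // key part: no dot, =, or whitespace
                    .replace("value", ".*?")        // value: rest of the line, trimmed
            );

            Matcher m = p.matcher(text);
            while (m.find()) {
                // Groups 1 and 2 are the outer key parts; the middle part is not captured.
                System.out.printf("%s.%s => [%s]%n",
                    m.group(1),
                    m.group(2),
                    m.group(3)
                );
            }
        }
    }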

This prints:

some.here => [blah blah]
awesome.key => [{ level = 10 }]
awesome.key => [{ level = 12 }]
!@#$.$%& => [a=b=c.d=f]

Note the use of replace to generate the final regex: the placeholder template keeps the big-picture shape of the pattern readable, and each placeholder is then expanded into the actual regex syntax.
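To see what the chain of replace calls actually produces, here is a small sketch that builds the same string and prints it instead of compiling it. The class and method names are made up for illustration.

```java
class PatternTemplateDemo {
    // Expands the readable template into the final regex source string.
    // String.replace works on literal text, so the order of the calls matters:
    // "@" and " " are expanded before "key" introduces "\\s" into the string.
    static String build() {
        return "(?m)^(?!#) (key)@(?:key@)?(key) = (value) $"
            .replace("@", "\\.")
            .replace(" ", "\\s*")
            .replace("key", "[^.=\\s]*")
            .replace("value", ".*?");
    }

    public static void main(String[] args) {
        // prints (?m)^(?!#)\s*([^.=\s]*)\.(?:[^.=\s]*\.)?([^.=\s]*)\s*=\s*(.*?)\s*$
        System.out.println(build());
    }
}
```

The expanded form is what Pattern.compile receives; comparing it with the template shows why the template is easier to read and edit.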

polygenelubricants 2010-06-17 12:48:27

I am sorry, I should have been more precise. I am updating the question

rojanu 2010-06-18 14:41:24
