Hello everyone, I have this string:

<meis xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" uri="localhost/naro-nei" onded="flpSW531213" identi="lemenia" id="75" lastStop="bendi" xsi:noNamespaceSchemaLocation="http://localhost/xsd/postat.xsd xsd/postat.xsd">

How can I get the value of the lastStop property in Java?

The regex in the code below worked when I tested it on http://www.myregexp.com/

But when I try it in Java, I don't see the matched text. Here is what I tried:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class SimpleRegexTest {
    public static void main(String[] args) {
        String sampleText = "<meis xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" uri=\"localhost/naro-nei\" onded=\"flpSW531213\" identi=\"lemenia\" id=\"75\" lastStop=\"bendi\" xsi:noNamespaceSchemaLocation=\"http://localhost/xsd/postat.xsd xsd/postat.xsd\">";
        String sampleRegex = "(?<=lastStop=[\"']?)[^\"']*";
        Pattern p = Pattern.compile(sampleRegex);
        Matcher m = p.matcher(sampleText);
        if (m.find()) {
            String matchedText = m.group();
            System.out.println("matched [" + matchedText + "]");
        } else {
            System.out.println("didn’t match");
        }
    }
}

Maybe the problem is that I use escape characters in my test, while the real string doesn't contain any escapes?

UPDATE

Does anyone know why this doesn't work in Java, or how to make it work?

+3  A: 
(?<=lastStop=[\"']?)[^\"]+
Hun1Ahpu
@Hun1Ahpu thanks, I just tried it at http://www.myregexp.com/ and it's not working.
Gandalf StormCrow
Try again without named grouping.
Hun1Ahpu
@Hun1Ahpu how would I do that?
Gandalf StormCrow
I've changed the regex.
Hun1Ahpu
Thank you, that worked.
Gandalf StormCrow
For `lastStop=123 id="1"`, it will capture `123 id=`, not to mention `uri="localhost/naro-nei?lastStop=4"`. I'm sure Java has better XML abilities.
Kobi
One more thing: it works on this regex website, but in Java I get a syntax error with `String regex = "(?<=lastStop=["']?)[^"']*";`. How do I correct that?
Gandalf StormCrow
@Kobi this is a string, not XML; in this case, parsing it as XML is just overhead.
Gandalf StormCrow
You need to escape the quotes: `String regex = "(?<=lastStop=[\"']?)[^\"']*";`
Kobi
@Kobi thank you, it compiles now.
Gandalf StormCrow
@Hun1Ahpu I updated my question
Gandalf StormCrow
How do I match an expression containing \"something\"?
Gandalf StormCrow
Changed the regex again. Try it.
Hun1Ahpu
This is it, thank you.
Gandalf StormCrow
+2  A: 

The reason it doesn't work as you expect is because of the * in [^\"']*. The lookbehind is matching at the position before the " in lastStop=", which is permitted because the quote is optional: [\"']?. The next part is supposed to match zero or more non-quote characters, but because the next character is a quote, it matches zero characters.
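A minimal sketch of that behaviour, assuming a shortened input; the class and method names (EmptyMatchDemo, describeFirstMatch) are made up for illustration. It reports the first match together with its start and end offsets, so you can see that find() does succeed, but on an empty string sitting just before the opening quote:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyMatchDemo {
    // Returns "[matched text] start-end" for the first match, or "no match".
    // Hypothetical helper, only here to make the match position visible.
    static String describeFirstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        return "[" + m.group() + "] " + m.start() + "-" + m.end();
    }

    public static void main(String[] args) {
        // Shortened stand-in for the original tag (an assumption for brevity).
        String input = "lastStop=\"bendi\"";
        // The pattern from the question: optional quote inside the lookbehind,
        // followed by ZERO or more non-quote characters.
        String regex = "(?<=lastStop=[\"']?)[^\"']*";
        // The match starts and ends at the same offset, i.e. it is empty.
        System.out.println(describeFirstMatch(regex, input));
    }
}
```

Equal start and end offsets mean the match is zero-length, which is why the original program prints "matched []" rather than "didn't match".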

If you change that * to a +, the second part will fail to match at that position, forcing the regex engine to bump ahead one more position. The lookbehind will match the quote, and [^\"']+ will match what follows. However, you really shouldn't be using a lookbehind for this in the first place. It's much easier to just match the whole sequence in the normal way and extract the part you want to keep via a capturing group:

String sampleRegex = "lastStop=[\"']?([^\"']*)";
Pattern p = Pattern.compile(sampleRegex);
Matcher m = p.matcher(sampleText);
if (m.find()) {
    String matchedText = m.group(1);
    System.out.println("matched [" + matchedText + "]");
} else {
    System.out.println("didn’t match");
}

It will also make it easier to deal with the problem @Kobi mentioned. You're trying to allow for values contained in double-quotes, single-quotes or no quotes, but your regex is too simplistic. For one thing, a quoted value can contain whitespace, but an unquoted one can't. To deal with all three possibilities, you'll need two or three capturing groups, not just one.
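One possible shape for such a regex, as a sketch rather than a definitive solution: one alternative, with its own capturing group, per quoting style. The class name, the helper method and the exact pattern are assumptions for illustration. The (?<!\S) lookbehind additionally requires lastStop to be preceded by whitespace or the start of the input, which skips cases like ?lastStop=4 inside another attribute's value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttributeValueSketch {
    // Hypothetical pattern, one group per quoting style:
    //   group 1: double-quoted value (may contain whitespace)
    //   group 2: single-quoted value (may contain whitespace)
    //   group 3: unquoted value (stops at whitespace, a quote, or '>')
    private static final Pattern LAST_STOP = Pattern.compile(
            "(?<!\\S)lastStop=(?:\"([^\"]*)\"|'([^']*)'|([^\\s\"'>]+))");

    // Returns the attribute value, or null if no lastStop attribute is found.
    static String lastStop(String input) {
        Matcher m = LAST_STOP.matcher(input);
        if (!m.find()) {
            return null;
        }
        // Exactly one of the three groups participated in the match.
        for (int i = 1; i <= m.groupCount(); i++) {
            if (m.group(i) != null) {
                return m.group(i);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(lastStop("<meis id=\"75\" lastStop=\"bendi\">"));
        System.out.println(lastStop("lastStop=123 id=\"1\""));
        System.out.println(lastStop("<meis uri=\"localhost/naro-nei?lastStop=4\">"));
    }
}
```

This still treats the tag as plain text, so a quoted value that itself contains " lastStop=..." after a space would fool it; a real XML parser remains the more robust option.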

Alan Moore
Your points might be valid, I really don't know, but your regex doesn't work; try it. I get `lastStop="bendi` instead of `bendi`.
Gandalf StormCrow
It works for me. Notice that I'm calling `m.group(1)`, not `m.group()`.
Alan Moore
What is the difference between these two calls?
Gandalf StormCrow
`group()` returns the whole match, while `group(1)` returns the contents of the first capturing group. In this case, there's only the one capturing group: `([^\"']*)`. ref: http://java.sun.com/javase/6/docs/api/java/util/regex/Matcher.html#group%28int%29, http://www.regular-expressions.info/brackets.html
Alan Moore
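To make the group() versus group(1) difference concrete, here is a small sketch using the capturing-group regex from the answer above on a shortened input. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Regex from the answer: match the whole attribute, capture only the value.
    private static final Pattern P = Pattern.compile("lastStop=[\"']?([^\"']*)");

    // Returns { whole match, contents of capturing group 1 }, or null if nothing matches.
    static String[] wholeAndGroupOne(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return null;
        }
        // group() is shorthand for group(0): everything the regex consumed.
        return new String[] { m.group(), m.group(1) };
    }

    public static void main(String[] args) {
        String[] r = wholeAndGroupOne("id=\"75\" lastStop=\"bendi\">");
        System.out.println("group()  = " + r[0]);
        System.out.println("group(1) = " + r[1]);
    }
}
```

The whole match includes the attribute name and the opening quote, which is the `lastStop="bendi` result reported in the comment above; only group(1) isolates `bendi`.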