I'm building a small Java library which has to match units in strings. For example, if I have "300000000 m/s^2", I want it to match against "m" and "s^2".

So far, I have tried every configuration I could think of along the lines of the following (I hope it's a good start):

"[[a-zA-Z]+[\\^[\\-]?[0-9]+]?]+"

To clarify, I need something that will match letters[^[-]numbers] (where [ ] denotes non obligatory parts). That means: letters, possibly followed by an exponent which is possibly negative.

I have studied regex a little bit, but I'm really not fluent, so any help will be greatly appreciated!

Thank you very much,

EDIT: I have just tried the first 3 replies

String regex1 = "([a-zA-Z]+)(?:\\^(-?\\d+))?";
String regex2 = "[a-zA-Z]+(\\^-?[0-9]+)?";
String regex3 = "[a-zA-Z]+(?:\\^-?[0-9]+)?";

and it doesn't work... I know the code which tests the patterns works, because if I try something simple, like matching "[0-9]+" in "12345", it matches the whole string. So I don't get what's still wrong. At the moment I'm trying to change my brackets to parentheses where needed...

CODE USED TO TEST:

public static void main(String[] args) {
    String input = "30000 m/s^2";

//    String input = "35345";

    String regex1 = "([a-zA-Z]+)(?:\\^(-?\\d+))?";
    String regex2 = "[a-zA-Z]+(\\^-?[0-9]+)?";
    String regex3 = "[a-zA-Z]+(?:\\^-?[0-9]+)?";
    String regex10 = "[0-9]+";
    String regex = "([a-zA-Z]+)(?:\\^\\-?[0-9]+)?";
    Pattern pattern = Pattern.compile(regex3);
    Matcher matcher = pattern.matcher(input);

    if (matcher.matches()) {
        System.out.println("MATCHES");
        do {
            int start = matcher.start();
            int end = matcher.end();
//            System.out.println(start + " " + end);
            System.out.println(input.substring(start, end));
        } while (matcher.find());
    }

}
+2  A: 
([a-zA-Z]+)(?:\^(-?\d+))?

You don't need to use the character class [...] if you're matching a single character. Here (...) is a capturing group that lets you extract the unit and exponent later, and (?:...) is a non-capturing group.
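
As a minimal sketch of how those capturing groups could be read back in Java: the class and method names below (UnitFinder, findUnits) are made up for illustration. It assumes the input is scanned with Matcher.find(), because Matcher.matches() only succeeds when the entire input matches the pattern, which an input like "30000 m/s^2" never will.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
public class UnitFinder {
    // Group 1: the unit letters; group 2: the optional, possibly negative exponent.
    private static final Pattern UNIT = Pattern.compile("([a-zA-Z]+)(?:\\^(-?\\d+))?");

    // Returns each unit found in the input as "unit" or "unit^exponent".
    public static List<String> findUnits(String input) {
        List<String> units = new ArrayList<>();
        Matcher matcher = UNIT.matcher(input);
        // find() looks for the next match anywhere in the input;
        // matches() would require the whole input to match the pattern.
        while (matcher.find()) {
            String unit = matcher.group(1);
            String exponent = matcher.group(2); // null when no exponent is present
            units.add(exponent == null ? unit : unit + "^" + exponent);
        }
        return units;
    }

    public static void main(String[] args) {
        System.out.println(findUnits("30000 m/s^2")); // prints [m, s^2]
    }
}
```

Keeping the exponent in its own group means it can be parsed separately (for example with Integer.parseInt) without splitting the matched text again.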

KennyTM
+1  A: 

You're mixing up square brackets, which denote character classes, with parentheses, which group. Try this instead:

[a-zA-Z]+(\^-?[0-9]+)?

In many regular expression dialects you can use \d to mean any digit instead of [0-9].
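
To illustrate in Java specifically, here is a small sketch (DigitClassDemo and allMatches are made-up names) that runs both spellings over the same input. Note that inside a Java string literal each backslash has to be doubled, so \d is written "\\d" and \^ is written "\\^".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
public class DigitClassDemo {
    // Collects every substring of the input that the regex matches.
    public static List<String> allMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "30000 m/s^-2";
        // Both patterns should report the same matches for this input.
        System.out.println(allMatches("[a-zA-Z]+(\\^-?[0-9]+)?", input));
        System.out.println(allMatches("[a-zA-Z]+(\\^-?\\d+)?", input));
    }
}
```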

Mark Byers
Thank you very much for your help and patience!
M. Joanis
A: 

Try

"[a-zA-Z]+(?:\\^-?[0-9]+)?"
S.Mark
You missed the `-`.
Mark Byers
Thanks, I have added it. Actually, leaving it out was intentional, because I don't see a - in the string to match.
S.Mark