I recently made what I believed to be an error in a Java regular expression, but when I test my code I don't get the error I expect.

The expression I created was meant to replace a password in a string that I received from another source. The pattern I used was along the lines of "password: [^\\s.]*", the idea being that it would match the word "password", a colon, a space, then any characters except whitespace or a full stop (period). I would then replace the match with "password: XXXXXX" and thereby mask it.

The obvious error should be that I forgot to escape the full stop. In other words, the proper expression should have been "password: [^\\s\\.]*". The thing is, if I don't escape the full stop, the code still works!

Here's some sample code:

import java.util.regex.*;

public class SimpleRegexTest {

    public static void main(String[] args) {
        Pattern simplePattern = Pattern.compile("password: [^\\s.]*");
        Matcher simpleMatcher = simplePattern.matcher("password: newpass. Enjoy.");
        String maskedString = simpleMatcher.replaceAll("password: XXXXXX");
        System.out.println(maskedString);
    }

}

When I run the above code I get the following output:

password: XXXXXX. Enjoy.

Is this a special case, or have I completely missed something?

(edit: changed to "escape the full-stop")

Michael Borgwardt: I couldn't think of another term to describe what I was doing apart from "negation group", sorry for the ambiguity.

Aviator: In this case, no, a space won't be in the password. I didn't make the rules ;-).

(edit: doubled up the slashes in the non-code text so it displays properly, added the ^ which was in the code, but not the text :-/)

Sundar: Fixed the double slashes, SO seems to have its own escape characters.

+10  A: 

A period ('.' character) does not need to be escaped inside a character class [] in a regular expression.

From the API:

Note that a different set of metacharacters are in effect inside a character class than outside a character class. For instance, the regular expression . loses its special meaning inside a character class, while the expression - becomes a range forming metacharacter.
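A minimal sketch of what the quoted paragraph means in practice, reusing the input string from the question. The class name CharClassDotDemo and the helper mask are made up for illustration; only Pattern.compile, matcher and replaceAll come from java.util.regex.

```java
import java.util.regex.Pattern;

public class CharClassDotDemo {

    // Hypothetical helper: replaces every match of regex in input with the masked text.
    static String mask(String regex, String input) {
        return Pattern.compile(regex).matcher(input).replaceAll("password: XXXXXX");
    }

    public static void main(String[] args) {
        String input = "password: newpass. Enjoy.";

        // Unescaped dot inside the character class: treated as a literal full stop.
        System.out.println(mask("password: [^\\s.]*", input));   // password: XXXXXX. Enjoy.

        // Escaped dot inside the character class: same meaning, same result.
        System.out.println(mask("password: [^\\s\\.]*", input)); // password: XXXXXX. Enjoy.

        // Outside a character class the unescaped dot matches any character,
        // so the rest of the line is swallowed.
        System.out.println(mask("password: .*", input));         // password: XXXXXX
    }
}
```

The first two calls print the same line, which is why the unescaped version in the question appears to "still work": inside the brackets the escape is simply redundant.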

Avi
That's true - but it's not his original problem, since he did originally try it without escaping. +1 for the reference anyway, though. :)
Andrzej Doyle
@dtsazza: I think that _is_ his original problem - the question is about why an unescaped period works, and Avi gave the answer for it.
sundar
This was the answer I was looking for. I've looked at the regex page in the Java API docs many times and still never noticed the paragraph Avi posted. Thanks, Avi.
Kurley
A: 

It looks like you got the negation operator mixed up for regex ranges.

In particular, my understanding is that you used the snippet [\s.]* to mean "any characters except for a space or a full-stop (period)." This would in fact be expressed as [^ .]*, using the caret to negate the characters in the set.

I don't know if this was just a typo in your post or what was actually in your code, but the regex as it stands in your question will match the word "password", a colon, a space, then any sequence of whitespace characters or periods.
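As a rough illustration of what the caret changes, the sketch below compares the class with and without negation, written as Java string literals (so the regex \s appears as "\\s"). The class name NegatedClassDemo and the helper matchesAll are hypothetical; Pattern.matches is the standard whole-input match.

```java
import java.util.regex.Pattern;

public class NegatedClassDemo {

    // Hypothetical helper: true if the entire text matches the regex.
    static boolean matchesAll(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        // Without the caret, the class accepts only whitespace and full stops.
        System.out.println(matchesAll("[\\s.]*", " . ."));      // true
        System.out.println(matchesAll("[\\s.]*", "newpass"));   // false

        // With the caret, it accepts everything except whitespace and full stops.
        System.out.println(matchesAll("[^\\s.]*", "newpass"));  // true
        System.out.println(matchesAll("[^\\s.]*", "new pass")); // false
    }
}
```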

Andrzej Doyle