I'm learning to use Java's Pattern and Matcher, and this is an example from my book. It works as the author describes, but what I don't get is why \\. ends up matching just a dot instead of a backslash (the \\ part) followed by a dot (the . part). Does the compiler not read from left to right?

import java.util.regex.*;

public class SplitTest {
    public static void main(String[] args) {
        String input = "www.cs.cornell.edu";

        // "\\." is the regex \. which matches a literal dot
        Pattern p = Pattern.compile("\\.");
        String[] pieces = p.split(input);
        for (int i = 0; i < pieces.length; i++) {
            System.out.println(pieces[i]);
        }
    }
}
+6  A: 

It's being interpreted twice: once by the Java compiler when it parses the string literal, and once by the regex compiler.

"\\." -> \. (string literal: the escape \\ becomes a single backslash)
\. -> literal . (regex compiler: the backslash escapes the dot)

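A minimal sketch of the two passes, for anyone who wants to see it run. The class name EscapeLevels and the helper splitOnDot are made up for illustration; only Pattern.compile and split come from the question.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class (not from the book): shows what the regex engine receives.
public class EscapeLevels {
    // The literal "\\." reaches the regex engine as the two characters \.
    // which the regex compiler reads as "a literal dot".
    static String[] splitOnDot(String input) {
        return Pattern.compile("\\.").split(input);
    }

    public static void main(String[] args) {
        String regex = "\\.";
        System.out.println(regex);          // prints \.
        System.out.println(regex.length()); // prints 2
        System.out.println(Arrays.toString(splitOnDot("www.cs.cornell.edu")));
        // prints [www, cs, cornell, edu]
    }
}
```

Printing the string first is a quick way to check what the regex compiler will actually be handed.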
Matthew Flaschen
Oh. Duh! *facepalm* Thanks!
Anita
This is the reason Java needs regex literals like other languages have, where you would not use quotes but instead write `/\./`.
mathepic
Java needs an @-string like the .NET languages have. It wouldn't break the annotation syntax.
SHiNKiROU
+4  A: 

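The backslash gets escaped twice. Java interprets the string literal "\\." as the two characters \., and that is what the regex engine receives. If you really wanted the regex \\. (a literal backslash followed by any character), you would have to double-escape it in the string literal: "\\\\.". Try System.out.println("\\."); what you see printed is what the regexp gets.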

EDIT: Your input string is "www.cs.cornell.edu". What are you trying to match? If you are trying to split on the dot, the regex is \. and its Java literal is "\\.", as you typed.

If you are trying to match a BACKSLASH followed by a DOT, the regex is \\\. and its Java literal is "\\\\\\."
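A rough sketch of the backslash-then-dot case, in case it helps. The class name BackslashDot and the method containsBackslashDot are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: matches a literal backslash followed by a literal dot.
public class BackslashDot {
    // Java literal "\\\\\\." is the regex \\\.
    // \\ matches one backslash, \. matches one dot.
    static final Pattern BACKSLASH_DOT = Pattern.compile("\\\\\\.");

    static boolean containsBackslashDot(String s) {
        return BACKSLASH_DOT.matcher(s).find();
    }

    public static void main(String[] args) {
        String withBackslash = "a\\.b"; // the four characters a \ . b
        System.out.println(withBackslash);                       // prints a\.b
        System.out.println(containsBackslashDot(withBackslash)); // prints true
        System.out.println(containsBackslashDot("a.b"));         // prints false
    }
}
```

Each level of escaping doubles the backslashes, which is why three regex backslashes turn into six in the Java source.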

SHiNKiROU
Sorry I was unclear. I *was* trying to split by the dot. Or rather, the author was. And the code does it correctly.
Anita
And people wonder why I refuse to use Java for regex...
TheLQ
A: 

Your code can be simplified a bit, like so:

public class SplitTest {
    public static void main(String[] args) {
        String input = "www.cs.cornell.edu";
        String[] pieces = input.split("\\.");
        for (String piece : pieces) {
            System.out.println(piece);
        }
    }
}

The "double backslash, period" works just as expected in this case, but the formatting on Stack Overflow requires typing "quadruple backslash, period", which is kind of odd.

Mondain