I need to write a regular expression that finds JavaScript files whose paths match

<anypath><slash>js<slash><anything>.js

For example, it should work for both:

  • c:\mysite\js\common.js (Windows)
  • /var/www/mysite/js/common.js (UNIX)

The problem is that the Windows file separator (a backslash) is not being escaped properly:

pattern = Pattern.compile(
     "^(.+?)" + 
     File.separator +
     "js" +
     File.separator +
     "(.+?).js$" );

On Windows this throws

java.util.regex.PatternSyntaxException: Illegal/unsupported escape sequence

Is there any way to use a common regular expression that works on both Windows and UNIX systems?

+2  A: 

Does Pattern.quote(File.separator) do the trick?

EDIT: Pattern.quote() is available in Java 1.5 and later. For 1.4, simply escape the file separator char yourself:

"\\" + File.separator

Escaping punctuation characters will not break anything, but escaping letters or numbers unconditionally will either change them to their special meaning or lead to a PatternSyntaxException. (Thanks Alan M for pointing this out in the comments!)
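
A minimal sketch of the Pattern.quote() approach, for anyone on Java 1.5 or later. The class name QuoteSeparatorDemo and the helper jsPattern are made up for illustration, and the separator is passed in as a parameter (instead of reading File.separator directly) only so both cases can be tried on one machine. The dot before "js" is escaped here as well, which the question's pattern did not do.

```java
import java.util.regex.Pattern;

class QuoteSeparatorDemo {
    // Hypothetical helper: builds the pattern for a given separator.
    // In real code you would pass File.separator.
    static Pattern jsPattern(String separator) {
        // Pattern.quote wraps the separator so it is matched literally.
        String sep = Pattern.quote(separator);
        return Pattern.compile("^(.+?)" + sep + "js" + sep + "(.+?)\\.js$");
    }

    public static void main(String[] args) {
        // Windows-style path with a backslash separator
        System.out.println(jsPattern("\\").matcher("c:\\mysite\\js\\common.js").matches());
        // Unix-style path with a forward slash separator
        System.out.println(jsPattern("/").matcher("/var/www/mysite/js/common.js").matches());
    }
}
```

Both calls should report a match. Note that each pattern only accepts its own separator, so this is still a per-platform pattern, not one pattern for both.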

Tomalak
Great, what a pity it is only available since Java 1.5+ (I still need it to work in 1.4)
Guido
+2  A: 

Can't you just use a backslash to escape the path separator like so:

pattern = Pattern.compile(
     "^(.+?)\\" + 
     File.separator +
     "js\\" +
     File.separator +
     "(.+?)\\.js$" );
Peter van der Heijden
+1  A: 

Why don't you escape File.separator:

... +
"\\" + File.separator +
...

to fit Pattern.compile requirements? I hope "\/" (the Unix case) is processed as a single "/".

gimel
A: 

I've tested gimel's answer on a Unix system - putting "\\" + File.separator works fine - the resulting "\/" in the pattern correctly matches a single "/"
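
Here is a rough sketch of that check, which should also run on Java 1.4. The names EscapedSeparatorDemo, jsPattern and compiles are hypothetical, and the separator is a parameter only so that both platforms can be simulated. It also shows why the original pattern failed on Windows: the unescaped backslash ends up directly in front of "js", and "\j" is not a valid escape.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class EscapedSeparatorDemo {
    // Hypothetical helper: prefixes the separator with a backslash.
    // "\/" matches "/" and "\\" matches a single backslash.
    static Pattern jsPattern(String separator) {
        String sep = "\\" + separator;
        return Pattern.compile("^(.+?)" + sep + "js" + sep + "(.+?)\\.js$");
    }

    // Returns false if the regex is rejected by Pattern.compile.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(jsPattern("/").matcher("/var/www/mysite/js/common.js").matches());
        System.out.println(jsPattern("\\").matcher("c:\\mysite\\js\\common.js").matches());
        // The question's pattern as built on Windows: the regex contains "\j"
        System.out.println(compiles("^(.+?)" + "\\" + "js" + "\\" + "(.+?).js$"));
    }
}
```

The first two lines should print true and the last one false.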

Alnitak
+1  A: 

Is there any way to use a common regular expression that works on both Windows and UNIX systems?

Yes, just use a regex that matches both kinds of separator.

pattern = Pattern.compile(
    "^(.+?)" + 
    "[/\\\\]" +
    "js" +
    "[/\\\\]" +
    "(.+?)\\.js$" );

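It's safe on Windows, which permits neither character in a file or directory name. Unix forbids only "/" in names; a backslash is legal there but rare, so a name containing one could be misread as a separator.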
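
A quick way to try this out, as a sketch only: the class name EitherSeparatorDemo and the helper baseName are invented for the example. It uses the same pattern as above, joined into one string literal, and returns the second capture group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EitherSeparatorDemo {
    // Same regex as above; [/\\] accepts either a slash or a backslash.
    static final Pattern JS_FILE =
        Pattern.compile("^(.+?)[/\\\\]js[/\\\\](.+?)\\.js$");

    // Hypothetical helper: returns the file name without ".js",
    // or null if the path is not a .js file inside a "js" directory.
    static String baseName(String path) {
        Matcher m = JS_FILE.matcher(path);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(baseName("c:\\mysite\\js\\common.js"));
        System.out.println(baseName("/var/www/mysite/js/common.js"));
        System.out.println(baseName("/var/www/mysite/css/site.css"));
    }
}
```

The first two paths should both yield "common" and the third null, with no dependence on File.separator or on the platform the code runs on.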

Alan Moore