The most direct translation would be:
Pattern p = Pattern.compile(
"\\w+://([\\x21-\\x22\\x24-\\x2E\\x30-\\x3A\\x40-\\x5A\\x5F\\x61-\\x7A]+)(/?\\S*)",
Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Java has no equivalent for C#'s verbatim strings, so you always have to escape backslashes. Java's regexes also didn't support named groups before Java 7, so I converted those to simple capturing groups.
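For reference, on Java 7 or later the groups could keep their names. This is only a minimal sketch: the group names `host` and `path`, the class name, and the sample URL are hypothetical, since the names from the original C# regex aren't shown here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupSketch {
    // "host" and "path" are hypothetical names chosen for illustration;
    // the (?<name>...) syntax requires Java 7 or later.
    static final Pattern URL = Pattern.compile(
        "\\w+://(?<host>[\\x21-\\x22\\x24-\\x2E\\x30-\\x3A\\x40-\\x5A\\x5F\\x61-\\x7A]+)"
        + "(?<path>/?\\S*)");

    // Returns {host, path}, or null if no URL-like text is found.
    static String[] hostAndPath(String input) {
        Matcher m = URL.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group("host"), m.group("path") };
    }

    public static void main(String[] args) {
        String[] parts = hostAndPath("http://example.com/index.html");
        System.out.println(parts[0] + " | " + parts[1]); // example.com | /index.html
    }
}
```

The numbered groups in the translation above still work on every Java version, so named groups are a readability choice rather than a requirement.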
But there are a few problems with the original regex:
The RegexOptions.Compiled modifier doesn't do what you probably think it does. Specifically, it's not related to Java's compile() method; that's just a factory method, roughly equivalent to C#'s new Regex() constructor. The Compiled modifier causes the regex to be compiled to CIL bytecode, which can make it match a lot faster, but at a considerable cost in upfront processing and memory use, and that memory never gets garbage-collected. If you don't use the regex a lot, the Compiled option is probably doing more harm than good, performance-wise.
The IgnoreCase/CASE_INSENSITIVE modifier is pointless since your regex always matches both upper- and lowercase variants wherever it matches letters.
The Singleline/DOTALL modifier is pointless since you never use the dot metacharacter.
In .NET regexes, the character-class shorthand \w is Unicode-aware, equivalent to [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]. In Java it's ASCII-only by default, [A-Za-z0-9_], which seems to be more in line with the way you're using it (you could "dumb it down" in .NET by using the RegexOptions.ECMAScript modifier).
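A couple of the points above are easy to spot-check from the Java side. The sketch below is only illustrative: the class name, helper names, and sample inputs are made up for the example, and a single sample URL shows the flags making no difference for that input rather than proving it in general.

```java
import java.util.regex.Pattern;

class RegexClaimsSketch {
    // The directly translated regex, with the flags passed in separately.
    static final String REGEX =
        "\\w+://([\\x21-\\x22\\x24-\\x2E\\x30-\\x3A\\x40-\\x5A\\x5F\\x61-\\x7A]+)(/?\\S*)";

    // Pattern.compile() is just a factory; flags are an optional second argument.
    static boolean matchesWithFlags(String input, int flags) {
        return Pattern.compile(REGEX, flags).matcher(input).matches();
    }

    // Tests whether a one-character string is matched by \w under the given flags.
    static boolean isWordChar(String s, int flags) {
        return Pattern.compile("\\w", flags).matcher(s).matches();
    }

    public static void main(String[] args) {
        String url = "HTTP://Example.COM/Path";
        int flags = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;

        // Same result with and without the two flags for this mixed-case input.
        System.out.println(matchesWithFlags(url, 0));     // true
        System.out.println(matchesWithFlags(url, flags)); // true

        // U+00E9 (e with acute accent) is not a word character by default...
        System.out.println(isWordChar("\u00e9", 0));      // false
        // ...but is one with the Java 7+ UNICODE_CHARACTER_CLASS flag.
        System.out.println(isWordChar("\u00e9", Pattern.UNICODE_CHARACTER_CLASS)); // true
    }
}
```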
So the actual translation would be more like this:
Pattern p = Pattern.compile("\\w+://([\\w!\"$-.:@]+)(?:/(\\S*))?");
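Here is a small sketch for sanity-checking the rewritten pattern. It assumes the shorthand class is written with the range `$-.` (0x24 to 0x2E) so that it accepts exactly the same characters as the hex ranges in the direct translation; the class name, method names, and sample URLs are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlPatternSketch {
    // Host character class spelled with hex ranges, as in the direct translation.
    static final Pattern HEX_CLASS = Pattern.compile(
        "[\\x21-\\x22\\x24-\\x2E\\x30-\\x3A\\x40-\\x5A\\x5F\\x61-\\x7A]");

    // Shorthand version; "$-." is a range covering 0x24 through 0x2E.
    static final Pattern SHORT_CLASS = Pattern.compile("[\\w!\"$-.:@]");

    static final Pattern URL = Pattern.compile("\\w+://([\\w!\"$-.:@]+)(?:/(\\S*))?");

    // True if both classes accept exactly the same ASCII characters.
    static boolean classesAgreeOnAscii() {
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            if (HEX_CLASS.matcher(s).matches() != SHORT_CLASS.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    // Returns {host, path}; path is null when nothing follows the host.
    static String[] split(String input) {
        Matcher m = URL.matcher(input);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        System.out.println(classesAgreeOnAscii()); // true

        String[] a = split("http://my-site.example.com/a/b?x=1");
        System.out.println(a[0] + " | " + a[1]); // my-site.example.com | a/b?x=1

        String[] b = split("ftp://example.org");
        System.out.println(b[0] + " | " + b[1]); // example.org | null
    }
}
```

Note that the second group no longer includes the leading slash, and it is null rather than empty when the URL has no path at all, so callers need a null check.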