After doing some research, I understand the issue now: Perl had to use different symbols for pattern backreferences and replacement backreferences, and while java.util.regex.* doesn't have to follow suit, it chooses to, for traditional rather than technical reasons.
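To make the two notations concrete on the Java side, here is a minimal sketch that uses both in a single call. The class and method names (BackrefDemo, collapseDoubles) are made up for illustration; only String.replaceAll is real API.

```java
class BackrefDemo {
    // Hypothetical helper: collapses any doubled word character into one.
    static String collapseDoubles(String input) {
        // Pattern side: \1 (written "\\1" in a Java literal) refers back to group 1.
        // Replacement side: $1 refers to the same captured group.
        return input.replaceAll("(\\w)\\1", "$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseDoubles("aabbcd")); // prints "abcd"
    }
}
```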
On the Perl side
(Please keep in mind that all I know about Perl at this point comes from reading Wikipedia articles, so feel free to correct any mistakes I may have made)
The reason why it had to be done this way in Perl is the following:
- Perl uses $ as a sigil (i.e. a symbol attached to a variable name).
- Perl double-quoted string literals (and regex patterns) are variable-interpolated.
- Perl regex actually captures groups as the variables $1, $2, etc.
Thus, because of the way Perl is interpreted and how its regex engine works, backreferences in the pattern must be written with a preceding backslash (e.g. \1), because if the sigil $ were used instead (e.g. $1), it would cause unintended variable interpolation into the pattern.
The replacement string, due to how it works in Perl, is evaluated within the context of every match. It is most natural for Perl to use variable interpolation here, so the regex engine captures groups into the variables $1, $2, etc., to make this work seamlessly with the rest of the language.
On the Java side
Java is a very different language from Perl; most importantly here, it has no variable interpolation. Moreover, replaceAll is a method call, and as with all method calls in Java, the arguments are evaluated once, before the method is invoked.
Thus, even a variable interpolation feature would not be enough by itself, since in essence the replacement string must be re-evaluated on every match, and that's just not the semantics of method calls in Java. A variable-interpolated replacement string that is evaluated before replaceAll is even invoked is practically useless; the interpolation needs to happen during the method, on every match.
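A small sketch of what "evaluated once" means in practice: the replacement argument below is computed to an ordinary four-character string before replaceAll starts, and it is replaceAll itself that gives $2 and $1 their meaning on each match. The names EvalOnceDemo and swapPairs are hypothetical.

```java
class EvalOnceDemo {
    // Hypothetical helper: swaps each letter/digit pair, e.g. "a1" becomes "1a".
    static String swapPairs(String input) {
        // This expression is evaluated exactly once, before the call,
        // and yields the plain characters $2$1. Java attaches no meaning to '$'.
        String replacement = "$" + 2 + "$" + 1;
        // replaceAll re-reads that same string on every match and
        // substitutes the groups captured by that particular match.
        return input.replaceAll("([a-z])([0-9])", replacement);
    }

    public static void main(String[] args) {
        System.out.println(swapPairs("a1 b2 c3")); // prints "1a 2b 3c"
    }
}
```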
Since that is not the semantics of the Java language, replaceAll must do this "just-in-time" interpolation manually. As such, there is absolutely no technical reason why $ is the escape symbol for backreferences in replacement strings. It could very well have been \ instead. Conversely, backreferences in the pattern could also have been escaped with $ instead of \, and it would still have worked just as well technically.
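To illustrate that the choice of symbol is arbitrary, here is a rough sketch of a replaceAll-like helper that does the per-match interpolation by hand and uses \ instead of $ in the replacement. This is not how java.util.regex is implemented; replaceAllBackslash is a hypothetical method, and it assumes single-digit group numbers and no escaping rules, purely to keep the example short.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackslashTemplateDemo {
    // Hypothetical helper: like replaceAll, but the template refers to
    // groups as \1 .. \9 rather than $1 .. $9 (single digit only).
    static String replaceAllBackslash(String input, String regex, String template) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous match and this one.
            out.append(input, last, m.start());
            // "Just-in-time" interpolation: expand the template for this match.
            for (int i = 0; i < template.length(); i++) {
                char c = template.charAt(i);
                boolean isGroupRef = c == '\\'
                        && i + 1 < template.length()
                        && template.charAt(i + 1) >= '0'
                        && template.charAt(i + 1) <= '9';
                if (isGroupRef) {
                    String captured = m.group(template.charAt(i + 1) - '0');
                    if (captured != null) {
                        out.append(captured);
                    }
                    i++; // skip the digit
                } else {
                    out.append(c);
                }
            }
            last = m.end();
        }
        // Copy whatever follows the final match.
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // "\\2\\1" in Java source is the template \2\1
        System.out.println(replaceAllBackslash("a1 b2 c3", "([a-z])([0-9])", "\\2\\1")); // prints "1a 2b 3c"
    }
}
```

The result is the same as calling replaceAll with "$2$1", which is the point: the symbol is a convention of the method, not something the language forces.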
The reason Java does regex the way it does is purely traditional: it's simply following the precedent set by Perl.