We do a lot of lexical processing with arbitrary strings which include arbitrary punctuation. I am divided as to whether to use magic characters/strings or symbolic constants.
The examples should be read as language-independent although most are Java.
There are clear examples where punctuation has a semantic role and should be identified as a constant: `File.separator`, not `"/"` or `"\\"` (a no-brainer, as it is OS-dependent), and I write `XML_PREFIX_SEPARATOR = ":";`.
However, let's say I need to replace all occurrences of `""` (two adjacent double-quote characters) with an empty string. I can write:
s = s.replaceAll("\"\"", "");
or
s = s.replaceAll(S_QUOT+S_QUOT, S_EMPTY);
(I have defined all common punctuation as S_FOO (string) and C_FOO (char))
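To make the comparison concrete, here is a minimal sketch of the two styles side by side. The class name `Punct` and the method names are made up for illustration; only `S_QUOT` and `S_EMPTY` follow the naming scheme described above, and their definitions here are my assumption of what such constants would look like.

```java
final class Punct {
    // Hypothetical definitions following the S_FOO naming scheme.
    static final String S_QUOT = "\"";
    static final String S_EMPTY = "";

    // Magic-string style: the escaped literal appears inline.
    static String stripDoubledQuotesLiteral(String s) {
        return s.replaceAll("\"\"", "");
    }

    // Symbolic-constant style: same behaviour, no escaping at the call site.
    static String stripDoubledQuotesSymbolic(String s) {
        return s.replaceAll(S_QUOT + S_QUOT, S_EMPTY);
    }

    public static void main(String[] args) {
        String input = "say \"\"hello\"\"";
        System.out.println(stripDoubledQuotesLiteral(input));
        System.out.println(stripDoubledQuotesSymbolic(input));
    }
}
```

Note that `replaceAll` treats its first argument as a regex. That is harmless here because `"` is not a regex metacharacter, but for purely literal replacement `String.replace(CharSequence, CharSequence)` avoids the question in either style.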
In favour of magic strings/characters:
- It's shorter
- It's natural to read (sometimes)
- The named constants may not be familiar (`C_APOS` vs `'\''`)
In favour of constants:
- It's harder to make typos (e.g. contrast `"''" + '"'` with `S_APOS+S_APOS + C_QUOT`)
- It removes escaping problems. Should a regex be `"\\s+"` or `"\s+"` or `"\\\\s+"`?
- It's easy to search the code for punctuation
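The escaping question can be illustrated with a small Java sketch. The class and method names are hypothetical; the point is only that the number of backslashes in the source literal changes what the regex engine sees.

```java
import java.util.regex.Pattern;

final class RegexEscapes {
    // Java source "\\s+" is the three-character regex \s+ :
    // one or more whitespace characters.
    static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Java source "\\\\s+" is the four-character regex \\s+ :
    // a literal backslash followed by one or more 's' characters.
    static final Pattern BACKSLASH_THEN_S = Pattern.compile("\\\\s+");

    static boolean isWhitespaceRun(String s) {
        return WHITESPACE.matcher(s).matches();
    }

    static boolean isBackslashThenS(String s) {
        return BACKSLASH_THEN_S.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isWhitespaceRun(" \t "));
        System.out.println(isBackslashThenS("\\ss"));
    }
}
```

So in Java the first form is the one that means "whitespace", and the doubled form silently matches something quite different, which is exactly the kind of mistake a named constant would hide from the call site.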
(There is a limit to this - I would not write regexes this way even though regex syntax is one of the most cognitively dysfunctional parts of all programming. I think we need a better syntax.)