We do a lot of lexical processing with arbitrary strings which include arbitrary punctuation. I am divided as to whether to use magic characters/strings or symbolic constants.

The examples should be read as language-independent although most are Java.

There are clear examples where punctuation has a semantic role and should be identified as a constant:

File.separator not "/" or "\\"; // a no-brainer as it is OS-dependent

and I write XML_PREFIX_SEPARATOR = ":";

However, let's say I need to replace all occurrences of "" (two consecutive double-quote characters) with an empty string. I can write:

s = s.replaceAll("\"\"", "");

or

s = s.replaceAll(S_QUOT+S_QUOT, S_EMPTY);

(I have defined all common punctuation as S_FOO (string) and C_FOO (char))
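To make the comparison concrete, here is a minimal sketch of both styles side by side. The constant names follow the S_FOO convention described above, but their definitions are assumed, and String.replace is used instead of replaceAll because the pattern here is a plain literal rather than a regex.

```java
class QuoteStripper {
    // Hypothetical definitions following the S_FOO naming convention.
    static final String S_QUOT = "\"";
    static final String S_EMPTY = "";

    // Magic-string style: the escaped quotes appear inline.
    static String withLiterals(String s) {
        return s.replace("\"\"", "");
    }

    // Symbolic-constant style: no escaping at the call site.
    static String withConstants(String s) {
        return s.replace(S_QUOT + S_QUOT, S_EMPTY);
    }

    public static void main(String[] args) {
        String input = "say \"\"hello\"\" now";
        System.out.println(withLiterals(input));
        System.out.println(withConstants(input));
    }
}
```

Both methods do the same work; the only difference is what the reader has to decode at the call site.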

In favour of magic strings/characters:

  1. It's shorter
  2. It's natural to read (sometimes)
  3. The named constants may not be familiar (C_APOS vs '\'')

In favour of constants:

  1. It's harder to make typos (e.g. contrast "''" + '"' with S_APOS+S_APOS + C_QUOT)
  2. It removes escaping problems. Should a regex be "\\s+" or "\s+" or "\\\\s+"?
  3. It's easy to search the code for punctuation

(There is a limit to this - I would not write regexes this way even though regex syntax is one of the most cognitively dysfunctional parts of all programming. I think we need a better syntax.)
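As an illustration of the escaping question in the list above, the following sketch shows how the Java literals map onto regexes, and one way to keep punctuation literal without hand-escaping it. The class and method names are made up for this example.

```java
import java.util.regex.Pattern;

class EscapingDemo {
    // The Java literal "\\s+" is the three-character regex \s+
    // (one or more whitespace characters).
    static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // The Java literal "\\\\s+" is the regex \\s+ : a literal backslash
    // followed by one or more 's' characters.
    static final Pattern BACKSLASH_S = Pattern.compile("\\\\s+");

    static String collapseWhitespace(String s) {
        return WHITESPACE.matcher(s).replaceAll(" ");
    }

    // Pattern.quote makes the separator match literally, so punctuation
    // such as "." or "|" needs no manual escaping.
    static String[] splitOnLiteral(String s, String separator) {
        return s.split(Pattern.quote(separator));
    }
}
```

Naming the compiled pattern once, as with WHITESPACE here, is a middle ground: the regex itself stays as a literal, but it is escaped in only one place.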

+1  A: 

If the definitions may change over time or between installations, I tend to put these things in a config file, and pick up the information at startup or on-demand (depending on the situation). Then provide a static class with read-only interface and clear names on the properties for exposing the information to the system.

Usage could look like this:

s = s.replaceAll(CharConfig.Quotation + CharConfig.Quotation, CharConfig.EmptyString);
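One possible shape for such a class in Java, as a rough sketch only: the property keys, the defaults, and the load method are assumptions made for illustration, and accessor methods stand in for the properties used above.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

final class CharConfig {
    // Backing store; values loaded from configuration override the defaults.
    private static final Properties PROPS = new Properties();

    private CharConfig() {
        // Static access only.
    }

    // Call at startup (or on demand) with the contents of a config file.
    static void load(Reader source) throws IOException {
        PROPS.load(source);
    }

    // Read-only accessors with clear names; the keys are hypothetical.
    static String quotation() {
        return PROPS.getProperty("quotation", "\"");
    }

    static String emptyString() {
        return PROPS.getProperty("emptyString", "");
    }
}
```

A call site would then read s = s.replace(CharConfig.quotation() + CharConfig.quotation(), CharConfig.emptyString());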
Fredrik Mörk
I am pleased to see that I am not alone in advocating this level of verbosity on occasion
peter.murray.rust
I tend to get known in teams that I work with for using long names on things...
Fredrik Mörk
+1  A: 

For general string processing, I wouldn't use special symbols. A space is always going to be a space, and it's just more natural to read (and write!):

s.replace("String", " ");

than:

s.replace("String", S_SPACE);

I would take special care to use things like "\t" to represent tabs, for example, since they can't easily be distinguished from spaces in a string.

As for things like XML_PREFIX_SEPARATOR or FILE_SEPARATOR, you should probably never have to deal with constants like that, since you should use a library to do the work for you. For example, you shouldn't be hand-writing: dir + FILE_SEPARATOR + filename, but rather be calling: file_system_library.join(dir, filename) (or whatever equivalent you're using).

This way, you'll not only have an answer for things like the constants, you'll actually get much better handling of various edge cases which you probably aren't thinking about right now.
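In Java, for instance, the standard java.nio.file API can play the role of the file_system_library mentioned above. A minimal sketch, with a made-up helper name:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class PathJoinDemo {
    // Joins the two parts using the platform's separator, so the calling
    // code never mentions "/" or "\\" itself.
    static Path join(String dir, String filename) {
        return Paths.get(dir, filename);
    }

    public static void main(String[] args) {
        System.out.println(join("data", "report.txt"));
    }
}
```

The output depends on the platform, which is the point: the separator is chosen by the library rather than by the calling code.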

Edan Maor
I completely agree about libraries. I use them and I write them. There are, however, occasions when I need access to the primitives.
peter.murray.rust
@peter.murray.rust: Yeah, definitely there are many cases when you *do* need the primitives (for example, when writing a library!). But I've noticed that many times when people ask about primitives, the answer is a higher level of abstraction, so I like to offer it as another option.
Edan Maor