I want to know the proper term for the character that marks the start of a literal during lexing.

For example:

  • a string starts and ends with a " character.
  • a regular expression literal starts and ends with a / character.
+2  A: 

I've always called them delimiters. That's as close to a "terminology name" as I can think of.

Frédéric Hamidi
For "simply quoted" things, delimiter might be ok, and so might "introductory quote". But literals can be introduced by all kinds of indicators: 0xDEADBEEF is a numeric literal with a leading hint, u"ABC" is often used for Unicode strings, and DEADBEEFh is a numeric literal with a *trailing* indicator. I don't think there's any particularly good name.
Ira Baxter
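To make the three examples in the comment above concrete, here is a minimal sketch of a classifier that recognizes a leading indicator, a trailing indicator, and plain quote delimiters. The token names (`HEX_PREFIXED`, `HEX_SUFFIXED`, and so on) and the exact patterns are assumptions made for illustration; no particular language's lexical grammar is implied.

```python
import re

# Hypothetical token kinds and patterns, chosen only to mirror the
# examples in the comment: 0xDEADBEEF, DEADBEEFh, u"ABC", and "ABC".
LITERAL_PATTERNS = [
    ("HEX_PREFIXED", re.compile(r"0[xX][0-9A-Fa-f]+")),   # leading indicator: 0x
    ("HEX_SUFFIXED", re.compile(r"[0-9A-Fa-f]+[hH]")),    # trailing indicator: h
    ("UNICODE_STRING", re.compile(r'u"[^"]*"')),          # leading indicator: u
    ("STRING", re.compile(r'"[^"]*"')),                   # plain quote delimiters
]

def classify_literal(text):
    """Return the name of the first pattern matching the whole text, or None."""
    for name, pattern in LITERAL_PATTERNS:
        if pattern.fullmatch(text):
            return name
    return None
```

Note that the trailing-indicator form cannot be recognized from its first character alone, which is part of why a name like "introductory quote" does not generalize.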
@Ira Baxter, you've used `hint` and `indicator` in your comment, and those are good names too :)
Frédéric Hamidi
@Frederic: I've written *lots* of lexers (think nearly a hundred) for different programming languages. Ultimately what distinguishes tokens isn't the "leading characters" but simply that they have non-intersecting sets of syntax. And this goes back to abstract computer science language theory: what defines a "language" (for this discussion, a token) is abstractly just the complete set of strings that make up the language (token). All that matters to distinguish one token type from another is that the sets of abstract strings for each don't intersect. ....
Ira Baxter
@Frederic: continued... Most languages actually do allow such sets to intersect, and have a rule that when such an intersection takes place, the token is interpreted as being a member of just one set. The classic such rule is for identifiers, typically [A-Z0-9]+, which clearly overlaps with keywords IF, GOTO, ... The overlap rule is that if a token string can be interpreted as both an identifier and a keyword, it should be interpreted as a keyword. No "hint" or "indicator" in this case.
Ira Baxter
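
The overlap rule described in the last comment can be sketched as follows. This is only an illustration: the identifier pattern and the keywords IF and GOTO are taken from the comment, while the function name `classify_word` and the returned labels are made up for the example.

```python
import re

# Keywords named in the comment; a real language would have more.
KEYWORDS = {"IF", "GOTO"}

# Identifier pattern as quoted in the comment. Its string set overlaps
# with the keyword set, so a tie-breaking rule is needed.
IDENTIFIER = re.compile(r"[A-Z0-9]+")

def classify_word(text):
    """Classify text as KEYWORD or IDENTIFIER, preferring KEYWORD on overlap."""
    if not IDENTIFIER.fullmatch(text):
        return None  # not in either token's set of strings
    # Overlap rule: a string in both sets is treated as a keyword.
    return "KEYWORD" if text in KEYWORDS else "IDENTIFIER"
```

Here nothing in the first character of `IF` distinguishes it from an identifier such as `IFX`; the decision is made only after the whole token has been matched.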