How does a C/C++ compiler handle the escape character (`\`) in source code? How is the compiler's grammar written to process that character? What does the compiler do after encountering it?
Most compilers are divided into stages. The first stage of the front end is the lexical analyzer, or scanner. This part of the compiler reads the actual source characters and groups them into tokens. It has a state machine which decides, upon seeing a backslash, how to treat it in context: inside a string or character literal, the backslash and the character(s) after it form an escape sequence that stands for a single character (such as a tab or a newline). That character becomes part of the literal's token, which is passed to the next part of the compiler (the parser). This is how the state machine groups several source characters into one token.
An escape character together with the character that follows it (like `\n`) is a single character to the C compiler. The scanner presents it to the parser as part of a character or string token, so the parser needs no special syntax rules for the escape character.
It generally escapes the following character:
- In a string literal or character literal, it means escape the next character. For example, `\a` means 'alert' (flashing the terminal, beeping or whatever), `\n` means 'newline' (linefeed), and `\xNUM` means the character whose code is the hexadecimal number NUM.
- If it appears as the last character on a line, immediately before the newline, whether within a string or not (and even within a single-line `//` comment!), it acts as a line continuation: the following newline character is ignored, and the next line is merged with the current line.
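An interesting read on this subject is Ken Thompson's "Reflections on Trusting Trust" [PDF link], which uses the C compiler's handling of escape sequences as its example.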