Can anyone please explain the usage of the character constants \000 and \xhh, i.e. octal and hexadecimal numbers in a character constant?
In C, octals are mostly used for bit fiddling. Can you be a little more specific?
Octal is base 8 (using digits 0-7) so each digit is 3 bits:
\354 = 11 101 100
Hexadecimal is base 16 (using digits 0-9,A-F) and each digit is 4 bits:
\x23 = 0010 0011
Inside C strings (char arrays/pointers), they are generally used to encode bytes that can't be easily represented.
So, if you want a string which uses ASCII codes like STX and ETX, you can do:
const char *msg = "\x02Here's my message\x03";
In C, strings are terminated by a character with the value zero (0). This could be written like this:
char zero = 0;
but this doesn't work inside strings, where a plain 0 is just the digit character '0'. String literals have a special syntax in which the backslash introduces an escape sequence and is followed by various things.
One such sequence is "backslash zero", that simply means a character with the value zero. Thus, you can write things like this:
char hard[] = "this\0has embedded\0zero\0characters";
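Another sequence uses a backslash followed by the letter 'x' and one or more hexadecimal digits, to represent the character with the indicated code. There is no upper limit on the number of digits; the escape ends at the first character that is not a hex digit. Using this syntax, you could write the zero byte as '\x0' for instance.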
EDIT: Re-reading the question, there's also support for such constants in base eight, i.e. octal. They use a backslash followed by one to three octal digits; unlike octal integer constants, no leading zero is required. '\0' is in fact itself an octal escape, and '\00' and '\000' are synonyms for it.
This is sometimes useful when you need to construct a string containing non-printing characters, or special control characters.
There's also a set of one-character "named" special characters, such as '\n' for newline, '\t' for TAB, and so on.
Those would be used to write characters that are otherwise hard to enter in the editor. For standard chars, that would be the various control characters; for wchar_t, it could be characters not represented in the editor font.
For instance, this compiles in Visual Studio 2005:
const wchar_t bom = L'\xfeff'; /* Unicode byte-order mark, U+FEFF */
const wchar_t hamza = L'\x0621'; /* Arabic Letter Hamza */
const char start_of_text = '\002'; /* Start-of-text */
const char end_of_text = '\003'; /* End-of-text */
Edit: Using octal character literals has an interesting caveat. An octal escape can be at most three digits long, so its largest value is \777 (511 decimal), which artificially restricts the characters we can enter.
For instance:
/* Letter schwa; capital unicode code point 0x018f (octal 0617)
* small unicode code point 0x0259 (octal 1131)
*/
const wchar_t Schwa2 = L'\x18f'; /* capital letter Schwa, correct */
const wchar_t Schwa1 = L'\617'; /* capital letter Schwa, correct */
const wchar_t schwa1 = L'\x259'; /* small letter schwa, correct */
const wchar_t schwa2 = L'\1131'; /* parsed as \113 (letter K) followed by '1', incorrect */