I was trying to check whether a letter preceded by a \ forms an escape sequence in C. What would be the easiest way to check this?

I tried appending each ASCII character to "\", but it failed.

Edit: I don't want to append the characters manually. If I could somehow iterate over the ASCII values, append each one, and print the result to check, that would be great!

A: 

What about checking against all possible cases? For letters those are \a, \b, \f, \n, \r, \t, \v - not too many...
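
A minimal sketch of that check, assuming only the seven letter escapes listed above count. The function name is_standard_escape_letter is made up for illustration, and \x, \u and \U (which need digits after them) are left out on purpose:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if a backslash followed by `c` is one of the
   seven simple letter escapes of standard C (\a \b \f \n \r \t \v).
   Compiler extensions such as GCC's \e are deliberately not included. */
bool is_standard_escape_letter(char c)
{
    /* strchr also matches the terminating '\0', so reject it explicitly */
    return c != '\0' && strchr("abfnrtv", c) != NULL;
}
```

Calling it for every letter in a loop gives the iteration the question asks for, but the answer comes from the hard-coded list, not from the compiler.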

usta
Also `\'`, `\"`, `\?`, `\\`, `\xAB`, `\123`, `\uABCD`, `\U00012345`.
KennyTM
Yes, I could do that, or I could check the documentation. However, the challenge is to figure out the iterator.
kunjaan
@KennyTM: Yes, I mentioned only letters, as I think that's what kunjaan meant by a letter preceded by a \. I might have misinterpreted though.
usta
@kunjaan Then it's not possible unless you generate a separate C file for each escape-sequence candidate, run a compiler on it, and check the compilation result.
usta
+1  A: 

I think the OP may be confused and believe it's possible to programmatically generate these string escape sequences within a C program and have them be specially interpreted (perhaps by printf or by the language environment itself), e.g.

char str[3] = "\\";
str[1] = 'n';
printf(str);

This is not possible. All it will do is print the literal characters backslash and the letter "n". If you want to test whether an escape sequence is interpreted by your compiler, the only way to do this is to write out a .c file and run the compiler on it. However, the set of escape sequences is completely standardized, so there's no reason to test. Just read the language specification or your compiler's manual.
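
To make the difference concrete, here is a small sketch comparing a sequence built at run time with one written in a literal. Both helper names are hypothetical and exist only for this illustration:

```c
#include <string.h>

/* Hypothetical helper: builds backslash + letter at run time into buf
   (which must hold at least 3 chars) and returns the resulting length. */
size_t build_runtime_sequence(char *buf, char letter)
{
    buf[0] = '\\';    /* a single backslash character */
    buf[1] = letter;  /* stays an ordinary letter, nothing is translated */
    buf[2] = '\0';
    return strlen(buf);
}

/* Length of the literal "\n": the compiler has already replaced the
   escape sequence with one newline character before the program runs. */
size_t literal_newline_length(void)
{
    return strlen("\n");
}
```

The run-time string has length 2 (backslash and 'n'), while the literal has length 1, because escape sequences are translated by the compiler and not by printf or the C library.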

R..
A: 

Output of the script:

ascii letters allowed in escape sequences: a, b, e, f, n, r, t, u, v, x, E, U
Non-escape letters: A, B, C, D, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, V, W,
                       X, Y, Z, c, d, g, h, i, j, k, l, m, o, p, q, s, w, y, z

NOTE: '\U', '\x', '\u' by themselves do not form escape sequences; they must be followed by hex digits. \, ', ", ? and digits are not considered because they are not alphabetic. '\e' (and likewise '\E') is a non-standard GCC extension.

The sequences are produced by compiling C code that contains the string "\a\b...(for all ascii letters)...\z" and parsing compiler warnings:

#!/usr/bin/env python
from __future__ import print_function  # same print() on Python 2 and 3
import re, string, subprocess

def _find_non_escape_chars(compiler="cc -x c -".split(), verbose=False):
    # prepare C code to compile
    test_code = 'char *s = "%s";' % ''.join('\\'+c for c in string.ascii_letters)
    # compile it; stderr is merged into stdout so the warnings can be parsed,
    # and text mode lets us pass and receive str instead of bytes
    p = subprocess.Popen(compiler,
                         stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT,
                         universal_newlines=True)
    stdout, _ = p.communicate(test_code)
    if verbose:
        print(stdout)
    # find all non-escape characters (relies on the compiler quoting them as '\X')
    return set(re.findall(r"'\\(.)'", stdout))

# the default argument runs the compiler once, when the function is defined
def is_escape_char(c, non_escape=_find_non_escape_chars()):
    """Whether `c` letter may be present in an escape sequence in C.

    >>> f = is_escape_char
    >>> f("a")
    True
    >>> f("g")
    False
    """
    return c not in non_escape

def main():
    # build a list (not a lazy filter object) because it is used twice
    escape_chars = [c for c in string.ascii_letters if is_escape_char(c)]
    print("ascii letters allowed in escape sequences:", ', '.join(escape_chars))
    print("Non-escape letters:", ', '.join(
        sorted(set(string.ascii_letters)-set(escape_chars))))

if __name__=="__main__":
    import doctest; doctest.testmod()
    main()
J.F. Sebastian