Hi, I need a function like string ToLiteral(string input) from this post, such that

char *literal = to_literal("asdf\r\n");

Would yield literal ==> "asdf\\r\\n".

I've googled around but haven't been able to find anything (I guess I must be using the wrong search terms). However, I assume a library with this functionality must be out there somewhere...

Thank you for the interesting answers. By the way, googling "c string escape function" seems to be the key to finding even more examples, and GLib provides g_strescape(), which seems to be exactly what I need.

A: 

I think you are confusing the terminology. You can initialize a pointer to char like this:

char *literal = "asdf\r\n";

Amendment: however, C strings can contain escape sequences, for example:

char *literal = "\basdf\x1b\r\n\v\t";

That will print out

<backspace>asdf<escape-character><carriage-return><linefeed><vertical-tab><tab>

Depending on the console's capabilities, those characters may not be displayed as such; you may see an arrow for the escape and some spacing for the tab. You can get around this with some simple logic: for every \ encountered, insert another \ so that it displays

asdf\\r\\n

something like the following code should suffice:

/* Needs <stdio.h> for putchar */
void ToLiteral(const char *pStr){
    const char *p = pStr;
    while (*p){
       /* if (*p == '\\') putchar('\\');  */
       /* PERFORM THE LOOK UP */
       putchar(*p++);
    }
}

But looking at it, it did not feel right, because the string already holds the actual \n and \r characters rather than backslashes. It might be easier to use a look-up table that maps the hexadecimal code of each such character to its escape sequence and prints the appropriate text. The look-up table could be something like this:

struct LookUp{
    int codeLiteral;
    char *equivCodeLiteral;
};

struct LookUp look[] = { { 0xa, "\\n"}, { 0xd, "\\r" }, { 0x9, "\\t" } };
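
To make the look-up idea concrete, here is a minimal sketch of how the table might be wired into ToLiteral. The helper name find_escape and the extra backslash entry are my own assumptions, not part of the answer above, and the hexadecimal codes assume an ASCII character set.

```c
#include <stddef.h>
#include <stdio.h>

/* One table entry: a raw character code and the text to print in its place. */
struct LookUp {
    int codeLiteral;
    const char *equivCodeLiteral;
};

/* Codes are ASCII values (an assumption of this sketch). */
static const struct LookUp look[] = {
    { 0x0a, "\\n"  },  /* line feed */
    { 0x0d, "\\r"  },  /* carriage return */
    { 0x09, "\\t"  },  /* horizontal tab */
    { 0x5c, "\\\\" }   /* the backslash itself (added for this sketch) */
};

/* Hypothetical helper: returns the escaped text for c,
 * or NULL if c has no entry in the table. */
static const char *find_escape(int c)
{
    size_t i;
    for (i = 0; i < sizeof look / sizeof look[0]; i++) {
        if (look[i].codeLiteral == c)
            return look[i].equivCodeLiteral;
    }
    return NULL;
}

/* Prints pStr to stdout, replacing every character found in the table. */
void ToLiteral(const char *pStr)
{
    for (; *pStr; pStr++) {
        const char *esc = find_escape((unsigned char)*pStr);
        if (esc)
            fputs(esc, stdout);
        else
            putchar(*pStr);
    }
}
```

Calling ToLiteral("asdf\r\n") with this table should print the text asdf followed by a backslash, r, backslash, n. Characters without a table entry are printed unchanged.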
tommieb75
Looking at the referenced post, I do not think this answers his question. He wants to convert a string such as "asdf\r\n" into a string that would produce the output expected from the string "asdf\\r\\n"
kbrimington
Yes, that's true, but view the post he's referencing. He apparently wants to turn things like tabs and newlines into their respective escape sequences, something like this I suppose: "Hello\nworld" => "Hello\\nworld"
Brian
Caveat emptor: the code is not exactly foolproof, I'll admit. This was off the top of my head... so be careful! :)
tommieb75
+4  A: 

There's no built-in function for this, but you could whip one up:

/* Expands escape sequences within a C-string
 *
 * src must be a C-string with a NUL terminator
 *
 * dest should be long enough to store the resulting expanded
 * string. A string of size 2 * strlen(src) + 1 will always be sufficient
 *
 * NUL characters are not expanded to \0 (otherwise how would we know when
 * the input string ends?)
 */

void expand_escapes(char* dest, const char* src) 
{
  char c;

  while ((c = *(src++)) != '\0') {
    switch(c) {
      case '\a': 
        *(dest++) = '\\';
        *(dest++) = 'a';
        break;
      case '\b': 
        *(dest++) = '\\';
        *(dest++) = 'b';
        break;
      case '\t': 
        *(dest++) = '\\';
        *(dest++) = 't';
        break;
      case '\n': 
        *(dest++) = '\\';
        *(dest++) = 'n';
        break;
      case '\v': 
        *(dest++) = '\\';
        *(dest++) = 'v';
        break;
      case '\f': 
        *(dest++) = '\\';
        *(dest++) = 'f';
        break;
      case '\r': 
        *(dest++) = '\\';
        *(dest++) = 'r';
        break;
      case '\\': 
        *(dest++) = '\\';
        *(dest++) = '\\';
        break;
      case '\"': 
        *(dest++) = '\\';
        *(dest++) = '\"';
        break;
      default:
        *(dest++) = c;
     }
  }

  *dest = '\0'; /* Ensure nul terminator */
}

Note that I've left out translation of the "escape" character (ASCII 27), since C does not define a named escape sequence for it: \e is a compiler extension (GCC and Clang accept it), while the portable spellings are the numeric escapes \x1b or \033. You can add in whichever applies to you.

If you want a function that allocates your destination buffer for you:

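/* Needs <stdlib.h> for malloc and <string.h> for strlen.
 * Returned buffer may be up to twice as large as necessary.
 * Returns NULL if allocation fails; the caller must free() the result. */
char* expand_escapes_alloc(const char* src)
{
   char* dest = malloc(2 * strlen(src) + 1);
   if (dest != NULL)
      expand_escapes(dest, src);
   return dest;
}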
Tyler McHenry
All too easy for the caller to call it wrong. I would malloc the result and return it.
pm100
That's another way to write it, but nearly all of the C standard library string functions are written this way.
Tyler McHenry
It would probably be friendly for `expand_escapes()` to encode any other non-printing character as an octal escape. That handles the common ASCII ESC character as `\033`. Of course, your output buffer could be up to 4 times the size of the input, so it may make sense to allocate that much then realloc before returning. You could also handle embedded NUL characters with a length parameter, where a negative length would mean stop at the first NUL.
RBerteig
All of those are great ideas, but I'll leave some of it as an exercise to the OP. ;) (hint: you can determine if a character is printable or not using the `isprint` function in `ctype.h`)
Tyler McHenry
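For anyone who wants to see the octal fallback spelled out, here is a rough sketch that combines the two suggestions above (named escapes where they exist, isprint for everything else). The function name expand_escapes_octal is made up for illustration, and the code assumes 8-bit chars; dest must hold at least 4 * strlen(src) + 1 bytes.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical variant of expand_escapes(): characters with a named escape
 * are written as such, other non-printing characters as a three-digit octal
 * escape. Assumes 8-bit chars, so three octal digits always suffice. */
void expand_escapes_octal(char *dest, const char *src)
{
    /* These two strings must stay in matching order. */
    static const char raw[]   = "\a\b\t\n\v\f\r\\\"";
    static const char names[] = "abtnvfr\\\"";
    unsigned char c;

    while ((c = (unsigned char)*src++) != '\0') {
        const char *pos = strchr(raw, c);
        if (pos != NULL) {
            *dest++ = '\\';
            *dest++ = names[pos - raw];
        } else if (isprint(c)) {
            *dest++ = (char)c;
        } else {
            /* Always emit three digits so that a following digit in the
             * input cannot be absorbed into the escape. */
            *dest++ = '\\';
            *dest++ = (char)('0' + ((c >> 6) & 7));
            *dest++ = (char)('0' + ((c >> 3) & 7));
            *dest++ = (char)('0' + (c & 7));
        }
    }
    *dest = '\0';
}
```

With this, the ASCII ESC character (27) comes out as \033, as described above. Note that isprint depends on the current locale, so which bytes above 127 count as printable may vary.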
I'm right behind leaving the details as an exercise... I just thought they were worth mentioning as I've been annoyed by other string escaping logic that did the minimal amount of work and didn't actually give you something you could safely print on a terminal or paste into a C source file.
RBerteig
@Tyler: "C standard library string functions are written this way". To a point. The length of the result is semi-unpredictable (in the sense that it's deterministic, but if you can predict the length then you clearly know so much that you hardly need the function). As you indicate there's an easy upper bound, but double the input length feels like overkill. So I'd do it like `snprintf` - pass in a buffer and a length, return the number of bytes that would be written had the length been sufficient. Caller can still pass double-the-length-plus-1 if they can't be bothered calling it twice.
Steve Jessop
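A minimal sketch of the snprintf-like interface described in the comment above, in case it helps. The name expand_escapes_n and its exact signature are assumptions for illustration; only the named escapes from the answer are handled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical snprintf-style variant: writes at most size - 1 characters
 * plus a NUL terminator into dest, and returns the length the complete
 * output would have (not counting the terminator). With size == 0 nothing
 * is written and dest may be NULL. Like snprintf, truncation can cut the
 * output anywhere, even in the middle of an escape pair. */
size_t expand_escapes_n(char *dest, size_t size, const char *src)
{
    /* These two strings must stay in matching order. */
    static const char raw[]   = "\a\b\t\n\v\f\r\\\"";
    static const char names[] = "abtnvfr\\\"";
    size_t needed = 0;
    char c;

    while ((c = *src++) != '\0') {
        const char *pos = strchr(raw, c);
        char out[2];
        size_t n = 0, i;

        if (pos != NULL) {
            out[n++] = '\\';
            out[n++] = names[pos - raw];
        } else {
            out[n++] = c;
        }
        /* Store only what fits, but keep counting what is needed. */
        for (i = 0; i < n; i++, needed++) {
            if (needed + 1 < size)
                dest[needed] = out[i];
        }
    }
    if (size > 0)
        dest[needed < size ? needed : size - 1] = '\0';
    return needed;
}
```

A caller can ask for the required length first with expand_escapes_n(NULL, 0, src), allocate that many bytes plus one, and then call again, or simply pass a buffer of 2 * strlen(src) + 1 bytes and call it once.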
+1  A: 

I think I'd do the conversion something like this:

// warning: untested code. Needs <string.h> for strchr.
void make_literal(char const *input, char *output) { 
    // the following two arrays must be maintained in matching order:
    static const char inputs[] = "\a\b\f\n\r\t\v\\\"\'";
    static const char outputs[] = "abfnrtv\\\"\'";

    const char *pos;

    for (;*input;input++) {
        if (NULL!= (pos=strchr(inputs, *input))) {
            *output++ = '\\';
            *output++ = outputs[pos-inputs];
        }
        else
            *output++ = *input;
    }
    *output = '\0';
}

In theory, this could be a bit slower than (for one example) Tyler McHenry's code. In particular, his use of a switch statement allows (but doesn't require) constant time selection of the correct path. In reality, given the sparsity of the values involved, you probably won't get constant time selection, and the string involved is so short that the difference will normally be quite small in any case. In the other direction, I'd expect this to be easier to maintain (e.g., if you want to support more escape sequences, adding them should be pretty easy as long as the form remains constant).

Jerry Coffin
"the following two arrays must be maintained in matching order" - so initialize them with the `{}` syntax, and line up corresponding entries vertically ;-)
Steve Jessop
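Here is roughly what that suggestion might look like, as a sketch only. The arrays are moved to file scope and the lengths_match typedef is an extra compile-time check of my own; neither comes from the answer.

```c
#include <string.h>

/* Corresponding entries share a column. Both arrays end with an explicit
 * '\0' so that strchr() can scan inputs as a string. */
static const char inputs[]  = { '\a', '\b', '\f', '\n', '\r', '\t', '\v', '\\', '\"', '\'', '\0' };
static const char outputs[] = {  'a',  'b',  'f',  'n',  'r',  't',  'v', '\\', '\"', '\'', '\0' };

/* Compile-time check: the array size is negative, and compilation fails,
 * if the two tables ever differ in length. */
typedef char lengths_match[sizeof inputs == sizeof outputs ? 1 : -1];

/* output must have room for 2 * strlen(input) + 1 bytes. */
void make_literal(char const *input, char *output)
{
    const char *pos;

    for (; *input; input++) {
        if (NULL != (pos = strchr(inputs, *input))) {
            *output++ = '\\';
            *output++ = outputs[pos - inputs];
        } else {
            *output++ = *input;
        }
    }
    *output = '\0';
}
```

The vertical alignment makes a missing or misplaced entry easy to spot, and the typedef catches the case where one array gains an entry and the other does not.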
@Steve: that's certainly a reasonable possibility...
Jerry Coffin