Dear all;

I will be coaching an ACM Team next month (go figure), and the time has come to talk about strings in C. Besides a discussion on the standard lib, strcpy, strcmp, etc., I would like to give them some hints (something like str[0] is equivalent to *str, and things like that).

Do you know of any lists (like cheat sheets) or your own experience in the matter?

I'm already aware of the books for the ACM competition (which are good, see particularly this), but I'm after tricks of the trade.

Thank you.

Edit: Thank you very much everybody. I will accept the most voted answer, and have duly upvoted others which I think are relevant. I expect to do a summary here (like I did here, asap). I have enough material now and I'm certain this has improved the session on strings immensely. Once again, thanks.

+3  A: 

The following functions can be used to implement a non-mutating strtok:

strcspn(string, delimiters)
strspn(string, delimiters)

The first one returns the index of the first character that is in the set of delimiters you pass in. The second one returns the index of the first character that is not in that set.

I prefer these to strpbrk as they return the length of the string if they can't match.
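A minimal sketch of how these two calls could drive a non-mutating tokenizer. The helper name next_token and its signature are my own assumptions for illustration, not part of any library:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: finds the next token at or after *cursor without
 * modifying the string. Returns a pointer to the token start and stores
 * its length in *len, or returns NULL when no tokens remain. */
const char *next_token(const char **cursor, const char *delims, size_t *len)
{
    /* Skip any leading delimiters. */
    const char *start = *cursor + strspn(*cursor, delims);
    if (*start == '\0')
        return NULL;
    /* The token runs up to the next delimiter or the end of the string. */
    *len = strcspn(start, delims);
    *cursor = start + *len;
    return start;
}
```

Because the token is reported as a pointer plus a length, it is not null terminated; print it with something like printf("%.*s", (int)len, token).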

MSN
+23  A: 
kmm
+1 for the format string example.
ezpz
If you are really using printf and not a wrapper macro which does additional things, puts/fputs are the functions you are looking for.
jbcreix
very informative: +1
dfa
Is strlcpy() standard C? It's probably important to know that for a competition. If not, be prepared to write it. Also, strcpy etc. is safe if you can prove that the destination is sufficiently long.
David Thornley
I'm personally fond of using strncpy, followed by writing a NUL to the end of the destination array. That way I know it wasn't over-written, and I know it's terminated. Since strlcpy is not (to my knowledge) yet a standard, I don't like to rely on it when I'm bouncing between environments.
Michael Kohne
@David Thorley: strlcpy is indeed not standard and that idiot Drepper refuses to put it in glibc. But it turns out really great, because the strlcpy I wrote is faster than strcpy. I don't like strncpy because it overwrites the whole array, instead of just what size I give.
kmm
+3  A: 

str[0] is equivalent to 0[str], or more generally str[i] is i[str] and i[str] is *(str + i).

NB: this is not specific to strings; it also works for any C array.

dfa
I don't find this incredibly important, though.
GMan
yes, it isn't; ignore it :)
dfa
Things like `3["hello"]` being equivalent to `"hello"[3]`, while true, are really just quirky trivia that nobody ever uses.
Adam Rosenfield
It's all because addition is commutative. x[y] is *(x + y) and y[x] is *(y + x)
smcameron
+3  A: 

The strn* variants in stdlib do not necessarily null terminate the destination string.

As an example: from MSDN's documentation on strncpy:

The strncpy function copies the initial count characters of strSource to strDest and returns strDest. If count is less than or equal to the length of strSource, a null character is not appended automatically to the copied string. If count is greater than the length of strSource, the destination string is padded with null characters up to length count.
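One way to live with that behaviour is sketched below: copy at most size - 1 bytes and write the terminator yourself. The helper name copy_terminated is hypothetical, and it assumes the destination size is at least 1:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst (capacity dst_size, which must
 * be at least 1) and always terminates dst.
 * Returns 1 if src fit completely, 0 if it was truncated. */
int copy_terminated(char *dst, size_t dst_size, const char *src)
{
    strncpy(dst, src, dst_size - 1);  /* may leave dst without a '\0' */
    dst[dst_size - 1] = '\0';         /* so add one unconditionally */
    return strlen(src) < dst_size;
}
```

The return value is there so the caller can notice truncation instead of silently working with a shortened string.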

MSN
Wow... just... wow.
Dervin Thunk
Actually, it's not the complete strn* family, only strncpy. strncat got its own problems too though. Still, writing the null wouldn't necessarily make your program safer. What if you wanted to transfer the contents of the file /etc/passwd-archive/public-data, but your data gets truncated by strncpy to /etc/passwd?
kmm
Yes, the general problem of using strings safely in an unmanaged dynamic memory environment is a Master's thesis in and of itself. Assuming you still want to do it :)
MSN
+2  A: 

strtok is not thread safe, since it uses a mutable private buffer to store data between calls; you also cannot interleave or nest strtok calls.

A more useful alternative is strtok_r; use it whenever you can.
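A small sketch of nested tokenizing with strtok_r, which keeps its state in a caller-supplied pointer. Note that strtok_r is POSIX rather than ISO C, and the function name count_fields and the ';' and ',' separators are assumptions made up for this example:

```c
#define _POSIX_C_SOURCE 200809L  /* strtok_r is POSIX, not ISO C */
#include <string.h>

/* Counts the ','-separated fields across all ';'-separated records.
 * text is modified in place, just as strtok would modify it. */
int count_fields(char *text)
{
    int count = 0;
    char *outer_state;
    char *inner_state;

    for (char *rec = strtok_r(text, ";", &outer_state); rec != NULL;
         rec = strtok_r(NULL, ";", &outer_state)) {
        /* The inner loop has its own state, so it does not disturb the outer one. */
        for (char *field = strtok_r(rec, ",", &inner_state); field != NULL;
             field = strtok_r(NULL, ",", &inner_state))
            count++;
    }
    return count;
}
```

With plain strtok the inner loop would overwrite the hidden state of the outer loop, and the outer loop would stop after the first record.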

dfa
strtok is a function from hell. Tokenizing a string like "asdf",,,,"fdsa" with , as the delimiter gets you 2 results instead of 5.
EvilTeach
strtok_r() may not be available in the contest. However, avoid strtok() if you can.
David Thornley
A: 

You could mention indexed addressing.

An element's address is the base address + index * sizeof(element).

SP
You should clarify: in C arrays and pointers, `* sizeof(element)` is done for you by the compiler, and the generated assembly will reflect the `sizeof(element)` factor. But what does this have to do with strings? `sizeof(char) == 1`
Chris Lutz
Just because the compiler does it for you and the size of a char happens to be one doesn't mean the implementation isn't important.
SP
`sizeof(char)` doesn't _happen_ to be 1 - it's specified in the standard.
Chris Lutz
You're right, nobody should ever know this information because characters are one byte.
SP
I'm not saying it doesn't matter, I'm saying it has nothing to do with strings.
Chris Lutz
A: 

A common error is:

char *p;
snprintf(p, 3, "%d", 42);

It works as long as you write at most sizeof(p) bytes; after that, funny things happen (welcome to the jungle).

Explanation

With char *p you are allocating space to hold a pointer (sizeof(void*) bytes) on the stack. The right thing here is to allocate a buffer, for example one whose size is fixed at compile time:

char buf[12];
char *p = buf;
snprintf(p, sizeof(buf), "%d", 42);
dfa
Your first example should never work, even if you use less than `sizeof(*p)` bytes, because `snprintf` won't copy a string into the pointer, but the memory that the pointer _points to_. A `char *p` is not the same as a `char p[]`. In your second example, `*p` is superfluous, as `buf` could be passed to `snprintf` directly to make the code clearer.
Chris Lutz
The former example works, try it with a compiler :) In the latter I know that `*p` is superfluous, but it serves the purpose of showing "how to allocate memory".
dfa
It works because `*p`, upon declaration, holds a random value, and therefore points to a random segment of memory that may happen to be writable, thus giving you the illusion that it works when writing small amounts of text to it, and thus why it breaks when you try to write too much.
Chris Lutz
Also, just tried it on my compiler. First example: `Bus error`. (GCC 4.0, OS X Leopard)
Chris Lutz
You can try it with an older compiler on an older UNIX; Mac OS X probably randomizes segments to minimize bad things like buffer overflows, etc.
dfa
Things like "works on your machine" or "you can make it work on an older UNIX" do NOT mean it's correct. Your code is undefined behavior according to the C standard, which means it might work, it might crash, or it might erase your hard drive. That's what undefined behavior is.
Adam Rosenfield
+2  A: 

A common mistake is to confuse strlen() with sizeof() when using a string:

char *p = "hello!!";
strlen(p) != sizeof(p)

sizeof(p) yields, at compile time, the size of the pointer (4 or 8 bytes), whereas strlen(p) counts, at runtime, the length of the null-terminated char array (7 in this example).
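To make the difference concrete, here is a short sketch. The names LITERAL_LEN and storage_size are made up for this example:

```c
#include <stddef.h>
#include <string.h>

/* Length of a string literal or char array, computed at compile time.
 * Only valid when s is a real array, not a pointer. */
#define LITERAL_LEN(s) (sizeof(s) - 1)

/* Bytes needed to store a copy of s, including the terminating '\0'.
 * Inside this function sizeof(s) would only be the size of a pointer. */
size_t storage_size(const char *s)
{
    return strlen(s) + 1;
}
```

For char arr[] = "hello!!", sizeof(arr) is 8 (seven characters plus the terminator), while strlen(arr) is 7.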

dfa
+1  A: 

I have found that the char buff[0] technique has been incredibly useful. Consider:

struct foo {
   int x;
   char * payload;
};

vs

struct foo {
   int x;
   char payload[0];
};

See http://stackoverflow.com/questions/295027 for implications and variations.
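A sketch of how such a struct might be allocated. Note that char payload[0] is a compiler extension; this example swaps in the C99 flexible array member spelling, char payload[], instead. The helper foo_new is hypothetical:

```c
#include <stdlib.h>
#include <string.h>

struct foo {
    int x;
    char payload[];   /* C99 flexible array member; must be the last member */
};

/* Hypothetical helper: allocates a struct foo with a copy of text stored
 * inline, right after the fixed part. Returns NULL if malloc fails. */
struct foo *foo_new(int x, const char *text)
{
    size_t n = strlen(text) + 1;            /* include the '\0' */
    struct foo *f = malloc(sizeof *f + n);  /* header + payload in one block */
    if (f == NULL)
        return NULL;
    f->x = x;
    memcpy(f->payload, text, n);
    return f;
}
```

The whole object lives in one allocation, so a single free(f) releases both the header and the string.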

ezpz
A: 

I'd point out the performance pitfalls of over-reliance on the built-in string functions.

#include <stdlib.h>
#include <string.h>

char* triple(const char* src)
{
   size_t n = strlen(src);
   char* dest = malloc(n*3+1);
   strcpy(dest, src);
   strcat(dest, src);   /* each strcat rescans dest to find its end */
   strcat(dest, src);
   return dest;
}
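The cost here is that every strcat has to walk dest from the start to find the terminator, even though the length is already known. A possible alternative, sketched under the name triple_fast (my own name), reuses the length with memcpy:

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated string holding src three times, or NULL if
 * malloc fails. The caller is responsible for freeing the result. */
char *triple_fast(const char *src)
{
    size_t n = strlen(src);          /* measured once */
    char *dest = malloc(n * 3 + 1);
    if (dest == NULL)
        return NULL;
    memcpy(dest, src, n);
    memcpy(dest + n, src, n);
    memcpy(dest + 2 * n, src, n + 1);  /* the last copy brings the '\0' along */
    return dest;
}
```

For short strings the difference is unlikely to matter; it starts to show when the pieces are long or the concatenation happens in a loop.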
AShelly
...along with the pitfalls of premature optimization ? :-)
Andrew Y
+1  A: 

I would discuss when and when not to use strcpy and strncpy and what can go wrong:

char *strncpy(char* destination, const char* source, size_t n);

char *strcpy(char* destination, const char* source );

I would also mention the return values of the ANSI C stdlib string functions. For example, ask "does this if statement pass or fail?"

if (stricmp("StrInG 1", "string 1")==0)
{
    .
    .
    .
}
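The point of the quiz is that the comparison functions return 0 when the strings match, so the body runs only for equal strings. Since a case-insensitive compare is not in the standard library, here is a sketch of a portable stand-in. The name equal_ignore_case is hypothetical, and unlike the str*cmp family it returns 1 for a match:

```c
#include <ctype.h>

/* Hypothetical portable stand-in for a case-insensitive compare:
 * returns 1 if a and b are equal ignoring case, 0 otherwise. */
int equal_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        /* Cast to unsigned char: tolower is undefined for negative values. */
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings ended together */
}
```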
bn
`stricmp()` is not an ANSI C standard function, it's an extension provided by MS VC++ and perhaps some other implementations. In GCC, the function is called `strcasecmp()` (probably the one time I'll actually side with Microsoft on something), but is still not standard.
Chris Lutz
+1  A: 

Perhaps you could illustrate the value of the sentinel '\0' with the following example:

const char *a = "hello \0 world";
char b[100];
strcpy(b, a);        /* stops at the first '\0' */
printf("%s", b);     /* prints "hello " */

I once had my fingers burnt when in my zeal I used strcpy() to copy binary data. It worked most of the time but failed mysteriously sometimes. Mystery was revealed when I realized that binary input sometimes contained a zero byte and strcpy() would terminate there.
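For binary data the usual approach is to carry the length explicitly and copy with memcpy, which does not care about zero bytes. A minimal sketch, where struct blob and blob_copy are names invented for this example and the 64-byte capacity is arbitrary:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record type: binary data carries an explicit length,
 * because a zero byte is ordinary data here, not a terminator. */
struct blob {
    unsigned char bytes[64];
    size_t len;              /* number of valid bytes, at most 64 */
};

/* Copies src into dst; memcpy copies len bytes whatever they contain. */
void blob_copy(struct blob *dst, const struct blob *src)
{
    memcpy(dst->bytes, src->bytes, src->len);
    dst->len = src->len;
}
```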

Rohin
Did your fingers heal?`</snark>`
Chris Lutz
Oh that was long ago .. since then I have even grown new ones ;-)
Rohin
+5  A: 

Abusing strlen() will dramatically worsen the performance.

for( int i = 0; i < strlen( string ); i++ ) {
    processChar( string[i] );
}

will typically have O(n^2) time complexity, because strlen() walks the whole string on every iteration, whereas

int length = strlen( string );
for( int i = 0; i < length; i++ ) {
    processChar( string[i] );
}

will have O(n) time complexity. This is not so obvious to people who haven't taken the time to think about it.
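Another option is to skip strlen() entirely and stop at the terminator. A small sketch, with count_char as a made-up example function:

```c
#include <stddef.h>

/* Counts occurrences of ch in s in a single pass, testing for '\0'
 * instead of computing the length first. */
size_t count_char(const char *s, char ch)
{
    size_t count = 0;
    for (const char *p = s; *p != '\0'; p++) {
        if (*p == ch)
            count++;
    }
    return count;
}
```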

sharptooth
A: 

Pointers and arrays, while having similar syntax, are not at all the same. Given:

char a[100]; char *p = a;

For the array a, there is no pointer stored anywhere. sizeof(a) != sizeof(p): for the array it is the size of the whole block of memory, for the pointer it is the size of the pointer. This becomes important if you use something like sizeof(a)/sizeof(a[0]). Also, you can't ++a, and you can make the pointer a 'const' pointer to 'const' chars, but the array can only be an array of 'const' chars, in which case you have to initialize it at its definition. Etc., etc.
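A short sketch of the practical consequence: the element-count idiom only works on a real array, and once the array is passed to a function it has decayed to a pointer, so the length has to travel separately. ARRAY_LEN and fill are names chosen for this example:

```c
#include <stddef.h>

/* Element count of a true array; gives a meaningless result for a pointer. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* dst is just a pointer here, so the caller passes the length explicitly. */
void fill(char *dst, size_t len, char value)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = value;
}
```

A call such as fill(a, ARRAY_LEN(a), 'x') is fine at the point where a is still visible as char a[100]; inside fill, sizeof(dst) would be the size of a pointer.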

A: 

If possible, use strlcpy (instead of strncpy) and strlcat.

Even better, to make life a bit safer, you can use a macro such as:

#define strlcpy_sz(dst, src) (strlcpy(dst, src, sizeof(dst)))
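Two caveats: sizeof(dst) in that macro only gives the buffer size when dst is a real array (not a pointer), and strlcpy comes from BSD rather than ISO C, so it may be missing on the contest machine. A sketch of a stand-in with strlcpy-like semantics follows; the name copy_bounded is my own, and this is not the BSD implementation:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in with strlcpy-like semantics: copies at most
 * size - 1 bytes, always terminates dst when size > 0, and returns
 * strlen(src) so the caller can detect truncation (result >= size). */
size_t copy_bounded(char *dst, const char *src, size_t size)
{
    size_t src_len = strlen(src);
    if (size > 0) {
        size_t n = src_len < size - 1 ? src_len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```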
Sint
+2  A: 

kmm already has a good list. Here are the things I had problems with when I started coding in C.

  1. String literals have static storage duration, so they stay accessible for the whole run of the program. Hence they can, for example, be the return value of a function.

  2. Memory management of strings, in particular with a high-level library (not libc). Who is responsible for freeing the string if it is returned by a function or passed to a function?

  3. When should "const char *" be used, and when "char *"? And what does it tell me if a function returns a "const char *"?

None of these questions is too difficult to learn, but they are hard to figure out if nobody teaches them to you.
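Points 1 and 3 can be illustrated together with a small sketch. The function level_name and its strings are invented for this example:

```c
/* Returns a pointer to a string literal. Literals have static storage
 * duration, so the pointer stays valid after the function returns.
 * The const in the return type tells the caller: do not modify this,
 * and do not free it. */
const char *level_name(int level)
{
    switch (level) {
    case 0:  return "debug";
    case 1:  return "info";
    default: return "unknown";
    }
}
```

A function returning plain "char *" gives no such hint, which is where the ownership question from point 2 comes in.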

quinmars
Bear in mind also that string literals should be treated as const char * (in C their type is an array of plain char, but it's undefined behavior if you try to change them).
David Thornley