I know that buffer overruns are one potential hazard of using C-style strings (char arrays). If I know my data will fit in my buffer, is it okay to use them anyway? Are there other drawbacks inherent to C-style strings that I need to be aware of?

EDIT: Here's an example close to what I'm working on:

char buffer[1024];
char * line = NULL;
while ((line = fgets(fp)) != NULL) { // this won't compile, but that's not the issue
    // parse one line of command output here.
}

This code is taking data from a FILE pointer that was created using a popen("df") command. I'm trying to run Linux commands and parse their output to get information about the operating system. Is there anything wrong (or dangerous) with setting the buffer to some arbitrary size this way?

+3  A: 

Character encoding issues tend to surface when you have an array of bytes instead of a string of characters.
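To make the byte-versus-character distinction concrete, here is a minimal sketch. It assumes the data is valid UTF-8 (the thread never says which encoding is in play), and utf8_code_points is a hypothetical helper name, not a standard function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>  // std::strlen, for comparison with the helper below

// Counts UTF-8 code points by skipping continuation bytes (bit pattern 10xxxxxx).
// Assumption: s is valid, NUL-terminated UTF-8. No validation is performed.
std::size_t utf8_code_points(const char* s) {
    assert(s != nullptr);
    std::size_t count = 0;
    for (; *s != '\0'; ++s) {
        // Cast to unsigned char so the bit test behaves the same whether
        // plain char is signed or unsigned on this platform.
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the string "h\xC3\xA9llo" (the word "héllo" in UTF-8), std::strlen reports 6 bytes while the helper reports 5 code points. Neither a char array nor std::string tracks that difference for you.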

Tomalak
Unfortunately std::string does not help in this matter either, but there is of course wstring...
divideandconquer.se
wstring also doesn't care about encoding unfortunately
Johannes Schaub - litb
+7  A: 

The memory management etc. needed to grow a string (char array) when necessary is kinda boring to reinvent.

divideandconquer.se
This is not the fault of C-style strings. An std::string implementation may use C-style strings (in fact, most use a combination of C-style and Pascal-style strings), and it grows and shrinks automatically.
strager
um, that was his point. c++ hides the "boring" aspects of the memory management required around c-strings.
Evan Teran
"This is not the fault of C-style strings." How is this not the fault of C-style strings?
Max Lybbert
+6  A: 

There is no way to embed NUL characters (if you need them for something) into C style strings.
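A small sketch of the difference, assuming std::string is available as the alternative; from_bytes is a hypothetical helper name. The length has to be passed explicitly, because any function that takes only a const char* stops at the first NUL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Builds a std::string from a raw buffer plus an explicit length, so NUL
// bytes inside the buffer are kept as ordinary characters.
std::string from_bytes(const char* data, std::size_t len) {
    assert(data != nullptr);
    return std::string(data, len);
}
```

Given the five bytes 'a', 'b', NUL, 'c', 'd', the resulting std::string has size() 5, while std::strlen on its c_str() still returns 2: C-style code sees only the part before the embedded NUL.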

Tomalak
hm, is it possible with std::string? And if, how does string::c_str() work?
quinmars
I haven't tried it, but I think it's possible with std::string. string::c_str() will return a character pointer to a C string with an embedded null char, which any C-style code will interpret as the end of the string.
Ferruccio
Ah, yes, true. I was just a bit confused :). Being able to have NULs in the middle of a string doesn't prevent std::string from also storing an extra terminating NUL internally.
quinmars
A: 

c strings have opportunities for misuse, due to the fact that one has to scan the string to determine where it ends.

strlen - to find the length, scan the string, until you hit the NUL, or access protected memory

strcat - has to scan to find the NUL, in order to determine where to begin concatenating. There is no knowledge within a c string, to tell if there will be a buffer overrun or not.

c strings are risky, but generally faster than string objects.
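One way to avoid both the rescan and the overrun is to carry the current length and the buffer capacity alongside the pointer. This is only a sketch: append_bounded is a hypothetical helper, not a standard library function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Appends src to the C string in dst. dst has room for cap bytes in total and
// currently holds *len characters (terminator not counted).
// Returns false and leaves dst untouched if the result would not fit.
bool append_bounded(char* dst, std::size_t cap, std::size_t* len, const char* src) {
    assert(dst != nullptr && len != nullptr && src != nullptr);
    assert(*len < cap);  // there must already be room for the terminator
    const std::size_t n = std::strlen(src);
    // The result needs *len + n + 1 bytes, so it fits only if n < cap - *len.
    if (n >= cap - *len) {
        return false;
    }
    std::memcpy(dst + *len, src, n + 1);  // n characters plus the terminator
    *len += n;
    return true;
}
```

Because the caller supplies the length, dst is never scanned to find its end, and the capacity check happens before any byte is written.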

EvilTeach
strncat can be used to prevent overruns.
SoapBox
A "string object" may be implemented exactly as a C-string. I'm sure the OP is looking at the concept of C-style strings and not their actual use in C.
strager
@strager: the concept of C-style strings is their actual use in C.
Max Lybbert
+1  A: 

I think IT IS OKAY to use them, people've been using them for years. But I would rather use std::string if possible because 1) you don't have to be so cautious every time and can think about problems of your domain, instead of thinking that you need to add another parameter every time...memory management and that kinda stuff...it is just safer to code on a higher level... 2) there are probably some other small concerns which are not big deal but still...like people already mentioned...encoding, unicode...all those "related" kinda stuff people creating std::string thought of...:)

Update

I worked on a project for half a year. Somehow I was stupid enough to never compile in release mode before delivery....:) Well...luckily there was just one error I found after 3 hours. It was a very simple string buffer overrun.

badbadboy
Absolutely agree. If all Bill's trying to do is parse the output of a *nix command, C++ strings are thousands of times better for this. In fact, Stroustrup's got an example of something similar in one of his FAQs. Perl would also shine for this kind of application.
Joe Pineda
+14  A: 

Not having the length accessible in constant-time is a serious overhead in many applications.

Will Dean
You could store the begin and end pointer, if it's an issue.
Jasper Bekkers
It is no longer "C-style strings" in that case, but a new kind of object.
bortzmeyer
I think he means store it in a temp variable before you use the string. There's a popular example that Joel gave on his blog that talks about this issue, where he's using a for loop and getting the length of a string in the condition. This makes the loop O(n^2), when it could be O(n).
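A sketch of the loop pattern being described; upper_slow and upper_fast are hypothetical names. Whether a given compiler manages to hoist the strlen call out of the first loop is not guaranteed, especially since the loop body writes to the string.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>

// strlen() appears in the loop condition, so the string may be rescanned on
// every iteration: roughly n scans of up to n characters each.
void upper_slow(char* s) {
    assert(s != nullptr);
    for (std::size_t i = 0; i < std::strlen(s); ++i) {
        s[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[i])));
    }
}

// The length is computed once and kept in a local variable: one scan, one pass.
void upper_fast(char* s) {
    assert(s != nullptr);
    const std::size_t n = std::strlen(s);
    for (std::size_t i = 0; i < n; ++i) {
        s[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[i])));
    }
}
```

Both functions produce the same result; only the amount of scanning differs.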
Bill the Lizard
+13  A: 

C strings lack the following aspects of their C++ counterparts:

  • Automatic memory management: you have to allocate and free their memory manually.
  • Extra capacity for concatenation efficiency: C++ strings often have a capacity greater than their size. This allows increasing the size without many reallocations.
  • Embedded NULs: by definition a NUL character ends a C string; C++ strings keep an internal size counter, so they don't need a special value to mark their end.
  • Sensible comparison and assignment operators: even though comparison of C string pointers is permitted, it's almost always not what was intended. Similarly, assigning C string pointers (or passing them to functions) creates ownership ambiguities.
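A short sketch of the comparison point, with hypothetical helper names same_text and joined. With C strings, == on two pointers compares addresses, so content comparison has to go through strcmp; std::string gives == and + the meaning most people expect.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Compares the contents of two C strings. Writing a == b instead would
// compare the two addresses, not the text they point to.
bool same_text(const char* a, const char* b) {
    assert(a != nullptr && b != nullptr);
    return std::strcmp(a, b) == 0;
}

// With std::string, + concatenates and the result owns its own memory.
std::string joined(const std::string& a, const std::string& b) {
    return a + b;
}
```

Two separate char arrays that both hold "df" have different addresses, so a pointer comparison says they differ even though same_text and std::string's == both say they match.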
efotinis
And the fact that many "obvious" string operations seem to compile, but do something completely different than expected (== compares the pointers to the strings, not the strings themselves. And + doesn't concatenate)
jalf
and of course, assignment doesn't do what you might expect either. :) The fundamental problem with C-style strings is that they just don't behave as strings.
jalf
No reason why you couldn't have extra capacity on a c-string. Just allocate more than you need and put an early NUL.
Evan Teran
@Evan Teran: sure you could over-allocate, but then you'd need a separate variable to keep track of the capacity. std::basic_string has this built-in.
efotinis
@jalf: Nice one, I'm adding that too. Thanks!
efotinis
You can pass a std::string as a return value from your module / class and not have to worry about the "so who has to delete this buffer?" issue. The many "solutions" to this problem with C strings lead to much interface complexity in many libraries.
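To illustrate the ownership question, here is a sketch with two hypothetical functions that build the same label. The "fs:" prefix and both function names are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// C-style interface: returns a malloc'd buffer that the caller must free().
// Nothing in the signature says so; it has to be documented and remembered.
char* make_label_c(const char* name) {
    assert(name != nullptr);
    static const char prefix[] = "fs:";
    const std::size_t n = std::strlen(name);
    // sizeof prefix already counts the terminator, so this is 3 + n + 1 bytes.
    char* out = static_cast<char*>(std::malloc(sizeof prefix + n));
    if (out == nullptr) {
        return nullptr;
    }
    std::memcpy(out, prefix, sizeof prefix - 1);
    std::memcpy(out + sizeof prefix - 1, name, n + 1);  // includes terminator
    return out;
}

// std::string interface: the returned object owns its buffer and releases it
// when it goes out of scope, so there is no "who deletes this?" question.
std::string make_label(const std::string& name) {
    return "fs:" + name;
}
```

A caller of make_label_c has to check for nullptr and remember to call std::free; a caller of make_label does neither.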
Tom Leys
A: 

Imho, the hardest point of cstrings is the memory management, because you need to be careful about whether you have to pass a copy of a cstring or whether you can pass a literal to a function, i.e. will the function free the passed string, or will it keep a reference for longer than the function call? The same applies to cstring return values.

So without big effort it is not possible to share cstring copies. In many cases this ends with unnecessary copies of the same cstring in memory.

quinmars
A: 

This question doesn't really have an answer.
If you're writing in C, what other options do you have?
If you're writing in C++, why are you asking? What is the reason not to use C++ primitives?
The only reason I can think of is linking C and C++ code and having char * somewhere in the interfaces. It is sometimes just easier to use char * instead of converting back and forth all the time (especially if it's really 'good' C++ code that has 3 different C++ string object types).

Ilya
If you write in C, you can always declare your own type as a struct, with all the operations (length_of, etc) you need and your own conventions (for instance that the encoding is UTF-32). But C does not make it very convenient.
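A rough sketch of such a type, kept deliberately C-like (a plain struct plus free functions) even though it is compiled as C++ here. CountedStr and the cs_ functions are hypothetical names, and this is not how the library linked in the reply below works. It also shows the separate capacity variable that over-allocating a C string would need.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical counted string: length and capacity travel with the data.
struct CountedStr {
    char* data;       // kept NUL-terminated so it can still be passed to C APIs
    std::size_t len;  // characters in use, terminator not counted
    std::size_t cap;  // bytes allocated
};

// Creates a copy of src. On allocation failure, data is nullptr.
CountedStr cs_make(const char* src) {
    assert(src != nullptr);
    CountedStr s;
    s.len = std::strlen(src);
    s.cap = s.len + 1;
    s.data = static_cast<char*>(std::malloc(s.cap));
    if (s.data != nullptr) {
        std::memcpy(s.data, src, s.cap);
    } else {
        s.len = 0;
        s.cap = 0;
    }
    return s;
}

// Appends src, doubling the capacity when needed. src must not point into
// s->data. Returns false, leaving s unchanged, if allocation fails.
bool cs_append(CountedStr* s, const char* src) {
    assert(s != nullptr && s->data != nullptr && src != nullptr);
    const std::size_t n = std::strlen(src);
    const std::size_t needed = s->len + n + 1;
    if (needed > s->cap) {
        std::size_t new_cap = s->cap * 2;
        if (new_cap < needed) {
            new_cap = needed;
        }
        char* p = static_cast<char*>(std::realloc(s->data, new_cap));
        if (p == nullptr) {
            return false;
        }
        s->data = p;
        s->cap = new_cap;
    }
    std::memcpy(s->data + s->len, src, n + 1);  // no scan of the existing text
    s->len += n;
    return true;
}

// Releases the buffer and resets the fields.
void cs_free(CountedStr* s) {
    assert(s != nullptr);
    std::free(s->data);
    s->data = nullptr;
    s->len = 0;
    s->cap = 0;
}
```

The length is available in constant time as s.len, and appending never rescans the existing text. The cost is that every operation has to go through these functions to keep the three fields consistent.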
bortzmeyer
Actually you are right :) http://bstring.sourceforge.net/. I was about to say nobody does this, but decided to search a little bit first. Wise decision it was :)
Ilya
+5  A: 

Well, to comment on your specific example, you don't know that the data returned by your call to df will fit into your buffer. Never trust unsanitized input into your application, even when it is supposedly from a known source like df.

For example, if a program named 'df' is placed somewhere in your search path so that it is executed instead of the system df it could be used to exploit your buffer limit. Or if df is replaced by a malicious program.

When reading input from a file, use a function that lets you specify the maximum number of bytes to read. fgets() is declared as char *fgets(char *s, int size, FILE *stream); in standard C (not only on OS X and Linux), so it is safe to use as long as size does not exceed the real size of the buffer.

Brian C. Lane
A: 

C strings, like many other aspects of C, give you plenty of room to hang yourself. They are simple and fast, but unsafe in situations where assumptions such as the null terminator can be violated or input can overrun the buffer. To use them reliably you have to observe fairly hygienic coding practices.

There used to be a saying that the canonical definition of a high-level language was "anything with better string handling than C".

ConcernedOfTunbridgeWells
+2  A: 

In your specific case, it's not the c-string that's dangerous so much as reading an indeterminate amount of data into a fixed-size buffer. Don't ever use gets(char*), for example.

Looking at your example though, it doesn't seem at all correct - try this:

char buffer[1024];
char * line = NULL;
while ((line = fgets(buffer, sizeof(buffer), fp)) != NULL) {
    // parse one line of command output here.
}

This is a perfectly safe use of c-strings, although you'll have to deal with the possibility that line does not contain an entire line, but was rather truncated to 1023 characters (plus a null terminator).
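One way to deal with the truncation case is to keep calling fgets until a newline shows up and collect the pieces in a std::string. This is a sketch, not the only approach: read_full_line is a hypothetical helper, and the 16-byte buffer is deliberately tiny so the multi-chunk path is easy to exercise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Reads one complete line from fp into out, without the trailing '\n'.
// Lines longer than the chunk buffer are assembled from several fgets calls.
// Returns false once there is nothing left to read.
// Assumption: the input is text with no embedded NUL bytes, since strlen is
// used to measure each chunk.
bool read_full_line(std::FILE* fp, std::string& out) {
    assert(fp != nullptr);
    char buffer[16];  // small on purpose; any size works
    out.clear();
    bool got_data = false;
    while (std::fgets(buffer, sizeof buffer, fp) != nullptr) {
        got_data = true;
        const std::size_t n = std::strlen(buffer);
        if (n > 0 && buffer[n - 1] == '\n') {
            out.append(buffer, n - 1);  // end of line reached, drop the '\n'
            return true;
        }
        out.append(buffer, n);  // partial chunk, keep reading
    }
    return got_data;  // final line without a trailing newline, or end of input
}
```

The same function should work on the FILE pointer returned by popen, since it only relies on fgets; the fixed buffer then limits how much is read per call, not how long a line may be.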

Eclipse
Thanks. My example code wouldn't compile. I was more concerned with the char buffer issue and wrote the while loop (very lazily) from memory.
Bill the Lizard
+7  A: 

You may know that today 1024 bytes is enough to contain any input, but you don't know how things will change tomorrow or next year.

If premature optimization is the root of all evil, magic numbers are the stem.

John Dibling
A: 

No Unicode support is reason enough these days...

Hoffmann
A: 

Another consideration is who will be maintaining your code. What about in two years? Will that person be as comfortable with C-style strings as you are? As the STL gets more mature, it seems like people will be increasingly more comfortable with STL strings than with C-style strings.

JohnMcG
+11  A: 

There are a few disadvantages to C strings:

  1. Getting the length is a relatively expensive operation.
  2. No embedded nul characters are allowed.
  3. The signed-ness of chars is implementation defined.
  4. The character set is implementation defined.
  5. The size of the char type in bits (CHAR_BIT) is implementation defined; only sizeof(char) == 1 is guaranteed.
  6. Have to keep track separately of how each string is allocated and so how it must be free'd, or even if it needs to be free'd at all.
  7. No way to refer to a slice of the string as another string.
  8. Strings are not immutable, meaning they must be synchronized separately.
  9. Strings cannot be manipulated at compile time.
  10. Switch cases cannot be strings.
  11. The C preprocessor does not recognize strings in expressions.
  12. Cannot pass strings as template arguments (C++).
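As one illustration of item 3, here is a sketch of how the signedness of char leaks into everyday code. count_alpha is a hypothetical helper. The <cctype> functions take an int that must be representable as unsigned char (or be EOF), so a plain char holding a byte above 0x7F has to be cast first on platforms where char is signed.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>

// Counts alphabetic characters according to the current C locale.
std::size_t count_alpha(const char* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    for (; *s != '\0'; ++s) {
        // Without the cast, a byte such as 0xE9 becomes a negative int when
        // char is signed, and passing that to isalpha is undefined behavior.
        if (std::isalpha(static_cast<unsigned char>(*s))) {
            ++n;
        }
    }
    return n;
}
```

The cast is harmless where char is already unsigned, which is why it is usually written unconditionally.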
Walter Bright