tags:

views:

178

answers:

8

I have a function like:

// string is a null-terminated char array. Replace all a in the string with b
void ReplaceCharInString(char *string, char a, char b)
{
// loop over the string char by char, to find all "a"s and replace them with "b"
}

I'm doing the defensive programming. The problem is the implementation replies on the client to really pass in an array of chars. If an address of a single char is passed in, the program definitely runs into a bad state(probably crashes). How do I check and avoid this? (I know if I pass in the std::string object, the problem goes away of course)

+12  A: 

No, you cannot check this and have to trust the user of the function to pass an actual correctly null-terminated string.

This is also the way all the C standard library functions like strcpy(), strlen(), ... work.

sth
This is what "unmanaged" code infamous for.
Jaywalker
And usually you have all the rights to crash there, because if it happened, the program is already in crippled state.
ruslik
@Jaywalker: Making assumptions about the validity of your data is not really a problem. There is a place for many different programming paradigms.
Ed Swangren
A: 

Best is to use std::string.

Armen Tsirunyan
Please note that the question has a 'C' tag. That's why your answer does not fit.
harper
@harper: It also happens to be marked with a C++ tag :)
Armen Tsirunyan
A: 

The function just gets a pointer to somewhere in memory. It's defensive programming practice to check that it is not NULL. This is a usually value if the caller has "forgotten" to allocate the memory or some allocation failed.

There are more checks for invalid char arrays possible, but these are platform dependent.

At least you cannot distinguish between pointer to char array and pointer to char.

harper
Checking against NULL is not defensive, it's just wasteful clutter.
R..
@R Thanks for your opinon. As I already stated and even if it doesn't give a check for Eric's question: there are good reasons to treat NULL pointers special. There are a lot C functions, that return a NULL pointer to indicate a non-success. Example: strstr. Others can fail like malloc. It's just a defensive approach when the program doesn't crash, if a NULL pointer is passed as argument.
harper
A: 

Your prototype is wrong. When you pass an array to a function, you always must pass the length of the array since inside the function you never know the size of the array (it is always the size of the pointer). So you should declare an int variable for the size of the array so that the declaration itself makes explicit the use of an array.

void ReplaceCharInString(char *string, int len, char a, char b)

This way you do not depend on reading up to NULL character of string input arg, and just use the length. If you can not do this, and you have to keep the declaration as is, then you should have an explicit contract on what the users of the function should do, to not have problems.Then the responsibility is delegated to the caller of the function to honor the contract

Though as long as you're only replacing single characters with another single character and stop on null, you don't need the length.
Roger Pate
@Roger:From the post, he wants to start reading arg string upto NULL terminating character.If the user passes in the address of a single char, if he does that (i.e. start reading up to NULL) he'll get a mem corruption.By using length, he does not need to depend on a NULL character and the caller has been explicitely been notified that the function expects an array
Providing the length does not help in the case that the length of the buffer is provided by the client (along with the potentially suspect data).
CodeButcher
@user: It would be silly for the caller to pass in a single character with a length of 1 instead of doing `if (c == a) c = b;`.
Roger Pate
@CodeButcher:You have a point.But in a function it is not possible to use sizeof to get the size of a passed array.It is standard to pass in the length.If the length is wrong, then the responsibility is certainly of the caller. I think Eric wants just to make sure that if the function crashes, there is a way to delegate the responsibility in the calling code and not the defensive part of his function
@Roger:No it is not silly.But now the implementation of the function does not depend on the memory contents of the string passed in (i.e. if it is really an array).
Why the downvote?There should be an explanation otherwise, how would the downvoted be of benefit by learning?
A: 

You really can't trust user-inputted data at all. Using std::string is an improvement, since it prevents buffer overflows, but it's still not entirely safe. What's to stop the user from inputting a 10GB string that consumes all of your memory?

You really need to validate the input when you first receive it from the user, (whether it's coming from a socket, or through stdin.) For example, you can enforce a maximum number of bytes to read, or ensure that all input characters are in the ASCII range.

If you have to use C-strings, you can memset the input buffer to 0, and make sure the last character in the buffer is always a NULL byte, to ensure that your string is properly null-terminated.

Charles Salvia
+1  A: 

No you can't check if you're running out of the allocated memory if the string is not null terminated.
As sth said C standard library functions like strcpy(), strlen(), also rely on the fact that the string is valid (null terminated).

... however, one solution could be Mudflap. It is costly (in term of performance) and is only valid with GCC.
Mudflap is a library that instruments all pointer/array operations. With this you will be able to check if a specific location is valid memory or not.

In fact the only reason I see for using Mudflap is if security is a very big issue for your application. But even in this case GCC provides a better alternative against buffer overflow (see -fstack-protector[-all]).

Ugo
A: 

You avoid this by documenting your code properly. Nobody but a complete beginner at C (who should not be writing serious code) would ever take the address of a single char variable (or even declare such a variable), so your fear is not very relevant.

One way in C99 to partly guard against this is to declare the function as:

void ReplaceCharInString(char string[static 2], char a, char b);

This means that string must be a valid pointer to at least 2 chars, which would protect against the single-char-pointer issue you mentioned at the cost of making the function fail to support the empty string. But it still cannot protect you from incorrect malloc, etc.

Writing safe C is really a matter of proper code documentation and auditing, and firing incompetent people (or better yet never hiring them to begin with). The language is designed to be light and efficient. If you go trying to put your own huge safety harness layers on it, you'll probably do a worse job than the people who designed higher-level languages, and then you might as well have just used a higher-level language to begin with.

R..
A: 

With C++, you should both use function overloading and make sure no one will send you the address of a char.

void ReplaceCharInString(std::string & p_string, char p_a, char p_b)
{
   for(size_t i = 0, iMax = p_string.size(); i < iMax; ++i)
   {
      // replace if needed : p_string[i]
   }
}

void ReplaceCharInString(char & p_char, char p_a, char p_b)
{
   // replace if needed : p_char
}

template<size_t size>
void ReplaceCharInString(char (& p_string)[size], char p_a, char p_b) ;
{
   for(size_t i = 0; i < size; ++i)
   {
      // replace if needed : p_string[i]
   }
}

With only those three functions (and no function taking char * as its first parameter), no one will be able to call them with a raw char * :

void foo()
{
   char         a[4] ;
   std::string  b  ;
   char         c  ;
   char *       d ;

   // initialize a, b, c and d to some values

   ReplaceCharInString(a, 'x', 'z') ; // compiles
   ReplaceCharInString(b, 'x', 'z') ; // compiles
   ReplaceCharInString(c, 'x', 'z') ; // compiles
   ReplaceCharInString(d, 'x', 'z') ; // COMPILATION ERROR
}

So there's no way someone will call your function with the address of a single char.

Anyway, in C++, there are very few reasons to use char * as a string (you would use instead a std::string), and even less a char * as an address to a single char (you would use instead a char &), so no one will complain about the lack of support for char * in the functions above.

paercebal