views:

555

answers:

6
+2  Q: 

Buffer size in C

When provided with a buffer size in C, how do I know how much is left and when do I need to stop using the memory?

For example, if the function I am writing is this:

void ascii_morse (lookuptable *table, char* morse, char* ascii, int morse_size) {

}

In this application I will be passed a string (ascii) and I will convert it to Morse, using another function to convert each ASCII char. The problem I'm facing is how to make sure I am not exceeding the buffer size. I don't even know when to use the buffer size, or how to decrease it every time I use the buffer.

Of course the output will go to morse, so I will be appending strings to morse. I think I know how to do that; it is just the buffer size that is hard for me to understand.

If you need any more information to understand the problem please tell me, I tried my best to explain it.

+1  A: 

You need to pass the buffer size along with the pointer.

int
ascii_to_morse(lookuptable *table,
               char* morse, int morse_size,
               char* ascii);

The buffer size is not necessarily the same as the current length of the string (which you can find using strlen).

The function as given above reads the ascii string (it doesn't need to know that buffer's size, so it is not passed) and writes into the buffer pointed to by morse, of size morse_size. It returns the number of bytes written (not counting the null).

Edit: Here's an implementation of this function which, while it fails to use the right values for morse code, shows how to manage the buffer:

#include <stdio.h>  // printf
#include <string.h> // strlen, strcpy

typedef void lookuptable; // we ignore this parameter below anyway
// but using void lets us compile the code

int
ascii_to_morse(lookuptable *table,
               char* morse, int morse_size,
               char* ascii)
{
  if (!ascii || !morse || morse_size < 1) { // check preconditions
    return 0; // and handle it as appropriate
    // you may wish to do something else if morse is null
    // such as calculate the needed size
  }
  int remaining_size = morse_size;
  while (*ascii) { // false when *ascii == '\0'
    const char* mc_for_letter = ".-"; // BUG: placeholder, not the real morse code for *ascii
    ++ascii;
    int len = strlen(mc_for_letter);
    if (remaining_size <= len) { // not enough room
      // 'or equal' because we must write a '\0' still
      break;
    }
    strcpy(morse, mc_for_letter);
    morse += len; // keep morse always pointing at the next location to write
    remaining_size -= len;
  }
  *morse = '\0';
  return morse_size - remaining_size;
}

// test the above function:
int main() {
  char buf[10];
  printf("%d \"%s\"\n", ascii_to_morse(0, buf, sizeof buf, "aaa"), buf);
  printf("%d \"%s\"\n", ascii_to_morse(0, buf, sizeof buf, "a"), buf);
  printf("%d \"%s\"\n", ascii_to_morse(0, buf, sizeof buf, "aaaaa"), buf);
  return 0;
}
Roger Pate
so for example if i am passed a buffer size of 10 and the string is "hello world", do i say buffer-size-- every time i read one of the chars?
c2009l123
No, since strings in C are terminated by a null byte you can just use `strlen` to get the length. Worrying about buffer sizes applies to strings you're writing to.
Schwern
The morse_size is the size of the result. You have to count how many chars you put into 'morse' and stop when you get to morse_size -1 (since you want to reserve the last character for the nul terminator). When you read the chars from 'ascii' you just read until the end which will be the nul character.
nos
It's considered bad form giving code for homework questions. The OP can't use it anyway since these sites will no doubt be monitored, and it does the people that use it little good in the long run.
paxdiablo
i had the whole problem solved i just needed help with the buffer :) so that example was great :) i will be trying to implement and will ask more questions if needed, but thank you for now !
c2009l123
It doesn't solve the complete problem. If he had wanted to cheat, I can't stop him anyway, but I can show how to manage the buffer size in working code he can test, explore, and prove. That, IMHO, does much more good in the long run.
Roger Pate
If he had wanted to murder someone, you couldn't have stopped him either. But it would still be a bad idea to hand him the gun :-)
paxdiablo
+1  A: 

The buffer size cannot be inferred from the pointer alone. It needs to either be passed as an argument, be known from somewhere (for example from #define values or other constants), or be implicitly known. The implicit approach is "dangerous": if the size is ever changed and the change is not reflected everywhere the buffer is used, you get an overflow.

Alternatively, and more typically in the case of input buffers (buffers which the function will read from), the end of the buffer may be marked by a special character or a sequence of such characters.
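
A minimal sketch of both conventions at once, assuming a hypothetical helper named copy_bounded (not part of the question's code): the input is read until its '\0' marker, while the output is limited by a capacity passed as an argument.

```c
#include <stddef.h>

/* Hypothetical helper: copies src into dst, never writing more than
 * dst_size bytes (terminator included). Returns the number of characters
 * copied, not counting the terminator. */
size_t copy_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t n = 0;
    if (dst == NULL || src == NULL || dst_size == 0) {
        return 0;               /* nothing can be written safely */
    }
    /* Input: stop at the '\0' marker. Output: stop one short of the
     * capacity so the terminator always fits. */
    while (src[n] != '\0' && n + 1 < dst_size) {
        dst[n] = src[n];
        ++n;
    }
    dst[n] = '\0';
    return n;
}
```

The caller would typically pass sizeof buf for a local array buf. Inside the function, sizeof dst would only give the size of a pointer, which is why the capacity has to travel as its own argument.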

mjv
it is passed, but how do i use it, do i decrease it each time i read a char or what?
c2009l123
The way you use the explicitly passed buffer size may vary. Your suggestion of decreasing it by the number of characters added to the buffer is workable. Another approach is to calculate, before anything is added, the maximum insertion point in the buffer, and then to check that the current insertion pointer stays below it.
mjv
I've always wondered why the buffer size can't be determined from the pointer alone. It must be known by something or free() wouldn't work. Is there a technical reason why there couldn't be an "int allocated_to(void *ptr)" function? Is it just one of those holes in the standard C API?
Schwern
@Schwern, yes, indeed, this information exists somewhere/somehow, lest managing the heap would be a bloody mess ;-) However the C and C++ standards do not impose any requirements in this area, so the way the size info is kept is implementation specific, and hence off-limits to [prudent and wise] programmers. Furthermore, this issue is not specific to dynamically allocated blocks: we can imagine an application passing a pointer into a static memory area, for which even the compiler wouldn't know the size. And, even with dynamically allocated blocks, the program may pass a pointer to...
mjv
... somewhere in the middle of the string, and/or expect that the called function will not use more than a specific length, even if the block itself is bigger. In a nutshell, there are many not-so-contrived situations which show that passing the length explicitly is a good idea!
mjv
A: 

One possible (slow) solution is to let the function accept a NULL buffer pointer and, in that case, return the required buffer size. Then call it a second time with a buffer of the proper size.
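
A rough sketch of that two-call pattern. The function name morse_encode, its signature, and the tiny code_for table are assumptions made for illustration; the table is a placeholder, not real Morse code.

```c
#include <stddef.h>
#include <string.h>

/* Placeholder translation, NOT a real Morse table: 'e' -> ".", 't' -> "-",
 * anything else -> "?". */
static const char *code_for(char c)
{
    switch (c) {
    case 'e': return ".";
    case 't': return "-";
    default:  return "?";
    }
}

/* Hypothetical function. Returns the buffer size needed for the full
 * translation, terminator included. Writes only when out is non-NULL and
 * out_size is large enough. */
size_t morse_encode(const char *ascii, char *out, size_t out_size)
{
    size_t needed = 1;                      /* room for '\0' */
    for (const char *p = ascii; *p != '\0'; ++p) {
        needed += strlen(code_for(*p));
    }
    if (out != NULL && out_size >= needed) {
        char *w = out;
        for (const char *p = ascii; *p != '\0'; ++p) {
            const char *code = code_for(*p);
            size_t len = strlen(code);
            memcpy(w, code, len);
            w += len;
        }
        *w = '\0';
    }
    return needed;
}
```

A caller would first call morse_encode(text, NULL, 0) to learn the size, allocate that many bytes, and then call it again with the real buffer. The input is scanned twice, which is the "slow" part.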

BostonLogan
Thanks, edited.
BostonLogan
Tags are C, not C++.
Pavel Minaev
+1  A: 
void ascii-morse (lookuptable *table, char* morse, char* ascii, int morse-size)

You have the size of the output buffer already passed in, by the looks of that prototype above.

ascii will no doubt be a null terminated string and morse will be the output buffer: morse_size (not morse-size as you have it, since that's not a valid identifier) will be how many characters you are allowed to write.

The pseudocode will be something like:

set apointer to start of ascii, mpointer to start of morse.
while apointer not at end of ascii:
    get translation from lookuptable, using the character at apointer.
    if length of translation is greater than morse_size:
        return an error.
    store translation to mpointer.
    add 1 to apointer.
    add length of translation to mpointer.
    subtract length of translation from morse_size.
if morse_size is zero:
    return an error.
store string terminator to mpointer.

You'll have to convert that to C and implement the lookup function but that should be a good start.

The pointers are used to extract from, and insert into, the relevant strings. For every character, you basically check whether there is enough room left in the output buffer for adding the morse code segment. And, at the end, you also need to check there's enough room for the string terminator character '\0'.

The way in which you check if there is enough room is by reducing the morse_size variable by the length of the string you're adding to morse each time through the loop. That way, morse_size will always be the size remaining in the buffer for your use.
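
One way the pseudocode above might translate to C, as a sketch only. The lookup_code function is a stand-in for the real table lookup (so the lookuptable parameter is left out here), and the 0 / -1 return convention is an assumption, since the original prototype returns void.

```c
#include <string.h>

/* Stand-in for the real table lookup (an assumption): 's' -> "...",
 * 'o' -> "---", anything else -> "?". */
static const char *lookup_code(char c)
{
    switch (c) {
    case 's': return "...";
    case 'o': return "---";
    default:  return "?";
    }
}

/* Returns 0 on success, -1 if the output buffer is too small. */
int ascii_morse(char *morse, const char *ascii, int morse_size)
{
    const char *apointer = ascii;
    char *mpointer = morse;

    while (*apointer != '\0') {
        const char *translation = lookup_code(*apointer);
        int len = (int)strlen(translation);
        if (len > morse_size) {
            return -1;                 /* segment does not fit */
        }
        memcpy(mpointer, translation, (size_t)len);
        apointer += 1;
        mpointer += len;
        morse_size -= len;             /* space still available */
    }
    if (morse_size <= 0) {
        return -1;                     /* no room for the terminator */
    }
    *mpointer = '\0';
    return 0;
}
```

Note that on the error paths the buffer is left without a terminator, so a caller should not treat morse as a string unless the function returned 0.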

paxdiablo
ohh i think i get it but one more thing, so i will be adding to morse a char by char because that's how i translate the ascii to morse. how do i check each time i want to add a char that there is enough memory ? i mean if i was passed "hello world" buffer enough only for "hel" how do i know that i should stop there? shouldn't i be decreasing the buffer size everytime i am examining a char or something like that ?
c2009l123
See the last paragraph. You continuously reduce the morse_size variable by the length of the morse code segment you're adding. The instant you get a 3-character morse code segment and morse_size is only two (for example), you have an error condition. Likewise for the final string terminator character.
paxdiablo
+2  A: 

It sounds like there's some confusion about the "buffer". There is no buffer. morse-size is telling you how much memory has been allocated to morse (technically, the chunk of memory that morse points to). If morse-size is 20 then you have 20 bytes. This is 19 bytes of usable space, because strings are terminated by a null byte. You can think of morse-size as "maximum length of the string plus one".

You need to check morse-size to make sure you're not writing more bytes into morse than it can hold. morse is nothing more than a number pointing to a single spot in memory. Not a range, but a single spot. What's been allocated to morse comes after that. If you put more than that into morse you risk overwriting someone else's memory. C will NOT check this for you; this is the price of maximum performance.

It's like if you went to a theater and the usher tells you, "you can have seat A3 and the next 5" and then leaves. You have to be polite and not take a seventh seat; somebody else was given A9.

Tools such as valgrind are invaluable to spot memory mistakes in C and keep your sanity.

Aren't strings in C a hoot? Welcome to the single largest root cause of bugs in the entire computing world.
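
The "maximum length of the string plus one" rule can be checked with the standard snprintf, which takes the capacity and reports how long the full string would have been. The wrapper name store_checked below is hypothetical; this is a sketch, not the only way to do the check.

```c
#include <stdio.h>

/* Hypothetical helper: writes text into morse (capacity morse_size bytes).
 * Returns 1 if it fit completely, 0 if it had to be truncated. */
int store_checked(char *morse, size_t morse_size, const char *text)
{
    /* snprintf never writes more than morse_size bytes, terminator
     * included, and returns the length the full string would have had. */
    int wanted = snprintf(morse, morse_size, "%s", text);
    return wanted >= 0 && (size_t)wanted < morse_size;
}
```

With a 20-byte buffer, a 19-character string fits exactly, while a 20-character string is cut to 19 characters and the function reports the truncation.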

Schwern
Wow. "There is no buffer" made me go all metaphysical, a la the Matrix's "There is no spoon" :-)
paxdiablo
"Putting the destination string before the source is bad form." ???? Do you consider the standard library functions "bad form"?
pmg
@pmg Since you asked, yes. :) I am not a native C programmer, so my style is not native C. The standard C library conventions were laid down over 30 years ago when the computing world was a very different place. They're pretty antiquated and out of step with the rest of the universe. I guess it depends on how much of your surrounding code follows C conventions. Like, if you're heavily invested in glib you should probably stick to dest, source.
Schwern
While I disagree with your opinion on dest-first (following the same order as `dest = source` is consistent), you back it up solidly.
Roger Pate
I pulled the bit about changing the calling convention. Not worth complicating the issue, memory allocation is complicated enough.
Schwern
*You can think of morse-size as "maximum length of the string minus one"*. I'd have said that `morse-size` is the max length, **plus** one. On account of the max string length (excluding nul-terminator) being morse-size **minus** one.
Steve Jessop
@Steve Right, fixed.
Schwern
@Roger Huh, I never made that connection. Donald Norman in "How Long Is Noon" related an argument with Henry Petroski about how 12 noon should be labeled. Petroski said it should be 12m, which is consistent with a.m. (ante meridiem) and p.m. (post meridiem), since noon is the meridian. It's logically consistent and totally wrong. It only works if you already think a certain way (and know what am/pm really mean). Not many people do. Everyone else will think it arbitrary and never form a gestalt about the interface (and in this case actively clash and think 12m is midnight).
Schwern
A: 

Another solution: instead of the caller passing in a pre-allocated destination string to be written to, your function does the allocation and returns a pointer to the result. This is a whole lot safer as the caller doesn't have to guess how much memory your function will need.

char *ascii2morse(const char *ascii, lookuptable *table)

You still have to allocate enough memory for the Morse code. Since Morse code isn't fixed length, there are two strategies. The first is to simply figure out the maximum possible memory needed for the given length of string (longest Morse sequence * number of characters in ascii, plus one for the terminator) and allocate that. This might seem like a waste, but it's what the caller would have to do under your original plan anyway.

The alternative is to use realloc to continually grow the string as you need it. You figure out how many bytes you need to encode the next character, reallocate to make room for that many more, and append it to the string. This might be slower, although memory allocators are pretty sophisticated these days, but it will use exactly as much memory as you need.

BOTH avoid the trap where the user has to preallocate an unknown amount of memory and BOTH eliminate the unnecessary "user didn't allocate enough memory" error condition.
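
A minimal sketch of the first strategy (allocate the worst case up front). The morse_for lookup and MAX_CODE_LEN are placeholders standing in for the real table, so the table parameter from the prototype above is omitted here.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_CODE_LEN 3   /* longest sequence in the placeholder table below */

/* Placeholder lookup, an assumption standing in for the real table. */
static const char *morse_for(char c)
{
    switch (c) {
    case 's': return "...";
    case 'o': return "---";
    case 'e': return ".";
    default:  return "?";
    }
}

/* Returns a newly allocated string the caller must free(), or NULL if
 * allocation fails. */
char *ascii2morse(const char *ascii)
{
    size_t capacity = strlen(ascii) * MAX_CODE_LEN + 1;  /* worst case + '\0' */
    char *morse = malloc(capacity);
    if (morse == NULL) {
        return NULL;
    }
    char *w = morse;
    for (; *ascii != '\0'; ++ascii) {
        const char *code = morse_for(*ascii);
        size_t len = strlen(code);
        memcpy(w, code, len);    /* always fits: len <= MAX_CODE_LEN */
        w += len;
    }
    *w = '\0';
    return morse;
}
```

No bounds check is needed inside the loop because the capacity was computed from the longest possible code. That only holds as long as MAX_CODE_LEN really matches the table in use.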

If you really wanted to save memory I'd store each dot/dash in the Morse code as 2 bits rather than 8 bits. You have three "words": short, long, and letter break. That needs a minimum of 2 bits of space per symbol.
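
A sketch of what 2-bit packing could look like, four symbols per byte. The symbol values and the use of the spare fourth value as an end marker are assumptions chosen for this example, not something the answer specifies.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding: three symbols plus a spare value used as end marker. */
enum { SYM_END = 0, SYM_DOT = 1, SYM_DASH = 2, SYM_BREAK = 3 };

/* Stores a 2-bit symbol at position index (4 symbols per byte). */
void put_symbol(uint8_t *packed, size_t index, unsigned symbol)
{
    unsigned shift = (unsigned)(index % 4) * 2;
    packed[index / 4] = (uint8_t)((packed[index / 4] & ~(3u << shift))
                                  | ((symbol & 3u) << shift));
}

/* Reads back the 2-bit symbol at position index. */
unsigned get_symbol(const uint8_t *packed, size_t index)
{
    unsigned shift = (unsigned)(index % 4) * 2;
    return (packed[index / 4] >> shift) & 3u;
}
```

The buffer-size question does not go away with this layout: a buffer of n bytes holds at most 4 * n symbols, and the caller still has to pass n along.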

Schwern
But now you have a contract with your API consumers to release that memory. Workable, but ugly and fairly dangerous.
rpj
@rpj Can't the caller just free it?
Schwern