I'm using popen() to read output from shell commands, and I will use fgets() to read it line by line. My question is: how do I choose the best size for my `char *` line buffer? I remember a professor telling us to include <limits.h> and use LINE_MAX for such things. That works fine on my Mac, but there's no LINE_MAX on Linux.

This mailing list archive poses the same question, but offers no answer: http://bytes.com/topic/c/answers/843278-not-able-locate-line_max-limits-h
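Roughly, what I have now looks like this (`ls -l` is just a placeholder command):

```c
#include <limits.h>   /* LINE_MAX here on my Mac, but not on Linux */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *line = malloc(LINE_MAX);   /* one buffer, reused per line */
    FILE *p = popen("ls -l", "r");   /* placeholder command */

    if (line == NULL || p == NULL)
        return 1;
    while (fgets(line, LINE_MAX, p) != NULL)
        fputs(line, stdout);
    pclose(p);
    free(line);
    return 0;
}
```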

A: 

You could use malloc() and expand if necessary, or use the source and look at how a GNU utility does it.

Vince
Okay, I'll check out a GNU utility. I'm using malloc, but only once, and reusing the same line buffer.
Derrick
I always look for code in GNU or other good open-source projects. Or you could grow the allocation dynamically (up to a point), but this can be slow, since everything may have to be copied.
Vince
+3  A: 

When <limits.h> does not define LINE_MAX, look at _POSIX2_LINE_MAX, which is required to be at least 2048. I usually use 4096.
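For instance, something like this (a sketch; `MY_LINE_MAX` is just an illustrative name):

```c
#include <limits.h>

/* Prefer the platform's LINE_MAX; fall back to the POSIX minimum,
   then to a hard-coded size. */
#if defined(LINE_MAX)
#define MY_LINE_MAX LINE_MAX
#elif defined(_POSIX2_LINE_MAX)
#define MY_LINE_MAX _POSIX2_LINE_MAX
#else
#define MY_LINE_MAX 4096
#endif

char line[MY_LINE_MAX];
```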

Also look at the (newer) POSIX functions getline() and getdelim(). These allocate memory as necessary.

Jonathan Leffler
+4  A: 

man getline

Also see http://www.gnu.org/s/libc/manual/html_node/Line-Input.html and its discussion of getline() vs. fgets() vs. gets(). This has been a subject on SO more times than I can count, too.
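A minimal sketch, assuming a POSIX.1-2008 `getline()` (the command is a placeholder):

```c
#define _GNU_SOURCE   /* getline() on glibc; declared by POSIX.1-2008 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>   /* ssize_t */

int main(void)
{
    FILE *p = popen("ls -l", "r");   /* placeholder command */
    char *line = NULL;               /* getline() allocates and grows this */
    size_t cap = 0;
    ssize_t len;

    if (p == NULL)
        return 1;
    while ((len = getline(&line, &cap, p)) != -1)
        fwrite(line, 1, (size_t)len, stdout);  /* len covers embedded NULs */
    free(line);
    pclose(p);
    return 0;
}
```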

hroptatyr
A: 

Check the line for a `'\n'`; if it isn't there, expand the buffer before you call the next `fgets`.

You also need to check `feof()` when there's no `'\n'`, to account for the corner case of the last line in the file not having a trailing newline.
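A sketch of that approach (the function name and growth policy are arbitrary, and, as discussed elsewhere on this page, `strlen()` means embedded NUL bytes will confuse it):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one line of arbitrary length from fp. Returns a malloc'd,
   NUL-terminated string (caller frees it), or NULL on EOF/error. */
static char *read_line(FILE *fp)
{
    size_t cap = 128, len = 0;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;
    while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
        len += strlen(buf + len);
        /* Done on a complete line, or on a final line that hit
           EOF without a trailing '\n'. */
        if ((len > 0 && buf[len - 1] == '\n') || feof(fp))
            return buf;
        /* No '\n' yet: the buffer filled up, so grow it and
           read some more. */
        char *tmp = realloc(buf, cap *= 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
    }
    if (len > 0)
        return buf;   /* final line exactly filled the buffer, no '\n' */
    free(buf);        /* EOF before any data, or a read error */
    return NULL;
}
```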
caf
A: 

POSIX systems have getline(), which will allocate a buffer for you.

On non-POSIX systems, you can use Chuck B. Falconer's public domain ggets function, which is similar. (Chuck Falconer's website is no longer available, although archive.org has a copy, and I've made my own page for ggets.)

jamesdlin
It's also pretty easy to implement a portable and fast `getline`, with full support for embedded null characters just like the original GNU version, using only `realloc`, `fgets`, `memset`, and `memchr`. This is probably a lot better than `ggets`, which seems to have broken behavior for newlines/end of file and no way of handling embedded nulls, but it really depends on your application and what you need.
R..
@R..: AFAIK there aren't any EOF issues with `ggets` anymore, and while it's true it doesn't handle embedded NULs, I don't think that's a common use case. (It's also not something that's directly supported by `fgets`, and I'm not sure how you would build something around `fgets` that distinguishes embedded NUL bytes from the actual end. It certainly doesn't seem as trivial as you make it out to be.)
jamesdlin
How in the world do you distinguish between a final line ending with a newline and a final line missing one, using `ggets`? If you can't, then it's a lossy function. The loss may not matter for many uses, but I still consider it a major limitation. As for `fgets` and embedded nulls, it's easy: you memset your buffer with `'\n'` before calling `fgets`, and then a `memchr` search for `'\n'` tells you how many bytes were read.
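Roughly (assuming a conforming `fgets` that writes nothing past its terminating null, which is exactly the point argued below):

```c
#include <stdio.h>
#include <string.h>

/* Like fgets(buf, n, fp), but returns how many bytes were actually
   stored (excluding the terminating NUL), so embedded NULs in the
   input can't hide the true length. Returns -1 at EOF/error. */
static long fgets_len(char *buf, size_t n, FILE *fp)
{
    char *p;

    memset(buf, '\n', n);               /* sentinel fill */
    if (fgets(buf, (int)n, fp) == NULL)
        return -1;
    /* The input can contain '\n' only as the last byte stored, so the
       first '\n' in the buffer is either that final newline (fgets's
       NUL follows it) or the first untouched sentinel (fgets's NUL
       precedes it). */
    p = memchr(buf, '\n', n);
    if (p == NULL)
        return (long)(n - 1);           /* buffer full, no newline */
    if (p + 1 < buf + n && p[1] == '\0')
        return (long)(p - buf + 1);     /* line ended with this '\n' */
    return (long)(p - buf - 1);         /* sentinel; NUL sits before it */
}
```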
R..
Good point about the lossiness of `ggets`. OTOH, I've personally seen way more cases where `fgets` consumers strip off the trailing newline incorrectly than I've seen cases where they care about preserving a missing `'\n'`. As for detecting embedded NULs, clever, but a pathological implementation could fill the entire buffer on every non-empty read.
jamesdlin
No, the standard specifies what `fgets` does, which is formally equivalent to making repeated calls to `fgetc` and storing the results in the buffer until it's full or `\n` is encountered. Writing past that point is not following the specification.
R..
@R..: Can you point me to where it says so, because 7.19.7.2 in the C99 standard doesn't mention anything like that.
jamesdlin
I was going by the documentation in POSIX which is aligned with ISO C99. Specifying everything in terms of `fgetc` and `fputc` may be a POSIXism, but the citation you provided clearly specifies what is written into the array ("at most one less than the number of characters specified by n" followed by a terminating null character). Standard library functions can't just randomly clobber memory outside of what they're specified to do.
R..
@R..: That clearly specifies what's *read*, not what's *written*. I don't interpret the language of 7.19.7.2 as prohibiting an `fgets` implementation from clearing the rest of the buffer after the terminating NUL.
jamesdlin
Um, the specification clearly is about what's written to memory; otherwise a conformant implementation could just read and discard all the characters. Unless the specification says that the contents of the rest of the buffer are unspecified after the call to `fgets`, then `fgets` **cannot** clobber it.
R..
@R..: The specification states that the characters are *read into the buffer*, so it can't discard them. The limitation of "at most n - 1" characters is on what's read from the stream; it's not a limitation on writes beyond that. Moreover, even if it were, filling the remaining portion of the buffer would still be within the "at most n - 1" bound. I don't see why the specification wouldn't give vendors some latitude here. For example, a system might want to write, say, 4 bytes at a time without having to read back the old contents.
jamesdlin
It might want to, but it would be incorrect and non-conformant. An implementation can't just go clobbering memory outside of what it's specified to do. As an extreme case, take this thought experiment: a conforming application could, when reading a file the application itself generated, pass a size value larger than the actual buffer size to `fgets`, knowing the maximum length of any actual line in the file. This would not be a good idea on multitasking/multiuser systems where another process could modify the file, but in isolation, it's perfectly valid.
R..