Hi,

I want to read a file line by line without knowing the line length beforehand. Here's what I have so far:

int ch;
int length = 0;
char buffer[4095];

// Read first, then test, so the first character is kept and '\n' or EOF is never stored.
while ((ch = getc(file)) != '\n' && ch != EOF) {
    buffer[length] = ch;
    length++;
}

printf("Line length: %d characters.", length);

char newbuffer[length + 1];

for (int i = 0; i < length; i++)
    newbuffer[i] = buffer[i];

newbuffer[length] = '\0';    // newbuffer now contains the line.

I can now figure out the line length, but only for lines shorter than 4095 characters, and the two char arrays seem like an awkward way of doing the task. Is there a better way to do this? (I already used fgets() but was told it wasn't the best way.)

--Ry

+1  A: 

You're close. Basically, you want to read chunks of data and check them for \n characters. If you find one, you have the end of a line. If you don't, you have to grow your buffer (i.e. allocate a new buffer twice the size of the first, copy the data from the old buffer into the new one, then free the old buffer and use the new one in its place, or just call realloc if you're in C), then read more data until you do find a line ending.

Once you have your ending, the text from the beginning of the buffer to the \n character is your line. Copy it to a buffer or work on it in place, up to you.

When you're ready for the next line, you can copy the "rest" of the input over the current line (basically a left shift) and fill the rest of the buffer with data from the input. You then repeat this until you run out of data.

This can of course be optimized, with a circular buffer for example, but it should be more than sufficient for any reasonable I/O-bound algorithm.
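
The steps above could be sketched roughly as follows. This is only a minimal illustration, not a tuned implementation: the LineReader type, the next_line function, and the CHUNK size are hypothetical names chosen for this example, and each returned line is copied into its own malloc'd string that the caller frees.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { CHUNK = 4096 };   /* assumed initial buffer size; any value > 0 works */

/* Hypothetical helper type: data read from the file but not yet consumed. */
typedef struct {
    FILE  *fp;
    char  *buf;   /* pending data */
    size_t len;   /* number of valid bytes in buf */
    size_t cap;   /* allocated size of buf */
} LineReader;

/* Returns the next line as a malloc'd string without the '\n',
 * or NULL when the input is exhausted or an allocation fails. */
char *next_line(LineReader *r)
{
    char *nl = NULL;
    int at_eof = 0;

    /* Read chunks until the pending data contains a '\n' or input ends. */
    while ((nl = r->len ? memchr(r->buf, '\n', r->len) : NULL) == NULL && !at_eof) {
        if (r->len == r->cap) {                      /* buffer full: double it */
            size_t new_cap = r->cap ? r->cap * 2 : CHUNK;
            char *tmp = realloc(r->buf, new_cap);
            if (tmp == NULL)
                return NULL;
            r->buf = tmp;
            r->cap = new_cap;
        }
        size_t got = fread(r->buf + r->len, 1, r->cap - r->len, r->fp);
        if (got == 0)
            at_eof = 1;                              /* end of file or read error */
        r->len += got;
    }

    if (nl == NULL && r->len == 0)
        return NULL;                                 /* nothing left to return */

    /* The line runs from the start of the buffer up to the '\n' (or to the end). */
    size_t line_len = nl ? (size_t)(nl - r->buf) : r->len;
    char *line = malloc(line_len + 1);
    if (line == NULL)
        return NULL;
    memcpy(line, r->buf, line_len);
    line[line_len] = '\0';

    /* Left shift: move the rest of the data to the front of the buffer. */
    size_t used = nl ? line_len + 1 : line_len;
    memmove(r->buf, r->buf + used, r->len - used);
    r->len -= used;
    return line;
}
```

To use it, initialize a reader with something like LineReader r = { file, NULL, 0, 0 };, call next_line(&r) until it returns NULL, free each returned line, and free r.buf at the end.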

Blindy
+4  A: 

You can start with a buffer of some suitable size and then call realloc along the way whenever you need more space:

int CUR_MAX = 4095;
char *buffer = malloc(CUR_MAX); // allocate buffer (sizeof(char) is always 1).
int length = 0;
int ch;

// Read first, then test, so '\n' and EOF are never stored in the buffer.
while ( ((ch = getc(file)) != '\n') && (ch != EOF) ) {
    if (length + 1 == CUR_MAX) { // time to expand? (keep one byte for '\0')
      CUR_MAX *= 2; // expand to double the current size, or anything similar.
      buffer = realloc(buffer, CUR_MAX); // reallocate memory.
    }
    buffer[length] = ch; // stuff in buffer.
    length++;
}
buffer[length] = '\0'; // buffer now contains the line.
.
.
free(buffer);

You'll have to check for allocation errors after calls to malloc and realloc.
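
As a rough sketch of what those checks might look like, the loop can be wrapped in a function that reports failure by returning NULL. The function name read_line and the starting size of 128 are assumptions made for this example only. The result of realloc goes into a temporary pointer so the original block can still be freed if the call fails.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line of any length from `file`.
 * Returns a malloc'd string without the trailing '\n', or NULL if
 * there is nothing left to read or an allocation fails. */
char *read_line(FILE *file)
{
    size_t cur_max = 128;            /* assumed starting size */
    size_t length = 0;
    char *buffer = malloc(cur_max);
    int ch;

    if (buffer == NULL)              /* malloc failed */
        return NULL;

    while ((ch = getc(file)) != '\n' && ch != EOF) {
        if (length + 1 == cur_max) { /* full, keeping one byte for '\0' */
            char *tmp = realloc(buffer, cur_max * 2);
            if (tmp == NULL) {       /* realloc failed: old block is still valid */
                free(buffer);
                return NULL;
            }
            buffer = tmp;
            cur_max *= 2;
        }
        buffer[length++] = (char)ch;
    }

    if (ch == EOF && length == 0) {  /* end of input, no characters read */
        free(buffer);
        return NULL;
    }

    buffer[length] = '\0';
    return buffer;
}
```

The caller owns the returned string and should free it after use.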

codaddict
It's `realloc` not `relloc`.
Chinmay Kanchi
Just as a note, character-by-character reading is extremely slow. You should read it in big chunks (4-16k).
Blindy
@Blindy: premature optimisation...
Paul R
@Blindy: The standard library I/O does buffering, so this isn't (much) slower than reading in chunks.
jk
+1  A: 

You might want to look into Chuck B. Falconer's public domain ggets library. If you're on a system with glibc, you probably have a (non-standard) getline function available to you.
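
For illustration, here is a minimal sketch of how getline might be used, assuming a POSIX.1-2008 system. The longest_line function is a hypothetical example, not part of any library. getline allocates and grows the buffer itself, so the caller only has to free it once at the end.

```c
#define _POSIX_C_SOURCE 200809L   /* request POSIX.1-2008 declarations such as getline */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>            /* ssize_t */

/* Hypothetical example: returns the length of the longest line in `file`,
 * not counting the trailing '\n'. */
size_t longest_line(FILE *file)
{
    char *line = NULL;   /* getline allocates and resizes this buffer */
    size_t cap = 0;      /* current allocated size, maintained by getline */
    ssize_t n;           /* characters read, or -1 at EOF or on error */
    size_t longest = 0;

    while ((n = getline(&line, &cap, file)) != -1) {
        if (n > 0 && line[n - 1] == '\n')
            n--;         /* do not count the newline */
        if ((size_t)n > longest)
            longest = (size_t)n;
    }

    free(line);          /* one free covers every line read */
    return longest;
}
```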

jamesdlin
Nice! I believe I can trust most UNIX-like systems to have glibc installed, so this is definitely a great way to read in lines.
Moreover, `getline` was added to POSIX in POSIX.1-2008, so it *is* standard on Unix now. It is still not part of standard C *per se*, however.
dmckee