Hey everyone,

In C, is there a way to read a text file line by line without knowing how much space to allocate for it?

Here's an example of what I mean:

fgets(line, <dynamic line size>, fileHandle);

Thanks for the help!

+2  A: 

Nothing automatic. You need to keep growing your buffer and calling fgets until you get the newline or the EOF.

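// NOTE: not production ready, as it does not handle memory allocation failures
size_t alloced = 128;
char *p = malloc(alloced);
char *walk = p;              // where the next chunk will be written
size_t to_read = alloced;    // space available starting at walk

p[0] = '\0';                 // empty string if the first fgets hits EOF

for (;;)
{
    size_t len;
    size_t used;
    if (fgets(walk, to_read, fp) == NULL)
        break;               // EOF or read error

    len = strlen(walk);
    if (len > 0 && walk[len - 1] == '\n')
        break;               // got the whole line

    // No newline yet: double the buffer and continue reading where the
    // previous chunk ended, overwriting its terminating NUL.
    used = (size_t)(walk - p) + len;
    alloced *= 2;

    p = realloc(p, alloced);
    walk = p + used;
    to_read = alloced - used;
}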
R Samuel Klatchko
This fails on lines with embedded NULs.
Matthew Flaschen
Text doesn't have embedded nuls
nos
How so, @nos? Yes, C strings use NUL terminators. However, NUL is a valid Unicode and ASCII character, and encodings of it usually contain embedded NUL bytes.
Matthew Flaschen
@MatthewFlaschen - agreed, but fgets() by itself does not work well for lines with embedded NULs (no way to distinguish between a NUL read from the fp vs the NUL written at the end). So if your code is okay with using fgets, it should be fine to use this technique.
R Samuel Klatchko
A: 

Not directly.

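To solve this, you'll have to be prepared for fgets returning only part of a line when the buffer isn't big enough. fgets does not fail in that case; it fills the buffer and returns it without a trailing newline (it returns NULL only at EOF or on a read error). Start by mallocing line to a reasonable initial size (256 chars, say), then realloc to twice that size every time the data read so far does not end in '\n', and call fgets again to read the rest.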
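
A minimal sketch of that grow-and-retry loop, with the allocation checks included. The name read_line_fgets and the 256-byte starting size are illustrative assumptions, not part of any library. A too-small buffer shows up as data with no trailing '\n', so that is the condition the sketch uses to decide when to double the buffer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd line (newline kept), or NULL at
 * EOF with nothing read, on a read error, or on allocation failure.
 * The caller must free() the result. */
char *read_line_fgets(FILE *fp)
{
    size_t cap = 256;          /* initial guess; doubled as needed */
    size_t len = 0;            /* characters stored so far */
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;

    /* fgets returns NULL only at EOF or on a read error. */
    while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n')
            return buf;        /* complete line */

        /* No newline yet: double the buffer and read the next chunk. */
        char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }

    /* EOF: keep a final line that lacks a trailing newline. */
    if (len > 0 && !ferror(fp))
        return buf;
    free(buf);
    return NULL;
}
```

A caller would loop with something like `while ((line = read_line_fgets(fp)) != NULL) { ...; free(line); }`.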

JSBangs
+1  A: 

If you have glibc or another libc that supports POSIX (2008), you can use getline:

ssize_t getline(char **lineptr, size_t *n, FILE *stream);

getline() reads an entire line from stream, storing the address of the buffer containing the text into *lineptr. The buffer is null-terminated and includes the newline character, if one was found.

If *lineptr is NULL, then getline() will allocate a buffer for storing the line, which should be freed by the user program. (The value in *n is ignored.)
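
As a rough usage sketch, assuming a POSIX.1-2008 libc: the helper name count_lines is made up for illustration. It passes a NULL buffer on the first call, lets getline grow that buffer across iterations, and frees it once at the end.

```c
#define _POSIX_C_SOURCE 200809L  /* expose getline() in <stdio.h> */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>           /* ssize_t */

/* Hypothetical helper: returns the number of lines in fp and stores the
 * length of the longest one (newline excluded) in *longest. */
size_t count_lines(FILE *fp, size_t *longest)
{
    char *line = NULL;   /* getline allocates and grows this buffer */
    size_t cap = 0;      /* capacity of line, maintained by getline */
    ssize_t n;           /* characters read, or -1 at EOF or on error */
    size_t count = 0;

    *longest = 0;
    while ((n = getline(&line, &cap, fp)) != -1) {
        if (n > 0 && line[n - 1] == '\n')
            n--;                         /* do not count the newline */
        if ((size_t)n > *longest)
            *longest = (size_t)n;
        count++;
    }

    free(line);          /* one free for the reused buffer */
    return count;
}
```

Because the same buffer is reused for every line, copy the contents if you need to keep a line past the next call.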

kaizer.se
This is [standard POSIX](http://www.opengroup.org/onlinepubs/9699919799/functions/getline.html), and it's supported by e.g. [FreeBSD](http://www.unix.com/man-page/FreeBSD/3/getline/) and [NetBSD](http://netbsd.gw.com/cgi-bin/man-cgi?getline+3+NetBSD-current).
Matthew Flaschen
A: 

For your 'dynamic line size', pick the largest buffer you are willing to use. If a line does not fit, process the part you have read and keep reading until you reach the end of the line. Use strlen to check whether the last character read is a newline, which tells you that you've read an entire line.

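void ProcessFile( FILE *fp )
{
    size_t len;
    int midLine = 0;   /* nonzero while a partially read line is pending */
    char lineBuf[ MAX_SIZE ];

    /* Each call returns at most MAX_SIZE - 1 characters of the current line. */
    while( fgets( lineBuf, MAX_SIZE, fp ) != NULL )
    {
        fputs( lineBuf, stdout );
        len = strlen( lineBuf );

        /* A trailing newline means the line is complete. */
        midLine = ( len == 0 || lineBuf[len-1] != '\n' );
        if( !midLine )
            puts( "A line has been processed!" );
    }

    if( midLine )      /* the final line had no trailing newline */
        puts( "A line has been processed!" );
}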
maxwellb
+1  A: 

Basically, you should allocate a temporary buffer of arbitrary size. Then scan the input for a newline character, filling the buffer with the characters you read. If the buffer fills up, allocate a new, larger buffer, copy the old contents into it, and free the old buffer.

The GLib library has a g_io_channel_read_line function that does that for you.
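
Without GLib, the character-by-character scan described above could look roughly like this. The name read_line_fgetc and the 64-byte starting size are assumptions for illustration, and realloc stands in for the allocate, copy, and free steps.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'd line with the newline dropped,
 * or NULL at EOF with nothing read or on allocation failure.
 * The caller must free() the result. */
char *read_line_fgetc(FILE *fp)
{
    size_t cap = 64;     /* arbitrary starting size */
    size_t len = 0;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;

    while ((c = fgetc(fp)) != EOF && c != '\n') {
        if (len + 1 == cap) {            /* keep one byte for the NUL */
            /* realloc allocates the larger buffer, copies, and frees. */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }

    if (c == EOF && len == 0) {          /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Doubling the capacity keeps the number of reallocations logarithmic in the line length.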

el.pescado
A: 
char *myGetLine(FILE *pFile)
{
  //Allocate a chunk of memory.
  //Read a chunk from the file.
  //While we do not have a full line, reallocate a bigger chunk of memory and get the next chunk from the file.
  //NOTE: No malloc()/realloc() error checking is done here.
  //NOTE: Each call allocates a chunk of memory that the user must free().

  const int bufIncrSize = 128;   //or whatever increment you like
  int bufSize = bufIncrSize;
  char *pLine = (char *)malloc(bufIncrSize);
  pLine[0] = '\0';  //make it an empty string

  //while not EOF
  while (fgets(&pLine[strlen(pLine)], bufIncrSize, pFile) != NULL) {
    // If we got the newline, then we have the whole line
    if (pLine[strlen(pLine) - 1] == '\n')
      break;

    //else get a bigger buffer and try again
    bufSize += bufIncrSize;
    pLine = (char *)realloc(pLine, bufSize);
  }

  return pLine;  //NOTE the user is responsible for freeing the line buffer
}
Scott Thomson
A: 

You would read a chunk of the line at a time into a fixed-sized buffer, and then copy the contents of that fixed-sized buffer into a dynamically allocated and resizable buffer:

#include <string.h>
#include <stdio.h>
#include <stdlib.h>

#define SIZE ... // some reasonable size to handle most cases

int getNextLine(FILE *stream, char **line, size_t *lineLength)
{
  char inbuf[SIZE];
  int done = 0;
  int rval = 1; // success

  *lineLength = 0;

  /**
   * If *line is not NULL, it is assumed that it was allocated on a
   * previous call to getNextLine.  Free it and set to NULL.
   */
  if (*line != NULL)
  {
    free(*line);
    *line = NULL;
  }

  while(!done)
  {
    char *tmp;

    if (fgets(inbuf, sizeof inbuf, stream))
    {
      /**
       * Check for newline character.  If present, clear it and set the
       * done flag to true.
       */
      char *newline = strchr(inbuf, '\n');
      if (newline != NULL)
      {
        *newline = 0;
        done = 1;
      }

      /**
       * Extend the dynamic buffer by the length of the input string
       * and copy the input string to it. 
       */
      tmp = realloc(*line, *lineLength + strlen(inbuf) + 1);
      if (tmp)
      {
        *line = tmp;
        (*line)[*lineLength] = 0;      
        strcat(*line, inbuf);        
        *lineLength += strlen(inbuf);  // count characters only, not the terminator
      }
      else
      {
        printf("Error allocating or extending buffer\n");
        rval = 0;
        done = 1;
      }
    }
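    else
    {
      if (feof(stream))
      {
        /**
         * A final line without a trailing newline is still a line;
         * report EOF only when nothing was read on this call.
         */
        if (*lineLength == 0)
        {
          printf("At end-of-file\n");
          rval = EOF;
        }
      }
      else
      {
        printf("Error during read\n");
        rval = 0;
      }
      done = 1;
    }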
  }
  return rval;
}

int main(void)
{
  char *line = NULL;     // line *MUST* be initialized to NULL
  size_t lineLength = 0;
  int status;

  for (;;)
  {
    status = getNextLine(stdin, &line, &lineLength);
    if (status == 0 || status == EOF)
      break;

    printf("Read %lu characters in line: \"%s\"\n", 
      (unsigned long) lineLength, line);
  }
  free(line);            // release the buffer left over from the last call
  return 0;
}
John Bode