What would be the most efficient method of reading a text file into a dynamic one-dimensional array? Calling realloc after every character read seems silly, and calling it after every line doesn't seem much better. I would like to read the entire file into the array. How would you do it?

+8  A: 

I don't quite understand what you want. Do you want to process the file incrementally, reading one line, processing it, discarding it, and moving on to the next? Or do you want to read the entire file into a buffer? If the latter, I think this is appropriate (in real code, check the return values of fopen and malloc for NULL, which tells you whether the file could be opened and whether enough memory was available):

FILE *f = fopen("text.txt", "rb");
fseek(f, 0, SEEK_END);
long pos = ftell(f);
fseek(f, 0, SEEK_SET);

char *bytes = malloc(pos);
fread(bytes, pos, 1, f);
fclose(f);

hexdump(bytes); // do some stuff with it
free(bytes); // free allocated memory
Johannes Schaub - litb
Yes, that would apply to my case. I meant that calling realloc after each character read seems very inefficient, and likewise after every \n read (to extend the array).
diminish
great, i'm glad it helps you
Johannes Schaub - litb
You should open the file in binary mode - there might be problems otherwise (check eg. glibc manual, 12.17)
Christoph
oh, thanks. i had no idea that it makes *that* much of a difference.
Johannes Schaub - litb
On POSIX systems, it shouldn't. But I'm pretty sure I once stumbled upon a bug which went away after switching to binary mode - but I can't remember what the exact issue was...
Christoph
Re: binary-mode - Control-Z can cause trouble on Windows. General point: You could consider using 'stat()' or 'fstat()' to tell you the file size. Also, beware gargantuan files (larger than 2 GB); long may not work reliably.
Jonathan Leffler
The stat functions are not part of the C standard; fseek()/ftell() is the only way I know of to get the size of a file if you want to use ISO C.
Christoph
i'm also unaware of any other way to get the filesize in standard C. But i seriously doubt he's loading a whole file with 2^32 bytes in memory using that method
Johannes Schaub - litb
Hi, what is the difference between char *bytes = malloc(100*sizeof(char)); and char *bytes = malloc(100); as in the line above (assuming we use 100 instead of pos)? Second question: what if my file has 180205962 characters in it? Would reading the file this way still be efficient?
asel
@asel, first question: `sizeof(char)` is defined to be 1, so there is no difference. Second question: no, you probably should read it incrementally (like, line-by-line, or some other piecewise method). Otherwise, your memory will quickly become exhausted.
Johannes Schaub - litb
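For files too large to hold in memory comfortably, the piecewise approach mentioned in the last comment could look roughly like this. It is only a sketch: count_lines and the 4096-byte chunk size are made up for illustration, and counting lines stands in for whatever per-line processing is actually needed.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical example: counts the lines in `f` while holding only one
   fixed-size chunk in memory at a time. Lines longer than the chunk are
   delivered by fgets in several pieces and still counted once. */
size_t count_lines(FILE *f)
{
    char chunk[4096];          /* arbitrary chunk size */
    size_t lines = 0;
    int pending = 0;           /* 1 while inside a line with no '\n' seen yet */

    while (fgets(chunk, sizeof chunk, f) != NULL) {
        size_t n = strlen(chunk);
        if (n > 0 && chunk[n - 1] == '\n') {
            lines++;           /* a complete line ended in this chunk */
            pending = 0;
        } else {
            pending = 1;       /* line continues in the next chunk, or EOF */
        }
    }
    if (pending)
        lines++;               /* final line without a trailing newline */
    return lines;
}
```

Memory use stays constant regardless of file size, at the cost of only ever seeing one piece of the file at a time.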
A: 

The best way would be to pre-allocate a block of bytes and fread() up to that many. As long as the number of bytes actually read is > 0, you can grow the buffer by another block and continue reading, so the buffer is extended one block at a time. The block size is the important parameter for the performance of the algorithm.

FILE *file = fopen("...", "r");
if (file != NULL)
{
  const size_t block_size = 1024;
  unsigned char *buffer = malloc(block_size);
  size_t read_bytes = 0;
  size_t last_read;
  while ((last_read = fread(buffer + read_bytes, 1, block_size, file)) > 0)
  {
    read_bytes += last_read;
    unsigned char *buffer2 = malloc(read_bytes + block_size);
    memcpy(buffer2, buffer, read_bytes);
    free(buffer);
    buffer = buffer2;
  }

  // ... do something with read_bytes of buffer

  free(buffer);

  fclose(file);
}
rstevens
Could you explain the magic behind the block size?
diminish
It's a convenient number to reduce the number of realloc() (or, in this case, malloc()) calls. The code can be critiqued; using realloc() would be better since it might be able to extend the existing allocation instead of always allocating a new chunk. Etc.
Jonathan Leffler
It's also generally better to not grow your buffer linearly, but exponentially (eg. always double buffer size on overflow); this will save you realloc() calls and seems to work reasonably well in practice...
Christoph
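The two suggestions in the comments above (use realloc() instead of malloc() plus memcpy(), and grow the buffer geometrically) can be combined roughly as follows. This is a sketch rather than anyone's posted code: read_stream, the 4096-byte starting capacity, and the added terminating NUL are assumptions made for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads everything remaining in `f` into a heap
   buffer whose capacity doubles each time it fills up. On success the
   buffer is NUL-terminated, its length is stored in *out_len, and the
   caller must free() it. Returns NULL on allocation or read failure. */
char *read_stream(FILE *f, size_t *out_len)
{
    size_t cap = 4096;         /* arbitrary initial capacity */
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        /* Keep one byte in reserve for the terminating NUL. */
        len += fread(buf + len, 1, cap - len - 1, f);
        if (len < cap - 1)     /* short read: end of file or error */
            break;

        if (cap > (size_t)-1 / 2) {        /* doubling would overflow */
            free(buf);
            return NULL;
        }
        char *tmp = realloc(buf, cap * 2); /* may extend in place */
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }

    if (ferror(f)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    *out_len = len;
    return buf;
}
```

With doubling, a file of n bytes needs on the order of log n reallocations instead of n / block_size, and it also works on streams such as stdin where the size cannot be determined up front.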
+6  A: 

If mmap(2) is available on your system, you can open the file and map it into memory. That way you don't have to allocate any memory or even read the file yourself; the system does it for you. You can use the fseek() trick litb gave to get the size.

void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);
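
A minimal sketch of how a read-only mapping might be set up on a POSIX system. The helper name map_file is invented for this example, and it gets the size with fstat() rather than fseek(), simply because mmap() needs a file descriptor anyway; it also treats empty files as a failure, since a zero-length mapping is not allowed.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps the file at `path` read-only and stores its
   size in *out_len. Returns NULL on failure (including empty files).
   Release the mapping with munmap((void *)ptr, len).
   Note: the mapped bytes are NOT NUL-terminated. */
const char *map_file(const char *path, size_t *out_len)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                 /* the mapping remains valid after close() */
    if (p == MAP_FAILED)
        return NULL;

    *out_len = (size_t)st.st_size;
    return p;
}
```

Pages are loaded lazily as they are touched, so mapping a large file is cheap even if only part of it ends up being read.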
philippe
A: 

If you want to use ISO C, use this function.

It's litb's answer, wrapped with some error handling...
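
The function itself is not reproduced in this copy of the thread. As a stand-in, here is a sketch of what litb's fseek()/ftell() approach might look like with error handling added; it is not Christoph's code, and the name slurp, its signature, and the extra terminating NUL are assumptions made for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole file at `path` into a malloc'd,
   NUL-terminated buffer and stores the byte count in *out_len.
   Returns NULL if the file cannot be opened, sized, or read, or if
   memory runs out. The caller must free() the result. */
char *slurp(const char *path, size_t *out_len)
{
    char *buf = NULL;
    long size;
    FILE *f = fopen(path, "rb");   /* binary mode: no newline translation */
    if (f == NULL)
        return NULL;

    if (fseek(f, 0, SEEK_END) != 0)
        goto fail;
    size = ftell(f);               /* a long, so very large files may fail */
    if (size < 0)
        goto fail;
    if (fseek(f, 0, SEEK_SET) != 0)
        goto fail;

    buf = malloc((size_t)size + 1);
    if (buf == NULL)
        goto fail;
    if (fread(buf, 1, (size_t)size, f) != (size_t)size)
        goto fail;

    buf[size] = '\0';
    fclose(f);
    *out_len = (size_t)size;
    return buf;

fail:
    free(buf);                     /* free(NULL) is a no-op */
    fclose(f);
    return NULL;
}
```

A call such as char *text = slurp("text.txt", &len); then needs just one NULL check at the call site.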

Christoph
A: 

I need to read the table below from a .txt file and store it in an array in C, so that I can then print any element of the table using a for loop or some conditions. Any help?

   0  1  2  3  4  5
0  -- 1  2  1  1  1
1  0  -- 2  3  3  5
2  0  1  -- 3  3  3
3  1  1  2  -- 4  1
4  3  3  3  3  -- 5
5  1  1  1  1  4  --

radwan