I want to load a text file into an array, like file() does in PHP. I want to be able to access individual lines as array[N] (which should contain the entire line N from the file). Then I need to remove each array element after using it, so the array will decrease in size until it reaches 0 and the program finishes. I know how to read the file, but I have no idea how to fill a string array to be used as I described. I am using gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) to compile.

How can I achieve this?

A: 

I suggest you read your file into an array of pointers to strings, which would allow you to index and delete the lines as you have specified. There are efficiency tradeoffs to consider with this approach: whether you count the number of lines ahead of time, or allocate/extend the array as you read each line. I would opt for the former.

  1. Read the file, counting the number of line terminators you see (either \n or \r\n)
  2. Allocate an array of char * of that size
  3. Re-read the file, line by line, using malloc() to allocate a buffer for each line and storing its pointer at the next array index
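A minimal sketch of those three steps, assuming a seekable file positioned at its start and the POSIX 2008 getline() (which glibc provides), so lines of any length can be read. The names count_lines() and load_lines() are hypothetical. One small deviation from step 1: it also counts a final line that has no terminator, so that line is not lost.

```c
#define _POSIX_C_SOURCE 200809L /* for getline() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Pass 1: every '\n' ends one line, plus one more if the last line has
   no terminator. Reads from the current position to EOF. */
static size_t count_lines(FILE *fp)
{
    size_t count = 0;
    int c, prev = '\n';

    while ((c = getc(fp)) != EOF) {
        if (c == '\n')
            count++;
        prev = c;
    }
    if (prev != '\n')
        count++;
    return count;
}

/* Pass 2: rewind, then malloc() a copy of each line without its "\n" or
   "\r\n". Returns *num_lines pointers followed by a NULL entry, or NULL
   if the pointer array itself could not be allocated. */
char **load_lines(FILE *fp, size_t *num_lines)
{
    size_t n = count_lines(fp), i = 0;
    char **lines = malloc((n + 1) * sizeof *lines);
    char *buf = NULL;            /* getline() grows this as needed */
    size_t cap = 0;
    ssize_t len;

    if (lines == NULL)
        return NULL;
    rewind(fp);
    while (i < n && (len = getline(&buf, &cap, fp)) != -1) {
        if (len > 0 && buf[len - 1] == '\n')
            buf[--len] = '\0';
        if (len > 0 && buf[len - 1] == '\r')
            buf[--len] = '\0';
        lines[i] = malloc((size_t)len + 1);
        if (lines[i] == NULL)
            break;               /* out of memory: keep the lines read so far */
        memcpy(lines[i], buf, (size_t)len + 1);
        i++;
    }
    lines[i] = NULL;             /* sentinel after the last line */
    free(buf);
    *num_lines = i;
    return lines;
}
```

Each lines[i] is its own malloc() block, so it can be freed individually; free(lines) releases the array afterwards.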

For your operations:

  • Indexing is just array[N]
  • Deleting is just freeing the buffer pointed to by array[N] and setting the array[N] entry to NULL
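The delete operation can be sketched as a small hypothetical helper, delete_line(), that also keeps a count of the lines still present, since the question wants the program to stop once none are left. It assumes every non-NULL entry was allocated with malloc().

```c
#include <stdlib.h>

/* Hypothetical helper: "delete" line n by freeing its buffer and setting
   the slot to NULL, so later code can tell it is gone. *remaining tracks
   how many lines are still present. Returns 1 if a line was deleted,
   0 if n is out of range or the line was already deleted. */
int delete_line(char **lines, size_t num_lines, size_t n, size_t *remaining)
{
    if (n >= num_lines || lines[n] == NULL)
        return 0;
    free(lines[n]);
    lines[n] = NULL;
    (*remaining)--;
    return 1;
}
```

Processing then runs while remaining > 0; once it reaches 0, free(lines) releases the array itself.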

UPDATE:

The more memory-efficient approach suggested by @r.. and @marc-van-kempen is a good optimization over malloc()ing one line at a time: slurp the file into a single buffer and replace all the line terminators with '\0'.

Assuming you've done that, and you have the big buffer in char *filebuf and the number of lines in int num_lines, you can allocate your indexing array something like this:

char **lines = malloc((num_lines + 1) * sizeof *lines); // Array of num_lines + 1 pointers to strings (needs <stdlib.h>)
if (lines == NULL) { /* handle out-of-memory */ }
lines[num_lines] = NULL; // Terminate the array as another way to stop you running off the end

char *p = filebuf; // I'm assuming the first char of the file is the start of the first line
int n;
for (n = 0; n < num_lines; n++) {
  lines[n] = p;
  while (*p != '\0') p++; // Seek to the '\0' that ends this line
  if (n < num_lines - 1) {
    // Skip the '\0' bytes (two for "\r\n") to the start of the next line.
    // Note: this also skips empty lines, so num_lines must not count them.
    while (*p == '\0') p++;
  }
}

With the single-buffer approach, "deleting" a line is merely a case of setting lines[n] to NULL. There is no per-line free(); you free() the buffer and the lines array once, when you are done.

bjg
Or just read the file into a buffer and then do your work off it. It uses more RAM, but could be faster because the I/O is only done once; that depends on whether the OS buffers the first read for you or not.
Michael Dorgan
Where can I read some examples of arrays of pointers? I understand that the number of lines would define the size of the main array, but I don't know how to manage each array element.
jahmax
+2  A: 

Proposed algorithm:

  1. Use fseek(), ftell(), fseek() to seek to the end, determine the file length, and seek back to the beginning.
  2. malloc a buffer big enough for the whole file plus null-termination.
  3. Use fread to read the whole file into the buffer, then write a 0 byte at the end.
  4. Loop through the buffer byte-by-byte and count newlines.
  5. Use malloc to allocate that number + 1 char * pointers.
  6. Loop through the buffer again, assigning the first pointer to point to the beginning of the buffer, and successive pointers to point to the byte after a newline. Replace the newline bytes themselves with 0 (null) bytes in the process.
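A minimal sketch of the six steps, assuming a seekable stream on which ftell() reports a byte count (e.g. a file opened in binary mode); the name slurp_lines() is hypothetical. Following the answer's interpretation, n newlines give n + 1 lines, so a trailing newline yields an empty last line.

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns count + 1 line pointers into one buffer, stores how many in
   *num_lines and the buffer in *buf_out. Free with free(lines) and
   free(*buf_out). Returns NULL on error. */
char **slurp_lines(FILE *fp, size_t *num_lines, char **buf_out)
{
    long size;
    size_t len, count = 0, i, n = 0;
    char *buf, **lines;

    /* 1. seek to the end, get the length, seek back */
    if (fseek(fp, 0, SEEK_END) != 0 || (size = ftell(fp)) < 0 ||
        fseek(fp, 0, SEEK_SET) != 0)
        return NULL;

    /* 2. buffer for the whole file plus a terminating 0 byte */
    buf = malloc((size_t)size + 1);
    if (buf == NULL)
        return NULL;

    /* 3. read it all and write the 0 byte at the end */
    len = fread(buf, 1, (size_t)size, fp);
    buf[len] = '\0';

    /* 4. count newlines */
    for (i = 0; i < len; i++)
        if (buf[i] == '\n')
            count++;

    /* 5. count + 1 pointers */
    lines = malloc((count + 1) * sizeof *lines);
    if (lines == NULL) {
        free(buf);
        return NULL;
    }

    /* 6. first pointer at the start; each newline becomes 0 and the
       next pointer points to the byte after it */
    lines[n++] = buf;
    for (i = 0; i < len; i++) {
        if (buf[i] == '\n') {
            buf[i] = '\0';
            lines[n++] = &buf[i + 1];
        }
    }
    *num_lines = n;
    *buf_out = buf;
    return lines;
}
```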

One optimization: if you don't need random access to the lines (indexing them by line number), do away with the pointer array and just replace all the newlines with 0 bytes. Then s+=strlen(s)+1; advances to the next line. You'll need to add some check to make sure you don't advance past the end (or beginning if you're doing this in reverse) of the buffer.
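A sketch of the pointer-free variant, assuming the newlines have already been replaced with 0 bytes and that end points at the terminating 0 byte written after the text (buf + length). The end pointer is the bounds check mentioned above; for_each_line() and its callback are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Visit every line of a buffer whose newlines are already 0 bytes.
   `end` points at the final terminator (buf + text length).
   Returns the number of lines visited. */
size_t for_each_line(char *buf, const char *end, void (*visit)(const char *line))
{
    size_t count = 0;
    char *s;

    for (s = buf; s <= end; s += strlen(s) + 1) {  /* advance past this line and its 0 */
        visit(s);
        count++;
    }
    return count;
}
```

If the text ended with a newline, the last call receives the empty string at end, matching the "zero-length last line" interpretation below.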

Either way, this method is very efficient (no memory fragmentation) but has a few drawbacks:

  • You can't individually free lines; you can only free the whole buffer once you finish.
  • You have to overwrite the newlines. Some people prefer to have them kept in the in-memory structure.
  • If the file ended with a newline, the last "line" in your pointer array will be zero-length. IMO this is the sane interpretation of text files, but some people prefer considering the empty string after the last newline a non-line and considering the last proper line "incomplete" if it doesn't end with a newline.
R..
+1  A: 

There are two slightly different ways to achieve this: one is more memory-friendly, the other more CPU-friendly.

I. Memory friendly

  1. Open the file and get its size (use fstat() and friends) ==> size
  2. Allocate a buffer of that size and read the file into it ==> char buf[size]; (for large files, malloc() it instead of putting it on the stack)
  3. Scan through the buffer counting the line terminators, '\n' (or "\r\n" == DOS, or '\r' == classic Mac) ==> N
  4. Allocate an array: char *lines[N]
  5. Scan through the buffer again and point lines[0] to &buf[0], scan for the first '\n' or '\r' and set it to '\0' (delimiting the string), set lines[1] to the first character after it that is not '\n' or '\r', and so on.
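A sketch of the memory-friendly approach, assuming POSIX fstat() and fileno() and a stream positioned at its start. It uses a heap buffer instead of char buf[size], and it is a variant of step 5: "\r\n", "\r" and "\n" each end exactly one line, so empty lines are kept rather than skipped. The name index_lines() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L /* for fileno() */
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Returns a NULL-terminated array of pointers into one buffer, stores the
   line count in *num_lines and the buffer in *buf_out (free both when
   done). Returns NULL on error. */
char **index_lines(FILE *fp, size_t *num_lines, char **buf_out)
{
    struct stat st;
    size_t len, i, n = 0, count = 0;
    char *buf, **lines;

    if (fstat(fileno(fp), &st) != 0)            /* 1. file size */
        return NULL;
    buf = malloc((size_t)st.st_size + 1);       /* 2. heap buffer + room for '\0' */
    if (buf == NULL)
        return NULL;
    len = fread(buf, 1, (size_t)st.st_size, fp);
    buf[len] = '\0';

    for (i = 0; i < len; i++) {                 /* 3. count terminators */
        if (buf[i] == '\n' || buf[i] == '\r') {
            count++;
            if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
                i++;                            /* "\r\n" counts once */
        }
    }
    if (len > 0 && buf[len - 1] != '\n' && buf[len - 1] != '\r')
        count++;                                /* unterminated last line */

    lines = malloc((count + 1) * sizeof *lines); /* 4. +1 for a NULL entry */
    if (lines == NULL) {
        free(buf);
        return NULL;
    }

    for (i = 0; i < len; ) {                    /* 5. point, then terminate */
        lines[n++] = &buf[i];
        while (i < len && buf[i] != '\n' && buf[i] != '\r')
            i++;
        if (i < len) {
            if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
                buf[i++] = '\0';
            buf[i++] = '\0';
        }
    }
    lines[n] = NULL;
    *num_lines = n;
    *buf_out = buf;
    return lines;
}
```

If data was just written through the same FILE, fflush() it before calling, since fstat() only sees what has reached the file.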

II. CPU friendly

  1. Create a linked list structure (if you don't know how to do this or don't want to, have a look at 'glib' (not glibc!), a utility companion library of GTK).
  2. Open the file and start reading the lines using fgets(), malloc'ing each line as you go along.
  3. Keep a linked list of lines ==> list and count the total number of lines ==> N
  4. Allocate an array: char *lines[N];
  5. Go through the linked list and assign each node's line pointer to the corresponding array element
  6. Free the linked list (not its elements!)
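A sketch of the CPU-friendly approach with a hand-rolled singly linked list in place of glib. It assumes no line is longer than LINE_BUF - 1 characters (a hypothetical limit; fgets() would split longer lines into several entries). Nodes are prepended, so the array is filled from the back to restore file order; read_lines_list() is a hypothetical name.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LINE_BUF 4096            /* assumption: lines are shorter than this */

struct node {
    char *line;
    struct node *next;
};

/* Read lines with fgets(), malloc() each onto a list, copy the pointers
   into a NULL-terminated array, then free only the list nodes. Each line
   must later be freed individually, then the array. */
char **read_lines_list(FILE *fp, size_t *num_lines)
{
    char buf[LINE_BUF];
    struct node *head = NULL, *nd;
    size_t count = 0, i;
    char **lines;

    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t len = strcspn(buf, "\r\n");   /* strip "\n" or "\r\n" */
        buf[len] = '\0';
        nd = malloc(sizeof *nd);
        if (nd == NULL || (nd->line = malloc(len + 1)) == NULL) {
            free(nd);
            break;                           /* out of memory: keep what we have */
        }
        memcpy(nd->line, buf, len + 1);
        nd->next = head;                     /* prepend: list is in reverse order */
        head = nd;
        count++;
    }

    lines = malloc((count + 1) * sizeof *lines);
    for (i = count; head != NULL; head = nd) {
        nd = head->next;
        if (lines != NULL)
            lines[--i] = head->line;         /* fill from the back */
        else
            free(head->line);
        free(head);                          /* free the node, not its line */
    }
    if (lines != NULL)
        lines[count] = NULL;
    *num_lines = lines != NULL ? count : 0;
    return lines;
}
```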
Marc van Kempen