I am developing a database-like application that stores a structure containing:

struct Dictionary
{
    char *key;
    char *value;

    struct Dictionary *next;
};

As you can see, I am using a linked list to store information. But the problem begins when the user exits out of the program. I want the information to be stored somewhere. So I was thinking of storing the linked list in a permanent or temporary file using fopen, then, when the user starts the program, retrieve the linked list. Here is the method that prints the linked list to the console:

void PrintList()
{
    int count = 0;
    struct Dictionary *current;

    current = head;

    if (current == NULL)
    {
        printf("\nThe list is empty!");
        return;
    }

    printf("    Key \t  Value\n");
    printf("  ======== \t ========\n");

    while (current != NULL)
    {
        count++;
        printf("%d.  %s \t %s\n", count, current->key, current->value);
        current = current->next;
    }
}

So I am thinking of modifying this method to print the information through fprintf instead of printf, and then the program would just read the information back from the file. Could someone help me with how to read and write this file? What kind of file should it be, temporary or regular? How should I format the file (I was thinking of just having the key first, then the value, then a newline character)?

+2  A: 

The file should probably be regular. A temp file isn't guaranteed to be there the next time you start your application. Also, your format there looks fine for humans, not so fine for machines. I'd recommend either creating your own binary file format or using XML (or maybe JSON?). You could probably format it pretty easily like

key1\0value1\0key2\0value2\0....

I'll write a quick example in pseudo-ish code:

//To write...
Dictionary *this=begin_list;
while(this!=null){
  for(i=0;i<strlen(this->key);i++){
    write_byte(this->key[i]);
  }
  for(i=0;i<strlen(this->value);i++){
    write_byte(this->value[i]);
  }
  this=this->next;
}

//to read...
Dictionary *prev=null;
Dictionary *this;
char *buffer;
while(!eof){
  buffer=malloc(MAX_STRING_LEN);
  int i=0;
  this=malloc(sizeof(Dictionary));
  while(i<MAX_STRING_LEN){ //note no error checking
    buffer[i]=read_byte();
    if(buffer[i]==0){
      break;
    }
    i++;
  }
  this->key=buffer;
  buffer=malloc(MAX_STRING_LEN);
  i=0; //start the value at the beginning of its own buffer
  while(i<MAX_STRING_LEN){ //note no error checking
    buffer[i]=read_byte();
    if(buffer[i]==0){
      break;
    }
    i++;
  }
  this->value=buffer;
  if(prev!=null){
    prev->next=this;
  }
  this->next=null;
  prev=this;
}

I know it's a poor example. I think scanf or similar may make the job a ton easier, but my C skills are getting rusty.
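For reference, here is a minimal C sketch of the same key1\0value1\0... layout using fwrite and fgetc. The names save_list, load_list and read_string are made up for this illustration, and MAX_STRING_LEN is set to an assumed 256; it writes the terminating '\0' after every string and allocates only as much memory as each string needs. Treat it as a starting point rather than a finished serializer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STRING_LEN 256  /* assumed limit, including the '\0' */

struct Dictionary {
    char *key;
    char *value;
    struct Dictionary *next;
};

/* Write key\0value\0 for every node. Returns 0 on success, -1 on error. */
int save_list(const struct Dictionary *head, const char *path)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    for (; head != NULL; head = head->next) {
        /* strlen + 1 so the terminating '\0' is written as the separator */
        size_t klen = strlen(head->key) + 1;
        size_t vlen = strlen(head->value) + 1;
        if (fwrite(head->key, 1, klen, fp) != klen ||
            fwrite(head->value, 1, vlen, fp) != vlen) {
            fclose(fp);
            return -1;
        }
    }
    return fclose(fp) == 0 ? 0 : -1;
}

/* Read one '\0'-terminated string. Returns a malloc'd copy,
   or NULL at end of file, on a too-long string, or if malloc fails. */
static char *read_string(FILE *fp)
{
    char buf[MAX_STRING_LEN];
    size_t i = 0;
    int c;
    while ((c = fgetc(fp)) != EOF) {
        buf[i++] = (char)c;
        if (c == '\0') {
            char *s = malloc(i);        /* exactly as large as needed */
            if (s != NULL)
                memcpy(s, buf, i);
            return s;
        }
        if (i == MAX_STRING_LEN)
            return NULL;                /* no terminator within the limit */
    }
    return NULL;                        /* end of file before a terminator */
}

/* Rebuild the list in file order. Returns the head, or NULL if nothing was read. */
struct Dictionary *load_list(const char *path)
{
    struct Dictionary *head = NULL, **tail = &head;
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;
    for (;;) {
        char *key = read_string(fp);
        if (key == NULL)
            break;
        char *value = read_string(fp);
        struct Dictionary *node = malloc(sizeof *node);
        if (value == NULL || node == NULL) {
            free(key);
            free(value);
            free(node);
            break;
        }
        node->key = key;
        node->value = value;
        node->next = NULL;
        *tail = node;                   /* append, keeping file order */
        tail = &node->next;
    }
    fclose(fp);
    return head;
}
```

Calling save_list(head, "dictionary.dat") before exit and head = load_list("dictionary.dat") at startup would be the intended use; the file name is only an example.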

Earlz
How would I read the XML in C?
Mohit Deshpande
@Mohit. Google it. I've really never tried it.
Earlz
@Mohit, you may also want to look into the BSON file format, which is supposedly just a binary JSON format used by MongoDB (a key-value-ish/document database, so it sounds right)
Earlz
If I store it in a regular file, how would I read only the keys and values?
Mohit Deshpande
@Mohit You would open it, read the file (one byte at a time if you wanna be simple) until `\0`, and then store those bytes as key1. Do the same thing for value1, and repeat for keyX and valueX
Earlz
A quick code sample would help! If you can.
Mohit Deshpande
@Mohit there you go
Earlz
Thanks! That cleared it up a lot.
Mohit Deshpande
What is the read_byte function??? And what is MAX_STRING_LEN?
Mohit Deshpande
@Mohit, well I just aimed for pseudocode. MAX_STRING_LEN should be some macro you define for the maximum length of either a key or a value, and read_byte can be replaced by `fgetc` on a file you opened with `fopen`.
Earlz
This code sample has a couple of problems. When writing, your loop index needs to be checked against the string length *and* against MAX_STRING_LEN (in case a string isn't terminated or is too long). Second, the write loops are writing out the contents of the strings but aren't adding '\0' characters between them. Also, your restore function wastes a lot of memory. After reading in a string, `realloc` your buffer down to the correct size. If you are always going to use max length buffers, then the first suggestion in my answer would be simpler.
bta
@bta I wasn't going to make a production quality linked-list serializer. I left that as an exercise to the reader. I wouldn't have even put any code if it wasn't requested. Anyone who copies and pastes this code into their production project is an idiot.
Earlz
+1  A: 

One way you can read or write to the file is using freopen like this: freopen("file.out", "wt", stdout). Then your printf calls will go to file.out and you will not need to modify the code a lot.

You can store the information in plain text, but I really think the best way to do this is to save the info in a binary file. To learn more about that, look up fread and fwrite.
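For the plain-text option, here is a minimal sketch that writes through an explicit FILE * with fprintf instead of redirecting stdout, and reads the lines back with fgets. The function names, the "key<TAB>value" line format and the 512-byte line limit are assumptions made for this example; it only works if keys and values never contain a tab or a newline.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct Dictionary {
    char *key;
    char *value;
    struct Dictionary *next;
};

/* Write one "key<TAB>value" line per node to an already open stream. */
int write_list_text(const struct Dictionary *head, FILE *out)
{
    assert(out != NULL);
    for (; head != NULL; head = head->next)
        if (fprintf(out, "%s\t%s\n", head->key, head->value) < 0)
            return -1;
    return 0;
}

/* Duplicate a string with malloc (strdup is not in ISO C before C23). */
static char *copy_string(const char *s)
{
    char *p = malloc(strlen(s) + 1);
    if (p != NULL)
        strcpy(p, s);
    return p;
}

/* Read the lines back and rebuild the list in file order.
   Lines without a tab are skipped. */
struct Dictionary *read_list_text(FILE *in)
{
    char line[512];                          /* assumed maximum line length */
    struct Dictionary *head = NULL, **tail = &head;

    assert(in != NULL);
    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\n")] = '\0';    /* strip the trailing newline */
        char *tab = strchr(line, '\t');
        if (tab == NULL)
            continue;
        *tab = '\0';                         /* split into key and value */

        struct Dictionary *node = malloc(sizeof *node);
        if (node == NULL)
            break;
        node->key = copy_string(line);
        node->value = copy_string(tab + 1);
        node->next = NULL;
        if (node->key == NULL || node->value == NULL) {
            free(node->key);
            free(node->value);
            free(node);
            break;
        }
        *tail = node;
        tail = &node->next;
    }
    return head;
}
```

The caller opens the file with fopen(name, "w") before saving and fopen(name, "r") before loading, and closes it afterwards.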

vladv
+1 vladv, I agree with the concepts. The other way to do it is to use fprintf(stream, "format-string", ...args...) on a FILE * returned by fopen, which is probably what I'd do to avoid redirecting stdout.
Ninefingers
+1  A: 

The fundamental issue is that pointers do not translate to external storage. There is no guarantee that when your program executes again, it will have the same memory ranges (addresses). Given this principle, there are alternative methods to storing your data.

Processes for persistent data:
1. Use a database, small or large.
2. Convert your data into ASCII text in a scannable format.
3. Use fixed length binary records.
4. Use variable sized binary records.
5. Implement a dictionary data structure using file offsets instead of pointers.

Using A Database
Let a professional application (that has been tested and works) manage your data. This lets you concentrate on using the data rather than on storage and retrieval.

Convert to a scannable format
The idea here is to write the data to the file in a format that is easy to retrieve and maintain. Examples include Comma Separated Values (CSV), XML and INI. This requires code on your part to read and write the data. There are libraries to assist.

Use fixed length binary records
With fixed length records, the data is read from the file and inserted into your dictionary. Binary files are very efficient as far as transferring data, but not very portable, especially when operating system versions change, platforms change or compiler versions change. Space is wasted whenever a text field is shorter than its fixed length.

Use variable sized binary records
This technique saves space but increases processing time. Each record must be processed in order to find the location of the next one. Random access to records is difficult. Otherwise similar to fixed length binary records.

Implement a dictionary data structure in the file
Same algorithm as your memory based data structure except uses file offsets instead of pointers. New records can be appended to the end of the file. Reclaiming deleted entries is difficult and will lead to fragmentation. Fragmentation can be resolved by writing a new file. If you are going through this much effort, you might as well use an existing database application.
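As a rough illustration of the file-offset idea, the sketch below stores fixed-size records whose next field is a file offset rather than a pointer. Everything in it (struct FileRecord, append_record, find_value, the field sizes) is hypothetical, and it assumes the file was opened in binary update mode such as "w+b" or "r+b" on a platform where fseek with SEEK_END works for binary files.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define KEY_LEN   32   /* assumed fixed field sizes */
#define VALUE_LEN 64

struct FileRecord {
    long next;                 /* file offset of the next record, -1 at the tail */
    char key[KEY_LEN];
    char value[VALUE_LEN];
};

/* Append a record at the end of the file and link the previous tail to it.
   Pass prev_offset = -1 for the first record.
   Returns the new record's offset, or -1 on error. */
long append_record(FILE *fp, long prev_offset, const char *key, const char *value)
{
    struct FileRecord rec;

    assert(fp != NULL);
    memset(&rec, 0, sizeof rec);             /* zero fill keeps strings terminated */
    rec.next = -1;
    strncpy(rec.key, key, KEY_LEN - 1);
    strncpy(rec.value, value, VALUE_LEN - 1);

    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1;
    long offset = ftell(fp);
    if (offset < 0 || fwrite(&rec, sizeof rec, 1, fp) != 1)
        return -1;

    if (prev_offset >= 0) {
        /* 'next' is the first member, so it sits exactly at prev_offset */
        if (fseek(fp, prev_offset, SEEK_SET) != 0 ||
            fwrite(&offset, sizeof offset, 1, fp) != 1)
            return -1;
    }
    return offset;
}

/* Follow the 'next' offsets starting at 'offset' until 'key' is found.
   Copies the value into 'out' and returns 0, or returns -1 if not found. */
int find_value(FILE *fp, long offset, const char *key, char *out, size_t out_size)
{
    struct FileRecord rec;

    assert(fp != NULL && out != NULL && out_size > 0);
    while (offset >= 0) {
        if (fseek(fp, offset, SEEK_SET) != 0 ||
            fread(&rec, sizeof rec, 1, fp) != 1)
            return -1;
        if (strcmp(rec.key, key) == 0) {
            strncpy(out, rec.value, out_size - 1);
            out[out_size - 1] = '\0';
            return 0;
        }
        offset = rec.next;
    }
    return -1;
}
```

Deleting records and reusing their space is left out on purpose; that is where the fragmentation problem described above comes in.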

Thomas Matthews
I really pray he is storing the string's data in a file and not the string's actual pointer. But I've seen crazier.
Earlz
A: 

Here's one way to solve the problem.

Create a data structure for your list items like this:

struct DictionaryArchive {
    char key[MAX_KEY_LENGTH];
    char value[MAX_VALUE_LENGTH];
    int next;
};

You will need to determine the values of MAX_KEY_LENGTH and MAX_VALUE_LENGTH according to the data that you will be expecting.

Now, convert your linked list into an array of these structures. Instead of storing a pointer for locating the next item, you will store the array index of the next item. This converts your list into a format where each element is a predictable size and your entire list is one consecutive span of memory. Now, you can fwrite this array to a binary file to archive it, and fread it back out to restore it.
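A sketch of that conversion might look like the following. The helper names, the field sizes, the use of -1 as the "no next item" index, and the element count written at the start of the file (so the reader knows how much to allocate) are all assumptions of this example, not requirements of the approach. Keys or values longer than the fixed buffers are silently truncated here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_KEY_LENGTH   32   /* assumed sizes; pick them to fit your data */
#define MAX_VALUE_LENGTH 64

struct Dictionary {
    char *key;
    char *value;
    struct Dictionary *next;
};

struct DictionaryArchive {
    char key[MAX_KEY_LENGTH];
    char value[MAX_VALUE_LENGTH];
    int next;                  /* array index of the next item, -1 at the end */
};

/* Flatten the list into a calloc'd array; *count receives the element count. */
struct DictionaryArchive *list_to_array(const struct Dictionary *head, size_t *count)
{
    assert(count != NULL);
    size_t n = 0;
    for (const struct Dictionary *p = head; p != NULL; p = p->next)
        n++;
    *count = n;
    if (n == 0)
        return NULL;

    struct DictionaryArchive *arr = calloc(n, sizeof *arr);
    if (arr == NULL)
        return NULL;
    size_t i = 0;
    for (const struct Dictionary *p = head; p != NULL; p = p->next, i++) {
        /* calloc zeroed the buffers, so copying at most size - 1 bytes
           always leaves them '\0'-terminated */
        strncpy(arr[i].key, p->key, MAX_KEY_LENGTH - 1);
        strncpy(arr[i].value, p->value, MAX_VALUE_LENGTH - 1);
        arr[i].next = (p->next != NULL) ? (int)(i + 1) : -1;
    }
    return arr;
}

/* Write the element count, then the whole array in one fwrite call. */
int save_array(const struct DictionaryArchive *arr, size_t count, const char *path)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    int ok = fwrite(&count, sizeof count, 1, fp) == 1;
    if (ok && count > 0)
        ok = fwrite(arr, sizeof *arr, count, fp) == count;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read the count and the array back; the caller frees the result. */
struct DictionaryArchive *load_array(const char *path, size_t *count)
{
    assert(count != NULL);
    *count = 0;
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    size_t n = 0;
    struct DictionaryArchive *arr = NULL;
    if (fread(&n, sizeof n, 1, fp) == 1 && n > 0) {
        arr = calloc(n, sizeof *arr);
        if (arr != NULL && fread(arr, sizeof *arr, n, fp) != n) {
            free(arr);                       /* file shorter than its header claims */
            arr = NULL;
        }
    }
    fclose(fp);
    if (arr != NULL)
        *count = n;
    return arr;
}
```

Because the structures are written exactly as they sit in memory, a file produced this way is only reliably readable by a program built with the same compiler and platform.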

A much more space-efficient alternative to the fixed-size char arrays above is to define a custom file format rather than relying on static structures. For your case, you can use a file format like this to store your data in a retrievable manner:

  • The list is written to the file in order, starting with the head and following the next pointers to the tail
  • Each list item will be stored using four data fields in the following order:
    1. 16-bit integer, key_length
    2. 8-bit char array with key_length elements, key_data
    3. 16-bit integer, value_length
    4. 8-bit char array with value_length elements, value_data

Now, you can walk the list, dumping your data to the file node by node. To re-construct your data, read through the binary file, generate new struct Dictionary elements for each entry, and link them together in the order they appear in the file.

Your code to write the data to the data file would look something like this (untested, for illustration purposes only):

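FILE* fd;
uint16_t len;  // 16-bit length field, as in the format above (needs <stdint.h>)
struct Dictionary* pDict = list_head;
fd = fopen("output_file.dat", "wb");  // "b": binary mode, no newline translation
if (fd == NULL) {
    perror("output_file.dat");
    exit(EXIT_FAILURE);  // needs <stdlib.h>
}

// Walk through the list, storing each node
while (pDict != NULL) {
    // Store key (strings longer than 65535 bytes do not fit the length field)
    len = (uint16_t)strlen(pDict->key);
    fwrite(&len, sizeof(len), 1, fd);
    fwrite(pDict->key, sizeof(char), len, fd);

    // Store value
    len = (uint16_t)strlen(pDict->value);
    fwrite(&len, sizeof(len), 1, fd);
    fwrite(pDict->value, sizeof(char), len, fd);

    // Move to next list node
    pDict = pDict->next;
}

fclose(fd);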

Your code to read the data out would be very similar (read instead of write, and create a new struct Dictionary object for each loop iteration).
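To make the read side concrete, here is a self-contained sketch of both directions for the length-prefixed format described above. It is untested against any real data set and the function names are invented for this example. It assumes the lengths are written as 16-bit integers in the machine's native byte order, so the writer and reader must agree on that width and the file is not portable between machines with different endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct Dictionary {
    char *key;
    char *value;
    struct Dictionary *next;
};

/* Write one field: a 16-bit length, then that many bytes (no terminator). */
static int write_field(FILE *fp, const char *s)
{
    size_t n = strlen(s);
    if (n > UINT16_MAX)
        return -1;                       /* does not fit the length field */
    uint16_t len = (uint16_t)n;
    if (fwrite(&len, sizeof len, 1, fp) != 1)
        return -1;
    if (n > 0 && fwrite(s, 1, n, fp) != n)
        return -1;
    return 0;
}

/* Write every node, head to tail, to a stream opened in "wb" mode. */
int write_list(const struct Dictionary *head, FILE *fp)
{
    assert(fp != NULL);
    for (; head != NULL; head = head->next)
        if (write_field(fp, head->key) != 0 || write_field(fp, head->value) != 0)
            return -1;
    return 0;
}

/* Read one field into a fresh '\0'-terminated buffer.
   Returns NULL at end of file, on a short read, or if malloc fails. */
static char *read_field(FILE *fp)
{
    uint16_t len;
    if (fread(&len, sizeof len, 1, fp) != 1)
        return NULL;
    char *s = malloc((size_t)len + 1);
    if (s == NULL)
        return NULL;
    if (len > 0 && fread(s, 1, len, fp) != (size_t)len) {
        free(s);
        return NULL;
    }
    s[len] = '\0';                       /* the file stores no terminator */
    return s;
}

/* Rebuild the list from a stream opened in "rb" mode, in file order. */
struct Dictionary *read_list(FILE *fp)
{
    struct Dictionary *head = NULL, **tail = &head;

    assert(fp != NULL);
    for (;;) {
        char *key = read_field(fp);
        if (key == NULL)
            break;                       /* end of file, or a damaged file */
        char *value = read_field(fp);
        struct Dictionary *node = malloc(sizeof *node);
        if (value == NULL || node == NULL) {
            free(key);
            free(value);
            free(node);
            break;
        }
        node->key = key;
        node->value = value;
        node->next = NULL;
        *tail = node;                    /* link in the order read */
        tail = &node->next;
    }
    return head;
}
```

Unlike the fixed-size array, each string here takes only its own length plus two bytes, at the cost of having to read the file sequentially.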

bta