views:

467

answers:

3

Hi,

I'm using fwrite() and fread() for the first time to write some data structures to disk and I have a couple of questions about best practices and proper ways of doing things.

What I'm writing to disk (so I can later read it back) is all user profiles inserted in a Graph structure. Each graph vertex is of the following type:

typedef struct sUserProfile {
    char name[NAME_SZ];
    char address[ADDRESS_SZ];
    int socialNumber;
    char password[PASSWORD_SZ];

    HashTable *mailbox;
    short msgCount;
} UserProfile;

And this is how I'm currently writing all the profiles to disk:

void ioWriteNetworkState(SocialNetwork *social) {
    Vertex *currPtr = social->usersNetwork->vertices;
    UserProfile *user;

    FILE *fp = fopen("save/profiles.dat", "w");

    if(!fp) {
        perror("fopen");
        exit(EXIT_FAILURE);
    }

    fwrite(&(social->usersCount), sizeof(int), 1, fp);

    while(currPtr) {
        user = (UserProfile*)currPtr->value;

        fwrite(&(user->socialNumber), sizeof(int), 1, fp);
        fwrite(user->name, sizeof(char)*strlen(user->name), 1, fp);
        fwrite(user->address, sizeof(char)*strlen(user->address), 1, fp);
        fwrite(user->password, sizeof(char)*strlen(user->password), 1, fp);
        fwrite(&(user->msgCount), sizeof(short), 1, fp);

        break;

        currPtr = currPtr->next;
    }

    fclose(fp);
}

Notes:

  • The first fwrite() you see will write the total user count in the graph so I know how much data I need to read back.
  • The break is there for testing purposes. There's thousands of users and I'm still experimenting with the code.

My questions:

  • After reading this I decided to use fwrite() on each element instead of writing the whole structure. I also avoid writing the pointer to to the mailbox as I don't need to save that pointer. So, is this the way to go? Multiple fwrite()'s instead of a global one for the whole structure? Isn't that slower?
  • How do I read back this content? I know I have to use fread() but I don't know the size of the strings, cause I used strlen() to write them. I could write the output of strlen() before writing the string, but is there any better way without extra writes?
+1  A: 
  • It is slower. Calling a function x times is slower than calling it once where x>1. If performance turns out to be a concern, you can use fwrite/fread with sizeof(structure) for regular use and write a portable serialized version to import/export. But check if it really is a problem first. Most formats don't use binary data anymore, so you can tell that at least fread performance it's not their main concern.

  • No there isn't. the alternative is doing a fgetc(3) based strlen.

jbcreix
In my current tests, performance is not a issue with all those `fwrite` so I think I'll just leave it like this. But what do you mean about writing a portable serialized version?
Nazgulled
For the second problem, I ended up "cleaning" the string with '\0' til the end. A little more space wasted, but less writes/reads.
Nazgulled
I meant having a separate function that exported the data in case you needed portability. What do you mean with cleaning, fixed width strings?
jbcreix
I mean, `char name[30]` and you read `name` from `stdin`. Let's say the name is just "John", it will actually be `'J', 'o', 'h', 'n', '\0', 'g', 'a', 'r', 'b', 'a', 'g', 'e'`, etc... All I do is clean the string like this: `'J', 'o', 'h', 'n', '\0', '\0', '\0', '\0'`, etc...
Nazgulled
+2  A: 

You're right: as you're doing it now, there's no way to read back the content because you can't tell where one string ends and the next begins.

The advice you cite to avoid using fwrite() for structured data is good, but interpreting that advice to mean that you should fwrite() each element individually may not be the best solution.

I think you should consider using a different format for your file, instead of writing raw values with fwrite(). (For example, your files will not be portable to a machine with different byte order.)

Since it looks like most of your elements are strings & integers, have you considered a text-based format using fprintf() to write and fscanf() to read? One big advantage of a text-based format instead of an application-specific binary format is that you can view it with standard tools (for debugging, etc.)

Also, whatever format you choose, make sure you consider the possibility that you may need to add more fields in the future. At a minimum, that means you should include a version number in some kind of header, either for the file itself or for each individual entry. Even better, tag the individual fields (to allow for optional attributes), for example:

name: user1
address: 1600 pennsylvania ave
favorite color: blue

name: user2
address: 1 infinite loop
last login: 12th of never
David Gelhar
This is a university project and I emailed my instructor about something related (not exactly this) and he happened to mention that for this (we are talking about writing/reading the app state, meaning, all data structures) we were supposed to use a byte based format instead of a text one. And isn't it faster to write/read in binary instead of using fscanf() to parse the content?
Nazgulled
I'd be surprised if the overhead of fscanf() is significant compared to how long it takes to read the data from the disk. As with so many things, there's a tradeoff of speed vs. portability/maintainability. For example, what happens to your binary data if you need to increase NAME_SZ?
David Gelhar
It's not my call, I though of using text files at first, but everyone else is doing it binary and the teacher also mentioned it...
Nazgulled
+2  A: 

If your program needs to be at all portable then you should not be writing ints and shorts to disk as blocks of memory: the data will be corrupted when you try to read them in on a computer with a different word size (e.g. 32bit -> 64 bit) or different byte order.

For strings, you can either write the length first, or include a terminator at the end.

The best way is usually to use a text based format. For example, you could write each record as a separate line, with fields separated by a tab or a colon. (As a bonus, you no longer need to write a count of the number of records at the start of the file --- just read in records until you hit end of file.)

Edit: But if this is a class assignment you've been given, you probably don't need to worry about portability. Write '\0' terminators from the strings to disk to delimit them. Don't worry about efficiency when reading it back in, the slowest bit is the disk access.

Or even fwrite() out the entire structure and fread() it all back in. Worried about that pointer? Overwrite it with a safe value when you read it in. Don't worry about the wasted space on disk (unless you've been asked to minimise disk usage).

If you do need to write non-negative ints to disk in a portable binary format, you could do it like this:

  • first byte is the number of following bytes
  • second byte is the most significant non-zero byte in the int
  • ...
  • last byte is the least significant byte in the int

So:

  • 0 encodes as 00
  • 1 encodes as 01 01
  • 2 encodes as 01 02
  • 255 -> 01 ff
  • 256 -> 02 01 00
  • 65535 -> 02 ff ff
  • 65536 -> 03 01 00 00
  • etc

If you need to encode negative numbers as well, you will need to reserve a bit for the sign somewhere.

Dave Hinton
Please read the comment above.
Nazgulled
But if I cannot use a text based file, how can I write ints/shorts in a portable way?
Nazgulled
@Nazgulled: see edit.
Dave Hinton
I actually just e-mailed the instructor about most things you pointed out in your edit. I even though about the same solution for the pointer. :) I think it makes sense and portability shouldn't be an issue for this assignment. That's what I think at least, but I'm not the instructor.
Nazgulled