views:

78

answers:

2

Hi,

For this university project I'm doing (for which I've made a couple of posts in the past), which is some sort of social network, it's required the ability for the users to exchange messages.

At first, I designed my data structures to hold ALL messages in a linked list, limiting the message size to 256 chars. However, I think my instructors will prefer if I save the messages on disk and read them only when I need them. Of course, they won't say what they prefer, I need to make a choice and justify the best I can why I went that route.

One thing to keep in mind is that I only need to save the latest 20 messages from each user, no more.

Right now I have an Hash Table that will act as inbox, this will be inside the user profile. This Hash Table will be indexed by name (the user that sent the message). The value for each element will be a data structure holding an array of size_t with 20 elements (20 messages like I said above). The idea is to keep track of the disk file offsets and bytes written. Then, when I need to read a message, I just need to use fseek() and read the necessary bytes.

I think this could work nicely... I could use just one single file to hold all messages from all users in the network. I'm saying one single file because a colleague asked an instructor about saving the messages from each user independently which he replied that it might not be the best approach cause the file system has it's limits. That's why I'm thinking of going the single file route.

However, this presents a problem... Since I only need to save the latest 20 messages, I need to discard the older ones when I reach this limit.

I have no idea how to do this... All I know is about fread() and fwrite() to read/write bytes from/to files. How can I go to a file offset and say "hey, delete the following X bytes"? Even if I could do that, there's another problem... All offsets below that one will be completely different and I would have to process all users mailboxes to fix the problem. Which would be a pain...

So, any suggestions to solve my problems? What do you suggest?

+2  A: 

You can't arbitrarily delete bytes from the middle of a file; the only way that works is to rewrite the entire file without them. Disregarding the question of whether doing things this way is a good idea, if you have fixed length fields, one solution would be to just overwrite the oldest message with the newest one; that way, the size / position of the message on disk doesn't change, so none of the other offsets are affected.

Edit: If you're allowed to use external libraries, making a simple SQLite db could be a good solution.

tzaman
That's what I thought... I don't like the idea to use fixed length for the messages. The maximum is 256 but I don't want to write 256 bytes if the message is a simple "hi, how you doing?". And I'm allowed to use external libs such as that. That would defeat the purpose...
Nazgulled
A: 

You're complicating your life way more than you need to.

If your messages are 256 characters, then use a array of 256 characters to hold each message.

Write it to disk with fwrite, read with fread, delete it by changing the first character of the string to \0 (or whatever else strikes your fancy) and write that to disk.

Keep an index of the messages in a simple structure (username/recno) and bounce around in the file with fseek. You can either brute-force the next free record when writing a new one (start reading at the beginning of the file and stop when you hit your \0) or keep an index of free records in an array and grab one of them when writing a new one (or if your array is empty then fseek to the end of the file and write a complete new record.)

Frank Cox
I didn't get most of your answer and I don't think you understood the problem correctly. One thing I can say is that I don't want to use a array of 256 chars, I don't want to store the message in memory. That was my first approach, that's what I'm trying to avoid. This is not just one message for one user, it's 20 messages for thousands of users (at max of course).
Nazgulled
You have to put the active message somewhere, so that goes into the 256 char array. Save the array to disk as one 256 char record, even if the message is only 10 chars long. Then you can easily find it again by fseek-ing to recno*256. Keep track of recno and sender in a simple structure that you can keep an array of in, if you like, ram, or in another disk file. I don't see what's unclear about that. Simple random access file with fixed-length character strings and a separate index array.
Frank Cox
Like I said, I don't like/want that. I'm not going to save 256 bytes FOR EACH MESSAGE, when it should only save a few bytes (if indeed the message is tiny). It's a waste of space.
Nazgulled
It might be a waste of space, but it makes things *significantly* simpler. If you want to allocate variably-sized records, then you'll have to keep track of all the current blocks of free space, alongside their size, including merging them when you end up with two adjacent free blocks. It seems unlikely that the juice would be worth the squeeze.
caf