Practical to save thousands of data structures in a file and do specific lookups?

+2 Q:

My colleagues and I, who are taking the same class (and thus have the same project), have been discussing whether to save data to files and read from those files only when we need that specific data.

For instance, the project is something about managing a social network. I'm not going into specifics because it doesn't matter, but the idea is to use the best data structures to manipulate this data.

Let's say I'm using a hash table to store the users' profile data. Some of my colleagues argue that only some specific information should be kept in the data structures, like an ID that represents a user. Everything else should be put in files, and we should read the files to get the data we want when we want it.

I don't think this is practical... It could be if we were using a database library like SQLite or something, but we are not, and I don't think we are supposed to. We are only supposed to code everything ourselves and use standard C functions. Nor do I think we are expected to do perfect memory management. The requirements of the project do not ask us to code a database, or even a pseudo-database. What this project demands of us is the best data structures (as long as we can justify why we picked those instead of others) to store the types of data, and all the data, specified for the project.

I should let you know that we had two previous classes whose material is to be applied in this project. One of them covered the basics of C: functions, structures, arrays, strings, file I/O, recursion, pointers and simple data structures like binary trees and linked lists. The other one was about more complex data structures: hash tables, AVL trees, heaps, graphs, etc. It also covered time complexity and big O notation.

For instance, let's say all I have in memory is the IDs of the users and then I need to find all the friends of a specific user. I'll have to process the whole file (or files) to find that user's friends. It would be much easier if I already had all that data in memory.

It makes no sense to me that we need to pick (and justify) the data structures we see as the best fit for the project and then only use them to look up an ID. We will then need to do a second lookup to get the real data we need, which will take its time, won't it? Why did we bother with the data structures in the first place if we still need to search a bunch of files on the hard drive?

How could it be possible, using standard C functions and coding everything manually, to still simulate some kind of database? Is this practical at all?

Am I missing something here?

From what you're saying, I doubt that you need to store anything on disk. One thing that I would ask the teacher is whether you're optimizing for time or space complexity (there will be a trade-off between these two depending on what you're trying to achieve).

thext 2010-03-10 00:36:39

That can certainly be done. The resource forks in Mac System 5-8 files were stored as binary indexed databases (general use of the term, don't think SQL!). (I think the interface was actually written in assembly, but I could do it in C.)

The only thing is: it's a pain in the butt. Such files typically need to start with some kind of index or header, and then hold a bunch of records at predictable locations. (OK, sometimes the first index just points at some more indexes. How many layers of indirection do you care to manage?)

If you're going to do it, just remember: binary mode access.

dmckee 2010-03-10 00:37:13
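A minimal sketch of the layout described in the comment above, assuming fixed-size records so that each one sits at a predictable offset after a small header. The names `FileHeader`, `UserRecord`, `save_users` and `load_user` are hypothetical, invented for illustration; they are not from the project or from the commenter. Writing raw structs like this is only safe when the same program on the same machine reads them back.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical on-disk layout: a small header, then fixed-size records.
   Record i starts at sizeof(FileHeader) + i * sizeof(UserRecord). */
typedef struct {
    uint32_t record_count;
} FileHeader;

typedef struct {
    uint32_t id;
    char name[32];
} UserRecord;

/* Write the header and all records; returns 0 on success, -1 on failure. */
int save_users(const char *path, const UserRecord *users, uint32_t count)
{
    FILE *fp = fopen(path, "wb");   /* "b": binary mode, no newline translation */
    if (fp == NULL)
        return -1;
    FileHeader header = { count };
    int ok = fwrite(&header, sizeof header, 1, fp) == 1 &&
             fwrite(users, sizeof *users, count, fp) == count;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read only record number `index` by seeking straight to its offset. */
int load_user(const char *path, uint32_t index, UserRecord *out)
{
    assert(out != NULL);
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    FileHeader header;
    int ok = fread(&header, sizeof header, 1, fp) == 1 &&
             index < header.record_count &&
             fseek(fp, (long)(sizeof header + (size_t)index * sizeof *out),
                   SEEK_SET) == 0 &&
             fread(out, sizeof *out, 1, fp) == 1;
    fclose(fp);
    return ok ? 0 : -1;
}
```

Looking up one record costs a seek and a read instead of a scan of the whole file, but only when the record's position is already known. Finding a user by ID, or all friends of a user, would still need an index on top of this.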

Of course, I don't think it's impossible. In this area, everything is possible. But the question is: is it feasible for our project? Is such a thing a requirement? Did anyone teach us the fundamentals for that kind of thing in past classes? I don't think so...

Nazgulled 2010-03-10 01:03:29

I imagine you can figure out the basics. Open the file with an index (it'll have to start with a length) that tells you what else is found in the file and where each bit starts. Whether you need to or not is not something I can say.

dmckee 2010-03-10 01:29:05

I don't understand what you're saying...

Nazgulled 2010-03-10 01:33:52

Hmm... what about persistent storage?

If your project requires you to be able to remember friend data between two restarts of the app, then don't you think file storage (or whatever else) becomes an issue?

rmk 2010-03-10 00:41:04

Actually, that's one of the requirements for the project, but that's an entirely different issue.

Nazgulled 2010-03-10 00:59:42

+1 A:

It sounds like the project might be more about how you design the relationships between your data "entities," and not as much about how you store them. I don't think storing data off in files would be a good solution - file IO will be much slower than accessing things in memory. If you had the need to persist data on the disk, you'd probably want to just use a database, rather than files (I know it's an academic course though, so who knows).

I think you should focus more on how you design your data types, and their relationships, to maximize the speed of lookups, searches, etc. For example, you could store all the users in a linked list, or store them in a tree, or a graph, but each will have its implications on how fast you can find users, etc. Depending on what features you want in your social networking site, there will be different designs that will allow different types of behavior to perform better than it would in other designs.

Andy White 2010-03-10 00:41:34
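As a rough illustration of the point in the answer above, here is one possible in-memory design: a chained hash table keyed by user ID, where each user holds direct pointers to their friends. Everything here (`User`, `UserTable`, `BUCKETS`, `find_user`, `add_user`, `add_friend`) is a hypothetical name chosen for the sketch, and the project may well call for a different structure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BUCKETS 1024   /* arbitrary table size chosen for the sketch */

typedef struct User {
    unsigned id;
    char name[32];
    struct User **friends;   /* dynamic array of pointers to other users */
    size_t friend_count;
    struct User *next;       /* next user in the same bucket (chaining) */
} User;

typedef struct {
    User *buckets[BUCKETS];
} UserTable;

/* Average O(1) lookup by ID; returns NULL if the ID is unknown. */
User *find_user(const UserTable *t, unsigned id)
{
    for (User *u = t->buckets[id % BUCKETS]; u != NULL; u = u->next)
        if (u->id == id)
            return u;
    return NULL;
}

/* Insert a new user at the head of its bucket; returns NULL on
   allocation failure or if the ID is already taken. */
User *add_user(UserTable *t, unsigned id, const char *name)
{
    assert(t != NULL && name != NULL);
    if (find_user(t, id) != NULL)
        return NULL;
    User *u = calloc(1, sizeof *u);
    if (u == NULL)
        return NULL;
    u->id = id;
    strncpy(u->name, name, sizeof u->name - 1);  /* calloc already zeroed the last byte */
    u->next = t->buckets[id % BUCKETS];
    t->buckets[id % BUCKETS] = u;
    return u;
}

/* Record a one-way friendship a -> b; call twice for a mutual one. */
int add_friend(User *a, User *b)
{
    User **grown = realloc(a->friends, (a->friend_count + 1) * sizeof *grown);
    if (grown == NULL)
        return -1;
    grown[a->friend_count++] = b;
    a->friends = grown;
    return 0;
}
```

With this layout, listing a user's friends is one hash lookup followed by a walk over `friends`, with no file access at all. A linked list of users would make the same lookup O(n).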

That's exactly what I think and what I think this project is about, but some people disagree with me...

Nazgulled 2010-03-10 01:00:37

Well, I think you are right!

Andy White 2010-03-10 06:01:13

I'm having a very hard time figuring out what you are trying to ask here.
But there is a general rule that may apply:

If all of your data will fit in memory at once, it is usually best to load all of it into memory at once and keep it there. You write out to a file only to save, to exit, or for backup.

There are lots of exceptions to this rule, but for a class project where this is going to be the only major application running on the machine, you may as well store everything in memory. After all, you have already paid for the memory; you don't want it just sitting there idle.

I may have completely misunderstood the question you are trying to ask...

Norman Ramsey 2010-03-10 03:12:55
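The rule in the answer above (keep everything in memory, touch the file only to load and save) could look roughly like this in C. This is a sketch under assumptions: `Profile`, `save_all` and `load_all` are hypothetical names, the records are fixed-size, and the file is only ever read back by the same program on the same machine.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    unsigned id;
    char name[32];
} Profile;

/* Write the count and every profile in one go, e.g. on exit.
   Returns 0 on success, -1 on failure. */
int save_all(const char *path, const Profile *items, size_t count)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    int ok = fwrite(&count, sizeof count, 1, fp) == 1 &&
             fwrite(items, sizeof *items, count, fp) == count;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read the whole file into a freshly allocated array, e.g. at startup.
   Returns NULL (and *count == 0) on failure or for an empty file;
   the caller frees the result. */
Profile *load_all(const char *path, size_t *count)
{
    assert(count != NULL);
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        *count = 0;
        return NULL;
    }
    Profile *items = NULL;
    size_t n = 0;
    if (fread(&n, sizeof n, 1, fp) == 1 && n > 0) {
        items = malloc(n * sizeof *items);
        if (items != NULL && fread(items, sizeof *items, n, fp) != n) {
            free(items);   /* file shorter than its header claims */
            items = NULL;
        }
    }
    fclose(fp);
    *count = items != NULL ? n : 0;
    return items;
}
```

Between the call to `load_all` at startup and `save_all` at exit, all lookups would run against in-memory structures built from the loaded array.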
