views:

577

answers:

10

I'm currently working on a school project, in java, and I'm coding a database application. Something like the MySQL Monitor where you type in queries and get results / whatever.

In applications I've coded before, I used databases to store data, like user profiles, settings, etc. Now, obviously, I can't use a database to store data generated from this school project, otherwise what's the point?

I'm thinking about storing the data in files but that's the only idea I have in my mind right now and I'm kinda running dry.. and to be honest, I don't want to start banging at code and then I discover a better way of doing it.

So if anyone has any idea how to store the data (like CSV?), or has some kind of knowledge of how database applications work internally, can you please shed some light?

-- EDIT: just to be more clear, I can't use database engines to store the data, to put it this way, I'm coding a simple database engine. Ideas like what Galwegian, jkramer and Joe Skora suggested is what I'm looking for.

+1  A: 

I suppose you could do a very simple proof of principle 'database' application using xml files and maybe use xpath to query it.

Would be very slow compared to a database (depending on file size and hardware of course), but would work.

Galwegian
A: 

If you're using C#, you might consider writing a simple linq to xml type ORM.

Ian P
A: 

You could use a serialization format like YAML, and store an array of hashes, where each hash is a table record and the keys in each hash are column names. You could then just load the serialized file into memory, work with arrays and hashes, and then store everything back.

I hope that's what you meant.

jkramer
+2  A: 

What you probably want is to use are random access files. Once you have a set of fields for a record, you can write them to disk as a block. You can keep an index separately on disk on in memory and access any record directly at any time. Hopefully that gives you enough to get started.

Joe Skora
+2  A: 

I am not sure I understand your requirement, but wouldn't 'SQLite' work for you (though it is still a database engine, which is what you may be avoiding in the first place, so I am not so sure)?

ayaz
A: 

Can't you use a file based database like hsqldb to store your user settings etc.? This way you have a familiar interface to your data and are able to store it in the filesystem.

Jasper
+5  A: 

Sure, you could create your own database with a file system since that is how actual databases are implemented. For example, you could decide to store your data in fixed or variable length raw data files, and then create a separate index file with file pointers into that other file for quick indexed access for any queries based on what type of index information you want stored in your Index file

So yes, look at creating 2 files - 1 to store the data and the other to store file pointers into that file keyed by whatever indexes you are wanting to provide quick index access by.

Best of luck - you will come to learn alot about database construction with this project I am betting.

Brad Osterloo
Oooh, I do like Galwegians idea regarding a simple database using XML/XPath and I think would be a great starting point unless the point of your exercise is to create your own "indexing" system.
Brad Osterloo
your idea is great too.. except that I have to start from square one because I don't have any experience at all with what you're talking about! :)Thanks for the idea mate ;)
ninuhadida
A: 

StackOverflow isn't for homework.

Having said that, here's the Quick and Dirty way to an efficient, flexible database.

  1. Design a nice Map (HashMap, TreeMap, whatever) that does what you want to do. Often, you'll have a "Record" class with your data, and a number of "Index" objects which are effectively Map<String,List<Record>> collections. (Why a list of records? What about an index on a not-very-selective field?)

  2. Write a class to serialize your collections into files.

  3. Write a class to deserialize your collections from files.

  4. Write your query processing or whatever around the in-memory Java objects.

In-memory database.

Don't like Java's serialization? Get a JSON or YAML library and use those formats to serialize and deserialize.

"But an in-memory database won't scale," the purists whine. Take that up with SQLite, not me. My PC has 2GB of RAM, that's a pretty big database. SQLite works.

S.Lott
thanks for your answer, and just to clear it up, I'm not asking anyone to do my homework. I'm here just to get some ideas from better coders than I am.
ninuhadida
@ninuhadida: "I'm currently working on a school project," How is this not homework?
S.Lott
I don't want to start a flame war here, but:1. I'm only asking for ideas2. At school we only learned the basics3. We're free to pick whatever project title we want4. And I decided to do it the hard way and actually learn something.And StackOverflow is a way to "learn".
ninuhadida
@ninuhadida: No flame. I just can't understand the two assertions: homework and not homework. I'm looking for some clarification. It can't be both.
S.Lott
+2  A: 

I would create a database that uses binary tables, one file per table. Take a look at the very handy DataInputStream and DataOutputStream classes. Using them you can easily go back and forth from binary files to Java types.

I would define a simple structure for the table: a header that describes the contents of the table, followed by the row data. Have each column in the table defined in the header - its name, data type, and maximum length. Keep it simple. Only handle a few data types using the capabilities of DataInput/OutputStream as your guide. Use a simple file-naming convention to associate table names to file names.

Create a test table with enough columns to have at least one of each data type. Then, create a simple way to populate tables with data, either by processing input files or via console input. Finally, create a simple way to display the contents of entire tables to the console.

After that, you can add on a very simple version of a SQL-like dialect to do queries. A simple query like this:

SELECT * FROM EMPLOYEES

...would require opening up the file containing the EMPLOYEES table (via your table filename naming convention), parsing the header, and reading through the entire table, returning the contents.

After you get that working, it will be simple to add other functionality such as processing of simple WHERE clauses, returning only the rows (or columns within rows) that match certain criteria.

If it's not necessary to have such a general-purpose solution (any number of tables, any number of columns, an actual query language, etc.) you can simply add methods to your API like:

Employee[] result = EmployeeDataManager.select("LASTNAME", "Smith");

...or something like that. If you build up slowly, dividing your functionality up into several small tasks as I have suggested, soon you will have implemented all of the features you need.

shadit
I'll look into the Data I/O streams! Seems promising to say the least.thanks for the heads up!
ninuhadida
+1  A: 

The basics of storing records in blocks in data files have been around for decades. Obviously there are a great many variations on a theme, and all of them are designed to work around the fact that we have slow disk drives.

But the fundamentals are not difficult. Combining fixed length columns with a fixed number of columns can give you very rapid access to any record in your database.

From there, it's all offsets.

Let's take the example of a simple row containing 10 32-Bit integers. A single row would be 40 bytes (4 bytes per integer * 10). If you want row 123, simply multiply it by 40. 123 * 40, gives you an offset of 4920. Seek that far in to the database file, read 40 bytes, and voila, you have a row from your database.

Indexes are stored in B+-Trees, with tree nodes distributed across blocks on the disk. The power of the B+Tree is that you can easily find a single key value within the tree, and then simply walk the leaf nodes to scroll through the data in key order.

For a simple format that's useful and popular, consider looking up the original DBase format -- DBF Files. It's evolved some over the years, but the foundation is quite simple, well documented, and there are lots of utilities that can work on it. It's a perfectly workable database format that deals with all of the fundamental issues with the problem.

Will Hartung