tags:

views: 120

answers: 4

I'm contemplating an application that is, at a glance, a "free form" database. A collection of notes and artifacts. However, at the same time there are some higher level structures within the system.

My 10-second, back-of-the-napkin design entails storing individual "entries" in small files (perhaps XML), organized in directories, and then indexing the entire set using something like Lucene.
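To make the files-plus-index idea concrete, here is a toy sketch of what the indexing layer would provide. This is a stand-in for Lucene, not a use of it: in the real design you would hand the documents to Lucene, and the tokenization and file layout here are illustrative assumptions.

```python
import os
import re
from collections import defaultdict

def build_index(root):
    """Scan a directory tree of small text/XML entry files and build a
    toy in-memory inverted index: token -> set of file paths.
    A real implementation would delegate this to Lucene."""
    index = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as f:
                text = f.read()
            for token in re.findall(r"[a-z0-9]+", text.lower()):
                index[token].add(path)
    return index

def search(index, term):
    """Return the set of entry files containing the term."""
    return index.get(term.lower(), set())
```

Because the entries are plain files, any script can drop a new one into the tree and the next index pass picks it up.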

The premise behind it is that it will be trivial for folks to "interface" with the system, since they need to merely "put files into the right places". And since they're simple text files, they can be generated by any program (such as a scripting language) and, if necessary, even written by hand in a text editor.

The tricky detail is maintaining the index, and any other possible relationships.

In theory, on startup, the program could scan the directory for changed files and update the index. It could even do this in the background. I don't anticipate this being a horribly long process, as I don't anticipate having thousands of entries. But it could always be an option to have the system scan only when instructed if the size gets too large.
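The startup scan could be as simple as comparing each file's modification time against the timestamps recorded at the last pass. A minimal sketch, assuming the previous timestamps are kept in a simple path-to-mtime map (how that map is persisted is left open):

```python
import os

def changed_entries(root, last_seen):
    """Walk the entry tree and compare each file's mtime against the
    timestamps from the previous scan ('last_seen': path -> mtime).
    Returns the new-or-modified files plus an updated timestamp map
    to store for the next run."""
    current = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            current[path] = os.path.getmtime(path)
    changed = [p for p, m in current.items()
               if last_seen.get(p) != m]
    return changed, current
```

Only the files in `changed` need to be re-fed to the indexer, which keeps the background pass cheap for a collection of this size.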

Or I could require that some special file be updated with "new files" or some such thing that the system can check on start.

The alternative is to use some other format rather than individual files. Use a database of some kind; they're a dime a dozen. But by doing that, all of a sudden this data is effectively "opaque" to a casual user. This makes scripting and such potentially more difficult.

Now, I could use something like, say, SQLite, which has broad support, and publish a database schema. Or I was thinking I could create a service layer in the application.
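Publishing a schema could look something like the following, sketched here with Python's built-in sqlite3 module. The table and column names are assumptions for illustration, not a proposal:

```python
import sqlite3

# A hypothetical published schema: one table for entries,
# one for the tags attached to them.
SCHEMA = """
CREATE TABLE IF NOT EXISTS entry (
    id      INTEGER PRIMARY KEY,
    title   TEXT NOT NULL,
    body    TEXT NOT NULL,
    created TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS tag (
    entry_id INTEGER REFERENCES entry(id),
    name     TEXT NOT NULL
);
"""

def open_store(path):
    """Open (or create) the entry database with the published schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Any language with a SQLite binding could then script against the same file, which recovers some of the openness of the plain-files approach.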

If I write it in Java, I could publish a Java API that a tool could use, but only if that tool were also written in Java.

Or I could expose the API as, say, lightweight Web Services (POX over HTTP, or REST over HTTP). HTTP support is near-universal today. That would require that the application be running in order to use any utility.
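The lightweight-HTTP option could be sketched with nothing but the standard library. The `/entries` path and the JSON shape below are assumptions purely for illustration:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for the real data store.
ENTRIES = {1: {"title": "first note", "body": "hello"}}

class EntryHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Expose entries as plain JSON over HTTP so any
        # HTTP-capable client or script can read them.
        if self.path == "/entries":
            payload = json.dumps(ENTRIES).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(payload)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet

def serve(port=0):
    """Create the server; port 0 asks the OS for a free port."""
    return HTTPServer(("127.0.0.1", port), EntryHandler)
```

The trade-off stated above still holds: this only works while the application is up, whereas files (or a SQLite file) can be poked at any time.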

As with everything, it's a balance. I think the File solution is simpler and potentially less efficient, but perhaps limiting in terms of internal complexity.

The API can be much more powerful, but is harder to use, and certainly not useful for a casual user.

How do you think you might approach this kind of problem?

A: 

Although I cannot claim to know which solution would best suit your needs, I would second-guess a design that allows users unfettered access to the guts of my data. I guess the overall design depends on how often new files are added, how often they are accessed, and what type of information is stored in them.

Nescio
A: 

KISS

As you say, the File solution is simpler so it should be the winner. Just plan on having the data storage piece be an easily replaced module (SoC) and only add complexity as required.

Larry Smithmier
A: 

Definitely an embedded database as the primary data store. That will save you a lot of headaches keeping the folder structure the way you intended. Searching and indexing will also be a lot faster. The user can migrate or back up their data by copying a single file. You take advantage of full-text search, and lots of SQL queries and functions.

You then have 'Export' options available. Export to filesystem would build the XML hierarchy you described earlier. You can also easily export to a single file: XML, CSV, etc. Importing from those exports can be an option as well.
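The filesystem export is straightforward if the DB access is kept clean: dump each row into its own small XML file. A sketch of that export step, where the `entry` table and its columns are assumptions carried over from a hypothetical schema:

```python
import os
import sqlite3
import xml.etree.ElementTree as ET

def export_to_files(conn, out_dir):
    """Dump each 'entry' row from the embedded database into its own
    small XML file, recreating the directory-of-files layout that
    scripts and text editors can work with."""
    os.makedirs(out_dir, exist_ok=True)
    for entry_id, title, body in conn.execute(
            "SELECT id, title, body FROM entry"):
        root = ET.Element("entry", id=str(entry_id))
        ET.SubElement(root, "title").text = title
        ET.SubElement(root, "body").text = body
        path = os.path.join(out_dir, "entry-%d.xml" % entry_id)
        ET.ElementTree(root).write(path, encoding="utf-8",
                                   xml_declaration=True)
```

An import would be the mirror image: parse each XML file and upsert it back into the table.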

Finally if you separate your DB access cleanly enough internally, you can expose that API to the user, as your API example.

So using an embedded database as your main datastore gives you all other options easily, plus the performance of a database.

kervin
+2  A: 

I'd go with the File Format (and release your access code somehow as a reference implementation so people can do I/O).

BCS