I am creating a few JAX-WS endpoints, for which I want to save the received and sent messages for later inspection. To do this, I am planning to save the messages (XML files) to the filesystem, in some sensible hierarchy. There will be hundreds, even thousands of files per day. I also need to store metadata for each file.

I am considering putting the metadata (just a couple of fields) into a database table, but the XML content itself into files on the filesystem, so as not to bloat the database with content that is seldom read.

Is there some simple library that helps me in saving, loading, deleting etc. the files? It's not that tricky to implement it myself, but I wonder if there are existing solutions? Just a simple library that already provides easy access to the filesystem (preferably across different operating systems).

Or do I even need that? Should I just go with raw/custom Java?

+2  A: 


Java API

Well, if what you need to do is really simple, you should be able to achieve your goal with java.io.File (delete, check existence, read, write, etc.) and a few stream manipulations with FileInputStream and FileOutputStream.

You can also throw in Apache commons-io and its handy FileUtils for a few more utility functions.

Java is independent of the OS. You just need to make sure you use File.separator (not File.pathSeparator, which separates the entries of a path list such as the classpath), or use the constructor File(File parent, String child) so that you never need to spell out the separator.
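As a rough illustration of the plain-JDK route, here is a minimal sketch of save, load and delete using only java.io. The class name MessageFiles, the per-day directory layout and the UTF-8 encoding are assumptions made for the example, not something from the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; names and layout are illustrative only.
public class MessageFiles {

    // Builds <root>/<day>/<name> with File(File, String), so no separator is spelled out.
    public static File resolve(File root, String day, String name) {
        return new File(new File(root, day), name);
    }

    // Writes the XML text to the target file, creating parent directories if needed.
    public static void save(File target, String xml) throws IOException {
        File dir = target.getParentFile();
        // mkdirs() returns false when the directory already exists, so re-check.
        if (dir != null && !dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Cannot create directory " + dir);
        }
        try (OutputStream out = new FileOutputStream(target)) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
    }

    // Reads the whole file back into a String (assumes it was written as UTF-8).
    public static String load(File source) throws IOException {
        try (InputStream in = new FileInputStream(source);
             ByteArrayOutputStream buffer = new ByteArrayOutputStream()) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Returns true only if the file existed and was removed.
    public static boolean delete(File target) {
        return target.delete();
    }
}
```

With commons-io on the classpath, the stream handling in save and load could probably be replaced by the corresponding FileUtils helpers.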

The Java file API is relatively high-level, so as to abstract over the differences between operating systems. Most of the time that is sufficient. It falls short only when you need an OS-specific feature that is not in the API, e.g. the physical size of a file on disk (as opposed to its logical size), security rights on *nix, disk quotas, etc. (Free space, at least, is available through File.getFreeSpace() since Java 6.)

Most operating systems keep an internal buffer for file reads and writes. FileOutputStream.write and FileOutputStream.flush ensure that the data has been handed to the OS, but not necessarily that it has been written to disk. The Java API also supports this lower level of control over buffering (example here), which systems such as databases need.
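One way the JDK exposes that lower level is FileDescriptor.sync(), reachable through FileOutputStream.getFD(). The sketch below is only an illustration of the difference between flushing and syncing; the class and method names are made up for the example.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; illustrative only.
public class DurableWrite {

    // Writes the text and asks the OS to push it to the storage device before returning.
    public static void writeAndSync(File target, String xml) throws IOException {
        try (FileOutputStream out = new FileOutputStream(target)) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
            out.flush();        // hands pending bytes to the OS; does not force them to disk
            out.getFD().sync(); // blocks until the OS reports its buffers for this file are written
        }
    }
}
```

Syncing every file costs time, so for messages kept only for later inspection it may be reasonable to skip it and accept that a crash can lose the most recent writes.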

Also, files and directories are both represented by File, so you need to check with isDirectory. This can be confusing, for instance if you have a file x and a directory /x (I don't remember exactly how to handle this case, but there is a way).

Web service

The web service can use either xs:base64Binary to pass the data, or use MTOM (Message Transmission Optimization Mechanism) if files are large.

Transactions

Note that the database is transactional and the file system is not. So you might have to add a few checks in case operations fail and are retried.

You could go with a complicated design involving some form of distributed transaction (see this answer), or try to go with a simpler design that provides the level of robustness that you need. A possible design could be:

  • Update. If the user wants to overwrite a file, you actually create a new one. The indirection between the logical file name and the physical file is stored in the database. This way you never overwrite a physical file once it has been written, which keeps a rollback consistent.
  • Create. Same story when the user wants to create a file.
  • Delete. If the user wants to delete a file, you do it only in the database first. A periodic job scans the file system for files that are not listed in the database and removes them. This two-phase delete ensures that the delete operation can be rolled back.
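The three operations above can be sketched roughly as follows. To keep the example self-contained, an in-memory map stands in for the database table, so no real transaction is involved; the class name VersionedStore, the UUID-based physical names and the method names are all assumptions made for illustration.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; the map below replaces what would be a database table.
public class VersionedStore {
    private final File root;
    // Stand-in for the table: logical name -> physical file name.
    private final Map<String, String> index = new ConcurrentHashMap<>();

    public VersionedStore(File root) {
        this.root = root;
    }

    // Create and update: always write a fresh physical file, then repoint the mapping.
    public void put(String logicalName, String xml) throws IOException {
        String physicalName = UUID.randomUUID() + ".xml";
        try (FileOutputStream out = new FileOutputStream(new File(root, physicalName))) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        // In a real system this would be an INSERT/UPDATE inside the database transaction.
        index.put(logicalName, physicalName);
    }

    // Returns the current physical file for a logical name, or null if unknown.
    public File get(String logicalName) {
        String physicalName = index.get(logicalName);
        return physicalName == null ? null : new File(root, physicalName);
    }

    // Delete: only the mapping is removed; the physical file is left for sweep().
    public void delete(String logicalName) {
        index.remove(logicalName);
    }

    // Periodic job: removes physical files that no mapping refers to; returns how many.
    public int sweep() {
        Set<String> live = new HashSet<>(index.values());
        File[] files = root.listFiles();
        int removed = 0;
        if (files != null) {
            for (File f : files) {
                if (f.isFile() && !live.contains(f.getName()) && f.delete()) {
                    removed++;
                }
            }
        }
        return removed;
    }
}
```

A real sweeper would probably also skip files younger than some grace period, so that it cannot remove a file whose database row has been written but not yet committed.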

This is not as robust as writing BLOBs to a real transactional database, but it provides some robustness. You could otherwise have a look at commons-transaction, but I feel like the project is dead (2007).

ewernli
A solid answer, thanks. I should have guessed Apache Commons has something to contribute to this issue as well! I must admit that I am a little surprised by the high-level file handling in the Java API - I was expecting something more low-level. I think your pointers are enough to get me both started and finished with this. However, the point you make about transactions is very relevant. Would it be possible to bind the filesystem operations to the same transaction (and to roll back)?
Tuukka Mustonen
@Tuukka Mustonen I've added a few more details. Also, it's unfortunately not possible to bind filesystem and database easily, afaik. I've however described one possible way to get more robustness without introducing complicated distributed transactions.
ewernli
Thanks for elaborating your answer. I think your suggested approach to binding filesystem operations with database transactions is simple enough, yet very viable. As there are no other answers (yet), I am choosing this as the accepted answer and will go this road. I would still be curious to know about any project binding all this stuff behind a single, even more high level API, that provides transaction and versioning support, sensible file/directory structures, support for file metadata, exception translation/handling, and generally takes care of the things I cannot foresee :) Anyone?
Tuukka Mustonen
@Tuukka Mustonen There is one: JSR-170 and JSR-283. You can also have a look at Apache Jackrabbit. But what you are now asking for is not a simple API to save, load, and delete files, but a document management API :)
ewernli
A: 

There is DataNucleus, a Java persistence provider. It is a little too heavy for this case, but it supports the JPA and JDO Java standards with different datastores (RDBMS, object storage, XML, JSON, Excel, etc.). If the product is already using JPA or JDO, it might be worth considering DataNucleus, as saving data into different datastores should be transparent. I suppose DataNucleus supports splitting the data into several files, creating the sensible directory/file structure I wanted (in my question), but this is just a guess.

Support for XML and JSON seems to be experimental.

Tuukka Mustonen