My Java application is currently using ZIP as a project file format. The project files contain a few XML files and many image and sound files.

The project files are getting pretty big, and since I can't find a way with the java.util.zip classes to write to a ZIP file without recreating it, my file saves are becoming very slow. So for example, if I just want to update one XML file, I need to rewrite the entire ZIP.
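For illustration, the rewrite-everything approach described above might look roughly like the following. This is only a minimal sketch using the standard java.util.zip classes; the class name `ZipRewriteSketch` and the method `replaceEntry` are hypothetical names chosen for this example, not part of any library. Note that every untouched entry is still decompressed and recompressed, which is why the save time grows with the size of the whole project.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: replaces one entry by rebuilding the whole archive.
final class ZipRewriteSketch {

    static void replaceEntry(Path zip, String entryName, byte[] newContent) throws IOException {
        // Write the new archive next to the old one, then swap it in.
        Path dir = zip.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "project", ".tmp");

        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(zip));
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(tmp))) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    continue; // drop the old version of the entry being replaced
                }
                // Every other entry is copied, i.e. inflated and deflated again.
                out.putNextEntry(new ZipEntry(entry.getName()));
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                out.closeEntry();
            }
            // Finally append the new version of the entry.
            out.putNextEntry(new ZipEntry(entryName));
            out.write(newContent);
            out.closeEntry();
        }
        Files.move(tmp, zip, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

A call such as `ZipRewriteSketch.replaceEntry(projectFile, "project.xml", xmlBytes)` (with hypothetical variable names) changes one small XML entry but still reads and writes every image and sound file in the archive.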

Is there some other Java ZIP library that will allow me to do random writes to a ZIP file?

I know switching to something like SQLite solves the random write issue. Would using SQLite just to store XML, sound, and image files as BLOBs be an appropriate use?

I suppose I could come up with my own file format and use RandomAccessFile but then there would be a lot of bookkeeping I'd have to write.

Update...

My file format is very much like Office Open XML. It is a ZIP file containing XML and other resources.

Someone must have solved the problem of how to do random writes to update a ZIP file. Does anyone know how?

+1  A: 

First of all, I would separate your app's resources into those that are static (such as images) and those that can change (the XML files you mentioned). Since the static files won't be rewritten, you can continue to store them in a ZIP file, which IMHO is a good approach for deploying any resources.

Now you have 2 options:

  • Since the non-static files are probably not too big (the xml files are likely to be smaller than images+sounds), you can stick with your current solution (zip file) and simply maintain 2 zip files, of which only one (the smaller one with the changeable files) can/will be re-written.

  • You could use an in-memory database (such as hsqldb) to store the changeable files and only persist them (transferring them from the database to a file on disk) when your application shuts down or when a save is explicitly requested.

f1sh
I need to have everything in a single project file.
awbranch
A: 

SQLite is not always fast (at least in my experience). I would suggest compressing the XML files individually and just using the file system to store them -- you'll still get decent compression. You could experiment with btrfs, or just go with ext4. If you're not on Linux, this should still work okay, but it might not be as fast until things are cached in memory.

The idea is that if there is little redundancy between the XML files, you don't gain much by compressing them together in one "solid" archive.

gatoatigrado
The goal of the ZIP isn't to compress the XML, but rather to group all the project files into a single file. Could I use btrfs to do this?
awbranch
No, sorry. I was suggesting not grouping the files. Why do you want to do that? In Linux you can use any regular file as an entire file system -- just run something like "dd if=/dev/zero of=file1 bs=1M count=100; /sbin/mkfs.ext4 file1; mkdir -p mountpoint; sudo mount file1 mountpoint -o loop" and everything you write into the "mountpoint/" directory will get written to "file1" -- you can observe that with "md5sum file1; echo > mountpoint/asdf; sync; md5sum file1"
gatoatigrado
I forgot to add -- Mac has very good support for the same functionality through .dmg.
gatoatigrado
Is there something that would work on Windows as well?
awbranch
IIRC you need admin rights to mount a loop device on Linux.
finnw
+1  A: 

There are so-called single-file virtual file systems, which let you create file-based containers and provide a file-system-like structure and APIs. One example is SolFS (it has a core written in C with a JNI wrapper); there are also some other solutions written in C and Delphi (I don't remember their names at the moment). I guess similar native Java solutions exist as well.

Eugene Mayevski 'EldoS Corp
A virtual file system may be a good direction to go. I've found TrueZip, which claims to be a Java virtual file system for ZIP. If it truly has random write access to the ZIP file, this would be a perfect solution. Will investigate further.
awbranch
I think I understand your problem. Java's built-in ZIP classes don't support modifying an existing archive (i.e. there are no AddEntry/DeleteEntry methods), and TrueZip just fills this gap (the gap is specific to that particular implementation of ZIP access code). However, this is not a true virtual file system anyway, because the ZIP format itself was not intended for this scenario. Modifying a ZIP file is still a lengthy, time-consuming operation no matter what library or component you use. To be continued ...
Eugene Mayevski 'EldoS Corp
In contrast, "true" virtual file systems operate on pages (like clusters on a physical disk), and adding or deleting files doesn't require rewriting the whole file (under the hood or explicitly). Only the modified pages are overwritten in place. Create a huge file using TrueZip, try to delete or modify an entry in the middle of the file, and measure the speed.
Eugene Mayevski 'EldoS Corp
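To make the page idea above concrete, here is a minimal sketch of overwriting a single fixed-size page in place with java.io.RandomAccessFile. The class name `PagedFileSketch`, the methods `writePage` and `readPage`, and the 4096-byte page size are all assumptions made up for this illustration; this is not how SolFS or any particular product is implemented. A real container would also need an allocation table mapping files to pages, which is the bookkeeping the question mentions.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

// Hypothetical page-based container: only the addressed page is rewritten.
final class PagedFileSketch {

    static final int PAGE_SIZE = 4096; // assumed page size for this sketch

    /** Overwrites one page in place; all other pages are left untouched. */
    static void writePage(RandomAccessFile file, long pageIndex, byte[] data) throws IOException {
        if (data.length > PAGE_SIZE) {
            throw new IllegalArgumentException("data does not fit into one page");
        }
        byte[] page = Arrays.copyOf(data, PAGE_SIZE); // zero-pad to a full page
        file.seek(pageIndex * PAGE_SIZE);
        file.write(page);
    }

    /** Reads one full page back from the container file. */
    static byte[] readPage(RandomAccessFile file, long pageIndex) throws IOException {
        byte[] page = new byte[PAGE_SIZE];
        file.seek(pageIndex * PAGE_SIZE);
        file.readFully(page);
        return page;
    }
}
```

With this layout, updating page 1 of a three-page file writes exactly 4096 bytes at offset 4096; the file length and the other two pages stay as they were, regardless of how large the container grows.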
Thanks for the response. I'm going to investigate using a VFS more. I found ZX-VFS - http://www.zipxap.com/HomePage_Products_ZxVfs.html which looks exactly like what I need. But there is very little information on this product. Looks pretty new.
awbranch
A: 

Before offering another answer along the lines of using properly structured JARs, I have to ask -- why does the project need to be encapsulated in one file? How do you distribute the program to users to run?

Tom G
It's a desktop application that users download and install. They use it very much like a user would use PowerPoint. They create projects that contain media files and share those projects.
awbranch
A: 

If you must keep a project contained within a single file and be able to replace resources efficiently, then yes, I would say SQLite is a good choice.

If you do choose to use SQLite, also consider converting some of the XML schemas to one or more SQL tables rather than storing large XML documents as BLOBs.

finnw