Is there a Java library or class that synchronizes one folder to another, changing only the "target" folder? I need something really fast: the Ant sync task is far too slow. For 4 GB it took 10 minutes, compared to 3 seconds for Allway Sync (a desktop application).

Thanks!

UPDATE: It should be open source too :)

I found http://syncdir.sourceforge.net/#source_code with the source code, but it's for Java 6+

+3  A: 

It sounds like you're looking for a Java-based implementation of the rsync utility from UNIX. As far as I know, there have been quite a few attempts, but I'm not familiar with anything that made it into production.

There are several open source directory sync projects on SourceForge, but you may want to eyeball them first to make sure that they are safe enough for use on the platforms you are interested in.

Why do you need this to be implemented in Java?

Uri
+1 rsync is ported to almost all platforms
stacker
Well, it has to be "integrated" into one of our software products.
tinky05
BTW, Ant isn't slow because of the network: I ran the test locally. So the solution doesn't need anything fancy like compression (which rsync seems to do).
tinky05
@stacker, in some situations you cannot reasonably deal with native code, but must stick to a pure Java solution. I would love a Java port of rsync too.
Thorbjørn Ravn Andersen
@Thorbjorn: So would I. I hate maintaining separate shell scripts and batch files when working in a mixed environment.
Uri
rsync is really only needed in high latency, low bandwidth situations (I implemented the rsync algorithm in Java for a project a couple of years ago). rsync is a fascinating algorithm, but it won't be useful on a network connection that is capable of transferring 4 GB in 3 seconds.
Kevin Day
+1  A: 

Sorry to be the bearer of bad news, but I think what you are going to run into here is the performance limitations of Java IO. You can (and should) use NIO to do relatively quick file transfers now, but operations that query the file system for metadata (e.g. getting directory listings, determining whether an entry is a file, its modified date, etc.) are horribly inefficient. For example, when you call File.isDirectory() and then File.canWrite(), it actually makes the same low-level system call twice, applying a different mask in each case. On a network with even modest latency, all these little reads can really add up.

The good news is that JDK 7 addresses these issues with an alternative mechanism for interacting with the file system.
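Assuming a JDK 7 or later runtime, here is a minimal sketch of the kind of bulk metadata read the newer API allows. `Files.readAttributes` and `BasicFileAttributes` come from `java.nio.file`; the class name `AttributeProbe` and the helper `describe` are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;

class AttributeProbe {

    /** Builds a one-line summary of a path from a single bulk attribute read. */
    static String describe(Path path) throws IOException {
        // One call returns type, size and timestamps together, instead of
        // separate File.isDirectory() / File.length() / File.lastModified() calls.
        BasicFileAttributes attrs = Files.readAttributes(path, BasicFileAttributes.class);
        String kind = attrs.isDirectory() ? "dir" : "file";
        return kind + " size=" + attrs.size()
                + " modified=" + attrs.lastModifiedTime().toMillis();
    }

    public static void main(String[] args) throws IOException {
        // Describe the path given on the command line, or the current directory.
        Path path = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(describe(path));
    }
}
```

Whether this is actually faster than the `java.io.File` calls depends on the platform and file system, so it is worth measuring on the folders in question.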

Long and short: you won't find a Java implementation that can come close to a native app for this sort of thing - at least not until JDK 7 ships. Obviously, you could write JNI code to pull this off, but if you are doing that, you might as well write the entire app native and be done with it.

Kevin Day
Thanks for the answer. But surely I can code something that compares two identical 4 GB folders in less than 10 minutes? (That's how long Ant's sync task took.) And from the tests I ran, file size makes a difference with Ant (weird, huh), even when there is nothing to copy from one side to the other.
tinky05
It depends on how you want to measure 'identical'. Ant probably does content analysis to make sure the files are indeed the same. You might choose to say that if the file name, size and modified dates are equal, then the files don't need to be copied - but that's not a 100% certain thing (probably acceptable for most cases - but you have to walk in knowing that you have a window for things to go wrong). If this is being used to launch the space shuttle, it's probably not cool to make that sort of assumption. If it's to sync an mp3 playlist, then sure.
Kevin Day
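The size-and-date heuristic from the comment above could be sketched roughly like this. It uses only `java.io.File`, so it does not need JDK 7. The names `QuickCompare`, `needsCopy` and `TOLERANCE_MS` are hypothetical, and the 2-second tolerance is an assumption to cope with file systems such as FAT that store modification times at 2-second resolution. Matching by name is left to the caller, who pairs files by relative path.

```java
import java.io.File;

class QuickCompare {

    // Assumed tolerance: FAT stores modification times at 2-second resolution.
    static final long TOLERANCE_MS = 2000;

    /**
     * Returns true when target is missing or differs from source in size
     * or modification time. File contents are never read.
     */
    static boolean needsCopy(File source, File target) {
        if (!target.exists()) {
            return true;
        }
        if (source.length() != target.length()) {
            return true;
        }
        long drift = Math.abs(source.lastModified() - target.lastModified());
        return drift > TOLERANCE_MS;
    }

    public static void main(String[] args) {
        // Usage: java QuickCompare <sourceFile> <targetFile>
        System.out.println(needsCopy(new File(args[0]), new File(args[1])));
    }
}
```

As noted in the comment, this can miss a file whose content changed while its size and timestamp stayed the same, so it is a trade of certainty for speed.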
I've looked at the source code and it doesn't seem to check the content, but it's so "generic" (aka complicated) that maybe somewhere down the line it reads the content anyway. Using the date and size, it takes less than 1 second to do what Ant does in 10 minutes.
tinky05
A: 

Well, I finally coded my own. It was easy with the Commons IO DirectoryWalker (half of the job done): http://commons.apache.org/io/api-release/index.html It was indeed A LOT faster than Apache Ant's sync task.

tinky05
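The poster's code is not shown, so the following is not that implementation. It is a rough sketch of a one-way sync that only ever modifies the target folder. It swaps Commons IO's DirectoryWalker for the JDK 7 `Files.walkFileTree` API to stay dependency-free. The names `MirrorSketch`, `mirror` and `isStale` are hypothetical. It assumes plain files and directories (no special handling of symbolic links, or of a directory in the target where the source has a file).

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.BasicFileAttributes;

class MirrorSketch {

    /** Makes target match source. Only target is ever modified. */
    static void mirror(final Path source, final Path target) throws IOException {
        // Pass 1: create missing directories, copy new or changed files.
        Files.walkFileTree(source, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                    throws IOException {
                Files.createDirectories(target.resolve(source.relativize(dir)));
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Path dest = target.resolve(source.relativize(file));
                if (isStale(attrs, dest)) {
                    // COPY_ATTRIBUTES carries the modification time over, so the
                    // next run sees the file as up to date.
                    Files.copy(file, dest, StandardCopyOption.REPLACE_EXISTING,
                            StandardCopyOption.COPY_ATTRIBUTES);
                }
                return FileVisitResult.CONTINUE;
            }
        });
        // Pass 2: delete whatever exists in target but no longer in source.
        Files.walkFileTree(target, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                if (!Files.exists(source.resolve(target.relativize(file)))) {
                    Files.delete(file);
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException e)
                    throws IOException {
                if (e != null) {
                    throw e;
                }
                // Orphaned directories are empty by now: their entries were
                // removed earlier in this same walk.
                if (!Files.exists(source.resolve(target.relativize(dir)))) {
                    Files.delete(dir);
                }
                return FileVisitResult.CONTINUE;
            }
        });
    }

    /** Size and modification time (millisecond precision) only; content is not read. */
    static boolean isStale(BasicFileAttributes src, Path dest) throws IOException {
        if (!Files.exists(dest)) {
            return true;
        }
        BasicFileAttributes dst = Files.readAttributes(dest, BasicFileAttributes.class);
        return src.size() != dst.size()
                || src.lastModifiedTime().toMillis() != dst.lastModifiedTime().toMillis();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java MirrorSketch <sourceDir> <targetDir>
        mirror(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Because the second pass deletes files, it is worth trying this on a scratch copy of the target folder first.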