I know that in the age of Maven it is not recommended to store libraries in a VCS, but sometimes it still makes sense.

My question is how best to store them: compressed or uncompressed? Uncompressed they are larger, but if they are replaced a couple of times with newer versions, the stored difference between two uncompressed .jar files might be much smaller than the difference between two compressed ones. Has anyone tested this?

+1  A: 

.jar files are (or can be) compressed already, so compressing them a second time probably will not yield the size improvement you expect.

rsp
I did not mean compressing them a second time, but whether to create them compressed or uncompressed.
mklhmnn
@mklhmnn, if you store `.jar`s, I would keep them in their original distribution format. Jars generated from the source in your repository I would not add to the repository.
rsp
JAR uses ZIP format so it's always compressed.
ZZ Coder
@ZZ Coder: no, one can create uncompressed jar- and zip-files easily. For example, to reduce download size, it is recommended to use uncompressed jars (because they can be better compressed by the bundling compressor).
mklhmnn
+8  A: 

Best practice to store .jar files in VCS (SVN, Git, …): don't.

It could make sense in a CVCS (Centralized VCS) like SVN, which can handle millions of files whatever their size is.

It doesn't in a DVCS, especially one like Git (and its limits):

  • Binary files don't fit well with VCS.
  • By default, cloning a DVCS repo will get you all of its history, with all the jar versions.
    That will be slow and take a lot of disk space, no matter how well those jars are compressed.
    You could try to play with shallow cloning, but that's highly impractical.

Use a second repository, like Nexus, for storing those jars, and only version a text file (or a pom.xml file for a Maven project) that references them, in order to fetch the right jar versions.
An artifact repository is better suited to distribution and release management.


All that being said, if you must store jars in a Git repo, I would initially have recommended storing them in their compressed format (which is the default format for a jar: see Creating a JAR File).
Both compressed and uncompressed formats would be treated as binary by Git, but at least, in a compressed format, clone and checkout would take less time.

However, many threads mention the possibility of storing jars in uncompressed format:

I'm using some repos that get regular 50MB tarballs checked into them.
I convinced them to not compress the tarballs, and git does a fairly decent job of doing delta compression between them (although it needs quite a bit of RAM to do so).

There is more on deltified objects in Git here:

  • It does not make a difference if you are dealing with binary or text;
  • The delta is not necessarily against the same path in the previous revision, so even a new file added to the history can be stored in a deltified form;
  • When an object stored in the deltified representation is used, it would incur more cost than using the same object in the compressed base representation. The deltification mechanism makes a trade-off taking this cost into account, as well as the space efficiency.

So, if clones and checkouts are not operations that you have to perform every 5 minutes, storing jars in an uncompressed format in Git would make more sense because:

  • Git would compress and compute deltas for those files
  • You would end up with uncompressed jars in your working directory, which could then potentially be loaded more quickly.

Recommendation: uncompressed.
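
Since this remains to be tested, here is a rough sketch of one way to get a feel for it. It builds two versions of a tiny one-entry jar in memory, once stored and once deflated, and estimates how much of the new version can be expressed in terms of the old one. Everything in it is an assumption made for illustration: the entry name and payload are made up, and zlib with the old jar as a preset dictionary is only a crude stand-in for Git's delta compression, not Git's actual algorithm. Real numbers should come from packing a real repository.

```python
import hashlib
import io
import zipfile
import zlib


def make_payload(changed: bool) -> bytes:
    """Build 600 config-like lines; the 'changed' variant alters every 50th line."""
    lines = []
    for i in range(600):
        seed = f"{i}-v2" if changed and i % 50 == 0 else str(i)
        value = hashlib.sha256(seed.encode()).hexdigest()[:16]
        lines.append(f"setting.{i}={value}\n")
    return "".join(lines).encode()


def build_jar(payload: bytes, compression: int) -> bytes:
    """Create a one-entry jar in memory (hypothetical entry name, fixed timestamp)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        info = zipfile.ZipInfo("com/example/settings.properties",
                               date_time=(2020, 1, 1, 0, 0, 0))
        info.compress_type = compression
        zf.writestr(info, payload)
    return buf.getvalue()


def approx_delta_size(old: bytes, new: bytes) -> int:
    """Crude delta estimate: compress 'new' with 'old' as a preset dictionary.

    This is NOT what Git does internally; it only approximates how much of
    'new' can be described by referring back to 'old'. zlib only looks back
    32 KB, so the test jars are deliberately kept smaller than that.
    """
    comp = zlib.compressobj(level=9, zdict=old)
    return len(comp.compress(new) + comp.flush())


results = {}
for name, method in (("stored", zipfile.ZIP_STORED),
                     ("deflated", zipfile.ZIP_DEFLATED)):
    old_jar = build_jar(make_payload(False), method)
    new_jar = build_jar(make_payload(True), method)
    results[name] = approx_delta_size(old_jar, new_jar)
    print(f"{name}: jar={len(new_jar)} bytes, approx delta={results[name]} bytes")
```

The stored jar is larger on its own, but a small change to its content stays a small change in the archive bytes, which is the property the quoted tarball experience relies on.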

VonC
Thank you for answering although this does not answer my question. Sometimes it makes sense (for us) to store jar files in a repository. For that case I want to know what's best - compressed or uncompressed.
mklhmnn
@mklhmnn: all right, I have added my recommendation, at least for Git: uncompressed format for jar is worth a try.
VonC
"uncompressed format ... is worth a try" vs. "I ... recommend to store ... in ... compressed format" seems contradictory to me. Do you suggest compressed or uncompressed?
mklhmnn
@mklhmnn: uncompressed. I have edited my answer to make it crystal clear. That remains however to be tested.
VonC
+2  A: 

You can use a solution similar to the one found in the answers to the "Uncompress OpenOffice files for better storage in version control" question here on SO, namely a clean / smudge gitattributes filter, with rezip as the filter, to store *.jar files uncompressed.
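
To make the idea concrete, here is a minimal sketch of what such a clean filter does. It is not rezip itself: it is a hypothetical stand-in written with Python's standard zipfile module, and the `--clean` switch, the script name and the filter name used below are all made up for the example. It reads a jar on stdin and writes the same entries back on stdout with compression turned off.

```python
import io
import sys
import zipfile


def to_stored(jar_bytes: bytes) -> bytes:
    """Return the same archive with every entry written uncompressed (ZIP_STORED)."""
    if not jar_bytes:
        # Nothing to repack; pass empty input through unchanged.
        return jar_bytes
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            # Reuse the original entry metadata (name, timestamp) and only
            # override the compression method.
            dst.writestr(info, src.read(info), compress_type=zipfile.ZIP_STORED)
    return out.getvalue()


# Git runs a clean filter with the file content on stdin and expects the
# converted content on stdout. The "--clean" argument is a hypothetical
# switch for this sketch, so the module can also be imported without side effects.
if __name__ == "__main__" and sys.argv[1:] == ["--clean"]:
    sys.stdout.buffer.write(to_stored(sys.stdin.buffer.read()))
```

Assuming the script is saved as store_jar.py, it could be wired up with `git config filter.storejar.clean "python3 store_jar.py --clean"` plus a `.gitattributes` line `*.jar filter=storejar`. With only a clean filter configured, a fresh checkout gives you the uncompressed jars in the working directory.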

Jakub Narębski
Good addition to the "storing jar uncompressed" solution. +1
VonC