
I have 100 .gz files that I need to decompress. I have a couple of questions:

a) I am using the code given at http://www.roseindia.net/java/beginners/JavaUncompress.shtml to decompress the .gz files, and it works fine. Question: is there a way to get the file name of the compressed file? I know that Java's ZipFile class gives an enumeration of entries to work on, which provides the file name, size, etc. stored in a .zip file. Is there something similar for .gz files, or is the file name simply the .gz file name with .gz removed?

b) Is there another, more elegant way to decompress a .gz file, for example by calling an external utility from Java code, such as running the 7-Zip application from a Java class? Then I wouldn't have to worry about input/output streams.

Thanks in advance. Kapil

+1  A: 

Regarding A, the gunzip command creates an uncompressed file with the original name minus the .gz suffix. See the man page.

Regarding B, do you need gunzip specifically, or will another compression algorithm do? There's a Java port of the LZMA compression algorithm used by 7-Zip to create .7z files, but it will not handle .gz files.

Paul Morie
A: 

If you have a fixed number of files to decompress once, why not use existing tools? As Paul Morie noted, gunzip can do that:

for i in *.gz; do gunzip "$i"; done

It automatically names the output files by stripping the trailing .gz.

On Windows, try WinRAR, or gunzip from http://unxutils.sf.net

alamar
A: 

Gzip is normally used only on single files, so it generally does not contain information about individual files. To bundle multiple files into one compressed archive, they are first combined into an uncompressed Tar file (with info about the individual contents), and then compressed as a single file. This combination is called a tarball.

There are libraries that extract the individual file info from a Tar, much as ZipEntry does for zip files. One example. You will first have to extract the .gz file into a temporary file in order to use it, or at least feed the GZIPInputStream into the Tar library.
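In practice a library such as Apache Commons Compress (its TarArchiveInputStream) is the sensible choice. To show what "feeding the GZIPInputStream into the Tar library" amounts to, here is a minimal, standard-library-only sketch. It assumes plain ustar headers (no GNU long names or pax extensions) and only lists entry names; TarGzLister and listEntries are hypothetical names.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

public class TarGzLister {
    private static final int BLOCK = 512;

    /** Lists entry names in a .tar.gz stream. Sketch: plain ustar headers only; caller closes the stream. */
    public static List<String> listEntries(InputStream gzipped) throws IOException {
        List<String> names = new ArrayList<>();
        // Decompress on the fly; no temporary .tar file is written
        DataInputStream in = new DataInputStream(new GZIPInputStream(gzipped));
        byte[] header = new byte[BLOCK];
        while (true) {
            in.readFully(header);
            if (isAllZero(header)) break;                 // a zero block marks the end of the archive
            names.add(field(header, 0, 100));             // name: bytes 0-99, NUL-terminated
            long size = Long.parseLong(field(header, 124, 12).trim(), 8); // size: octal text
            long dataBlocks = (size + BLOCK - 1) / BLOCK; // contents are padded to 512-byte blocks
            for (long i = 0; i < dataBlocks; i++)
                in.readFully(header);                     // skip over this entry's contents
        }
        return names;
    }

    // Reads a NUL-terminated ASCII field from a header block
    private static String field(byte[] b, int off, int len) {
        int end = off;
        while (end < off + len && b[end] != 0) end++;
        return new String(b, off, end - off, StandardCharsets.US_ASCII);
    }

    private static boolean isAllZero(byte[] b) {
        for (byte x : b) if (x != 0) return false;
        return true;
    }
}
```

A real library also handles long names, links, checksums and the other header fields this sketch ignores.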

You may also call 7-Zip from the command line using Java. 7-Zip command-line syntax is here: 7-Zip Command Line Syntax. Example of calling the command shell from Java: Executing shell commands in Java. You will have to call 7-Zip twice: once to extract the Tar from the .tar.gz or .tgz file, and again to extract the individual files from the Tar.
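A minimal sketch of calling an external tool from Java with ProcessBuilder. It assumes a gunzip executable on the PATH (used here instead of 7-Zip because its arguments are the simplest); a 7-Zip call would build its arguments according to the 7-Zip command-line syntax instead. ExternalGunzip, command and run are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ExternalGunzip {
    /** Builds the command line; assumes a `gunzip` executable is on the PATH. */
    public static List<String> command(File gzFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("gunzip");
        cmd.add(gzFile.getPath()); // one list element per argument, so no shell quoting is needed
        return cmd;
    }

    /** Runs gunzip on one file and returns its exit code (0 means success). */
    public static int run(File gzFile) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command(gzFile));
        pb.inheritIO(); // let the tool's own messages go to this process's console
        return pb.start().waitFor();
    }
}
```

Because each argument is a separate list element, file names containing spaces need no escaping; a nonzero exit code from run(...) signals that the tool failed.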

Or, you could just do the easy thing and write a brief shell script or batch file to do your decompression. There's no reason to hammer a square peg into a round hole; this is what batch files are made for. As a bonus, you can also feed them parameters, which considerably reduces the complexity of a Java command-line execution while still letting Java control execution.

BobMcGee
+1  A: 

Have you tried

gunzip *.gz
Peter Lawrey
+2  A: 

a) Zip is an archive format, while gzip is not. So an entry iterator does not make much sense unless (for example) your gz-files are compressed tar files. What you want is probably:

File outFile = new File(infile.getParent(), infile.getName().replaceAll("\\.gz$", ""));

b) Do you only want to uncompress the files? If not, you may be fine using GZIPInputStream to read the files directly, i.e. without an intermediate decompressed copy on disk.

But ok. Let's say you really only want to uncompress the files. If so, you could probably use this:

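import java.io.*;
import java.util.zip.GZIPInputStream;

public class GunzipUtil {
  public static File unGzip(File infile, boolean deleteGzipfileOnSuccess) throws IOException {
    // Output goes next to the input, named like gunzip does: trailing ".gz" stripped
    File outFile = new File(infile.getParent(), infile.getName().replaceAll("\\.gz$", ""));
    if (outFile.equals(infile)) // otherwise we would overwrite the input while reading it
      throw new IOException("File name does not end in .gz: " + infile);
    // try-with-resources closes both streams even if reading or writing fails
    try (GZIPInputStream gin = new GZIPInputStream(new FileInputStream(infile));
         FileOutputStream fos = new FileOutputStream(outFile)) {
      byte[] buf = new byte[100000]; // Buffer size is a matter of taste and application...
      int len;
      while ((len = gin.read(buf)) != -1) // read() returns -1 at end of stream
        fos.write(buf, 0, len);
    }
    if (deleteGzipfileOnSuccess)
      infile.delete();
    return outFile;
  }
}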

fredarin
Hi, can I read the files without uncompressing them? I want something like reading them line by line, and the lines are not necessarily 80 characters long. BufferedReader is what used to work for me, but it doesn't have a constructor for GZIPInputStream.
I'd write what I want, such as: BufferedReader in = new BufferedReader(new GzipFileReader(file)); then implement GzipFileReader as a subclass of Reader.
fredarin
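An alternative to writing a custom GzipFileReader is to bridge the byte stream to characters with InputStreamReader, which BufferedReader does accept. A minimal sketch, assuming the text is UTF-8 (change the charset to match your files); GzipLines and readLines are hypothetical names. For a file on disk, pass new FileInputStream(file).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

public class GzipLines {
    /** Reads gzip-compressed text line by line, without writing an uncompressed copy. */
    public static List<String> readLines(InputStream gzipped) throws IOException {
        List<String> lines = new ArrayList<>();
        // bytes -> decompressed bytes -> characters (UTF-8 assumed) -> buffered lines
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(gzipped), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null)
                lines.add(line); // readLine() imposes no fixed line length
        }
        return lines;
    }
}
```

Collecting into a list is only for illustration; for large files, process each line inside the loop instead.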
A: 

.gz files (gzipped) can store the file name of the compressed file. So, for example, FuBar.doc can be saved inside myDocument.gz, and with appropriate decompression the file can be restored to the name FuBar.doc. Unfortunately, java.util.zip.GZIPInputStream provides no way to read the file name, even if it is stored inside the archive.
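One workaround is to parse the header yourself, following RFC 1952 (the gzip format specification): a 10-byte fixed part, an optional FEXTRA field, then the optional zero-terminated FNAME field in ISO-8859-1. A minimal sketch, assuming you read the header from a fresh stream (it consumes the header bytes) and ignoring header CRCs; GzipHeaderName and readOriginalName are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class GzipHeaderName {
    private static final int FEXTRA = 4, FNAME = 8; // FLG bits from RFC 1952

    /** Returns the original file name stored in the gzip header, or null if none is stored. */
    public static String readOriginalName(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] fixed = new byte[10]; // ID1 ID2 CM FLG MTIME(4) XFL OS
        in.readFully(fixed);
        if ((fixed[0] & 0xff) != 0x1f || (fixed[1] & 0xff) != 0x8b)
            throw new IOException("Not in gzip format");
        int flg = fixed[3] & 0xff;
        if ((flg & FEXTRA) != 0) {
            // XLEN is little-endian; Java evaluates the left operand (low byte) first
            int xlen = in.readUnsignedByte() | (in.readUnsignedByte() << 8);
            in.readFully(new byte[xlen]); // skip the extra field
        }
        if ((flg & FNAME) == 0) return null;
        ByteArrayOutputStream name = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) > 0) // stop at the terminating zero byte
            name.write(b);
        if (b < 0) throw new EOFException("Truncated file name field");
        return new String(name.toByteArray(), StandardCharsets.ISO_8859_1);
    }
}
```

Note that java.util.zip.GZIPOutputStream itself writes no FNAME field, so this returns null for files it created; files produced by other gzip tools may or may not carry a name.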

Garnet Ulrich