I have some data which takes up more than 50MB in an uncompressed file, but compresses down to less than half a MB using gzip.

Most of this is numerical data. I'm trying to figure out how to process this data without having to uncompress it completely. For example, if this data contains a couple of strings and 5 or so numerical values per record, is there a way I can uncompress a single row (or a small set of rows), process them, then discard them?

Unix provides utilities such as zcat and zgrep that operate directly on compressed data; I'd like to do the same in Java.

Thanks

+7  A: 

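Just wrap your FileInputStream in a GZIPInputStream. It decompresses on the fly as you read, so only a buffer's worth of data is held in memory at any time: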

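import java.io.*;
import java.util.zip.GZIPInputStream;

// Opens f as a text reader, decompressing on the fly if the name ends in ".gz".
// The caller is responsible for closing the returned reader.
public static BufferedReader createReader (File f, String encoding) throws IOException
{
    InputStream in = new FileInputStream (f);
    try
    {
        if (f.getName ().endsWith (".gz"))
            in = new GZIPInputStream (in, 10240); // 10 KB inflater buffer

        return new BufferedReader (new InputStreamReader (in, encoding));
    }
    catch (UnsupportedEncodingException e)
    {
        in.close ();
        throw new RuntimeException ("Missing encoding " + encoding, e);
    }
    catch (IOException e)
    {
        in.close (); // don't leak the file handle if the gzip header is invalid
        throw e;
    }
}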
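
To connect this back to the question, here is a minimal sketch of processing one record at a time and then discarding it. It assumes comma-separated records, one per line, in UTF-8; the class name GzipRecordSum, the helpers writeSample and sumColumn, and the sample data are made up for illustration and are not from the question.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRecordSum {

    // Hypothetical helper: writes a tiny gzip-compressed sample file.
    public static void writeSample(File f, String text) throws IOException {
        try (Writer w = new OutputStreamWriter(
                new GZIPOutputStream(new FileOutputStream(f)), StandardCharsets.UTF_8)) {
            w.write(text);
        }
    }

    // Streams the file line by line and sums one numeric column.
    // Assumed record layout: comma-separated fields, one record per line.
    public static double sumColumn(File f, int column) throws IOException {
        double total = 0;
        try (InputStream raw = new FileInputStream(f);
             BufferedReader reader = new BufferedReader(new InputStreamReader(
                     new GZIPInputStream(raw, 10240), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] fields = line.split(",");
                total += Double.parseDouble(fields[column]);
                // 'line' and 'fields' become garbage after each iteration,
                // so the uncompressed data is never held in memory as a whole.
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("records", ".gz");
        f.deleteOnExit();
        writeSample(f, "alpha,1.5,2\nbeta,2.5,4\n");
        System.out.println(sumColumn(f, 1)); // prints 4.0
    }
}
```

Note that gzip is a sequential format: you can stop reading early, but you cannot jump straight to record N without decompressing everything before it.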
Aaron Digulla