Hello all,

I would like to read text files in Matlab with the textscan function. Before I can open a file with fid = fopen('C:\path'), I need to decompress it first. The files have the extension *.gz.

There are thousands of files which I need to analyze and high performance is important.

I have two ideas: (1) use an external program and call it from the command line in Matlab, or (2) use a Matlab 'zip' toolbox. I have heard of gunzip, but don't know about its performance.

Does anyone know a way to unzip these files as quickly as possible from within Matlab?

Thanks!

Max
A: 

I've found the 7-Zip command line (Windows) / p7zip (Unix) to be somewhat speedier for this.

[edit] From some quick testing, it seems that making a system call to gunzip is faster than using MATLAB's native gunzip. You could give that a try as well.
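A minimal sketch of the system-call route, assuming a gunzip binary is available on the system path (usual on Unix; on Windows it would have to be installed separately). The helper name sys_gunzip_cmd is hypothetical. It only builds the command string, so the command can be inspected before it is handed to system:

```matlab
function cmd = sys_gunzip_cmd(gzfile, outfile)
% Build a shell command that decompresses gzfile into outfile.
% 'gunzip -c' writes the decompressed data to stdout and leaves the
% original .gz file in place. Assumes gunzip is on the system path.
cmd = sprintf('gunzip -c "%s" > "%s"', gzfile, outfile);
end
```

It could then be run with something like status = system(sys_gunzip_cmd(gzfile, outfile)); a nonzero status indicates that the command failed.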

Just write a new function that imitates basic MATLAB gunzip functionality:

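function [] = sunzip(fullfilename,output_dir)
% Extract an archive through a system call to 7za.
% By default, extract next to the archive itself.
if ~exist('output_dir','var'), output_dir = fileparts(fullfilename); end

app_path = '/usr/bin/7za'; % adjust to where 7za is installed
switches = ' e'; % extract files ignoring directory structure
options = [' -o' output_dir]; % no space between -o and the directory

% the archive name must be separated from the options by a space
system([app_path switches options ' ' fullfilename]);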

Then use it as you would use gunzip:

sunzip('/data/time_1000.out.gz',tmp_dir);
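For the many-files case in the question, the overall loop might look like the sketch below. It is only an illustration: the function name scan_gz_files and its arguments are hypothetical, and it uses MATLAB's built-in gunzip because that returns the names of the extracted files. A system-call wrapper such as sunzip could be swapped in if the output file name is computed separately.

```matlab
function data = scan_gz_files(gz_dir, tmp_dir, fmt)
% Extract each .gz file in gz_dir to tmp_dir, read it with textscan,
% then delete the extracted copy. fmt is a textscan format, e.g. '%f %f'.
listing = dir(fullfile(gz_dir, '*.gz'));
data = cell(numel(listing), 1);
for k = 1:numel(listing)
    gzfile = fullfile(gz_dir, listing(k).name);
    names = gunzip(gzfile, tmp_dir);   % built-in; returns extracted file names
    fid = fopen(names{1}, 'r');
    data{k} = textscan(fid, fmt);      % one cell per column of the format
    fclose(fid);
    delete(names{1});                  % keep the temporary directory small
end
end
```

Deleting each extracted file right after reading it keeps disk usage bounded when there are thousands of archives.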

Timing with MATLAB's tic/toc, I get the following extraction times for 6 ASCII files (114MB uncompressed):

gunzip: 10.15s
sunzip: 7.84s
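A comparison like this could be reproduced with a small timing helper along the following lines. This is a sketch, not the exact script used for the numbers above; the name time_extract is hypothetical, and results will vary with disk caching and hardware.

```matlab
function t = time_extract(extract_fn, gz_files, out_dir)
% Time how long extract_fn takes to extract every file in gz_files.
% extract_fn is a handle taking (file, out_dir), e.g. @gunzip or @sunzip.
% gz_files is a cell array of archive paths.
t0 = tic;
for k = 1:numel(gz_files)
    extract_fn(gz_files{k}, out_dir);
end
t = toc(t0);   % elapsed wall-clock time in seconds
end
```

For example, time_extract(@gunzip, files, tmp_dir) and time_extract(@sunzip, files, tmp_dir) would time the two approaches on the same list of files.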

JS Ng