I've written a script that saves its output to a CSV file for later reference, but the second script for importing the data takes an ungainly amount of time to read it back in.

The data is in the following format:

Item1,val1,val2,val3
Item2,val4,val5,val6,val7
Item3,val8,val9

where the headers are on the left-most column, and the data values take up the remainder of the row. One major difficulty is that the arrays of data values can be different lengths for each test item. I'd save it as a structure, but I need to be able to edit it outside the MATLAB environment, since sometimes I have to delete rows of bad data on a computer that doesn't have MATLAB installed. So really, part one of my question is: Should I save the data in a different format?

Second part of the question: I've tried importdata, csvread, and dlmread, but I'm not sure which is best, or if there's a better solution. Right now I'm using my own script using a loop and fgetl, which is horribly slow for large files. Any suggestions?

function [data,headers]=csvreader(filename); %V1_1
 fid=fopen(filename,'r');
 data={};
 headers={};
 count=1;
 while 1
      textline=fgetl(fid);
      if ~ischar(textline),   break,   end
      nextchar=textline(1);
      idx=1;
      while nextchar~=','
        headers{count}(idx)=textline(1);
        idx=idx+1;
        textline(1)=[];
        nextchar=textline(1);
      end
      textline(1)=[];
      data{count}=str2num(textline);
      count=count+1;
 end
 fclose(fid);

(I know this is probably terribly written code - I'm an engineer, not a programmer, please don't yell at me - any suggestions for improvement would be welcome, though.)

A: 

If you work on Windows, it may be a good idea to keep your data in an Excel sheet, which you can then rather conveniently access using xlsread.

To speed up your code, I suggest using textscan, since every file access comes with a bit of overhead in Matlab, and since you apparently can keep all the data in memory. From your code I assume that val is numeric, but the headers are not, and that you really want to just collect all the numbers in one large cell array without ordering them according to the header.

% read the file in one go
fid = fopen(filename,'r');
tmp = textscan(fid,'%s','Delimiter',','); % '%s' reads every comma-separated field into one cell array of strings
fclose(fid);
allFields = tmp{1}; 
% within all fields, find the headers by converting everything to double and looking for the nans
data = cellfun(@str2double,allFields,'UniformOutput',false);
isHeader = cellfun(@isnan,data);

% now we can conveniently assign the output
headers = allFields(isHeader); % copy headers to new array
data(isHeader) = []; % remove headers from data

If you do not want to collect your data in a single list, you can use headerIdx = find(isHeader) to get the indices of where the headers are, and collect data(headerIdx(i)+1:headerIdx(i+1)-1) for each separate header (do this before removing the headers from data, since the indices refer to the full list).
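A minimal sketch of that per-header collection (hypothetical variable names; it assumes the arrays from the snippet above are still in the workspace, before the headers have been removed from data):

% split the parsed fields into one numeric vector per header
% (run this before the headers are deleted from data)
headerIdx = find(isHeader);
bounds = [headerIdx(:); numel(data)+1];     % sentinel index for the last item
perItem = cell(1, numel(headerIdx));
for k = 1:numel(headerIdx)
    vals = data(bounds(k)+1:bounds(k+1)-1); % cells holding scalar doubles
    perItem{k} = [vals{:}];                 % concatenate into a row vector
end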

Jonas
+3  A: 

It would probably make the data easier to read if you could pad the file with NaN values when your first script creates it:

Item1,1,2,3,NaN
Item2,4,5,6,7
Item3,8,9,NaN,NaN

or you could even just print empty fields:

Item1,1,2,3,
Item2,4,5,6,7
Item3,8,9,,

Of course, in order to pad properly you would need to know the maximum number of values across all the items beforehand. With either format above, you could then use one of the standard file reading functions, like TEXTSCAN for example:

>> fid = fopen('uneven_data.txt','rt');
>> C = textscan(fid,'%s %f %f %f %f','Delimiter',',','CollectOutput',1);
>> fclose(fid);
>> C{1}

ans = 

    'Item1'
    'Item2'
    'Item3'

>> C{2}

ans =

     1     2     3   NaN  %# TEXTSCAN sets empty fields to NaN anyway
     4     5     6     7
     8     9   NaN   NaN
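
On the write side, the padding could be sketched like this (assuming headers is a cell array of label strings and data a cell array of numeric vectors, as in the question; uneven_data.txt is a hypothetical filename):

% pad every row with NaN out to the length of the longest row
maxLen = max(cellfun(@numel, data));
fid = fopen('uneven_data.txt','wt');
for k = 1:numel(headers)
    row = [data{k}, nan(1, maxLen - numel(data{k}))];
    fprintf(fid, '%s', headers{k});
    fprintf(fid, ',%g', row);    % fprintf prints NaN literally as "NaN"
    fprintf(fid, '\n');
end
fclose(fid);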
gnovice
+1  A: 

Instead of parsing the string textline one character at a time, you could use strtok to break the string up. For example:

% this replaces the body of the while loop in the original csvreader
stringParts = {};
tline = fgetl(fid);
if ~ischar(tline), break, end
i=1;
while 1
    [stringParts{i},r]=strtok(tline,',');
    tline=r;
    i=i+1;
    if isempty(r), break; end
end

% store the header
headers{count} = stringParts{1};

% convert the data into numbers
for j=2:length(stringParts)
    data{count}(j-1) = str2double(stringParts{j});
end
count=count+1;
Azim
+1 for recommending strtok - I didn't know it existed before
Doresoom
A: 

Q1) If you know the maximum number of columns, you can fill the empty entries with NaN. Also, if all values are numerical, do you really need the "Item#" column? If yes, you can use only "#", so all the data is numerical.

Q2) The fastest way to read numeric data from a file without MEX-files is csvread. I try to avoid using strings in CSV files, but if I have to, I use my csv2cell function:

http://www.mathworks.com/matlabcentral/fileexchange/20135-csv2cell

Serg
The Item# column is actually text labels, so yes, I do need it. I probably should have clarified that.
Doresoom
A: 

It may be easiest to open the data in a spreadsheet program, save it as an .xls file and then use what's described here to open .xls files: http://www.mathworks.com/access/helpdesk/help/techdoc/ref/xlsread.html

Eric
@Eric: In my experience, xlsread has been slower than importing csv.
Doresoom