I have a huge CSV file that has a mix of numerical and text datatypes. I want to read this into a single matrix in Matlab. I'll use a simpler example here to illustrate my problem. Let's say I have this CSV file:

1,foo
2,bar

I am trying to read this into MATLAB using:

A=fopen('filename.csv');
B=textscan(A,'%d %d', 'delimiter',',');
C=cell2mat(B);

The first two lines work fine, but the problem is that textscan doesn't create a 2x2 matrix; instead it creates a 1x2 cell array with each cell holding an array. So I try to use the last line to combine the arrays into one big matrix, but it generates an error because the arrays have different datatypes.

Is there a way to get around this problem? Or a better way to combine the arrays?

A: 

I believe you can't use textscan for this purpose. I'd use fscanf which always gives you a matrix as specified. If you don't know the layout of the data it gets kind of tricky however.

fscanf works as follows:

fscanf(fid, format, size)

where fid is the file identifier returned by fopen

format is the file format & how you are reading the data (['%d' ',' '%s'] would work for your example file)

size is the matrix dimensions ([2 2] would work on your example file).

dborba
+3  A: 

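I am not sure if combining them is a good idea. It is likely that you would be better off with them separate.

I changed the format string in your code from '%d %d' to '%d %s', so that the second column is read as text:

clear
clc
A = fopen('filename.csv');                  % open the file and get its identifier
B = textscan(A, '%d %s', 'Delimiter', ',')  % %d reads the number, %s reads the text
fclose(A);                                  % release the file when done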

Looking at the results

K>> B{1}

ans =

       1
       2

K>> B{2}

ans =

'foo'
'bar'

Really, I think this is the format that is most useful. If anything, most people would want to break this cell array into smaller chunks:

num = B{1}
txt = B{2}

Why are you trying to combine them? They are already together in a cell array, and that is the most combined you are going to get.

MatlabDoug
I'm combining them in order to create a matrix that will serve as the dependent variables in an OLS regression. I have a number of text fields that will need to be converted to dummy variables (e.g. fields that say "true" or "false" will be converted to 1 or 0). I was planning on sticking the text into the matrix and then going through it and converting fields where necessary. But based on your suggestion, maybe it is better to convert things while they're still in arrays and then combine them? After everything has been converted, is it advisable to use cell2mat to combine? Thanks Doug!
Jack7890
You might want to keep these in a structure:

data.num = num
data.txt = txt

This will keep them in one container, and let you refer to them by more intuitive names. Putting them into one cell array would just make the syntax to read and manipulate them harder, with no benefit other than combining them. I would go with the structure.
MatlabDoug
@Jack7890: If you *really* want to combine the individual arrays into a matrix, you would first have to convert all of the array contents to the same data type. For example, let's say that the array ['foo'; 'bar'] stored in B{2} becomes [3; 4]. To horizontally concatenate the arrays into [1 3; 2 4], do the following: C = [B{:}]; To vertically concatenate the arrays into [1; 2; 3; 4], do this instead: C = vertcat(B{:});
gnovice
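The convert-then-combine approach discussed in these comments could look roughly like the sketch below. The values in num and txt are made up for illustration and stand in for B{1} and B{2} from textscan; the names dummy and X are also hypothetical.

```matlab
% Hypothetical stand-ins for the textscan output: num plays the role of B{1},
% txt plays the role of B{2}.
num = int32([1; 2; 3]);
txt = {'true'; 'false'; 'true'};

% strcmp compares every cell against 'true' and returns a logical column;
% double() turns it into a 0/1 dummy variable.
dummy = double(strcmp(txt, 'true'));

% Both columns are now double, so plain concatenation gives a 3-by-2 matrix.
X = [double(num), dummy];
```

Because each column is already an ordinary numeric vector at this point, square-bracket concatenation is enough and cell2mat is not needed.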
Thanks a ton for the help guys. One more quick question: I'm confused by why textscan() returns vectors that are stored in a single cell. I've been getting around this by just using cell2mat() on all the vectors that it returns, so that the vector is stored as a K-by-1 matrix, but I'm worried that I'm missing something here. Is there a reason that textscan() doesn't automatically store things in a K-by-1 matrix? Is there a problem with my approach?
Jack7890
A: 

There is a natural solution to this, but it requires the Statistics Toolbox (version 6.0 or higher). Mixed data types can be read into a dataset array. See the MathWorks help page here.
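As a rough sketch of what that might look like for the two-line file in the question, assuming the Statistics Toolbox is installed and the file has no header row (the first three lines only recreate the sample file so the snippet runs on its own):

```matlab
% Recreate the sample file from the question so the sketch is self-contained.
fid = fopen('filename.csv', 'w');
fprintf(fid, '1,foo\n2,bar\n');
fclose(fid);

% Assumes the Statistics Toolbox. ReadVarNames is false because the file
% has no header row, so the variable names are assigned automatically.
ds = dataset('File', 'filename.csv', 'Delimiter', ',', 'ReadVarNames', false);

% Show the automatically assigned variable names.
disp(ds.Properties.VarNames)
```

The result keeps the numeric column and the text column side by side in one container, which is what the original question was after.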

Richie Cotton
A: 

Look on MATLAB Central for a file called "readtext"; it works wonders.

Matt
