Hi,

I have a text file of 10001 lines, where the first line contains the attribute names and the following lines contain the values. The attribute types are mixed (strings and floats) and the fields are delimited by '\t'.

Does anyone know the best way to import such a text file into MATLAB and organize the data into an appropriate structure for further analysis?

I would like to use this data for some data mining applications, so it would be very useful if each column could carry metadata as well (variable type, numeric/categorical value, ...).

Thank you for the suggestions!

A: 

How will the columns be indexed: by name or by integer index?

In the first case, the best approach would be a struct array with one element for each row of the original data. Two questions then need to be answered:

  1. How will the fields be named? Do you know the header in advance? Are all header strings valid MATLAB variable names, so that they can be used as field names? The function genvarname can help in some cases.

  2. How do you transform the output of textscan into a struct array? Look at the function cell2struct in the MATLAB help. If your field names (the header) are truly dynamic, you can still use cell2struct by building the argument cell array at run time and then calling cell2struct(args{:}).
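
The two points above could be sketched roughly as follows. This is only an illustration under assumptions: the sample file contents, the column layout (string, float, string), and all variable names are made up for the example. Note also that in newer MATLAB releases matlab.lang.makeValidName is recommended over genvarname.

```matlab
% Write a tiny tab-delimited sample file so the sketch is runnable
% (hypothetical data; the real file would have 10001 lines).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'name\tweight\tclass label\n');
fprintf(fid, 'a\t1.5\tx\n');
fprintf(fid, 'b\t2.5\ty\n');
fclose(fid);

fid = fopen(fname, 'r');
headerLine = fgetl(fid);                      % first line: attribute names
header = regexp(headerLine, '\t', 'split');   % split the header on tabs
% Assumed column types: string, float, string.
cols = textscan(fid, '%s %f %s', 'Delimiter', '\t');
fclose(fid);
delete(fname);

% Make the header strings usable as field names
% ('class label' is not a valid name as it stands).
fields = genvarname(header);

% Numeric columns come back as double vectors; wrap them in cells
% so that every column is an n-by-1 cell array.
for k = 1:numel(cols)
    if isnumeric(cols{k})
        cols{k} = num2cell(cols{k});
    end
end
data = [cols{:}];                 % n-by-3 cell matrix, one row per record

% Build the argument list dynamically, then expand it in the call.
args = {data, fields, 2};         % 2: field names run along the columns
rows = cell2struct(args{:});      % n-by-1 struct array

disp(rows(2).weight)              % weight of the second record
```

Each element of rows then corresponds to one line of the file, and a column is addressed by its (sanitized) header name.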

If the columns are indexed numerically, then stay with the cell array returned by textscan.
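
As a small sketch of what numeric indexing looks like, assuming the same kind of three-column layout (string, float, string) and made-up sample values, the textscan output can be used directly:

```matlab
% Hypothetical two-record sample, parsed from a string instead of a file.
txt = sprintf('a\t1.5\tx\nb\t2.5\ty\n');
cols = textscan(txt, '%s %f %s', 'Delimiter', '\t');

weights    = cols{2};        % whole second column as a double vector
firstName  = cols{1}{1};     % first row of the first (string) column
meanWeight = mean(cols{2});  % numeric columns can be used as-is
```

Here cols{k} is the k-th column: a double vector for a %f column and a cell array of strings for a %s column.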

For the metadata, I would use a separate variable, either a struct or a struct array.
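
One possible shape for such a metadata variable is a struct array with one element per column. The field names (name, type, isCategorical) and the rule that every string column counts as categorical are assumptions made for this sketch, not something the data dictates:

```matlab
% Hypothetical column names and column data, as textscan would return them.
fields = {'name', 'weight', 'classLabel'};
cols   = {{'a'; 'b'}, [1.5; 2.5], {'x'; 'y'}};

% Passing a cell array as a value creates a 1-by-3 struct array;
% the scalar defaults are copied into every element.
meta = struct('name', fields, 'type', '', 'isCategorical', false);

for k = 1:numel(cols)
    if iscellstr(cols{k})
        meta(k).type = 'string';
        meta(k).isCategorical = true;   % assumption: strings are categories
    else
        meta(k).type = 'numeric';
    end
end
```

meta(k) then describes column k, independently of how the data itself is stored.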

Mikhail