I have a text file with the following structure:

1999-01-04
1,100.00
1,060.00
1,092.50
0
6,225
1,336,605
37
1999-01-05 
1,122.50
1,087.50
1,122.50
0
3,250
712,175
14
...

The file contains repeated sets of eight values (a date followed by seven numbers, each on their own line).

I want to read it into MATLAB and get the values into different vectors. I've tried to accomplish this with several different methods, but none have worked - all output some sort of error.

In case it's important, I'm doing this on a Mac.

+2  A: 

Use a script to modify your text file into something that MATLAB can read.

E.g., make it a matrix:

M = [
1999-01-04
1,100.00
1,060.00
1,092.50
0
6,225
1,336,605;  <-- notice the ';'
37
1999-01-05 
1,122.50
1,087.50
1,122.50
0
3,250;   <-- notice the ';'
712,175
14
...
]

Import this into MATLAB and read the various vectors from the matrix.

Note: my MATLAB is a bit rusty, so this might contain errors.

Thanks, I solved it that way. I created a script in Python to build the matrix the way you suggested!
Fifth-Edition
+2  A: 

It isn't entirely clear what form you want the data to be in once you've read it. The code below puts it all in one matrix, with each row representing a group of 8 lines in your text file. You may wish to use different variables for different columns, or (if you have access to the Statistics Toolbox) use a dataset array.

% Read file as text
text = fileread('c:/data.txt');

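% Split by line (accepts both \n and \r\n line endings)
x = regexp(text, '\r?\n', 'split');

% Trim stray whitespace and drop empty lines (e.g. after a trailing newline)
x = strtrim(x);
x = x(~cellfun('isempty', x));

% Remove commas from numbers
x = regexprep(x, ',', '');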

% Number of items per object
n = 8;

% Get dates
index = 1:length(x);
dates = datenum(x(rem(index, n) == 1));

% Get other numbers
nums = str2double(x(rem(index, n) ~= 1));
nums = reshape(nums, (n-1), length(nums)/(n-1))';

% Combine dates and numbers
thedata = [dates nums];

You could also look into the function textscan for alternative ways of solving the problem.
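
Since the question asks for separate vectors, here is a minimal sketch of splitting a matrix shaped like thedata into one vector per column. The sample values are the two records from the question, and the variable names (col2, col8, columns) are hypothetical, since the question does not say what the columns mean.

```matlab
% Sample rows in the same layout as thedata: [date, seven numeric columns].
thedata = [datenum('1999-01-04', 'yyyy-mm-dd') 1100.0 1060.0 1092.5 0 6225 1336605 37;
           datenum('1999-01-05', 'yyyy-mm-dd') 1122.5 1087.5 1122.5 0 3250  712175 14];

% One column vector per field (hypothetical names).
dates = thedata(:, 1);
col2  = thedata(:, 2);
col8  = thedata(:, 8);

% Alternatively, keep every column in a cell array and index by position.
columns = num2cell(thedata, 1);   % 1-by-8 cell, each cell an N-by-1 vector
```

The cell array form avoids creating eight separately named variables when the number of columns might change.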

Richie Cotton
A: 

Similar to Richie's answer: this uses str2double to convert the file strings to doubles, but processes the file line by line instead of splitting it with a regular expression. The output is a cell array of seven vectors, one per numeric field; the date lines are skipped.

function vectors = readdata(filename)

fid=fopen(filename);

tline = fgetl(fid);
counter = 0;
vectors = cell(7,1);
while ischar(tline)
    if counter > 0   % counter == 0 is the date line, which is skipped
        vectors{counter} = [vectors{counter} str2double(tline)];
    end
    counter = counter + 1;
    if counter > 7
        counter = 0;
    end
    tline = fgetl(fid);
end

fclose(fid);
Todd
+7  A: 

EDIT: This is a shorter version of the code I previously had in my answer...

If you'd like to read your data file directly, without having to preprocess it first as dstibbe suggested, the following should work:

fid = fopen('datafile.txt','rt');
data = textscan(fid,'%s %s %s %s %s %s %s %s','Delimiter','\n');
fclose(fid);
data = [datenum(data{1}) cellfun(@str2double,[data{2:end}])]';

The above code places each set of 8 values into an 8-by-N matrix, with N being the number of 8 line sets in the data file. The date is converted to a serial date number so that it can be included with the other double-precision values in the matrix. The following functions (used in the above code) may be of interest: TEXTSCAN, DATENUM, CELLFUN, STR2DOUBLE.
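
As a rough end-to-end check, the sketch below writes the two sample records from the question to a scratch file, runs the same textscan call, and then pulls individual rows out of the 8-by-N result. The scratch file (via tempname) and the names fname, dates and lastField are assumptions made for illustration only.

```matlab
% Write the two sample records from the question to a scratch file.
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, '%s\n', '1999-01-04', '1,100.00', '1,060.00', '1,092.50', ...
    '0', '6,225', '1,336,605', '37', ...
    '1999-01-05', '1,122.50', '1,087.50', '1,122.50', ...
    '0', '3,250', '712,175', '14');
fclose(fid);

% Same reading code as in the answer.
fid = fopen(fname, 'rt');
data = textscan(fid, '%s %s %s %s %s %s %s %s', 'Delimiter', '\n');
fclose(fid);
delete(fname);
data = [datenum(data{1}) cellfun(@str2double, [data{2:end}])]';

% Each row of the 8-by-N matrix is one field across all records.
dates     = data(1, :);   % serial date numbers
lastField = data(8, :);   % eighth value of every record

% Serial date numbers convert back to text with datestr.
disp(datestr(dates, 'yyyy-mm-dd'));
```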

gnovice
This is exactly what I wanted to do! Thanks a bunch. But why is reading data so troublesome in MATLAB? Thanks again.
Fifth-Edition
@Fifth: The thing that made your case difficult was the usage of commas within the format of the number. Normally, commas would be used to separate numbers from one another, not to denote separation between thousands and millions within numbers. As you can see from Amro's example, the MATLAB code is trivial for a case with better-formatted numbers.
gnovice
@Fifth: Actually, I was able to come up with an even shorter version of my code, comparable to Amro's compact answer without needing any preprocessing of the data file.
gnovice
@Fifth: It should be noted that C-language scanf and C++ iostreams both choke on the commas in your example file. You would have to do a two-pass operation in those languages as well. C# Double.Parse() handles the comma, and I don't know about Java.
mtrw
This is shorter indeed :) BTW, you can use the 'CollectOutput' option on textscan, so you avoid the call to cellfun:
data = textscan(%...%, 'CollectOutput',1);
M = [datenum(data{1}(:,1)) str2double(data{1}(:,2:end))];
Amro
@Amro: Good point. I thought about doing that, but the character count actually came out higher (the additional argument to TEXTSCAN and the extra indexing outweighed the removal of the call to CELLFUN). I'm not sure which is fastest (probably the one you suggested), but I decided to take the "code golf" route and go with the shorter one. ;)
gnovice
Wow - thanks for the thorough walkthrough. So, if I have a better-formatted text file, it will be easier to read? And it will look something like what Amro wrote below, the "fid = fopen(..) .." part? Great answers and comments all! Thanks
Fifth-Edition
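
Amro's 'CollectOutput' suggestion from the comments above can be written out in full roughly as follows. This is only a sketch: the scratch file created with tempname stands in for the real data file, and the result M is N-by-8 (one record per row) rather than 8-by-N.

```matlab
% Scratch file holding the two sample records from the question.
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, '%s\n', '1999-01-04', '1,100.00', '1,060.00', '1,092.50', ...
    '0', '6,225', '1,336,605', '37', ...
    '1999-01-05', '1,122.50', '1,087.50', '1,122.50', ...
    '0', '3,250', '712,175', '14');
fclose(fid);

fid = fopen(fname, 'rt');
% 'CollectOutput' merges the eight %s columns into one N-by-8 cell array.
data = textscan(fid, '%s %s %s %s %s %s %s %s', ...
    'Delimiter', '\n', 'CollectOutput', 1);
fclose(fid);
delete(fname);

% Column 1 holds the dates, columns 2 to 8 the numbers.
% str2double accepts the thousands commas, so no cellfun call is needed.
M = [datenum(data{1}(:, 1), 'yyyy-mm-dd') str2double(data{1}(:, 2:end))];
```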
+3  A: 

I propose yet another solution. This one is the shortest in MATLAB code. First, using sed, we format the file as a CSV file (comma separated, with each record on one line):

cat a.dat | sed -e 's/,//g ; s/[ \t]*$/,/g' -e '0~8 s/^\(.*\),$/\1\n/' | 
            sed -e :a -e '/,$/N; s/,\n/,/; ta' -e '/^$/d' > file.csv

Explanation: First we get rid of the thousands comma separator and trim spaces at the end of each line, adding a comma. Then we remove that ending comma on every 8th line. Finally we join the lines and remove empty ones. Note that the 0~8 address is a GNU sed extension, so on a Mac this needs GNU sed rather than the stock BSD sed.

The output will look like this:

1999-01-04,1100.00,1060.00,1092.50,0,6225,1336605,37
1999-01-05,1122.50,1087.50,1122.50,0,3250,712175,14

Next in MATLAB, we simply use textscan to read each line, with the first field as a string (to be converted to num), and the rest as numbers:

fid = fopen('file.csv', 'rt');
a = textscan(fid, '%s %f %f %f %f %f %f %f', 'Delimiter',',', 'CollectOutput',1);
fclose(fid);

M = [datenum(a{1}) a{2}]

and the resulting matrix M is:

  730124     1100     1060   1092.5    0   6225   1336605    37
  730125   1122.5   1087.5   1122.5    0   3250    712175    14
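
If sed is not at hand, the same CSV conversion can be sketched in MATLAB itself. This is a swapped-in alternative, not the sed pipeline above: it assumes the input has exactly eight non-empty lines per record, and the file names (inName, outName) are scratch paths made up for the example.

```matlab
% Scratch input file with the two sample records from the question.
inName  = [tempname '.dat'];
outName = [tempname '.csv'];
fid = fopen(inName, 'wt');
fprintf(fid, '%s\n', '1999-01-04', '1,100.00', '1,060.00', '1,092.50', ...
    '0', '6,225', '1,336,605', '37', ...
    '1999-01-05', '1,122.50', '1,087.50', '1,122.50', ...
    '0', '3,250', '712,175', '14');
fclose(fid);

% Read all lines, trim whitespace, and drop empty lines.
txtLines = strtrim(regexp(fileread(inName), '\r?\n', 'split'));
txtLines = txtLines(~cellfun('isempty', txtLines));
delete(inName);

% Remove the thousands separators, then group the lines eight at a time.
txtLines = strrep(txtLines, ',', '');
records  = reshape(txtLines, 8, []);   % one column per record

% records{:} expands column by column, so each format pass writes one record.
fid = fopen(outName, 'wt');
fprintf(fid, '%s,%s,%s,%s,%s,%s,%s,%s\n', records{:});
fclose(fid);
```

The resulting file can then be read with the same textscan call shown above.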
Amro
A: 

This has regular expression checking to make sure your data is formatted well.

fid = fopen('data.txt','rt');

%these will be your 8 value arrays
val1 = [];
val2 = [];
val3 = [];
val4 = [];
val5 = [];
val6 = [];
val7 = [];
val8 = [];

linenum = 0; % line number in file
valnum = 0; % number of value (1-8)

while 1
   line = fgetl(fid);
   linenum = linenum+1;
   if valnum == 8
      valnum = 1;
   else
      valnum = valnum+1;
   end

   %-- if reached end of file (fgetl returns -1 instead of text), stop
   if ~ischar(line)
      fclose(fid);
      break;
   end


   switch valnum
      case 1
         pat = '(?<yr>\d{4})-(?<mo>\d{2})-(?<dy>\d{2})'; % val1 (e.g. 1999-01-04)
      case 2
         pat = '(?<val>\d*[,]*\d*[,]*\d*[.]\d{2})'; % val2 (e.g. 1,100.00)  [valid up to 1billion-1]
      case 3
         pat = '(?<val>\d*[,]*\d*[,]*\d*[.]\d{2})'; % val3 (e.g. 1,060.00)  [valid up to 1billion-1]
      case 4
         pat = '(?<val>\d*[,]*\d*[,]*\d*[.]\d{2})'; % val4 (e.g. 1,092.50)  [valid up to 1billion-1]
      case 5
         pat = '(?<val>\d+)'; % val5 (e.g. 0)
      case 6
         pat = '(?<val>\d*[,]*\d*[,]*\d+)'; % val6 (e.g. 6,225)  [valid up to 1billion-1]
      case 7
         pat = '(?<val>\d*[,]*\d*[,]*\d+)'; % val7 (e.g. 1,336,605)  [valid up to 1billion-1]
      case 8
         pat = '(?<val>\d+)'; % val8 (e.g. 37)
      otherwise
         error('bad linenum')
   end

   l = regexp(line,pat,'names'); % l is for line
    if length(l) == 1 % match
      if valnum == 1
         serialtime = datenum(str2num(l.yr),str2num(l.mo),str2num(l.dy)); % convert to matlab serial date
         val1 = [val1;serialtime];
      else
         this_val = strrep(l.val,',',''); % strip out comma and convert to number
         eval(['val',num2str(valnum),' = [val',num2str(valnum),';',this_val,'];']) % save this value into appropriate array
      end
   else
      warning(['line number ',num2str(linenum),' skipped! [didnt pass regexp]: ',line]);
   end
end
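
The eval call that builds variable names val2 to val8 could probably be avoided by collecting the values in a cell array instead. The sketch below shows only that idea, without the regular-expression checks: it works on an in-memory list of lines (the sample records from the question) rather than a file, and the names txtLines and vals are hypothetical.

```matlab
% In-memory stand-in for the file contents: two records of eight lines.
txtLines = {'1999-01-04', '1,100.00', '1,060.00', '1,092.50', ...
            '0', '6,225', '1,336,605', '37', ...
            '1999-01-05', '1,122.50', '1,087.50', '1,122.50', ...
            '0', '3,250', '712,175', '14'};

vals = cell(1, 8);   % vals{k} collects the k-th value of every record

for linenum = 1:numel(txtLines)
    valnum = mod(linenum - 1, 8) + 1;   % position within the record (1-8)
    if valnum == 1
        % Date line: convert to a MATLAB serial date number
        v = datenum(txtLines{linenum}, 'yyyy-mm-dd');
    else
        % Numeric line: strip the commas and convert to double
        v = str2double(strrep(txtLines{linenum}, ',', ''));
    end
    vals{valnum}(end+1, 1) = v;   % append to the matching column vector
end
```

Here vals{1} plays the role of val1, vals{2} of val2, and so on.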