I'm trying to load the following ASCII file into MATLAB using load():

% some comment
1 0xc661
2 0xd661
3 0xe661

(This is actually a simplified file. The actual file I'm trying to load contains an unknown number of columns and an unknown number of comment lines at the beginning, which is why the load function was attractive.)

For some strange reason, I obtain the following:

K>> data = load('testMixed.txt')

data =

         1       50785
         2       58977
         3       58977

I've observed that the problem occurs anytime there's a "d" in the hexadecimal number.

Direct hex2dec conversion works properly:

K>> hex2dec('d661')
ans =
       54881

importdata seems to have the same conversion issue, and so does the ImportWizard:

K>> importdata('testMixed.txt')

ans =

       1       50785
       2       58977
       3       58977

Is this a bug, am I using the load function in some prohibited way, or is there something obvious I'm overlooking?

Are there any workarounds, short of reimplementing the file parsing on my own?

Edit: I've updated my input file to better reflect my actual file format. I had oversimplified a bit in my original question.

+3  A: 

"GOLF" ANSWER:

This starts with the answer from mtrw and shortens it further:

fid = fopen('testMixed.txt','rt');
data = textscan(fid,'%s','Delimiter','\n','MultipleDelimsAsOne',1,...
                'CommentStyle','%');
fclose(fid);
data = strcat(data{1},{' '});
data = sscanf([data{:}],'%i',[sum(isspace(data{1})) inf]).';
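
To see what the last two lines do, the same steps can be run on in-memory text. This is only a sketch: the cell array `lines` is a hypothetical stand-in for what textscan returns for the sample file, and the intermediate names are mine.

```matlab
% Hypothetical stand-in for data{1} as returned by textscan
lines = {'1 0xc661'; '2 0xd661'; '3 0xe661'};

padded = strcat(lines, {' '});   % append one trailing space to every line
nCol   = sum(isspace(padded{1})); % after padding: one space per column
joined = [padded{:}];            % all values in one long string

% sscanf fills an nCol-by-N matrix column by column, so each column of the
% result holds one line of the file; transposing gives one row per line.
data = sscanf(joined, '%i', [nCol inf]).';
disp(data)
```

The trailing space is what keeps the last value of one line from running into the first value of the next when the lines are concatenated.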

PREVIOUS ANSWER:

My first thought was to use TEXTSCAN, since it has an option that allows you to ignore certain lines as comments when they start with a given character (like %). However, TEXTSCAN doesn't appear to handle numbers in hexadecimal format well. Here's another option:

fid = fopen('testMixed.txt','r');     % Open file

% First, read all the comment lines (lines that start with '%'):

comments = {};
position = 0;
nextLine = fgetl(fid);                % Read the first line
while ischar(nextLine) && strncmp(nextLine,'%',1)  % Stop at EOF or first non-comment line
  comments = [comments; {nextLine}];  % Collect the comments
  position = ftell(fid);              % Get the file pointer position
  nextLine = fgetl(fid);              % Read the next line
end
fseek(fid,position,-1);               % Rewind to beginning of last line read

% Read numerical data:
nCol = sum(isspace(nextLine))+1;       % Get the number of columns
data = fscanf(fid,'%i',[nCol inf]).';  % Note '%i' works for all integer formats
fclose(fid);                           % Close file

This will work for an arbitrary number of comments at the beginning of the file. The computation to get the number of columns was inspired by Jacob's answer.
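
As a quick illustration of the '%i' note above, here is a small sketch of how the conversion picks the base from each value's prefix. The input string is made up for the example. One thing to watch for, if your files can contain zero-padded decimals: a leading 0 is read as octal.

```matlab
% '%i' chooses the base per value: 0x prefix = hex, leading 0 = octal,
% anything else = decimal. The input string is illustrative only.
vals = sscanf('10 0x10 010 0xd661', '%i');
disp(vals.')   % 10 (decimal), 16 (hex), 8 (octal), 54881 (hex)
```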

gnovice
And here I thought textscan could do anything!
mtrw
I'm trying to avoid explicitly defining the format, as my files can have a varying number of columns. Of course, if that's my only option, I'll do it. But I find it strange that load fails like that.
Kena
What if the number of comment lines and the number of columns is not predetermined?
Kena
Someone please explain the downvote. I can't help it if the OP keeps updating their question with new info.
gnovice
Yeah, comments should be *mandatory* with downvotes. Anyway, there's a huge discussion on meta about this.
Jacob
@Jacob: Yeah, it comes up a lot on Meta, but I doubt anything will ever change about it... "preserving anonymity" and all that.
gnovice
Can't explain the downvote either, but I've upvoted you to compensate for that, even though it doesn't solve my problem exactly. Thanks for the hard work.
Kena
That's the most sophisticated scanf line I've ever seen! I'll have to look at this some more, but I think you've outgolfed me.
mtrw
+2  A: 

New:

This is the best I could come up with. It should work for any number of comment lines and columns. You'll have to do the rest yourself if there are strings, etc.

% Define the characters representing the start of the commented line
% and the delimiter
COMMENT_START = '%';
DELIMITER = ' ';

% Open the file
fid = fopen('testMixed.txt');

% Read each line till we reach the data    
l = COMMENT_START;
while(l(1)==COMMENT_START)
    l = fgetl(fid);
end

% Compute the number of columns
cols = sum(l==DELIMITER)+1;
% Split the first line 
split_l = regexp(l,DELIMITER,'split');

% Read all the data
A = textscan(fid,'%s');
% Compute the number of rows
rows = numel(A{:})/cols;

% Close the file
fclose(fid);

% Assemble all the data into a matrix of cell strings
DATA = [split_l ; reshape(A{:},[cols rows])'];

% Recognize each column and process accordingly
% by analyzing each element in the first row
numeric_data = zeros(size(DATA));
for i=1:cols
    str = DATA(1,i);
    % If there is no '0x' present
    if isempty(strfind(str{1},'0x'))
     % This is a decimal number
     numeric_data(:,i) = str2double(DATA(:,i));
    else
     % This is a hexadecimal number
     col = char(DATA(:,i));
     numeric_data(:,i) = hex2dec(col(:,3:end));
    end
end

% Display the data
format short g;
disp(numeric_data)

This works for data like this:

% Comment 1
% Comment 2
1.2 0xc661 10 0xa661
2 0xd661 20 0xb661
3 0xe661 30 0xc661

Output:

  1.2        50785           10        42593
    2        54881           20        46689
    3        58977           30        50785
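
The code above decides hex versus decimal once per column, from the first row. If that assumption might not hold, the same conversion can be done per token instead. A minimal sketch, assuming the tokens are already in a cell array (the `tokens` variable here is a made-up example, not read from a file):

```matlab
% Hypothetical tokens, e.g. the reshaped output of textscan(fid,'%s')
tokens = {'1.2', '0xc661'; '2', '0xd661'};

isHex  = strncmpi(tokens, '0x', 2);            % mask of tokens with a 0x prefix
values = zeros(size(tokens));

% Decimal tokens (fractions included) go through str2double
values(~isHex) = str2double(tokens(~isHex));

% Hex tokens: strip the '0x' prefix, then convert with hex2dec
hexDigits = cellfun(@(t) t(3:end), tokens(isHex), 'UniformOutput', false);
values(isHex) = hex2dec(hexDigits);

disp(values)
```

Unlike the '%i' approach, this keeps non-integer values such as 1.2 intact.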

OLD:

Yeah, I don't think LOAD is the way to go. You could try:

a = char(importdata('testHexa.txt'));
a = hex2dec(a(:,3:end));
Jacob
This works for files containing only hex data, but not mixed data (see edited file input... I had over simplified in my first example)
Kena
@Jacob, you can force textscan to deal with the comments with: A = textscan(f, '%s', 'Delimiter', '\n', 'MultipleDelimsAsOne', 1, 'CollectOutput', 1, 'CommentStyle', '%'); This will also handle comments in the middle of the file. For the rest, I can't think of anything more elegant than what you already did.
mtrw
Thanks! I'd forgotten about that while coding this up.
Jacob
+2  A: 

This is based on both gnovice's and Jacob's answers, and is a "best of breed" combination.

For files like:

% this is my comment
% this is my other comment

1 0xc661 123
2 0xd661 456
% surprise comment
3 0xe661 789 
4 0xb661 1234567

(where the number of columns within the file MUST be the same, but not known ahead of time, and all comments denoted by a '%' character), the following code is fast and easy to read:

f = fopen('hexdata.txt', 'rt');
A = textscan(f, '%s', 'Delimiter', '\n', 'MultipleDelimsAsOne', 1, 'CollectOutput', 1, 'CommentStyle', '%');
fclose(f);
A = A{1};
data = sscanf(A{1}, '%i')';
data = repmat(data, length(A), 1);
for ctr = 2:length(A)
    data(ctr,:) = sscanf(A{ctr}, '%i')';
end
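
Because every line must have the same number of columns, a malformed line will make the assignment inside the loop fail with a dimension mismatch. If a clearer error would help, the column count can be checked before assembling the matrix. A rough sketch, with a hypothetical in-memory `A` standing in for the textscan result:

```matlab
% Hypothetical stand-in for A after A = A{1}
A = {'1 0xc661 123'; '2 0xd661 456'; '3 0xe661 789 '};

% Convert each line to a row vector; '%i' handles the 0x-prefixed values
rows   = cellfun(@(s) sscanf(s, '%i').', A, 'UniformOutput', false);
widths = cellfun(@numel, rows);

% Report the first line whose column count differs from line 1
bad = find(widths ~= widths(1), 1);
if ~isempty(bad)
    error('Line %d has %d columns, expected %d.', bad, widths(bad), widths(1));
end

data = vertcat(rows{:});   % stack the rows into one numeric matrix
disp(data)
```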
mtrw
+1: Nice combination of answers. I feel inclined to play a round of "golf" with you and post a shorter answer. ;)
gnovice
