I have microarray data with 38 rows and 7130 columns. I am trying to read the data but keep getting the above error.

I debugged and found that when I read the data, I get a 1x7129 matrix instead of a 38x7130 one. I don't know why. My 7130th column contains letters, while the rest of the data are numbers. Any idea why this is happening?

A: 

Try this code to read the data:

filename = 'yourfilename.txt';
fid = fopen(filename,'r');

% If the file starts with a line of column headers, use these three lines. Comment them out if not.
colnames = fgetl(fid);
colnames = textscan(colnames, '%s','delimiter','\t');
colnames = colnames{:};

% Reading the data
tsformat = [repmat('%f ',1,7129) '%s'];
datafromfile = textscan(fid,tsformat,'delimiter','\t','CollectOutput',1);
fclose(fid);

% Get the data from the cell array    
data = datafromfile{1};
labels = datafromfile{2};
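
As a sanity check, the same reading pattern can be tried on a tiny file before running it on the real one. This is only a sketch: the temporary file, its three rows, and the labels ALL/AML are made up for illustration, and `ncols` stands in for the 7129 numeric columns of the real data.

```matlab
% Write a hypothetical miniature file: 3 samples, 4 numeric columns, 1 text label.
tmpname = [tempname '.txt'];
fid = fopen(tmpname, 'w');
fprintf(fid, '1\t2\t3\t4\tALL\n');
fprintf(fid, '5\t6\t7\t8\tAML\n');
fprintf(fid, '9\t10\t11\t12\tALL\n');
fclose(fid);

ncols = 4;  % assumption for this toy file; 7129 in the real data

% Same pattern as above: ncols numeric fields followed by one string field.
fid = fopen(tmpname, 'r');
tsformat = [repmat('%f ', 1, ncols) '%s'];
datafromfile = textscan(fid, tsformat, 'delimiter', '\t', 'CollectOutput', 1);
fclose(fid);
delete(tmpname);

data = datafromfile{1};    % numeric matrix, one row per sample
labels = datafromfile{2};  % cell array of label strings

disp(size(data));    % expected to be 3 by 4 here
disp(size(labels));  % expected to be 3 by 1 here
```

If the sizes printed for the real file are not 38 by 7129 and 38 by 1, the format string or the delimiter probably does not match the file.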

EDIT: To split your dataset into training and test sets, do something like this:

train_samp = 1:19;
test_samp = 20:38;
train_data = data(train_samp,:);
test_data = data(test_samp,:);
train_label = labels(train_samp);
test_label = labels(test_samp);

You can also separate samples randomly:

samp_num = size(data,1);
test_num = 19;
randorder = randperm(samp_num);
train_samp = randorder(test_num+1:samp_num);
test_samp = randorder(1:test_num);
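
The random indices are then used exactly like the fixed ones. A small self-contained sketch follows, with a made-up 6x3 matrix and made-up labels standing in for the real `data` and `labels`:

```matlab
% Toy stand-ins for the real variables (assumptions for illustration only).
data = (1:6)' * [1 10 100];           % 6 samples, 3 features
labels = {'a';'b';'c';'d';'e';'f'};   % one label per sample

samp_num = size(data,1);
test_num = 2;
randorder = randperm(samp_num);             % random permutation of 1..samp_num
test_samp = randorder(1:test_num);          % first test_num indices go to the test set
train_samp = randorder(test_num+1:samp_num);

% Index rows of the data and the matching labels with the same index vectors.
train_data = data(train_samp,:);
test_data = data(test_samp,:);
train_label = labels(train_samp);
test_label = labels(test_samp);
```

Because both sets are taken from one permutation, no sample can end up in training and test at the same time.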

I haven't transposed the data (data = data';). If you have to, just swap the row and column indices in the above code:

train_data = data(:,train_samp);
test_data = data(:,test_samp);
yuk
I used your code but got an error: Index exceeds matrix dimensions.
Mola
See my comments to your code below and update here.
yuk
A: 

My file is tab-delimited text, and here is my code for reading it:

clear; 
fn=32; 
col=fn+1; 
cluster=2; 
num_eachClass=3564; 
row=num_eachClass*cluster; 
fid1 = fopen('data.txt', 'r'); 
txt_format=''; 
for t=1:col
    txt_format=[txt_format '%g '];
end
data = fscanf(fid1,txt_format,[col row]);
data = data';
fclose(fid1);
Mola
my sample data: -214 -139 -76 -135 -106 -138 -72 -413 5 -88 -165 -67 -92 -113 -107 -117 -476 -81 -44 17 -144 -247 -74 -120 -81 -112 -273 -20 7 -213 -25 -72 -4 15 -318 -32 -124 -135 -153 -73 -49 -114 -125 -85 -144 -260 -127 -105 -155 -93 -119 -147 -72 -219 -213 -150 -51 -229 -199 -90 -321 -263 -150 -233 -327 -207 -100 -252 -20 -139 -116 -114 -192 -49 -79 -186 -58 -1 -307 265 -76 215 238 7 106 42 -71 84 -31 -118 -126 -50 -18 -119 100 79 -157 -168 -11 -114 -85 -78 -76 -50 -57 136 124 -1 -125 2 -95 49 -37 -70
Mola
A: 

Here is my code, modified with Yuk's code, but I still get an error:

clear;
clc;
fn=7129;
col=fn+1;
cluster=2;
num_eachClass=19;
row=num_eachClass*cluster;
fid = fopen('data.txt', 'r');

% Reading the data 
tsformat = [repmat('%f ',1,7129) '%s']; 
data= textscan(fid,tsformat,'delimiter','\t'); 
data = data';
fclose(fid); 


test_num = 0;
train_num = num_eachClass-test_num;
t1=1;
t2=1;
for i=1:row
   t=mod(i-1,num_eachClass); 
   if t<train_num
      Training(t1,:)=data(i,:);
      t1=t1+1;
   else
      Test(t2,:)=data(i,:);
      t2=t2+1;
   end
end
TrainingLabel = Training(:,col);
Training = Training(:,1:fn);
if(test_num~=0)
    TestLabel = Test(:,col);  
    Test = Test(:,1:fn);
end

Here is the error: ??? Index exceeds matrix dimensions.

Error in ==> test at 38

TrainingLabel = Training(:,col);

Mola
Keep an eye on your `data` variable. Since you removed `'CollectOutput',1` from textscan, you get a cell array with one row and 7130 columns, each element containing one column of your original data. After `data=data';` you get a 7130x1 cell array. Because you don't refer to a particular column before line 38, the code runs up to that point, but `Training` is then also a cell array with a single column. At line 38 you try to access column 7130, which does not exist. What do you want to do? Split your data into training and test sets? Don't you have to do it randomly?
yuk
I updated my answer on how to separate training and test data in MATLAB way.
yuk
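
The shape difference described in the comment above can be seen on a small in-memory example. This is a sketch with made-up values (two rows, three numeric columns, one label column), not the real file:

```matlab
% Two tab-delimited rows: three numbers followed by a text label (toy data).
txt = sprintf('1\t2\t3\tALL\n4\t5\t6\tAML');
fmt = [repmat('%f ', 1, 3) '%s'];

% Without 'CollectOutput': one cell per column of the file.
separate = textscan(txt, fmt, 'delimiter', '\t');

% With 'CollectOutput': adjacent columns of the same type are merged,
% giving one numeric matrix and one cell array of strings.
collected = textscan(txt, fmt, 'delimiter', '\t', 'CollectOutput', 1);

disp(size(separate));    % one row, four cells
disp(size(collected));   % one row, two cells

% The numeric matrix can still be rebuilt from the separate cells if needed.
numeric = [separate{1:3}];
```

Transposing `separate` only turns the row of cells into a column of cells; it never produces a samples-by-features matrix, which is why indexing a column number beyond 1 fails.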
A: 

Hi experts, I have changed my ID from Mola to Moladou because my Mola ID is not working. Here is my whole code:

clear;
fn=7129;
col=fn+1;
cluster=2;
num_eachClass=19;
row=num_eachClass*cluster;
fid1 = fopen('data', 'r');
txt_format='';
for t=1:col
    txt_format=[txt_format '%g '];
end 
data = fscanf(fid1,txt_format,[col row]);
data = data';
fclose(fid1);

%%-----------  training data and test data ------------
test_num = 0;
train_num = num_eachClass-test_num;
t1=1;
t2=1;
for i=1:row
   t=mod(i-1,num_eachClass); 
   if t<train_num
      Training(t1,:)=data(i,:);
      t1=t1+1;
   else
      Test(t2,:)=data(i,:);
      t2=t2+1;
   end
end
TrainingLabel = Training(:,col);
Training = Training(:,1:fn);
if(test_num~=0)
    TestLabel = Test(:,col);  
    Test = Test(:,1:fn);
end
train_row=t1-1;
test_row=t2-1;
%---------------To scale the data----------------------------------
%---------------normalize  standard deviation to 1 ----------------
% % 
% % for i2=1:fn
% %     mean1=mean(Training(:,i2),1);
% %     std1=std(Training(:,i2),1);
% %     Training(:,i2) = (Training(:,i2)-mean1)/std1;
% %     
% %     if(test_num~=0)
% %        Test(:,i2) = (Test(:,i2)-mean1)/std1;
% %     end
% %     
% % end

%%-------------------initial values-------------------
map_size = 15;
neighborhood = 9 ;
learning_rate = 0.8;
lattice = 'r'; % r : rect , h : hexa
iterations = 100;
%-----------------------------------------------------
%%--- To initiate the map

[Map] = initiate_som_Su(Training,train_row,fn,map_size);

%%--- training process--- 
[Map] = training_Batch(Training,train_row,fn,Map,map_size,neighborhood,lattice,iterations);

%--- calibrate the Map
[Map LabelMap] = calibrateMap(Training,TrainingLabel,train_row,fn,Map,map_size);


%% read the data
sD = som_read_data('data.txt');

%% filter columns by variance
var_values = var(sD.data,1);
var_th = 1e5; % just an example that worked for me
indexToRemove = var_values < var_th;
sD.data(:,indexToRemove) = []; % filter out
sD.comp_names(indexToRemove) = [];
sD.comp_norm(indexToRemove) = [];

%% normalize the data
sD = som_normalize(sD,'var');

%% create, initialize and train a SOM 
sM = som_make(sD,'lattice','hexa');
sM = som_autolabel(sM,sD,'vote'); 

%% U-matrix
U=som_umat(sM,'rect');
% U1=U(1:19,1:19);
figure, som_show(sM,'umat','all') 



%% P-matrix
[pheight rad_real perc] = somvis_p_matrix(sM,sD);
tit = sprintf('P-Matrix (radius = %0.2f)', rad_real);
figure, som_show (sM, 'color', {pheight, tit});

%% U*-matrix
[ustar rad_real perc] = somvis_ustar_matrix(sM, sD);
tit = sprintf('U*-Matrix (radius = %0.2f)', rad_real);
figure, som_show (sM, 'color', {ustar, tit}); 

%---Show the labels of map 
% showMap(Map,LabelMap,map_size,data,col);

%---
LabelMap

You can check to see where the error lies and edit that part for me. Thanks.

Did you check your `data` variable after FSCANF? You are reading your last (text) column with %g, as a floating-point number. FSCANF will read the first 7129 numbers and then stop.
yuk
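
The stopping behavior is easy to reproduce in memory. The sketch below uses `sscanf` on a made-up string as a stand-in for `fscanf` on the file, on the assumption that both apply the format in the same way:

```matlab
% Toy input: three numbers, a text label, then more numbers and a label.
txt = '1 2 3 ALL 4 5 6 AML';

% '%g' is reused for every field until a field fails to match.
[vals, count] = sscanf(txt, '%g');

disp(vals');   % only the numbers before the first label are returned
disp(count);   % number of values successfully read
```

Reading stops at the first label because `ALL` cannot be converted with `%g`, so everything after it is never reached. A format that ends with `%s`, as in the textscan answer, is one way to read the label column as text.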
Stop adding new answers. Edit the question!
yuk
Sorry, I will edit my question next time. Yes, that is exactly what it is doing: FSCANF reads the first 7129 numbers and then stops. Why does this happen, and how can I make it read all the data?
This is the error I am getting now:

??? Error using ==> eig
Out of memory. Type HELP MEMORY for your options.

Error in ==> showMap at 41
[U D]=eig(C);

Error in ==> main function at 126
showMap(Map,LabelMap,map_size,data,col);
Your matrix is probably just too large. Since you got past reading the data, edit this answer and show your final code.
yuk
The code still remains the same. I just transposed my data, swapped the number of rows and columns, and made the corresponding changes in my program, and now it gives the out of memory error. I don't know why that error occurs or how to resolve it. Any idea, please?
Did you transpose the data outside of MATLAB? So now it is 7129x32? You will probably have to filter out some rows, for example those with low variance. I still don't understand whether you changed the code or not. This is confusing: `The code still remains the same` and `do the corresponding changes in my program`?
yuk
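
Filtering rows by variance, as suggested in the comment above, could look like the sketch below. The matrix `X` and the threshold `var_th` are made up for illustration; a suitable threshold for the real data has to be chosen by inspecting its variances.

```matlab
% Toy matrix: each row is one variable measured over four samples.
X = [ 1  2  3  4;    % moderate variance
      5  5  5  5;    % zero variance
     10  0 10  0];   % high variance

row_var = var(X, 0, 2);   % variance along each row (dimension 2)
var_th = 1;               % hypothetical threshold
keep = row_var >= var_th; % logical index of rows to keep

X_filtered = X(keep, :);  % rows with low variance are dropped
disp(size(X_filtered));
```

Fewer rows means a smaller matrix passed to later steps such as `eig`, which may be enough to avoid the out of memory error.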
On which line do you get the error, and what is the exact error message?
yuk