I am thinking of writing the data to a file. Does anyone have an example of how to write a large amount of data to a file?

Edit 1: Most elements in the matrix are zeroes; the others are uint32 values. I guess the simplest approach, save() and load(), would work, as @Jonas suggested.

+1  A: 

If you're concerned with keeping the size of the data file as small as possible, here are some suggestions:

  • Write the data to a binary file (i.e. using FWRITE) instead of to a text file (i.e. using FPRINTF).
  • If your data contains only integer values, convert it to (or save it as) a signed or unsigned integer type instead of MATLAB's default double precision type.
  • If your data contains floating point values but you don't need the range or resolution of the default double precision type, convert it to (or save it as) a single precision type.
  • If your data is sufficiently sparse (i.e. there are many more zeroes than non-zeroes in your matrix), you can use the FIND function to get the row and column indices of the non-zero values, then save just these indices and the non-zero values to your file.
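A quick way to see what the type conversions in the second and third bullets save is to compare byte counts with WHOS. This is a minimal sketch with made-up data; the variable names are only for illustration:

```matlab
%# Compare in-memory size before and after converting types (made-up example data).
x = rand(1000,1);                  %# Floating-point values, double by default (8 bytes each)
xSingle = single(x);               %# Single precision: 4 bytes each, about 7 significant digits
k = round(1000*rand(1000,1));      %# Integer-valued data, but still stored as double
kUint32 = uint32(k);               %# Unsigned 32-bit integers: 4 bytes each, exact for 0 to 2^32-1

sx  = whos('x');   sxs = whos('xSingle');   %# WHOS with an output returns a struct
sk  = whos('k');   sku = whos('kUint32');   %#   whose 'bytes' field is the memory used
fprintf('double: %d bytes, single: %d bytes\n', sx.bytes, sxs.bytes);
fprintf('double: %d bytes, uint32: %d bytes\n', sk.bytes, sku.bytes);
```

The same factor of two (or eight, for uint8) carries over to the file when FWRITE is given the matching precision.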

Here are a couple of examples to illustrate:

data = double(rand(16,2^20) <= 0.00001);  %# A large but very sparse matrix

%# Writing the values as type double:
fid = fopen('data_double.dat','w');  %# Open the file
fwrite(fid,size(data),'uint32');     %# Write the matrix size (2 values)
fwrite(fid,data,'double');           %# Write the data as type double
fclose(fid);                         %# Close the file

%# Writing the values as type uint8:
fid = fopen('data_uint8.dat','w');  %# Open the file
fwrite(fid,size(data),'uint32');    %# Write the matrix size (2 values)
fwrite(fid,data,'uint8');           %# Write the data as type uint8
fclose(fid);                        %# Close the file

%# Writing out only the non-zero values:
[rowIndex,columnIndex,values] = find(data);  %# Get the row and column indices
                                             %#   and the non-zero values
fid = fopen('data_sparse.dat','w');  %# Open the file
fwrite(fid,size(data),'uint32');     %# Write the matrix size (2 values)
fwrite(fid,numel(values),'uint32');  %# Write the length of the vectors (1 value)
fwrite(fid,rowIndex,'uint32');       %# Write the row indices
fwrite(fid,columnIndex,'uint32');    %# Write the column indices
fwrite(fid,values,'uint8');          %# Write the non-zero values
fclose(fid);                         %# Close the file

The files created above will differ drastically in size. The file 'data_double.dat' will be about 131,073 KB, 'data_uint8.dat' will be about 16,385 KB, and 'data_sparse.dat' will be less than 2 KB.
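These sizes follow from the element counts. Taking 1 KB = 1024 bytes and rounding up, counting each uint32 header value as 4 bytes, and using the expected number of non-zeros in the random matrix (each stored as a uint32 row index, a uint32 column index and a uint8 value), the arithmetic works out roughly as:

```latex
\begin{aligned}
\texttt{data\_double.dat} &: 16 \times 2^{20} \times 8 + 2 \times 4 = 134{,}217{,}736\ \text{bytes} \approx 131{,}073\ \text{KB}\\
\texttt{data\_uint8.dat}  &: 16 \times 2^{20} \times 1 + 2 \times 4 = 16{,}777{,}224\ \text{bytes} \approx 16{,}385\ \text{KB}\\
\texttt{data\_sparse.dat} &: \underbrace{16 \times 2^{20} \times 10^{-5}}_{\approx\,168\ \text{non-zeros}} \times (4 + 4 + 1) \approx 1{,}510\ \text{bytes} + \text{a few header bytes} < 2\ \text{KB}
\end{aligned}
```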

Note that I also wrote the data/vector sizes to the files so that the data can be read back in (using FREAD) and reshaped properly. Note also that FWRITE's default precision is 'uint8': if I had not supplied the 'double' or 'uint8' argument, the values would have been written with 8 bits each either way. That is harmless here because all the values are 0 or 1, but it would lose information for non-integer values or values outside the range 0 to 255.
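Reading a file back is the reverse: read the two-value size header, then the values. The sketch below round-trips a small made-up matrix through the same header-plus-uint8 layout used for 'data_uint8.dat'; the file name 'roundtrip_uint8.dat' is hypothetical. Passing the size vector straight to FREAD fills a matrix of that shape in column order, so no separate RESHAPE call is needed:

```matlab
%# Round trip: write a matrix with a size header, then read it back with FREAD.
data = double(rand(16,1000) <= 0.01);      %# Small stand-in for the real data

fid = fopen('roundtrip_uint8.dat','w');    %# Hypothetical file name
fwrite(fid,size(data),'uint32');           %# Header: number of rows and columns
fwrite(fid,data,'uint8');                  %# Values stored as 8-bit unsigned integers
fclose(fid);

fid = fopen('roundtrip_uint8.dat','r');    %# Open the same file for reading
dataSize = fread(fid,[1 2],'uint32');      %# Read the 2-value header as a 1-by-2 row
dataBack = fread(fid,dataSize,'uint8=>double');  %# Read straight into an m-by-n double matrix
fclose(fid);

isequal(dataBack,data)                     %# Should be true
```

For 'data_double.dat' the only change is reading the values with the 'double' precision instead of 'uint8=>double'.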

gnovice
+2  A: 

How is the data generated? How do you need to access the data?

If my arithmetic is right, the variable takes 16 × 2^20 × 8 bytes = 128 MB if it's all double, which is well under 200 MB. Thus, you can easily save and load it as a single .mat file if you only need to access it from MATLAB.

%# create data
data = zeros(16,2^20);

%# save data
save('myFile.mat','data');

%# clear data to test everything works
clear data

%# load data
load('myFile.mat')
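Since the question says the non-zero values are uint32, one variation is to keep the matrix as uint32 rather than double, which halves its size in memory; SAVE and LOAD preserve the class. MAT-files in the Version 7 format (the usual default) are also compressed, so a mostly-zero matrix should take much less space on disk than in memory. A minimal sketch, assuming the real values fit in uint32; the file name and the example value are hypothetical:

```matlab
%# Sketch: keep the values as uint32 instead of double (4 bytes per element instead of 8).
data = zeros(16,2^20,'uint32');          %# Preallocate directly as uint32
data(3,100) = 4000000000;                %# Hypothetical non-zero value (fits in uint32)

save('myFileUint32.mat','data');         %# The uint32 class is preserved in the MAT-file

info = whos('-file','myFileUint32.mat'); %# Inspect the stored variable without loading it
fprintf('%s is %s, %d bytes in memory\n', info.name, info.class, info.bytes);

clear data
S = load('myFileUint32.mat');            %# Load into a struct...
data = S.data;                           %# ...and pull the variable back out
```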
Jonas
+5  A: 

I guess nobody's seen the edit about the zeroes :)

If they're mostly zeroes, you should convert your matrix to its sparse representation and then save it. You can do that with the sparse function.

Code

z = zeros(10000,10000);
z(123,456) = 1;
whos z
z = sparse(z);
whos z

Output

  Name          Size                Bytes  Class     Attributes

  z         10000x10000         800000000  double

  Name          Size                Bytes  Class     Attributes

  z         10000x10000             40016  double    sparse

Note that MATLAB's sparse matrices don't support integer classes such as uint32, so the data has to be converted to double before calling SPARSE.
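Since SPARSE won't take uint32 directly, one workaround is to build the sparse matrix from the non-zero entries converted to double (every uint32 value is exactly representable as a double), save that, and convert back after loading. A minimal sketch; the variable names, file name and values are hypothetical:

```matlab
%# Sketch: store a mostly-zero uint32 matrix using double sparse storage.
A = zeros(16,2^20,'uint32');            %# Stand-in for the real uint32 data
A(5,1234) = 123456789;                  %# Hypothetical non-zero entries
A(16,2^20) = intmax('uint32');          %# Largest uint32 value, still exact as a double

[i,j,v] = find(A);                      %# Row/column indices and the (uint32) non-zero values
S = sparse(i,j,double(v),size(A,1),size(A,2));  %# SPARSE won't accept uint32, so use double values
save('mySparse.mat','S');               %# Only the non-zeros plus index data are stored

clear A S
loaded = load('mySparse.mat');          %# Load into a struct
A = uint32(full(loaded.S));             %# Convert back to a full uint32 matrix
```

Building S from FIND avoids creating a full double copy of A, which `sparse(double(A))` would need temporarily.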

Jacob
Correct on sparse and uint32, but double should have an acceptable range (it represents every uint32 value exactly).
Donnie
Right, I was trying to emphasize that the original data being `uint32` is not going to help.
Jacob