Is there any way to share memory between matlab processes on the same computer?

I am running several MATLAB processes on a multi-core computer (running Windows, if it matters). They all use the same gigantic input data. It would be nice to have only a single copy of it in memory.

Edit: Unfortunately each process needs access to the whole gigantic input data, so there is no way to divide the data and conquer the problem.

+3  A: 

EDIT: Put the data in a raw file and use memmapfile (thanks SCFrench).
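
A minimal sketch of what the memmapfile route might look like, assuming the input is a single double-precision matrix. The file name and the matrix size are made up for illustration, and each process has to know the dimensions in advance. Because the mapping is backed by the file, the operating system can usually serve several processes from the same cached pages rather than giving each its own copy.

```matlab
% Hypothetical file name and matrix size, chosen only for illustration.
fileName = fullfile(tempdir, 'shared_input.dat');
nRows = 1000;
nCols = 50;

% One-time step: dump the matrix to a raw binary file (column-major doubles).
X = rand(nRows, nCols);
fid = fopen(fileName, 'w');
fwrite(fid, X, 'double');
fclose(fid);

% In each MATLAB process: map the file read-only instead of loading it.
m = memmapfile(fileName, ...
    'Format', {'double', [nRows nCols], 'X'}, ...
    'Writable', false);

% Index the mapped data like a normal array; only the touched parts are read.
block = m.Data.X(1:10, 1:5);
```

Keeping 'Writable' set to false protects the shared file from accidental modification by any one process.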

============================================

No, there is no real way of doing it.

My top two solutions have been: buy more RAM or page in the data.

The closest thing you could do would be to use a mex function to allocate shared memory, then allow successive calls to the mex function to extract out smaller slices of the memory. You wouldn't want to wrap the shared memory as a Matlab array (because Matlab's memory model wouldn't handle it well).

I was going to suggest looking into memmap, but apparently it's problematic.

Sometimes you can first run one Matlab program to pre-process or split up the data into smaller chunks. Then each of the Matlab processes can operate on its own smaller chunk.

Here's a tutorial on dealing with large datasets in Matlab.

Mr Fooz
Not the answer I wanted - I was wishing for a 'Yes it's possible, do this'. But thanks a lot for the link, I'm going to read it right now.
AnnaR
I did a little more poking around and maybe this one does it: http://polaris.cs.uiuc.edu/matmarks/
Mr Fooz
I have posted an update to the comp.soft-sys.matlab news thread linked above to the word "problematic". It turns out this was a bug in older versions of MATLAB, and is fixed as of R2008a.
SCFrench
+1  A: 

Probably not, at least not in the way where you treat the data like a regular MATLAB variable.

If on a Windows machine, you could create a COM/ActiveX wrapper to access your shared data. MATLAB allows the use of COM objects through the actxserver function. But it's questionable whether you could actually access the data "directly" through different processes. There's some kind of marshaling layer between MATLAB and COM and data gets converted, at least according to the Mathworks docs on exchanging data between MATLAB and COM. If I absolutely had to share structured data between processes, with fast access, on a Windows machine, I'd probably write something in C++ to use shared memory via Boost::interprocess and wrap access to it in an in-process COM server (DLL). I've done this before, once. As much as Boost::interprocess makes it a lot easier, it's a pain.

The Java approach (since MATLAB runs on top of Java) would be much more promising, but as far as I know, there aren't any decent Java libraries to provide access to shared memory. The closest thing is probably to use a memory-mapped file via java.nio.MappedByteBuffer, but that's really low-level. Still, if your data is in a relatively "square" form (e.g. a big 2-D or 3-D or 4-D matrix of homogeneously-sized data) this might work OK.

You could try HDF5 files: MATLAB has built-in HDF5 support and it's "relatively" fast. But in my experience, HDF5 doesn't seem to play very well with concurrency, at least not when one process is writing and the others are reading. If there are multiple readers and no writers, it works just fine.
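
For the multiple-readers, no-writers case, a rough sketch might look like the following. It assumes a MATLAB release recent enough to have the high-level h5create, h5write and h5read functions; the file name, dataset name and matrix size are hypothetical.

```matlab
% Hypothetical file and dataset names, chosen only for illustration.
h5File = fullfile(tempdir, 'shared_input.h5');
if exist(h5File, 'file')
    delete(h5File);              % h5create fails if the dataset already exists
end

% One-time step, done by a single process before the readers start.
X = rand(1000, 50);
h5create(h5File, '/X', size(X));
h5write(h5File, '/X', X);

% In each reader process: pull in only the block that is needed.
startIdx = [1 1];                % 1-based start of the block
blockSize = [10 5];              % number of rows and columns to read
block = h5read(h5File, '/X', startIdx, blockSize);
```

Each reader holds only the requested block in memory, not the whole dataset.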

Jason S
+2  A: 

If the processes only ever read the data and never modify it, then I believe you can place your input data into one large file and have each process open and read from that file. Each process will have its own file position indicator that it can move anywhere in the file to read the data it needs. I tested having two MATLAB processes read simultaneously from a file a million or so times each and everything seemed to work fine. I only used basic file I/O commands (listed below). It appears you could also do this using MEMMAPFILE, as Mr Fooz mentioned in his answer (and SCFrench in a comment), assuming you have MATLAB version R2008a or newer.

Here are some of the file I/O commands that you will likely use for this:

  • FOPEN: Each process will call FOPEN, which returns a file identifier to use in all subsequent calls. You can open a file in either binary or text mode:

    fid = fopen('data.dat','r');   % Binary mode
    fid = fopen('data.txt','rt');  % Text mode
    
  • FREAD: In binary mode, FREAD will read data from the file:

    A = fread(fid,20,'double');  % Reads 20 double-precision values
    
  • FSCANF: In text mode, FSCANF will read and format data from the file:

    A = fscanf(fid,'%d',4);  % Reads 4 integer values
    
  • FGETL/FGETS: In text mode, these will read whole lines from the file.

  • FTELL: This will tell you the current file position indicator in bytes from the beginning of the file:

    ftell(fid)
    ans =
         8    % The position indicator is 8 bytes from the file beginning
    
  • FSEEK: This will set the file position indicator to a desired position in the file:

    fseek(fid,0,-1);  % Moves the position indicator to the file beginning
    
  • FCLOSE: Each process will have to close its access to the file (it's easy to forget to do this):

    fclose(fid);
    

This solution will likely require that the input file has a well-structured format that is easy to traverse (e.g. just one large matrix). If it has lots of variable-length fields, then reading data from the correct position in the file could get very tricky.
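
To illustrate the "one large matrix" case, here is a small sketch that puts the commands above together to read a single column without loading the rest of the file. The file name, matrix size and column index are made up for the example, and it assumes the matrix was written as raw doubles in MATLAB's column-major order.

```matlab
% Hypothetical file name and matrix size, chosen only for illustration.
fileName = fullfile(tempdir, 'data.dat');
nRows = 1000;
nCols = 50;

% One-time step: write the matrix as raw doubles (column-major order).
X = rand(nRows, nCols);
fid = fopen(fileName, 'w');
fwrite(fid, X, 'double');
fclose(fid);

% In each process: jump straight to the start of the wanted column.
col = 7;
bytesPerValue = 8;                           % size of one double in bytes
offset = (col - 1) * nRows * bytesPerValue;  % bytes to skip from file start

fid = fopen(fileName, 'r');
status = fseek(fid, offset, 'bof');          % returns 0 on success
assert(status == 0, 'fseek failed');
colData = fread(fid, nRows, 'double');       % read one full column
fclose(fid);
```

Since columns are stored contiguously, reading whole columns needs only one FSEEK and one FREAD; reading a row would need one seek per element (or the skip argument of FREAD).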


If the processes have to also modify the data, this could get even more difficult. In general, you don't want a file/memory location being simultaneously written to by multiple processes, or written to by one process while another is reading from the same location, since unwanted behavior can result. In such a case, you would have to limit access to the file such that only one process at a time is operating on it. Other processes would have to wait until the first is done. A sample version of code that each process would have to run in such a case is:

processDone = false;
while ~processDone
  if file_is_free()   % A placeholder function to check that other processes
                      %   are not accessing the file
    fid = fopen(fileName,'r+');  % Open the file for reading and writing
    perform_process(fid);        % The computation this process has to do
    fclose(fid);                 % Close the file
    processDone = true;
  else
    pause(0.1);       % Wait briefly before checking again
  end
end

Synchronization mechanisms like these ("locks") can sometimes have a high overhead that reduces the overall parallel efficiency of the code.
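
One possible way to build such a lock, sketched under some assumptions: MATLAB has no built-in inter-process lock that I know of, so this borrows Java's java.io.File.createNewFile, which creates the file only if it does not already exist and reports whether it did so. The lock-file path is hypothetical, and a process that crashes while holding the lock will leave a stale lock file behind that has to be removed by hand.

```matlab
% Hypothetical lock-file path, chosen only for illustration.
lockName = fullfile(tempdir, 'data.lock');
lockFile = java.io.File(lockName);

% createNewFile returns true only for the process that created the file,
% so at most one process at a time gets past this loop.
while ~lockFile.createNewFile()
    pause(0.05);                 % another process holds the lock; wait
end

% Make sure the lock file is removed even if the work below throws an error.
releaseLock = onCleanup(@() delete(lockName));

% ... open the data file with 'r+', do the work, and close it here ...
hadLock = (exist(lockName, 'file') == 2);

clear releaseLock                % runs the cleanup, which deletes the lock
```

The pause between attempts keeps waiting processes from spinning at full CPU, at the cost of a small delay in picking up a freed lock.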

gnovice
Wow! I'll try this out. This could possibly solve my problem.
AnnaR
A: 

You may want to check out my MATLAB File Exchange submission "sharedmatrix" (#28572). It allows a MATLAB matrix to exist in shared memory, provided you are using some flavor of Unix. One could then attach the shared matrix in the body of a parfor or spmd, i.e.,

shmkey=12345;
sharedmatrix('clone',shmkey,X);
clear X;
spmd(8)
    X=sharedmatrix('attach',shmkey);
    % do something with X
    sharedmatrix('detach',shmkey,X);
end
sharedmatrix('free',shmkey);

Since X exists in shared memory for the body of the spmd (or parfor) it has no load time and no communication time. From the perspective of Matlab it is a newly created variable in the spmd (or parfor) body.

Cheers,

Josh

http://www.mathworks.com/matlabcentral/fileexchange/28572-sharedmatrix