I have created a monster, or at least a lot of MATLAB handle classes that point to each other. For instance, an experiment (handle) is a set of tracks (handles), which contain runs (handles) and reorientations (handles). The tracks point back to the experiment that contains them, the runs and reorientations point back to the track they came from, and they also point ahead and behind to the next and previous run and reorientation.

I have come to realize that all this cross-pointing may confuse MATLAB when it comes time to load or save files, so as much as I can I've declared the back-pointer properties as Transient and used set.property methods to re-establish them. For instance:

classdef Track < handle
   properties(Transient = true)
      expt;
   end
end

classdef Experiment < handle
   properties(AbortSet = true)
      track;
   end
   methods
      function set.track(obj, value)
         if (~isempty(value) && isa(value, 'Track'))
            value.expt = obj;
         end
         obj.track = value;
      end
   end
end

This seems to have sped up loading from disk somewhat, but I think I am still missing things.

I can save an experiment to disk, creating a 48 MB file, in about 7 seconds. But it then takes 3 minutes to load the file from disk. I have tried to use the profiler to locate the slow points, but it reports a total time of ~50 milliseconds.

Questions:

Does anyone have experience with saving handle objects to disk and can recommend general practices to speed up loading?

Is there any way to get the profiler to report what MATLAB is doing with the other 179.95 seconds, or is there a systematic way to determine what is slowing down the loading without using the profiler?

+2  A: 

I do not save handle objects to disk. Instead, I have custom save/load methods that copy the information in the handle objects to structures for saving, from which I construct the objects and their dependencies on loading.

Thus, loading is reasonably fast, and I can have a patch method that allows me to update the structure (or some of the data contained therein) before I send it to the class constructor.
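A minimal sketch of this pattern; the method and field names here (`saveToFile`, `loadFromFile`, `trackData`) are illustrative, not from the answer:

```matlab
classdef Experiment < handle
    properties
        track;
    end
    methods
        function saveToFile(obj, fname)
            % Flatten the handle object to a plain struct before saving.
            s.class = class(obj);
            s.trackData = obj.track;   % in practice, recursively flatten children too
            save(fname, '-struct', 's');
        end
    end
    methods(Static)
        function obj = loadFromFile(fname)
            s = load(fname);
            % A "patch" step could upgrade old structs here before construction.
            obj = Experiment();
            obj.track = s.trackData;   % reconstruct and rewire the handle graph
        end
    end
end
```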

For the profiler issue: I guess MATLAB is showing this time as 'overhead' somewhere. It is very difficult to track that down in my experience.

Jonas
I think MATLAB supports this conversion to a structure with the "saveobj" and "loadobj" methods you can define for each class. The problem I see is that I don't want to have to write these methods for every subclass.
Marc
That's why I created save and load methods in my superclass, which are inherited. If some properties should be handled differently from others, you can either write your methods so that they recognize something different about those properties, or have a hidden property in each subclass that lists the 'special' properties.
Jonas
but how do you deal with calling the constructor in the loadobj method? E.g. bar < foo. If I define loadobj only in foo, when I try to load a bar from disk, won't I end up with a foo?
Marc
Again, I don't use loadobj methods. My load method looks like this: 1. load the structure; 2. `constructor = str2func(loadedStruct.class);`; 3. `obj = constructor(loadedStruct);`. In other words, the load method of `foo` loads the structure that contains the information about a `bar`, including a `class` field with the class name "bar", and then the load method of `foo` calls the constructor of `bar` with the structure as input.
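The dispatch Jonas describes might look like this (the variable and file names are illustrative):

```matlab
function obj = loadObject(fname)
    % 1. Load the plain struct saved by the superclass save method.
    tmp = load(fname);
    loadedStruct = tmp.loadedStruct;
    % 2. Turn the stored class name into a constructor handle, e.g. @bar.
    constructor = str2func(loadedStruct.class);
    % 3. Let the subclass constructor rebuild the object from the struct.
    obj = constructor(loadedStruct);
end
```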
Jonas
Clever. Actually, I think this will also work in the saveobj/loadobj paradigm, as there's no reason saveobj can't save the classname as well.
Marc
A: 

Have you tried the different options to SAVE, such as -v7.3? That switches to an HDF5-based format, and there are some performance differences when using it.

Edric
Marc
+2  A: 

I haven't worked with handle objects, but in general, there is per-mxarray overhead in saving and loading, so optimizing MAT files is a matter of converting the data in them to a form with fewer mxarrays. An mxarray is a single-level array structure. For example:

strs = {'foo', 'bar', 'baz'};

The strs array contains 4 mxarrays: one cell array and 3 char arrays.

To speed up saving and loading, try doing the following when saving, and the inverse when loading:

- Convert cellstrs to 2-D char arrays.
- Convert record-organized structs and objects to planar-organized.
- Eliminate redundant objects by storing a canonical set of values in one array and replacing object instances with indexes into that array. (This is probably not relevant for handles, which inherently behave this way.)

"Record-organized" means an array of N things is represented as an N-long array of structs with scalar fields; "planar-organized" means it's represented as a scalar struct containing N-long arrays in its fields.
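For example, the cellstr and struct conversions might look like this (variable names are illustrative):

```matlab
% cellstr -> 2-D char: 4 mxarrays become 1 (char() blank-pads the rows).
strs  = {'foo', 'bar', 'quux'};
chars = char(strs);        % 3x4 char array
strs2 = cellstr(chars);    % inverse: cellstr() strips the trailing blanks

% Record-organized -> planar-organized struct.
recs = struct('x', {1, 2, 3}, 'y', {4, 5, 6});   % 1x3 struct array: 7 mxarrays
planar.x = [recs.x];       % scalar struct of N-long arrays: 3 mxarrays
planar.y = [recs.y];

% Inverse on load.
cx = num2cell(planar.x);
cy = num2cell(planar.y);
[recs2(1:numel(cx)).x] = cx{:};
[recs2.y] = cy{:};
```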

See if you can convert your in-memory object graph to a normalized form that fits in a few large primitive arrays, similar to how you might store it in SQL: put the object properties for all the objects of a class in one set of arrays, and store the handle relationships as (id, id) tuples held in numeric arrays, perhaps using indexes into the property arrays as your object ids.
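A sketch of such a normalized layout, with hypothetical field names:

```matlab
% One planar struct per class; the row index serves as the object id.
G.track.startTime = [0.0 1.5 3.2];   % one entry per track
G.track.exptId    = [1 1 1];         % back-pointer stored as an id, not a handle
G.run.trackId     = [1 1 2];         % (run -> track) relationship
G.run.nextRunId   = [2 0 0];         % "next run" links as ids; 0 = none
save('expt.mat', '-struct', 'G');    % saves the fields of G as top-level variables
```

On load, a top-level routine would walk these arrays once and re-link the handles.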

A saveobj and loadobj defined at the "top" class in your object graph could do the conversion.

Also, if you are using network file systems, try doing your saving and loading on a local filesystem with temporary copies. For reading, copy the MAT file to tempdir and then load() from there; for writing, save() to tempdir and then copy it to the network drive. In my experience, save() and load() are substantially faster with local I/O, enough that it's a big net win (2x-3x speedup) even with the time to do the copies. Use tempname() to pick temp files.
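A sketch of this staging approach, assuming `netFile` is a path on the network share:

```matlab
function s = loadViaTemp(netFile)
    % Copy to local temp storage first; a local load() is much faster.
    tmpFile = [tempname() '.mat'];
    copyfile(netFile, tmpFile);
    s = load(tmpFile);
    delete(tmpFile);
end

function saveViaTemp(netFile, s)
    % Save locally, then push the finished file to the network drive.
    tmpFile = [tempname() '.mat'];
    save(tmpFile, '-struct', 's');
    copyfile(tmpFile, netFile);
    delete(tmpFile);
end
```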

With the profiler, are you using the "-timer real" option? By default, "profile" shows CPU time, and this is I/O-centric stuff. With "-timer real", you should see those other 180 seconds of wall time attributed to save() and load(). Unfortunately, since they're builtins, the profiler won't let you see inside them, and that might not help much.
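For example (the MAT-file name is a placeholder):

```matlab
profile on -timer real      % wall-clock time instead of CPU time
e = load('experiment.mat'); % the slow operation under test
profile off
profile viewer              % load() should now account for the missing wall time
```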

Andrew Janke
Marc