Hi. For my work I have to set up a project in Matlab, which is not my language of choice, and I have some questions regarding efficiency.

I am currently dealing with a collection of points with several properties. Rather than putting all of these in separate arrays of equal length, I would much prefer to make a single array of Point objects, using Matlab's user-defined classes. For example:

% Point.m
classdef Point < handle
  properties
    x, y, prop1, prop2
  end
end

% script.m
... % define x(100), y(100), prop1(100), prop2(100)
points(100) = Point; % this seems to be the way to allocate an object vector
for i = 1:100
  points(i).x = x(i); % copy values into object
  points(i).y = y(i);
  points(i).prop1 = prop1(i);
  points(i).prop2 = prop2(i);
end

The reason that I prefer the above is both aesthetic (objects should be objects) and practical, as it allows me to easily create subsets of points without having to index several different arrays.

However, I wonder if it is the most efficient way of doing things, considering that the set of points might grow quite large, on the order of thousands or tens of thousands of points. My main questions are:

  1. For my understanding: how does Matlab store object arrays in memory? How does it handle object size varying depending on prop1 being, for instance, a struct?
  2. How does this affect operations like [points.x], something that I would need to do quite often? Is this considered an efficient operation?
  3. Is there a better way to initialize the object array? The above loop construction seems highly inefficient.
  4. I suppose it should be possible to simulate object-like behaviour while storing properties more favourably, perhaps by overloading subsref. Would you recommend that?

Or to put things more generally: what would be the optimal way of organizing my points?

Looking forward to your advice!

+8  A: 

Not really answering your questions in order, but here's some hopefully useful information:

  1. Objects are stored in memory in the same way as structures - each field is its own fully-fledged MATLAB array (mxArray to C-Mex programmers), so the size of each field can be independent.
  2. I would probably make something like a single PointList object with fields x, y, prop1, prop2. These fields would then be vectors of the appropriate length. This will almost certainly be more efficient than a list of Point objects. It will certainly take less memory.
  3. You should define accessor methods on PointList to ensure that your vectors are always the same lengths etc.
  4. If you really wanted to, you could have your PointList have a "capacity" that is larger than the number of elements currently stored in it - that way, you could avoid resizing x, y, ... all the time.
  5. In general, overloading subsref is not for the faint-hearted. Bear in mind that you also need to correctly overload at least numel, ndims, length, end and size too.
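
Items 2 to 4 above could be sketched roughly as follows. This is only an illustration, not code from the answer: the method names (add, getX, subset, grow), the storage names (xs, ys, prop1s, prop2s), the default capacity of 16 and the doubling growth policy are all assumptions, and prop1 and prop2 are assumed to be numeric scalars per point.

```matlab
% PointList.m  (hypothetical sketch; all names below are assumptions)
classdef PointList < handle
  properties (Access = private)
    xs     = zeros(0, 1);  % storage vectors; may be longer than count
    ys     = zeros(0, 1);
    prop1s = zeros(0, 1);
    prop2s = zeros(0, 1);
  end
  properties (SetAccess = private)
    count = 0;             % number of points actually stored
  end
  methods
    function obj = PointList(capacity)
      % Preallocate room for 'capacity' points (default 16).
      if nargin < 1
        capacity = 16;
      end
      obj.xs     = zeros(capacity, 1);
      obj.ys     = zeros(capacity, 1);
      obj.prop1s = zeros(capacity, 1);
      obj.prop2s = zeros(capacity, 1);
    end

    function add(obj, x, y, prop1, prop2)
      % Append one point, growing the storage first if it is full.
      if obj.count == numel(obj.xs)
        obj.grow();
      end
      k = obj.count + 1;
      obj.xs(k)     = x;
      obj.ys(k)     = y;
      obj.prop1s(k) = prop1;
      obj.prop2s(k) = prop2;
      obj.count     = k;
    end

    % Accessors return only the slots in use, never the spare capacity.
    function v = getX(obj)
      v = obj.xs(1:obj.count);
    end
    function v = getY(obj)
      v = obj.ys(1:obj.count);
    end
    function v = getProp1(obj)
      v = obj.prop1s(1:obj.count);
    end
    function v = getProp2(obj)
      v = obj.prop2s(1:obj.count);
    end

    function sub = subset(obj, idx)
      % Return a new PointList holding the points selected by idx
      % (numeric indices or a logical mask over the stored points).
      k = 1:obj.count;
      k = k(idx);
      n = numel(k);
      sub = PointList(n);
      sub.xs(1:n)     = obj.xs(k);
      sub.ys(1:n)     = obj.ys(k);
      sub.prop1s(1:n) = obj.prop1s(k);
      sub.prop2s(1:n) = obj.prop2s(k);
      sub.count       = n;
    end
  end
  methods (Access = private)
    function grow(obj)
      % Double the capacity (at least one extra slot) in one step.
      extra = max(1, numel(obj.xs));
      obj.xs     = [obj.xs;     zeros(extra, 1)];
      obj.ys     = [obj.ys;     zeros(extra, 1)];
      obj.prop1s = [obj.prop1s; zeros(extra, 1)];
      obj.prop2s = [obj.prop2s; zeros(extra, 1)];
    end
  end
end
```

Because all writes go through add, the four vectors cannot drift out of sync, and a call such as pl.getX() takes the place of [points.x] without gathering values from many separate objects.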
Edric
+1 for very detailed answer. Are you working for MathWorks?
Mikhail
Yes I am - but this information is all in the doc one way or another.
Edric
+1 useful information. For #4, take a look at this question: http://stackoverflow.com/questions/1548116/matrix-of-unknown-length-in-matlab
Amro
@Edric: then maybe you can answer the question what is wrong with handle object performance: http://stackoverflow.com/questions/1446281/matlabs-garbage-collector
Mikhail
Thanks Edric, that was very insightful. Based on what you write I will drop the object array and focus on a single PointList approach. Rather than making separate vectors x, y etc., though, I am considering putting them side by side in a single matrix and using accessor methods to single out the requested columns; this guarantees consistent lengths and, more importantly, allows me to make point subsets by indexing a single matrix. Indeed, for efficiency I will preallocate space in chunks; thanks also to Mikhail for that reference. Many thanks!
gertjan
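
The single-matrix layout described in the comment above might look something like this as a plain script. It is a sketch under assumptions: the column order, the COL_* constants, the chunk size of 1000 and the random test data are all made up for illustration.

```matlab
% Hypothetical column layout: [x, y, prop1, prop2]
COL_X = 1; COL_Y = 2; COL_PROP1 = 3; COL_PROP2 = 4;

chunk = 1000;               % assumed chunk size for preallocation
data  = zeros(chunk, 4);    % one row per point, spare rows at the end
n     = 0;                  % number of rows actually in use

newPts = rand(2500, 4);     % made-up points to insert one at a time
for i = 1:size(newPts, 1)
  if n == size(data, 1)
    % Storage is full: add a whole chunk instead of growing row by row.
    data = [data; zeros(chunk, 4)];
  end
  n = n + 1;
  data(n, :) = newPts(i, :);
end

pts       = data(1:n, :);                % only the rows in use
xs        = pts(:, COL_X);               % column "accessor" for x
subsetPts = pts(pts(:, COL_X) > 0.5, :); % subset via one row index
```

A single row index selects all properties of a subset at once, which is the consistency benefit mentioned in the comment; the trade-off is that every property has to fit in a numeric column.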
@Mikhail I don't have anything to add about GC - and I see that you've already seen stuff posted in various places by other MathWorkers far more knowledgeable than I am on that topic.
Edric