I am new to MATLAB. It wasn't in the job description, but I've been forced to take over for the person who wrote and maintained the code my company uses. Life's tough.

The person I'm taking over from told me that he declared all the big data vectors as global to save memory. More specifically, he did it so that when one function calls another function and passes the data along, no copy of the data is made.

Is this true? I read Strategies for Efficient Use of Memory, and it says that

When working with large data sets, be aware that MATLAB makes a temporary copy of an input variable if the called function modifies its value. This temporarily doubles the memory required to store the array, which causes MATLAB to generate an error if sufficient memory is not available.

It says something very similar in Memory Allocation For Array, under Function Arguments:

When you pass a variable to a function, you are actually passing a reference to the data that the variable represents. As long as the input data is not modified by the function being called, the variable in the calling function and the variable in the called function point to the same location in memory. If the called function modifies the value of the input data, then MATLAB makes a copy of the original array in a new location in memory, updates that copy with the modified value, and points the input variable in the called function to this new array.

So is it true that using global can be better? It seems a little sloppy to blithely declare all the large data as global, instead of making sure that none of the code modifies its input argument. Am I wrong? Does this really improve RAM usage?
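The copy-on-write behavior described in the documentation quoted above can be illustrated with a minimal sketch. All function names here are hypothetical, chosen only for illustration; save the code as copyOnWriteDemo.m. The function that only reads its input shares the caller's memory, while the function that writes to its input gets its own copy, leaving the caller's array untouched.

```matlab
function [unchanged, total] = copyOnWriteDemo()
% Hypothetical demo of MATLAB's copy-on-write for function arguments.
data = rand(1000);              % about 8 MB of doubles in the caller

% readOnlySum never assigns to its input, so x and data refer to the
% same memory and no copy is made.
total = readOnlySum(data);

% zeroFirstRow assigns to its input, so MATLAB copies data the first
% time x is modified; the caller's data is left untouched.
modified = zeroFirstRow(data);

% rand returns values strictly between 0 and 1, so the caller's first
% row is still positive while the returned copy has a zeroed first row.
unchanged = all(data(1,:) > 0) && all(modified(1,:) == 0);
end

function s = readOnlySum(x)
s = sum(x(:));                  % read-only access: no copy
end

function x = zeroFirstRow(x)
x(1,:) = 0;                     % first write triggers the copy
end
```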

+4  A: 

In my experience, provided that none of the code modifies the large data, memory usage is the same, regardless of whether you use a global variable or an input argument, just like the Matlab docs say. Further information is in this blog post by a MathWorks employee.

There is quite a bit of folklore on performance issues in Matlab and not all of it is right. The internals of Matlab have changed quite a bit. It may be that in a previous version it was better to use a global variable.

Jitse Niesen
+1 and answer for coolness, for using Loren's blog, too. She's a user here! - http://stackoverflow.com/users/113700/loren
scraimer
+2  A: 

The solution seems a bit strange to me. As you found out already, making the data global shouldn't have a significant impact on memory usage if the called function does not modify the data array. However, if the called function modifies the data array, there's a functional difference: if the data array is global, the change affects the rest of the code; if it is passed as an input argument, the modifications are local to the called function and temporary.
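The functional difference described above can be shown with a small sketch. The names BIGDATA, scaleGlobal and scaleArgument are hypothetical; save the code as globalVsArgumentDemo.m. The global version changes the data everyone sees, while the argument version only changes a local copy that is discarded when the function returns.

```matlab
function [globalAfter, localAfter] = globalVsArgumentDemo()
% Hypothetical contrast between a global and an input argument.
global BIGDATA
BIGDATA = ones(1, 5);
scaleGlobal();                  % modifies the shared global variable
globalAfter = BIGDATA;          % all 2s: the caller sees the change

data = ones(1, 5);
scaleArgument(data);            % modifies a local copy only
localAfter = data;              % still all 1s: the caller is unaffected

clear global BIGDATA            % remove the global when done
end

function scaleGlobal()
global BIGDATA                  % must be declared here too
BIGDATA = 2 * BIGDATA;
end

function scaleArgument(x)
x = 2 * x;                      %#ok<NASGU> change is discarded on return
end
```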

groovingandi
+3  A: 

I think you pretty much answered your own question, but a couple more references would be good here:

I made a video on this:

http://blogs.mathworks.com/videos/2008/09/16/new-location-and-memory-allocation/

This is similar to what Loren discussed here:

http://blogs.mathworks.com/loren/2006/05/10/memory-management-for-functions-and-variables/

-Doug

MatlabDoug
+1  A: 

This answer may be somewhat tangential, but an additional topic that bears mention here is the use of nested functions to manage memory.

As has already been established in other answers, there is no need for global variables if the data you are passing to the function is not modified (since it will be passed by reference). If it is modified (and is thus passed by value), using a global variable instead will save you memory. However, global variables can be somewhat "uncouth" for the following reasons:

  • You have to make a declaration like global varName everywhere you need them.
  • It can be conceptually a little messy trying to keep track of when and how they are modified, especially if they are spread across multiple m-files.
  • The user can easily break your code with an ill-placed clear global, which clears all global variables.
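The first and third points in the list above can be demonstrated with a short sketch. SHARED and the helper functions are hypothetical names; save the code as globalPitfallsDemo.m. A function that omits the global declaration simply does not see the variable, and a single clear global wipes the data for every function that depends on it.

```matlab
function [seenWithDecl, seenWithoutDecl, survivesClear] = globalPitfallsDemo()
% Hypothetical demo of two pitfalls of global variables.
global SHARED
SHARED = magic(4);

seenWithDecl = readWithDeclaration();       % true: declared global
seenWithoutDecl = readWithoutDeclaration(); % false: no declaration

clear global                    % removes every global variable
survivesClear = readWithDeclaration();      % false: the data is gone
end

function tf = readWithDeclaration()
global SHARED                   % recreated as [] if it no longer exists
tf = ~isempty(SHARED);
end

function tf = readWithoutDeclaration()
% Without "global SHARED", SHARED is just an undefined local name here.
tf = exist('SHARED', 'var') == 1;
end
```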

An alternative to global variables was mentioned in the first set of documentation you cited: nested functions. Immediately following the quote you cited is a code example (which I've formatted slightly differently here):

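function myfun

  A = magic(500);      % A lives in the workspace of myfun
  setrowval(400, 0);   % the nested call writes to this same A, no copy
  disp('The new value of A(399:401,1:10) is')
  A(399:401,1:10)

  function setrowval(row, value)
    A(row,:) = value;  % A is shared with the enclosing function
  end

end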

In this example, the function setrowval is nested inside the function myfun. The variable A in the workspace of myfun is accessible within setrowval (as if it had been declared global in each). The nested function modifies this shared variable, thus avoiding any additional memory allocation. You don't have to worry about the user inadvertently clearing anything and (in my opinion) it's a bit cleaner and easier to follow than declaring global variables.

gnovice
