How do you organize your Matlab code? I've come into ownership of several thousand lines of Matlab code, some as >900 line functions and a few directories full of function_name.m
files. It's hard to figure out what everything is doing (or relating to) or figure out the dependencies. I'd like to reorganize things as I figure out what does what, but Matlab limits what's possible. What would you suggest?
views:
656answers:
6Port to NumPy.
(Joke.)
Usually in Matlab you have some files written as functions, and some as scripts. Scripts do things like load the data you want to process, and feed it to the functions, and graph it.
To organize things I would start at the top level script and find out which functions do the loading, graphing, processing, etc. Keep the scripts in a top level directory and try to separate the functions out into subdirectories, according to the purpose of the function. Put dependencies of a function into the same subdirectory. Try to make it so that no code in a directory depends on anything in a parent directory (or cousin directory).
Whenever you figure out what a function does and what its arguments are, write a doc comment.
This assumes the person who wrote the code was reasonable. If not, Matlab makes it easy to plunk everything down into one directory and have everything depend on everything else in a rickety tower of code, so you may end up doing a lot of refactoring.
I have had to deal with this problem many times in my various roles at The MathWorks. This is what I do for the big pieces of MATLAB code:
- Back it up, maybe twice!
- Select all, Ctrl-I to smart indent
Select all, Ctrl-J to wrap comments
If I am feeling paper-based- Print all the files out, and get a set of highlighters- follow manually, highlighting long term variables and important function calls.
~~~ AND / OR ~~~
5 If I am feeling lucky, start running the code in the debugger, stepping through one line at a time (stepping into subfunctions that were user written)
At this point, I can go through and follow a typical flow through the control structure. I may not have a great idea what everything does, but I have a decent idea of what is going on.
Normally, my goal is to find a bug, solve it and move on. Your goals might be completely different. This is the method that I have used to quickly comprehend hundereds of different pieces of MATLAB code that I have been sent over the years.
I agree with most of the comments about Matlab not being terribly supportive of modern software source code structuring but I don't believe it's too difficult to impose some of your own structure with a little discipline.
Organise your source files into a hierarchy of directories, as you would the source files for any program written in another programming language. You don't need to stick to a hierarchy, choose your own structure if you wish. Use the setpath command (or whatever the heck it is called) to tell Matlab where to look for your m files when you are working.
Acquaint yourself with the Matlab profiler tool which can give you call graphs (not terribly graphically, more like gprof's call graphs) which is some help in deciphering spaghetti code.
Of course, all our m files are in the repository and we serve them out of that. We keep a private toolbox on one of our networked drives and all users can call the 'released' code in that toolbox directly.
Back everything up is right. Create a pristine tarball of the original source tree, and then throw it all in source control so you can track and roll back your changes.
Have a look at Matlab's depfun() and depdir(), which detect static dependencies. It could help you see dependencies between Matlab functions. With "depfun -toponly" on all the files and a little string munging, you could build a list of immediate dependencies and throw that in a GraphViz file to produce a big directed graph of your codebase's call connections. Clusters in the graph could be a good place to divide the code around. (EDIT: See Jonas's solution; looks like m2html does this for you.)
If you have a lot of latitude to rewrite the code, consider rewriting some of the code as objects, using stateless utility classes with class methods and private functions as ways of packaging related functions together and providing some encapsulation. I've worked with largish Matlab codebases organized this way, and it works all right. In classic Matlab, classes are your only way of doing some sort of packages. I believe Matlab's newer OO system has namespace support, too.
If you don't want to convert the code to OO, you can organize related functions in subdirectories. That helps to organize it for source code browsing at least.
All the functions should have some doco in Matlab's standard helptext format, including an H1 line. If they don't, stick the comments on what you learn there. Then use the "contentsrpt" tool to automatically generate table of contents files for the classes or directories.
Good luck.
Does your code come with decent help text? In that case, m2html is going to be a great help, since it allows you to create linked html help for easy browsing.
Furthermore, it allows you to make dependency graphs, which help you understand a bit more how you may want to organize the code.
MATLAB Programming Style Guidelines by Richard Johnson is a good resource.