views:

357

answers:

5

Below is my C wrapper for a Fortran ZHEEVR routine from well-known LAPACK numerical library:

void zheevr(char jobz, char range, char uplo, int n, doublecomplex* a, int lda, double vl, double vu, int il, int iu, double abstol, double* w, doublecomplex* z, int ldz, int* info)
{
    int m;
    int lwork = -1;
    int liwork = -1;
    int lrwork = -1;
    int* isuppz = alloc_memory(sizeof(int) * 2 * n);
    zheevr_(&jobz, &range, &uplo, &n, a, &lda, &vl, &vu, &il, &iu, &abstol, &m, w, z, &ldz, isuppz, small_work_doublecomplex, &lwork, small_work_double, &lrwork, small_work_int, &liwork, &info);
    lwork = (int) small_work_doublecomplex[0].real;
    liwork = small_work_int[0];
    lrwork = (int) small_work_double[0];
    doublecomplex* work = alloc_memory(sizeof(doublecomplex) * lwork);
    double* rwork = alloc_memory(sizeof(double) * lrwork);
    int* iwork = alloc_memory(sizeof(int) * liwork);
    zheevr_(&jobz, &range, &uplo, &n, a, &lda, &vl, &vu, &il, &iu, &abstol, &m, w, z, &ldz, isuppz, work, &lwork, rwork, &lrwork, iwork, &liwork, info);
    free(iwork);
    free(rwork);
    free(work);
    free(isuppz);
}

This function is called hundreds of thousands of times in my application, to diagonalize the complex matrix "a" (parameter names follow the Fortran convention for this function) for the same matrix size. I think that the work arrays sizes will be the same most of the time, as the diagonalized matrices will be of the same structure. My questions are:

  1. Can the repeated alloc/free ("alloc_memory" is a simple wrapper around glibc's malloc) calls hurt performance, and how badly?
  2. Does the order of free's matter? Should I free the last allocated array first, or last?
+2  A: 

It certainly will affect performance - how much youb can only find out for sure by timing. To create a version that avoids most allocations, allocate to a static pointer and remember the size in another static integer. If the next call uses the same size, just reuse what was allocated last time. Only free anything when you need to create a new matrix because the size has changed.

Note this solution is only suitable for single-threaded code.

anon
I have already implemented this solution in a wrapper around another LAPACK function which needs less work arrays (just 2 allocations per call), ZHEEV. Suprisingly, there is no performance difference between "naive" allocations and static arrays.
quant_dev
Depending on the platform, the C malloc library may already do something like this. Google for 'Doug Lea Allocator'.
Don Wakefield
I am using GNU libc 2.3.4.
quant_dev
+1  A: 

Alright. You are going to get the profiler answer anytime soon. If you have an AMD machine, I strongly recommend the free AMD's CodeAnalyst.

As for your memory problem, I think that you could work with local memory management in this case. Just determine the maximum number of memory that you can allocate for this function. Next you declare a static buffer and you work with it a bit like how a compiler handles the stack. I did a wrapper like this over VirtualAlloc once and it's VERY fast.

toto
+5  A: 

1) Yes they can.

2) Any sane libc shouldn't worry about order of free(). Performance wise that shouldn't matter too.

I'd recommend removing memory management from this function -- so caller will be supplying the matrix size and allocated temporary buffers. That'll cut number of mallocs significantly if this function is called from same place on matrix of the same size.

HMage
Thanks for 2), I had my doubts about that. The problem with moving memory management elsewhere is that the code starts to look really ugly.
quant_dev
It's possible to write a sane memory allocator with a fast block-merge test at free(), that looks to see whether the "next" block is already free, and if so merges. Order of freeing would then make a slight difference to fragmentation, since if you free in reverse order, the blocks will be merged immediately. Whether anyone has ever done that in a libc implementation is another question, but it's not insane IMO.
Steve Jessop
+1  A: 

If you are allocating the same size item hundreds of thousands of times, then why not just maintain a heap of your objects (since these seem to be relatively simple, i.e. don't contain pointers to other allocated memory) and free onto your own heap (or stack actually)?

The heap can lazily allocate new objects using the glib malloc, but when freeing just push the item onto the heap. When you need to allocate, if there is a freed object available it can just allocate that one.

This will also save you multiple calls to allocation (since you won't need to do any allocation and it looks like your routine makes several calls to malloc) and will also avoid fragmentation (to some extent) at least on the re-used memory. Of course the initial allocations (and other allocations as the program is running when it needs to expand this memory) may cause fragmentation, but if you are really worried about this you can run some stats and find the average/max/typical size of your heap during runs and pre-allocate this at once when the program starts up, avoiding fragmentation.

Larry Watanabe
+3  A: 
  • Can you use C99? (Answer: yes, you already are using C99 notations - declaring variables when needed.)
  • Are the sizes of the arrays sane (not too huge)?

If both answers are 'yes', consider using VLA's - variable length arrays:

void zheevr(char jobz, char range, char uplo, int n, doublecomplex* a, int lda, double vl, double vu, int il, int iu, double abstol, double* w, doublecomplex* z, int ldz, int* info)
{
    int m;
    int lwork = -1;
    int liwork = -1;
    int lrwork = -1;
    int isuppz[2*n];
    zheevr_(&jobz, &range, &uplo, &n, a, &lda, &vl, &vu, &il, &iu, &abstol, &m, w, z, &ldz, isuppz, small_work_doublecomplex, &lwork, small_work_double, &lrwork, small_work_int, &liwork, &info);
    lwork = (int) small_work_doublecomplex[0].real;
    liwork = small_work_int[0];
    lrwork = (int) small_work_double[0];
    doublecomplex work[lwork];
    double rwork[lrwork];
    int iwork[liwork];
    zheevr_(&jobz, &range, &uplo, &n, a, &lda, &vl, &vu, &il, &iu, &abstol, &m, w, z, &ldz, isuppz, work, &lwork, rwork, &lrwork, iwork, &liwork, info);
}

One nice thing about using VLAs is that there is no freeing to be done by you.

(Untested code!)

Jonathan Leffler
Thanks! I didn't know I can do such things in C99. Can I safely pass such an array to Fortran code?
quant_dev
Maybe... Maybe not... The stack is just another memory area but there could be problems. You'll have to test it.One thing's for sure, if it works well with Stack memory, you can't get any faster than this.
toto
I have tested it and it works OK with Fortran, of course I cannot use it for very large matrices.
quant_dev
On Windows I know that you can set a Stack size at compile time (I never needed that thought), you might want to check that out.
toto