tags:

views:

26

answers:

2

Hi,

I am totally stumped. I have a fairly large recursive program written in c that calls cblas_dgemm(). The result is verified independently by a program that works correctly.

C = alpha*A*B + beta*C 

On repeated tests using random matrices and all possible combination of parameters the program gives correct answer ONLY if abs(beta) = 2^n (1,2,4,8..). Any value works for alpha. Any other positive/negative, odd/even value for beta gives correct answer b/w 10-30% of the time.

I am using Ubuntu 10.04, GCC 4.4.x, I have tried system installed blas/cblas/atlas as well as manually compiled atlas.

Any hints or suggestions would be greatly appreciated. I am amazed at the wonderfully generous (and smart) folks lurking at this site.

Thanking you all in advance,

Russ

A: 

Yes, a full example would be handy. Here is an old example I had hanging around using GSL's sgemm variant; should be easy to fix to double. Please try and see if this gives the result shown in the GSL manual:

/* from the gsl info documentation in node 'gsl cblas examples' */
/* compile via 'gcc -o $file $file.c -lgslcblas' */
/* edd 15 Nov 2003 */ 

#include <stdio.h>      
#include <gsl/gsl_cblas.h> 

int   
main (void)    
{     
  int lda = 3; 
  float A[] = { 0.11, 0.12, 0.13,  
                0.21, 0.22, 0.23 };  
  int ldb = 2;                      
  float B[] = { 1011, 1012,  
                1021, 1022,                                                      
                1031, 1032 }; 
  int ldc = 2; 
  float C[] = { 0.00, 0.00,  
                0.00, 0.00 };     
  /* Compute C = A B */ 
  cblas_sgemm (CblasRowMajor,  
               CblasNoTrans, CblasNoTrans, 2, 2, 3,  
               1.0, A, lda, B, ldb, 0.0, C, ldc);  
  printf ("[ %g, %g\n", C[0], C[1]);         
  printf ("  %g, %g ]\n", C[2], C[3]);   

  return 0;    
}          
Dirk Eddelbuettel
s/float/double and s/sgemm/dgemm of course.
Alexandre C.
As soon as i posted the question, i realized i needed to make sure that a direct call to cblas_dgemm() works correctly. So, I tried an example like yours and indeed there are no problems when dgemm is called directly (using identical include/link directives). Which rules out problems with blas/cblas linking errors etc. It makes the situation even more bizarre. The function calling gemm picks various parameters (transpose/lda/beta etc.). AFAIK, these are being picked correctly - works for beta (0,1,2,4,8..). I will try to isolate the problem further. Will post results. thanks. Russ
A: 

Two completely unrelated errors conspired to produce an illusive picture. It made me look for problems in the wrong place.

(1) There was a simple error in the logic of the function calling dgemm. Would have been easily fixed if I was not chasing the wrong problem.

(2) My double-compare function: double version of AlmostEqual2sComplement() (http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm) used incorrect sized integer - resulting in an incorrect TRUE under certain rare circumstances. This was the first time the error bit me!

Thanks again for the useful suggestion of using the scientific method when trying to debug a program.

Russ

Sometimes just showing your problem to someone makes you spot the solution.
Alexandre C.