Hi,

I have a program that uses OpenMP to parallelize a for loop. Inside the loop, the threads write to a shared variable, so I need to synchronize them. However, I sometimes get either a segmentation fault or a "double free or corruption" error. Does anyone know what is happening? Thanks and regards! Here is the code:

void KNNClassifier::classify_various_k(int dim, double *feature, int label, int *ks, double * errors, int nb_ks, int k_max) {   
  ANNpoint      queryPt = 0;    
  ANNidxArray   nnIdx = 0;      
  ANNdistArray  dists = 0;     

  queryPt = feature;      
  nnIdx = new ANNidx[k_max];                
  dists = new ANNdist[k_max];               

  if(strcmp(_search_neighbors, "brutal") == 0) {// search  
    _search_struct->annkSearch(queryPt, k_max,  nnIdx, dists, _eps);  
  }else if(strcmp(_search_neighbors, "kdtree") == 0) {  
    _search_struct->annkSearch(queryPt, k_max,  nnIdx, dists, _eps);  // double free or corruption
  }  

  for (int j = 0; j < nb_ks; j++)  
  {  
    scalar_t result = 0.0;  
    for (int i = 0; i < ks[j]; i++) {          
        result+=_labels[ nnIdx[i] ];  // Segmentation fault
    }  
    if (result*label<0)
    {
      #pragma omp critical
      {
        errors[j]++;
      }
    }

  }  

  delete [] nnIdx;  
  delete [] dists;  

}

      void KNNClassifier::tune_complexity(int nb_examples, int dim, double **features, int *labels, int fold, char *method, int nb_examples_test, double **features_test, int *labels_test) {    
          int nb_try = (_k_max - _k_min) / scalar_t(_k_step);    
          scalar_t *error_validation = new scalar_t [nb_try];    
          int *ks = new int [nb_try];    

          for(int i=0; i < nb_try; i ++){    
            ks[i] = _k_min + _k_step * i;    
          }    

          if (strcmp(method, "ct")==0)                                                                                                                     
          {    

            train(nb_examples, dim, features, labels );// train once for all nb of nbs in ks                                                                                                

            for(int i=0; i < nb_try; i ++){    
              if (ks[i] > nb_examples){nb_try=i; break;}    
              error_validation[i] = 0;    
            }    

            int i = 0;    
      #pragma omp parallel shared(nb_examples_test, error_validation,features_test, labels_test, nb_try, ks) private(i)    
            {    
      #pragma omp for schedule(dynamic) nowait    
              for (i=0; i < nb_examples_test; i++)         
              {    
                classify_various_k(dim, features_test[i], labels_test[i], ks, error_validation, nb_try, ks[nb_try - 1]); // where error occurs    
              }    
            }    
            for (i=0; i < nb_try; i++)    
            {    
              error_validation[i]/=nb_examples_test;    
            }    
          }

          ......
     }

UPDATE:

As in my last post, http://stackoverflow.com/questions/2182004/double-free-or-corruption, the code runs fine with a single thread but gives runtime errors with multiple threads. The error changes from run to run: if I run it twice, one run will segfault and the other will report double free or corruption.

+3  A: 

Let's take a look at your segmentation fault line:

result+=_labels[ nnIdx[i] ];

result is local -- OK.

nnIdx is local -- also OK.

i is local -- still OK.

_labels ... what is it?

Is it global? Did you declare how it is accessed, e.g. with a shared clause on the #pragma omp parallel directive?

Same goes for the former:

_search_struct->annkSearch(queryPt, k_max,  nnIdx, dists, _eps);

It seems we have a problem here that is not easily solvable: _search_struct is not thread-safe, and values in it are probably being modified by several threads at once. You would need a dedicated _search_struct per thread, probably by allocating it in classify_various_k.
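As a rough illustration of the "one structure per thread" idea, here is a minimal sketch. It does not use the ANN API: ToySearch, nearest and count_errors are hypothetical names, and ToySearch only mimics a search structure whose queries modify internal scratch storage. The approach only helps if all mutable state really lives inside the instance.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a search structure with mutable internal state.
// This is NOT the ANN API; it only mimics "a query writes to scratch storage".
class ToySearch {
public:
    explicit ToySearch(const std::vector<double>& points)
        : points_(points), scratch_(points.size()) {}

    // Returns the index of the stored point nearest to q (1-D for brevity).
    // Assumes at least one stored point.
    std::size_t nearest(double q) {
        for (std::size_t i = 0; i < points_.size(); ++i)
            scratch_[i] = std::fabs(points_[i] - q);  // mutable state: unsafe if shared
        std::size_t best = 0;
        for (std::size_t i = 1; i < scratch_.size(); ++i)
            if (scratch_[i] < scratch_[best]) best = i;
        return best;
    }

private:
    std::vector<double> points_;
    std::vector<double> scratch_;
};

// Counts misclassified queries (labels are +1 / -1, as in the question).
// Every thread builds and queries its own ToySearch instance.
int count_errors(const std::vector<double>& train,
                 const std::vector<int>& train_labels,
                 const std::vector<double>& queries,
                 const std::vector<int>& query_labels) {
    int errors = 0;
    const int n = static_cast<int>(queries.size());
    #pragma omp parallel reduction(+:errors)
    {
        ToySearch local(train);  // one instance per thread, never shared
        #pragma omp for schedule(dynamic)
        for (int i = 0; i < n; ++i) {
            std::size_t idx = local.nearest(queries[i]);
            if (train_labels[idx] * query_labels[i] < 0) ++errors;
        }
    }
    return errors;
}
```

The reduction clause gives each thread its own error counter and sums them at the end, so no critical section is needed for the count. Building the structure once per thread rather than once per query keeps the construction cost down.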

The really bad news however is that ANN is probably completely non-threadable:

The library allocates a small amount of storage, which is shared by all search structures built during the program’s lifetime. Because the data is shared, it is not deallocated, even when all the individual structures are deleted.

As the quote shows, there will always be problems with parallel data modification, because the library itself keeps some shared data -- hence it is not thread-safe :/.
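If a library keeps shared internal data like this, one generic fallback is to let only one thread at a time call into it, while the rest of the loop body still runs in parallel. The sketch below is not ANN code: legacy_square and square_all are hypothetical names, and legacy_square only mimics a routine that writes to library-wide storage. Whether this is enough for ANN itself is an assumption that would need checking, and the serialized calls will limit the speedup.

```cpp
#include <cassert>
#include <vector>

// Hypothetical non-thread-safe routine: it writes through a file-scope
// variable, mimicking a library that keeps shared internal storage.
static int g_scratch = 0;

int legacy_square(int x) {
    g_scratch = x;                 // shared write: races if called concurrently
    return g_scratch * g_scratch;
}

// Calls legacy_square for every input. The call itself is serialized by a
// named critical section; the store into out[i] needs no lock because each
// iteration writes a distinct element.
std::vector<int> square_all(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    const int n = static_cast<int>(in.size());
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < n; ++i) {
        int r = 0;
        #pragma omp critical(legacy_lib)
        {
            r = legacy_square(in[i]);
        }
        out[i] = r;
    }
    return out;
}
```

Naming the critical section (legacy_lib here) keeps it from contending with unrelated unnamed critical sections elsewhere in the program, such as the one guarding errors[j]++.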

Kornel Kisielewicz
_labels is a data member of class KNNClassifier. It is not explicitly declared as shared for OpenMP. Also, if I run my program twice, one run will segfault and the other will give double free or corruption, as in my last post; i.e. the error varies from run to run.
Tim
_search_struct->annkSearch(queryPt, k_max, nnIdx, dists, _eps) finds the k_max nearest neighbors of a query point *feature among the points that have been organized in a k-d tree by *_search_struct. The indices of the nearest neighbors are stored in nnIdx, and their distances to the query point in dists.
Tim
@Tim, updated answer.
Kornel Kisielewicz
Thanks. I perhaps should not use that library if I really want to go parallel.
Tim