BLAS Library Benchmark
Is there a benchmark that compares the different BLAS (Basic Linear Algebra Subprograms) libraries? I am especially interested in sparse matrix multiplication for single- and multi-core systems. ...
Dear all, I have a project written in C# where I need to do various linear algebraic operations on matrices (like LU-factorization). Since the program is mainly a prototype created to confirm a theory, a C# implementation will suffice (compared to a possibly speedier C++ one), but I would still like a good BLAS or LAPACK library availab...
I am looking for an algorithm to solve the following problem: I have two sets of vectors, and I want to find the matrix that best approximates the transformation from the input vectors to the output vectors. The vectors are 3x1, so the matrix is 3x3. This is the general problem. My particular problem is I have a set of RGB colors, and another se...
I know that Blitz++ gets its performance boost from extensive use of expression templates and template metaprograms. But at some point you can't get more out of your code by using these techniques - you have to multiply and sum some floats up. At this point you can get a final performance kick by using the highly optimized (especially fo...
I have noticed that MATLAB provides the BLAS and LAPACK headers among others: $ ls ${MATLAB_DIR}/extern/include/ blas.h engine.h lapack.h mat.h mclmcr.h mex.h mwutil.h blascompat32.h fintrf.h libmatlbm.mlib matrix.h mclmcrrt.h mwdebug.h tmwtypes.h emlrt.h ...
Has anyone had experience using prefetch instructions for the Core 2 Duo processor? I've been using the (standard?) prefetch set (prefetchnta, prefetcht1, etc) with success for a series of P4 machines, but when running the code on a Core 2 Duo it seems that the prefetcht(i) instructions do nothing, and that the prefetchnta instruction i...
Is there an equivalent of dgemm (from BLAS) for integral types? I only know of dgemm, sgemm for double precision / single precision matrices, but would like to have it for matrices that are of integral type such as int (or short int...). Note: I'm not looking for a solution that involves converting to float/double, and am looking for a ...
How large a system is it reasonable to attempt to do a linear regression on? Specifically: I have a system with ~300K sample points and ~1200 linear terms. Is this computationally feasible? ...
I am using R. I want to run prcomp on a matrix. The code works fine with one installation of R on a Linux box but breaks on another identical (or so I thought) installation of R on a different Linux box. The code is dataf = read.table("~/data/testdata.txt") pca = prcomp(dataf) The error message on the bad instance is > dataf = read.tab...
I'm trying to use CMake to build a program relying on BLAS. I'm detecting BLAS using: include (${CMAKE_ROOT}/Modules/FindBLAS.cmake) The problem is, FindBLAS requires a Fortran compiler and complains with -- Looking for BLAS... - NOT found (Fortran not enabled) As BLAS is already installed on my machine (ATLAS BLAS), and gfortran is...
I'm wondering about Nvidia's CUBLAS Library. Does anybody have experience with it? For example, if I write a C program using BLAS, will I be able to replace the calls to BLAS with calls to CUBLAS? Or even better, implement a mechanism which lets the user choose at runtime? What about if I use the BLAS Library provided by Boost with C++? ...
Background: I am working on a project written in a mix of C and Fortran 77 and now need to link the LAPACK/BLAS libraries to the project (all in a Linux environment). The LAPACK in question is version 3.2.1 (including BLAS) from netlib.org. The libraries were compiled using the top level Makefile (make lapacklib and make blaslib). Probl...
R has a qr() function, which performs QR decomposition using either LINPACK or LAPACK (in my experience, the latter is 5% faster). The main object returned is a matrix "qr" that contains in the upper triangular matrix R (i.e. R=qr[upper.tri(qr)]). So far so good. The lower triangular part of qr contains Q "in compact form". One can extra...
I have a program that runs through R but uses the BLAS routines. It runs through correctly about 8 times but then throws an error: BLAS/LAPACK routine 'DGEMV ' gave error code -6 What does this error code mean? ...
I am trying to use the C++ armadillo library (armadillo-0.9.10) on a Mac Pro. I follow the manual installation instruction in the README.txt file. I have modified the config.hpp file to indicate that I have LAPACK and BLAS installed. I then try to compile the examples. I successfully compile and run example1.cpp, but when I try to run...
I think I've found some gems in the iPhone OS (iOS 4). I found that there are 128-bit, 256-bit, 512-bit and 1024-bit integer data types, provided by the Accelerate Framework. There are also Apple's implementation of Basic Linear Algebra Subprograms (BLAS), Apple's implementation of LAPACK (Linear Algebra PACKage), and Digital Signal Proce...
A is an MxK matrix, B is a vector of size K, and C is a KxN matrix. What set of BLAS operators should I use to compute the matrix below? M = A*diag(B)*C One way to implement this would be using three for loops like below for (int i=0; i<M; ++i) for (int j=0; j<N; ++j) for (int k=0; k<K; ++k) M(i,j) = A(i,k)*B(...
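One common way to approach the A*diag(B)*C question is to note that diag(B)*C is just C with row k scaled by B[k], so the whole product reduces to a scaling pass followed by a single matrix-matrix product. The sketch below illustrates that identity in plain Python; the helper names `matmul` and `a_diag_b_c` are hypothetical, and the triple loop merely stands in for what would be one GEMM call (after per-row scaling, e.g. with a SCAL-style routine) in a real BLAS setting.

```python
def matmul(X, Y):
    """Plain triple-loop product; a stand-in for a single GEMM call."""
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]


def a_diag_b_c(A, B, C):
    """Compute A * diag(B) * C without ever forming diag(B)."""
    # diag(B) * C scales row k of C by B[k].
    scaled = [[B[k] * value for value in C[k]] for k in range(len(B))]
    # One ordinary matrix product finishes the job.
    return matmul(A, scaled)


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # M x K = 3 x 2
B = [10.0, 0.5]                             # length K = 2
C = [[1.0, 0.0, 2.0], [4.0, 1.0, 0.0]]      # K x N = 2 x 3

result = a_diag_b_c(A, B, C)

# Cross-check against the explicit three-factor product.
diag_B = [[B[i] if i == j else 0.0 for j in range(len(B))]
          for i in range(len(B))]
explicit = matmul(matmul(A, diag_B), C)
```

Scaling the columns of A by B instead gives the same result; which side to scale would normally depend on which of the two matrices is smaller.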
Hi there. I'm putting together some scientific code in Fortran 77, and I am having a debate on what would be faster. Basically, I have an MxN matrix, let's call it A. M is larger than N. Later on in the code, I need to multiply transpose(A) by a bunch of vectors. My question is, would it be faster to take A, transpose it on my o...
Hi, I am totally stumped. I have a fairly large recursive program written in C that calls cblas_dgemm(). The result is verified independently by a program that works correctly. C = alpha*A*B + beta*C On repeated tests using random matrices and all possible combinations of parameters the program gives the correct answer ONLY if abs(beta) ...
I am trying to compute: C = 1*(A*B') + 0*C using cblas_dgemm(). As far as I can tell, the parameters are correct. The error message itself does not make sense: "ldb must be >= MAX(K,1): ldb=3 K=3Parameter 11 to routine cblas_dgemm was incorrect" But, ldb = k = 3! Here is the detailed output of all three matrices and the parameters. ...
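For questions like the two above, it can help to have a tiny reference model of what a GEMM call with a transposed B is expected to do with its leading dimensions. The following is a minimal sketch, not the CBLAS implementation: `dgemm_nt` is a hypothetical helper that assumes row-major storage and a transposed B, so B is stored as an N x K array and `ldb` must be at least K. It also follows the BLAS convention that C is not read when beta is zero.

```python
def dgemm_nt(m, n, k, alpha, a, lda, b, ldb, beta, c, ldc):
    """Reference model of C = alpha * A * B^T + beta * C (row-major, flat lists).

    A is m x k with row stride lda, B is stored as n x k with row stride ldb,
    and C is m x n with row stride ldc. The result is written into c in place.
    """
    if lda < max(k, 1):
        raise ValueError("lda must be >= max(k, 1)")
    if ldb < max(k, 1):
        raise ValueError("ldb must be >= max(k, 1)")
    if ldc < max(n, 1):
        raise ValueError("ldc must be >= max(n, 1)")
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                # Row i of A times row j of the stored B (i.e. column j of B^T).
                acc += a[i * lda + p] * b[j * ldb + p]
            if beta == 0.0:
                # With beta == 0 the old contents of C are ignored entirely.
                c[i * ldc + j] = alpha * acc
            else:
                c[i * ldc + j] = alpha * acc + beta * c[i * ldc + j]


a = [1.0, 2.0, 3.0,
     4.0, 5.0, 6.0]            # A: 2 x 3
b = [1.0, 0.0, 1.0,
     0.0, 1.0, 0.0]            # B: stored 2 x 3, used as B^T (3 x 2)
c = [float("nan")] * 4         # C: 2 x 2, deliberately uninitialised

dgemm_nt(2, 2, 3, 1.0, a, 3, b, 3, 0.0, c, 2)
```

Comparing a real call's output against a model like this on small matrices is one way to tell a parameter mix-up (storage order, transpose flag, leading dimension) apart from a problem in the library build.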