I'm writing an application where quite a bit of the computational time will be devoted to performing basic linear algebra operations (add, multiply, multiply by vector, multiply by scalar, etc.) on sparse matrices and vectors. Up to this point, we've built a prototype using C++ and the Boost matrix library.

I'm considering switching to Python to ease coding of the application itself, since it seems the Boost library (the easy C++ linear algebra library) isn't particularly fast anyway. This is a research/proof-of-concept application, so some reduction in run-time speed is acceptable (I assume C++ will almost always outperform Python) so long as coding time is also significantly decreased.

Basically, I'm looking for general advice from people who have used these libraries before. But specifically:

1) I've found scipy.sparse and pySparse. Are these (or other libraries) recommended?

2) What libraries beyond Boost are recommended for C++? I've seen a variety of libraries with C interfaces, but again I'm looking to do something with low complexity, if I can get relatively good performance.

3) Ultimately, will Python be somewhat comparable to C++ in terms of run time speed for the linear algebra operations? I will need to do many, many linear algebra operations and if the slowdown is significant then I probably shouldn't even try to make this switch.

Thank you in advance for any help and previous experience you can relate.

+2  A: 

I don't have directly applicable experience, but the scipy/numpy operations are almost all implemented in C. As long as most of what you need to do is expressed in terms of scipy/numpy functions, then your code shouldn't be much slower than equivalent C/C++.
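As a rough illustration of what "expressed in terms of scipy/numpy functions" can look like for the operations listed in the question, here is a minimal sketch using scipy.sparse. The matrices and variable names are made up for the example; the point is only that each operation is a single library call on a compressed sparse format rather than a Python-level loop.

```python
import numpy as np
from scipy import sparse

# Two small sparse matrices in CSR format, built from (data, (rows, cols)).
# A is diag(1, 2, 3); B has two off-diagonal entries. Values are illustrative.
A = sparse.csr_matrix(
    (np.array([1.0, 2.0, 3.0]), (np.array([0, 1, 2]), np.array([0, 1, 2]))),
    shape=(3, 3),
)
B = sparse.csr_matrix(
    (np.array([4.0, 5.0]), (np.array([0, 2]), np.array([1, 0]))),
    shape=(3, 3),
)
x = np.array([1.0, 1.0, 1.0])

C_sum = A + B        # sparse + sparse, result stays sparse
C_prod = A @ B       # sparse matrix product
y = A @ x            # sparse matrix times dense vector, result is a dense array
C_scaled = 2.5 * A   # scalar times sparse matrix
```

Whether this is fast enough depends on the matrix sizes and sparsity patterns involved, so it is worth timing on representative data.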

llasram
+4  A: 

As llasram says, many Python libraries are written in C/C++, so Python should run at an acceptable speed.

In C++ you can also try GSL (the GNU Scientific Library), but I believe its linear algebra routines will perform the same as Boost's (both libraries use BLAS for that). For sparse linear algebra, you should take a look at SBLAS, though I have never used it. Here is a short, general list of pros and cons as I see them:

  • C++ :
    • Will force you to keep a well-structured program
    • Can be wrapped quite easily for high-level languages (like Python) to allow fast testing (look at the Python C API or at SWIG).
  • Python :
    • easy to debug but can easily lead to badly-structured programs
    • can very easily import data for tests
    • there are some very reliable libraries like scipy/numpy (by the way, scipy also uses BLAS for linear algebra)
    • managed code

I personally use GSL for matrix manipulation, and I wrap my C++ libraries into Python libraries so I can test them easily with data. In my view, this is a way of combining the pros of the two languages.

Elenaher
Who says C++ will force you to write a well-structured program? =p
katrielalex
Who says Python is easy to debug? =p
wok
+1  A: 

Speed is no longer much of an issue for Python now that ctypes and Cython have emerged. What's brilliant about Cython is that you write Python code and it generates C code without requiring you to know a single line of C; it then compiles that to a library, or you can even create a standalone executable. ctypes is similar, though a bit slower. In the tests I have conducted, Cython code is as fast as C code, which makes sense since Cython code is translated to C.

So in the end it's a question of profiling: see what is slow in Python and move it to Cython, or wrap your existing C libraries for Python with Cython. It's quite easy to achieve C speeds this way.

So I recommend not wasting the effort you invested in creating these C libraries: wrap them with Cython and do the rest in Python. Or you could do all of it in Cython if you wish, as Cython is Python bar some limitations, and it even allows you to mix in C code. So you could do part of it in C and part of it in Python/Cython, depending on what makes you feel more comfortable.

NumPy and SciPy can be used as well, to save more time and provide ready-to-use solutions to your needs. You should certainly check them out. SciPy even has weave, a tool that lets you inline C code inside your Python code, just as you can inline assembly inside your C code. But I think you would prefer Cython. Remember that because Cython is both C and Python at the same time, it lets you use C and Python libraries directly.

Kilon
+5  A: 

My advice is to fully test the algorithm in Python before translating it into any other language (otherwise you run the risk of prematurely optimizing a bad algorithm). Once you have clearly defined the best interface for your problem, you can factor the heavy computation out to external code.

Let me explain.

Suppose your final algorithm consists of taking a bunch of numbers in (row, column, value) format and, say, computing the SVD of the corresponding sparse matrix. Then you can leave the entire interface to Python:

class Problem(object):
    def __init__(self, values):
        self.values = values

    def solve(self):
        return external_svd(self.values)

where external_svd is the Python wrapper around a Fortran/C/C++ subroutine that efficiently computes the SVD given a matrix in (row, column, value) format, or whatever floats your boat.

Again, first try to use numpy and scipy, and any other standard Python tool. Only then, after you've profiled your code, should you write the actual wrapper external_svd.
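For instance, a first version of external_svd could be nothing more than a thin layer over SciPy. The sketch below is one possible way to do that, not a prescribed implementation: the `k` parameter and the sample triplets are assumptions added for illustration, and `scipy.sparse.linalg.svds` only computes the `k` largest singular triplets rather than a full SVD.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.linalg import svds


def external_svd(values, k=1):
    """Stand-in for the external routine, implemented with SciPy only.

    `values` is an iterable of (row, column, value) tuples.
    `k` (an assumption of this sketch) is the number of singular triplets.
    """
    rows, cols, data = (np.asarray(part) for part in zip(*values))
    shape = (int(rows.max()) + 1, int(cols.max()) + 1)
    matrix = coo_matrix((data.astype(float), (rows, cols)), shape=shape).tocsr()
    # svds requires 1 <= k < min(shape) and returns (u, s, vt).
    return svds(matrix, k=k)


class Problem(object):
    def __init__(self, values):
        self.values = values

    def solve(self):
        return external_svd(self.values)


# Illustrative data: a 4x4 diagonal matrix in (row, column, value) form.
triplets = [(0, 0, 4.0), (1, 1, 3.0), (2, 2, 2.0), (3, 3, 1.0)]
u, s, vt = Problem(triplets).solve()
```

If profiling later shows this call dominating the run time, only the body of external_svd needs to change; Problem and its callers stay as they are.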

If you go this route, you will have a module which is user-friendly (the user interacts with Python, not with Fortran/C/C++) and, most importantly, you will be able to use different back-ends: external_svd_lapack, external_svd_pardiso, external_svd_gsl, etc. (one for each back-end you choose).

As for sparse linear algebra libraries, check the Intel Math Kernel Library, the PARDISO sparse solver, and the MA27 routine from the Harwell Subroutine Library (HSL). I've used them successfully to solve very sparse, very large problems (check the page of the nonlinear optimization solver IPOPT to see what I mean).
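One way to keep back-ends swappable is a small registry keyed by name. The following is a sketch under assumptions: the `backend` argument, the `BACKENDS` dictionary, and the two functions are hypothetical names, and both back-ends here are NumPy/SciPy stand-ins that return only the largest singular value so their results can be compared directly. A wrapper around an external library would be registered the same way.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.linalg import svds


def _to_sparse(values):
    """Build a CSR matrix from (row, column, value) tuples."""
    rows, cols, data = (np.asarray(part) for part in zip(*values))
    shape = (int(rows.max()) + 1, int(cols.max()) + 1)
    return coo_matrix((data.astype(float), (rows, cols)), shape=shape).tocsr()


def top_singular_value_dense(values):
    # Dense LAPACK-backed SVD via NumPy; singular values come back sorted descending.
    return float(np.linalg.svd(_to_sparse(values).toarray(), compute_uv=False)[0])


def top_singular_value_sparse(values):
    # Iterative sparse solver; only the single largest singular value is requested.
    return float(svds(_to_sparse(values), k=1, return_singular_vectors=False)[0])


# Hypothetical registry: add one entry per back-end you wrap.
BACKENDS = {
    "dense": top_singular_value_dense,
    "sparse": top_singular_value_sparse,
}


class Problem(object):
    def __init__(self, values, backend="sparse"):
        self.values = values
        self.backend = backend

    def solve(self):
        return BACKENDS[self.backend](self.values)


triplets = [(0, 0, 2.0), (0, 2, 1.0), (1, 1, 3.0), (2, 0, 1.0), (3, 3, 0.5)]
results = {name: Problem(triplets, backend=name).solve() for name in BACKENDS}
```

Running the same problem through every registered back-end, as in the last line, also gives a cheap consistency check when a new wrapper is added.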

Arrieta
+2  A: 

2) Looks like you are looking for Eigen.

3) I would guess that if you are doing sparse linear algebra, sooner rather than later you will want every bit of speed-up you can get, so I'd just stick with C++. I don't see a point in using Python for this unless you are quickly testing a prototype, which you have already done in C++ anyway.

exfizik