I'm looking for books or online resources that go into detail on programming techniques for high-performance computing in C++.
Take a look at The ADAPTIVE Communication Environment (ACE). It's a library of templates and objects for high performance applications in C++. It has great cross-platform primitives for threading, networking, etc.
The first thing might be reading about MPI (Message Passing Interface), which is the de facto standard for communication between nodes in HPC.
Practically all HPC code I've heard of is either for solving systems of linear equations or for FFTs. Here are some links to start you off, at least with the libraries used:
- BLAS - standard set of routines for linear algebra - stuff like matrix multiplication
- LAPACK - standard set of higher-level linear algebra routines - stuff like LU decomposition
- ATLAS - Optimized BLAS implementation
- FFTW - Optimized FFT implementation
- PBLAS - BLAS for distributed processors
- ScaLAPACK - distributed LAPACK implementation
- MPI - Communications library for distributed systems.
- PETSc - Scalable nonlinear and linear solvers (user-extensible, interface to much above)
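To give a feel for what the BLAS entry in the list above standardizes, here is a minimal sketch of the general matrix multiply operation, C = alpha * A * B + beta * C, written as a plain triple loop. The function name `gemm_reference` and the row-major `std::vector` layout are assumptions made for illustration; this is not the BLAS API, and optimized implementations such as ATLAS use blocking and vectorization rather than a naive loop like this.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference routine (not a BLAS call): computes
// C = alpha * A * B + beta * C for row-major dense matrices.
// A is m x k, B is k x n, C is m x n.
void gemm_reference(std::size_t m, std::size_t n, std::size_t k,
                    double alpha,
                    const std::vector<double>& A,
                    const std::vector<double>& B,
                    double beta,
                    std::vector<double>& C) {
    // Check that the buffers match the stated dimensions.
    assert(A.size() == m * k);
    assert(B.size() == k * n);
    assert(C.size() == m * n);

    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            // Dot product of row i of A with column j of B.
            double acc = 0.0;
            for (std::size_t p = 0; p < k; ++p) {
                acc += A[i * k + p] * B[p * n + j];
            }
            C[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
    }
}
```

A loop like this is mainly useful as a correctness check; for real workloads you would link against an optimized BLAS and compare its results to the reference on small inputs.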
The Trilinos suite of libraries and packages offers a broad range of middleware libraries for HPC, including sparse, iterative linear solvers; nonlinear solvers; eigensolvers; ODE & DAE integrators including sensitivity analysis; optimization (both invasive and black box); finite element interfaces; mesh interfaces; preconditioners; etc. All of these packages are designed using fairly modern C++ techniques (there are Python APIs as well as some C and Fortran). They're used in very large scale parallel (5000+ CPUs) simulations of exceptional consequence (nuclear weapon design) with great success. These packages offer a great suite of capabilities that are much higher level than BLAS, etc.
Despite being 14+ years old, the pioneering work of Expression Templates is still regarded as some of the most exceptional C++ work in years. Fast, efficient, safe... I've used the techniques and they're really remarkable.
Edit: In case the above link remains broken, here's an alternate reference for Expression Templates. This DDJ article cites the original work of Veldhuizen.
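For anyone who hasn't seen the idea before, here is a minimal sketch of expression templates for vector addition. The type names (`VecExpr`, `Vec`, `VecSum`) are made up for this example and are not taken from Veldhuizen's code or any particular library. The point it tries to show is that `a + b + c` builds a lightweight expression object, and the element-wise sum is only evaluated in a single loop when the result is assigned to a `Vec`, so no temporary vectors are created.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base: tags a type as a vector expression and recovers the derived type.
template <typename E>
struct VecExpr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Concrete vector that owns its storage.
class Vec : public VecExpr<Vec> {
public:
    explicit Vec(std::size_t n, double value = 0.0) : data_(n, value) {}

    // Construct from any expression: one loop, no intermediate vectors.
    template <typename E>
    Vec(const VecExpr<E>& expr) : data_(expr.self().size()) {
        const E& e = expr.self();
        for (std::size_t i = 0; i < data_.size(); ++i) data_[i] = e[i];
    }

    // Assign from any expression of matching size.
    template <typename E>
    Vec& operator=(const VecExpr<E>& expr) {
        const E& e = expr.self();
        assert(e.size() == data_.size());
        for (std::size_t i = 0; i < data_.size(); ++i) data_[i] = e[i];
        return *this;
    }

    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<double> data_;
};

// Expression node representing l + r; computes elements lazily on access.
// It stores references, so it must not outlive its operands: use it within
// one full expression rather than keeping it in an `auto` variable.
template <typename L, typename R>
class VecSum : public VecExpr<VecSum<L, R>> {
public:
    VecSum(const L& l, const R& r) : l_(l), r_(r) {
        assert(l.size() == r.size());
    }
    double operator[](std::size_t i) const { return l_[i] + r_[i]; }
    std::size_t size() const { return l_.size(); }

private:
    const L& l_;
    const R& r_;
};

// operator+ only builds the expression node; no arithmetic happens here.
template <typename L, typename R>
VecSum<L, R> operator+(const VecExpr<L>& l, const VecExpr<R>& r) {
    return VecSum<L, R>(l.self(), r.self());
}
```

With these definitions, `Vec d = a + b + c;` compiles down to one loop over the elements. Production libraries add far more (scalar operations, aliasing checks, vectorization), so treat this only as an illustration of the core trick.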
Even though they are not FOSS, the Intel IPP and MKL libraries can really save you a lot of time (both in development and at runtime) if you need to perform any of the operations supported by these libraries (e.g., signal processing, image processing, matrix math). Of course, whether you can benefit from them depends on your platform.
(No: I don't work for Intel, but a happy customer of theirs I am.)
No matter what you write, and how much you design for performance from the beginning, chances are pretty good it will benefit from performance tuning. Usually the bigger the program, the more it will benefit. THIS is a simple, effective way to do that tuning. It is based on "deep sampling", a technique that gives accuracy of diagnosis while de-emphasizing measurement.
You could also look at http://en.wikipedia.org/wiki/Performance_analysis#Simple_manual_technique
Check out the Eigen Vector/Matrix library. The API is very elegant, and the resulting programs are blazing fast (due to explicit vectorization for SSE2 architectures).
High Scalability - Building bigger, faster, more reliable websites.
And also: