http://www.netlib.org/lapack/

The efficiency of LAPACK software depends on efficient implementations of the BLAS being provided by computer vendors (or others) for their machines.

Thus the BLAS form a low-level interface between LAPACK software and different machine architectures.

Above this level, almost all of the LAPACK software is truly portable.

svn co https://icl.cs.utk.edu/svn/lapack-dev/lapack/trunk

https://github.com/biotrump/lapack-netlib

The BLAS as the Key to Portability

http://www.netlib.org/lapack/lug/node65.html

The LAPACK strategy for combining efficiency with portability is to construct the software as much as possible

out of calls to the BLAS (Basic Linear Algebra Subprograms); the BLAS are used as building blocks.

The Level 1 BLAS are used in LAPACK, but for convenience rather than for performance:

they perform an insignificant fraction of the computation, and they cannot achieve high efficiency on most modern supercomputers.
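To make the distinction concrete, here is a minimal sketch of two typical Level 1 BLAS operations, AXPY and DOT, written with numpy (which itself dispatches to a BLAS library). The function names mirror the BLAS routines `daxpy` and `ddot`; the point is that each does only O(n) floating-point operations on O(n) data, so there is little arithmetic to amortize memory traffic against.

```python
import numpy as np

def axpy(alpha, x, y):
    """Level 1 BLAS AXPY: y <- alpha*x + y (cf. daxpy).
    2n flops on roughly 3n data items: O(n) work on O(n) data."""
    return alpha * x + y

def dot(x, y):
    """Level 1 BLAS DOT: inner product (cf. ddot).
    2n flops on 2n data items."""
    return float(x @ y)

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
print(axpy(2.0, x, y))   # each element touched once, two flops apiece
print(dot(x, y))
```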

The Level 2 BLAS can achieve near-peak performance on many vector processors,

such as a single processor of a CRAY Y-MP, CRAY C90, or CONVEX C4 machine.

However, on other vector processors, such as a CRAY-2, or on RISC workstations and PCs with one or more levels of cache,

their performance is limited by the rate of data movement between different levels of memory.

This limitation is overcome by the Level 3 BLAS, which perform O(n^3) floating-point operations on O(n^2) data,

whereas the Level 2 BLAS perform only O(n^2) operations on O(n^2) data.
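The flop-to-data ratios above can be tabulated directly. The sketch below uses rough operation and data counts for one representative routine per level (AXPY, GEMV, GEMM); the exact constants are approximations, but they show why only the Level 3 ratio grows with n, letting matrix-matrix kernels amortize data movement across many operations per element.

```python
def flops_per_data_item(n, level):
    """Approximate flops performed per data item moved, for a
    representative routine at each BLAS level and problem size n."""
    if level == 1:
        # AXPY: 2n flops on ~3n data items (x, y in; y out) -> O(1) ratio
        return (2 * n) / (3 * n)
    if level == 2:
        # GEMV: 2n^2 flops on ~n^2 + 3n data items (A, x, y) -> O(1) ratio
        return (2 * n**2) / (n**2 + 3 * n)
    if level == 3:
        # GEMM: 2n^3 flops on ~4n^2 data items (A, B, C in/out) -> O(n) ratio
        return (2 * n**3) / (4 * n**2)
    raise ValueError("BLAS level must be 1, 2, or 3")

for lvl in (1, 2, 3):
    print(f"Level {lvl}: ~{flops_per_data_item(1000, lvl):.1f} flops per data item")
```

For n = 1000 the Level 1 and Level 2 ratios stay near a small constant, while Level 3 reaches roughly n/2 flops per data item, which is the headroom that blocked LAPACK algorithms exploit.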