Custom compiler flags
The default compiler options are designed for a high level of optimization with the gcc compiler. For other compilers, you might get better results with some customization. To set the compiler flags, either set the environment variable CXXFLAGS or add CXXFLAGS=... when invoking configure.
The default is equivalent to
export CXXFLAGS="-O2 -march=native -flto"
Specifying a different architecture
By default, the tools are compiled for the architecture of the machine that the compiler is running on. This means that if you compile the toolkit on a modern machine that has some recent CPU extensions (for example Advanced Vector eXtensions), and then attempt to run the tools on an older machine that doesn't have these extensions, they will fail with an "Illegal Instruction" error. To work around this, you need to specify the target architecture by hand, with the -march=xxxx compiler flag (or whatever is appropriate for your compiler). For GCC, the possible architectures are listed at https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html. A reasonably safe architecture would be core2, e.g.
export CXXFLAGS="-O2 -march=core2 -mtune=native"
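Before choosing a value for -march, it can help to check which extensions the oldest machine you intend to run on actually supports. The following is a minimal sketch for x86 Linux only, where the kernel lists CPU feature flags in /proc/cpuinfo. The helper name cpu_has_flag is hypothetical and is not part of the toolkit.

```shell
# cpu_has_flag FLAG [CPUINFO_FILE]
# Hypothetical helper (not part of the toolkit). Succeeds if FLAG appears in
# the first "flags" line of CPUINFO_FILE (default: /proc/cpuinfo, x86 Linux).
cpu_has_flag() {
    flag="$1"
    cpuinfo="${2:-/proc/cpuinfo}"
    # Take the first "flags" line, put each word on its own line,
    # then look for an exact, literal match of the requested flag.
    grep -m1 '^flags' "$cpuinfo" | tr ' \t' '\n\n' | grep -qxF -- "$flag"
}

# Example: report a few extensions on the current machine.
if [ -r /proc/cpuinfo ]; then
    for f in sse4_2 avx avx2 avx512f; do
        if cpu_has_flag "$f"; then
            echo "$f: yes"
        else
            echo "$f: no"
        fi
    done
fi
```

Running this on the older target machine shows whether, for example, AVX is available there. If it is not, a conservative choice such as -march=core2 avoids generating those instructions.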
Installing to a different location
make install will copy the executable files to the PREFIX/bin directory specified in the configure script. By default, the PREFIX directory is $HOME. This is what you want if you are installing a personal copy of the toolkit and you only have write access to your home directory.
If you want to change the default, for example to install the executable files into /usr/local/bin, use
../mptoolkit/configure --prefix=/usr/local
Debugging
By default, the toolkit is compiled with no debug information, so stack traces etc. will not be useful. To enable debugging, use the configure option --enable-debug. Note that this also disables optimizations and enables a lot of debugging checks in the toolkit, so it will run much slower and produce a lot of extra output.
You can also use --enable-debug=info to get just basic debugging. This is equivalent to adding -g to the compiler flags.
You can also use --enable-debug=profile to set compiler options appropriate for profiling the toolkit with gprof.
Optimized BLAS libraries
The configure script attempts to auto-detect the BLAS and LAPACK libraries, but it will often fail to detect an optimized BLAS library, especially if it is installed in a non-standard location. The difference in speed between the reference BLAS library and an optimized BLAS library such as MKL is typically a factor of 4 or more (much more if you also use multi-threading).
To use a specific BLAS library, use the option --with-blas=... when invoking configure. The ... will typically be something like -L/path/to/optimized/blas -lname_of_library.
To use MKL, use the option --with-blas="...", where the information in ... is taken from the Intel link line advisor, https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor
For example, a typical command for single-threaded MKL (which strangely still requires libpthread) is
../mptoolkit/configure --with-blas="-Wl,--no-as-needed -L/opt/intel/composerxe/mkl/lib/intel64 \
-lmkl_gf_lp64 -lmkl_core -lmkl_sequential -lpthread -lm"
note 1: I used -lmkl_gf_lp64 here, not -lmkl_intel_lp64. This is very important, as the two versions of MKL use different conventions for returning complex values from functions. The toolkit will detect automatically which convention to use, but if ARPACK is compiled with gfortran then you must use the gf version of the library, or ARPACK will not work! The gfortran version will report
checking convention for returning complex values from Fortran functions... return_in_register
whereas using the intel version of MKL gives
checking convention for returning complex values from Fortran functions... pass_as_first_argument
If you get problems such as a segmentation fault or a program hanging inside zdotc (or some similar BLAS function), then the most likely cause is linking against the wrong version of the MKL interface library (gf versus intel).
note 2: verify in the output of configure that it really is using the BLAS library that you specified -- if the configure script is unable to run a program using the specified BLAS library it will keep searching, and possibly use the wrong BLAS library!
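One additional check, once the build has finished, is to look at which shared libraries an installed executable actually loads. The sketch below assumes a Linux system with ldd and a dynamically linked BLAS; a statically linked library will not show up this way. The function name filter_blas_libs and the list of name patterns are illustrative assumptions, not something the toolkit provides.

```shell
# filter_blas_libs: read `ldd` output on stdin and keep only the lines that
# look like BLAS/LAPACK libraries. The name patterns are a guess at common
# library names and may need extending for your system.
# Prints nothing (and still succeeds) if no line matches.
filter_blas_libs() {
    grep -iE 'mkl|openblas|blis|flame|libblas|lapack' || true
}

# Usage (replace the placeholder with one of the installed executables):
#   ldd "$HOME/bin/NAME_OF_TOOL" | filter_blas_libs
```

If the lines printed name a different library from the one passed to --with-blas, the configure script most likely fell back to another BLAS.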
note 3: Versions of MKL prior to around 2019.5 have serious bugs in the LAPACK SVD and eigensolver functions. Symptoms are the toolkit hanging at the start of iDMRG or at the end (while orthogonalizing the MPS), or extreme inaccuracies with TEBD time evolution. A possible workaround is to use a different LAPACK library with MKL BLAS. To do this, specify --with-lapack=LIBRARY when configuring.
Which BLAS library to choose?
- Most Linux machines will come with either the 'reference' versions of BLAS/LAPACK or the openblas libraries. Avoid the 'reference' versions at all costs - they will work, but they will be much slower than an optimized library. Openblas has quite good single-thread performance, but the multi-thread performance is generally very bad - for the toolkit, it will often be slower to run with more than one thread than with a single thread! So we recommend setting the environment variable OPENBLAS_NUM_THREADS=1 when using openblas.
- On recent Intel CPUs, the AVX-512 instruction set makes a big improvement to floating-point performance, typically a factor of 2 or more compared with older CPUs. MKL and Openblas take advantage of this instruction set. AMD doesn't implement the AVX-512 instructions, although in most other respects the new Ryzen architecture is faster than the current Intel offerings (as of late 2017).
- For Intel machines, use either openblas or MKL. Depending on the workload, MKL can give quite good multi-thread performance.
- For AMD machines, we recommend the BLIS library (replaces BLAS) and FLAME (replaces LAPACK) as higher performance libraries (about 60% faster than openblas, in some benchmark tests). These libraries need to be installed from the GitHub source; as far as I know there are no pre-built packages available for Linux distributions.
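The single-thread recommendation for openblas in the list above can be applied per command rather than for the whole shell session. A minimal sketch follows; the wrapper name run_single_threaded is hypothetical and not part of the toolkit.

```shell
# run_single_threaded CMD [ARGS...]
# Hypothetical wrapper: run CMD with OPENBLAS_NUM_THREADS=1 set only for
# that command, without changing the environment of the calling shell.
run_single_threaded() {
    env OPENBLAS_NUM_THREADS=1 "$@"
}

# Example: show the value that the child process sees.
run_single_threaded sh -c 'echo "OPENBLAS_NUM_THREADS=$OPENBLAS_NUM_THREADS"'
```

To make the setting permanent instead, add export OPENBLAS_NUM_THREADS=1 to your shell startup file.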