The benchmark is a 10000x10000 matrix multiply in double precision (FP64). Times are in milliseconds (lower is better).
The GPU benchmarks are device-only: host/device memory transfers are not included in the time.
Architecture | Model | Host | Threads | BLAS library | Benchmark time (ms) | Notes |
AMD Ryzen | Threadripper 1950X | | 1 | openblas | 117,113 | |
AMD Ryzen | Threadripper 1950X | | 16 | openblas | 8,459 | |
AMD Ryzen | Threadripper 1950X | | 1 | BLIS | 71,201 | |
AMD Ryzen | Threadripper 1950X | | 16 | BLIS | 7,875 | |
nVidia Kepler | Titan Black | | - | cuBLAS | 1,540 | |
Intel Core | i7-4910MQ | | 1 | openblas | 45,569 | AVX |
Intel Core | i7-4910MQ | | 4 | openblas | 18,142 | AVX |
Intel Xeon | E5-2660 | dogmatix | 1 | MKL | 120,182 | |
nVidia Kepler | K20M | dogmatix | - | cuBLAS | 1,930 | |
nVidia Kepler | K80 | m3 | - | cuBLAS | 2,167 | |
nVidia Pascal | P100 | m3 | - | cuBLAS | 445 | |
Intel Xeon | Gold 6132 | weiner | 1 | openblas | 42,196 | AVX |
Intel Xeon | Gold 6132 | weiner | 4 | openblas | 11,365 | AVX |
Intel Xeon | Gold 6132 | weiner | 1 | MKL | 20,076 | AVX-512 |
Intel Xeon | Gold 6132 | weiner | 4 | MKL | 5,576 | AVX-512 |
nVidia Volta | V100 | weiner | - | cuBLAS | 308 | |
Intel Xeon | E5-2680v3 | getafix | 1 | reference BLAS | 1,046,023 | *SLOW* |
Intel Xeon | E5-2680v3 | getafix | 1 | MKL | 48,456 | AVX |
Intel Xeon | E5-2680v4 | getafix | 1 | reference BLAS | 853,811 | *SLOW* |
Intel Xeon | E5-2680v4 | getafix | 1 | MKL | 42,385 | AVX |
Intel Xeon | E5-2680v4 | getafix | 4 | MKL | 12,585 | AVX |
Intel Xeon | E5-2680v4 | getafix | 10 | MKL | 5,766 | AVX |
Intel Xeon | E5-2680v4 | getafix | 28 | MKL | 3,285 | hyperthreading |
Intel Xeon | E5-2680v4 | getafix | 14 | MKL | 4,242 | |
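
To compare rows more directly, the times in the table can be converted into throughput and thread-scaling figures. The sketch below assumes the conventional count of 2*N^3 floating-point operations for a dense N x N multiply; that count is an assumption, not something the benchmark itself reports, and the function names are illustrative only.

```python
# Convert benchmark times (ms) into GFLOP/s and thread-scaling figures.
# Assumption: a dense N x N matrix multiply costs about 2 * N**3 flops.

N = 10_000              # matrix dimension from the benchmark description
FLOPS = 2 * N ** 3      # assumed operation count (multiplies plus adds)


def gflops(time_ms: float) -> float:
    """Sustained GFLOP/s implied by a benchmark time in milliseconds."""
    seconds = time_ms / 1000.0
    return FLOPS / seconds / 1e9


def speedup(single_thread_ms: float, multi_thread_ms: float) -> float:
    """How many times faster the multi-threaded run is."""
    return single_thread_ms / multi_thread_ms


def efficiency(single_thread_ms: float, multi_thread_ms: float, threads: int) -> float:
    """Parallel efficiency: speedup divided by the number of threads."""
    return speedup(single_thread_ms, multi_thread_ms) / threads


if __name__ == "__main__":
    # Times taken from the table above.
    print(f"V100, cuBLAS:            {gflops(308):8.1f} GFLOP/s")
    print(f"E5-2680v3, ref. BLAS:    {gflops(1_046_023):8.1f} GFLOP/s")
    print(f"1950X openblas, 1 -> 16: {speedup(117_113, 8_459):.2f}x, "
          f"efficiency {efficiency(117_113, 8_459, 16):.2f}")
```

Under that assumption, the V100 time of 308 ms corresponds to roughly 6,500 GFLOP/s, while the single-threaded reference BLAS run comes out at about 2 GFLOP/s. The Threadripper 1950X with openblas is about 13.8 times faster on 16 threads than on 1.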