Benchmarks

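The benchmark multiplies two 10000 × 10000 double-precision (FP64) matrices. Times are in milliseconds; lower is better. GPU timings cover device computation only and exclude host/device memory transfers. Host is the machine the run was made on, and Threads is the number of CPU threads used (shown as - for GPU runs).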
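
Because the problem size is fixed, each time can also be read as a throughput, which makes CPU and GPU rows easier to compare with each other. A square n × n multiply needs n³ multiplications and n²(n − 1) additions, i.e. 2n³ − n² operations, usually rounded to 2n³; for n = 10000 that is about 2 × 10¹² FLOPs. The Python sketch below assumes that rounded count; `matmul_flops` and `gflops` are hypothetical helper names, not part of the original benchmark.

```python
# Convert benchmark times (ms) into achieved throughput (GFLOP/s) for an
# n x n FP64 matrix multiply. Assumes the conventional 2 * n**3 operation
# count (n**3 multiplies plus about n**3 adds); the names are illustrative.

N = 10_000  # matrix dimension used in the benchmark


def matmul_flops(n: int) -> int:
    """Approximate floating-point operation count for C = A * B, all n x n."""
    return 2 * n ** 3


def gflops(time_ms: float, n: int = N) -> float:
    """Achieved GFLOP/s for a run that took time_ms milliseconds."""
    seconds = time_ms / 1000.0
    return matmul_flops(n) / seconds / 1e9


if __name__ == "__main__":
    # Sample times copied from the table (milliseconds).
    samples = [
        ("V100, cuBLAS", 308),
        ("P100, cuBLAS", 445),
        ("Gold 6132, 4 threads, MKL", 5_576),
        ("E5-2680v3, 1 thread, reference BLAS", 1_046_023),
    ]
    for label, ms in samples:
        print(f"{label:38s} {gflops(ms):9.1f} GFLOP/s")
```

With this count, the V100 run corresponds to roughly 6,500 GFLOP/s and the single-thread reference BLAS run on the E5-2680v3 to under 2 GFLOP/s, a gap of more than three orders of magnitude.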

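Architecture  | Model              | Host     | Threads | BLAS library   | Time (ms) | Notes
--------------+--------------------+----------+---------+----------------+-----------+---------------
AMD Ryzen     | Threadripper 1950X |          |       1 | OpenBLAS       |   117,113 |
AMD Ryzen     | Threadripper 1950X |          |      16 | OpenBLAS       |     8,459 |
AMD Ryzen     | Threadripper 1950X |          |       1 | BLIS           |    71,201 |
AMD Ryzen     | Threadripper 1950X |          |      16 | BLIS           |     7,875 |
NVIDIA Kepler | Titan Black        |          |       - | cuBLAS         |     1,540 |
Intel Core    | i7-4910MQ          |          |       1 | OpenBLAS       |    45,569 | AVX
Intel Core    | i7-4910MQ          |          |       4 | OpenBLAS       |    18,142 | AVX
Intel Xeon    | E5-2660            | dogmatix |       1 | MKL            |   120,182 |
NVIDIA Kepler | K20M               | dogmatix |       - | cuBLAS         |     1,930 |
NVIDIA Kepler | K80                | m3       |       - | cuBLAS         |     2,167 |
NVIDIA Pascal | P100               | m3       |       - | cuBLAS         |       445 |
Intel Xeon    | Gold 6132          | weiner   |       1 | OpenBLAS       |    42,196 | AVX
Intel Xeon    | Gold 6132          | weiner   |       4 | OpenBLAS       |    11,365 | AVX
Intel Xeon    | Gold 6132          | weiner   |       1 | MKL            |    20,076 | AVX-512
Intel Xeon    | Gold 6132          | weiner   |       4 | MKL            |     5,576 | AVX-512
NVIDIA Volta  | V100               | weiner   |       - | cuBLAS         |       308 |
Intel Xeon    | E5-2680v3          | getafix  |       1 | reference BLAS | 1,046,023 | *SLOW*
Intel Xeon    | E5-2680v3          | getafix  |       1 | MKL            |    48,456 | AVX
Intel Xeon    | E5-2680v4          | getafix  |       1 | reference BLAS |   853,811 | *SLOW*
Intel Xeon    | E5-2680v4          | getafix  |       1 | MKL            |    42,385 | AVX
Intel Xeon    | E5-2680v4          | getafix  |       4 | MKL            |    12,585 | AVX
Intel Xeon    | E5-2680v4          | getafix  |      10 | MKL            |     5,766 | AVX
Intel Xeon    | E5-2680v4          | getafix  |      28 | MKL            |     3,285 | hyperthreading
Intel Xeon    | E5-2680v4          | getafix  |      14 | MKL            |     4,242 |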
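
Several hosts were measured at more than one thread count, so the table also shows how each BLAS library scales on a fixed problem. The sketch below computes strong-scaling speedup (T₁ / Tₚ) and per-thread efficiency (speedup / p) for the getafix E5-2680v4 MKL rows; the dictionary and function names are hypothetical, and treating every thread (including hyperthreads) as one unit of p is a simplifying assumption.

```python
# Strong-scaling speedup and efficiency from the benchmark table.
# Data: Intel Xeon E5-2680v4 on host getafix with MKL, times in ms
# copied from the table (thread count -> time).

getafix_mkl = {1: 42_385, 4: 12_585, 10: 5_766, 14: 4_242, 28: 3_285}


def speedup(t1_ms: float, tp_ms: float) -> float:
    """How many times faster the p-thread run is than the 1-thread run."""
    return t1_ms / tp_ms


def efficiency(t1_ms: float, tp_ms: float, threads: int) -> float:
    """Speedup divided by thread count; 1.0 would be perfect linear scaling.
    Assumption: hyperthreads count as full threads."""
    return speedup(t1_ms, tp_ms) / threads


if __name__ == "__main__":
    t1 = getafix_mkl[1]
    for p, tp in sorted(getafix_mkl.items()):
        print(f"{p:3d} threads: speedup {speedup(t1, tp):5.2f}, "
              f"efficiency {efficiency(t1, tp, p):.2f}")
```

On these figures, 14 threads run about 10× faster than one thread (efficiency ≈ 0.71), while the 28-thread run, which the table marks as hyperthreading, is only about 1.3× faster than the 14-thread run (4,242 ms to 3,285 ms). Its efficiency (≈ 0.46) divides by all 28 threads, so it should be read with that caveat rather than compared directly with the other rows.
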
Page last modified on August 27, 2018, at 04:00 PM