This section presents the results of several performance tests.
The following MPI versions and libraries were used:
qlogic:
mvapich2:
impi:
sca:
mkl:
elpa:
ELPA was compiled using sca from above together with the mkl libraries; when only the mkl libraries were used, BLACS errors occurred.
In a small test program, a random matrix of size N x N with N = 512, 1024, 2048, 4096 was set up and diagonalized using PZHEEVX from ScaLAPACK and solve_evp_complex_2stage from ELPA. The timings cover only the diagonalization part.
For each number of cores, all possible processor row/column combinations with rows/cols = 1, 2, 4, 8, 16, 32, 64 were calculated. The plots show only the lowest time obtained for each number of cores.
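The enumeration of processor grids can be sketched as follows. This is only an illustration and not the original test driver: it assumes that a valid grid is any pair of rows and columns from the list above whose product equals the number of cores, and the function names process_grids and best_grid are hypothetical.

```python
# Candidate values for the number of process rows and columns (taken from the text).
GRID_DIMS = [1, 2, 4, 8, 16, 32, 64]


def process_grids(cores, dims=GRID_DIMS):
    """Return all (rows, cols) pairs from dims with rows * cols == cores.

    Assumption: every core is used, so the grid must contain exactly `cores` processes.
    """
    return [(rows, cols) for rows in dims for cols in dims if rows * cols == cores]


def best_grid(timings):
    """Return the ((rows, cols), time) entry with the lowest time.

    `timings` is a dict mapping (rows, cols) to a measured runtime.
    """
    return min(timings.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for cores in (16, 32, 64, 128, 256):
        print(cores, process_grids(cores))
```

With these assumptions, 16 cores give five candidate grids, from 1 x 16 to 16 x 1, and only the fastest of them would enter the plots.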
Absolute timings of the different subroutines:
Scaling of the runtimes relative to the calculation with 16 cores:
For qlogic MPI we also tested the influence of different block sizes on VSC-1 and VSC-2. The runs were performed as above, but the calculations were done for block sizes of 2, 4, 8, 16, 32, 64. The data in the plots and tables represent the lowest timings obtained for a given matrix size and number of cores.
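To illustrate what the block size controls, the following sketch maps a global matrix index to its owning process and local index in a one-dimensional block-cyclic distribution. ScaLAPACK applies this mapping independently to rows and columns of the process grid. The function name owner_and_local_index is hypothetical, indices are 0-based, and the first block is assumed to reside on process 0; the actual test program may differ in these details.

```python
def owner_and_local_index(global_index, block_size, num_procs):
    """Map a 0-based global index to (process coordinate, 0-based local index).

    Assumes a block-cyclic distribution whose first block lives on process 0.
    """
    block = global_index // block_size      # which block the index falls into
    proc = block % num_procs                # blocks are dealt out round-robin
    local_block = block // num_procs        # number of earlier blocks on that process
    offset = global_index % block_size      # position inside the block
    return proc, local_block * block_size + offset


if __name__ == "__main__":
    # Distribute 12 rows over 4 process rows with block size 2.
    for i in range(12):
        print(i, owner_and_local_index(i, block_size=2, num_procs=4))
```

Small block sizes spread the work more evenly over the processes, while larger blocks allow more efficient local matrix operations, which is why the best block size can differ between matrix sizes and core counts.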
#Data obtained from VSC-1 with qlogic MPI:
cores        time         blocksize
          SCA     ELPA    SCA  ELPA
------------------------------------
Matrix Size 512:
   16    0.081    0.072    16     8
   32    0.087    0.059    32     4
   64    0.085    0.049    32     2
  128    0.093    0.043     4     4
  256    0.114    0.040    32     8
------------------------------------
Matrix Size 1024:
   16    0.320    0.402    16     2
   32    0.274    0.263    32     2
   64    0.245    0.187    32     4
  128    0.249    0.153    32     8
  256    0.273    0.120    32     2
------------------------------------
Matrix Size 2048:
   16    1.699    2.565    16     2
   32    1.148    1.498    32     4
   64    0.856    0.907    32     4
  128    0.749    0.613    32     4
  256    0.666    0.442    32     8
------------------------------------
Matrix Size 4096:
   16   11.921   17.662    32     8
   32    6.559    9.710    32    16
   64    4.101    5.549    16     2
  128    2.837    3.264    32    16
  256    2.136    2.066    16     4
#Data obtained from VSC-2 with qlogic MPI:
cores        time         blocksize
          SCA     ELPA    SCA  ELPA
------------------------------------
Matrix Size 512:
   16    0.101    0.097    16     4
   32    0.096    0.077    16     2
   64    0.090    0.066     8     4
  128    0.109    0.058    16     4
  256    0.126    0.054     4     4
------------------------------------
Matrix Size 1024:
   16    0.423    0.525    16     4
   32    0.312    0.341    16     4
   64    0.249    0.254    16     4
  128    0.266    0.189     8     4
  256    0.251    0.148     8     8
------------------------------------
Matrix Size 2048:
   16    2.448    3.264    32     4
   32    1.460    1.974    16     4
   64    0.987    1.173    16    16
  128    0.848    0.777    16     8
  256    0.671    0.545     4     4
------------------------------------
Matrix Size 4096:
   16   19.075   22.678    32     2
   32   10.114   12.827    32     8
   64    5.705    7.059    32     8
  128    3.463    4.288    16    16
  256    2.461    2.624    16     2
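
The scaling relative to 16 cores can be recomputed from the tabulated times. The sketch below uses the SCA times for matrix size 4096 on VSC-2 from the table above. It assumes that relative scaling means the 16-core time divided by the time on N cores; the original plots may use a different convention. The function names are hypothetical.

```python
# SCA times for matrix size 4096 on VSC-2 with qlogic MPI (copied from the table).
SCA_4096_VSC2 = {16: 19.075, 32: 10.114, 64: 5.705, 128: 3.463, 256: 2.461}


def relative_scaling(times, base_cores=16):
    """Speedup of each run relative to the run on base_cores cores."""
    base = times[base_cores]
    return {cores: base / t for cores, t in times.items()}


def parallel_efficiency(times, base_cores=16):
    """Speedup divided by the increase in core count (1.0 means ideal scaling)."""
    speedups = relative_scaling(times, base_cores)
    return {cores: s * base_cores / cores for cores, s in speedups.items()}


if __name__ == "__main__":
    speedups = relative_scaling(SCA_4096_VSC2)
    efficiency = parallel_efficiency(SCA_4096_VSC2)
    for cores in sorted(SCA_4096_VSC2):
        print(f"{cores:4d}  speedup {speedups[cores]:6.2f}  efficiency {efficiency[cores]:5.2f}")
```

The same functions can be applied to any other column of the tables by replacing the dictionary of times.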