Dear all,
I ran exciting_smp by following the tutorial.
I have set OMP_NUM_THREADS=32; however, exciting_smp only ever uses one or two CPUs at most.
My system: gfortran 9.2.0, exciting-oxygen. CentOS 7.0
Dear Alex,
Thanks for your reply.
I am sorry for not providing enough information.
When I try the examples in the tutorial, exciting_smp runs fine,
but when I try the following input, which I wrote by following the example in the tutorial "Electronic Band Structure from GW",
exciting_smp only ever uses one or two CPUs at most.
Could you clarify please:
You’re trying to follow the GW tutorial for silicon but apply it to boron nitride. And to begin with you’d like to perform a ground state calculation using this input?
What linear algebra library did you link to when you built? packaged lapack/blas, system lapack/blas, MKL, openBLAS?
Dear Alex,
Thanks for your reply.
Yes, I want to calculate the ground state of BN, and I wrote the input file based on the tutorial example.
For the example in this tutorial (i.e. silicon), exciting_smp works well with multiple threads.
So I think my compilation is fine.
However, when I run the ground state job for BN, exciting_smp only ever uses one or two CPUs at most.
Can you try to run my input file?
Note: oxygen-o’s make_script will not automatically compile lapack/blas contained in the src directory. So I link to an external lapack/blas compiled without the -fopenmp flag. (I don’t have permission to recompile lapack for the time being)
Regarding the lapack/blas linking: if it's the vendor version shipped with CentOS 7.0 and, as you've suggested, it's a serial build, diagonalisation won't run with threads. Not a lot of the ground state is OpenMP-threaded. Rather, to get optimal performance one assumes that a) you build the MPISMP version to exploit k-point distribution across MPI processes, and b) you use openBLAS or MKL to gain the benefit of shared-memory parallelism for the linear algebra operations.
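To make the MPI/OpenMP split concrete, here is a minimal sketch that lists the ways a fixed number of cores could be divided between MPI ranks (k-point distribution) and threads per rank (threaded linear algebra). The function name `hybrid_layouts`, the core count, and the k-point count are illustrative only, and the cap on ranks rests on the assumption that a rank with no k-point assigned has nothing to do.

```python
def hybrid_layouts(total_cores, n_kpoints):
    """List (mpi_ranks, omp_threads) pairs that use every core exactly once.

    Assumption: k-points are the only unit of work distributed over MPI,
    so more ranks than k-points would leave some ranks idle.
    """
    layouts = []
    for ranks in range(1, min(total_cores, n_kpoints) + 1):
        # Keep only splits where the cores divide evenly among the ranks.
        if total_cores % ranks == 0:
            layouts.append((ranks, total_cores // ranks))
    return layouts


if __name__ == "__main__":
    # Hypothetical machine with 32 cores and a calculation with 8 k-points.
    for ranks, threads in hybrid_layouts(32, 8):
        print(f"MPI ranks = {ranks:2d}, OMP_NUM_THREADS = {threads}")
```

Which split is fastest depends on the build and the system, so the candidates would need to be timed; the threads only help if the linked BLAS/LAPACK is itself threaded.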
PS Never build with the supplied version of lapack/blas (it will be the slowest). This is left over from long ago and I need to remove it completely before the next release.
My thoughts:
Use the package manager of CentOS to install openBLAS or MKL. Rebuild exciting, link to either and see if the threads change/the code speeds up. The Intel suite is actually very easy to install now, and does not require a paid licence.
GW will be similarly slow unless you do this, because of the current parallelisation scheme.
I can run this file for you and see what happens on my mac using GCC 9 and openBLAS.
Why did you select the same muffin tin radius for B and N? This is probably suboptimal (I will need to check with Andris to get the optimal ratio)
Dear Alex,
Many thanks for your time.
I compiled the MPI version, and it runs well.
Because the default rmt in the species files is 1.45 for both B and N. Their sum is 2.90, which is larger than the distance (2.72) between the B and N atoms. So I set both to 1.34 (I also referred to the ELK program).
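The arithmetic behind that choice can be written as a small check. This is only a sketch: the helper name `spheres_overlap` is made up for illustration, and the numbers are the ones quoted above, in the same units as the species file and the input.

```python
def spheres_overlap(rmt_a, rmt_b, distance):
    """Return True if two muffin-tin spheres would overlap.

    Two spheres centred on neighbouring atoms overlap when the sum of
    their radii exceeds the interatomic distance.
    """
    return rmt_a + rmt_b > distance


bn_distance = 2.72  # B-N distance quoted in this post

# Default species-file radii: 1.45 + 1.45 = 2.90 > 2.72, so they overlap.
default_overlaps = spheres_overlap(1.45, 1.45, bn_distance)

# Reduced radii: 1.34 + 1.34 = 2.68 < 2.72, so they fit.
reduced_overlaps = spheres_overlap(1.34, 1.34, bn_distance)

print("default radii overlap:", default_overlaps)
print("reduced radii overlap:", reduced_overlaps)
```

The same check works for unequal radii, e.g. if a different B:N ratio turns out to be preferable.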
I also tried compiling exciting oxygen yesterday, and found that the tutorial files run perfectly, while any other file runs very sluggishly and on one or two processors at most (as you had previously discussed). Could you kindly let me know how to check whether the build has linked against the system BLAS/LAPACK libraries? I have them installed from the system repository (Ubuntu 20.04).
Try ldd binary_name… I think the makefile is linking to shared libs. But regardless, it will be linking to the system libs by default. If you post your make.inc, it will be immediately clear.
You can also check this at the linking step of building the code.
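If you prefer a filtered view of the ldd output, something like the following sketch could help. It assumes a Linux system with ldd on the PATH and a dynamically linked binary; the keyword list and function names are my own choice for illustration, not anything defined by exciting.

```python
import subprocess
import sys

# Substrings that commonly appear in BLAS/LAPACK-like library names.
KEYWORDS = ("blas", "lapack", "mkl", "scalapack")


def linear_algebra_libs(ldd_output):
    """Return the lines of ldd output whose library name matches a keyword."""
    hits = []
    for line in ldd_output.splitlines():
        stripped = line.strip()
        # The library name is the first whitespace-separated token on the line.
        name = stripped.split(" ", 1)[0].lower()
        if any(key in name for key in KEYWORDS):
            hits.append(stripped)
    return hits


def run_ldd(binary_path):
    """Run ldd on the given binary and return its standard output."""
    result = subprocess.run(
        ["ldd", binary_path], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_libs.py /path/to/binary
    for entry in linear_algebra_libs(run_ldd(sys.argv[1])):
        print(entry)
```

If nothing is printed, the binary may be statically linked against BLAS/LAPACK, in which case the link line in the build output or the make.inc is the place to look.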
Ok, so exciting’s definitely linking to what’s installed on your system.
exciting is not set up for ground state calculations with scalapack (k-point parallelism only). The aim is to improve this for our next release. Scalapack is currently only used by the BSE code.
To link to scalapack, you'll need to modify this line in the make.inc accordingly: MPI_LIBS = -L./ -lscalapack-openmpi
To increase the ground state calculation speed, you probably want to install and link to openBLAS or MKL, to get the OMP acceleration, rather than your system’s native blas (I’m assuming the native version is serial). There is a mac installation make.inc that shows how one might link to openBLAS (I use it for my mac).