Why does exciting_smp only take up two CPUs at most?

Dear all,
I ran exciting_smp by following the tutorial.
I have set OMP_NUM_THREADS=32; however, exciting_smp only uses one or two (at most) CPUs all the time.
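For reference, this is roughly how I set the threads and launch the run (the binary path is just from my own setup):

export OMP_NUM_THREADS=32
$EXCITINGROOT/bin/exciting_smp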
My system: gfortran 9.2.0, exciting oxygen, CentOS 7.0.

Best regards.
Youzhao Lan

Hi Youzhao,

Can you give the input and species files please, so I can try and reproduce the run?
Or specify which tutorial it is?

For groundstate, threading will mostly be utilised in the lapack/openBLAS/MKL library.
Which did you link to?
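If you're not sure, something like this should show which BLAS/LAPACK the binary actually picked up (binary name/path is illustrative):

ldd bin/exciting_smp | grep -iE 'blas|lapack|mkl'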

Thanks,
Alex

Dear Alex,
Thanks for your reply.
I am sorry for not giving you enough information.
The examples in the tutorial run fine with exciting_smp,
but when I try the following input, which I wrote by following the example in the tutorial
Electronic Band Structure from GW - exciting,
exciting_smp runs with only one or two (at most) CPUs all the time.

<input>
  <title>monolayer BN</title>
  <structure speciespath='$EXCITINGROOT/species'>
    <crystal>
      <basevect>    4.109335   -2.372526    0.000000</basevect>
      <basevect>    0.000000    4.745052    0.000000</basevect>
      <basevect>    0.000000    0.000000   28.369500</basevect>
    </crystal>
    <species speciesfile='B.xml' rmt='1.34'>
      <atom coord='    0.000000    0.000000    0.500000'></atom>
    </species>
    <species speciesfile='N.xml' rmt='1.34'>
      <atom coord='    0.333333   -0.333333    0.500000'></atom>
    </species>
  </structure>

  <groundstate
      do="fromscratch"
      rgkmax="7.0"
      ngridk="6 6 1"
      xctype="LDA_PW"
      >
   </groundstate>

</input>

Best regards.
Lan

Hi Lan,

Could you clarify please:
You’re trying to follow the GW tutorial for silicon but apply it to boron nitride. And to begin with you’d like to perform a ground state calculation using this input?

What linear algebra library did you link to when you built? packaged lapack/blas, system lapack/blas, MKL, openBLAS?

And to confirm, you export OMP_NUM_THREADS=32?

Dear Alex,
Thanks for your reply.
Yes, I want to calculate BN's ground state, and I wrote the input file based on the tutorial example.
For the example in this tutorial (i.e. silicon), exciting_smp works well with multiple threads.
So I think my compilation is fine.
However, when I run the ground-state job for BN, exciting_smp always runs with only one or two (at most) CPUs.
Can you try to run my input file?

Note: oxygen's make_script will not automatically compile the lapack/blas contained in the src directory, so I link to an external lapack/blas that was compiled without the -fopenmp flag. (I don't have permission to recompile lapack for the time being.)

Best regards.
Lan

Hi Lan,

I can try running your file but:

  • I don't think there's actually a problem
  • I’d like to fully clarify the lapack point

W.r.t. lapack/blas linking: if it's the vendor version shipped with CentOS 7.0 and, as you've suggested, it was built serial, diagonalisation won't run with threads. Not a lot of the ground state code is OpenMP threaded. Rather, to get optimal performance one assumes that a) you build the MPI+SMP version to exploit k-point distribution across MPI processes, and b) you use openBLAS or MKL to gain the benefit of shared-memory parallelism in the linear algebra operations.
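As a rough sketch, a hybrid run could then look something like this (the exciting_mpismp binary name and the rank/thread split here are assumptions; adjust to your build and machine):

export OMP_NUM_THREADS=8          # threads per MPI rank, used by openBLAS/MKL
mpirun -np 4 ./exciting_mpismp    # 4 MPI ranks, k-points distributed across them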

PS Never build with the supplied version of lapack/blas (it will be the slowest). This is left over from long ago and I need to remove it completely before the next release.

My thoughts:

  • Use the package manager of CentOS to install openBLAS or MKL. Rebuild exciting, link to either, and see if the thread usage changes / the code speeds up (see the sketch after this list). The Intel suite is actually very easy to install now, and does not require a paid licence.
  • GW will be similarly slow unless you do this, because of the current parallelisation scheme.
  • I can run this file for you and see what happens on my mac using GCC 9 and openBLAS.
  • Why did you select the same muffin tin radius for B and N? This is probably suboptimal (I will need to check with Andris to get the optimal ratio)
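A rough sketch of the first point on CentOS 7 (the package names are assumptions, openBLAS comes from EPEL there; you would then point the lapack/blas line of your make.inc at it before rebuilding):

sudo yum install epel-release      # enables the EPEL repository
sudo yum install openblas-devel    # openBLAS headers and libraries
# edit make.inc to link against -lopenblas, then rebuild exciting
make clean && make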

Cheers,
Alex

Dear Alex,
Great thanks for your time.
I compiled the MPI version, and it runs well.

Because the default rmt in the species files is 1.45 for both B and N, their sum is 2.90, which is larger than the distance (2.72) between the B and N atoms. So I set them both to 1.34 (1.34 + 1.34 = 2.68 < 2.72, so the spheres no longer overlap); I also referred to the ELK program.

Best regards.
Lan

@lyzhao

Hello Lan, Greetings!

I also tried compiling exciting oxygen yesterday, and found that the tutorial files run perfectly, while any other file runs very sluggishly and on one or two processors at most (as you previously discussed). Can you kindly let me know how to check whether the compilation has linked the system BLAS/LAPACK libraries? I have them installed from the system repository (Ubuntu 20.04).

Thanks and Regards,
Hemanth

Hi Hemanth,

Try ldd binary_name… I think the makefile is linking to shared libs. But regardless, it will be linking to the system libs by default. If you post your make.inc, it will be immediately clear.

You can also check this at the linking step of building the code.

Cheers,
Alex

Hello Alex,

Thank you for your time and cooperation.

This is the output of the ldd command; I see that it has not linked to the scalapack library that I have installed on the system.

linux-vdso.so.1 (0x00007ffe89b9e000)
liblapack.so.3 => /usr/lib/x86_64-linux-gnu/liblapack.so.3 (0x000014c6b03f2000)
libblas.so.3 => /usr/lib/x86_64-linux-gnu/libblas.so.3 (0x000014c6b0197000)
libgfortran.so.4 => /usr/lib/x86_64-linux-gnu/libgfortran.so.4 (0x000014c6afdb8000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x000014c6afa1a000)
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x000014c6af7eb000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x000014c6af5d3000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x000014c6af1e2000)
libopenblas.so.0 => /usr/lib/x86_64-linux-gnu/libopenblas.so.0 (0x000014c6acf3c000)
libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x000014c6accfc000)
/lib64/ld-linux-x86-64.so.2 (0x000014c6b17de000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x000014c6acaf8000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x000014c6ac8d9000)

I have attached the makefile that was used for building the executable.
make.inc (1.6 KB)

Thanks and Warm regards,
Hemanth

No problem. To address your points:

  • Ok, so exciting’s definitely linking to what’s installed on your system.
  • exciting is not set up for ground state calculations with scalapack (k-point parallelism only). The aim is to improve this for our next release. Scalapack is currently only used by the BSE code.
  • To link to scalapack, you'll need to modify this line in the make.inc accordingly: MPI_LIBS = -L./ -lscalapack-openmpi
  • To increase the ground state calculation speed, you probably want to install and link to openBLAS or MKL, to get the OMP acceleration, rather than your system's native blas (I'm assuming the native version is serial). There is a mac installation make.inc that shows how one might link to openBLAS (I use it for my Mac); see the sketch after this list.
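For example, something like this (illustrative only: the library paths come from your ldd output above, and the lapack/blas variable name may differ in your make.inc; only the MPI_LIBS line is quoted from it):

# link the ground state's lapack/blas against the system openBLAS
LIB_LPK  = -L/usr/lib/x86_64-linux-gnu -lopenblas
# and point MPI_LIBS at the system scalapack (used by BSE)
MPI_LIBS = -L/usr/lib/x86_64-linux-gnu -lscalapack-openmpi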

Cheers,
Alex
