AMSET on multinode machine

16-vikrant · July 17, 2022, 7:24pm

Hi!
I am running AMSET on a multinode machine, and it core dumps even on a single node. The code works fine on a normal workstation with 56 cores. What might be the reason?

Thank You!

alex · August 23, 2022, 9:43am

It’s hard to know exactly. Are you setting nworkers and OMP_NUM_THREADS?

16-vikrant · September 16, 2022, 9:10am

I tried with nworkers set to 24 and to 48. The job script looks like this:

#!/bin/sh
#SBATCH -N 1
#SBATCH --tasks-per-node=48
#SBATCH --job-name=AMS
#SBATCH --error=error
#SBATCH --partition=highmemory
#SBATCH --time=1-00:00:00

export OMP_NUM_THREADS=1
amset run --no-separate-mobility -z prefer

The process stops at…

   1.00e+21       700.0     9.74e-03      2.00e+21
   1.00e+21       725.0     9.67e-03      2.00e+21
   1.00e+21       750.0     9.61e-03      2.00e+21
   1.00e+21       775.0     9.55e-03      2.00e+21
   1.00e+21       800.0     9.48e-03      2.00e+21

Initializing POP scattering
- average N_po: 27.4548
- w_po: 2.576 2pi THz
- hbar.omega: 0.0017 eV

Thank you!

alex · October 4, 2022, 8:25am

Can you try with a smaller number of processors? For example, start with 2 and then increase the number slowly. You could be running out of memory.
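
As a rough sketch of that "start small and step up" approach, the helper below prints a ladder of worker counts to try, doubling from 2 up to a chosen maximum. The name worker_ladder is hypothetical and not part of amset; the commented loop assumes you change nworkers yourself (for example in settings.yaml) before each run.

```shell
#!/bin/sh
# worker_ladder MAX: print 2, 4, 8, ... (doubling), ending with MAX itself.
# Hypothetical helper for illustration only; it is not part of amset.
worker_ladder() {
    max=$1
    n=2
    while [ "$n" -lt "$max" ]; do
        echo "$n"
        n=$((n * 2))
    done
    echo "$max"
}

# Possible use (assumption: nworkers is updated by hand before each run):
# for n in $(worker_ladder 48); do
#     echo "set nworkers to $n, then run:"
#     echo "amset run --no-separate-mobility -z prefer"
# done
```

If the run survives at low counts and crashes only at higher ones, that points towards memory rather than a broken installation.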

You can also try setting this in your settings.yaml:

cache_wavefunction: false

If this doesn’t work, then there is likely an issue with your numba installation. You can try uninstalling and reinstalling the package using conda.
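
Before reinstalling, a quick first check is whether numba imports at all in the environment the job uses. The sketch below is only that: check_numba is a hypothetical helper name, it assumes python3 on the compute node is the same interpreter amset runs under, and a clean import does not rule out a crash later in compiled code.

```shell
#!/bin/sh
# check_numba: report the numba version if it imports, otherwise say it failed.
# Hypothetical helper; assumes python3 is the interpreter amset runs under.
check_numba() {
    if python3 -c "import numba" 2>/dev/null; then
        python3 -c "import numba; print('numba', numba.__version__)"
    else
        echo "numba import failed"
    fi
}

check_numba
# For more detail on a working install, numba also ships a system report:
#   numba -s
```

Running this inside the batch job rather than on the login node shows what the compute node actually sees.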