LAMMPS / Python / MatGL / CUDA - issues with NumPy

I am using the AdvanceSoft Corp. fork of LAMMPS (https://github.com/advancesoftcorp/lammps), on its most recent branch, based-on-lammps_2Aug2023. I chose this fork for its MatGL support, and MatGL in turn supports CUDA. I have a custom Python 3 installation with MatGL and all of its dependencies needed for CUDA support, and the MatGL tests pass. Adding ASE (https://pypi.org/project/ase/) to this Python installation doesn't appear to break anything when I run the Python shell from that location.

LAMMPS throws errors when loading Python. The errors appear to come from importing NumPy.

Traceback (most recent call last):
  File "/data/tools/target/ubuntu_22.04/python/3.10.13_matgl_0.9.1_cuda_11.8/lib/python3.10/site-packages/numpy/core/__init__.py", line 24, in <module>
    from . import multiarray
  File "/data/tools/target/ubuntu_22.04/python/3.10.13_matgl_0.9.1_cuda_11.8/lib/python3.10/site-packages/numpy/core/multiarray.py", line 10, in <module>
    from . import overrides
  File "/data/tools/target/ubuntu_22.04/python/3.10.13_matgl_0.9.1_cuda_11.8/lib/python3.10/site-packages/numpy/core/overrides.py", line 8, in <module>
    from numpy.core._multiarray_umath import (
ImportError: /data/tools/target/ubuntu_22.04/python/3.10.13_matgl_0.9.1_cuda_11.8/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so: undefined symbol: PyObject_SelfIter

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/tools/target/ubuntu_22.04/python/3.10.13_matgl_0.9.1_cuda_11.8/lib/python3.10/site-packages/numpy/__init__.py", line 130, in <module>
    from numpy.__config__ import show as show_config
  File "/data/tools/target/ubuntu_22.04/python/3.10.13_matgl_0.9.1_cuda_11.8/lib/python3.10/site-packages/numpy/__config__.py", line 4, in <module>
    from numpy.core._multiarray_umath import (
  File "/data/tools/target/ubuntu_22.04/python/3.10.13_matgl_0.9.1_cuda_11.8/lib/python3.10/site-packages/numpy/core/__init__.py", line 50, in <module>
    raise ImportError(msg)
ImportError: 

IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.

We have compiled some common reasons and troubleshooting tips at:

    https://numpy.org/devdocs/user/troubleshooting-importerror.html

Please note and check the following:

  * The Python version is: Python3.10 from "/data/tools/target/ubuntu_22.04/python/3.10.13_matgl_0.9.1_cuda_11.8/bin/python3"
  * The NumPy version is: "1.26.3"

and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.

Original error was: /data/tools/target/ubuntu_22.04/python/3.10.13_matgl_0.9.1_cuda_11.8/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so: undefined symbol: PyObject_SelfIter


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/data/tools/build/ubuntu_22.04/lammps/asc_2Aug2023_matgl0.9.1_cuda118/01_test_lammps/./matgl_driver.py", line 33, in <module>
    from ase import Atoms
  File "/data/tools/target/ubuntu_22.04/python/3.10.13_matgl_0.9.1_cuda_11.8/lib/python3.10/site-packages/ase/__init__.py", line 17, in <module>
    from ase.atom import Atom
  File "/data/tools/target/ubuntu_22.04/python/3.10.13_matgl_0.9.1_cuda_11.8/lib/python3.10/site-packages/ase/atom.py", line 3, in <module>
    import numpy as np
  File "/data/tools/target/ubuntu_22.04/python/3.10.13_matgl_0.9.1_cuda_11.8/lib/python3.10/site-packages/numpy/__init__.py", line 135, in <module>
    raise ImportError(msg) from e
ImportError: Error importing numpy: you should not try to import numpy from
        its source directory; please exit the numpy source tree, and relaunch

When I start this Python 3 installation directly, I can import NumPy without issue and run its self-tests. When Python is called from LAMMPS, the Python environment has trouble finding its own modules. I suspect that setting the PYTHONPATH and PYTHONHOME environment variables may solve this, but I have not been able to resolve it yet.
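One thing the traceback hints at, though nothing in this thread confirms it: `PyObject_SelfIter` is a function from CPython's C API, so an `undefined symbol` error for it often points to the NumPy extension being unable to see the interpreter's own symbols inside the LAMMPS process, rather than to NumPy being missing from the search path. Below is a minimal sketch, assuming the cause lies in how libpython was built or linked. The function name `describe_interpreter` is hypothetical. It could be run once from the shell and once from the top of matgl_driver.py to compare the two environments:

```python
import sys
import sysconfig


def describe_interpreter():
    """Collect build facts about the interpreter that is running this code."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "version": sys.version.split()[0],
        # Truthy when Python was configured with --enable-shared.
        "shared_libpython": bool(sysconfig.get_config_var("Py_ENABLE_SHARED")),
        # File name and directory of the Python library recorded at build time.
        "libpython_name": sysconfig.get_config_var("LDLIBRARY"),
        "libpython_dir": sysconfig.get_config_var("LIBDIR"),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print("%-18s %s" % (key + ":", value))
```

If the values differ between the two runs, or `shared_libpython` is false while LAMMPS expects a shared library, that would be worth raising with the maintainers of the fork. This is only a diagnostic and does not change any settings.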

Additional information:

I modified matgl_driver.py to dump several environment variables for debugging when LAMMPS calls Python.

"""
Copyright (c) 2023, AdvanceSoft Corp.

This source code is licensed under the GNU General Public License Version 2
found in the LICENSE file in the root directory of this source tree.
"""

# print several envvars / help with debugging
import os
import sys

print( "PYTHONHOME:" )
print( "    %s" % ( os.environ.get( 'PYTHONHOME' ) ) )

print( "PYTHONPATH:" )
if os.environ.get( 'PYTHONPATH' ):
    for item in os.environ.get( 'PYTHONPATH' ).split( ':'):
        print( "    %s" % ( item ) )
else:
    print( "    %s" % ( os.environ.get( 'PYTHONPATH' ) ) )

print( "PATH:" )
# Default to an empty string so an unset PATH does not raise AttributeError
for item in os.environ.get( 'PATH', '' ).split( ':' ):
    print( "    %s" % ( item ) )

print( "sys.path:" )
for item in sys.path:
    print( "    %s" % ( item ) )

from ase import Atoms
from ase.calculators.mixing import SumCalculator

import matgl
from matgl.ext.ase import M3GNetCalculator


def m3gnet_initialize(model_name = None, dftd3 = False):
    """
    Initialize GNNP of M3GNet.
    Args:
        model_name (str): name of model for GNNP.
        dftd3 (bool): to add correction of DFT-D3.
    Returns:
        cutoff: cutoff radius.
    """

    # Create M3GNetCalculator, that is pre-trained
    global myCalculator

    if model_name is not None:
        myPotential = matgl.load_model(model_name)
    else:
        myPotential = matgl.load_model("M3GNet-MP-2021.2.8-PES")

    myCalculator = M3GNetCalculator(
        potential      = myPotential,
        compute_stress = True,
        stress_weight  = 1.0
    )

    # Add DFT-D3 to calculator without three-body term
    global m3gnetCalculator
    global dftd3Calculator

    m3gnetCalculator = myCalculator
    dftd3Calculator  = None

    if dftd3:
        from dftd3.ase import DFTD3
        #from torch_dftd.torch_dftd3_calculator import TorchDFTD3Calculator

        dftd3Calculator = DFTD3(
            method  = "PBE",
            damping = "d3zero",
            s9      = 0.0
        )
        #dftd3Calculator = TorchDFTD3Calculator(
        #    xc      = "pbe",
        #    damping = "zero",
        #    abc     = False
        #)

        myCalculator = SumCalculator([m3gnetCalculator, dftd3Calculator])

    # Atoms object of ASE, that is empty here
    global myAtoms

    myAtoms = None

    return myPotential.model.cutoff

def m3gnet_get_energy_forces_stress(cell, atomic_numbers, positions):
    """
    Predict total energy, atomic forces and stress w/ pre-trained GNNP of M3GNet.
    Args:
        cell: lattice vectors in angstroms.
        atomic_numbers: atomic numbers for all atoms.
        positions: xyz coordinates for all atoms in angstroms.
    Returns:
        energy:  total energy.
        forces:  atomic forces.
        stress:  stress tensor (Voigt order).
    """

    # Initialize Atoms
    global myAtoms
    global myCalculator

    if myAtoms is not None and len(myAtoms.numbers) != len(atomic_numbers):
        myAtoms = None

    if myAtoms is None:
        myAtoms = Atoms(
            numbers   = atomic_numbers,
            positions = positions,
            cell      = cell,
            pbc       = [True, True, True]
        )

        myAtoms.calc = myCalculator

    else:
        myAtoms.set_cell(cell)
        myAtoms.set_atomic_numbers(atomic_numbers)
        myAtoms.set_positions(positions)

    # Predicting energy, forces and stress
    energy = myAtoms.get_potential_energy().item()
    forces = myAtoms.get_forces().tolist()

    global m3gnetCalculator
    global dftd3Calculator

    if dftd3Calculator is None:
        stress = myAtoms.get_stress().tolist()
    else:
        # to avoid the bug of SumCalculator
        myAtoms.calc = m3gnetCalculator
        stress1 = myAtoms.get_stress()

        myAtoms.calc = dftd3Calculator
        stress2 = myAtoms.get_stress()

        stress = stress1 + stress2
        stress = stress.tolist()

        myAtoms.calc = myCalculator

    return energy, forces, stress

When running the example, I get the following errors:

pair_style m3gnet ./matgl_driver.py         # MatGL python driver script
pair_coeff * * M3Gnet-MP-2021.2.8-PES Au    # specify GNN parameter checkpoint file
PYTHONHOME:
    None
PYTHONPATH:
    None
PATH:
    /data/tools/target/ubuntu_22.04/python/3.10.13_matgl_0.9.1_cuda_11.8/bin
    /usr/local/sbin
    /usr/local/bin
    /usr/sbin
    /usr/bin
    /sbin
    /bin
    /usr/games
    /usr/local/games
    /snap/bin
sys.path:
    /data/tools/target/ubuntu_22.04/python/3.10.13_matgl_0.9.1_cuda_11.8/lib/python310.zip
    /data/tools/target/ubuntu_22.04/python/3.10.13_matgl_0.9.1_cuda_11.8/lib/python3.10
    /data/tools/target/ubuntu_22.04/python/3.10.13_matgl_0.9.1_cuda_11.8/lib/python3.10/lib-dynload
    /data/tools/target/ubuntu_22.04/python/3.10.13_matgl_0.9.1_cuda_11.8/lib/python3.10/site-packages
    .
    ./matgl_driver.py
ERROR: Cannot initialize python for pair_coeff of M3GNet. (src/ML-M3GNET/pair_m3gnet.cpp:490)
Last command: pair_coeff * * M3Gnet-MP-2021.2.8-PES Au    # specify GNN parameter checkpoint file

I've found that the error message LAMMPS prints when Python dumps a stack trace is pretty generic, but I am including this output because it shows the environment variables and search paths that were available inside Python when LAMMPS called matgl_driver.py.

NumPy appears to work when Python is run directly, but when Python is called from LAMMPS, it has trouble importing NumPy.
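To compare the two situations more concretely, one option is to check which Python shared library, if any, is actually mapped into the running process. The sketch below is Linux-only and assumes `/proc/self/maps` is readable. The helper names `libpython_paths` and `current_process_libpython` are hypothetical. It could be called from the shell interpreter and from matgl_driver.py under LAMMPS:

```python
from pathlib import Path


def libpython_paths(maps_text):
    """Return the distinct libpython paths named in /proc/<pid>/maps text."""
    paths = []
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, then an optional pathname.
        fields = line.split(None, 5)
        if len(fields) == 6 and "libpython" in fields[5]:
            path = fields[5].strip()
            if path not in paths:
                paths.append(path)
    return paths


def current_process_libpython():
    """List libpython files mapped into this process (empty if none or not Linux)."""
    maps = Path("/proc/self/maps")
    return libpython_paths(maps.read_text()) if maps.exists() else []


if __name__ == "__main__":
    found = current_process_libpython()
    print("libpython mapped into this process:")
    for path in found or ["(none found)"]:
        print("    %s" % path)
```

An empty result would suggest the interpreter is linked statically into the host executable. A path outside the custom installation would suggest that LAMMPS picked up a different Python library than the one NumPy was built against. Either reading is an assumption to verify, not a confirmed diagnosis.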

Any insights or suggestions would be appreciated.

You are using a modified version of LAMMPS, in combination with features that have not been vetted by the LAMMPS developers, so you need to contact the people that made those modifications and implemented those features. To get our attention, you would need to provide conclusive proof that this issue is caused by plain LAMMPS itself and not by the added code. In that case, it would be best to report it as a bug-report issue on GitHub at Issues · lammps/lammps · GitHub

We have a hard enough time staying on top of all the issues and portability challenges for the >1 million lines of code that the official LAMMPS distribution contains, so we are not in a position to look into issues with features we don't know about and have not been involved in. Sorry.

Understood