Issue with installing the LAMMPS Python module and using it

I am trying to install the LAMMPS Python module and to extend Python with LAMMPS as well, by following the instructions here. The LAMMPS version is 3 Nov 2022. I would like to do a full installation with CMake, since I hope to use it in various Conda virtual environments. So I followed the instructions under the "Full install (CMake-only)" bullet instead of the "Virtual environment" bullet. The commands I used are

module load gcc-toolset/12
module load openmpi/gcc/4.1.2
module load anaconda3/2021
mkdir build
cd build
cmake -C ../cmake/presets/basic.cmake -DCMAKE_INSTALL_PREFIX=$(pwd) -DCMAKE_CXX_COMPILER=mpicxx -DPKG_REPLICA=ON -DPKG_MISC=ON -DPKG_MOLECULE=ON -DPKG_KSPACE=ON -DPKG_EXTRA-PAIR=ON -DBUILD_SHARED_LIBS=ON -DLAMMPS_EXCEPTIONS=ON -DPKG_PYTHON=ON ../cmake
(Note that I set the installation prefix to $(pwd), so everything gets installed into the current 'build/' directory.)
cmake --build .
cmake --install .

Later I would like to use it in some virtual environments. I knew the LAMMPS executable, shared library, and so on could not be found by default, so I added their paths to environment variables in my ~/.bashrc file:

export PATH=my-lammps-directory/build/bin:$PATH
export LD_LIBRARY_PATH=my-lammps-directory/build:$LD_LIBRARY_PATH (I am not sure if the path is correct)
export PYTHONPATH=my-lammps-directory/python:$PYTHONPATH (I am not sure if the path is correct)

So far, I think the installation should be complete. No error messages popped up during the installation, but I found a few things strange:

  • First, the LAMMPS Python package is installed in my-lammps-dir/build/lib/pythonX.Y/site-packages/lammps as if the system were 32-bit, but the shared library liblammps.so appears in my-lammps-dir/build/lib64/ as if the system were 64-bit. I do not know whether this conflict will cause any problems. By the way, the Linux system is 64-bit.
  • Second, there is also a shared library liblammps.so directly inside the folder my-lammps-dir/build/, which is my installation directory, and it differs from the liblammps.so inside my-lammps-dir/build/lib64/ mentioned above. I do not know which one is actually linked to the executable lmp.
  • Third, I am not sure whether my PYTHONPATH is correct, since the instruction page only says "The PYTHONPATH needs to point to the parent folder that contains the lammps package!", while there are many folders called lammps.

With these questions in mind, I moved on to actually using it, and an error message popped up:

conda activate myenv (this environment was created earlier)
python

import lammps (works)
from lammps import lammps (works)
lmp = lammps() (fails)

I actually used Slurm to submit the Python test script. The error message is relatively long, and I paste the content of the slurm-xxx.out file below, since I cannot upload files as a new user. I guess the error occurs due to some unknown issue in the installation procedure, probably related to linking libraries, adding paths, and so forth.

I will appreciate it if someone can help me solve this problem.


WARNING: There was an error initializing an OpenFabrics device.

Local host: della-r3c4n15
Local device: mlx4_0

[della-r3c4n15:633844:0:633844] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x440000e9)
==== backtrace (tid: 633844) ====
0 /lib64/libucs.so.0(ucs_handle_error+0x2dc) [0x15332bc6cedc]
1 /lib64/libucs.so.0(+0x2b0bc) [0x15332bc6d0bc]
2 /lib64/libucs.so.0(+0x2b28a) [0x15332bc6d28a]
3 /usr/local/openmpi/4.1.2/gcc/lib64/libmpi.so.40(PMPI_Comm_set_errhandler+0x43) [0x1533420be0c3]
4 {HOME}/.conda/envs/pimd/lib/python3.8/site-packages/mpi4py/MPI.cpython-38-x86_64-linux-gnu.so(+0x86b20) [0x153340cd4b20]
5 {HOME}/.conda/envs/pimd/lib/python3.8/site-packages/mpi4py/MPI.cpython-38-x86_64-linux-gnu.so(+0x2dee1) [0x153340c7bee1]
6 python(PyModule_ExecDef+0x48) [0x558dfc9ae118]
7 python(+0x2361b5) [0x558dfc9ae1b5]
8 python(+0x13b385) [0x558dfc8b3385]
9 python(PyVectorcall_Call+0x6f) [0x558dfc8d649f]
10 python(_PyEval_EvalFrameDefault+0x6152) [0x558dfc954af2]
11 python(_PyEval_EvalCodeWithName+0x260) [0x558dfc944600]
12 python(_PyFunction_Vectorcall+0x594) [0x558dfc945bc4]
13 python(_PyEval_EvalFrameDefault+0x4f83) [0x558dfc953923]
14 python(_PyFunction_Vectorcall+0x1b7) [0x558dfc9457e7]
15 python(_PyEval_EvalFrameDefault+0x4c0) [0x558dfc94ee60]
16 python(_PyFunction_Vectorcall+0x1b7) [0x558dfc9457e7]
17 python(_PyEval_EvalFrameDefault+0x71b) [0x558dfc94f0bb]
18 python(_PyFunction_Vectorcall+0x1b7) [0x558dfc9457e7]
19 python(_PyEval_EvalFrameDefault+0x71b) [0x558dfc94f0bb]
20 python(_PyFunction_Vectorcall+0x1b7) [0x558dfc9457e7]
21 python(+0x142714) [0x558dfc8ba714]
22 python(_PyObject_CallMethodIdObjArgs+0xf5) [0x558dfc8fe275]
23 python(PyImport_ImportModuleLevelObject+0x366) [0x558dfc89f8c6]
24 python(+0x1de0b8) [0x558dfc9560b8]
25 python(+0x13c00e) [0x558dfc8b400e]
26 python(_PyEval_EvalFrameDefault+0x5c58) [0x558dfc9545f8]
27 python(_PyEval_EvalCodeWithName+0x260) [0x558dfc944600]
28 python(_PyFunction_Vectorcall+0x594) [0x558dfc945bc4]
29 python(_PyEval_EvalFrameDefault+0x71b) [0x558dfc94f0bb]
30 python(_PyEval_EvalCodeWithName+0xd5f) [0x558dfc9450ff]
31 python(_PyFunction_Vectorcall+0x594) [0x558dfc945bc4]
32 python(+0x142714) [0x558dfc8ba714]
33 python(_PyObject_CallMethodIdObjArgs+0xf5) [0x558dfc8fe275]
34 python(PyImport_ImportModuleLevelObject+0x6cb) [0x558dfc89fc2b]
35 python(_PyEval_EvalFrameDefault+0x300d) [0x558dfc9519ad]
36 python(_PyEval_EvalCodeWithName+0x260) [0x558dfc944600]
37 python(_PyFunction_Vectorcall+0x534) [0x558dfc945b64]
38 python(+0x1b9698) [0x558dfc931698]
39 python(_PyObject_MakeTpCall+0x228) [0x558dfc8a8fa8]
40 python(_PyEval_EvalFrameDefault+0x4eff) [0x558dfc95389f]
41 python(_PyEval_EvalCodeWithName+0x260) [0x558dfc944600]
42 python(PyEval_EvalCode+0x23) [0x558dfc945eb3]
43 python(+0x242622) [0x558dfc9ba622]
44 python(+0x2531d2) [0x558dfc9cb1d2]
45 python(+0x25636b) [0x558dfc9ce36b]
46 python(PyRun_SimpleFileExFlags+0x1bf) [0x558dfc9ce54f]
47 python(Py_RunMain+0x3a9) [0x558dfc9cea29]
48 python(Py_BytesMain+0x39) [0x558dfc9cec29]
49 /lib64/libc.so.6(__libc_start_main+0xe5) [0x1533440dfd85]
50 python(+0x1f9ad7) [0x558dfc971ad7]

[della-r3c4n15:633844] *** Process received signal ***
[della-r3c4n15:633844] Signal: Segmentation fault (11)
[della-r3c4n15:633844] Signal code: (-6)
[della-r3c4n15:633844] Failing at address: 0x254b70009abf4
[della-r3c4n15:633844] [ 0] /lib64/libpthread.so.0(+0x12cf0)[0x15334447dcf0]
[della-r3c4n15:633844] [ 1] /usr/local/openmpi/4.1.2/gcc/lib64/libmpi.so.40(PMPI_Comm_set_errhandler+0x43)[0x1533420be0c3]
[della-r3c4n15:633844] [ 2] {HOME}/.conda/envs/pimd/lib/python3.8/site-packages/mpi4py/MPI.cpython-38-x86_64-linux-gnu.so(+0x86b20)[0x153340cd4b20]
[della-r3c4n15:633844] [ 3] {HOME}/.conda/envs/pimd/lib/python3.8/site-packages/mpi4py/MPI.cpython-38-x86_64-linux-gnu.so(+0x2dee1)[0x153340c7bee1]
[della-r3c4n15:633844] [ 4] python(PyModule_ExecDef+0x48)[0x558dfc9ae118]
[della-r3c4n15:633844] [ 5] python(+0x2361b5)[0x558dfc9ae1b5]
[della-r3c4n15:633844] [ 6] python(+0x13b385)[0x558dfc8b3385]
[della-r3c4n15:633844] [ 7] python(PyVectorcall_Call+0x6f)[0x558dfc8d649f]
[della-r3c4n15:633844] [ 8] python(_PyEval_EvalFrameDefault+0x6152)[0x558dfc954af2]
[della-r3c4n15:633844] [ 9] python(_PyEval_EvalCodeWithName+0x260)[0x558dfc944600]
[della-r3c4n15:633844] [10] python(_PyFunction_Vectorcall+0x594)[0x558dfc945bc4]
[della-r3c4n15:633844] [11] python(_PyEval_EvalFrameDefault+0x4f83)[0x558dfc953923]
[della-r3c4n15:633844] [12] python(_PyFunction_Vectorcall+0x1b7)[0x558dfc9457e7]
[della-r3c4n15:633844] [13] python(_PyEval_EvalFrameDefault+0x4c0)[0x558dfc94ee60]
[della-r3c4n15:633844] [14] python(_PyFunction_Vectorcall+0x1b7)[0x558dfc9457e7]
[della-r3c4n15:633844] [15] python(_PyEval_EvalFrameDefault+0x71b)[0x558dfc94f0bb]
[della-r3c4n15:633844] [16] python(_PyFunction_Vectorcall+0x1b7)[0x558dfc9457e7]
[della-r3c4n15:633844] [17] python(_PyEval_EvalFrameDefault+0x71b)[0x558dfc94f0bb]
[della-r3c4n15:633844] [18] python(_PyFunction_Vectorcall+0x1b7)[0x558dfc9457e7]
[della-r3c4n15:633844] [19] python(+0x142714)[0x558dfc8ba714]
[della-r3c4n15:633844] [20] python(_PyObject_CallMethodIdObjArgs+0xf5)[0x558dfc8fe275]
[della-r3c4n15:633844] [21] python(PyImport_ImportModuleLevelObject+0x366)[0x558dfc89f8c6]
[della-r3c4n15:633844] [22] python(+0x1de0b8)[0x558dfc9560b8]
[della-r3c4n15:633844] [23] python(+0x13c00e)[0x558dfc8b400e]
[della-r3c4n15:633844] [24] python(_PyEval_EvalFrameDefault+0x5c58)[0x558dfc9545f8]
[della-r3c4n15:633844] [25] python(_PyEval_EvalCodeWithName+0x260)[0x558dfc944600]
[della-r3c4n15:633844] [26] python(_PyFunction_Vectorcall+0x594)[0x558dfc945bc4]
[della-r3c4n15:633844] [27] python(_PyEval_EvalFrameDefault+0x71b)[0x558dfc94f0bb]
[della-r3c4n15:633844] [28] python(_PyEval_EvalCodeWithName+0xd5f)[0x558dfc9450ff]
[della-r3c4n15:633844] [29] python(_PyFunction_Vectorcall+0x594)[0x558dfc945bc4]
[della-r3c4n15:633844] *** End of error message ***
/var/spool/slurmd/job45544893/slurm_script: line 24: 633844 Segmentation fault (core dumped) python mpi4py_lmp.py > output

Please note that the latest feature release of LAMMPS is version 8 Feb 2023.

This looks very wrong.

From your description it seems that the "full install" is not the best choice. That would be the case if your target folder were something like /usr/local or ${HOME}/.local or similar. The corresponding "bin" and "lib" or "lib64" folders would then need to be referenced in ${PATH} and ${LD_LIBRARY_PATH} (or ${DYLD_LIBRARY_PATH} on macOS), respectively.

Instead, what would serve you better is to not run cmake --install ., but rather cmake --build . --target install-python. That does two things:

  1. build a so-called "wheel" file. For your version it should be named lammps-2022.11.3-cp310-cp310-linux_x86_64.whl (with Python 3.10)
  2. install this "wheel" into your global Python environment with python -m pip install lammps-2022.11.3-cp310-cp310-linux_x86_64.whl

You don't really need the second part and can easily undo it with python -m pip uninstall lammps to avoid confusion over which LAMMPS Python module is used.
You want to keep the wheel file, as it has your LAMMPS shared library and Python module bundled, ready for installation with pip.

Then, whenever you set up a virtual environment, you can add the LAMMPS Python module to it with python -m pip install lammps-2022.11.3-cp310-cp310-linux_x86_64.whl
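
For example, the whole workflow can look something like this (the exact wheel file name depends on your Python version, and "myenv" is just a placeholder):

python -m venv myenv                 # or: conda create -n myenv python=3.10
source myenv/bin/activate            # or: conda activate myenv
python -m pip install lammps-2022.11.3-cp310-cp310-linux_x86_64.whl
python -c "from lammps import lammps; lmp = lammps(); print(lmp.version())"   # quick smoke test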

This (lib/pythonX.Y/site-packages) is what Python is looking for on most machines (macOS and Windows are different). The lib vs. lib64 split is a Linux x86 specialty and not relevant for Python; it assumes that you have only one viable platform locally.

This is a side effect of the "strange" choice of making your build folder the install folder. The two shared library files are essentially the same. The one in "build" is the original, and the one in build/lib64 is the "installed" version that has been modified to remove the hard-coded "rpath" entry for the build folder.

The lammps package in this context refers to the LAMMPS Python module, and thus the folder to add would be the site-packages folder.

Your questions indicate that you are conflating two ways of using LAMMPS: a full installation versus using LAMMPS and its Python module directly from the build folder. In the latter case, no cmake --install . or make install is needed or used; instead, ${PATH} and ${LD_LIBRARY_PATH} are augmented to point directly to the build folder, and PYTHONPATH needs to point to the python folder of the LAMMPS source code. This is why building the wheel and installing it into the individual virtual environment is preferred, since that does not require any updates of environment variables for using the LAMMPS Python module.

Your guess is wrong. However strange your LAMMPS configuration and installation procedure may be, it could not lead to this kind of segmentation fault.

If you look at the paths, you can see that this originates from issues with your mpi4py installation, which has problems initializing your OpenMPI environment. There is no mention of LAMMPS anywhere in the stack trace, so the crash happens before any LAMMPS code is executed. LAMMPS loads mpi4py, if available, so that you can run on a sub-communicator, if desired.
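
For illustration, a minimal sketch of what such sub-communicator use can look like (assuming mpi4py itself works and the LAMMPS Python module is installed):

from mpi4py import MPI
from lammps import lammps

# split the world communicator into two halves and run an
# independent LAMMPS instance on each half
color = MPI.COMM_WORLD.rank % 2
split = MPI.COMM_WORLD.Split(color, key=MPI.COMM_WORLD.rank)
lmp = lammps(comm=split)
print("LAMMPS version", lmp.version(), "on sub-communicator", color)
lmp.close()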

I suggest you run some simple tests to verify that mpi4py on its own is working correctly.

Hi Akohlmey,

Thanks for your comprehensive response! It clarifies many things and helps me understand LAMMPS and its related modules much more deeply.

As you said, what I did was conflating using LAMMPS from a full installation with using its Python module, because I would like to have both the ability to run lmp from the shell and to run it from Python. Perhaps there is a more elegant way of compiling and installing it, but I did not find one, lol.

Your suggestion says I could make a wheel file so that I can flexibly install it into any virtual environment later. That sounds very appealing, but I am concerned that the "install this 'wheel' into the global Python environment" part would require administrator rights on the cluster, where I guess the "global Python environment" means the (base) Conda environment, whose contents we students cannot modify.

Your answers to my three strange questions are very clear. Now I know how to set those environment variables if needed. Thank you very much!

Last but not least, it was unexpected to me that the main problem originates from the mpi4py module. I will move on to solving it right now. By the way, after some brief tests I found that, even with my strange installation of LAMMPS and its Python module, things went well when I ran LAMMPS from a Python script in the well-established (base) environment, but it failed when I tried in the myenv environment, where the mpi4py module may not be installed properly. So I think this confirms what you said.

In a nutshell, if my understanding is correct, given my goal of having both a fully installed lmp executable and the Python module enabled, would it be better to

  • first compile and install LAMMPS in shared mode as usual to complete a full installation;
  • then use the command cmake --build . --target install-python to create a wheel file;
  • add the build folder to ${PATH} and ${LD_LIBRARY_PATH}, and let ${PYTHONPATH} point to the python folder of the LAMMPS source code;
  • pip install {wheel file} into the virtual environment I plan to use.

Please correct me if there are any mistakes in above recipe.

Thank you again for you kindness and patience!

Best,

The default Python installation knows about two "system" locations for the site-packages folder. One is indeed the system location that you don't have access to on multi-user systems, but the other is in ${HOME}/.local/lib/pythonX.Y/site-packages, and the latter is quite usable. If your pip module is sufficiently recent, it will detect this and automatically install the wheel into your (global) user folder when the system location is inaccessible. You can enforce the user folder by adding the flag --user to the pip command. The install-python target of the LAMMPS build system goes one step further: it will auto-detect an installation failure on the first attempt (into the system folder) and fall back to an explicit installation into the user folder.
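
For example, to force the user-level installation (using the wheel file name from above as a placeholder):

python -m pip install --user lammps-2022.11.3-cp310-cp310-linux_x86_64.whl
python -m pip uninstall lammps      # undo it later, if desired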

Whenever you create and activate a virtual environment, it is by construction writable and the pip command using the wheel will automatically install the LAMMPS module and shared library there.

I recommend against a full installation with a shared library this way. It can lead to confusion about which liblammps.so shared library is loaded when. It would be OK if the global installation were all you would be using, but once you add virtual environments to the mix, things can get very confusing and messy. If you build the LAMMPS executable for a global installation using a shared library, you also must set LD_LIBRARY_PATH to the location of the installation, and you must (manually) build a wheel of the LAMMPS Python module that does not contain the shared library.

What I would recommend is the following (a command sketch follows the list):

  • configure two build folders, build-static and build-shared, with identical configurations, except that only the second uses -D BUILD_SHARED_LIBS=on.
  • only compile (do not run an installation) in both folders. The lmp executable in the build-static folder can then be copied anywhere you need it (and where it is in the PATH) so that you can run calculations. Because of the static library, you don't need to set LD_LIBRARY_PATH.
  • create the wheel file in the build-shared folder (and either leave the global installation in ${HOME}/.local or uninstall it, as you like) and then install it with pip into the virtual environments where you want to use it.
    This way you don't need to change any environment paths, and there is no accidental collision with older virtual environments that may be using an older LAMMPS module and library because LD_LIBRARY_PATH points to a different version of the LAMMPS library.
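
A minimal command sketch of this recipe (assuming the LAMMPS source tree is in ~/lammps and reusing the basic preset from above; adjust paths and package flags to your setup):

cd ~/lammps
mkdir build-static build-shared
cd build-static
cmake -C ../cmake/presets/basic.cmake -DCMAKE_CXX_COMPILER=mpicxx ../cmake
cmake --build .
cp lmp ~/bin/                              # copy the static executable to a folder that is in your PATH
cd ../build-shared
cmake -C ../cmake/presets/basic.cmake -DCMAKE_CXX_COMPILER=mpicxx -DBUILD_SHARED_LIBS=ON ../cmake
cmake --build .
cmake --build . --target install-python   # builds the wheel (and installs it into the user site-packages)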

If you absolutely want to do a full installation, I suggest configuring with -D CMAKE_INSTALL_PREFIX=${HOME}/.local, since the ${HOME}/.local tree has become the de facto standard convention for installing software packages into a user's tree (as opposed to /usr/local, the convention and default for a system-wide location). LAMMPS should use this setting by default if you do not set CMAKE_INSTALL_PREFIX on your CMake command line.
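
For example (a sketch; add your usual package flags):

cmake -C ../cmake/presets/basic.cmake -D CMAKE_INSTALL_PREFIX=${HOME}/.local -D BUILD_SHARED_LIBS=on ../cmake
cmake --build .
cmake --install .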

I got it. Since the executable is already created after compilation, I actually do not need a full installation including cmake --install .. Instead, I will add the build-static folder itself to $PATH so that I can run build-static/lmp anywhere.

Regarding the build-shared folder, if my understanding is correct, I do not need to set -DPKG_PYTHON=ON in the configuration step either; this is supposed to be handled by cmake --build . --target install-python. In addition, since I will not run build-shared/lmp explicitly later, it does not matter whether this lmp is in $PATH. On the contrary, all I need is to make sure the wheel file is successfully installed in the virtual environment that will be used later. Please correct me if I have misunderstood anything.

Best,

I consider that a bad idea. You may need to do additional updates and compilations or make changes to the configuration. Including the build folder in your PATH means the executable changes automatically, with many unexpected consequences. If you want to use an executable globally, it should be a copy. Otherwise (this is what I usually do, because I do not run production calculations, only tests and debugging), I would not add the folder to PATH at all and run in the build folder using ./lmp.

The LAMMPS Python module and the PYTHON package are two very different things. The Python module can always be installed, as long as LAMMPS is compiled as a shared library; it allows you to call LAMMPS from Python. The PYTHON package allows LAMMPS to call out to Python. It can be used to define python-style variables, implement pair styles in Python, or have fixes that regularly execute Python code. Of course, you can also do both at the same time if you configure for building a shared library and include the PYTHON package, but neither depends on the other.

Thanks for informing me that the LAMMPS Python module and the PYTHON package are different; I had misunderstood them previously. Regarding $PATH, I will consider it more carefully.

Thanks for your help, it is very nice of you!

Best,

Hi Akohlmey,

Thanks for your suggestions yesterday. They are very helpful.

After following the in-a-nutshell instructions you gave me, I have handled most of the troubles, but there is still one left, which I suppose is related to the compatibility of LAMMPS and mpi4py. Briefly speaking, when my Conda virtual environment does not include the mpi4py module, everything compiles and installs smoothly, and running LAMMPS from Python works well; however, when my virtual environment includes mpi4py, compilation and installation still go well, but running LAMMPS from Python fails, and I have not been able to figure out the exact reason. Here is what I did:

module load gcc-toolset/12
module load openmpi/gcc/4.1.2
module load anaconda3/2022.10
(base) mkdir build-shared
(base) cd build-shared
(base) cmake -C ../cmake/presets/basic.cmake -DCMAKE_CXX_COMPILER=mpicxx -DPKG_REPLICA=ON -DPKG_MISC=ON -DPKG_MOLECULE=ON -DPKG_KSPACE=ON -DPKG_EXTRA-PAIR=ON -DBUILD_SHARED_LIBS=ON -DLAMMPS_EXCEPTIONS=ON -DPKG_PYTHON=ON ../cmake
(base) cmake --build .
(base) cmake --build . --target install-python

So far I have done what you suggested, and successfully produced the wheel file lammps-2022.11.3-cp39-cp39-linux_x86_64.whl (for Python 3.9) as planned. Then I activate the virtual environment

(base) conda activate myenv
(myenv) pip install lammps-2022.11.3-cp39-cp39-linux_x86_64.whl

To test whether the lammps module has been added to myenv, I use the Python statement

from lammps import lammps

Whether or not mpi4py is installed in myenv, things go well up to this point. But a difference arises when I try to actually use it, namely, when creating a lammps instance in a Python script:

lmp = lammps.lammps()

When there is no mpi4py installed in myenv, things go well; but if mpi4py exists in myenv, it generates the errors attached below. As you said, it seems not to be an issue with installing lammps into Python, but rather some conflict with mpi4py itself at run time. What bothers me is that I installed the mpi4py module into the virtual environment via

(myenv) conda install -c conda-forge mpi4py

which should be a standard way to install the mpi4py module. And I did not find anything wrong when merely using this module without lammps, so I think the installation of mpi4py is correct. I guess the error arises from some unknown conflict between the mpi4py and lammps modules. But I could not figure it out due to my limited knowledge of how lammps works with mpi4py, and it is strange to me that, even though I did not import mpi4py but only imported lammps, creating the instance via lmp = lammps.lammps() still failed, where I thought mpi4py should not matter or trigger any error at all.

In your advice yesterday, you briefly mentioned that "LAMMPS is loading mpi4py" somehow. I do not know whether this is still the case when I do not import mpi4py in the Python prompt or script. Could you please explain the connection between the mpi4py and lammps modules in a little more detail? Thank you!

FYI, my LAMMPS version is 3 Nov 2022, the Python version for both the (base) and (myenv) environments is Python 3.9.13, and mpi4py=3.1.4.

FYI, the error message for the python code

import lammps
print("Start create lmp")
lmp = lammps.lammps()
print("LAMMPS Version: ", lmp.version())
lmp.close()

is


WARNING: There was an error initializing an OpenFabrics device.

Local host: della-r4c4n8
Local device: mlx4_0

[della-r4c4n8:1481728:0:1481728] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x440000e9)
==== backtrace (tid:1481728) ====
0 /lib64/libucs.so.0(ucs_handle_error+0x2dc) [0x14c0c2978edc]
1 /lib64/libucs.so.0(+0x2b0bc) [0x14c0c29790bc]
2 /lib64/libucs.so.0(+0x2b28a) [0x14c0c297928a]
3 /usr/local/openmpi/4.1.2/gcc/lib64/libmpi.so.40(PMPI_Comm_set_errhandler+0x43) [0x14c0d8b170c3]
4 /home/kehanc/.conda/envs/pylmp/lib/python3.9/site-packages/mpi4py/MPI.cpython-39-x86_64-linux-gnu.so(+0x750e0) [0x14c0d76780e0]
5 /home/kehanc/.conda/envs/pylmp/lib/python3.9/site-packages/mpi4py/MPI.cpython-39-x86_64-linux-gnu.so(+0x47bc2) [0x14c0d764abc2]
6 /home/kehanc/.conda/envs/pylmp/bin/python(PyModule_ExecDef+0x70) [0x5977f0]
7 /home/kehanc/.conda/envs/pylmp/bin/python() [0x597760]
8 /home/kehanc/.conda/envs/pylmp/bin/python() [0x4f7dab]
9 /home/kehanc/.conda/envs/pylmp/bin/python(_PyEval_EvalFrameDefault+0x5ae9) [0x4edc89]
10 /home/kehanc/.conda/envs/pylmp/bin/python() [0x4e729a]
11 /home/kehanc/.conda/envs/pylmp/bin/python(_PyFunction_Vectorcall+0xd5) [0x4f8645]
12 /home/kehanc/.conda/envs/pylmp/bin/python(_PyEval_EvalFrameDefault+0x4d1f) [0x4ecebf]
13 /home/kehanc/.conda/envs/pylmp/bin/python() [0x4f8923]
14 /home/kehanc/.conda/envs/pylmp/bin/python(_PyEval_EvalFrameDefault+0x68b) [0x4e882b]
15 /home/kehanc/.conda/envs/pylmp/bin/python() [0x4f8923]
16 /home/kehanc/.conda/envs/pylmp/bin/python(_PyEval_EvalFrameDefault+0x3d1) [0x4e8571]
17 /home/kehanc/.conda/envs/pylmp/bin/python() [0x4f8923]
18 /home/kehanc/.conda/envs/pylmp/bin/python(_PyEval_EvalFrameDefault+0x3d1) [0x4e8571]
19 /home/kehanc/.conda/envs/pylmp/bin/python() [0x4f8923]
20 /home/kehanc/.conda/envs/pylmp/bin/python() [0x4f7f3a]
21 /home/kehanc/.conda/envs/pylmp/bin/python(_PyObject_CallMethodIdObjArgs+0x131) [0x507ef1]
22 /home/kehanc/.conda/envs/pylmp/bin/python(PyImport_ImportModuleLevelObject+0x4d1) [0x507471]
23 /home/kehanc/.conda/envs/pylmp/bin/python() [0x513324]
24 /home/kehanc/.conda/envs/pylmp/bin/python() [0x507f97]
25 /home/kehanc/.conda/envs/pylmp/bin/python(PyObject_Call+0x158) [0x506498]
26 /home/kehanc/.conda/envs/pylmp/bin/python(_PyEval_EvalFrameDefault+0x5ae9) [0x4edc89]
27 /home/kehanc/.conda/envs/pylmp/bin/python() [0x4e729a]
28 /home/kehanc/.conda/envs/pylmp/bin/python(_PyFunction_Vectorcall+0xd5) [0x4f8645]
29 /home/kehanc/.conda/envs/pylmp/bin/python(_PyEval_EvalFrameDefault+0x3d1) [0x4e8571]
30 /home/kehanc/.conda/envs/pylmp/bin/python() [0x4e729a]
31 /home/kehanc/.conda/envs/pylmp/bin/python(_PyFunction_Vectorcall+0xd5) [0x4f8645]
32 /home/kehanc/.conda/envs/pylmp/bin/python() [0x4f7f3a]
33 /home/kehanc/.conda/envs/pylmp/bin/python(_PyObject_CallMethodIdObjArgs+0x131) [0x507ef1]
34 /home/kehanc/.conda/envs/pylmp/bin/python(PyImport_ImportModuleLevelObject+0x94f) [0x5078ef]
35 /home/kehanc/.conda/envs/pylmp/bin/python(_PyEval_EvalFrameDefault+0x4162) [0x4ec302]
36 /home/kehanc/.conda/envs/pylmp/bin/python() [0x4e729a]
37 /home/kehanc/.conda/envs/pylmp/bin/python(_PyObject_FastCallDictTstate+0x13e) [0x4f078e]
38 /home/kehanc/.conda/envs/pylmp/bin/python() [0x50304f]
39 /home/kehanc/.conda/envs/pylmp/bin/python(_PyObject_MakeTpCall+0x303) [0x4f0f33]
40 /home/kehanc/.conda/envs/pylmp/bin/python(_PyEval_EvalFrameDefault+0x5285) [0x4ed425]
41 /home/kehanc/.conda/envs/pylmp/bin/python() [0x4e729a]
42 /home/kehanc/.conda/envs/pylmp/bin/python(_PyEval_EvalCodeWithName+0x47) [0x4e6f27]
43 /home/kehanc/.conda/envs/pylmp/bin/python(PyEval_EvalCodeEx+0x39) [0x4e6ed9]
44 /home/kehanc/.conda/envs/pylmp/bin/python(PyEval_EvalCode+0x1b) [0x594f7b]
45 /home/kehanc/.conda/envs/pylmp/bin/python() [0x5c2457]
46 /home/kehanc/.conda/envs/pylmp/bin/python() [0x5be3b0]
47 /home/kehanc/.conda/envs/pylmp/bin/python() [0x456265]
48 /home/kehanc/.conda/envs/pylmp/bin/python(PyRun_SimpleFileExFlags+0x1a2) [0x5b81a2]
49 /home/kehanc/.conda/envs/pylmp/bin/python(Py_RunMain+0x37e) [0x5b570e]
50 /home/kehanc/.conda/envs/pylmp/bin/python(Py_BytesMain+0x39) [0x5890f9]
51 /lib64/libc.so.6(__libc_start_main+0xe5) [0x14c0da5d7d85]
52 /home/kehanc/.conda/envs/pylmp/bin/python() [0x588fae]

[della-r4c4n8:1481728] *** Process received signal ***
[della-r4c4n8:1481728] Signal: Segmentation fault (11)
[della-r4c4n8:1481728] Signal code: (-6)
[della-r4c4n8:1481728] Failing at address: 0x254b700169c00
[della-r4c4n8:1481728] [ 0] /lib64/libpthread.so.0(+0x12cf0)[0x14c0db0ffcf0]
[della-r4c4n8:1481728] [ 1] /usr/local/openmpi/4.1.2/gcc/lib64/libmpi.so.40(PMPI_Comm_set_errhandler+0x43)[0x14c0d8b170c3]
[della-r4c4n8:1481728] [ 2] /home/kehanc/.conda/envs/pylmp/lib/python3.9/site-packages/mpi4py/MPI.cpython-39-x86_64-linux-gnu.so(+0x750e0)[0x14c0d76780e0]
[della-r4c4n8:1481728] [ 3] /home/kehanc/.conda/envs/pylmp/lib/python3.9/site-packages/mpi4py/MPI.cpython-39-x86_64-linux-gnu.so(+0x47bc2)[0x14c0d764abc2]
[della-r4c4n8:1481728] [ 4] /home/kehanc/.conda/envs/pylmp/bin/python(PyModule_ExecDef+0x70)[0x5977f0]
[della-r4c4n8:1481728] [ 5] /home/kehanc/.conda/envs/pylmp/bin/python[0x597760]
[della-r4c4n8:1481728] [ 6] /home/kehanc/.conda/envs/pylmp/bin/python[0x4f7dab]
[della-r4c4n8:1481728] [ 7] /home/kehanc/.conda/envs/pylmp/bin/python(_PyEval_EvalFrameDefault+0x5ae9)[0x4edc89]
[della-r4c4n8:1481728] [ 8] /home/kehanc/.conda/envs/pylmp/bin/python[0x4e729a]
[della-r4c4n8:1481728] [ 9] /home/kehanc/.conda/envs/pylmp/bin/python(_PyFunction_Vectorcall+0xd5)[0x4f8645]
[della-r4c4n8:1481728] [10] /home/kehanc/.conda/envs/pylmp/bin/python(_PyEval_EvalFrameDefault+0x4d1f)[0x4ecebf]
[della-r4c4n8:1481728] [11] /home/kehanc/.conda/envs/pylmp/bin/python[0x4f8923]
[della-r4c4n8:1481728] [12] /home/kehanc/.conda/envs/pylmp/bin/python(_PyEval_EvalFrameDefault+0x68b)[0x4e882b]
[della-r4c4n8:1481728] [13] /home/kehanc/.conda/envs/pylmp/bin/python[0x4f8923]
[della-r4c4n8:1481728] [14] /home/kehanc/.conda/envs/pylmp/bin/python(_PyEval_EvalFrameDefault+0x3d1)[0x4e8571]
[della-r4c4n8:1481728] [15] /home/kehanc/.conda/envs/pylmp/bin/python[0x4f8923]
[della-r4c4n8:1481728] [16] /home/kehanc/.conda/envs/pylmp/bin/python(_PyEval_EvalFrameDefault+0x3d1)[0x4e8571]
[della-r4c4n8:1481728] [17] /home/kehanc/.conda/envs/pylmp/bin/python[0x4f8923]
[della-r4c4n8:1481728] [18] /home/kehanc/.conda/envs/pylmp/bin/python[0x4f7f3a]
[della-r4c4n8:1481728] [19] /home/kehanc/.conda/envs/pylmp/bin/python(_PyObject_CallMethodIdObjArgs+0x131)[0x507ef1]
[della-r4c4n8:1481728] [20] /home/kehanc/.conda/envs/pylmp/bin/python(PyImport_ImportModuleLevelObject+0x4d1)[0x507471]
[della-r4c4n8:1481728] [21] /home/kehanc/.conda/envs/pylmp/bin/python[0x513324]
[della-r4c4n8:1481728] [22] /home/kehanc/.conda/envs/pylmp/bin/python[0x507f97]
[della-r4c4n8:1481728] [23] /home/kehanc/.conda/envs/pylmp/bin/python(PyObject_Call+0x158)[0x506498]
[della-r4c4n8:1481728] [24] /home/kehanc/.conda/envs/pylmp/bin/python(_PyEval_EvalFrameDefault+0x5ae9)[0x4edc89]
[della-r4c4n8:1481728] [25] /home/kehanc/.conda/envs/pylmp/bin/python[0x4e729a]
[della-r4c4n8:1481728] [26] /home/kehanc/.conda/envs/pylmp/bin/python(_PyFunction_Vectorcall+0xd5)[0x4f8645]
[della-r4c4n8:1481728] [27] /home/kehanc/.conda/envs/pylmp/bin/python(_PyEval_EvalFrameDefault+0x3d1)[0x4e8571]
[della-r4c4n8:1481728] [28] /home/kehanc/.conda/envs/pylmp/bin/python[0x4e729a]
[della-r4c4n8:1481728] [29] /home/kehanc/.conda/envs/pylmp/bin/python(_PyFunction_Vectorcall+0xd5)[0x4f8645]
[della-r4c4n8:1481728] *** End of error message ***
srun: error: della-r4c4n8: task 0: Segmentation fault (core dumped)

I disagree with this assessment. The error message very clearly indicates that the problem happens before anything LAMMPS-related is executed. The error message is actually not from mpi4py itself but from the OpenMPI library.

Please try to run the following python program in parallel from within your submit script.

from mpi4py import MPI

me = MPI.COMM_WORLD.Get_rank()
np = MPI.COMM_WORLD.Get_size()

print("Hello, World from MPI rank %d of %d" % ( me, np))

It should print lines like the following:

Hello, World from MPI rank 0 of 4
Hello, World from MPI rank 3 of 4
Hello, World from MPI rank 1 of 4
Hello, World from MPI rank 2 of 4

Yep, it indeed produces the result

Hello, World from MPI rank 0 of 4
Hello, World from MPI rank 1 of 4
Hello, World from MPI rank 2 of 4
Hello, World from MPI rank 3 of 4

Do you mean that the OpenMPI library itself on our school's cluster behaves unexpectedly?

Yes, but is this when you submit to the batch system with the Conda environment loaded?

Yes, the Slurm script I used is

#!/bin/bash
#SBATCH --job-name=mpi4py # create a short name for your job
#SBATCH --nodes=1 # node count
#SBATCH --ntasks=4 # total number of tasks across all nodes
#SBATCH --cpus-per-task=1 # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G # memory per cpu-core (4G per cpu-core is default)
#SBATCH --time=00:10:00 # total run time limit (HH:MM:SS)

module purge
module load anaconda3/2022.10
conda activate pylmp
module load gcc-toolset/12
module load openmpi/gcc/4.1.2

srun --mpi=pmi2 -n 4 python test.py

Hi Kehan,

It sounds like you are trying to run on Della, which (from public information) has nodes with several different architectures. I hope you are going over your problems with your cluster administrator (or someone in your group who is experienced with HPC), because it is very tricky to use self-installed programs in such contexts, and doubly tricky to use conda.

In particular, it could be that running your installation procedure on a login node produces packages and executables that only work on that node's architecture. For example, you may end up with something that works on Intel CPUs but cannot run when Slurm puts your job on an AMD node.

Your error message mentions node r4c4n8, which is one of the older nodes in your cluster (Intel Broadwell). Try excluding the Broadwell nodes using this SLURM flag in your submit script and see what happens:

#SBATCH --exclude=della-r4c[1-4]n[1-16]

I reiterate that this sounds like something you should pursue with your local system administrators instead of relying exclusively on remote help, since your sysadmins know all kinds of things about your cluster that we don't.


Thanks, srtee!

I will consult the IT staff at our school in more detail. Actually, I have discussed the issue of installing mpi4py with them before, but they did not seem very clear about the potential mistakes that may occur. Our school recommends installing mpi4py in virtual environments via pip instead of conda, which indeed triggered some errors and bothered me.

Regarding the machine architectures, the error I posted above arises across nodes, but I think it is still worth taking architectures into account.

Thank you guys for comprehensive help again!

Best,

This sounds like good advice. Why not do that? The last time someone on my university’s cluster tried to conda something, it took a week of sustained sysadmin back-and-forth to unscramble their dotfiles.

My own anecdotal experience is that conda works best on machines I own and administer, for data analysis; on clusters, it's better to either dig down into pip or go up to full containerization using Singularity / Apptainer.

I can confirm that there are multiple problems with using conda on HPC clusters. Frankly, it is a mystery to me why it is so popular, since it often creates a mess, especially when installing binaries. The maintainers of conda packages seem to be rather "casual" about compatibility.

We have users on our HPC clusters who use conda to install software, and that has repeatedly caused complex and hard-to-debug problems (which were always first blamed on us and then turned out to be conda packaging issues). Not to mention that it often resulted in binaries that would produce "illegal instruction" or other errors during initialization, which seemed to be mostly ignored by their users.

I have not had, nor seen others have, similar problems with pip. Its packaging "rules" seem to be much more stringent and aimed at portability and consistency.


From Jake Vanderplas’s blog (2016):

If you have an existing system Python installation and you want to install packages in or on it, use pip+virtualenv.
…
If you want to flexibly manage a multi-language software stack and don’t mind using an isolated environment, use conda.

Assuming he’s (still) right, it makes a lot of sense that (1) lots of people are very happy with conda on their own boxes where they manage their own simple environment and (2) conda should absolutely be avoided on clusters where the whole point is to let the cluster admins manage the very complex environment. I suppose the snake metaphor is appropriate: conda works well as long as you’re happy for it to swallow your box whole …

Lol, it is very interesting to learn that experts like you prefer pip over conda. I cannot say why conda is so popular; at least for me, I was taught to use conda instead of pip, perhaps because of its "simplicity". In most cases, packages available on conda channels can be readily installed via a simple conda install xxx, and I do not need to worry about dependencies or keeping track of packages. Of course, we may face problems reported by conda, like version conflicts, which are annoying. The one life lesson I have learned about conda and pip is that one should carry out all the needed conda installs before using pip. This keeps me from confronting so many conflict errors, but it also forces me to create new virtual environments quite frequently in order to avoid running conda install after pip install.

I was happy to do so and followed the instructions on our school's website, until I found that pip install mpi4py would fail with python=3.9 in a virtual environment, at least on our school's cluster. The instructions on the website worked for Python 3.8 but not for Python 3.9, which took me some time to figure out. On the contrary, conda install mpi4py succeeded easily. Of course, this is a separate issue, unrelated to the LAMMPS software.