Thank you so much for your response, I appreciate it.
So the segfault persists when I run a single MPI process with one GPU, and also when I run the in.rhodo example (both on one or two GPUs).
The segfault looks like this:
[email protected]…5531…:~/pleuX4_pe384pg128_r2$ ./equil.sh
[picadilly:14250] *** Process received signal ***
[picadilly:14250] Signal: Segmentation fault (11)
[picadilly:14250] Signal code: Address not mapped (1)
[picadilly:14250] Failing at address: (nil)
[picadilly:14250] [ 0] [0xb778540c]
[picadilly:14250] *** End of error message ***
I cannot reproduce the issue with your input script and data (config) file on my side with the 18Apr15 version.
Can you try specifying “neigh no” in the gpu package command, which forces the neighbor list builds to be
done on the host, and see what happens (again with a single MPI process on one GPU)?
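For reference, a minimal sketch of how this might look in an input script (assuming the post-2014 package-command syntax and a single GPU; adjust the GPU count and binary name to your setup):

```
# request 1 GPU and force neighbor list builds on the host
package gpu 1 neigh no

# roughly equivalent from the command line (binary name lmp_mpi is an assumption):
#   mpirun -np 1 lmp_mpi -sf gpu -pk gpu 1 neigh no -in in.lj
```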
Normally the GPU package should work with all the given examples in stable LAMMPS versions.
If it segfaults even with in.rhodo, the problem most likely lies outside the code itself.
Can you also try running other examples with the GPU package (in.lj and in.phosphate)?
If the segfault persists for those runs as well, I suspect a hardware problem, which you can check for
with a tool like cuda_memtest, as previously discussed on this mailing list.
Sorry for the slightly late reply - I wanted to ensure I had exhausted all options.
The segfault persists with all the suggestions you made, so, as you say, it is likely a problem with the hardware I’m using. Thank you for recommending the cuda_memtest tool. It didn’t turn up any problems, so I think I’ll have to find someone local who can look at the machine more closely with me, and in the meantime wait for one of my simulations to finish on another machine.
Thank you so much for answering my questions and sharing your expertise. I really appreciate it.
With best wishes,
MRes student in Biophysics
King’s College London