Dear Axel,
Thanks for your reply. The mistakes from my previous emails were indeed resolved by appropriate changes to the input script. This problem, however, seems to require something beyond that (although I might be wrong): I don’t see why the input script works with the default Hertzian potential but not with a custom-written contact law, especially since that same contact law runs fine for 600 granules but not for the current simulation of 2720 granules. I took your advice and tried debugging with valgrind. I am not well versed in it, but I gave it a shot based on the documentation you linked, and this is what I get for the MPI version:
[akesnoff2@ssm-serv-03 new]$ valgrind mpirun -np 4 ./lmp_mpi -in jamming_3_cont3_ep_ep.spheres1
==233344== Memcheck, a memory error detector
==233344== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==233344== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==233344== Command: /home/akesnof2/LAMMPS/bin/mpirun -np 4 ./lmp_mpi -in jamming_3_cont3_ep_ep.spheres1
==233344==
LAMMPS (16 Feb 2016)
Reading data file …
triclinic box = (-1e+07 -1e+07 -1e+06) to (1e+07 1e+07 1e+06) with tilt (0 0 0)
2 by 2 by 1 MPI processor grid
reading atoms …
2720 atoms
reading velocities …
2720 velocities
Changing box …
triclinic box = (-1e+07 -1e+07 -1e+06) to (1e+07 1e+07 1e+06) with tilt (0 0 0)
2716 atoms in group granules
4 atoms in group walls
1 atoms in group tw
1 atoms in group bw
1 atoms in group lw
1 atoms in group rw
2 atoms in group stationary
2718 atoms in group non_stationary
1109 atoms in group atoms
137 atoms in group atoms2
Neighbor list info …
2 neighbor list requests
update every 1 steps, delay 100000 steps, check yes
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 200000
ghost atom cutoff = 200000
binsize = 100000, bins = 200 200 20
Setting up Verlet run …
Unit style : si
Current step: 0
Time step : 2e-07
rank 1 in job 224 ssm-serv-03.cluster.edu_33462 caused collective abort of all ranks
exit status of rank 1: killed by signal 11
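One thing I am not sure about: I wrapped valgrind around mpirun above, but from the Valgrind documentation it looks like one would normally launch one valgrind per rank, so that the tool actually sees the lmp_mpi processes, with something like this (the %p in the log file name should expand to each rank’s process ID, if I read the manual correctly):
mpirun -np 4 valgrind --log-file=valgrind.%p.log ./lmp_mpi -in jamming_3_cont3_ep_ep.spheres1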
For the serially compiled version, I get this:
[akesnoff2@ssm-serv-03 new]$ valgrind ./lmp_serial -in jamming_3_cont3_ep_ep.spheres1
==239204== Memcheck, a memory error detector
==239204== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==239204== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==239204== Command: ./lmp_serial -in jamming_3_cont3_ep_ep.spheres1
==239204==
LAMMPS (16 Feb 2016)
Reading data file …
triclinic box = (-1e+07 -1e+07 -1e+06) to (1e+07 1e+07 1e+06) with tilt (0 0 0)
1 by 1 by 1 MPI processor grid
reading atoms …
2720 atoms
reading velocities …
2720 velocities
Changing box …
triclinic box = (-1e+07 -1e+07 -1e+06) to (1e+07 1e+07 1e+06) with tilt (0 0 0)
2716 atoms in group granules
4 atoms in group walls
1 atoms in group tw
1 atoms in group bw
1 atoms in group lw
1 atoms in group rw
2 atoms in group stationary
2718 atoms in group non_stationary
1109 atoms in group atoms
137 atoms in group atoms2
Neighbor list info …
2 neighbor list requests
update every 1 steps, delay 100000 steps, check yes
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 200000
ghost atom cutoff = 200000
binsize = 100000, bins = 200 200 20
Setting up Verlet run …
Unit style : si
Current step: 0
Time step : 2e-07
==239204== Invalid write of size 8
==239204== at 0x5F8A8A: LAMMPS_NS::Neighbor::granular_bin_no_newton(LAMMPS_NS::NeighList*) (neigh_gran.cpp:396)
==239204== by 0x6490B3: LAMMPS_NS::Neighbor::build(int) (neighbor.cpp:1598)
==239204== by 0x4E2881: LAMMPS_NS::Verlet::setup() (verlet.cpp:117)
==239204== by 0x6A411F: LAMMPS_NS::Run::command(int, char**) (run.cpp:170)
==239204== by 0x477E15: void LAMMPS_NS::Input::command_creator<LAMMPS_NS::Run>(LAMMPS_NS::LAMMPS*, int, char**) (input.cpp:723)
==239204== by 0x476488: LAMMPS_NS::Input::execute_command() (input.cpp:706)
==239204== by 0x476F41: LAMMPS_NS::Input::file() (input.cpp:243)
==239204== by 0x402932: main (main.cpp:31)
==239204== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==239204==
==239204==
==239204== Process terminating with default action of signal 11 (SIGSEGV)
==239204== Access not within mapped region at address 0x0
==239204== at 0x5F8A8A: LAMMPS_NS::Neighbor::granular_bin_no_newton(LAMMPS_NS::NeighList*) (neigh_gran.cpp:396)
==239204== by 0x6490B3: LAMMPS_NS::Neighbor::build(int) (neighbor.cpp:1598)
==239204== by 0x4E2881: LAMMPS_NS::Verlet::setup() (verlet.cpp:117)
==239204== by 0x6A411F: LAMMPS_NS::Run::command(int, char**) (run.cpp:170)
==239204== by 0x477E15: void LAMMPS_NS::Input::command_creator<LAMMPS_NS::Run>(LAMMPS_NS::LAMMPS*, int, char**) (input.cpp:723)
==239204== by 0x476488: LAMMPS_NS::Input::execute_command() (input.cpp:706)
==239204== by 0x476F41: LAMMPS_NS::Input::file() (input.cpp:243)
==239204== by 0x402932: main (main.cpp:31)
==239204== If you believe this happened as a result of a stack
==239204== overflow in your program’s main thread (unlikely but
==239204== possible), you can try to increase the size of the
==239204== main thread stack using the --main-stacksize= flag.
==239204== The main thread stack size used in this run was 8388608.
==239204==
==239204== HEAP SUMMARY:
==239204== in use at exit: 31,507,245 bytes in 890 blocks
==239204== total heap usage: 1,166 allocs, 276 frees, 34,100,453 bytes allocated
==239204==
==239204== LEAK SUMMARY:
==239204== definitely lost: 0 bytes in 0 blocks
==239204== indirectly lost: 0 bytes in 0 blocks
==239204== possibly lost: 0 bytes in 0 blocks
==239204== still reachable: 31,507,245 bytes in 890 blocks
==239204== of which reachable via heuristic:
==239204== stdstring : 7,062 bytes in 204 blocks
==239204== newarray : 320 bytes in 5 blocks
==239204== suppressed: 0 bytes in 0 blocks
==239204== Rerun with --leak-check=full to see details of leaked memory
==239204==
==239204== For lists of detected and suppressed errors, rerun with: -s
==239204== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault
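If more detail would help, I can re-run the serial case with the extra options the summary itself suggests (e.g. --leak-check=full and -s):
valgrind --leak-check=full -s ./lmp_serial -in jamming_3_cont3_ep_ep.spheres1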
I’m working on moving to a newer LAMMPS version, but the simulation relies on custom-written source files from past members of my lab, and your previous emails suggest that incorporating those features into a newer version won’t be trivial; I have also been struggling a bit with the installation process itself. My apologies.
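For reference, this is roughly what I have been attempting for the newer version (assuming the traditional make build; the pair_gran_custom.* names and the lammps-stable directory below are just placeholders for our lab’s custom contact-law sources and the unpacked source tree):
cp pair_gran_custom.cpp pair_gran_custom.h lammps-stable/src/
cd lammps-stable/src
make yes-granular    # or yes-GRANULAR, depending on the version
make mpi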
Do you happen to see anything glaringly wrong in the above error messages, particularly for the MPI version?
Thanks as always!
Best,
Aved