Recently, I ran a simulation using LAMMPS. The script works when the number of run steps is below about 9e8. For longer runs, however, the simulation usually stops without any LAMMPS error and only shows “failed: Connection reset by peer (104)” (see the figure).
How should I fix it? If you need more information, please let me know. Thanks.
I agree that this is not a LAMMPS issue, and I am unsure how appropriate it is to offer off-topic advice here. That being said, I’d suppose the preferred solution is to use a job scheduling system on the remote server (e.g. SLURM). If that is not available, I would recommend reading about the screen command as a potential solution.
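To make the two suggestions concrete, here is a minimal sketch. Everything specific in it is an assumption on my part: the executable name `lmp`, the input file `in.simulation`, the file name `lammps_job.sh`, and the resource values would all need to be adapted to the actual cluster and LAMMPS build.

```shell
# Write a minimal SLURM batch script to disk.
# All names and resource values are placeholders, not taken from the original post.
cat > lammps_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=lammps_run
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=24:00:00
#SBATCH --output=lammps_%j.out

# The scheduler owns the job, so it keeps running after the SSH session closes.
srun lmp -in in.simulation
EOF

# Submit it with:
#   sbatch lammps_job.sh
#
# Without a scheduler, a detachable terminal session serves a similar purpose:
#   screen -S lammps                      # start a named session
#   mpirun -np 4 lmp -in in.simulation    # launch the run inside it
#   (press Ctrl-a, then d, to detach; reattach later with: screen -r lammps)
```

Either way, the point is that the run no longer depends on the interactive login staying connected. Whether that also resolves the “Connection reset by peer” message depends on where the connection is actually being dropped.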
You are using TCP/IP communication for a parallel LAMMPS run. That is likely going to be very inefficient. The error indicates that you may be overloading the network.
It is unlikely to be fixed from the LAMMPS side.
Whenever you report an issue, you should include: the exact LAMMPS version you are using, how it was compiled/installed, what platform you are running on, what your command line was, and what your input is.
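A rough sketch of how some of that information could be collected into one file for a report. It assumes the LAMMPS executable is called `lmp` and that `mpirun` is the MPI launcher; both are guesses and may differ on your system. The report file name is also just a placeholder. As far as I know, `lmp -h` prints the version along with build details in recent versions, but the exact output varies.

```shell
# Collect platform and LAMMPS build information into a single text file.
# "lmp", "mpirun", and the file name are assumptions; adjust as needed.
{
  echo "== Platform =="
  uname -a

  echo "== LAMMPS version and build info =="
  if command -v lmp >/dev/null 2>&1; then
    lmp -h
  else
    echo "lmp not found in PATH"
  fi

  echo "== MPI launcher =="
  if command -v mpirun >/dev/null 2>&1; then
    mpirun --version
  else
    echo "mpirun not found in PATH"
  fi
} > lammps_report.txt 2>&1
```

The exact command line and the input file still have to be added by hand, since only you know what was actually run.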