about the scale of lammps-gpu

I tried your input on my box with a single MPI process.

Without GPU acceleration, the host uses about 600 MB of RAM for 407720 atoms.

With the GPU package, you can expect a little more than twice that, because full neighbor lists are used instead of half lists (a full list stores every pair twice, so it is twice as big).

Running

nvidia-smi -l

during your simulation shows that 1478 MB of GPU memory are in use.

I did notice that your input script produced dangerous builds, which means the neighbor list may have been rebuilt later than it should have been. You might want to look at the documentation for neigh_modify.
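
As a rough sketch only (these values are my assumption of a conservative starting point, not something taken from your script), the safest setting checks on every step and rebuilds as soon as any atom has moved more than half the skin distance:

```lammps
# Assumed conservative settings, not taken from the original input script.
# delay 0   : never postpone a rebuild after the previous one
# every 1   : consider rebuilding on every timestep
# check yes : rebuild only if some atom moved more than half the skin distance
neigh_modify delay 0 every 1 check yes
```

If the dangerous build count in the log drops to zero with this, you can try relaxing delay or every again to recover some speed.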

- Mike