Please also note that LAMMPS prints a performance estimate at the end of each run. If you run, say, 1000 MD steps of a properly equilibrated system, that estimate should tell you fairly accurately how much wall time is required to simulate one nanosecond with the given choice of processors/GPUs/threads.
P.S.: Here is an example of such output from the rhodo benchmark bundled with LAMMPS (bench/log.6Oct16.rhodo.fixed.icc.4):
Loop time of 9.39107 on 4 procs for 100 steps with 32000 atoms
Performance: 1.840 ns/day, 13.043 hours/ns, 10.648 timesteps/s
99.8% CPU use with 4 MPI tasks x no OpenMP threads
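
If you want to turn those numbers into a wall-time estimate for a longer run, something along these lines should do. This is only a rough sketch: the function names are made up for illustration and are not part of LAMMPS, and the 2 fs timestep is my assumption for this benchmark (it is consistent with the numbers in the log above).

```python
import re

def parse_performance(line):
    """Pull ns/day, hours/ns and timesteps/s out of a 'Performance:' log line."""
    pattern = (r"Performance:\s*([\d.]+)\s*ns/day,"
               r"\s*([\d.]+)\s*hours/ns,"
               r"\s*([\d.]+)\s*timesteps/s")
    match = re.search(pattern, line)
    if match is None:
        raise ValueError("not a recognized Performance line: %r" % line)
    ns_per_day, hours_per_ns, steps_per_s = (float(x) for x in match.groups())
    return {"ns_per_day": ns_per_day,
            "hours_per_ns": hours_per_ns,
            "steps_per_s": steps_per_s}

def ns_per_day_from_loop(loop_time_s, nsteps, timestep_fs):
    """Recompute ns/day from the 'Loop time' line and the timestep size."""
    steps_per_s = nsteps / loop_time_s
    fs_per_day = steps_per_s * timestep_fs * 86400.0  # 86400 seconds per day
    return fs_per_day / 1.0e6                         # 1 ns = 1e6 fs

def wall_hours(target_ns, hours_per_ns):
    """Estimated wall time in hours to simulate target_ns nanoseconds."""
    return target_ns * hours_per_ns

if __name__ == "__main__":
    perf = parse_performance(
        "Performance: 1.840 ns/day, 13.043 hours/ns, 10.648 timesteps/s")
    # timestep_fs=2.0 is an assumption, not read from the log
    print(ns_per_day_from_loop(9.39107, 100, timestep_fs=2.0))
    print(wall_hours(10.0, perf["hours_per_ns"]))
```

Keep in mind that this extrapolates linearly from a short run, so it only holds as long as the system and the hardware setup stay the same.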