Dear all,
I’m currently using LAMMPS with GPU acceleration on a single GPU, an RTX 4090. My simulation system has 60,000 beads, and my current fastest speed is 500 ns/day, which seems a bit slow. Are there any methods or benchmarks I could use as a reference to improve the speed? I’m currently running with 4 MPI tasks and 4 OpenMP threads.
Hi @ULTRAMAN,
A bit slow compared to what? This also depends on the complexity of your model and what you are trying to compute.
This is only relevant in relation to the number of available CPUs and GPUs on your machine.
As stated in the documentation of the GPU package:
> You should also experiment with how many MPI tasks per GPU to use to give the best performance for your problem and machine. This is also a function of the problem size and the pair style being used. Likewise, you should experiment with the precision setting for the GPU library to see if single or mixed precision will give accurate results, since they will typically be faster.
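A quick way to run that experiment (a minimal sketch only; the executable name `lmp` and the input file `in.system` are placeholders for your own build and input) is to time the same run with different numbers of MPI tasks sharing the one GPU:

```
# hypothetical sweep over 1, 2, 4, and 8 MPI tasks sharing one GPU;
# -sf gpu applies the gpu suffix to supported styles,
# -pk gpu 1 tells the GPU package to use one GPU
for n in 1 2 4 8; do
    mpirun -np $n lmp -sf gpu -pk gpu 1 -in in.system -log log.gpu.$n
done
```

You can then compare the performance summary and timing breakdown in the resulting log files.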
You can also follow the guidelines to improve the performance of your simulation.
Speed in ns/day is a bad property to describe the speed of the calculation, since it depends on simulation settings like the choice of timestep. A better property is the atom-step/s value also output by LAMMPS. That one only depends on the choice of model, so it is much more comparable. There is a discussion of how to compare and document performance in the LAMMPS manual at: 9.2. Measuring performance — LAMMPS documentation
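To illustrate the conversion (assuming, for illustration only, a 10 fs timestep; your actual timestep may differ): 500 ns/day at 10 fs/step is 50,000,000 steps/day, or about 579 steps/s, which for 60,000 beads amounts to roughly 3.5e7 atom-step/s (about 34.7 Matom-step/s).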
Before figuring out the optimal performance for your specific system, you should use one of the input decks in the bench folder of the LAMMPS distribution (make sure you read the README file) and provide some performance information for that; a sketch of such a benchmark command line follows the list below. You also need to provide the usual information that should always be reported, such as:
- exact LAMMPS version
- compilation settings
- compiler and toolkit versions
- host operating system
- full command line
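As an example of the kind of command line to report (a sketch only; the executable name `lmp` and the MPI launcher depend on your build and system), a run of the standard LJ benchmark with the GPU package could look like:

```
# run the in.lj benchmark from the bench folder of the LAMMPS sources,
# with 4 MPI tasks sharing one GPU (paths and executable name are placeholders)
cd lammps/bench
mpirun -np 4 lmp -sf gpu -pk gpu 1 -in in.lj
```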
For example, it is not even clear from your message whether you are using the GPU or the KOKKOS package, and whether you run with single, double, or mixed precision settings (which has a significant impact on performance, since your GPU has - like all consumer GPUs from Nvidia - a deliberately crippled floating-point unit with very restricted double-precision performance).
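For reference (a sketch only, assuming a CMake build of LAMMPS from source; the exact options depend on your LAMMPS version and CUDA toolkit), the precision of the GPU package is selected at compile time:

```
# hypothetical CMake configuration for a mixed-precision GPU package build;
# sm_89 is the compute architecture of the RTX 4090 (Ada Lovelace)
cmake ../cmake -D PKG_GPU=on -D GPU_API=cuda -D GPU_PREC=mixed -D GPU_ARCH=sm_89
```

The log file from a GPU package run typically echoes the device and precision during initialization, which would also answer that question.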