Hello,
I’m a relatively new LAMMPS user but an even newer HPC attempted-user. If it sounds like I don’t know what I’m talking about, it’s because I don’t, and I appreciate your patience in advance.
I have scripts that work on LAMMPS, and I’d like to run them on my campus HPC cluster. They work on the login node, but to run them on the cluster I have to submit a job, which has a bunch of preamble lines and then a final line directing the script to a specific compiled executable file. I can get this to work with a hello world C program. If it helps, these are the instructions for submitting jobs: Article - HPC User Guide
My actual LAMMPS command would be something like “lmp_serial < file.in”, which I can modify to include the full path to both lmp_serial and file.in if need be. This runs on the login node. If I write a bash script that implements these lines and call it with “bash script.x”, it still works. But when I put it at the end of the job submission file and submit it, it does not work: the job “starts” and then immediately terminates. There is a separate issue where I am not receiving error logs, but the HPC guy told me the following:
“error “line 15: /home/username/rl2.x: Permission denied”. It appears to me that rl2.x is feeding a configuration file called /home/username/lammps/knots/run.lam to /home/username/lammps/src/lmp_serial. “lmp_serial” appears to be a compiled file so I cannot read what line 15 is doing.”
After that, he will provide no more help. I appreciate any help you may have. Is there a way I can get LAMMPS to run via submitted jobs?
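One thing worth checking, assuming line 15 of the job file calls /home/username/rl2.x directly rather than through “bash”: running “bash script.x” only needs read permission on the file, while calling the file by its path also needs execute permission. That would explain why the script works by hand and fails with “Permission denied” in the job. The sketch below reproduces the behavior with a throwaway stand-in script (demo.x is a made-up name, not the real rl2.x):

```shell
# Throwaway stand-in for rl2.x; the real script would call lmp_serial.
demo_dir=$(mktemp -d)
demo="$demo_dir/demo.x"
printf '#!/bin/bash\necho "script ran"\n' > "$demo"
chmod a-x "$demo"            # make sure no execute bit is set

bash "$demo"                 # works: bash only needs to read the file

# Calling the file directly fails while the execute bit is missing.
"$demo" 2>/dev/null || echo "direct call failed with status $?"

chmod u+x "$demo"            # grant execute permission to the owner
"$demo"                      # now the direct call works too
```

If this is the cause, either running “chmod u+x /home/username/rl2.x” once, or writing “bash /home/username/rl2.x” in the job file, should get past that particular error.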
Compile a recent version of LAMMPS with parallel support; otherwise, what is the point of using HPC at all?
Tip: when you compile LAMMPS on the login node, you will have to load the same modules that will also go in the submission script, e.g. OpenMPI, GNU or Intel compilers, etc. This ensures that the libraries present at compile time match those at run time.
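One way to keep the two environments in sync is to put the module loads in a single file and source it both before compiling and inside the job script. This is only a sketch: the file name lammps_env.sh and the module names “gcc” and “openmpi” are placeholders, and the real names on your cluster come from “module avail”.

```shell
# lammps_env.sh (hypothetical name): source this before compiling
# AND at the top of the job script, so both see the same libraries.

# Placeholder module names; list the real ones with "module avail".
LAMMPS_MODULES="gcc openmpi"

load_lammps_modules() {
    # "module" is provided by the cluster's environment-modules setup.
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found; skipping: $LAMMPS_MODULES" >&2
        return 0
    fi
    for m in $LAMMPS_MODULES; do
        module load "$m" || return 1
    done
}

load_lammps_modules
```

Usage would then be “. ./lammps_env.sh” on the login node before building, and the same line in the job file before the LAMMPS command.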
Surely, there must be other HPC users at your institution who are successfully running on your specific machine. I suggest reaching out to (some of) them and sitting down with one who is willing to look over your shoulder and provide instructions. You should also consult with your adviser; after all, it is the job of your adviser to advise and train you in the tasks you are supposed to be doing, or at least to assist in arranging some tutoring.
I can really give only one very specific piece of advice: please stop using I/O redirection (via the ‘<’ character) to feed an input to LAMMPS, and use the ‘-in’ command line flag instead. The redirection is a mechanism inherited from the early days of LAMMPS, when it was written in Fortran (at the time, Fortran had no standard and portable way to access command line flags, unlike C++). It was carried over for backward compatibility, but it is very problematic when running in parallel and can cause unexpected failures. Support for I/O redirection will eventually be removed from LAMMPS, so, as a beginner, it is a good idea to start using the preferred way of launching LAMMPS calculations.
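A minimal sketch of the difference, using the paths from the error message quoted earlier in the thread (with $HOME standing in for /home/username; adjust to your system). The guard just prints the command when the executable or input is not found, so the snippet can be tried anywhere:

```shell
# Paths taken from the quoted error message; adjust as needed.
LMP="$HOME/lammps/src/lmp_serial"
INPUT="$HOME/lammps/knots/run.lam"

# Discouraged: "$LMP" < "$INPUT"
# Preferred:   pass the input file with the -in flag.
CMD="$LMP -in $INPUT"

if [ -x "$LMP" ] && [ -r "$INPUT" ]; then
    "$LMP" -in "$INPUT"
else
    echo "would run: $CMD"
fi
```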
Since launching jobs on HPC clusters is very system specific, there is little advice to be had from people outside your campus. From what I see on the website you pointed to, the instructions are not aimed at beginners but at people who already have sufficient familiarity with Linux and advanced computing environments. This is unfortunate, but also a fair approach, since it should be expected that people using an HPC facility have made some effort and got trained, or trained themselves, to be competent users who do not waste resources that are quite expensive to procure and operate. If you were working in an experimental lab, nobody would let you touch equipment without giving you the proper training and explaining the usage policies. Also, I see that your cluster is using commercial queueing software that is different from SLURM, the most commonly used package. That means that a lot of the advice you will find online will not apply, or will apply only after some modifications.
Thanks. It would be nice if I could ask my advisor, but unfortunately I am the advisor.
I’ve spoken with some of the users to no avail, but I can keep trying with others. We are not a major research institute, so there isn’t a huge amount of institutional knowledge.
As @hothello pointed out, getting help depends on how easy you make it to help you. If you communicated with your local people (including the HPC admin) in the same rather vague way as you did in your post here, you are making it difficult for them to help you.
There is plenty of tutorial material available for different HPC facilities at different institutions. Some of them do a pretty good job of writing accessible documentation. There is also a lot of self-training material from the national supercomputing centers and the NSF-sponsored HPC cyberinfrastructure: https://support.access-ci.org/
It is fairly easy to get access and a small allocation on one of those and that will give you access to a whole new level of expertise and people more accustomed to dealing with inexperienced users.