[Possible Bug] read_restart only reads first 29 characters of filenames

When using the read_restart command with a wildcard in a script, LAMMPS seems to fail if the filename is longer than 29 characters. Take the following files in a folder:

	0123456789012345678901234567_A-200.restart
	0123456789012345678901234567_B-100.restart

If I try reading the second restart file using read_restart 0123456789012345678901234567_B-*.restart, I get the following error:

Cannot open restart file 0123456789012345678901234567_B-200.restart: No such file or directory (../read_restart.cpp:112)

This error should not be happening at all: the file ...B-200.restart does not exist, and the only file that matches the pattern is ...B-100.restart.

Reducing the filename by 1 character overcomes this issue:

	012345678901234567890123456_A-200.restart
	012345678901234567890123456_B-100.restart

The restart file is then read correctly and the run resumes from 012345678901234567890123456_B-100.restart

I didn’t see any documentation about filenames needing to be shorter than 29 characters, so to me this looks like a bug. My filenames are usually long as I have a lot of variables in my filenames.

What is your LAMMPS version?

It is 29 Sep 2021. I can update to the latest version and try again.

I’ve looked at the changes between the current code and the 29 Sep 2021 release and there is nothing that would affect the length of filenames.

I just made a quick test with much longer file names and have no problem on my Linux desktop.

What kind of OS and file system are you running on?

Please try to run the following input:

units           lj
atom_style      atomic
lattice         fcc 0.8442
region          box block 0 2 0 2 0 2
create_box      1 box
create_atoms    1 box
mass            1 1.0

velocity        all create 3.0 87287 loop geom
pair_style      lj/cut 2.5
pair_coeff      1 1 1.0 1.0 2.5

neighbor        0.3 bin
neigh_modify    every 1 delay 5 check no

fix             1 all nve

restart 100 01234567890123456789012345678901234567890_A-*.restart

thermo          50
run             500 post no

clear

shell rm 01234567890123456789012345678901234567890_A-500.restart
shell ls 01234567890123456789012345678901234567890_A-*.restart
read_restart 01234567890123456789012345678901234567890_A-*.restart
run 0 post no

It should generate 5 restart files on steps 100, 200, 300, 400, and 500, then clear the LAMMPS state, delete the restart file for step 500, read with the wildcard, and pick up the step 400 restart file as expected.

I tried the script. It runs fine, but if I replace the shell rm command with the following, it starts throwing the error.

shell mv 01234567890123456789012345678901234567890_A-500.restart 01234567890123456789012345678901234567890_B-500.restart

The error that I get:

Setting up Verlet run ...
  Unit style    : lj
  Current step  : 0
  Time step     : 0.005
Per MPI rank memory allocation (min/avg/max) = 2.581 | 2.581 | 2.581 Mbytes
Step Temp E_pair E_mol TotEng Press
       0            3   -6.7733681            0   -2.4139931    -3.781861
      50    1.6613601   -4.8315081            0   -2.4173443    5.3583974
     100    1.8620796    -5.124342            0   -2.4185075    4.0555983
     150     1.646154   -4.8145181            0   -2.4224505    5.6638944
     200    1.4445937   -4.5194162            0    -2.420241     6.895903
     250     1.378496   -4.4156077            0   -2.4124807    6.9913609
     300    1.6899999   -4.8748573            0   -2.4190763     5.313943
     350    1.6283077   -4.7849736            0   -2.4188389    5.7547692
     400     1.643819   -4.8059329            0   -2.4172585    5.5594792
     450    1.8192217   -5.0631528            0   -2.4195963    4.3212518
     500    1.5074663   -4.6113638            0   -2.4208269    6.1858876
Loop time of 0.062669 on 1 procs for 500 steps with 32 atoms

Reading restart file ...
ERROR on proc 0: Cannot open restart file ./01234567890123456789012345678901234567890_A-500.restart: No such file or directory (../read_restart.cpp:112)

If I comment out the mv shell command, there is no more error:

Per MPI rank memory allocation (min/avg/max) = 2.581 | 2.581 | 2.581 Mbytes
Step Temp E_pair E_mol TotEng Press
       0            3   -6.7733681            0   -2.4139931    -3.781861
      50    1.6613601   -4.8315081            0   -2.4173443    5.3583974
     100    1.8620796    -5.124342            0   -2.4185075    4.0555983
     150     1.646154   -4.8145181            0   -2.4224505    5.6638944
     200    1.4445937   -4.5194162            0    -2.420241     6.895903
     250     1.378496   -4.4156077            0   -2.4124807    6.9913609
     300    1.6899999   -4.8748573            0   -2.4190763     5.313943
     350    1.6283077   -4.7849736            0   -2.4188389    5.7547692
     400     1.643819   -4.8059329            0   -2.4172585    5.5594792
     450    1.8192217   -5.0631528            0   -2.4195963    4.3212518
     500    1.5074663   -4.6113638            0   -2.4208269    6.1858876
Loop time of 0.0673708 on 1 procs for 500 steps with 32 atoms

Reading restart file ...
  restart file = 29 Sep 2021, LAMMPS = 29 Sep 2021
  restoring atom style atomic from restart
  orthogonal box = (0.0000000 0.0000000 0.0000000) to (3.3591924 3.3591924 3.3591924)
  1 by 1 by 1 MPI processor grid
  restoring pair style lj/cut from restart
  32 atoms
  read_restart CPU = 0.003 seconds
WARNING: No fixes defined, atoms won't move (../verlet.cpp:55)
Neighbor list info ...
  update every 1 steps, delay 10 steps, check yes
  max neighbors/atom: 2000, page size: 100000

Here is information about the Linux release I’m using:

lsb_release -a
LSB Version:  :core-4.1-amd64:core-4.1-ia32:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-ia32:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-ia32:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: RedHatEnterpriseServer
Description:  Red Hat Enterprise Linux Server release 7.9 (Maipo)
Release:  7.9
Codename: Maipo

It looks like the filesystem of the folder I’m running the script in is NFS.

If possible, kindly see if my shell command gives you the same error.

Thank you!

I have no problem with using the “shell mv” command. It picks up the step 400 file as expected.

The NFS file system is likely the problem. It probably does what is called “attribute caching” which can lead to the directory listing becoming inconsistent with the files.

Okay I see, thank you very much for testing it out on your own machine. I wasn’t aware of a possible caching issue, as I thought it would be all right as long as the filenames were under the ~255 character limit. I will bring this up with my system administrator.

Please wait a little bit. I have found a possible candidate for a bug.

Please go to your LAMMPS source and edit the file utils.cpp as follows, recompile, and try again.

  diff --git a/src/utils.cpp b/src/utils.cpp
  index eb9e48985a..aaca268174 100644
  --- a/src/utils.cpp
  +++ b/src/utils.cpp
  @@ -1516,8 +1516,8 @@ static int re_matchp(const char *text, re_t pattern, int *matchlen);
   
   /* Definitions: */
   
  -#define MAX_REGEXP_OBJECTS 30 /* Max number of regex symbols in expression. */
  -#define MAX_CHAR_CLASS_LEN 40 /* Max length of character-class buffer in.   */
  +#define MAX_REGEXP_OBJECTS 256 /* Max number of regex symbols in expression. */
  +#define MAX_CHAR_CLASS_LEN 256 /* Max length of character-class buffer in.   */
   
   enum {
     RX_UNUSED,

The problem is not the length of the filename, but how many characters are identical between files and how difficult it is to program a search to find the file with the largest number in place of the *.

With the old settings, strings that were the same in the first 30 characters were considered the same. So a filename where character 31 was a B instead of an A, but everything before it was identical, would still match the pattern with the A. This then resulted in the wrong file name being read, because the code extracts the number from every matching name and then “synthesizes” the file name from the largest number found.
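
To make the failure mode concrete, here is a minimal Python sketch of the behavior described above. It is a simplified model, not the LAMMPS implementation: the function name pick_restart and the max_symbols parameter are hypothetical, and the limit of 29 compared characters is an assumption chosen to reproduce the numbers reported in this thread.

```python
import re

def pick_restart(pattern, names, max_symbols=None):
    """Return the file name a wildcard search would synthesize (simplified model)."""
    prefix, suffix = pattern.split("*", 1)
    # Assumption: a matcher limited to max_symbols only compares
    # that many leading characters of the pattern.
    compared = prefix if max_symbols is None else prefix[:max_symbols]
    largest = -1
    for name in names:
        if not name.startswith(compared):
            continue
        # Extract the number at the position where the '*' sits in the pattern.
        digits = re.match(r"\d+", name[len(prefix):])
        if digits:
            largest = max(largest, int(digits.group()))
    if largest < 0:
        return None
    # The name is built from the pattern and the largest number found,
    # not taken from the directory listing.
    return f"{prefix}{largest}{suffix}"

names = [
    "0123456789012345678901234567_A-200.restart",
    "0123456789012345678901234567_B-100.restart",
]
pattern = "0123456789012345678901234567_B-*.restart"

print(pick_restart(pattern, names, max_symbols=29))  # synthesizes ..._B-200.restart
print(pick_restart(pattern, names))                  # picks ..._B-100.restart
```

With the limit in place, both files match, the largest number (200) comes from the A file, and the synthesized B-200 name does not exist. Shortening the names by one character moves the B inside the compared part, which is consistent with the workaround reported at the top of the thread.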

This is what I was thinking was the issue as well.

Should I try updating the definitions in utils.cpp? I’m not sure if that would solve it.

Or maybe I could just use a shell command to obtain the path of the latest restart file:

find ./ -name "01234567890123456789012345678901234567890_A-*.restart" -printf "%f\n" | sort -V | tail -1

I could then pass this to a variable in LAMMPS and restart the simulation from that variable.
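
As a sketch of that workaround: find does not guarantee any particular output order, so picking the latest file needs an explicit comparison of the step numbers. The Python snippet below is one way to do that under stated assumptions. The function name latest_restart is hypothetical, and the prefix is the one from the test input above.

```python
import os
import re

def latest_restart(names, prefix, suffix=".restart"):
    """Return the name with the largest step number, or None if nothing matches."""
    # Require the complete prefix and suffix, with only digits in between.
    matcher = re.compile(re.escape(prefix) + r"(\d+)" + re.escape(suffix))
    best_step, best_name = -1, None
    for name in names:
        found = matcher.fullmatch(name)
        if found and int(found.group(1)) > best_step:
            best_step, best_name = int(found.group(1)), name
    return best_name

if __name__ == "__main__":
    prefix = "01234567890123456789012345678901234567890_A-"
    # Prints the newest matching restart file in the current directory, or None.
    print(latest_restart(os.listdir("."), prefix))
```

The printed name could then be handed to LAMMPS, for example through the -var command-line switch and a read_restart ${restartfile} line in the input, where restartfile is a hypothetical variable name.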

Yes. It will solve the issue.
