Problem with fix reaxff/species command

Dear developers,
I am using LAMMPS version 27 June 2024 on an Intel Gold server with 56 cores.
I am using the ReaxFF force field with the fix reaxff/species command:

fix  2 all reaxff/species 2 6 1000 ${rad}_output${T}K/species_${myfile}_${seed1} element C Cl H N O S

I am running a LAMMPS script with the partition flag -p 50x1,
so 50 species_ files are successfully created in the target folder.
But some of them have a structural problem that appears randomly.
The problem is that the "#  Timestep" header line is not always written at the beginning of a line.
Below is part of a faulty species_ file:

#  Timestep    No_Moles    No_Specs  C22H23O9N2         H2O        H3O2          HO           O
       1000          58           5           1          41           7           8           1
#  Timestep    No_Moles    No_Specs C22H23O10N2         H2O          HO        H3O2        H4O2
       2000          57           5           1          40           9           6           1
#  Timestep    No_Moles    No_Specs  C21H19O9N2        CH3O        H4O2         H2O          HO        H5O3        H3O2
       3000          54           7           1           1           1 #  Timestep    No_Moles    No_Specs  C22H22O9N2         H2O        H3O2          HO
       4000          56           4           1          39           9           7
#  Timestep    No_Moles    No_Specs C22H24O12N2         H2O        H5O3          HO        H3O2           H
       5000          55           6           1          40           2           7           4           1
#  Timestep    No_Moles    No_Specs C22H22O11N2         H2O        H3O2        H3O3           H          HO
       6000          56           6           1          43           6           1           1           4
#  Timestep    No_Moles    No_Specs C22H22O11N2         H2O         H3O        H3O2        H2O2           H          HO
       7000          56           7           1          40           1           7           1           1           5
#  Timestep    No_Moles    No_Specs C22H22O11N2         H2O        H3O2          HO           H
       8000          55           5           1          40           9           4           1
#  Timestep    No_Moles    No_Specs C22H20O11N2         H2O        H3O2          HO
       9000          58           4           1          47           5           5

As you can see, in this case it happens at timestep 3000: the data line is cut short, and the next "#  Timestep …" header continues on the same row instead of starting on a new line, as it does for the other header/data pairs.

This is a problem when I read all the generated files for post-processing. My Python code, which identifies the generated species and averages the recorded values at each timestep, fails on the faulty files.

Do you have any idea what causes this random error?
Thanks for your help.
Best regards
Pascal

Some questions:

  • What platform are you running on?
  • Does the same effect happen with fewer replicas? If yes, how few?
  • Does the same effect happen on a different platform?
  • Do you observe the same issue if you modify the input for the RDX or TATB example from the LAMMPS distribution accordingly?

If you can reproduce it with RDX/TATB, then please share the input deck. Otherwise, provide the input deck you are using.

At the moment, I can only speculate about the reason for the difference.
Can you also please attach a compressed version of a corrupted file? Please compress it with zip or gzip right where you run the simulation, and then transfer/attach the file.

Dear Axel
Thanks a lot for your advice.
I am running a recently updated (a few days ago) CentOS 7.9 with gcc 11.
I have checked the 50 species files (species_doxycycline_number) and the corresponding log files in the attached archive.
Of the 50 species and log files generated, 11 are corrupted: those with 258728, 336234, 373869, 405468, 475319, 507009, 595634, 613828, 743873, 764643, 773440 as the number in the file name.
I also checked that there is a problem in the corresponding log files (doxycycline_number.log). In both cases, number refers to a seed value used in the script, which makes it possible to match each log file with its species file.
The problems are: some output values are not printed in the species and log files, and there are line-break errors.
This suggests a problem with the computer. I should note that it does not occur on my Windows computer running with -p 10x1; I cannot run 50x1 on it. I will run it on an AMD Genoa computer under Debian Linux for comparison. But first I will redo the calculation to check whether the errors also occur randomly.
Thanks a lot in advance.
Pascal
HO_output1000K.zip (262.1 KB)

The corruption that you see in the log files is a strong hint that you had two processes writing to the same file. This could also explain the unexpected behavior of fix reaxff/species output files.
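One way to spot this kind of interleaved output before post-processing is to look for a "#" header marker that does not sit in the first column. The sketch below is a minimal, hypothetical helper: the function names and the species_* glob pattern are assumptions, not part of LAMMPS. It relies only on the layout visible in the excerpt above, where header lines start with "#" and data lines contain only numbers.

```python
from pathlib import Path


def find_interleaved_lines(text):
    """Return the 1-based numbers of lines that contain a '#' marker
    anywhere other than the first column.

    In a well-formed species file, '#' only ever opens a header line,
    so a '#' further along a line suggests that another write landed
    in the middle of it.
    """
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        if "#" in line[1:]:
            bad.append(number)
    return bad


def scan_folder(folder, pattern="species_*"):
    """Map each suspicious file name in `folder` to its bad line numbers.

    The default pattern is an assumption based on the file names used
    in this thread.
    """
    report = {}
    for path in sorted(Path(folder).glob(pattern)):
        bad = find_interleaved_lines(path.read_text())
        if bad:
            report[path.name] = bad
    return report
```

Calling scan_folder("HO_output1000K") would then list only the files that need a closer look. This check catches the symptom shown above, but not every possible form of corruption (for example, missing values on an otherwise clean line).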

Thanks a lot Axel
I will check. I reran the calculation and found the errors in the same places. I also ran it without Kokkos enabled, and the error occurred, but at a different timestep. I ran it remotely with and without nohup to check whether running in the background plays a role. The same error occurred.
I will check whether my script allows such behaviour.
Below are the commands:

# Organochloride advanced oxidation
# This version executes 50 runs with different initial conditions for one molecule, one radical and one temperature
# mpirun -np 50 lmp -var myfile doxycycline -var rad HO -var T 1000 -p 50x1 -k on -sf kk -in Organochloride_HO-H2O.lammps
# Each job runs on a single mpi task
#
# Settings 
echo            both
units		    real # t = fs; L = A; v = A/fs; E = kcal/mol
dimension	    3
boundary	    p p p
atom_style	    charge
#
# Input variable definition
# Different initial conditions : here 50 different sets of seeds for molecules location and orientations, define 50 runs.
variable seed1   world   457297  679560  464378  533583  683578  717732 546440 392254   88993   57341   214814  37932   738574  747376  743873  12109   855462 585730  361279   477688  674176  475319  481690  562015  531934  732934  191346  456669 147216  447568   595634  336234  773440  258728  373869  405468  507009  764643  613828 47568   595634  336234  773440  258728  373869  405468  507009  764643  613828   901033
variable seed2   world   648688  779971  476638  393257  695979  17438  679636 718766   303177  16064   598231  696570  443079  106265  642989  94057   450420 569829  228088   61003   419723  975553  545607  848464  112307  207999  402768  82910  928663  559699   138609  393002  899716  454217  845918  704536  845474  995786  20591  414227  270766   268298  881858  471881  360223  587249  872812  760449  88040   574702
variable seed3   world   966749  361726  692867  690610  878076  192171 976121 872511   867498  669079  593461  252681  203555  22811   144885  457195  469102 540218  333279   385879  615085  345048  615618  114381  641393  807834  402028  773040 577949  185725   389609  843356  87463   454757  164797  290757  800172  344521  491688 191036  625731   350407  469155  846875  55858   256163  85875   204857  778405  563375
variable R    string rad
variable file string myfile
variable temperature string T
shell mkdir ${rad}_output${T}K
# Log file 
log ${rad}_output${T}K/${myfile}_${seed1}.log
# molecules in the box
molecule radical     ${rad}
molecule water       H2O
molecule organochlo  ${myfile}
# Simulation box
region 		    mybox block 0 15 0 15 0 15
create_box	    6  mybox  
create_atoms    0 single  7.5 7.5 7.5       mol organochlo ${seed2}
create_atoms    0 random  45 ${seed2} mybox mol water      ${seed3} overlap 2.0 maxtry 200
create_atoms    0 random  20 ${seed3} mybox mol radical    ${seed1} overlap 2.0 maxtry 200
....
# identify species created/destroyed
fix  2 all reaxff/species 2 6 1000 ${rad}_output${T}K/species_${myfile}_${seed1} element C Cl H N O S

Thanks again for your help
Best.
Pascal

You are using these values for seed1, which also appears in the file names of the species file and the log file. But some of those seeds are repeated, so there are only 41 unique numbers instead of 50. Thus some simulations are using the same seed1 value and are writing to the same output files. There should be only 41 files of each kind.
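The count is easy to verify with a few lines of Python. The sketch below copies the seed1 values from the input script above and tallies them with collections.Counter; the helper name duplicated_values is just an illustrative choice.

```python
from collections import Counter


def duplicated_values(values):
    """Return {value: occurrences} for every value seen more than once."""
    counts = Counter(values)
    return {value: n for value, n in counts.items() if n > 1}


# seed1 values copied from the "variable seed1 world ..." line of the script.
seed1 = """
457297 679560 464378 533583 683578 717732 546440 392254 88993 57341
214814 37932 738574 747376 743873 12109 855462 585730 361279 477688
674176 475319 481690 562015 531934 732934 191346 456669 147216 447568
595634 336234 773440 258728 373869 405468 507009 764643 613828 47568
595634 336234 773440 258728 373869 405468 507009 764643 613828 901033
""".split()

dups = duplicated_values(seed1)
print(f"{len(seed1)} seeds, {len(set(seed1))} unique")
print("repeated:", " ".join(sorted(dups)))
```

Every partition whose seed1 appears in the "repeated" list shares its log and species file names with another partition.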

Thanks a lot Axel
I did not think of such a problem, as I was sure that my Python code generating these random seeds could not produce the same seed more than once.
I will modify it to avoid this problem.
Thanks again
Best
Pascal
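For reference, one way to guarantee distinct seeds is to draw them without replacement. The sketch below uses random.sample from the Python standard library; the function names, the seed range 1 to 999999, and the output format are assumptions made for illustration and are not taken from the original generator script.

```python
import random


def unique_seeds(count, low=1, high=999_999, rng_seed=None):
    """Draw `count` distinct positive integers from [low, high].

    random.sample picks without replacement, so no value can repeat.
    Pass `rng_seed` to make the draw reproducible.
    """
    rng = random.Random(rng_seed)
    return rng.sample(range(low, high + 1), count)


def format_world_variable(name, seeds):
    """Format seeds as a LAMMPS world-style variable line."""
    return f"variable {name} world " + " ".join(str(s) for s in seeds)


seeds = unique_seeds(50)
print(format_world_variable("seed1", seeds))
```

Only the variable that ends up in the output file names (seed1 here) strictly needs to be unique across partitions; the same helper can of course be reused for seed2 and seed3.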