We do test of GULP with gfnff … We use MPI parallelized GULP. All works OK e.g. for X23 test structures (e.g. CYHEXO) = different MPI parallelization gives identical number of cycles and identical results.
For a more complex structure (.gin follows) gfnff gives different results based on MPI cores used. In addition some of the results do not give sense according geometry. Can anybody test what is wrong with this structure?
fractional
Cl 0.58506000000000 0.36384000000000 0.24194000000000
O 0.66500000000000 0.25634000000000 0.34250000000000
O 0.67710000000000 0.38079000000000 0.09400000000000
O 0.64180000000000 0.46860000000000 0.35680000000000
O 0.36990000000000 0.35523000000000 0.15310000000000
N -0.25060000000000 0.65783000000000 0.22100000000000
H -0.30920000000000 0.60610000000000 0.14830000000000
H -0.31550000000000 0.70880000000000 0.25110000000000
N 0.24440000000000 0.55704000000000 0.28600000000000
H 0.29050000000000 0.62000000000000 0.25660000000000
H 0.28470000000000 0.49850000000000 0.23990000000000
H 0.28580000000000 0.55050000000000 0.40250000000000
C -0.03980000000000 0.65910000000000 0.29400000000000
H 0.00330000000000 0.65340000000000 0.42110000000000
H 0.00870000000000 0.72880000000000 0.26340000000000
C 0.03370000000000 0.56050000000000 0.21830000000000
H -0.01640000000000 0.49140000000000 0.24870000000000
H -0.01140000000000 0.56650000000000 0.09120000000000
There are 2 issues here & so I’ll respond separately:
Why are the results different as a function of the number of cores?
From my tests with variable numbers of cores, the only point at which things start to deviate is when the structure goes crazy. This is not surprising, since when things go unphysical then the answers can be become sensitive to numerical precision and under MPI the mathematical operations are in a different order (e.g. there are global sums across processors that don’t exist on a single core) and so the answers change, which then is propagated as the system optimises. The important thing is that everything is OK (for me anyway) while the system is at a sensible geometry.
Why does the system not behave sensibly based on the input file given?
The reason why things go crazy is because of the electronic structure that is implied by the input file. Your system consists of organic molecules and a ClO4 fragment. By default, each molecular fragment is charge neutral & so this constrains the charges in GFN-FF. Of course, this isn’t physical as you should really have a ClO4- anion and the organic molecule is a cation due to the -NH3+ group. The solution is to initialise the charges of the fragments to be the correct ones. All that matters is that the sum of the charges within each fragment is correct & not the distribution (which is determined by GFN-FF). So if you make Cl = -1 and one of the N atoms +1, then things are sensible and structure optimises reasonably.
Here is my changed coordinate input that fixes things:
fractional
Cl 0.58506000000000 0.36384000000000 0.24194000000000 -1.0
O 0.66500000000000 0.25634000000000 0.34250000000000
O 0.67710000000000 0.38079000000000 0.09400000000000
O 0.64180000000000 0.46860000000000 0.35680000000000
O 0.36990000000000 0.35523000000000 0.15310000000000
N -0.25060000000000 0.65783000000000 0.22100000000000 1.0
H -0.30920000000000 0.60610000000000 0.14830000000000
H -0.31550000000000 0.70880000000000 0.25110000000000
N 0.24440000000000 0.55704000000000 0.28600000000000
H 0.29050000000000 0.62000000000000 0.25660000000000
H 0.28470000000000 0.49850000000000 0.23990000000000
H 0.28580000000000 0.55050000000000 0.40250000000000
C -0.03980000000000 0.65910000000000 0.29400000000000
H 0.00330000000000 0.65340000000000 0.42110000000000
H 0.00870000000000 0.72880000000000 0.26340000000000
C 0.03370000000000 0.56050000000000 0.21830000000000
H -0.01640000000000 0.49140000000000 0.24870000000000
H -0.01140000000000 0.56650000000000 0.09120000000000
Thank you for the explanation. I can confirm that after adding the charges as you suggest the MPI parallelization have no influence on the result.
I believed pGFN-FF can take care about charges fully automatically. Does it mean pGFN-FF will need such sort of initialization for any salt crystal (e.g. sulfates, salts of organic acids, chlorides …) ? Or is this structure an exception ? To be honest this structure is an artificial one probably not existing in reality (generated as a part of a fraudulent article). Our idea is to process structures from crystallographic databases and find the structures differing from their energy minimized equivalent. But if pGFN-FF require charges for fragment, we must implement some sort of fragment charge recognition engine - very far from trivial.
This is discussed in the pGFN-FF paper (https://pubs.acs.org/doi/full/10.1021/acs.jctc.1c00832). GFN-FF takes care of the charges automatically, given a particular connectivity and overall fragment charge. It’s possible to vary the results for ionic materials according to whether you treat them as ionic or a covalent framework. So in short there is nothing special about the structure - it’s just that it can be treated either as composed of two neutral radical species or two closed shell charged ones. I’ll think about adding an option that allows the default choice to be changed for users that don’t want to delve into the structural details, but want a complete black box.
I understand. So we must handle the charge detection on our side anyway.
Just in case somebody want to automatically detect bond order of all atoms + charge of a fragment for organic molecules, python code utilizing rdkit follows. It is a part of our engine for fully automatic MM calculation GULP setup from CIF file.
from rdkit import Chem
from rdkit.Chem import rdDetermineBonds,rdForceFieldHelpers,AllChem
from rdkit.Chem import Draw
#charges = [0,1,-1,2,-2,3,-3,4,-4,5,-5,6,-6] # try up to +-6
charges = [-2,-1,0,1,2] # try up to +-6
solutions=dict([])
xyz_path="molecules/maleic_acid_ion.xyz"
for charge in charges:
try:
raw_mol = Chem.MolFromXYZFile(xyz_path)
if raw_mol == None:
print('Unable to load XYZ file')
rdDetermineBonds.DetermineBonds(raw_mol,charge=charge)
rdForceFieldHelpers.MMFFHasAllMoleculeParams(raw_mol)
solutions[charge]=raw_mol
except:
pass
assert len(solutions)>0, "ERROR: didn't find a fitting charge"
best_charge = 0
lowest_energy = 99999
best_mol = None
for charge in solutions:
mp = AllChem.MMFFGetMoleculeProperties(solutions[charge])
FF = AllChem.MMFFGetMoleculeForceField(solutions[charge], mp)
e = FF.CalcEnergy(FF.Positions())
if e < lowest_energy:
lowest_energy = e
best_mol=solutions[charge]
best_charge=charge
print(f'final molecule is {Chem.MolToSmiles(Chem.RemoveHs(best_mol))} with a charge of {best_charge}')
img = Draw.MolToImage(best_mol)
img.show()