Join molecules objects

mdigennaro · April 14, 2020, 7:28am

Hello everyone,
I have two molecules in two separate xyz files: benzene.xyz and methane.xyz .
I would like to create a methylbenzene (C7H8) molecule out of these two using pymatgen.

First I defined a function which uses pybel to read a pymatgen object and return a SMILES string out of it.

from IPython.core.display import SVG, display_svg
from pymatgen.io.babel import BabelMolAdaptor
import pybel as pb

def quick_print_pmg( pmg_obj ):

    bma_obj = BabelMolAdaptor(pmg_obj)
    pbm_obj = pb.Molecule(bma_obj.openbabel_mol)
    smi_obj = pb.readstring('smi', pbm_obj.write("can"))
    print("Canonical SMILES = {}".format(smi_obj.write("can")))

    #lines below are to show the result
    #smi_obj.make3D()
    #svg_obj = SVG(smi_obj.write("svg"))
    #display_svg(svg_obj)

Now, we can open the two text files and read the molecules

from pymatgen import Molecule

benz_pmg = Molecule.from_file("benzene.xyz")
meth_pmg = Molecule.from_file("methane.xyz")

Two hydrogen atoms have to be removed (and thrown away). I have tried to simply removing the two hydrogen atoms and make a for loop to append() the atoms from methane to benzene, but this gives no guarantee that the result is correct, plus I have to make sure atoms do not overlap.

benz_pmg.pop(-1)
meth_pmg.pop(-1)

for atom in meth_pmg:
    benz_pmg.append(atom.specie, atom.coords + [1,1,1])

quick_print_pmg( benz_pmg )

Canonical SMILES = [CH]=[C][CH]C=[CH].[CH2][CH2]

Another option that I have tried is the substitute() method of the Molecule class from pymatgen, which seems like exactly what I am looking for, but I cannot get it working straight.

# read coordinates from files again

benz_pmg = Molecule.from_file("benzene.xyz")
meth_pmg = Molecule.from_file("methane.xyz")

benz_pmg.substitute(0, meth_pmg)
quick_print_pmg( benz_pmg )

Canonical SMILES = [CH]/C=C\[CH].C.[H].[H]

The canonical smiles I am looking for is rather C1=CC=CC=C1C. Could anyone explain me how to use this properly?

Thank you.

Marco

peterschindler · April 17, 2020, 12:34am

@mdigennaro On top of my head, I cannot think of a quick and easy solution because both molecules have degrees of freedom in terms of their rotation to one another. There is definitely a way of hard-coding this but I am wondering, is there a reason why you don’t just get the final molecule structure from a database (like you did with benzene and methane)?

mdigennaro · April 17, 2020, 1:36pm

Hello @peterschindler,

I just brought an example here, a very easy one! I want to use this routine to create new, unknown structures. I actually found out that instead of this

benz_pmg.substitute(0, meth_pmg)

I should rather do:

#From the documentation:
#The first atom must be a DummySpecie X, indicating the position of
#nearest neighbor. The second atom must be the next nearest atom.

# reorder and rename Dummy atom
new_meth= Molecule.from_sites([meth_pmg [1], meth_pmg [0], meth_pmg [2], meth_pmg [3], meth_pmg [4]])
met2[0].species = "X"

#substitute
benz_pmg.substitute(6, new_meth) #atom 6 is an H, while 0 was a C

and it seems to be working.