How can I convert xyz to full style lammps data file?

JeJoon_Yeon · October 20, 2015, 8:12pm

Dear all,

How can I get the full style lammps data file from xyz format?

I tried topotools, but the molecule IDs in the converted full style data file are wrong: every atom’s molecule ID is 1. In addition, I’m not sure whether the bond information in the VMD-created full style data file is correct. The bond list also looks strange.
I can’t use Moltemplate since our server’s Python is an old version, and I can’t upgrade it myself.
I searched Pizza.py and the other scripts in the tools folder, but found no script that converts an xyz file into a LAMMPS full style data file. (Maybe I missed one?)

So I’m not sure how I can get a full style data file from the xyz format. I think there should be a way to get a proper full style data file with topotools, but I’m not sure how to deal with the molecule IDs and the bond list.

Thank you very much

Best wishes,

Joon

akohlmey · October 20, 2015, 8:23pm


the main problem is, that a .xyz file has only information about atom
name/type and coordinates. no information about molecule id, bonds and so
on.
thus as a general rule, lost information is just that: gone. to give an
example, if i give you a black-and-white picture, you cannot (immediately)
tell, what color an item has.

however, if you make certain assumptions, you can use some heuristics to
recover (some) of the information (same as in the black-and-white picture,
you can make assumptions about the colors of skin, the sky, grass etc.). so
you can make a guess about bonding information based on assigning atom
radii to elements and then using distance criteria, infer angles/dihedrals
from the bond topology, re-assign atom types (if needed), infer
bond/angle/dihedral types from the combination of atom types, guess box
dimensions from min/max extent of the atoms and so on.
to get a "full" data file, you have the problem, that charges may be very
difficult to recover, unless they are uniquely defined by the atom type
(which often is not the case).
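
A minimal Python sketch of the heuristics described above: guess bonds from element radii and a distance criterion, infer angles from the bond topology, and guess the box from the min/max extent of the atoms. The radii table, the `tolerance` factor and all function names are illustrative assumptions, not what VMD or topotools use internally, and the pair search is a plain O(N^2) loop without periodic boundaries.

```python
import math
from itertools import combinations

# Approximate covalent radii in Angstrom (illustrative; extend as needed).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def parse_xyz(text):
    """Return a list of (element, (x, y, z)) from the text of an .xyz file."""
    lines = text.strip().splitlines()
    natoms = int(lines[0])
    atoms = []
    for line in lines[2:2 + natoms]:  # the second line is a free-form comment
        name, x, y, z = line.split()[:4]
        atoms.append((name, (float(x), float(y), float(z))))
    return atoms


def guess_bonds(atoms, tolerance=1.2):
    """Bond two atoms if their distance is below tolerance * (r_i + r_j)."""
    bonds = []
    for (i, (ei, pi)), (j, (ej, pj)) in combinations(enumerate(atoms), 2):
        cutoff = tolerance * (COVALENT_RADII[ei] + COVALENT_RADII[ej])
        if math.dist(pi, pj) < cutoff:
            bonds.append((i + 1, j + 1))  # atom IDs start at 1
    return bonds


def infer_angles(bonds):
    """Every pair of neighbors of a central atom forms one angle."""
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    angles = []
    for center in sorted(neighbors):
        for a, c in combinations(sorted(neighbors[center]), 2):
            angles.append((a, center, c))
    return angles


def guess_box(atoms, padding=0.0):
    """Bounding box from the min/max coordinates, optionally padded."""
    box = []
    for axis in range(3):
        values = [pos[axis] for _, pos in atoms]
        box.append((min(values) - padding, max(values) + padding))
    return box


WATER = """3
water
O  0.0000  0.0000  0.0000
H  0.9572  0.0000  0.0000
H -0.2400  0.9266  0.0000
"""

atoms = parse_xyz(WATER)
bonds = guess_bonds(atoms)
print(bonds)
print(infer_angles(bonds))
print(guess_box(atoms, padding=5.0))
```

Note that a planar molecule has zero extent along one axis, so some padding is needed to get a usable box. Nothing here recovers charges or force-field atom types, which is exactly the information the xyz file does not carry.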

in short, most likely it is not possible unless you are very, very lucky
and very, very smart in coming up with good heuristics.
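
For comparison, writing the file is the easy part once the guesses exist. The sketch below emits a minimal data file for atom_style full, where each line of the Atoms section reads atom-ID molecule-ID atom-type q x y z. The function name and arguments are hypothetical, all bonds are lumped into a single bond type, and the charges default to 0.0 purely as placeholders, so the result still needs real charges and bond types before it is physically meaningful.

```python
def write_full_data(atoms, bonds, mol_ids, box, masses, charges=None):
    """Return the text of a minimal LAMMPS data file for atom_style full.

    atoms   : list of (element, (x, y, z))
    bonds   : list of (atom_id_1, atom_id_2), 1-based
    mol_ids : one molecule ID per atom
    box     : [(xlo, xhi), (ylo, yhi), (zlo, zhi)]
    masses  : dict mapping element name to mass; defines the atom types
    charges : one charge per atom; 0.0 placeholders if omitted
    """
    # Assumption: one atom type per element name, numbered alphabetically.
    types = {name: i + 1 for i, name in enumerate(sorted(masses))}
    if charges is None:
        charges = [0.0] * len(atoms)  # placeholder, NOT real charges

    out = ["LAMMPS data file (topology guessed from xyz)", ""]
    out.append(f"{len(atoms)} atoms")
    out.append(f"{len(bonds)} bonds")
    out.append(f"{len(types)} atom types")
    out.append(f"{1 if bonds else 0} bond types")  # simplification: one type
    out.append("")
    for (lo, hi), axis in zip(box, "xyz"):
        out.append(f"{lo:.6f} {hi:.6f} {axis}lo {axis}hi")

    out += ["", "Masses", ""]
    for name, type_id in types.items():
        out.append(f"{type_id} {masses[name]}")

    out += ["", "Atoms # full", ""]
    rows = zip(atoms, mol_ids, charges)
    for atom_id, ((name, (x, y, z)), mol, q) in enumerate(rows, 1):
        out.append(
            f"{atom_id} {mol} {types[name]} {q:.4f} {x:.6f} {y:.6f} {z:.6f}"
        )

    if bonds:
        out += ["", "Bonds", ""]
        for bond_id, (a, b) in enumerate(bonds, 1):
            out.append(f"{bond_id} 1 {a} {b}")  # bond-ID type atom1 atom2
    return "\n".join(out) + "\n"


water = [
    ("O", (0.0, 0.0, 0.0)),
    ("H", (0.9572, 0.0, 0.0)),
    ("H", (-0.24, 0.9266, 0.0)),
]
text = write_full_data(
    water,
    bonds=[(1, 2), (1, 3)],
    mol_ids=[1, 1, 1],
    box=[(-5.0, 5.0)] * 3,
    masses={"H": 1.008, "O": 15.999},
)
print(text)
```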

axel.

JeJoon_Yeon · October 20, 2015, 8:34pm

Ok. I thought there might be a way for VMD topotools to “properly” calculate and print out the molecule IDs and the bond list, but it seems there is not.

Then how can I get the molecule IDs and the bond list? Is there any good software for this?

akohlmey · October 20, 2015, 8:45pm


you don't seem to understand the issue. you cannot easily recover
information that you don't have. this problem applies to *any* software. it
is not a problem of the software, but how you use it and what kind of
additional information you can provide about your system that will aid the
software to make better guesses. you need to know details about your system
and assign that information properly in order to recover a topology based
on heuristics. ...and even then things can go wrong.

there are suitable heuristics for a lot of that in VMD. as i have already
outlined, you can use atom radii and a distance search to auto-assign
bonds. once you have a bond topology and have told VMD to reanalyze the
topology information, you can use the fragment numbers (incremented by
one) as molecule ids. there are detailed examples for how to do that in the
topotools tutorials on my homepage. they also show where it is necessary
to apply corrections and use additional information about a system or the
force field to assign information that VMD doesn't have or where the
heuristics don't work correctly.
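
The fragment idea can be sketched outside VMD as well. The Python below is only an illustration of the principle, not the topotools implementation: it treats each connected component of the bond graph as one fragment and numbers the fragments from 1, which gives the same kind of result as taking zero-based fragment numbers and adding one. The function name is hypothetical.

```python
def molecule_ids(natoms, bonds):
    """Return one molecule ID per atom (atoms and IDs are 1-based).

    Each connected fragment of the bond graph becomes one molecule;
    an atom without any bonds ends up as a molecule of its own.
    """
    neighbors = {i: [] for i in range(1, natoms + 1)}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    mol = {}
    next_id = 0
    for start in range(1, natoms + 1):
        if start in mol:
            continue  # already reached from an earlier atom
        next_id += 1
        mol[start] = next_id
        stack = [start]
        while stack:  # depth-first walk over this fragment
            atom = stack.pop()
            for nb in neighbors[atom]:
                if nb not in mol:
                    mol[nb] = next_id
                    stack.append(nb)
    return [mol[i] for i in range(1, natoms + 1)]


# Two water molecules (atoms 1-3 and 4-6) plus one unbonded ion (atom 7).
bonds = [(1, 2), (1, 3), (4, 5), (4, 6)]
print(molecule_ids(7, bonds))
```

If every atom comes out with the same ID, the bond guess has probably connected everything into a single fragment, or no fragment analysis was done at all.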

that is pretty much as good as it gets.

axel.

Andrew_Jewett · October 20, 2015, 8:50pm

Axel is correct that xyz files contain only coordinate information. (They also contain atom type names, but I don’t think there is a standard atom type naming system that all XYZ files share, so this information is not very helpful.)

For comparison, PDB files are an ancient increasingly obsolete format. However, in some limited cases (biomolecules with standard residue types) there are conventions for atom-type names and residue-type names. (Actually, there are multiple conventions, unfortunately.) So, in principle, a program that specialized in converting PDB files into simulation input files could look up mass and force-field information, and infer bond topology and partial charges from pairwise atom distances and these PDB naming conventions.

As for the Python version, you could always create a local ~/bin directory and install a newer version of Python (such as python3) in that directory. Then you could change your $PATH environment variable to search ~/bin first. (I suppose this is frowned upon from a security standpoint, but you can do it.)
To do that you would put this in your ~/.bashrc file:

export PATH="$HOME/bin:$PATH"

(…assuming you use the BASH shell)

or you could put this in your ~/.tcshrc file:

setenv PATH "$HOME/bin:$PATH"

(…assuming you use the TCSH shell)

Cheers

Andrew

(Alternately, if you don’t want to mess with your PATH variable, you could edit the “moltemplate.sh” file and insert this command:
PYTHON_COMMAND="$HOME/bin/python3"

somewhere after line 30)

JeJoon_Yeon · October 20, 2015, 9:03pm

Thanks for the hints. I will definitely look into them.
I have never used the PDB format before, but I expect I will find something useful about PDB files.

Thank you.

Best,

Joon

Andrew_Jewett · October 20, 2015, 9:29pm

Sorry, to be clear: I was not recommending that you use PDB files.

(They are a very old file format. They are somewhat unreliable, and have severe size and usage limitations.)

The tool you should use to build your simulation input files depends on the kind of molecule you have.

What kind of molecules did you plan to simulate?

_James_Kress · October 21, 2015, 3:37pm

“For comparison, PDB files are an ancient increasingly obsolete format.”

Really? What is replacing it? I’m sure the people at the Protein Data Bank, Molecular Biologists and Biochemists would like to know so they can move to the new, preferred format.

Jim

James Kress Ph.D., President

The KressWorks^TM Foundation ©

An IRS Approved 501(c)(3) Charitable, Nonprofit Organization

“Engineering The Cure” ^TM

(248) 605-8770

Learn More and Donate At:

Website: http://www.kressworks.org

Facebook: https://www.facebook.com/KressWorks-Foundation-118648221550330/timeline/

Twitter: @KressWorksFnd

Confidentiality Notice | This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential or proprietary information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, immediately contact the sender by reply e-mail and destroy all copies of the original message.

akohlmey · October 21, 2015, 6:13pm


PDBx/mmCIF has been advertised as the successor format for a very long
time. it has become the official standard format last year and PDB will be
phased out next year.

http://mmcif.wwpdb.org/docs/faqs/pdbx-mmcif-faq-general.html

axel.

_James_Kress · October 22, 2015, 12:47am

Thanks Axel.

Another impediment to getting real work done. Hopefully there will be translation/ conversion software available to “painlessly” allow us to make the transformation without a significant investment in time, money and training.

Will LAMMPS use the new format in its native form or will it require major user intervention using other tools to go from PDBx/mmCIF to LAMMPS and back again?

Jim

akohlmey · October 22, 2015, 1:58am

“Another impediment to getting real work done. Hopefully there will be
translation/conversion software available to “painlessly” allow us to make
the transformation without a significant investment in time, money and
training.”

that is not really a concern for the LAMMPS developers i know.

“Will LAMMPS use the new format in its native form or will it require major
user intervention using other tools to go from PDBx/mmCIF to LAMMPS and
back again?”

LAMMPS doesn't support PDB natively either, so i don't see how this changes
anything w.r.t. LAMMPS.
features that are available in LAMMPS have 3 major origins:
- somebody thinks that something is a cool idea with a lot of "hack value"
and implements it for the fun of doing it.
- somebody collaborates with people that need a particular feature and
thus does the implementation as part of the collaboration
- somebody needs a particular feature personally and thus goes about and
implements it.

mmCIF is a much uglier file format than PDB, so there isn't any hack
value in it; i don't know anybody hacking LAMMPS that is collaborating with
somebody using PDB data or needing that personally, so my guesstimate is
that the probability of having a "painless" LAMMPS vs. mmCIF two-way
converter tool is very close to 0.

axel.

Andrew_Jewett · October 22, 2015, 2:53am


Agreed. (Unless some knowledgeable LAMMPS user is forced to start doing
all-atom MD on proteins and is too proud to resort to buying an AMBER
license...)

On the topic of file formats, there is also "PDBML", which is based on
XML. Some people like it because those files appear to be somewhat easier
to read and to parse than mmCIF. Although it is not the
official PDB database, www.rcsb.org and other databases allow users to
download structures directly in PDBML, and I am under the impression there
is public software to reliably convert PDBx/mmCIF files into PDBML.

Yep. It is truly an impediment to getting work done. I think it accounts for
a substantial fraction of how many people in computational chemistry spend
their time. No escape, as Axel would say.

Cheers

Andrew

P.S.
Actually, I explored the idea of adding fully-automated PDB reading
support to moltemplate, (one of the molecule builder tools for LAMMPS).
For a couple weeks I began writing some python scripts to do that. I gave
up eventually because it can be really hard to assign atom charges without
using 3rd-party tools. (I think AmberTools does that by invoking mopac to
perform a quantum calculation.) Also, most PDB files lack hydrogen atoms
and may have other issues that need to be fixed before you can attempt to
use them to create a simulation input file. There were just too many messy
details. PDB conversion is a field unto itself beyond what I want to know.

I remember the AmberTools conversion tools are reasonable to use, and
there are a lot of tutorials how to use them. Gromacs is open-source.
Lately I've been curious about the more modern OpenMM software and the
tools it comes with to deal with PDB files. I think it is completely
open-source, and I think it supports a variety of force-fields. In
summary, there are other simulation software packages that are more
convenient for simulating all-atom biomolecules, compared to LAMMPS.
(Although I prefer LAMMPS.)


_James_Kress · October 22, 2015, 3:46pm

“is too proud to resort to buying an AMBER license.”

Or refuses to spend precious research funding on a non-reactive MD implementation that doesn’t simulate all aspects of the processes going on in a biological/ biochemical system.

And yes, I was using (and probably still will use) LAMMPS for bio systems simulation.

Jim

Carlos_Campana · October 22, 2015, 4:19pm

“is too proud to resort to buying an AMBER license.”

“Or refuses to spend precious research funding on a non-reactive MD
implementation that doesn’t simulate all aspects of the processes going on
in a biological/ biochemical system.”

Disclaimer: My next few lines do not aim to start an argument by any
means.

I am yet to see a "reactive" classical force field that can endure a
bearable degree of parameter transferability. Change a bit the simulation
conditions or toss in some new species in the mix and chances are one will
have to completely reparametrize the force field. They all rely too heavily
on empirical expressions. Nothing wrong with this but I just cannot see the
current models delivering enough to satisfy your tall order from above and
quote "...simulate all aspects of the processes going on in a biological/
biochemical system".
Reactivity is not even needed to study the core of certain biological
problems, i.e. protein folding, to a reasonable degree of accuracy. Of
course, one's research $ is to be deployed at will according to individual
goals.
Carlos

_James_Kress · October 22, 2015, 9:25pm

“I am yet to see a “reactive” classical force field that can endure a bearable degree of parameter transferability. Change a bit the simulation conditions or toss in some new species in the mix and chances are one will have to completely reparametrize the force field.”

The same can be said for classical force fields. If it was easy, it would already be done.

“Reactivity is not even needed to study the core of certain biological problems, i.e. protein folding, to a reasonable degree of accuracy.”

Then you are not doing CHEMISTRY. You are doing classical mechanics. Have you ever looked at the energy density of states surrounding the HOMO and LUMO associated even with simple proteins? The ability to move from one to another involves small amounts of energy. At that point, conical intersections come into play, bonds break, reform and the molecular structure changes (i.e. folding, allostery). None of these phenomena are manifested in classical mechanics but are critical in Chemistry – and thus in the processes that occur in the cell, including folding, docking, etc… Reactive MD at least gives you the opportunity to take that into account.

“Nothing wrong with this but I just cannot see the current models delivering enough to satisfy your tall order from above and quote “…simulate all aspects of the processes going on in a biological/ biochemical system”.”

We will be publishing soon, after our patents clear.

Jim

Carlos_Campana · October 22, 2015, 9:56pm

“I am yet to see a "reactive" classical force field that can endure a
bearable degree of parameter transferability. Change a bit the simulation
conditions or toss in some new species in the mix and chances are one will
have to completely reparametrize the force field.”

“The same can be said for classical force fields. If it was easy, it would
already be done.”

True, but my comment was only intended to highlight that fact. It has been
done, actually: it's called quantum mechanics (give or take a few
approximations). The hurdle is that our computing tools cannot handle the
quantum numerics in real time for most systems. Thus empirical
potentials come up, ad hoc, with a bunch of fitting parameters, and are
therefore limited in their application range.

“Reactivity is not even needed to study the core of certain biological
problems, i.e. protein folding, to a reasonable degree of accuracy.”

“Then you are not doing CHEMISTRY. You are doing classical mechanics.
Have you ever looked at the energy density of states surrounding the HOMO
and LUMO associated even with simple proteins? The ability to move from
one to another involves small amounts of energy. At that point, conical
intersections come into play, bonds break, reform and the molecular
structure changes (i.e. folding, allostery). None of these phenomena are
manifested in classical mechanics but are critical in Chemistry – and thus
in the processes that occur in the cell, including folding, docking, etc.
Reactive MD at least gives you the opportunity to take that into account.”

I am not sure what you are trying to teach me here. All I said is that there
are problems one can tackle without considering the reactive nature of
matter. Whether those problems lack importance is your personal judgement,
not mine.

“Nothing wrong with this but I just cannot see the current models
delivering enough to satisfy your tall order from above and quote
"...simulate all aspects of the processes going on in a biological/
biochemical system".”

“We will be publishing soon, after our patents clear.”

Only the passage of time can tell when a model/theory/invention has what it
takes to be considered a breakthrough. I look forward to reading your
contributions a few years down the road once they've been given the
opportunity to show what they can deliver.
Best,
Carlos