[lammps-users] dump to xyz file - atom symbols

dump xyz writes the numeric atom type instead of the element symbol in the
first column. xyz files usually use element symbols as atom names:
http://openbabel.org/wiki/XYZ_(format)
and some visualization programs rely on it.

I wrote some code for myself that dumps a file with symbol names. Since
lammps doesn't know names, I use masses to guess symbols. Obviously it
won't work in all cases, but it can be very useful, because the .xyz
file doesn't need to be converted to .xyz with symbols and it shows if
the masses were correctly assigned to atoms (it's easy to mismatch
data file and lammps script and have e.g. 1=Si 2=C in one and 1=C,
2=Si in the other).

If you think it can be useful, I'll try to add it as a feature of the dump command.

Marcin

marcin,

> dump xyz writes the numeric atom type instead of the element symbol in the
> first column. xyz files usually use element symbols as atom names:
> http://openbabel.org/wiki/XYZ_(format)
> and some visualization programs rely on it.

in general, i would strongly recommend to not use the .xyz file format
in the first place if it can be avoided. you lose too much information
and it is non-standard (there are many variations) and text files are
generally slow to read.

> I wrote some code for myself that dumps a file with symbol names. Since
> lammps doesn't know names, I use masses to guess symbols. Obviously it

that would only work if your "atoms" are proper atoms and have the
proper masses, but that is also difficult. there are slightly
different conventions on what is the proper mass for a hydrogen,
carbon etc. and how do you handle united atoms or other systems.
also some people would prefer to have some arbitrary text label.

for (bio)molecular systems, i usually prepare a .psf file with
the information that i want to assign to atoms (and i can
have atom name and type) and then use the .dcd + .psf combination
in analysis and visualization.

> won't work in all cases, but it can be very useful, because the .xyz
> file doesn't need to be converted to .xyz with symbols and it shows if
> the masses were correctly assigned to atoms (it's easy to mismatch
> data file and lammps script and have e.g. 1=Si 2=C in one and 1=C,
> 2=Si in the other).

this is more a question of good data management. i.e. if you have
one directory per job, and use proper names and perhaps a README
file per directory listing the actions that you did, you wouldn't
easily mix up calculations. i routinely use binary trajectory files
where there is no indication of what is written where...

> If you think it can be useful, I'll try to add it as a feature of the dump command.

well, it is not up to me to decide what goes into the code or not.

cheers,
   axel.

> in general, i would strongly recommend to not use the .xyz file format
> in the first place if it can be avoided. you lose too much information

Well, I'm not encouraging anyone to use it.

>> I wrote some code for myself that dumps a file with symbol names. Since
>> lammps doesn't know names, I use masses to guess symbols. Obviously it
>
> that would only work if your "atoms" are proper atoms and have the

Yes. Although you could also add things like CH4 to the atom-mass table.

> proper masses, but that is also difficult. there are slightly
> different conventions on what is the proper mass for a hydrogen,

In my code I assign a symbol that has the closest mass to the given
mass, so slightly different mass is not a problem.
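
A minimal sketch of the closest-mass lookup described above, written in Python rather than inside the dump code. This is not the code discussed in this thread: the mass table is a small illustrative subset, and the names `ATOMIC_MASSES` and `guess_symbol` are made up for the example.

```python
# Illustrative subset of standard atomic masses (g/mol). A real table
# would need every element that can appear in the simulation.
ATOMIC_MASSES = {
    "H": 1.008,
    "C": 12.011,
    "N": 14.007,
    "O": 15.999,
    "Al": 26.982,
    "Si": 28.086,
}


def guess_symbol(mass, table=ATOMIC_MASSES):
    """Return the symbol whose tabulated mass is closest to `mass`."""
    return min(table, key=lambda sym: abs(table[sym] - mass))


# Masses as they might appear in a data file, keyed by LAMMPS atom type.
type_masses = {1: 28.0855, 2: 12.0107}
type_symbols = {t: guess_symbol(m) for t, m in type_masses.items()}
print(type_symbols)  # {1: 'Si', 2: 'C'}
```

Because the lookup always returns the nearest entry, it never fails outright; a mass that belongs to no element in the table is silently mapped to its nearest neighbour, which is worth keeping in mind when reading the output.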

> carbon etc. and how do you handle united atoms or other systems.
> also some people would prefer to have some arbitrary text label.

There are visualization programs that can understand atom symbols, and
that's why I like to have symbols in the file. If you use arbitrary
text labels, I suppose they won't be of any help to a visualization
program anyway.

> for (bio)molecular systems, i usually prepare a .psf file with
> the information that i want to assign to atoms (and i can
> have atom name and type) and then use the .dcd + .psf combination
> in analysis and visualization.

OK, no doubt .xyz is not the best format for everything.

>> If you think it can be useful, I'll try to add it as a feature of the dump command.
>
> well, it is not up to me to decide what goes into the code or not.

I'm glad that you wrote what you think about my proposal, although I'm
not sure what your conclusions are.
xyz is not a perfect format, and the symbol guessing, as I wrote,
won't work in all cases.
Perhaps I didn't write it explicitly, but since it sometimes wouldn't
work, I meant it as an optional feature, perhaps triggered by
an additional argument to the dump command.

Marcin

> Yes. Although you could also add things like CH4 to the atom-mass table.

and then confuse it with oxygen???
or CH2 with nitrogen?
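
To put numbers on this objection: with standard atomic masses, a united-atom CH4 site is within about 0.05 mass units of oxygen, and a CH2 site within about 0.02 of nitrogen. A closest-mass lookup can separate them only if the input masses are given to that precision. A small Python check (the helper name `group_mass` is just for illustration):

```python
# Standard atomic masses (g/mol) for the elements involved.
MASSES = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def group_mass(carbons, hydrogens):
    """Mass of a united-atom CHn site built from C and H."""
    return carbons * MASSES["C"] + hydrogens * MASSES["H"]


ch4 = group_mass(1, 4)  # about 16.043
ch2 = group_mass(1, 2)  # about 14.027

print(f"CH4 {ch4:.3f} vs O {MASSES['O']:.3f}: differ by {abs(ch4 - MASSES['O']):.3f}")
print(f"CH2 {ch2:.3f} vs N {MASSES['N']:.3f}: differ by {abs(ch2 - MASSES['N']):.3f}")
```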

>> proper masses, but that is also difficult. there are slightly
>> different conventions on what is the proper mass for a hydrogen,
>
> In my code I assign a symbol that has the closest mass to the given
> mass, so slightly different mass is not a problem.

>> carbon etc. and how do you handle united atoms or other systems.
>> also some people would prefer to have some arbitrary text label.
>
> There are visualization programs that can understand atom symbols, and
> that's why I like to have symbols in the file. If you use arbitrary
> text labels, I suppose they won't be of any help to a visualization
> program anyway.

au contraire, there are a lot of reasons to differentiate
between different atoms of the same element. e.g. atoms in
different regions or in different compounds.

>> for (bio)molecular systems, i usually prepare a .psf file with
>> the information that i want to assign to atoms (and i can
>> have atom name and type) and then use the .dcd + .psf combination
>> in analysis and visualization.
>
> OK, no doubt .xyz is not the best format for everything.

exactly. it is more like the lowest common denominator.

>>> If you think it can be useful, I'll try to add it as a feature of the dump command.
>>
>> well, it is not up to me to decide what goes into the code or not.
>
> I'm glad that you wrote what you think about my proposal, although I'm
> not sure what your conclusions are.

my conclusion is that rather than trying to improve output in
a file format that has many known deficiencies, i'd rather try
to make the programs that don't read the native lammps format better.

e.g., i just had an off-list discussion and received some code fragments
that will, in combination with some additional changes, help make
the native text mode LAMMPS support in VMD much better. this way
more information is preserved and passed on.

> xyz is not a perfect format, and the symbol guessing, as I wrote,
> won't work in all cases.
> Perhaps I didn't write it explicitly, but since it sometimes wouldn't
> work, I meant it as an optional feature, perhaps triggered by
> an additional argument to the dump command.

that is what i understood. i just wanted to point out that if you
add a change like this, it should be as generic as possible, and
i know from my work on the VMD plugins, that mass based guessing
is a matter of last resort and fails quite often.

cheers,
   axel.

Marcin,

While I know the benefit of having an output file that has
symbol names instead of numerical names (I've had problems with
VMD truncating my numerical identifiers and then refusing to
accept 11, 12, and 13 as distinct types from 1 for default
coloring purposes), I'm unclear why you want to do it as a flag
specific to xyz format instead of as part of the post
processing. Maybe I'm biased because I do a lot of
postprocessing anyway so the extra overhead of translating my
numerical names to characters for visualization is negligible.

If you want to include the option for the convenience of
others, I think you have to do something slightly more
complicated than a mass conversion. I often have multiple
kinds of atoms that have the same mass (for example, I have a
current simulation that has eight kinds of carbon, four kinds
of oxygen, and three kinds of nitrogen) and I need each one to
be recognized as different for visualization purposes. VMD is
fine with using different characters for that, but merely
mapping 12.011 to C would not help me.

Another consideration is that I often need to take the
configuration from a dump file, manipulate it, and then stick
it back into a data file to be read again. Sure, I can write a
script to back translate from character to integer or if you do
modify the dump to have the character option, you could modify
the read data subroutine to include a read atom type as
character option.
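
A sketch of what such a back-translation script has to do, assuming the type-to-label map used when the file was written is still known. `invert_labels` is a hypothetical helper, not part of LAMMPS. It also shows the limitation raised above: if two types were written with the same label, the integer type cannot be recovered.

```python
def invert_labels(type_to_label):
    """Build a label -> integer type map, refusing ambiguous labels."""
    label_to_type = {}
    for atom_type, label in type_to_label.items():
        if label in label_to_type:
            raise ValueError(
                f"label {label!r} is used by types "
                f"{label_to_type[label]} and {atom_type}"
            )
        label_to_type[label] = atom_type
    return label_to_type


# Works when every type has its own label.
print(invert_labels({1: "Si", 2: "C"}))  # {'Si': 1, 'C': 2}

# With several carbon types all written as "C", the call below would
# raise ValueError, because the original types are no longer distinct:
# invert_labels({1: "C", 2: "C", 3: "O"})
```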

However, I think, and like Axel I have no control over what
goes into LAMMPS, that you seem to want to increase complexity
for a minor gain.

Joanne Budzien

>> There are visualization programs that can understand atom symbols, and
>> that's why I like to have symbols in the file. If you use arbitrary
>> text labels, I suppose they won't be of any help to a visualization
>> program anyway.
>
> au contraire, there are a lot of reasons to differentiate
> between different atoms of the same element. e.g. atoms in
> different regions or in different compounds.

I hate to write it again and again: obviously changing numbers to
symbols is not a good idea in every case.
Giving a case where it is not useful doesn't add to the discussion.
If you want an xyz file with numbers as atom types, you
simply do what you did before and you get such a file.

>>> for (bio)molecular systems, i usually prepare a .psf file with
>>> the information that i want to assign to atoms (and i can
>>> have atom name and type) and then use the .dcd + .psf combination
>>> in analysis and visualization.
>>
>> OK, no doubt .xyz is not the best format for everything.
>
> exactly. it is more like the lowest common denominator.

I was trying to say that there is no need to write about it, because
nobody disagrees.

>>>> If you think it can be useful, I'll try to add it as a feature of the dump command.
>>>
>>> well, it is not up to me to decide what goes into the code or not.
>>
>> I'm glad that you wrote what you think about my proposal, although I'm
>> not sure what your conclusions are.
>
> my conclusion is that rather than trying to improve output in
> a file format that has many known deficiencies, i'd rather try
> to make the programs that don't read the native lammps format better.

It's good that you are trying to improve other programs, good luck.

> e.g., i just had an off-list discussion and received some code fragments
> that will, in combination with some additional changes, help make
> the native text mode LAMMPS support in VMD much better. this way
> more information is preserved and passed on.

cool

>> xyz is not a perfect format, and the symbol guessing, as I wrote,
>> won't work in all cases.
>> Perhaps I didn't write it explicitly, but since it sometimes wouldn't
>> work, I meant it as an optional feature, perhaps triggered by
>> an additional argument to the dump command.
>
> that is what i understood. i just wanted to point out that if you
> add a change like this, it should be as generic as possible, and
> i know from my work on the VMD plugins, that mass based guessing
> is a matter of last resort and fails quite often.

It may depend on what you are simulating. I don't think this will
often fail when simulating ceramics, at least it works fine for me. I
can see this is not an elegant approach, but I don't see a better one.

Marcin

> Marcin,
>
> While I know the benefit of having an output file that has
> symbol names instead of numerical names (I've had problems with
> VMD truncating my numerical identifiers and then refusing to
> accept 11, 12, and 13 as distinct types from 1 for default
> coloring purposes), I'm unclear why you want to do it as a flag
> specific to xyz format instead of as part of the post
> processing.

IMHO writing it directly would be a bit simpler, but I also have
scripts that can do it as postprocessing.
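
For comparison, a post-processing version can be quite short. The sketch below assumes the xyz layout of an atom-count line, a comment line, and then one `type x y z` line per atom, repeated for each frame. It takes the type-to-label map explicitly instead of guessing it from masses. `relabel_xyz` is a hypothetical name, not one of the scripts mentioned here.

```python
def relabel_xyz(text, type_to_label):
    """Replace the numeric type in column one of each atom line with a label.

    Fields are re-joined with single spaces, so column alignment is not kept.
    """
    lines = text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # skip blank lines between frames
            continue
        natoms = int(lines[i].split()[0])
        out.append(lines[i])      # atom count
        out.append(lines[i + 1])  # comment line, copied unchanged
        for line in lines[i + 2 : i + 2 + natoms]:
            fields = line.split()
            label = type_to_label[int(fields[0])]
            out.append(" ".join([label] + fields[1:]))
        i += 2 + natoms
    return "\n".join(out) + "\n"


sample = "2\nAtoms\n1 0.0 0.0 0.0\n2 1.5 1.5 1.5\n"
print(relabel_xyz(sample, {1: "Si", 2: "C"}), end="")
```

An explicit map avoids the mass-guessing problems entirely, at the cost of having to keep the map next to the trajectory.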

> If you want to include the option for the convenience of
> others, I think you have to do something slightly more
> complicated than a mass conversion. I often have multiple
> kinds of atoms that have the same mass (for example, I have a
> current simulation that has eight kinds of carbon, four kinds
> of oxygen, and three kinds of nitrogen) and I need each one to
> be recognized as different for visualization purposes. VMD is
> fine with using different characters for that, but merely
> mapping 12.011 to C would not help me.

I didn't do it in the code I have, but it should be easy to check whether
several atom types map to the same symbol and add numbers in such a case,
e.g. C1, C2, ...
BTW, I'm using AtomEye for visualization. It's a very fast program
that can handle millions of atoms, and if it can recognize the atom symbol,
it has reasonable default settings for the atom types I use (size, color,
and the distance within which two atoms must be to draw a bond between them).
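
One way the numbering could look, as a Python sketch. `unique_labels` is a hypothetical helper, and whether a given viewer still treats `C1` as carbon is something to check per program.

```python
from collections import Counter


def unique_labels(type_symbols):
    """Map atom type -> label, numbering symbols shared by several types."""
    counts = Counter(type_symbols.values())
    seen = Counter()
    labels = {}
    for atom_type in sorted(type_symbols):
        symbol = type_symbols[atom_type]
        if counts[symbol] > 1:
            seen[symbol] += 1
            labels[atom_type] = f"{symbol}{seen[symbol]}"
        else:
            labels[atom_type] = symbol  # unique symbols stay unchanged
    return labels


print(unique_labels({1: "C", 2: "C", 3: "O", 4: "N", 5: "C"}))
# {1: 'C1', 2: 'C2', 3: 'O', 4: 'N', 5: 'C3'}
```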

> Another consideration is that I often need to take the
> configuration from a dump file, manipulate it, and then stick
> it back into a data file to be read again.

I'm also doing it sometimes, but I suppose you are not using xyz
format for this.

> Sure, I can write a
> script to back translate from character to integer or if you do
> modify the dump to have the character option, you could modify
> the read data subroutine to include a read atom type as
> character option.

I'd like such a modification, but this would be more complicated and
I'm not sure if LAMMPS developers would like it.
I'm only an ordinary user who is trying to submit a small improvement.

> However, I think, and like Axel I have no control over what
> goes into LAMMPS, that you seem to want to increase complexity
> for a minor gain.

yes, it's a minor increase in complexity for a minor gain. IMO the
gain is worth it, but YMMV.

Cheers,
Marcin

I have no substantive input on this thread, other than 2 small comments:

1) Someone sent me some dump code to output CFG files (for AtomEye)
which I think does do symbols. I won't be able to look at it for a couple
weeks, but I think it does the symbols via the dump_modify command.

2) Rather than have LAMMPS output N formats, I think a better strategy
is for it to output a few common formats, and have people create good
post-processing tools that will convert to whatever people want. I'm happy
to distribute those tools with LAMMPS if people contribute them.

Steve