I used these commands to find the radial distribution function:

```
compute myRDF all rdf 100
fix 1 all ave/time 100 1 100 c_myRDF file tmp.rdf mode vector
```

The output file has 4 columns: the first is the bin number, the second is the bin's coordinate, and the third and fourth columns are g(r) and the coordination number, respectively.

To plot the RDF, I simply need to plot the 2nd and 3rd columns. But the maximum value of g(r) is pretty high, e.g. 1300.

Do I need to scale g(r) by the density of the system (number of atoms / volume)?

Also, the LAMMPS documentation says: "A coordination number coord(r) is the sum of g(r) values for all bins up to and including the current bin."

But in my output, the sum of the 3rd column is not equal to the last value in the 4th column (the coordination number for the last bin).

Is there something wrong with the way I am interpreting the RDF output data?

> I used these commands to find the radial distribution function:
>
>     compute myRDF all rdf 100
>     fix 1 all ave/time 100 1 100 c_myRDF file tmp.rdf mode vector
>
> The output file has 4 columns: the first is the bin number, the second is the bin's coordinate, and the third and fourth columns are g(r) and the coordination number, respectively.
>
> To plot the RDF, I simply need to plot the 2nd and 3rd columns. But the maximum value of g(r) is pretty high, e.g. 1300.

So? Have you looked at what g(r) represents? It is not impossible for it to be very large under certain circumstances. So it is not the absolute value that you need to look at, but whether the value is consistent with its physical interpretation.

> Do I need to scale g(r) by the density of the system (number of atoms / volume)?

See my previous comment.

> Also, the LAMMPS documentation says: "A coordination number coord(r) is the sum of g(r) values for all bins up to and including the current bin."
>
> But in my output, the sum of the 3rd column is not equal to the last value in the 4th column (the coordination number for the last bin).
>
> Is there something wrong with the way I am interpreting the RDF output data?

Are you by any chance doing variable cell simulations?

Axel.

As Axel said, you should know the physical interpretation (and definition) of g(r) when computing it if you want to make sense of the outputs; knowing the definition answers nearly all of your questions. That said, there are two reasons I can think of for large peak values in g(r). The first is a perfect storm of small system size, too small a bin size, and not sampling long enough; in this case your large peaks are simply fluctuations (this should be easy to tell if you know the interpretation of g(r)). The second is a relatively dilute system that is condensing. In any case, it is not likely that LAMMPS is computing your RDF output incorrectly.
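A quick way to build intuition for the physical interpretation being discussed: for uncorrelated (ideal-gas) points, g(r) should come out close to 1 at every distance. Below is a minimal sketch of the standard histogram-based calculation on a toy system (random points in a periodic box; the numbers are illustrative and not from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)
L = 10.0                 # cubic box side of a toy "ideal gas"
N = 1000                 # number of uncorrelated random points
pos = rng.random((N, 3)) * L

# All pair separations with the minimum-image convention.
d = pos[:, None, :] - pos[None, :, :]
d -= L * np.round(d / L)
r = np.sqrt((d ** 2).sum(axis=-1))
r = r[np.triu_indices(N, k=1)]          # count each pair once

nbins, rmin, rmax = 50, 0.5, 3.0
counts, edges = np.histogram(r, bins=nbins, range=(rmin, rmax))
rc = 0.5 * (edges[1:] + edges[:-1])     # bin centers
dr = edges[1] - edges[0]
shell = 4.0 * np.pi * rc ** 2 * dr      # spherical shell volumes
rho = N / L ** 3                        # number density

# pairs per atom in each bin, per shell volume, relative to bulk density
g = 2.0 * counts / N / shell / rho
print(g.mean())                          # close to 1 for an ideal gas
```

For a structured system the same normalization produces peaks above 1 at the neighbor shells, but only if the density `rho` in the denominator actually reflects the material, which is the crux of the questions above.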

1. Coordination number: The definition of the coordination number in the documentation is slightly off. Instead of:

"A coordination number coord(r) is also calculated, which is the sum of g(r) values for all bins up to and including the current bin."

it should read:

"A coordination number coord(r) is also calculated, which is the number of atoms of type jtypeN within the current bin or closer, averaged over atoms of type itypeN. This is calculated as the volume or area sum of g(r) values over all bins up to and including the current bin, scaled by the average volume density of atoms of type jtypeN."

2. Large rdf value: As Eric suggested, arbitrarily large values of the rdf can be produced by an ordered cluster in a box that is mostly empty. In such cases, the LAMMPS calculation of the rdf is not going to be realistic, but the coordination number is still good. Here is a minimal script that demonstrates the problem for a single pair of atoms in a big box. Note that the coordination number is still spot on.

```
[[email protected] src]$ more in.pair
# RDF for a single pair of atoms
units lj
atom_style atomic
atom_modify sort 0 0
region box block 0 1.0e6 0 1.0e6 0 1.0e6
create_box 1 box
create_atoms 1 single 0 0 0
create_atoms 1 single 0.995 0 0
mass * 1.0
pair_style lj/cut 1.0
pair_coeff * * 0.0 0.0 1.0
neighbor 0.0 nsq
compute myRDF all rdf 100
fix avrdf all ave/time 100 1 100 c_myRDF file rdf.dat mode vector
run 0 pre no post no

[[email protected] src]$ lmp_mac_mpi -in in.pair -echo none
LAMMPS (12 Aug 2013)
Created orthogonal box = (0 0 0) to (1e+06 1e+06 1e+06)
  1 by 1 by 1 MPI processor grid
Created 1 atoms
Created 1 atoms
Setting up run ...
Memory usage per processor = 2.33475 Mbytes
Step Temp E_pair E_mol TotEng Press
0 0 0 0 0 0
Loop time of 7.86781e-06 on 1 procs for 0 steps with 2 atoms

[[email protected] src]$ tail -2 rdf.dat
99 0.985 0 0
100 0.995 4.01893e+18 1
```
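The last output line can be checked by hand against the corrected definition of coord(r): the volume sum of g(r) over bins, scaled by the density of type-j atoms. A small sketch, assuming the bin layout implied by the script (100 bins over the cutoff of 1.0, so a bin width of 0.01):

```python
import math

r, g = 0.995, 4.01893e18   # bin center and g(r) from the last output line
dr = 1.0 / 100             # bin width: 100 bins over the cutoff 1.0
rho = 2 / 1.0e6 ** 3       # density of type-j atoms: 2 atoms in a (1e6)^3 box

shell = 4 * math.pi * r ** 2 * dr   # volume of this spherical shell
coord = g * shell * rho             # every other bin contributes g(r) = 0
print(round(coord, 2))              # 1.0, matching column 4 of rdf.dat
```

This is also why the enormous g(r) value and the sensible coordination number are consistent with each other: the tiny density `rho` cancels out of the coord(r) sum.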

Thanks to all.

I am going to compare my g(r) with experiments to find the crystallographic structure (BCT, HX, or WZ). The maximum value of g(r) in the experiments is 11 for the WZ structure, but mine is 1300 (LAMMPS). This is the reason I cannot interpret the g(r) result from LAMMPS.

Also, I believe LAMMPS calculates exactly the radial distribution function I want:

g(r) = (number of atoms in nth bin / volume of nth bin) / (system density)

In my model the simulation box is scaled to model tensile loading, and I have a periodic boundary condition only along the length of the NW.

So you are right, Axel: I am doing a variable cell simulation. Does it cause any problem in calculating g(r)?

Eric and Aidan:

Yes, I used a very large box to capture the surface effect in the lower dimensions of the NW, so this may be the reason for the high peaks in g(r).

As for the bin size, I do not think so, since I used different bin sizes to check the sensitivity of my results, and in all cases I got high peaks.

> Thanks to all.
>
> I am going to compare my g(r) with experiments to find the crystallographic structure (BCT, HX, or WZ). The maximum value of g(r) in the experiments is 11 for the WZ structure, but mine is 1300 (LAMMPS). This is the reason I cannot interpret the g(r) result from LAMMPS.
>
> Also, I believe LAMMPS calculates exactly the radial distribution function I want:
>
> g(r) = (number of atoms in nth bin / volume of nth bin) / (system density)
>
> In my model the simulation box is scaled to model tensile loading, and I have a periodic boundary condition only along the length of the NW.
>
> So you are right, Axel: I am doing a variable cell simulation. Does it cause any problem in calculating g(r)?

It can lead to inconsistencies when doing the integration, as your particle density is changing during the simulation. The coordination number can still be computed exactly if you keep the original histogram data around; the g(r) plugin in VMD does that, for example.

> Eric and Aidan:
>
> Yes, I used a very large box to capture the surface effect in the lower dimensions of the NW, so this may be the reason for the high peaks in g(r).
>
> As for the bin size, I do not think so, since I used different bin sizes to check the sensitivity of my results, and in all cases I got high peaks.

The g(r) normalization is only meaningful for a (homogeneous) bulk system. If you have a system with surfaces (and vacuum), this will mess up the volume used for computing the particle density and thus make the absolute value of g(r) meaningless. The peak positions are still correct, and so are the relative heights. Well, almost: there is a tiny system-size-dependent error that results from using a finite-size system, due to the excluded volume of the reference particle.

Axel.
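The box-volume dependence described above can be made concrete with toy numbers (hypothetical, not from the thread): for a fixed pair of atoms at a fixed separation, enlarging the empty box inflates g(r) in proportion to the box volume, because the density in the denominator shrinks, while the coordination number stays at 1 regardless.

```python
import math

# Hypothetical setup: one pair of atoms at separation r, binned with
# width dr, placed in empty cubic boxes of increasing size.
r, dr = 1.0, 0.01
shell = 4 * math.pi * r ** 2 * dr   # shell volume of the bin holding the pair

for side in (10.0, 100.0, 1000.0):
    rho = 2 / side ** 3             # density used in the normalization
    g = (1.0 / shell) / rho         # one neighbor per atom in this bin
    coord = g * shell * rho         # volume sum of g(r), scaled by density
    print(f"box {side:>6}: g = {g:.3e}, coord = {coord:.1f}")
```

Each factor of 10 in box side multiplies g by 1000 while coord(r) is unchanged, which is the pattern the nanowire-in-vacuum setup produces.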