OVITO RDF computation time

Hello all,

I am computing partial RDFs for pure water with both the OVITO application and the OVITO Python module. I am using the following code, with 1024 bins and a cutoff of 8.0 Å:

# Insert the RDF calculation modifier into the pipeline:
pipeline.modifiers.append(CoordinationAnalysisModifier(cutoff = 8.0, number_of_bins = 1024, partial = True))

# Insert the time-averaging modifier into the pipeline, which accumulates
# the instantaneous DataTable produced by the previous modifier and computes a mean histogram.
pipeline.modifiers.append(TimeAveragingModifier(operate_on='table:coordination-rdf'))

# Data export method 1: Convert to NumPy array and write data to a text file:
total_rdf = pipeline.compute().tables['coordination-rdf[average]'].xy()
np.savetxt(f"{model_path}/{run_name}/rdf.txt", total_rdf)
np.save(f"{model_path}/{run_name}/rdf.npy", total_rdf)

I found that the time to compute this RDF is much higher using the Python module. For a trajectory of 100 steps it takes two and a half minutes, while the OVITO application is practically instantaneous. To compute time averages, I actually find it faster to export the RDF of every single frame computed in the app and then perform the time average by importing those files as NumPy arrays, rather than using the Python module.
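Roughly, the averaging step on the exported files looks like this (the file names here are just placeholders for whatever the GUI export produces):

import glob
import numpy as np

# Collect the per-frame RDF text files exported from the GUI (placeholder naming scheme):
files = sorted(glob.glob("rdf_frame_*.txt"))
# Each file holds one (bins x columns) table; stack them and average over frames:
frames = np.array([np.loadtxt(f) for f in files])
mean_rdf = frames.mean(axis=0)
np.savetxt("rdf_time_average.txt", mean_rdf)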

Is this normal?

Thanks

Sorry, I think this should be moved to the Ovito subforum.

The Python script should not be slower than the GUI. I have just tested it myself with OVITO 3.11.0: an external Python script using the OVITO Python module performs the time averaging a little faster than the OVITO Pro GUI (27 vs. 31 seconds), as expected. This was for a trajectory of 320 atoms with ~22,000 frames.

Can you please add the following to the top of your script:

import ovito
ovito.enable_logging()

You should then see log messages in the terminal indicating what OVITO is doing. Please check how many trajectory frames are being loaded and how often the coordination analysis is being performed.

This is the script I’ve used for the timing:

from ovito.io import *
from ovito.modifiers import *
from ovito.pipeline import *
import time

start = time.time()
pipeline = import_file('LiPSI_long.xyz', multiple_frames = True)
pipeline.modifiers.append(CoordinationAnalysisModifier(cutoff = 8.0, number_of_bins = 1024, partial = True))
pipeline.modifiers.append(TimeAveragingModifier(operate_on='table:coordination-rdf'))
pipeline.compute()
end = time.time()
print("Time spent: %f sec" % (end-start))

On a trajectory of 40,000 frames with 125 water molecules per frame, the following script (which computes the partial RDF for every frame, without time averaging, and then saves the results into a NumPy array) takes ~10 minutes on this machine, the DCGP partition (CPU only, a single node with 32 cores):

print("Number of MD frames:", pipeline.num_frames)

partial_rdf_list = []

# Insert the RDF calculation modifier into the pipeline:

pipeline.modifiers.append(CoordinationAnalysisModifier(cutoff = 8.0, number_of_bins = 1024, partial = True))

for frame in range(pipeline.num_frames):

    partial_rdf = pipeline.compute(frame).tables['coordination-rdf'].xy()

    partial_rdf_list.append(partial_rdf)
    #np.save(f"{model_path}/{run_name}/rdf_no_time.npy", total_rdf)

partial_rdf_array = np.array(partial_rdf_list)

I also tried your logging routine on my local machine (Intel Core i5-1240P), and it seems to load a lot of configurations in parallel, but only one CPU core is used.

One thing I can add: when I open the OVITO GUI and scrub through the frame bar, the Coordination analysis modifier feels really smooth, at least for visualizing the partial RDFs of every frame.
However, if I try to export the partial RDFs of every frame to text files, it takes a really long time; maybe that's a problem with file writing, I don't know.
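For reference, this is roughly how I would try to export the per-frame tables directly from Python instead; I am not sure I have the export_file arguments exactly right (the 'txt/table' format string, the key argument, and the wildcard file name are my guesses from the documentation):

from ovito.io import import_file, export_file
from ovito.modifiers import CoordinationAnalysisModifier

pipeline = import_file("water_traj.extxyz")   # placeholder file name
pipeline.modifiers.append(CoordinationAnalysisModifier(cutoff = 8.0, number_of_bins = 1024, partial = True))

# Write the per-frame RDF data table to one text file per frame
# (the '*' in the file name should be replaced by the frame number):
export_file(pipeline, "rdf.*.txt", "txt/table", key="coordination-rdf", multiple_frames=True)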

Could you please let me know which versions of the OVITO GUI and the OVITO Python module you are using for these benchmarks? Optimizations have been made to both the GUI and the Python module in recent OVITO versions, which is why the version matters.

What can also matter is the type of input file you are working with, in particular whether it is a compressed trajectory file. Could you please provide some details, or, even better, share the actual file so that I can reproduce your tests and investigate possible bottlenecks? Thank you.

This observation raises questions. Current versions of OVITO always load and process trajectories sequentially, one frame at a time, so there should be no parallel loading of frames. The RDF computation itself is parallelized, i.e., OVITO uses all available CPU cores to compute the RDF of a single atomic configuration (parallelization over atoms). However, since your system is quite small, this will probably not help much: the multi-threading overhead eats up its advantages, and the other CPU cores will hardly be active.

In your case you could get a performance benefit from using the Python multiprocessing module, as explained in this section of the OVITO Python manual. Each CPU core can then compute the RDFs for different frames of the trajectory in parallel and communicate the results back to the main process (in the form of NumPy arrays), which finally performs the averaging.
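Roughly, such a script could look like the following. Please treat it as an untested sketch rather than the exact recipe from the manual: the file name is a placeholder, and depending on your platform you may need to adjust the multiprocessing start method.

import multiprocessing
import numpy as np
from ovito.io import import_file
from ovito.modifiers import CoordinationAnalysisModifier

TRAJECTORY = "water_traj.extxyz"   # placeholder file name

# Each worker process builds its own pipeline once, because pipeline objects
# cannot be passed between processes. The pool initializer stores it in a
# module-level variable.
worker_pipeline = None

def init_worker():
    global worker_pipeline
    worker_pipeline = import_file(TRAJECTORY)
    worker_pipeline.modifiers.append(
        CoordinationAnalysisModifier(cutoff = 8.0, number_of_bins = 1024, partial = True))

def rdf_for_frame(frame):
    # Compute the partial RDF table of one frame and return it as a NumPy array.
    return worker_pipeline.compute(frame).tables['coordination-rdf'].xy()

if __name__ == '__main__':
    # multiprocessing.set_start_method('spawn')   # may be needed on some platforms
    num_frames = import_file(TRAJECTORY).num_frames
    with multiprocessing.Pool(initializer=init_worker) as pool:
        per_frame_rdfs = pool.map(rdf_for_frame, range(num_frames))
    # The main process performs the time average over all frames.
    mean_rdf = np.mean(per_frame_rdfs, axis=0)
    np.save("rdf_time_average.npy", mean_rdf)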

This is the link to my trajectory, a zip-compressed extxyz file (it will expire in 7 days). I load the uncompressed version (roughly 2.4 GB) directly into OVITO.

Maybe I am wrong and it is not reading the frames in parallel but just sequentially, but only one core is definitely used for the computation.

I am using Arch Linux (updated daily) and the latest OVITO version (3.11), both for the Python module and for the GUI program.

Thanks for the multiprocessing tip. But first, if you can, please try my exact script and trajectory on your machine and let me know how long it takes without multiprocessing.

Thanks for providing your trajectory file.

I ran my Python script above on your file, which has 40,000 frames. It takes 4:08 minutes on my MacBook Pro to compute the time average of the partial RDFs. If I do the same using the OVITO Pro GUI, it finishes after 4:30 minutes. This behavior appears reasonable. The Python program processes about 161 trajectory frames per second, which is in line with my expectations, given that the system is so small that the RDF calculation effectively uses only one CPU core.

In both cases the processing happens sequentially, i.e., frame by frame. You may be able to get more performance with the multiprocessing approach I suggested above; however, I have not yet tested this myself.

Thank you very much. I don't think multiprocessing is needed; there is something strange in my system: it can't take hours on my i5-1240P if it takes 4 minutes on your MacBook Pro. I'll test on a different operating system and let you know.

If it's not too much of a hassle for you, could you test my script and/or try to export the partial RDF of every frame to a text file with the OVITO GUI program?