I have some reasonably large dump files (15 million atoms; each ASCII dump file is about 0.5 GB).
I want to compute the displacement of each atom since time 0. I usually use pizza.py for this: sort the data by atom ID and take the difference of the two position vectors (modulo PBCs). In the present case, the dump file is large enough that pizza’s dump.py exhausts memory on loading. I tried stripping the header off the dump files and feeding them to GNU sort, but that exhausts memory as well.
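For reference, the "diff modulo PBCs" step on a single atom can be sketched as below. This is only a minimal illustration, not pizza.py's code: it assumes an orthogonal box, positions already matched by atom ID, and the function name `min_image_displacement` is made up for this example.

```python
def min_image_displacement(r0, r1, box_lengths):
    """Return r1 - r0 per component, shifted to the nearest periodic image.

    r0, r1      : (x, y, z) of the same atom at time 0 and at a later time
    box_lengths : (Lx, Ly, Lz) of an orthogonal periodic box (assumed)
    """
    disp = []
    for a, b, length in zip(r0, r1, box_lengths):
        d = b - a
        # Subtract the whole number of box lengths closest to d / length,
        # so an atom that crossed a boundary is not counted as a huge jump.
        d -= length * round(d / length)
        disp.append(d)
    return disp


# An atom that leaves through the +x face and re-enters at low x.
example = min_image_displacement((9.5, 1.0, 5.0), (0.5, 1.5, 4.0),
                                 (10.0, 10.0, 10.0))
print(example)
```

Note that this wrap only recovers the true displacement if the atom has moved less than half a box length along each axis; beyond that, unwrapped coordinates or image flags would be needed.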
I was about to implement a mergesort, but was wondering if anyone has a better solution. (Sorry, slightly off-topic.)