Inquiry about setting up a server for DFT calculations in the lab

Hello, everyone. I’m a new graduate student conducting DFT calculations in our lab.

I’m currently looking into setting up a server in the lab, but I’m not sure whether to go with a CPU server, a GPU server, or a CPU+GPU server for DFT calculations. So, I wanted to ask for advice on this matter.

I plan to use the VASP software for DFT calculations. From what I’ve learned so far, VASP has been optimized for CPUs over many years, and the latest versions can also run on GPU servers for accelerated, parallelized calculations.
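
For reference, the run commands I have seen in tutorials look roughly like this; the core and GPU counts below are placeholders I made up, so please correct me if I have this wrong:

    # CPU build: the work is spread over many MPI ranks (roughly one per core)
    mpirun -np 64 vasp_std

    # GPU (OpenACC) build of VASP 6: typically one MPI rank per GPU,
    # e.g. on a node with 4 GPUs
    mpirun -np 4 vasp_std

    # INCAR tags such as NCORE and KPAR then control how those ranks divide
    # the work over bands and k-points (this is the part I am still trying
    # to understand)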

In our lab, we plan to perform a range of calculations, from simple structure optimizations and electronic structure calculations to more resource-intensive ones like NEB (nudged elastic band) calculations and AIMD (ab initio molecular dynamics) simulations.

I also don’t fully understand the concept of parallelization in calculations, so if someone could kindly explain that as well, I’d greatly appreciate it.

Looking forward to your responses.

You should check whether your university already has a high-performance computing cluster set up and ready for calculations.

Doing this by yourself is a lot of thankless work that might be duplicating resources you can already get elsewhere.

Thank you for your response and advice regarding the university’s high-performance computational cluster.

While I understand that many labs in our department already make use of the HPC cluster, we have decided to set up our own server specifically for our research group.

The primary reasons for this decision are that our research focus differs from other groups, requiring specific configurations, and having dedicated access would significantly streamline our workflow.

We are now deciding between a CPU-based server or a CPU+GPU hybrid server. Our calculations will include optimizing supercell structures with more than 100 atoms, electronic structure calculations such as DOS, as well as more computationally intensive tasks like NEB (Nudged Elastic Band) calculations and AIMD (Ab initio Molecular Dynamics) simulations.

Given that VASP has been traditionally optimized for CPU servers but can now take advantage of GPU servers, we are curious whether investing in a CPU+GPU hybrid setup would offer any significant benefits for these types of calculations.

If you have experience with similar setups, I would greatly appreciate any advice or suggestions on which setup would provide the most efficiency for our needs.

Thank you!

I do not have specific experience with procuring or maintaining an HPC setup.

I will say that, in my prior experience, if your group has funding for computer procurement, you can often usefully involve your university cluster by giving them the money to add nodes to their existing setup, and setting up a queue with preferential (or sole) access for your group members.
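
As a rough illustration (I have not administered such a setup myself, and the partition and account names below are invented), the day-to-day experience of a buy-in arrangement is simply submitting jobs to a partition that only your group can use:

    # submit to the group-owned partition on the university cluster
    sbatch --partition=ourlab --account=ourlab_project relax_job.sh

    # only your group's jobs compete for those nodes
    squeue --partition=ourlab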

This saves you significant hassle:

  • The compute is not affected by your office space constraints (power usage, heat generation, occasional power shutoffs for building maintenance or upgrades)
  • You have dedicated maintenance via your university cluster maintainers, especially for networking (you can simply SSH into the uni HPC login node). By contrast, setting up your own cluster means either (1) local-only networking, so no remote login from home, or (2) enabling remote login, in which case your server must be secured against attackers and risks being abused not only for crypto mining and other illicit activities, but also as a gateway into your university’s intranet.
  • You do not need to maintain credential setup for new members and credential deletion for leaving members (necessary for security – see the above point)
  • You do not need to worry about networked filesystems and access to your local / regional academic data storage archives (which in turn have high-speed Globus endpoints for large data transfers to international collaborators)
  • You do not need to worry (as much) about data storage redundancy against occasional disk or machine failures. You do still need robust backups, but you would have needed them anyway.
  • You do not need to worry about shipping, installation, and decommissioning.
  • Your group members will learn highly transferable skills of job scheduling and scaling measurement, not just for future academic use, but as a general data science skill (partially applicable to, for example, budgeting compute requirements on commercial clouds like AWS or Microsoft Azure).
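
To make that last point concrete, here is a minimal sketch of the kind of Slurm job script your students would end up writing; the partition name, module name, and resource numbers are placeholders, since the real ones depend on what your cluster provides:

    #!/bin/bash
    #SBATCH --job-name=relax_supercell
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=32
    #SBATCH --time=24:00:00
    #SBATCH --partition=ourlab        # placeholder partition name

    module load vasp/6.x              # placeholder module name
    srun vasp_std

Measuring scaling is then just a matter of rerunning the same input on 1, 2, and 4 nodes and comparing the timings VASP reports (e.g. the LOOP+ lines in OUTCAR).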

Against all these, I’m not sure I see the advantage of maintaining a local server – unless you are working with a very small budget, in which case you could … also just buy time on your university or national cluster.

Having inquired about that kind of thing in my current research group, I can only back up the arguments of @srtee. Maintaining a computing cluster requires dedicated resources, both human and material. There are often dedicated university teams or national infrastructure that can take most of the hassle of buying and maintaining your own group computing cluster off your hands. Yes, you need to justify your computing-hour requests and wait in user queues. But overall, this is generally better in terms of management and resource usage. That said, “you do you”, as the saying goes.

But if you want to know the best hardware for the current development of VASP, the best place to ask is… the VASP developers’ communication channel. If you have a license, they are the best people to tell you which hardware to invest in, given your goals and resources.

Then you have to look into dedicated vendors like Dell or HPE and… well, good luck; how smoothly that goes depends on where you are and on the support team around you. It took me a year to get a rack of 6 GPUs purchased last time.

I am waiting eagerly for that NVIDIA stock crash. Any day now …

While we wait, there’s even the AWS Research Cloud Program (AWS RCP), which lets you put grant budget toward compute time from the comfort of your cushy office chair!


Thank you!

Thank you!!