Single Elastic cluster for multiple NOMAD instances

Hi

I am planning to set up multiple NOMAD Oasis instances and considering using a centralized Elasticsearch cluster instead of running separate Elasticsearch instances on each machine. The goal is to reduce resource usage and maintain a unified search index across all NOMAD installations.

Has anyone successfully configured multiple NOMAD instances in this way? If so:

  • What are the key challenges (e.g., data synchronization, latency, security)?
  • How should NOMAD’s configuration (nomad.yaml) be adjusted for remote Elasticsearch?
  • Are there any known limitations or best practices for this approach?

Any insights or experiences would be greatly appreciated.

Hi @k0tovski!

We are getting into a very detailed technical territory here, but let me try to give some advice on this.

We have several “flavours” of our central NOMAD installation that work in this way: they share the Elasticsearch and MongoDB server. In our case, they are actually not just sharing the node, but also the data on that node. This allows us to have several deployments (test, staging, production) with shared resources and data, but e.g. slightly different GUI and features.

Setting up this kind of system is not dramatically more complicated:

  1. Setting up connection from the Oasis to ES:
  • In nomad.yaml you can set elastic.host and elastic.port to specify where the server lives
    • In nomad.yaml you can specify which ES index to use (elastic.entries_index, elastic.materials_index). This way you can either use the same ES index for all of the instances (shared node, shared data), or then use a separate one for each (shared node, separate data).
  • Depending on your network setup, you can also set up authentication and then provide credentials through elastic.username and elastic.password.
  2. Separating ES service from docker-compose.yaml:
  • You need to take the Elasticsearch service out of the docker-compose.yaml of each Oasis, and run it as a separate instance on the same node or on some other node.
  • The node serving ES needs to be accessible by the node running the NOMAD app/worker. This is typically easy if these services run on the same node or inside the same kubernetes cluster, but will get slightly trickier with other setups. Setting up the communication is very specific to your setup (docker-compose.yaml vs. kubernetes, single-node vs. multi-node, etc.) and I can’t give generic advice here without knowing more.
  • Security-wise, you need to ensure that the ES node is not directly accessible from external addresses.
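To make step 1 concrete, here is a minimal sketch of the relevant nomad.yaml fragment. The host, index names, and credentials are placeholders for your own setup; only the option names (elastic.host, elastic.port, elastic.entries_index, elastic.materials_index, elastic.username, elastic.password) come from the discussion above:

```yaml
# Sketch of a nomad.yaml fragment pointing an Oasis at a remote ES server.
# All values below are placeholders.
elastic:
  host: es.example.org        # where the shared ES server lives
  port: 9200
  # Use the same index names in every Oasis for shared data,
  # or unique names per instance for separate data:
  entries_index: oasis_a_entries_v1
  materials_index: oasis_a_materials_v1
  # Only needed if the ES cluster requires authentication:
  username: nomad
  password: changeme
```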
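For step 2, a standalone compose file for the shared ES node could look roughly like the sketch below. This is an assumption-laden example, not a recommended production setup: the ES version, address, and volume name are placeholders, and binding the port to an internal address is just one way to keep ES off external interfaces (a firewall or an internal docker/kubernetes network are alternatives):

```yaml
# Hypothetical standalone docker-compose.yaml for the shared ES node,
# run separately from the per-Oasis compose files.
services:
  elastic:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.24
    environment:
      - discovery.type=single-node
    ports:
      # Bind only to an internal address so ES is not reachable externally
      - "10.0.0.5:9200:9200"
    volumes:
      - elastic-data:/usr/share/elasticsearch/data
volumes:
  elastic-data:
```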

Hopefully this is of some help!

Thank you for your detailed response!

Your explanation about using the elastic.host, elastic.port, and separate index configurations in nomad.yaml is particularly helpful. I also take this as confirmation that the approach works in production environments (if I understood correctly).

I’m curious about the flexibility of sharing configurations: Can a single NOMAD instance be configured to work with both private and shared indices simultaneously on the same ES node? Or would researchers need separate NOMAD instances - one for private work and another for collaborative activities? I’m particularly interested in understanding if there’s a way to selectively share certain datasets while keeping others private, all within a single NOMAD installation.

Thanks again for sharing your expertise on this setup!