Kubernetes with Kubeflow vs. Slurm: Scalability
Which one provides better scalability? Say I want to implement an HPC cluster with 15k+ bare-metal (BM) nodes... what would you go for?
Today I learned: SUNK, which stands for SlUrm oN Kubernetes
https://www.cncf.io/blog/2024/07/08/slurm-an-hpc-workload-ma...
Depends on the actual workflow and on the actual users. As a former manager of large HPC systems, I’d never ever burden myself with the Big K (Kubernetes), as Slurm did a fine job for my workload and users.
Agreed.
Slurm does the job well.
My users liked using the Kubernetes API to throw their workloads at the clusters, but they did not need or use many of the Kubeflow features.
But again, depends on your workflow needs.
Slurm.