Clusters are an enterprise offering. Capacity is reserved per contract — there is no self-service launch flow.
When you need a cluster
A cluster is the right primitive when:
- Your training job spans multiple machines (NCCL all-reduce across nodes)
- You need a fixed pool of capacity reserved for a project, with predictable cost
- You want bare-metal performance with no neighbours and a known network topology
- You’re running for weeks or months, not hours
For smaller or shorter-lived workloads, consider one of these instead:
- A single VM for one machine of any size — see Instances
- Serverless runs for one-shot jobs that fit on a single node — see Runs
- Dedicated inference for serving models — see Dedicated Inference
What you get
- Bare-metal nodes (H100, H200, or B200) with 400 Gb/s InfiniBand
- Reserved capacity for the contract duration (1 week to 1 year+)
- Enterprise SLA support
- Per-cluster access credentials (SSH keys, VPN, kubeconfig as appropriate)
- Hardware health telemetry (GPU temperature, memory, power) and benchmark results
Managing a cluster
Provisioned multi-node deployments are exposed under /api/v2/external/infra/deployments. The same data backs the dashboard’s nodes, access, benchmarks, contract, and invoice views. Each cluster has:
- Nodes — per-node hardware specs and live health snapshots
- Access — credentials for SSH, VPN, kubeconfig, or web console
- Benchmarks — synthetic performance benchmarks run on the hardware (with downloadable raw results)
- Contract — the agreement covering this cluster (term, monthly price, notice period, downloadable PDF)
- Invoices — billing history specific to this cluster
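As a rough illustration of reading this data programmatically, the sketch below builds a GET request for the deployments path named above. Only the path comes from this page. The base URL, the bearer-token header, and the assumption that the endpoint returns JSON are placeholders, and the function names `build_deployments_request` and `fetch_deployments` are hypothetical; check your cluster's access details for the actual host and authentication scheme.

```python
import json
import urllib.request

# Path taken from the text above; everything else in this sketch is an assumption.
DEPLOYMENTS_PATH = "/api/v2/external/infra/deployments"


def build_deployments_request(base_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the deployments listing."""
    url = base_url.rstrip("/") + DEPLOYMENTS_PATH
    return urllib.request.Request(
        url,
        headers={
            # Assumed auth scheme: a bearer token. The real scheme may differ.
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


def fetch_deployments(base_url: str, token: str, timeout: float = 10.0):
    """Send the request and decode the body as JSON (assumed response format)."""
    request = build_deployments_request(base_url, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from the network call makes it possible to inspect the URL and headers before anything is sent. A call would look like `fetch_deployments("https://api.example.com", token)`, where the host is a stand-in for your actual API endpoint.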

