Clusters are an enterprise offering. Capacity is reserved per contract — there is no self-service launch flow.
Clusters are dedicated multi-node GPU deployments built for distributed training and large-scale inference. Nodes use H100, H200, or B200 GPUs interconnected with 400 Gb/s NDR InfiniBand, providing the cross-node bandwidth distributed training requires.

When you need a cluster

A cluster is the right primitive when:
  • Your training job spans multiple machines (NCCL all-reduce across nodes; a minimal launch sketch follows these lists)
  • You need a fixed pool of capacity reserved for a project, with predictable cost
  • You want bare-metal performance with no neighbours and a known network topology
  • You’re running for weeks or months, not hours
For everything else, use:
  • A single VM for one machine of any size — see Instances
  • Serverless runs for one-shot jobs that fit on a single node — see Runs
  • Dedicated inference for serving models — see Dedicated Inference
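
To make the first bullet concrete, here is a minimal sketch of the kind of job a cluster exists for: an NCCL all-reduce that spans nodes, launched with PyTorch's torchrun. The hostnames, node and GPU counts, and script name are hypothetical placeholders, and the sketch assumes PyTorch with CUDA and NCCL is installed on every node.

```python
# Minimal multi-node NCCL all-reduce sketch. Hostnames, counts, and the
# script name are hypothetical placeholders, not values from this product.
# Launch one copy per node with torchrun, e.g.:
#   torchrun --nnodes=2 --nproc-per-node=8 \
#       --rdzv-backend=c10d --rdzv-endpoint=node-0:29500 allreduce_check.py
import os

import torch
import torch.distributed as dist


def main():
    # torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK in the environment,
    # so init_process_group can use the default env:// rendezvous.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    # Every rank contributes 1.0; after the all-reduce each rank holds
    # world_size, and the reduction traffic crosses the node interconnect.
    x = torch.ones(1, device="cuda")
    dist.all_reduce(x, op=dist.ReduceOp.SUM)
    print(f"rank {dist.get_rank()}/{dist.get_world_size()}: sum = {x.item()}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

On a cluster, NCCL carries the inter-node portion of that all-reduce over the InfiniBand fabric; if a job never needs this cross-node step, a single instance or a serverless run is usually the better fit.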

What you get

  • Bare-metal nodes (H100, H200, or B200) with 400 Gb/s InfiniBand
  • Reserved capacity for the contract duration (1 week to 1 year+)
  • Enterprise SLA support
  • Per-cluster access credentials (SSH keys, VPN, kubeconfig as appropriate)
  • Hardware health telemetry (GPU temperature, memory, power) and benchmark results

Managing a cluster

Provisioned multi-node deployments are exposed under /api/v2/external/infra/deployments; the same data backs the dashboard's nodes, access, benchmarks, contract, and invoice views (a request sketch follows the list below). Each cluster has:
  • Nodes — per-node hardware specs and live health snapshots
  • Access — credentials for SSH, VPN, kubeconfig, or web console
  • Benchmarks — synthetic performance benchmarks run on the hardware (with downloadable raw results)
  • Contract — the agreement covering this cluster (term, monthly price, notice period, downloadable PDF)
  • Invoices — billing history specific to this cluster
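
A minimal sketch of listing deployments over this API follows. Only the /api/v2/external/infra/deployments path comes from this page; the base URL, the bearer-token auth scheme, and all response field names are assumptions, so treat the API reference as authoritative.

```python
# Hypothetical sketch: list clusters and print a per-node health summary.
# Only the /api/v2/external/infra/deployments path is documented above;
# BASE_URL, the auth scheme, and every response field name are assumptions.
import os

import requests

BASE_URL = "https://api.lyceum.technology"  # assumed host
headers = {"Authorization": f"Bearer {os.environ['LYCEUM_API_KEY']}"}  # assumed scheme

resp = requests.get(
    f"{BASE_URL}/api/v2/external/infra/deployments",
    headers=headers,
    timeout=30,
)
resp.raise_for_status()

for deployment in resp.json():  # assumed: a JSON array of deployments
    print(deployment.get("name"), deployment.get("status"))
    for node in deployment.get("nodes", []):  # assumed nested node snapshots
        print("  ", node.get("hostname"), node.get("gpu_temperature_c"))
```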

Get in touch

For pricing and provisioning details, contact info@lyceum.technology.