Pass an API key (prefixed lk_) or a JWT access token as a bearer token. Generate API keys in the dashboard at https://dashboard.lyceum.technology/api-keys.
Request body for POST /api/v2/external/inference/create.
HuggingFace model ID, e.g. 'meta-llama/Llama-3.2-1B'
Hardware profile, e.g. 'gpu.a100'
Target requests per second per replica for scale-up
Target p95 latency in milliseconds for scale-up
Stabilisation window in seconds for scale-down
HuggingFace token for gated models
Minimum replicas to keep running
x >= 1Maximum replicas allowed
x >= 1