Create Dedicated Deployment
POST
Authorizations
Pass an API key (prefixed lk_) or a JWT access token as a bearer token. Generate API keys in the dashboard at https://dashboard.lyceum.technology/api-keys.
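As a minimal sketch of the bearer scheme described above, the helper below builds the request headers from a credential. The function name `bearer_headers` and the key value are hypothetical placeholders, not part of the documented API; the same header shape would apply to a JWT access token.

```python
def bearer_headers(credential: str) -> dict[str, str]:
    """Build HTTP headers that carry the credential as a bearer token.

    `credential` is either an API key (prefixed lk_) or a JWT access token.
    """
    token = credential.strip()
    if not token:
        raise ValueError("credential must not be empty")
    return {
        "Authorization": f"Bearer {token}",
        # The request body for this endpoint is application/json.
        "Content-Type": "application/json",
    }


# Placeholder key for illustration only; generate a real one in the dashboard.
headers = bearer_headers("lk_example_key")
```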
Body
application/json
Request body for POST /api/v2/external/inference/create.
Hugging Face model ID, e.g. 'meta-llama/Llama-3.2-1B'
Hardware profile, e.g. 'gpu.a100'
Target requests per second per replica for scale-up
Target p95 latency in milliseconds for scale-up
Stabilisation window in seconds for scale-down
Hugging Face token for gated models
Minimum replicas to keep running
Required range: x >= 1
Maximum replicas allowed
Required range: x >= 1
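The sketch below shows how a request to this endpoint might be assembled and checked against the documented replica ranges before sending. It rests on several assumptions: the JSON field names (`hf_model_id`, `hardware_profile`, `min_replicas`, `max_replicas`) are hypothetical, because this page lists only the field descriptions, and `BASE_URL` is a placeholder host. Only the path, the HTTP method, the content type, and the `x >= 1` ranges come from the text above. The request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.example.invalid"  # assumption: placeholder host
CREATE_PATH = "/api/v2/external/inference/create"  # path stated in the docs


def build_create_request(api_key, model_id, hardware_profile,
                         min_replicas=1, max_replicas=1):
    """Build (but do not send) a POST request for a dedicated deployment."""
    # Documented constraint: both replica counts must be >= 1.
    for label, value in (("min_replicas", min_replicas),
                         ("max_replicas", max_replicas)):
        if value < 1:
            raise ValueError(f"{label} must be >= 1, got {value}")

    # Hypothetical field names; check the API schema for the real ones.
    body = {
        "hf_model_id": model_id,
        "hardware_profile": hardware_profile,
        "min_replicas": min_replicas,
        "max_replicas": max_replicas,
    }
    return urllib.request.Request(
        BASE_URL + CREATE_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_create_request(
    "lk_example_key", "meta-llama/Llama-3.2-1B", "gpu.a100",
    min_replicas=1, max_replicas=2,
)
```

Once the real host and field names are confirmed, the request could be sent with `urllib.request.urlopen(req)`. Validating the ranges client-side simply avoids a round trip for input the server would reject.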
