POST /api/v2/external/inference/create

Create Dedicated Deployment
curl --request POST \
  --url https://api.lyceum.technology/api/v2/external/inference/create \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "hf_model_id": "<string>",
  "hardware_profile": "<string>",
  "target_rps": 123,
  "target_latency_p95_ms": 123,
  "stabilisation_window": 123,
  "hf_token": "<string>",
  "min_replicas": 1,
  "max_replicas": 1
}
'
{
  "deployment_id": "<string>",
  "status": "<string>"
}
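The same request can be issued from Python. The sketch below uses only the standard library (`urllib`); `build_payload` and `create_deployment` are illustrative helper names, not part of an official SDK, and the field values in the usage comment are examples only:

```python
import json
import urllib.request

API_URL = "https://api.lyceum.technology/api/v2/external/inference/create"


def build_payload(hf_model_id, hardware_profile, target_rps,
                  target_latency_p95_ms, stabilisation_window,
                  hf_token=None, min_replicas=1, max_replicas=1):
    """Assemble and sanity-check the request body before sending."""
    if min_replicas < 1 or max_replicas < min_replicas:
        raise ValueError("replica bounds must satisfy 1 <= min <= max")
    payload = {
        "hf_model_id": hf_model_id,
        "hardware_profile": hardware_profile,
        "target_rps": target_rps,
        "target_latency_p95_ms": target_latency_p95_ms,
        "stabilisation_window": stabilisation_window,
        "min_replicas": min_replicas,
        "max_replicas": max_replicas,
    }
    if hf_token:  # only needed for gated models
        payload["hf_token"] = hf_token
    return payload


def create_deployment(token, payload):
    """POST the payload; returns the parsed JSON response
    ({"deployment_id": ..., "status": ...})."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())


# Example usage (no request is sent by this block itself):
# body = build_payload("meta-llama/Llama-3.2-1B", "gpu.a100",
#                      target_rps=5, target_latency_p95_ms=500,
#                      stabilisation_window=300)
# result = create_deployment("lk_your_key", body)
# print(result["deployment_id"], result["status"])
```

Validating the payload client-side catches replica-bound mistakes before they cost a round trip; the optional `hf_token` key is omitted entirely when unset, matching its `string | null` schema.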

Authorizations

Authorization
string
header
required

Pass an API key (prefixed lk_) or a JWT access token as a bearer token. Generate API keys in the dashboard at https://dashboard.lyceum.technology/api-keys.

Body

application/json

Request body for POST /api/v2/external/inference/create.

hf_model_id
string
required

Hugging Face model ID, e.g. 'meta-llama/Llama-3.2-1B'

hardware_profile
string
required

Hardware profile, e.g. 'gpu.a100'

target_rps
number
required

Target requests per second per replica for scale-up

target_latency_p95_ms
number
required

Target p95 latency in milliseconds for scale-up

stabilisation_window
integer
required

Stabilisation window in seconds for scale-down

hf_token
string | null

Hugging Face token for gated models

min_replicas
integer
default:1

Minimum replicas to keep running

Required range: x >= 1

max_replicas
integer
default:1

Maximum replicas allowed

Required range: x >= 1
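Read together, the fields above describe the autoscaling envelope: replicas scale up when sustained load exceeds `target_rps` per replica (or p95 latency exceeds `target_latency_p95_ms`), and scale down only after load has stayed low for `stabilisation_window` seconds, always staying within `min_replicas`..`max_replicas`. A minimal sketch of the throughput side of that policy, assuming a simple ceiling-divide controller clamped to the replica bounds (the service's actual controller logic is not specified here):

```python
import math


def desired_replicas(observed_rps, target_rps, min_replicas=1, max_replicas=1):
    """Replicas needed so each serves at most target_rps, clamped to bounds.

    Hypothetical illustration of the scale-up target, not the real controller.
    """
    if observed_rps <= 0:
        needed = min_replicas  # idle: hold the floor
    else:
        needed = math.ceil(observed_rps / target_rps)
    return max(min_replicas, min(max_replicas, needed))
```

For example, with `target_rps=5` and bounds 1..8, sustained traffic of 23 RPS calls for 5 replicas; anything above 40 RPS saturates at the `max_replicas` ceiling of 8.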

Response

Successful Response

Response body for POST /api/v2/external/inference/create.

deployment_id
string
required

Stable UUID used for all subsequent inference calls

status
string
required

Initial deployment status, e.g. 'pending'