POST /api/v2/external/inference/create

Create Dedicated Deployment
curl --request POST \
  --url https://api.lyceum.technology/api/v2/external/inference/create \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "hf_model_id": "<string>",
  "hardware_profile": "<string>",
  "target_rps": 123,
  "target_latency_p95_ms": 123,
  "stabilisation_window": 123,
  "hf_token": "<string>",
  "min_replicas": 1,
  "max_replicas": 1
}
'
{
  "deployment_id": "<string>",
  "status": "<string>"
}
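The same request can be issued from Python. The sketch below uses only the standard library (`urllib`); `build_payload` and `create_deployment` are illustrative helper names, not part of an official SDK, and the field values in the usage comment are examples only:

```python
import json
import urllib.request

API_URL = "https://api.lyceum.technology/api/v2/external/inference/create"


def build_payload(hf_model_id, hardware_profile, target_rps,
                  target_latency_p95_ms, stabilisation_window,
                  hf_token=None, min_replicas=1, max_replicas=1):
    """Assemble and sanity-check the request body before sending."""
    if min_replicas < 1 or max_replicas < min_replicas:
        raise ValueError("replica bounds must satisfy 1 <= min <= max")
    payload = {
        "hf_model_id": hf_model_id,
        "hardware_profile": hardware_profile,
        "target_rps": target_rps,
        "target_latency_p95_ms": target_latency_p95_ms,
        "stabilisation_window": stabilisation_window,
        "min_replicas": min_replicas,
        "max_replicas": max_replicas,
    }
    if hf_token:  # only needed for gated models
        payload["hf_token"] = hf_token
    return payload


def create_deployment(token, payload):
    """POST the payload; returns the parsed JSON response
    ({"deployment_id": ..., "status": ...})."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())


# Example usage (no request is sent by this block itself):
# body = build_payload("meta-llama/Llama-3.2-1B", "gpu.a100",
#                      target_rps=5, target_latency_p95_ms=500,
#                      stabilisation_window=300)
# result = create_deployment("lk_your_key", body)
# print(result["deployment_id"], result["status"])
```

Validating the payload client-side catches replica-bound mistakes before they cost a round trip; the optional `hf_token` key is omitted entirely when unset, matching its `string | null` schema.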

Authorizations

Authorization
string
header
required

Pass an API key (prefixed lk_) or a JWT access token as a bearer token. Generate API keys in the dashboard at https://dashboard.lyceum.technology/api-keys.

Body

application/json

Request body for POST /api/v2/external/inference/create.

hf_model_id
string
required

Hugging Face model ID, e.g. 'meta-llama/Llama-3.2-1B'

hardware_profile
string
required

Hardware profile, e.g. 'gpu.a100'

target_rps
number
required

Target requests per second per replica for scale-up

target_latency_p95_ms
number
required

Target p95 latency in milliseconds for scale-up

stabilisation_window
integer
required

Stabilisation window in seconds for scale-down

hf_token
string | null

Hugging Face token for gated models

min_replicas
integer
default:1

Minimum replicas to keep running

Required range: x >= 1

max_replicas
integer
default:1

Maximum replicas allowed

Required range: x >= 1
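Read together, the fields above describe the autoscaling envelope: replicas scale up when sustained load exceeds `target_rps` per replica (or p95 latency exceeds `target_latency_p95_ms`), and scale down only after load has stayed low for `stabilisation_window` seconds, always staying within `min_replicas`..`max_replicas`. A minimal sketch of the throughput side of that policy, assuming a simple ceiling-divide controller clamped to the replica bounds (the service's actual controller logic is not specified here):

```python
import math


def desired_replicas(observed_rps, target_rps, min_replicas=1, max_replicas=1):
    """Replicas needed so each serves at most target_rps, clamped to bounds.

    Hypothetical illustration of the scale-up target, not the real controller.
    """
    if observed_rps <= 0:
        needed = min_replicas  # idle: hold the floor
    else:
        needed = math.ceil(observed_rps / target_rps)
    return max(min_replicas, min(max_replicas, needed))
```

For example, with `target_rps=5` and bounds 1..8, sustained traffic of 23 RPS calls for 5 replicas; anything above 40 RPS saturates at the `max_replicas` ceiling of 8.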

Response

Successful Response

Response body for POST /api/v2/external/inference/create.

deployment_id
string
required

Stable UUID used for all subsequent inference calls

status
string
required

Initial deployment status, e.g. 'pending'