A dedicated deployment is made up of one or more replicas — identical copies of the model serving requests. Each replica reports its own status (pending, running, stopped, failed) and a health flag updated by periodic health checks. The replica count autoscales between min_replicas and max_replicas based on the deployment’s scaling target. This page covers the read-side endpoints for inspecting what’s running, plus the model catalogue endpoints for discovering what’s available.
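The clamping between `min_replicas` and `max_replicas` can be sketched as follows. Only the two bounds come from the deployment's scaling config; the load metric and per-replica target used here are illustrative assumptions, not the platform's actual autoscaling algorithm.

```python
import math

def desired_replicas(current_load: float, target_per_replica: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Derive a replica count from load, clamped to the configured bounds.

    `current_load` and `target_per_replica` are hypothetical metrics;
    only min_replicas/max_replicas correspond to the deployment config.
    """
    if target_per_replica <= 0:
        return min_replicas
    wanted = math.ceil(current_load / target_per_replica)
    # Never scale below min_replicas or above max_replicas.
    return max(min_replicas, min(max_replicas, wanted))
```

For example, a load of 10 requests/s against a target of 4 per replica yields 3 replicas, but a spike to 100 requests/s is still capped at `max_replicas`.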

Inspecting deployments

| Method | Endpoint | Purpose |
| --- | --- | --- |
| GET | `/inference/list` | All your deployments |
| GET | `/inference/get?deployment_id={id}` | Deployment details with replicas |
| DELETE | `/inference/stop` | Stop a deployment |
| POST | `/v1/chat/completions` | Send a chat completion to a deployment |
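A minimal sketch of a request body for `POST /v1/chat/completions`, in the familiar OpenAI-style shape. Whether the `model` field names the deployment, and which fields are required, are assumptions — check the API reference for the exact schema.

```python
def chat_completion_payload(deployment_model: str, user_message: str) -> dict:
    # Build an OpenAI-style chat completion body for POST /v1/chat/completions.
    # The `model` field naming the deployment is an assumption.
    return {
        "model": deployment_model,
        "messages": [
            {"role": "user", "content": user_message},
        ],
    }
```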
GET /inference/get returns the deployment’s status, scaling config, and a list of replicas with their individual health and last health-check timestamps. By default it returns only active deployments — pass include_terminated=true to see stopped ones too (useful for auditing or cost analysis).
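As a sketch, the query string for `GET /inference/get` might be built like this. The base URL is a placeholder; the `deployment_id` and `include_terminated` parameters are from the description above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # placeholder base URL (assumption)

def get_deployment_url(deployment_id: str, include_terminated: bool = False) -> str:
    # include_terminated=true also surfaces stopped deployments,
    # e.g. for auditing or cost analysis.
    params = {"deployment_id": deployment_id}
    if include_terminated:
        params["include_terminated"] = "true"
    return f"{BASE_URL}/inference/get?{urlencode(params)}"
```

By default the flag is omitted, matching the endpoint's active-only behaviour.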

Picking a model to deploy

There’s no platform-wide model catalogue API to browse before deploying — any Hugging Face model ID can be passed directly to POST /inference/create. As a convenience, the dashboard’s Dedicated Inference page offers a curated grid of popular models.
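Putting this together, a request body for `POST /inference/create` might look like the sketch below. Only the idea of passing a Hugging Face model ID and the `min_replicas`/`max_replicas` bounds come from this page; the exact field names are assumptions.

```python
def create_deployment_payload(hf_model_id: str,
                              min_replicas: int = 1,
                              max_replicas: int = 1) -> dict:
    # Body for POST /inference/create. Any Hugging Face model ID is accepted;
    # the scaling bounds mirror the autoscaling config described above.
    # Field names are assumptions, not the confirmed schema.
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    return {
        "model": hf_model_id,
        "min_replicas": min_replicas,
        "max_replicas": max_replicas,
    }
```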