Documentation Index
Fetch the complete documentation index at: https://docs.lyceum.technology/llms.txt
Use this file to discover all available pages before exploring further.
Dedicated deployments host any Hugging Face model on reserved GPU replicas and expose it through an OpenAI-compatible Chat Completions endpoint.
Deploy
CLI
lyceum infer deploy meta-llama/Llama-3.1-8B-Instruct \
  --hardware-profile gpu.a100 \
  --min-replicas 1 \
  --max-replicas 3
REST API
curl -X POST https://api.lyceum.technology/api/v2/external/inference/create \
  -H "Authorization: Bearer $LYCEUM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "hf_model_id": "meta-llama/Llama-3.1-8B-Instruct",
    "hardware_profile": "gpu.a100",
    "min_replicas": 1,
    "max_replicas": 3
  }'
For gated models, include hf_token in the request body or pass --hf-token to the CLI.
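If you prefer Python over curl, the create call can be sketched with the standard library. The endpoint URL and JSON field names (including the optional hf_token for gated models) are taken from the curl example above; the helper names are illustrative, not part of any official SDK:

```python
import json
import os
import urllib.request

API_BASE = "https://api.lyceum.technology/api/v2/external"

def build_create_payload(hf_model_id, hardware_profile,
                         min_replicas=1, max_replicas=3, hf_token=None):
    """Assemble the JSON body for /inference/create."""
    payload = {
        "hf_model_id": hf_model_id,
        "hardware_profile": hardware_profile,
        "min_replicas": min_replicas,
        "max_replicas": max_replicas,
    }
    if hf_token:  # only needed for gated models
        payload["hf_token"] = hf_token
    return payload

def create_deployment(payload, api_key):
    """POST the payload to /inference/create and return the parsed response."""
    req = urllib.request.Request(
        f"{API_BASE}/inference/create",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__" and os.environ.get("LYCEUM_API_KEY"):
    body = build_create_payload("meta-llama/Llama-3.1-8B-Instruct", "gpu.a100")
    print(create_deployment(body, os.environ["LYCEUM_API_KEY"]))
```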
Wait for the deployment to become healthy
lyceum infer status <deployment_id> --wait
Or poll the API:
curl "https://api.lyceum.technology/api/v2/external/inference/get?deployment_id=<deployment_id>" \
  -H "Authorization: Bearer $LYCEUM_API_KEY"
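The loop behind --wait can be sketched in Python. Note that the "status" field and the "healthy"/"failed" values below are assumptions based on this section's wording, not a documented schema; check the actual /inference/get response and adapt the field names:

```python
import time

def wait_until_healthy(get_status, timeout=600, interval=5):
    """Poll `get_status()` (e.g. a call to /inference/get) until the
    deployment reports healthy, a terminal failure, or the timeout expires.

    The "status"/"healthy"/"failed" names are assumptions about the
    response shape, not confirmed field names.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_status()
        if state.get("status") == "healthy":
            return state
        if state.get("status") == "failed":
            raise RuntimeError(f"deployment failed: {state}")
        time.sleep(interval)
    raise TimeoutError("deployment did not become healthy in time")
```

Injecting `get_status` as a callable keeps the loop independent of the HTTP layer, so the same helper works with urllib, requests, or a mocked client.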
Call the deployment
curl https://api.lyceum.technology/api/v2/external/v1/chat/completions \
  -H "Authorization: Bearer $LYCEUM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<deployment_id>",
    "messages": [{"role": "user", "content": "Say hello in three languages."}]
  }'
The response shape mirrors OpenAI’s: id, choices, usage, model, created.
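Because the response follows OpenAI's shape, pulling the reply text out of the raw JSON is the usual dictionary traversal. A sketch, with a sample dict built from the fields listed above (the concrete values are illustrative):

```python
def extract_reply(completion):
    """Pull the assistant text out of an OpenAI-shaped chat completion dict."""
    return completion["choices"][0]["message"]["content"]

# A response shaped like the fields listed above: id, choices, usage, model, created.
sample = {
    "id": "chatcmpl-123",
    "model": "<deployment_id>",
    "created": 1700000000,
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello! Hola! Bonjour!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 9, "total_tokens": 21},
}

print(extract_reply(sample))  # Hello! Hola! Bonjour!
```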
From the OpenAI Python SDK
Because the endpoint is OpenAI-compatible, you can point the official openai client at it:
from openai import OpenAI
client = OpenAI(
    api_key="<your_lyceum_api_key>",
    base_url="https://api.lyceum.technology/api/v2/external",
)

resp = client.chat.completions.create(
    model="<deployment_id>",
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.choices[0].message.content)
Stop the deployment
CLI
lyceum infer stop <deployment_id>
REST API
curl -X DELETE https://api.lyceum.technology/api/v2/external/inference/stop \
  -H "Authorization: Bearer $LYCEUM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"deployment_id": "<deployment_id>"}'
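The same stop call can be issued from Python with the standard library. The URL, DELETE method, and body mirror the curl example above; the helper name is illustrative:

```python
import json
import os
import urllib.request

API_BASE = "https://api.lyceum.technology/api/v2/external"

def build_stop_request(deployment_id, api_key):
    """Build the DELETE request that mirrors the curl stop call."""
    return urllib.request.Request(
        f"{API_BASE}/inference/stop",
        data=json.dumps({"deployment_id": deployment_id}).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="DELETE",
    )

if __name__ == "__main__" and os.environ.get("LYCEUM_API_KEY"):
    req = build_stop_request("<deployment_id>", os.environ["LYCEUM_API_KEY"])
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp))
```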