# Inference Commands
Spin up models on dedicated GPU instances for inference. The Lyceum inference system handles model deployment, querying, and lifecycle management. For now, all models are deployed on NVIDIA A100 GPUs and are charged per second of uptime.

## Commands
| Command | Description |
|---|---|
| `lyceum infer deploy` | Deploy a model on a dedicated GPU instance |
| `lyceum infer models` | List all public models and private deployments |
| `lyceum infer chat` | Query a deployed model with a prompt |
| `lyceum infer spindown` | Spin down a deployed model to free GPU capacity |
## `lyceum infer deploy`
Deploy a model from Hugging Face on a dedicated GPU instance.

### Arguments
| Argument | Description |
|---|---|
| `model_id` | (required) Hugging Face model ID (e.g., `mistralai/Mistral-Small-24B-Instruct-2501`) |
### Options
| Option | Description |
|---|---|
| `--hf-token` | Hugging Face token for gated models (required for Mistral, Llama, etc.) |
You will need a Hugging Face token to deploy gated models. Get your token at https://huggingface.co/
### Examples
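A typical deployment of the gated model used throughout this page; the token value shown is a placeholder for your own Hugging Face access token:

```bash
# Deploy a gated model; --hf-token takes your Hugging Face token (placeholder shown)
lyceum infer deploy mistralai/Mistral-Small-24B-Instruct-2501 --hf-token hf_xxxxxxxxxxxx
```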
### Finding Model IDs
Model IDs can be found on Hugging Face by navigating to a model page and copying its unique namespace and name. For example:

- Visit https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501
- Copy the model ID: `mistralai/Mistral-Small-24B-Instruct-2501`
## `lyceum infer models`
List all available public models and your private deployments.

### Examples
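The command takes no documented arguments:

```bash
# Show public models alongside your private deployments
lyceum infer models
```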
## `lyceum infer chat`
Perform inference with chat, image, or batch processing capabilities.

### Options
| Option | Description |
|---|---|
| `-p, --prompt` | Prompt text, or path to a prompt file (.txt/.yaml/.xml) |
| `-m, --model` | Model to use. Default: `gpt-4` |
| `-t, --tokens` | Max output tokens. Default: 1000 |
| `-n, --no-stream` | Disable streaming of the response |
| `--type` | Output type (e.g., `json`, `markdown`). Default: `text` |
| `-i, --image` | Image path or base64 string |
| `--url` | Image URL |
| `--dir` | Directory of images |
| `--base64` | Treat image input as base64 |
| `-b, --batch` | JSONL file for batch processing |
### Examples
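A few illustrative invocations built from the options above; the prompts, file paths, and model ID are placeholders:

```bash
# Prompt the default model directly from the command line
lyceum infer chat -p "Summarize the main findings in three bullet points."

# Query a specific deployed model with a prompt file and a token cap
lyceum infer chat -p prompt.txt -m mistralai/Mistral-Small-24B-Instruct-2501 -t 500

# Ask a question about a local image
lyceum infer chat -p "What is shown in this image?" -i ./photo.png

# Run batch requests from a JSONL file with streaming disabled
lyceum infer chat -b requests.jsonl -n
```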
## `lyceum infer spindown`
Spin down a deployed model to optimize costs and free GPU capacity.

### Arguments
| Argument | Description |
|---|---|
| `model_id` | (required) Model ID to spin down |
### Examples
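For example, spinning down the deployment created earlier:

```bash
# Stop the instance so you are no longer billed for uptime
lyceum infer spindown mistralai/Mistral-Small-24B-Instruct-2501
```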
## Help
Every inference command is self-documenting. Use the `--help` flag to see available arguments and options:
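```bash
lyceum infer deploy --help
lyceum infer chat --help
```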

