
Inference Commands

Spin up models on dedicated GPU instances for inference. The Lyceum inference system handles model deployment, querying, and lifecycle management. Currently, all models are deployed on NVIDIA A100 GPUs and are billed per second of uptime. A typical end-to-end session is sketched after the command table below.

Commands

| Command | Description |
| --- | --- |
| lyceum infer deploy | Deploy a model on a dedicated GPU instance |
| lyceum infer models | List all public models and private deployments |
| lyceum infer chat | Query a deployed model with a prompt |
| lyceum infer spindown | Spin down a deployed model to free GPU capacity |
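These commands compose into a simple deploy-query-spindown session. The sketch below assumes the model ID used elsewhere on this page, that your Hugging Face token is already stored in a shell variable named HF_TOKEN (an illustrative name, not one the CLI requires), and that deploy returns once the instance is ready; confirm the exact behavior with lyceum infer deploy --help.

# End-to-end sketch: deploy, query, then spin down so per-second billing stops
MODEL="mistralai/Mistral-Small-24B-Instruct-2501"
lyceum infer deploy "$MODEL" --hf-token "$HF_TOKEN"
lyceum infer chat -m "$MODEL" -p "Explain quantum computing in two sentences."
lyceum infer spindown "$MODEL"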

lyceum infer deploy

Deploy a model from Hugging Face on a dedicated GPU instance.
lyceum infer deploy <model_id>

Arguments

| Argument | Description |
| --- | --- |
| model_id | (required) Hugging Face model ID (e.g., mistralai/Mistral-Small-24B-Instruct-2501) |

Options

| Option | Description |
| --- | --- |
| --hf-token | Hugging Face token for gated models (required for Mistral, Llama, etc.) |

You will need a Hugging Face token to deploy gated models. Get your token at https://huggingface.co/

Examples

# Deploy a Mistral model (gated, so an HF token is required)
lyceum infer deploy mistralai/Mistral-Small-24B-Instruct-2501 --hf-token YOUR_TOKEN_HERE

# Deploy Llama with HF token
lyceum infer deploy meta-llama/Llama-3-70B-Instruct --hf-token YOUR_TOKEN_HERE
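If you deploy several gated models, you can keep the token in a shell variable instead of pasting it into every command; HF_TOKEN below is just an illustrative name.

# Store the token in a variable and reuse it across deploys
export HF_TOKEN=YOUR_TOKEN_HERE
lyceum infer deploy meta-llama/Llama-3-70B-Instruct --hf-token "$HF_TOKEN"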

Finding Model IDs

Model IDs can be found on Hugging Face by navigating to a model page and copying its namespace and name. For example, the page https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501 corresponds to the model ID mistralai/Mistral-Small-24B-Instruct-2501.

lyceum infer models

List all available public models and your private deployments.
lyceum infer models

Examples

# View all models
lyceum infer models

lyceum infer chat

Query a deployed model with a prompt, with support for plain chat, image input, and batch processing.
lyceum infer chat [OPTIONS]

Options

| Option | Description |
| --- | --- |
| -p, --prompt | The message, or a path to a prompt file (.txt/.yaml/.xml) |
| -m, --model | Model to use. Default: gpt-4 |
| -t, --tokens | Max output tokens. Default: 1000 |
| -n, --no-stream | Disable streaming response |
| --type | Output type (e.g., json, markdown). Default: text |
| -i, --image | Image path or base64 string |
| --url | Image URL |
| --dir | Directory of images |
| --base64 | Treat image input as base64 |
| -b, --batch | JSONL file for batch processing |
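For longer prompts, -p also accepts a file path. A plain-text file simply contains the prompt itself; the structure expected of .yaml and .xml prompt files is not documented on this page.

# Write a multi-line prompt to a file, then pass the path to -p
cat > prompt.txt <<'EOF'
You are a concise technical writer.
Explain the difference between supervised and unsupervised learning, with one example of each.
EOF
lyceum infer chat -m gpt-4 -p prompt.txt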

Examples

# Basic chat with a model
lyceum infer chat -m gpt-4 -p "Explain quantum computing"

# Use a custom deployed model
lyceum infer chat -m mistralai/Mistral-Small-24B-Instruct-2501 -p "Write a Python function to sort a list"

# Chat with prompt from file
lyceum infer chat -m gpt-4 -p prompt.txt

# Disable streaming
lyceum infer chat -m gpt-4 -p "What is AI?" --no-stream

# Request JSON output
lyceum infer chat -m gpt-4 -p "List 3 programming languages" --type json

# Process image with text prompt
lyceum infer chat -m gpt-4 -p "Describe this image" -i image.png

# Process image from URL
lyceum infer chat -m gpt-4 -p "What's in this image?" --url https://example.com/image.jpg

# Batch processing from JSONL file
lyceum infer chat -b requests.jsonl
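The per-line schema of the batch file is not specified on this page. As a rough sketch, assuming each line is a JSON object with model and prompt fields (hypothetical field names, not a documented schema), a two-request file could be prepared like this:

# Create a two-request batch file (field names are assumptions) and submit it
cat > requests.jsonl <<'EOF'
{"model": "gpt-4", "prompt": "Summarize the plot of Hamlet in one sentence."}
{"model": "gpt-4", "prompt": "List three common uses of GPU inference."}
EOF
lyceum infer chat -b requests.jsonl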

lyceum infer spindown

Spin down a deployed model to stop per-second uptime charges and free GPU capacity.
lyceum infer spindown <model_id>

Arguments

| Argument | Description |
| --- | --- |
| model_id | (required) Model ID of the deployment to spin down |

Examples

# Spin down a model
lyceum infer spindown mistralai/Mistral-Small-24B-Instruct-2501

Help

Every inference command is self-documenting. Use the --help flag to see available arguments and options:
lyceum infer --help
lyceum infer deploy --help
lyceum infer chat --help