Smart Routing

Smart Routing automatically selects a model for each prompt. Simple questions go to a fast, cost-efficient model; complex or multi-step tasks go to a more capable one. The API stays OpenAI-compatible, so the only change on your side is the model name.

Models

Model	What it does
`lyceum/router`	Classifies each prompt and routes to the optimal model automatically
`lyceum/simple`	Always routes to a fast, cost-efficient model
`lyceum/complex`	Always routes to a high-capability model
`lyceum/reasoning`	Always routes to a reasoning model

Use `lyceum/router` when you want automatic cost/quality optimisation. Use the others when you want explicit control over which tier handles your requests.

Usage

The API is identical to any other serverless inference call. Only the model name changes.

from openai import OpenAI

# Point the standard OpenAI client at the Lyceum endpoint.
client = OpenAI(
    api_key="lk_...",
    base_url="https://api.lyceum.technology/api/v2/external",
)

# Let the router pick the model for this prompt.
resp = client.chat.completions.create(
    model="lyceum/router",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(resp.choices[0].message.content)

Streaming works the same way:

stream = client.chat.completions.create(
    model="lyceum/router",
    messages=[{"role": "user", "content": "Explain gradient descent step by step."}],
    stream=True,
)
for chunk in stream:
    # Skip chunks that carry no choices; delta.content can also be None.
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
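If you also need the full reply as a single string, you can accumulate the deltas while printing them. The following is a minimal sketch: `collect_stream` is a hypothetical helper name, not part of the OpenAI SDK, and it assumes each chunk follows the OpenAI streaming shape (`choices[0].delta.content`).

```python
from typing import Any, Iterable


def collect_stream(chunks: Iterable[Any]) -> str:
    """Print streamed text as it arrives and return the full reply.

    Hypothetical helper: accepts any iterable of OpenAI-style chunks.
    """
    parts: list[str] = []
    for chunk in chunks:
        # Some chunks may carry no choices at all; skip them.
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:  # content can be None or empty on some chunks
            print(text, end="", flush=True)
            parts.append(text)
    print()  # finish the line once the stream ends
    return "".join(parts)
```

With the `stream` object created above, this would be called as `full_text = collect_stream(stream)`.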

Score endpoint

To inspect the complexity score for a prompt without triggering a full inference request, use the score endpoint directly. It returns a value between 0 and 1. Lower means simpler, higher means more complex.

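import httpx

# Ask for the complexity score only; no completion is generated.
response = httpx.post(
    "https://api.lyceum.technology/api/v2/external/serverless/route",
    headers={"Authorization": "Bearer lk_..."},
    json={"input": "What is the capital of France?"},
)
response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
print(response.json())  # e.g. {'score': 0.12}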

The same request with curl:

curl https://api.lyceum.technology/api/v2/external/serverless/route \
  -H "Authorization: Bearer lk_..." \
  -H "Content-Type: application/json" \
  -d '{"input": "What is the capital of France?"}'

This is useful for understanding how the router classifies your prompts, or for building custom routing logic on top of the score.
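As an illustration of custom routing logic, the sketch below maps a score to one of the explicit model names. The function name `pick_model`, the thresholds `0.3` and `0.7`, and the choice to send the highest scores to `lyceum/reasoning` are all assumptions made for this example; they are not documented values, and you would tune them against your own prompts.

```python
def pick_model(score: float, simple_max: float = 0.3, complex_max: float = 0.7) -> str:
    """Map a complexity score in [0, 1] to a model name.

    The default thresholds are illustrative assumptions, not documented values.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if score < simple_max:
        return "lyceum/simple"
    if score < complex_max:
        return "lyceum/complex"
    return "lyceum/reasoning"
```

You could feed it the value returned by the score endpoint, for example `pick_model(response.json()["score"])`, and pass the result as the `model` argument of the chat completion call.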

Billing

Requests are billed at the rate of whichever model the router selects. `lyceum/simple`, `lyceum/complex`, and `lyceum/reasoning` always route to the same tier, so their cost is predictable.
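Because the cost of a `lyceum/router` request depends on the model selected, it can help to keep your own tally of token usage per model. The sketch below is one way to do that. It assumes, without confirmation from this page, that the response's `model` field reports the model that actually served the request and that `usage` is populated as in the OpenAI API; `tokens_by_model` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tokens_by_model(
    records: Iterable[Tuple[str, int, int]],
) -> Dict[str, Tuple[int, int]]:
    """Sum (prompt_tokens, completion_tokens) per model name.

    Each record is (model, prompt_tokens, completion_tokens).
    """
    totals = defaultdict(lambda: [0, 0])
    for model, prompt_tokens, completion_tokens in records:
        totals[model][0] += prompt_tokens
        totals[model][1] += completion_tokens
    return {model: (counts[0], counts[1]) for model, counts in totals.items()}
```

Under those assumptions, each record could be built from a response as `(resp.model, resp.usage.prompt_tokens, resp.usage.completion_tokens)`.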

Serverless Inference

Learn more about pay-per-request inference.
