Endpoint
The endpoint sends a queued event immediately, then either a finished or a failed event when the inference resolves.
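
On the wire, that lifecycle is just a sequence of SSE frames. A sketch of what a successful run might look like (the event names come from the lifecycle above; the JSON payload fields shown are hypothetical, not a documented schema):

```
event: queued
data: {"id": "abc123"}

event: finished
data: {"id": "abc123", "output": "..."}
```

A failed inference would end with a failed event instead of finished. Each frame is terminated by a blank line, per the SSE format.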
When to use this
- You want to display progress to a user during a long inference
- You’re building a chat UI where partial responses should appear as they’re generated
- You’re integrating with a frontend that already speaks SSE (most do)
Consuming the stream
Any standard SSE client works: the EventSource API in browsers, sseclient-py in Python, and so on.
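To make the framing concrete, here is a minimal, dependency-free sketch of the parsing those clients do for you: fields are `key: value` lines, a blank line terminates an event, and multiple data lines are joined. The sample stream and its JSON fields are hypothetical; in practice you would point a real client such as sseclient-py at the endpoint instead.

```python
import json

def iter_sse_events(lines):
    """Yield (event_name, data) pairs from an iterable of SSE lines.

    Minimal sketch of SSE framing: a blank line ends an event,
    "event:" names it, "data:" lines are joined with newlines.
    Comment lines and other fields are ignored here.
    """
    event, data = "message", []  # "message" is the SSE default event name
    for line in lines:
        line = line.rstrip("\n")
        if line == "":
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())

# Hypothetical stream: a queued event followed by a finished event.
raw = (
    "event: queued\n"
    'data: {"id": "abc123"}\n'
    "\n"
    "event: finished\n"
    'data: {"id": "abc123", "output": "hello"}\n'
    "\n"
)
events = list(iter_sse_events(raw.splitlines()))
for name, payload in events:
    print(name, json.loads(payload))
```

With a real endpoint you would feed the parser (or, better, a maintained client library) from a streaming HTTP response rather than a string.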
