Endpoint
The endpoint sends a queued event immediately, then either a finished or a failed event when the inference resolves.
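
On the wire, that lifecycle is just a sequence of SSE frames. A sketch of what a successful run might look like (the event names come from the lifecycle above; the JSON payload fields shown are hypothetical, not a documented schema):

```
event: queued
data: {"id": "abc123"}

event: finished
data: {"id": "abc123", "output": "..."}
```

A failed inference would end with a failed event instead of finished. Each frame is terminated by a blank line, per the SSE format.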
When to use this
- You want to display progress to a user during a long inference
- You’re building a chat UI where partial responses should appear as they’re generated
- You’re integrating with a frontend that already speaks SSE (most do)
Consuming the stream
Any standard SSE client works: the EventSource API in browsers, sseclient-py in Python, and so on.
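To make the framing concrete, here is a minimal, dependency-free sketch of the parsing those clients do for you: fields are `key: value` lines, a blank line terminates an event, and multiple data lines are joined. The sample stream and its JSON fields are hypothetical; in practice you would point a real client such as sseclient-py at the endpoint instead.

```python
import json

def iter_sse_events(lines):
    """Yield (event_name, data) pairs from an iterable of SSE lines.

    Minimal sketch of SSE framing: a blank line ends an event,
    "event:" names it, "data:" lines are joined with newlines.
    Comment lines and other fields are ignored here.
    """
    event, data = "message", []  # "message" is the SSE default event name
    for line in lines:
        line = line.rstrip("\n")
        if line == "":
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())

# Hypothetical stream: a queued event followed by a finished event.
raw = (
    "event: queued\n"
    'data: {"id": "abc123"}\n'
    "\n"
    "event: finished\n"
    'data: {"id": "abc123", "output": "hello"}\n'
    "\n"
)
events = list(iter_sse_events(raw.splitlines()))
for name, payload in events:
    print(name, json.loads(payload))
```

With a real endpoint you would feed the parser (or, better, a maintained client library) from a streaming HTTP response rather than a string.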
