Serverless inference is coming soon. Use Dedicated Inference for production workloads in the meantime.
## How it differs from Dedicated Inference
| | Dedicated | Serverless |
|---|---|---|
| Billing | Per replica-hour | Per request |
| Idle cost | Yes — replicas stay warm | No |
| Cold-start latency | None once the first replica is healthy | Yes, for the first request after idle (see the retry sketch below) |
| Best for | Production traffic, steady load | Sporadic or low-volume traffic (see the break-even sketch below) |
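To make the billing trade-off concrete, the sketch below estimates the break-even request volume between per-request and per-replica-hour pricing. The prices and the single-replica assumption are placeholders for illustration, not published rates; substitute your plan's actual numbers.

```python
# Break-even estimate: dedicated (per replica-hour) vs. serverless (per request).
# Both prices below are placeholder assumptions, not real rates.
REPLICA_HOUR_USD = 2.50   # assumed cost of one dedicated replica per hour
PER_REQUEST_USD = 0.002   # assumed serverless cost per request

# Request volume at which one always-on replica costs the same as serverless.
break_even_per_hour = REPLICA_HOUR_USD / PER_REQUEST_USD

print(f"Break-even: {break_even_per_hour:.0f} requests/hour "
      f"(~{break_even_per_hour / 3600:.2f} req/s sustained)")
# Below this rate, per-request billing is cheaper; above it (and within a
# single replica's capacity), the warm dedicated replica wins.
```

With these placeholder numbers the crossover is 1,250 requests/hour, roughly one request every three seconds; sustained traffic above that favors Dedicated Inference.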

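Cold starts also have a client-side consequence: the first serverless request after an idle period can take noticeably longer while capacity is provisioned. Here is a minimal sketch of a tolerant client, assuming a hypothetical endpoint URL and using the standard `requests` library:

```python
import requests

URL = "https://api.example.com/v1/infer"  # hypothetical endpoint URL

def infer(payload: dict) -> dict:
    """POST an inference request, tolerating one cold-start stall."""
    try:
        # Generous timeout: the first request after idle may wait on a cold start.
        resp = requests.post(URL, json=payload, timeout=60)
    except requests.Timeout:
        # Retry once; by now the cold start has usually completed.
        resp = requests.post(URL, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()
```

Dedicated replicas don't need this guard once they are healthy, which is one reason the table recommends them for steady production traffic.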
