Transport architecture
Orbit has two transports for real-time audio and video. Each call is assigned one at session establishment. You generally do not pick it yourself: the API selects based on the call shape. This page documents the decision tree so you can predict (and, when needed, force) which transport a given session will use.
TL;DR
| Session shape | Transport | Reason |
|---|---|---|
| 1 caller ↔ 1 AI agent (browser softphone, voice agent) | WebSocket bridge | Lowest latency, one peer-pair, no media mixing |
| 1 caller ↔ 1 human agent over PSTN | PSTN ↔ SBC (no SFU) | Pure SIP/RTP via Jambonz; SFU adds nothing |
| Multi-party voice (3+ legs) | Orbit Media (SFU) | Server-side audio mixing + selective forwarding |
| Any video call | Orbit Media (SFU) | Simulcast + per-subscriber bitrate adaptation |
| Recorded session (any shape) | Orbit Media (SFU egress) | Recording pipeline lives on the SFU |
| AI voice agent over sip_forward to 3rd-party (Vapi/Retell/callers.ai) | PSTN ↔ SBC re-INVITE | Carrier ↔ SBC ↔ vendor SIP; no SFU |
Why two transports
Real-time media has two cost dimensions: fan-out (how many participants receive each stream) and mixing (whether the server combines streams before sending). A 1:1 voice agent has neither: there is exactly one publisher per side, and no mixing happens. A direct WebSocket bridge (sub-100ms p50 on warm regions) is strictly cheaper and lower-latency than routing through an SFU.
The moment you add a third participant, recording, simulcast, or any form of selective forwarding, the SFU pays for itself. Orbit Media handles per-subscriber bitrate, codec re-negotiation, server-side recording (via egress), and DTLS/SRTP key rotation, all features that would cost weeks to replicate over raw WebSocket.
Decision tree (the actual logic)
The selection logic lives in apps/api/src/routes/voice/transport-selector.ts.
It is deterministic per call — the same input always yields the same
transport, so you can reason about a call’s path from the API request
alone.
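The rows of the TL;DR table can be sketched as a pure function. This is an illustration only, not the contents of transport-selector.ts: the SessionShape fields, the "sbc" label, and the order in which the rules are checked (SFU-requiring features first) are all assumptions made for the example.

```typescript
// Hypothetical types: field names are illustrative, not Orbit's real schema.
type Transport = "ws" | "sbc" | "sfu";

interface SessionShape {
  legs: number;            // total participants, including agents
  video: boolean;          // any video track requested
  recorded: boolean;       // recording requested
  pstnHumanAgent: boolean; // 1:1 caller to a human agent over PSTN
  sipForward: boolean;     // AI agent forwarded to a third-party vendor over SIP
}

// Pure function: no clock, randomness, or I/O, so equal inputs
// always produce the same transport.
function selectTransport(s: SessionShape): Transport {
  // Assumed precedence: anything that needs SFU features wins first,
  // matching "Recorded session (any shape)" in the table.
  if (s.recorded || s.video || s.legs >= 3) return "sfu";
  // Pure SIP/RTP paths stay on the SBC.
  if (s.sipForward || s.pstnHumanAgent) return "sbc";
  // Remaining case: 1 caller to 1 AI agent.
  return "ws";
}
```

Because the function depends only on its argument, a request payload is enough to predict the path a call will take under these assumed rules.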
Latency profile
Approximate p50 round-trip times, eu-west to a participant in the same region, on warm connections:
| Hop | WS bridge | Orbit Media (SFU) |
|---|---|---|
| Client publish → server | 18 ms | 22 ms |
| Server → AI agent / mixer | 2 ms | 5 ms (mixer step) |
| Server → recipient | 18 ms | 24 ms (per-subscriber) |
| Total p50 mouth-to-ear | ~40 ms | ~55 ms |
How to force a transport
You generally shouldn’t, but two API surfaces accept an explicit override:
- POST /api/v1/voice/calls accepts transport: "sfu" or transport: "ws" to override the default selection. It returns 400 INVALID_TRANSPORT if the chosen transport can’t satisfy the call shape (e.g. transport: "ws" on a 4-party call).
- POST /api/v1/video/rooms and /api/v1/video/rooms-scheduled always use Orbit Media (the SFU). There is no WS-bridge mode for video.
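A minimal client-side sketch of the voice override follows. Only the endpoint path, the transport field, and the 400 INVALID_TRANSPORT code come from this page. The participants field, the Bearer authorization header, the { code } error body, and the injected send function are assumptions for illustration.

```typescript
type TransportOverride = "sfu" | "ws";

interface CreateCallBody {
  participants: string[];        // hypothetical field name
  transport?: TransportOverride; // documented override; omit to keep the default
}

// Minimal shape of an HTTP sender so the sketch does not depend on a runtime.
type Sender = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ status: number; json(): Promise<unknown> }>;

function buildCreateCallBody(
  participants: string[],
  transport?: TransportOverride
): CreateCallBody {
  // Leave the key out entirely when no override is requested.
  return transport === undefined ? { participants } : { participants, transport };
}

// Assumes the error body looks like { code: "INVALID_TRANSPORT" }.
function isInvalidTransport(status: number, body: unknown): boolean {
  return (
    status === 400 &&
    typeof body === "object" &&
    body !== null &&
    (body as { code?: unknown }).code === "INVALID_TRANSPORT"
  );
}

async function createCall(
  send: Sender,
  baseUrl: string,
  token: string,
  body: CreateCallBody
): Promise<unknown> {
  const res = await send(`${baseUrl}/api/v1/voice/calls`, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${token}` },
    body: JSON.stringify(body),
  });
  const parsed = await res.json().catch(() => null);
  if (isInvalidTransport(res.status, parsed)) {
    throw new Error("Requested transport cannot satisfy this call shape");
  }
  if (res.status >= 400) throw new Error(`Call creation failed with HTTP ${res.status}`);
  return parsed;
}
```

On runtimes that expose a global fetch, passing it as send should work. In tests a stub sender can return a canned status and body instead.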
Failure modes and fallback
If Orbit Media is unreachable at session start (the SFU pool returns 503 or the JWT mint fails), the API responds with 503 SERVICE_UNAVAILABLE and the call does NOT silently fall back to the WS
bridge. Falling back would change the recording contract (no recording
on WS bridge) and the multi-party contract (WS bridge can’t fan out),
both of which violate the caller’s expectations. The dashboard
surfaces the SFU outage explicitly so the operator can retry.
If the WS bridge is unreachable on a 1:1 AI agent call, the API tries
the SFU as a fallback (the SFU can handle the 1:1 shape, at the cost of
roughly 15 ms of extra latency). This direction of fallback is safe: the
recording contract is unchanged because the original request didn’t
require recording.
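The one-way fallback rule can be sketched as a small pure function. The type and function names are hypothetical, and the behavior when both transports are down is an assumption (this page does not state it; the sketch returns 503).

```typescript
type Selected = "ws" | "sfu";

interface Health {
  wsReachable: boolean;
  sfuReachable: boolean; // false if the SFU pool returns 503 or the JWT mint fails
}

type Outcome =
  | { ok: true; transport: Selected; fellBack: boolean }
  | { ok: false; status: 503; code: "SERVICE_UNAVAILABLE" };

function resolveWithFallback(selected: Selected, health: Health): Outcome {
  const unavailable: Outcome = { ok: false, status: 503, code: "SERVICE_UNAVAILABLE" };

  if (selected === "sfu") {
    // Never downgrade SFU to WS: that would drop recording and fan-out.
    return health.sfuReachable
      ? { ok: true, transport: "sfu", fellBack: false }
      : unavailable;
  }

  // selected === "ws": a 1:1 AI agent call that did not require recording.
  if (health.wsReachable) return { ok: true, transport: "ws", fellBack: false };

  // WS bridge is down: moving up to the SFU is safe, at some latency cost.
  // Assumption: if the SFU is also down, respond with 503.
  return health.sfuReachable
    ? { ok: true, transport: "sfu", fellBack: true }
    : unavailable;
}
```

The asymmetry is the point of the design: a fallback is allowed only when the replacement transport offers a superset of what the original request relied on.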
Cross-references
- Voice API → Softphone token — how the WS-bridge token is minted for browser softphone clients.
- Video API — every endpoint here mints SFU tokens against Orbit Media.
- Voice quickstart — end-to-end walk-through of a 1:1 AI voice agent call, which uses the WS bridge by default.
- Attribution — Orbit Media is forked from LiveKit OSS under Apache-2; the WS bridge is in-house.