Transport architecture

Orbit has two transports for real-time audio and video. Each call is assigned one at session establishment. You generally do not choose it directly: the API selects based on the call shape. This page documents the decision tree so you can predict (and, when needed, force) which transport a given session will use.

TL;DR

| Session shape | Transport | Reason |
| --- | --- | --- |
| 1 caller ↔ 1 AI agent (browser softphone, voice agent) | WebSocket bridge | Lowest latency, one peer-pair, no media mixing |
| 1 caller ↔ 1 human agent over PSTN | PSTN ↔ SBC (no SFU) | Pure SIP/RTP via Jambonz; SFU adds nothing |
| Multi-party voice (3+ legs) | Orbit Media (SFU) | Server-side audio mixing + selective forwarding |
| Any video call | Orbit Media (SFU) | Simulcast + per-subscriber bitrate adaptation |
| Recorded session (any shape) | Orbit Media (SFU egress) | Recording pipeline lives on the SFU |
| AI voice agent over sip_forward to 3rd-party (Vapi/Retell/callers.ai) | PSTN ↔ SBC re-INVITE | Carrier ↔ SBC ↔ vendor SIP; no SFU |

Why two transports

Real-time media has two cost dimensions: fan-out (how many participants receive each stream) and mixing (whether the server combines streams before sending). A 1:1 voice agent has neither — there is exactly one publisher per side, and no mixing happens. A direct WebSocket bridge (sub-100ms p50 on warm regions) is strictly cheaper and lower-latency than routing through an SFU. The moment you add a third participant, recording, simulcast, or any form of selective forwarding, the SFU pays for itself. Orbit Media handles per-subscriber bitrate, codec re-negotiation, server-side recording (via egress), and DTLS/SRTP key rotation — all features that would cost weeks to replicate over raw WebSocket.

Decision tree (the actual logic)

session = startCall(legs)

# 1:1 AI agent call with no recording requirement
if len(legs) == 1 and legs[0].kind == "ai_agent" and not legs[0].recording_required:
    return TRANSPORT.WS_BRIDGE         # browser ↔ agent-runtime WS

# exactly two legs, both PSTN
if len(legs) == 2 and all(leg.kind == "pstn" for leg in legs):
    return TRANSPORT.SBC_DIRECT        # Jambonz ↔ Jambonz, RTPengine bridges

if any(leg.kind == "video" for leg in legs):
    return TRANSPORT.ORBIT_MEDIA       # SFU mandatory for video

if len(legs) >= 3:
    return TRANSPORT.ORBIT_MEDIA       # SFU for multi-party

if any(leg.recording_required for leg in legs):
    return TRANSPORT.ORBIT_MEDIA       # egress lives on SFU

return TRANSPORT.WS_BRIDGE             # fall through: 1:1 audio, no recording

This logic lives in apps/api/src/routes/voice/transport-selector.ts. It is deterministic per call: the same input always yields the same transport, so you can reason about a call’s path from the API request alone.
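
As an illustration only, the decision tree above could be written in TypeScript roughly as follows. The contents of transport-selector.ts are not shown on this page, so the `Leg` interface, the `Transport` union, and the name `selectTransport` are assumptions made for this sketch. The branch order mirrors the pseudocode.

```typescript
// Hypothetical types: the real definitions in transport-selector.ts are not documented here.
interface Leg {
  kind: string; // e.g. "ai_agent", "pstn", "video"
  recording_required?: boolean;
}

type Transport = "WS_BRIDGE" | "SBC_DIRECT" | "ORBIT_MEDIA";

// Mirrors the pseudocode branch by branch, in the same order.
function selectTransport(legs: Leg[]): Transport {
  // 1:1 AI agent call with no recording requirement
  const only = legs.length === 1 ? legs[0] : undefined;
  if (only && only.kind === "ai_agent" && !only.recording_required) {
    return "WS_BRIDGE";
  }
  // Exactly two legs, both PSTN
  if (legs.length === 2 && legs.every((leg) => leg.kind === "pstn")) {
    return "SBC_DIRECT";
  }
  // Video always goes through the SFU
  if (legs.some((leg) => leg.kind === "video")) {
    return "ORBIT_MEDIA";
  }
  // Multi-party
  if (legs.length >= 3) {
    return "ORBIT_MEDIA";
  }
  // Recording egress lives on the SFU
  if (legs.some((leg) => leg.recording_required === true)) {
    return "ORBIT_MEDIA";
  }
  // Fall through: 1:1 audio, no recording
  return "WS_BRIDGE";
}
```

Because the function is pure, the same `legs` array always produces the same result, which is the determinism property described above. Note that in this branch order a two-leg PSTN call matches the SBC branch before the recording check is reached; whether the production selector treats recorded PSTN calls this way is not stated on this page.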

Latency profile

Approximate p50 round-trip times, eu-west to a participant in the same region, on warm connections:

| Hop | WS bridge | Orbit Media (SFU) |
| --- | --- | --- |
| Client publish → server | 18 ms | 22 ms |
| Server → AI agent / mixer | 2 ms | 5 ms (mixer step) |
| Server → recipient | 18 ms | 24 ms (per-subscriber) |
| Total p50 mouth-to-ear | ~40 ms | ~55 ms |

The 15 ms delta is the price you pay for SFU features (recording, fan-out, simulcast). For a 1:1 voice agent it’s a noticeable regression; for any session that needs the SFU’s features it’s a rounding error against the network’s own jitter.
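
To make the arithmetic behind the totals explicit, the short sketch below sums the per-hop figures from the table. The variable names are illustrative. The exact sums come to 38 ms and 51 ms, which the table reports as approximately 40 ms and 55 ms, so the gap is 13 ms on exact sums and 15 ms on the rounded totals.

```typescript
// Per-hop p50 figures copied from the table above, in milliseconds.
const wsBridgeHopsMs: number[] = [18, 2, 18]; // publish, agent hop, recipient
const orbitMediaHopsMs: number[] = [22, 5, 24]; // publish, mixer step, per-subscriber

const sum = (values: number[]): number => values.reduce((a, b) => a + b, 0);

const wsBridgeTotalMs = sum(wsBridgeHopsMs); // 38
const orbitMediaTotalMs = sum(orbitMediaHopsMs); // 51
const deltaMs = orbitMediaTotalMs - wsBridgeTotalMs; // 13

console.log({ wsBridgeTotalMs, orbitMediaTotalMs, deltaMs });
```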

How to force a transport

You generally shouldn’t, but two API surfaces accept an explicit override:
  • POST /api/v1/voice/calls — set transport: "sfu" or transport: "ws" to override the default selection. Returns 400 INVALID_TRANSPORT if the chosen transport can’t satisfy the call shape (e.g. transport: "ws" on a 4-party call).
  • POST /api/v1/video/rooms and /api/v1/video/rooms-scheduled — always use Orbit Media (the SFU). There is no WS-bridge mode for video.
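
A minimal sketch of how a client might build the override request and recognise the rejection. Only the endpoint path, the `transport` field, and the `400 INVALID_TRANSPORT` response come from this page. The `legs` payload field, the bearer-token header, the base URL, and the assumption that the error code arrives as `code` in the JSON body are all hypothetical.

```typescript
type TransportOverride = "sfu" | "ws";

interface CallRequestInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the request for POST /api/v1/voice/calls with an optional override.
// `legs` and the Authorization header are assumptions; only `transport` is documented.
function buildCallRequest(
  baseUrl: string,
  apiKey: string,
  legs: unknown[],
  transport?: TransportOverride,
): { url: string; init: CallRequestInit } {
  const payload: Record<string, unknown> = { legs };
  if (transport !== undefined) {
    payload.transport = transport; // omit the field to keep the default selection
  }
  return {
    url: `${baseUrl}/api/v1/voice/calls`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify(payload),
    },
  };
}

// True when a response looks like the documented 400 INVALID_TRANSPORT rejection.
// Assumes the code is exposed as `code` in the response body.
function isInvalidTransport(status: number, body: { code?: string }): boolean {
  return status === 400 && body.code === "INVALID_TRANSPORT";
}
```

The returned `url` and `init` can be passed to a fetch-compatible client, for example `fetch(url, init)`. Keeping the request builder separate from the network call makes the override easy to unit-test.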

Failure modes and fallback

If Orbit Media is unreachable at session start (the SFU pool returns 503 or the JWT mint fails), the API responds with 503 SERVICE_UNAVAILABLE and the call does NOT silently fall back to the WS bridge. Falling back would change the recording contract (no recording on WS bridge) and the multi-party contract (WS bridge can’t fan out), both of which violate the caller’s expectations. The dashboard surfaces the SFU outage explicitly so the operator can retry.

If the WS bridge is unreachable on a 1:1 AI agent call, the API tries the SFU as a fallback (the SFU CAN handle the 1:1 shape, just with +15 ms latency). This direction of fallback is safe: the recording contract is unchanged because the original request didn’t require recording.
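
The one-directional fallback rule can be sketched as a small pure function. This is an illustration of the behaviour described above, not the API's actual code: the names `MediaTransport`, `Health`, and `resolveTransport` are hypothetical, and returning 503 when both transports are down is an assumption, since the page does not cover that case.

```typescript
type MediaTransport = "WS_BRIDGE" | "ORBIT_MEDIA";

interface Health {
  wsBridge: boolean; // WS bridge reachable at session start
  sfu: boolean; // Orbit Media reachable and token mint succeeded
}

type Resolution =
  | { ok: true; transport: MediaTransport; fellBack: boolean }
  | { ok: false; status: 503; code: "SERVICE_UNAVAILABLE" };

const unavailable: Resolution = { ok: false, status: 503, code: "SERVICE_UNAVAILABLE" };

function resolveTransport(selected: MediaTransport, health: Health): Resolution {
  if (selected === "ORBIT_MEDIA") {
    // Never downgrade to the WS bridge: it cannot record or fan out.
    return health.sfu
      ? { ok: true, transport: "ORBIT_MEDIA", fellBack: false }
      : unavailable;
  }
  // WS bridge was selected, so the call is 1:1 and needs no recording.
  if (health.wsBridge) {
    return { ok: true, transport: "WS_BRIDGE", fellBack: false };
  }
  if (health.sfu) {
    // Safe upgrade: the SFU can carry a 1:1 call at slightly higher latency.
    return { ok: true, transport: "ORBIT_MEDIA", fellBack: true };
  }
  return unavailable; // assumption: both transports down yields 503
}
```

For example, `resolveTransport("WS_BRIDGE", { wsBridge: false, sfu: true })` resolves to Orbit Media with `fellBack` set, while `resolveTransport("ORBIT_MEDIA", { wsBridge: true, sfu: false })` resolves to the 503 result.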

Cross-references

  • Voice API → Softphone token — how the WS-bridge token is minted for browser softphone clients.
  • Video API — every endpoint here mints SFU tokens against Orbit Media.
  • Voice quickstart — end-to-end walk-through of a 1:1 AI voice agent call, which uses the WS bridge by default.
  • Attribution — Orbit Media is forked from LiveKit OSS under Apache-2; the WS bridge is in-house.