Cloudflare Realtime SFU is a WebRTC Selective Forwarding Unit that runs on Cloudflare's global network, so you can route live audio, video, and data between WebRTC clients around the world without managing SFU infrastructure or regions.
When you use the WebSocket adapter to stream WebRTC media to a WebSocket endpoint, the adapter now auto-reconnects and buffers audio and video after brief endpoint disconnects or restarts.
Many teams also use Realtime SFU as the media layer for backend applications, such as transcription, recording, note-taking, and agentic media-processing services. These systems often need to consume live WebRTC audio or video from the SFU in backend infrastructure, including Durable Objects, Workers, Containers, or external services, without running a WebRTC client themselves.
The WebSocket adapter bridges that gap by streaming WebRTC media from the SFU to a standard WebSocket endpoint as application-consumable payloads: PCM audio frames and JPEG video frames.
When you use the WebSocket adapter in Stream mode (egress) to send live audio or video from the SFU to your own WebSocket endpoint, the SFU now automatically reconnects after brief endpoint disconnects or restarts. This is especially helpful for long-running media pipelines where the WebSocket endpoint may briefly restart while a recording, transcription, or live analysis job is still in progress.
Previously, a brief disconnect from your WebSocket endpoint could close the adapter and require your application to recreate it before media could resume. Now, the SFU retries the same endpoint for up to 5 seconds with no API change required. If the endpoint comes back within that window, audio and video delivery resumes automatically.
The reconnect behavior also includes live-first media buffering, so brief interruptions reduce media loss without replaying stale video.
During reconnect:
- Audio uses a short bounded backlog to reduce audible loss. If the interruption lasts longer than the backlog can cover, older audio may be dropped.
- Video resumes from the latest available JPEG frame instead of replaying stale frames.
- Recovery is best effort and does not guarantee gapless or exactly-once delivery.
If the endpoint remains unavailable after the 5-second reconnect window, the adapter closes and must be recreated.
You can now record specific participant audio tracks in RealtimeKit with track recording. Track recording creates separate WebM files for each participant instead of a single composite recording, which is useful for post-processing, transcription, and regulated or content-sensitive workflows.
To record specific participants, pass
user_idswhen starting a track recording:Terminal window curl --request POST \--url https://api.cloudflare.com/client/v4/accounts/<account_id>/realtime/kit/<app_id>/recordings/track \--header 'Authorization: Bearer <api_token>' \--header 'Content-Type: application/json' \--data '{"meeting_id": "97440c6a-140b-40a9-9499-b23fd7a3868a","user_ids": ["user-123", "user-456"]}'To pass
user_idsfor selective track recording, use the following minimum SDK versions:- Web Core:
@cloudflare/realtimekitversion1.4.0or later - Web UI Kit:
@cloudflare/realtimekit-ui,@cloudflare/realtimekit-react-ui, or@cloudflare/realtimekit-angular-uiversion1.1.2or later - Android Core or iOS Core: version
2.0.0or later - Android UI Kit or iOS UI Kit: version
1.1.0or later
RealtimeKit provides SDKs and UI components so that you can build your own meeting experience on Cloudflare's global WebRTC infrastructure. Teams today build products ranging from telehealth to education on RealtimeKit for global audiences. You can get started today with our Quickstart or take a look at our Cloudflare Meet repo ↗ as a reference.
- Web Core:
Real-time transcription in RealtimeKit now supports 10 languages with regional variants, powered by Deepgram Nova-3 running on Workers AI.
During a meeting, participant audio is routed through AI Gateway to Nova-3 on Workers AI — so transcription runs on Cloudflare's network end-to-end, reducing latency compared to routing through external speech-to-text services.
Set the language when creating a meeting via
ai_config.transcription.language:{"ai_config": {"transcription": {"language": "fr"}}}Supported languages include English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch — with regional variants like
en-AU,en-GB,en-IN,en-NZ,es-419,fr-CA,de-CH,pt-BR, andpt-PT. Usemultifor automatic multilingual detection.If you are building voice agents or real-time translation workflows, your agent can now transcribe in the caller's language natively — no extra services or routing logic needed.