How video stays in sync during an online watch party, why it sometimes doesn't, and what every piece of terminology actually means in plain language.
When people say their video is "out of sync," they usually mean one of two very different things: either audio and video have drifted apart on their own machine (bad lip sync), or their playback is running seconds behind everyone else's. Knowing which one you're dealing with lets you fix it in seconds rather than spending twenty minutes rebooting everything.
Most people experience the first kind and search for solutions to the second kind. The two are fixed differently.
In a screen-sharing watch party, there is only one video source: the host's computer. The host's browser captures the screen pixels and audio samples, encodes them into a stream, and transmits that stream to each viewer over a WebRTC peer-to-peer connection.
WebRTC (Web Real-Time Communication) is the open standard that makes direct browser-to-browser media streams possible without a central video server in the middle. When you start a WatchTogether room, WebRTC negotiates a direct connection between the host and each viewer. The video travels the shortest network path between those two specific machines — not through a datacenter routing everything.
This is why screen-sharing sync is fundamentally different from traditional streaming: there's no "server" buffering your video and serving it to you seconds later. The latency is typically 100–500 milliseconds from the host's screen to a viewer's monitor, compared to 5–30 seconds for traditional HLS streaming services.
Latency is the total delay between the host doing something (pressing play) and a viewer seeing it. In WebRTC this is typically 150–400 ms: fast enough to feel real-time, but still a physical reality of the internet.
Jitter is variation in latency over time. If packets sometimes arrive in 100 ms and sometimes in 400 ms, the viewer's browser has to buffer and smooth them out. High jitter causes momentary stutters and the visual impression of desync.
A jitter buffer is a small buffer the viewer's browser maintains to absorb jitter. It holds a few hundred milliseconds of incoming video, reorders packets that arrive out of sequence, and plays them out smoothly. If the jitter exceeds the buffer's depth, frames arrive too late and are dropped.
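The core idea can be sketched in a few lines. Real browsers implement this deep inside the WebRTC stack with adaptive depth; the class below is a simplified illustration with a fixed depth, and all names are invented for the example:

```typescript
interface Packet {
  seq: number;       // sequence number assigned by the sender
  arrivalMs: number; // when the packet reached this machine
}

// Simplified jitter buffer: reorder by sequence number, hold each
// packet for a fixed smoothing delay, drop packets that show up
// after their successors have already been played.
class JitterBuffer {
  private packets: Packet[] = [];
  private nextSeq = 0;

  constructor(private depthMs: number) {}

  push(p: Packet): void {
    this.packets.push(p);
    // Packets may arrive out of order; keep the queue sorted by seq.
    this.packets.sort((a, b) => a.seq - b.seq);
  }

  // Release every packet that has aged in the buffer long enough.
  pop(nowMs: number): Packet[] {
    const ready: Packet[] = [];
    while (this.packets.length > 0) {
      const head = this.packets[0];
      if (nowMs - head.arrivalMs < this.depthMs) break; // still smoothing
      this.packets.shift();
      if (head.seq < this.nextSeq) continue; // arrived too late: dropped
      this.nextSeq = head.seq + 1;
      ready.push(head);
    }
    return ready;
  }
}
```

The trade-off is visible in the two failure modes: a deeper buffer survives more jitter but adds latency, while a shallower one stays snappy but drops more late frames.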
Bitrate is the amount of video data transmitted per second, measured in Mbps. A higher bitrate means better quality but more demand on the host's upload connection. WebRTC adapts the bitrate automatically to the available bandwidth, which is why a slow host connection makes the picture look blurry rather than freezing it entirely.
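A toy version of that adaptation loop, with invented multipliers and bounds (WebRTC's real congestion controllers, such as Google Congestion Control, are far more sophisticated):

```typescript
// Toy bitrate adaptation: back off quickly when the bandwidth estimate
// drops below what we're sending, probe upward gently otherwise.
// All numbers here are illustrative, not WebRTC's actual constants.
function adaptBitrate(currentKbps: number, estimatedKbps: number): number {
  if (estimatedKbps < currentKbps) {
    // Drop below the estimate so the network's queues can drain.
    return Math.max(100, Math.floor(estimatedKbps * 0.85));
  }
  // Small upward probe per adjustment, capped at a sane maximum.
  return Math.min(8000, Math.floor(currentKbps * 1.05));
}
```

The asymmetry (cut hard, grow slowly) is the standard congestion-control shape: it keeps a slow host connection blurry but watchable instead of stalled.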
STUN and TURN are the protocols WebRTC uses (as part of ICE, Interactive Connectivity Establishment) to find a connection path between two browsers. STUN servers help a browser discover its public IP address. TURN relay servers are a fallback when a direct peer-to-peer path can't be established (e.g. behind a corporate firewall). WatchTogether falls back to TURN automatically when needed.
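In application code this shows up as the ICE server list handed to `RTCPeerConnection`, the standard WebRTC API. The URLs and credentials below are placeholders, not WatchTogether's real infrastructure:

```typescript
// Standard WebRTC configuration shape: STUN listed first, TURN as fallback.
const iceConfig = {
  iceServers: [
    // STUN: lets the browser discover its public IP address.
    { urls: "stun:stun.example.com:3478" },
    // TURN: relays media when no direct peer-to-peer path exists.
    { urls: "turn:turn.example.com:3478", username: "demo", credential: "secret" },
  ],
};
// In the browser: const pc = new RTCPeerConnection(iceConfig);
```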
Audio and video are encoded and decoded separately. They're kept in sync by timestamps embedded in the stream, called presentation timestamps (PTS). When the decoding pipeline runs smoothly, both streams play at the right time. When something disrupts that pipeline, one stream drifts ahead of or behind the other.
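The sync check itself is conceptually simple. Timestamps below are in milliseconds for readability (real streams carry PTS in codec clock units, e.g. a 90 kHz clock for video over RTP), and the 45 ms threshold is a rough, commonly cited point where lip-sync error starts to be noticeable:

```typescript
// Positive result: video is ahead of audio; negative: video lags.
function avOffsetMs(videoPtsMs: number, audioPtsMs: number): number {
  return videoPtsMs - audioPtsMs;
}

// Roughly ±45 ms is where viewers start to notice lip-sync error.
function isNoticeablyOutOfSync(videoPtsMs: number, audioPtsMs: number): boolean {
  return Math.abs(avOffsetMs(videoPtsMs, audioPtsMs)) > 45;
}
```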
Modern browsers typically decode video on the GPU and audio on the CPU. These two components run on slightly different clocks, so over a long session a small drift accumulates: after 30–60 minutes, a few milliseconds of drift per minute add up to visible lip-sync error. Disabling hardware acceleration in Chrome forces video decoding back onto the CPU, the same clock as audio, which eliminates the drift.
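The arithmetic from the paragraph above, made explicit:

```typescript
// Clock skew accumulates linearly with session length.
function accumulatedDriftMs(driftMsPerMinute: number, sessionMinutes: number): number {
  return driftMsPerMinute * sessionMinutes;
}
// 3 ms of drift per minute over a 45-minute episode is 135 ms,
// well past the point where lip sync looks wrong.
```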
If the host's CPU is overloaded (encoding the screen capture, running a game, processing a background download), it drops video frames to keep up. Audio continues uninterrupted. The result is video that falls behind audio. Closing unnecessary applications before sharing removes this bottleneck.
Every audio output device has its own buffer size and playback latency. Bluetooth headphones add 100–300 ms of latency compared to wired output. If the viewer's system is mixing multiple audio outputs, or if the audio driver's buffer size changed mid-session, the sound arrives offset from the video frames. Switching to wired audio or restarting the audio driver usually resolves this.
In screen-sharing parties, group drift is less common than in synchronised-playback parties (like Teleparty) because there's only one stream — but it can occur when one viewer's network path to the host is congested or slow. Their browser has to buffer more aggressively, adding extra latency on top of the base WebRTC latency. The result: they're consistently a few seconds behind everyone else.
In synchronised playback, everyone plays their own local copy of the video. A shared control layer listens for the host's play/pause/seek events and sends commands to every other browser in the room to mirror those actions within milliseconds. No stream travels between users; each person streams directly from Netflix's servers.
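A sketch of what such a control message and its handling might look like. The message shape and the clock-compensation step are illustrative assumptions, not any real extension's wire protocol:

```typescript
// One control message per host action, broadcast to every viewer.
type ControlCommand =
  | { kind: "play"; mediaTimeMs: number; sentAtMs: number }
  | { kind: "pause"; mediaTimeMs: number }
  | { kind: "seek"; mediaTimeMs: number };

// On receipt, a viewer compensates for network transit time before
// applying a play command, so everyone lands on the same media time.
// (Assumes the two machines' wall clocks are loosely synchronised.)
function targetMediaTime(cmd: ControlCommand, nowMs: number): number {
  if (cmd.kind === "play") {
    const transitMs = nowMs - cmd.sentAtMs;
    return cmd.mediaTimeMs + transitMs;
  }
  return cmd.mediaTimeMs;
}
```

Compensating for transit time is what keeps a 150 ms network hop from turning into a permanent 150 ms offset between viewers.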
This is why the quality is better (you're watching at your own Netflix-tier quality) but the compatibility is worse (it only works for services the extension explicitly supports). Drift can occur when one viewer's stream rebuffers: Netflix pauses that viewer's playback to catch up, but the extension can't always detect this and pause everyone else in step.
Screen sharing is lower quality than synchronised playback but dramatically more flexible. Synchronised playback is higher quality but more fragile and limited in scope. For casual, social watch parties — the kind where what matters is laughing together, not pixel-perfect 4K — screen sharing is the better tool. For film-club level quality with everyone already subscribed to the same service, synchronised playback has an edge.