Published on April 16, 2024

Which live stream ingest protocol is right for you?

By Bobby Peck · 10 min read · Video education

Internet live streaming has come a long way since it unofficially began with Severe Tire Damage’s June 24, 1993 performance on the Internet’s multicast backbone (Mbone). Their 152x76 resolution and 10-ish frames per second performance was encoded on a laptop and sent over multicast UDP frames to a data center where it was relayed using the same technology to their audience. Fast-forward over thirty years to 2024, and the landscape of ingesting a live stream to a provider for delivery has changed quite a bit.

Live streaming now runs the gamut from everyday one-to-one calls to streaming concurrently to 59 million people. It should be no surprise that there is a wide variance in the technologies used to get your live stream over that first mile from its source to the provider handling its delivery to the rest of the world.

So, how do you settle on the right technology to implement for your use case?

The choice of a live ingestion protocol for your use case can have a meaningful impact on all other areas of your stream. Your audience may get frustrated by the delay if you go with a technology that does not meet your latency requirements. Your desired quality may not be achievable depending on your network connection and protocol choice. You may even burn your hand if you go with something that is too CPU-intensive for your phone.

These live streaming ingestion protocols all have their own trade-offs that you’ll need to navigate in order to make the best decision for your use case. What follows are some of the most important considerations you’ll need to make as you navigate this decision.

Questions to ponder about your use case

There are many factors to consider when selecting what to use to get your stream over that first mile to your delivery provider:

Narrowing your choices

Let’s set aside the technical protocol details for a moment. You may be able to easily whittle down your options by asking yourself these two critical questions upfront:

  1. What can my live stream source output?
  2. What does my ingest/delivery provider accept?

Based on the answers to these two questions, your choice for an ingestion protocol may either be decided already or narrowed down drastically.

| Platform | Protocols | Codecs |
| --- | --- | --- |
| Facebook | RTMP(S) | h.264 |
| X (Twitter) | RTMP(S), HLS Pull | h.264 |
| YouTube | RTMP(S), HLS Pull, SRT (Closed Beta) | h.264, HEVC, AV1 |
| Twitch / IVS | RTMP(S), SRT | h.264 |
| Mux | RTMP(S), SRT | h.264, HEVC |
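
As a rough illustration of the two questions above, narrowing your options amounts to intersecting what your source can output with what your provider accepts. The sketch below copies the protocol column from the platform list above (expanding "RTMP(S)" into RTMP and RTMPS); the function name and the example encoder are hypothetical.

```python
# Ingest protocols per platform, copied from the platform list above.
PROVIDER_INGEST = {
    "Facebook": {"RTMP", "RTMPS"},
    "X (Twitter)": {"RTMP", "RTMPS", "HLS Pull"},
    "YouTube": {"RTMP", "RTMPS", "HLS Pull", "SRT"},  # SRT is listed as closed beta
    "Twitch / IVS": {"RTMP", "RTMPS", "SRT"},
    "Mux": {"RTMP", "RTMPS", "SRT"},
}


def candidate_protocols(source_outputs, provider):
    """Return the protocols that both the source and the provider support."""
    return sorted(set(source_outputs) & PROVIDER_INGEST[provider])


# Hypothetical encoder that can output RTMP, SRT, and RIST.
print(candidate_protocols({"RTMP", "SRT", "RIST"}, "Mux"))  # ['RTMP', 'SRT']
```

If the result has a single entry, the decision is already made for you; if it is empty, you need a different encoder or a different provider.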

Still have lots of options at your disposal? Okay, let’s continue.

Building Blocks

There are a few common building blocks that you should be familiar with before diving into any one specific protocol:

TCP vs UDP

One of the defining characteristics of a protocol is whether it builds on top of TCP or UDP. This decision, by itself, can tell you a lot about the protocol’s goals and complexity.

TCP provides “reliable, ordered, and error-checked delivery of [bytes]”, which means that protocols built on top of it don’t need to worry about implementing those features since they’re already baked in. These guarantees come at the cost of extra latency and being susceptible to the head-of-line blocking problem.

Protocols built on UDP avoid the latency and head-of-line blocking inherent in TCP, but at the expense of complexity: error recovery, reliability, and ordering all need to be handled at the protocol level. Handling these aspects at the protocol level allows for more control and the ability to tune for a specific use case. A protocol may opt for increasing latency and performing extra retries to account for poor network conditions. Protocols may choose to prioritize retrying missing audio at the expense of video to ensure that conversations are still intelligible.
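
To make the "handled at the protocol level" part concrete, here is a minimal sketch of the kind of bookkeeping a UDP-based protocol has to do itself: hold out-of-order packets, release them in sequence, and work out which ones to ask for again. The `ReorderBuffer` class is hypothetical and not taken from any real protocol; it also ignores sequence number wraparound and retry deadlines, which real implementations must handle.

```python
class ReorderBuffer:
    """Releases payloads in sequence order and reports gaps.

    Sequence numbers are plain integers here; real protocols use
    fixed-width counters that wrap around, which this sketch ignores.
    """

    def __init__(self, first_seq=0):
        self.next_seq = first_seq  # next sequence number to release
        self.pending = {}          # out-of-order packets, keyed by sequence number

    def push(self, seq, payload):
        """Store a packet and return any payloads now deliverable in order."""
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate: already delivered or already buffered
        self.pending[seq] = payload
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready

    def missing(self):
        """Sequence numbers the receiver could ask the sender to retransmit."""
        if not self.pending:
            return []
        return [s for s in range(self.next_seq, max(self.pending))
                if s not in self.pending]


buf = ReorderBuffer()
print(buf.push(0, "a"))  # ['a']
print(buf.push(2, "c"))  # []  (packet 1 has not arrived yet)
print(buf.missing())     # [1]
print(buf.push(1, "b"))  # ['b', 'c']
```

How long the receiver waits for packet 1 before giving up is exactly the latency-versus-reliability knob described above.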

Real-time Transport Protocol (RTP)

RTP is a network protocol for sending audio and video over UDP. It has a long history and the UDP frames sent by Severe Tire Damage’s broadcast were a direct precursor to the RTP we know today. RTP isn’t considered an ingestion protocol by itself but more of a building block that other protocols layer on top of much like how protocols build on top of UDP or TCP.
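
For a sense of what RTP adds on top of a raw UDP payload, the sketch below parses the 12-byte fixed header defined in RFC 3550. The sequence number and timestamp are what higher-level protocols use to reorder packets and pace playback. The function name and the example packet are made up for illustration, and the sketch ignores CSRC entries and header extensions.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Parse the 12-byte fixed RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("RTP packet is shorter than the fixed header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit synchronization source (SSRC).
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Hand-built example packet: version 2, payload type 96, sequence 1, timestamp 3000.
example = struct.pack("!BBHII", 0x80, 96, 1, 3000, 0x1234ABCD) + b"payload"
header = parse_rtp_header(example)
print(header["version"], header["payload_type"], header["sequence_number"])  # 2 96 1
```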

The major protocols

Looking at the field of protocols, there are four that make up the vast majority of real-world usage: RTMP/S, SRT, Zixi, & RTSP. Their continued dominance can be attributed to encoder/provider support, established tooling, & feature sets.

Real Time Messaging Protocol (RTMP) and Variations

RTMP started its life in the 90s as a proprietary protocol developed by Macromedia for streaming video to Flash Players. While RTMP’s use as a delivery protocol has mostly faded away in favor of newer technologies, it still reigns as the dominant technology for getting media from encoder to provider. A number of factors have contributed to its dominance in the space, but the largest one is the chicken-and-egg problem between encoders and live streaming providers. Encoders don’t want to build in support for a protocol that no providers support, and providers don’t want to support a protocol that no encoders support.

Over the years, variations of RTMP have been developed to modernize the protocol, with varying degrees of success. The two most notable variations are:

  1. RTMP over TLS (RTMPS), which wraps the RTMP connection in TLS so the stream is encrypted in transit.
  2. Enhanced RTMP, which adds support for alternate codecs and HDR.

Secure Reliable Transport (SRT)

SRT is a UDP-based protocol developed by Haivision as a means of “delivering low latency video and other media streams across lossy networks”. It was first demoed at IBC, a global media conference, in 2013 and open sourced in 2017. Since then, as streaming has become more mainstream, especially over mobile networks, and as more encoders and providers have added support for it, SRT has gained traction as a viable alternative to RTMP and the limitations inherent in a TCP-based protocol.

Zixi

Zixi is both a company and a protocol. Since its founding in 2008, Zixi has developed its proprietary protocol and accompanying products to provide “broadcast quality live video delivery over any IP network”. Zixi’s feature set makes it attractive to broadcasters looking for high quality, reliable transmission of their streams. Going with a proprietary protocol locks you into that ecosystem, so it is important to decide whether you are OK with that and with the effort of migrating if your technology stack changes in the future.

Real Time Streaming Protocol (RTSP)

RTSP came about shortly after RTMP, in 1998, as a joint effort by RealNetworks, Netscape, and Columbia University. RTSP is just the control protocol that defines actions (DESCRIBE, SETUP, PLAY, PAUSE, …) on a specific resource like a video. The actual streaming of that video is done by RTP. RTSP has mostly fallen out of favor for live stream ingest, with the exception of security cameras, where it has remained dominant.
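
Because RTSP is a text-based control protocol, its requests look a lot like HTTP. The sketch below builds the bytes of an RTSP/1.0 request as a rough illustration of the "actions on a resource" idea; it does not open a connection or carry any media. The helper name and the camera URL are hypothetical.

```python
def build_rtsp_request(method, url, cseq, headers=None):
    """Build the bytes of an RTSP/1.0 request; nothing is sent."""
    # Every request carries a CSeq header so responses can be matched to it.
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Header lines end with CRLF, and a blank line terminates the request.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


url = "rtsp://camera.example/stream1"  # hypothetical camera URL
print(build_rtsp_request("DESCRIBE", url, 1, {"Accept": "application/sdp"}).decode())
```

A client would typically follow DESCRIBE with SETUP and PLAY, at which point the media itself flows over RTP rather than over this control channel.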

The others

Web Real-Time Communication (WebRTC) and WebRTC-HTTP ingestion protocol (WHIP)

You can think of WebRTC as multiple protocols stacked in a protocol trench coat rather than a protocol by itself. One of WebRTC’s big selling points is that it is now baked into all major web browsers making it possible to easily “go live” straight from a web page.

Some of the underlying protocols that make WebRTC possible include:

  • RTP
  • Session Description Protocol (SDP)
  • Real-Time Transport Control Protocol (RTCP)
  • Session Traversal Utilities for NAT (STUN)
  • Interactive Connectivity Establishment (ICE)
  • Traversal Using Relays around NAT (TURN)

What is not defined in WebRTC itself is how one client signals to another to kick off sharing media. In most systems using WebRTC this is done via a proprietary protocol which makes its use as an ingest protocol difficult when there are technologies from different providers in use. WHIP was developed as a common signaling protocol to make WebRTC more viable as a means of communication between disparate systems. WHIP describes a common set of HTTP calls in order to perform the requisite negotiation to set up a WebRTC session.
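
As a rough sketch of what that "common set of HTTP calls" looks like, a WHIP client starts a session by POSTing its SDP offer with a content type of `application/sdp`, and a successful response carries the SDP answer along with a `Location` header identifying the session. The code below only builds the request object and does not send it. The endpoint, the token, and the placeholder SDP are all hypothetical, and a real offer would come from a WebRTC stack.

```python
import urllib.request


def build_whip_request(endpoint, sdp_offer, bearer_token=None):
    """Build (but do not send) the HTTP POST that starts a WHIP session."""
    headers = {"Content-Type": "application/sdp"}
    if bearer_token:
        headers["Authorization"] = f"Bearer {bearer_token}"
    return urllib.request.Request(
        endpoint,
        data=sdp_offer.encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Hypothetical endpoint and token; the SDP here is a placeholder, not a valid offer.
req = build_whip_request("https://whip.example/ingest/abc", "v=0\r\n", "secret-token")
print(req.get_method(), req.full_url)
# urllib stores header names capitalized as "Content-type".
print(req.get_header("Content-type"))
```

Since the negotiation is plain HTTP, any encoder that speaks WHIP can in principle talk to any provider that accepts it, which is the interoperability gap WHIP was designed to close.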

HLS Pull

HLS was developed by Apple as a delivery protocol, but some providers have adopted it as an ingest protocol. Since HLS is HTTP-based, it shares the cons of TCP-based protocols, and because it is segmented, latency is even higher. The benefits of HLS are:

  • It’s a common protocol used by delivery providers so you may have an HLS stream at your disposal
  • Since HLS is a pull-based model, unlike the others which are push-based, you can ingest into multiple systems with only minor changes
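
To illustrate the pull model, the sketch below reads a media playlist and extracts the segment URIs a puller would then fetch over HTTP. It also hints at where the extra latency comes from: a segment can only be pulled once it has been fully written and listed, so the puller trails the live edge by roughly a segment duration or more. The function name and the sample playlist are made up, and the parser handles only the tags shown.

```python
def parse_media_playlist(text):
    """Return (target_duration_seconds, [segment URIs]) from a media playlist."""
    target_duration = None
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-TARGETDURATION:"):
            target_duration = int(line.split(":", 1)[1])
        elif line and not line.startswith("#"):
            # Any non-blank line that is not a tag or comment is a segment URI.
            segments.append(line)
    return target_duration, segments


PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:120
#EXTINF:6.0,
segment120.ts
#EXTINF:6.0,
segment121.ts
#EXTINF:6.0,
segment122.ts
"""

duration, segments = parse_media_playlist(PLAYLIST)
print(duration, segments)  # 6 ['segment120.ts', 'segment121.ts', 'segment122.ts']
```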

Reliable Internet Stream Transport (RIST)

RIST is an open protocol built on top of RTP and UDP to avoid the limitations of TCP and to do it in an open source fashion to promote interoperability between vendors.

Media over QUIC (MoQ)

An emerging entrant to the space is MoQ which is aiming to “leverage the features of QUIC to create a simple yet flexible low latency protocol that can rapidly detect and respond to congestion”. It has larger ambitions than “just” an ingest protocol but its design makes it worth looking into as it evolves.

Proprietary Protocols

Many companies have developed their own protocols or enhanced open protocols that work solely with their own service/encoder. They each will have their own pros/cons but all share the con of not being compatible with other systems.

So…what? Which one should I use?

Well…unfortunately that answer is the typical non-answer of, “It depends.” While each protocol in this list performs a very similar task, how and where are critical aspects of why you’d pick one over the other.

Do you need to support any and every encoder under the sun, even really old ones? RTMP is, realistically, your best bet. Even RTMPS can run into compatibility issues with limited/older encoders. SRT might be what you reach for if you have access to modern encoders but are particularly worried about your upstream bandwidth or connection reliability.

Do you want to allow people to stream from a browser? RTMP and SRT both depend on network protocols that are inaccessible in a browser environment (RIP, Flash), which probably means WebRTC/WHIP is your best option.
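
The rules of thumb above can be written down as a small decision helper. This is only a sketch of the guidance in this section, not a definitive selector: the function and parameter names are hypothetical, and the final fallback is an assumption, since the honest answer there is still "it depends".

```python
def suggest_ingest_protocol(from_browser, legacy_encoders, unreliable_network):
    """Map the rules of thumb in this section to a starting suggestion."""
    if from_browser:
        # RTMP and SRT are not usable from a browser environment.
        return "WebRTC/WHIP"
    if legacy_encoders:
        # Older encoders may not even support RTMPS.
        return "RTMP"
    if unreliable_network:
        # Modern encoder, but constrained bandwidth or a flaky connection.
        return "SRT"
    # Assumption: with no hard constraint, either works; check provider support.
    return "RTMP/S or SRT"


print(suggest_ingest_protocol(from_browser=True, legacy_encoders=False, unreliable_network=False))
print(suggest_ingest_protocol(from_browser=False, legacy_encoders=False, unreliable_network=True))
```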

As a startup launching live video in the very early days of SRT, we made the choice to stick with RTMP/S. We needed to support as many streamers as possible and SRT was so new that the encoder options were limited. Fast forward a few years and we’re finally excited to be able to say “why not both?”

Written By

Bobby Peck

Previously worked on VOD & Ad products at Brightcove. When not working, usually out of cell phone reception on a hiking trail.
