One bite and all your dreams will come true - Apple.
Last week at WWDC, Apple announced their usual cascade of software updates, and as has been tradition for the last 4 years, Roger Pantos took the stage to announce the latest swathe of changes to the HTTP Live Streaming (HLS) specification. This year's changes are intended to reduce the latency of live video streams, but at what cost?
HLS is a segmented delivery technology which enables live and on-demand video streaming to devices. While HLS was designed for Apple devices, it’s now used broadly across the video streaming ecosystem, including on browsers, smart TVs, set top boxes, and games consoles. It's a simple protocol that’s easy to understand and implement. You deliver a master playlist (often called a manifest): a text file which describes the different resolution and bitrate combinations (renditions) of your content that you have available. Then you have a separate playlist for each of those renditions, which contains a list of media segments, their durations, and the URLs where they can be fetched.
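As a minimal illustration (the bitrates, resolutions, and file names here are invented for the example), a master playlist and one of its rendition playlists might look like this:

```
#EXTM3U

# Master playlist: one entry per rendition
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=854x480
480p/playlist.m3u8
```

```
#EXTM3U

# Media playlist for the 720p rendition: segments, durations, and URLs
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:100
#EXTINF:6.0,
segment100.ts
#EXTINF:6.0,
segment101.ts
```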
While HLS is simple and scalable, it’s prone to high latency when used for live streaming. In this context, we’re talking about “wall-clock” or “glass-to-glass” latency: the time between something happening IRL and it being seen by the end user. In HLS, latency is closely tied to the duration of the media segments that you’re using. Generally, the lower limit of segment duration has been found to be around 2 seconds, which gives a passable streaming experience with a latency of around 10 seconds. More traditional HLS streaming setups with longer segment durations can have latencies upwards of 30 seconds.
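As a back-of-the-envelope illustration of why segment duration dominates latency (the three-segment player buffer and the one-second encode/delivery overhead here are assumptions for the sketch, not numbers from any spec):

```python
def estimated_latency(segment_duration, buffered_segments=3, overhead=1.0):
    """Rough glass-to-glass latency model for live HLS, in seconds.

    A player typically buffers a few full segments before starting
    playback, and encoding, packaging, and CDN delivery add some
    roughly fixed overhead on top.
    """
    return buffered_segments * segment_duration + overhead

# With 2-second segments this toy model lands around 7 seconds;
# real-world deployments commonly see ~10s. Longer segments push
# the number up fast: 6-second segments give ~19s here.
print(estimated_latency(2.0))  # 7.0
print(estimated_latency(6.0))  # 19.0
```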
This year at WWDC, Pantos took to the stage to announce that Apple has updated HLS to include a new low-latency mode. “Great!” you’re all saying, “Lower latency video, that’s a good thing, right?”. But here’s what’s interesting - this isn’t the first attempt to write a specification for low-latency HLS. There’s been a specification in open development within the video developer community for over a year, based on white papers over two years old. On the surface, the community approach uses simpler, more widely deployable, and more readily available technologies. So why didn’t Apple use the community’s work? Let’s take a look at the approach Apple took, and how it differs from what the community has been working on.
First, let’s look at how Apple’s Low Latency HLS solution works. You can watch the presentation here, and read the spec here, but here’s the 30,000 foot summary of the changes Apple have made in the name of low-latency:
- Segments are now divided into smaller “parts”, which can be published before the full segment is complete.
- Playlist requests can block, with the server holding the connection open until new media is available.
- Servers can return delta playlist updates, rather than the full playlist on every reload.
- New parts can be pushed to the client over HTTP/2 alongside the playlist response.
- Apple has reserved all query parameters beginning with _HLS for their new “Origin API”, which can be used to manipulate the behavior of playlist generation.
What’s likely to jump out at you if you’ve used HLS before is “Wow, that’s a lot of moving parts”, and you’d be right, this is a pretty complex addition to an otherwise simple specification. For bonus points you’re going to have to implement all of these features, and some more I haven’t talked about (yes, including HTTP/2) in order to get your low-latency HLS stream to work. For the time being at least, you’ll have to get your application (and thus your low latency implementation) tested by Apple to get into the app store, signaled by using a special identifier in your application’s manifest.
The biggest departure from traditional HLS approaches that these changes introduce is the significant increase in the state that needs to be communicated between the playlist generation process and the encoder process. Historically, the process was simple: the encoder generated a new segment, put it onto some storage (a CDN or object store), and updated the playlist to indicate the new segment was available. Now there has to be a lot more logic performed when the playlist is generated, including, in some cases, holding the connection open while a part becomes available to download.
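That historical, stateless flow can be sketched like this (the `LivePlaylist` class and the in-memory store are illustrative stand-ins, not any real API):

```python
class LivePlaylist:
    """Minimal stand-in for an HLS media playlist builder."""

    def __init__(self, target_duration):
        self.target_duration = target_duration
        self.segments = []  # (uri, duration) pairs

    def add_segment(self, uri, duration):
        self.segments.append((uri, duration))

    def render(self):
        lines = ["#EXTM3U", f"#EXT-X-TARGETDURATION:{self.target_duration}"]
        for uri, duration in self.segments:
            lines.append(f"#EXTINF:{duration:.1f},")
            lines.append(uri)
        return "\n".join(lines)


# Traditional flow: encode -> store -> publish. No shared state beyond
# the playlist itself, and no open connections to manage.
playlist = LivePlaylist(target_duration=6)
store = {}                                  # stand-in for a CDN/object store
store["segment100.ts"] = b"...media..."     # 1. encoder writes the segment
playlist.add_segment("segment100.ts", 6.0)  # 2. playlist announces it
print(playlist.render())
```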
Taken alone, I actually don’t think ALHLS is a bad specification. Is it complicated? Yes. Does it have a lot of moving parts? Yes. Does that make it fundamentally bad? No.
There are things I don’t think are elegant in there for sure - the reserved use of some query parameters to change playlist generation behavior isn’t to my liking, and neither is the blocking playlist request behavior. Let’s take a look in more detail at the areas that are going to be challenges from an implementation perspective:
Query parameter usage
Most playlist requests in 2019 use query parameters as part of their content security mechanism, meaning that part or all of the URL to the playlist is signed in order to stop unauthenticated users from accessing content. Introducing new, functional query parameters to the URL adds extra complexity to the signing and caching implementations for playlist requests, as well as introducing new challenges in third party player development.
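Here's a sketch of the clash (the HMAC-over-the-whole-URL signing scheme is a simplified hypothetical, though the `_HLS_msn`/`_HLS_part` directive names come from Apple's draft spec):

```python
import hashlib
import hmac

SECRET = b"cdn-signing-key"  # hypothetical shared signing secret


def sign(path_and_query):
    """Sign the whole path + query string, as many CDN token schemes do."""
    return hmac.new(SECRET, path_and_query.encode(), hashlib.sha256).hexdigest()


signed = sign("/live/720p/playlist.m3u8")

# The player now appends Origin API directives to request a future part:
requested = "/live/720p/playlist.m3u8?_HLS_msn=101&_HLS_part=2"

# The signature no longer matches. The signer, the CDN, and every player
# now need to agree on exactly which parameters are excluded from token
# validation -- extra complexity at every layer.
print(sign(requested) == signed)  # False
```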
Blocking playlist reloads
Blocking playlist requests are certainly going to be a headache to maintain, and the currently documented timeout behavior seems unclear and frustrating to monitor (503ing after 3x the target duration). Beyond this, the strategy raises some interesting, and quite concerning, security and performance questions for your web and CDN tiers.
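The server-side behavior can be modeled like this (a toy in-process sketch of a blocking origin, not a real implementation; the tiny target duration just keeps the demo fast):

```python
import threading


class PlaylistOrigin:
    """Toy model of an origin serving blocking playlist requests."""

    def __init__(self, target_duration):
        self.target_duration = target_duration
        self.latest_msn = 100  # highest media sequence number published
        self.updated = threading.Condition()

    def publish_segment(self):
        with self.updated:
            self.latest_msn += 1
            self.updated.notify_all()

    def get_playlist(self, wait_for_msn):
        """Hold the request until the segment exists; 503 after 3x target."""
        deadline = 3 * self.target_duration
        with self.updated:
            arrived = self.updated.wait_for(
                lambda: self.latest_msn >= wait_for_msn, timeout=deadline
            )
            if not arrived:
                return 503, None
            return 200, f"...playlist up to segment {self.latest_msn}..."


origin = PlaylistOrigin(target_duration=0.2)
status, _ = origin.get_playlist(wait_for_msn=101)  # nothing published yet
print(status)  # 503, after hanging for ~0.6s

threading.Timer(0.1, origin.publish_segment).start()
status, body = origin.get_playlist(wait_for_msn=101)
print(status)  # 200, once the segment lands mid-request
```

Every in-flight request is now a held-open connection with state on your origin and CDN tier, which is exactly where the performance concerns come from.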
HTTP/2 server push at scale
However, the biggest challenge for adoption with Apple’s approach is the mandatory use of HTTP/2. In the announcement, Apple touted HTTP/2 as “widely adopted […] by CDNs”. While this is true on the surface, the statement doesn’t really hold for the specific features of HTTP/2 that Apple require you to use.
HTTP/2 server push works by allowing a server (a node in a CDN in this case) to push an object back to the client without the client asking for it. This is pretty cute, but comes with 2 major headaches when we’re talking about using it at scale through name brand CDNs:

The first is how you trigger the push at all. On the CDNs that support it, you generally do this by including the preload keyword in Link headers in your origin response. This causes the CDN to link together the two objects in its cache, and push out the linked objects appropriately. However, this brings us to problem 2: a pushed object arrives on the same connection as the response that triggered it, which means your playlists and media parts have to be served from a single edge hostname - not how many large-scale delivery architectures are set up today.
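For illustration, a push-enabled origin response for a playlist might carry a Link preload header like this (the URI and exact header shape are illustrative, not taken from Apple's spec):

```
HTTP/1.1 200 OK
Content-Type: application/vnd.apple.mpegurl
Link: </live/720p/segment271.4.mp4>; rel=preload

#EXTM3U
...playlist body referencing the same part...
```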
Now let’s talk a little bit about how this differs from the community developed LHLS solution.
HLS.js, in conjunction with a variety of others including Mux, JW Player, Wowza, Elemental, and Akamai, has been collaborating on a community driven approach to implementing low latency streaming with HLS for well over a year. Much of the discussion around the formal standard can be found on this GitHub issue. The initial concept and terms come from a Periscope blog article published in mid-2017, describing how they had implemented their own approach to low latency HLS streaming. You can read this article here.
The approach is actually very simple (much simpler than ALHLS). Apart from some simple new playlist semantics, LHLS uses the same strategy used when delivering low latency MPEG DASH - HTTP 1.1 chunked transfer encoding. Chunked transfer encoding is a great fit here because it allows you to start sending your HTTP response as chunks of data as you have them, before the complete response is available.
This is helpful because it lets you send what Apple are calling “parts” of the segment of video as the encoder generates them, back to the client, which can start playing them as soon as it gets them, without needing to wait for a full segment to be available. The really great thing about chunked transfer mode is that it’s available on the overwhelming majority of CDNs - it's much more widely supported than HTTP/2 push is today.
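The wire format itself is simple: each chunk is its byte length in hex, then the bytes, with a zero-length chunk ending the response. A minimal sketch (the "parts" here are placeholder byte strings, not real media):

```python
def chunked_body(parts):
    """Frame an iterable of byte parts as an HTTP/1.1 chunked body."""
    for part in parts:
        # Hex length, CRLF, chunk bytes, CRLF -- sent as soon as the
        # part exists, long before the full segment is complete.
        yield f"{len(part):X}\r\n".encode() + part + b"\r\n"
    yield b"0\r\n\r\n"  # zero-length chunk terminates the response


# The origin can start this response the moment the segment *begins* to
# exist, emitting each encoder part as its own chunk.
encoder_output = [b"part-0-bytes", b"part-1-bytes", b"part-2-bytes"]
wire = b"".join(chunked_body(encoder_output))
print(wire)
```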
Beyond its broad availability, this approach actually results in less busywork on the client device in comparison to ALHLS. On the surface, LHLS maintains the traditional HLS paradigm - polling for playlist updates, and then grabbing segments. However, because a segment can be streamed back as it’s being encoded, you actually don’t have to reload the playlist that often. In ALHLS, you’ll still be polling the playlist many times a second looking for new parts to become available, even if they’re then pushed to you off the back of the manifest request.
It would have been amazing to see Apple bring some of the concepts that it’s been developing for ALHLS (mainly delta playlists) to LHLS - these approaches combined would have made an elegant, powerful solution, so why didn’t they?
So if LHLS is so great, and supported in the community, why didn’t Apple just get involved? Honestly, I don’t know. Apple’s decision to ignore pre-existing communities or standards isn’t particularly new, but Apple had certainly been giving signs in the last few years that they were starting to move into alignment with the rest of the video streaming industry.
While Apple never adopted the MPEG DASH streaming standard (a competitive standard to HLS, despite being involved in the DASH Industry Forum), a couple of years ago, it started supporting fMP4 and CMAF media chunks. This support is now available in the overwhelming majority of Apple devices, which meant that the dream of delivering one set of media segments in one way through one endpoint, including low-latency modes was finally starting to become a reality.
However, with DASH’s ongoing standardization of an LHLS style chunked transfer delivery of low-latency streaming, it now seems that Apple is forcing us back into a segregated delivery stack strategy in order to support ALHLS, even if it is only for the hot end of the stream.
The biggest challenge here for many HLS and video platform vendors is going to be the mandatory HTTP/2 push, but I also strongly suspect this is the key as to why Apple chose to go in the direction they did. One of the big challenges for both ALHLS and LHLS is the problem of bandwidth estimation. In order to deliver a great streaming experience, you have to be able to measure and respond to changes in a user’s bandwidth. Historically estimating a user’s available bandwidth has been easy - you measure how long the last media segment took to download, and then check the size of that segment, do some simple math, and this gives you a good bandwidth estimation.
In the chunked transfer world, however, estimating bandwidth isn’t easy: at the live edge, you expect every segment to take as long to download as it took to generate, so you need an alternative bandwidth measurement. That could mean using the playlist fetches, a small reference file, occasionally downloading a full segment at full speed, or something else.
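The arithmetic makes the problem concrete (segment sizes and timings here are invented for the example):

```python
def classic_bandwidth_estimate(segment_bytes, download_seconds):
    """Traditional HLS ABR estimate: bits transferred / time taken."""
    return segment_bytes * 8 / download_seconds


# Full-segment fetch behind the live edge: a 1.5 MB segment arriving
# in 0.5s implies ~24 Mbps of available bandwidth.
print(classic_bandwidth_estimate(1_500_000, 0.5))  # 24000000.0

# At the live edge with chunked transfer, the same segment "downloads"
# in roughly its own 6s media duration, because delivery is paced by
# the encoder, not the network. The naive math now reports ~2 Mbps
# regardless of how fast the link actually is.
print(classic_bandwidth_estimate(1_500_000, 6.0))  # 2000000.0
```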
My working theory is that Apple didn’t want to solve this problem in any of these ways, leaving the only option to be to allow AVPlayer (Apple’s streaming framework) to be able to measure the performance of individual chunks of a chunked-transfer response. I suspect Apple decided that it wasn’t interested in adding any new functionality to its legacy HTTP 1.1 stack on devices in order to support this.
Now, this said, HTTP/2 push absolutely does not solve this problem. There are no APIs in modern browsers or devices which allow you to check the download performance of an HTTP/2 push response, and with blocking playlist requests, the situation is actually worse: measuring the performance of a blocking playlist fetch along with a segment load doesn’t give you an accurate measurement, and you can’t use your playlist download performance as a proxy. We have to assume that Apple has a way of measuring this performance on their own devices when HTTP/2 is used, for 2 reasons:
For one, the specification requires that “A Partial Segment must be completely available for download at the full speed of the link to the client at the time it is added to the playlist.”
Another way to look at this decision is to classify it as “classic Apple”. It’s not like this is the first time Apple have taken a strongly opinionated stance on deprecation timelines, though those decisions do tend to be rooted in physical hardware. Headphone jacks, USB-A, a physical escape key… just to name a few. Apple are the king of the dongle… maybe I can get an ALHLS to LHLS dongle too.
Apple’s beta of low latency is only compatible with iOS devices right now - not even the latest Safari Technology Preview on macOS supports it (and I’m told it won’t for “some time”). However, Apple devices are really only a tiny part of the HLS ecosystem.
It’s worth keeping in mind that the amount of HLS that gets delivered to non-Apple devices is huge - with players like HLS.js and Video.js with their own HLS implementations supporting billions of impressions every day. So let’s assume that the video industry just follows Apple’s spec, and pivots away from any current approaches that they’ve been pursuing over the last year or two.
So would ALHLS be easy to implement on modern browsers or other devices? No, not really. The choice of technologies (namely HTTP/2) Apple has selected is going to make it really hard for non-Apple devices to implement ALHLS, and yes, that includes HLS.js, which Apple uses on their own website for their own developer videos.
HTTP/2 is a young technology, the tooling to work with it is severely limited, and the web APIs in browsers just aren’t yet mature enough to build low-latency streaming technologies on top of. It’s likely Apple will be able to make it work well in Safari eventually, since it can leverage private APIs, but the rest of the browser world is going to have to change rapidly in order to support and debug third party implementations of ALHLS.
Obviously, I’m sure that Apple performed a lot of due diligence and investigated LHLS extensively. However, there are some big challenges that need to be worked through before implementations of Apple’s specification will be ready.
Customers all over the video streaming industry are desperate for low latency solutions so that they can compete with the likes of Twitch or Twitter/Periscope. The community LHLS strategy is real and is available to implement against today (and indeed many have), and there’s nothing stopping you from implementing it in a backwards compatible way across all major browsers.
Apple’s ALHLS, however, is clearly many months away even on iOS, since it’s very likely this won’t ship until iOS 13 at the earliest. This, combined with the limited availability of HTTP/2 push on major CDNs, the requirement to use a single edge hostname, and Apple’s new app verification for ALHLS, means that we're unlikely to see large scale deployments of ALHLS for a while yet. If you also want to offer the same experience with the same technologies on desktop, or in other web players, you’ll have to wait for those players to catch up with the dramatically more complex ALHLS implementation. This leaves vendors and customers in a challenging position for some time while the industry figures all this out. Do vendors continue to forge forward with chunked-transfer based solutions, or do they go all-in on Apple’s new ALHLS?
It’s fair to say there are a lot of people in the industry with a lukewarm (at best) reaction to Apple’s ALHLS specification, but ignoring community developed options while pushing for an excessively future-facing approach isn’t exactly new ground. It really is a shame to see a lack of conversation, because in some areas, such as Swift, Apple are becoming a much more community centric organization.
Oh well, I guess we’ve got some work to do! 💪
Cover photo: Snow White and the Seven Dwarfs, Disney 1937