Author’s note: Since we last talked about this, Apple has started referring to their specification as LL-HLS, rather than the ALHLS term we coined. We’ll call it LL-HLS from now on.
Last year Apple announced a new revision to the HLS specification that included provisions for delivering streams with low glass-to-glass latency (around 5 seconds), intended to be competitive with traditional broadcast latency.
We wrote at length about this specification (take a look at my last post on the topic here), but in summary this revision introduced 5 new features intended to make low-latency HLS live streaming a reality. As a reminder, those changes were:
- Delivering shorter sub-segments of the video stream (Apple call these parts) more frequently (every 0.3 - 0.5s)
- Using HTTP/2 PUSH to deliver these smaller parts, pushed in response to a blocking playlist request
- Blocking playlist requests, eliminating the current speculative manifest request polling behaviour in HLS
- Smaller delta rendition playlists, which reduce playlist size; this matters because playlists are requested more frequently
- Faster rendition switching, enabled by rendition reports, which allows clients to see what is happening in another playlist without requesting it in its entirety
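To make the delta playlist idea concrete, here is a minimal TypeScript sketch of how a client might rebuild its full segment list from a delta update. It assumes the tag names from the current HLS draft (an EXT-X-SKIP tag whose SKIPPED-SEGMENTS count replaces the oldest segments, with EXT-X-MEDIA-SEQUENCE still numbering from the first skipped segment), and that the client kept the segment URIs from its previous copy, keyed by media sequence number. The applyDelta and DeltaUpdate names are hypothetical.

```typescript
// Hypothetical shape of a parsed delta playlist update.
interface DeltaUpdate {
  mediaSequence: number;   // EXT-X-MEDIA-SEQUENCE of the delta playlist
  skippedSegments: number; // SKIPPED-SEGMENTS value from EXT-X-SKIP
  newUris: string[];       // segment URIs listed after the EXT-X-SKIP tag
}

// Rebuild the full segment list, filling skipped segments from the cache.
// `cache` maps media sequence number -> segment URI from the previous copy.
function applyDelta(cache: Map<number, string>, delta: DeltaUpdate): string[] {
  const merged: string[] = [];
  // The skipped segments are the first `skippedSegments` MSNs of the playlist.
  for (let i = 0; i < delta.skippedSegments; i++) {
    const msn = delta.mediaSequence + i;
    const uri = cache.get(msn);
    if (uri === undefined) {
      // Without the older copy the delta can't be applied; fall back to a full reload.
      throw new Error(`segment ${msn} missing from cache; request full playlist`);
    }
    merged.push(uri);
  }
  // Explicitly listed segments continue on from the skipped ones.
  merged.push(...delta.newUris);
  // Refresh the cache so the next delta applies on top of this result.
  cache.clear();
  merged.forEach((uri, i) => cache.set(delta.mediaSequence + i, uri));
  return merged;
}
```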
We, like many others in the industry, weren’t super excited about this specification. While on its own it wasn’t a bad standard, it had some significant barriers to implementation and adoption at scale.
Firstly, the requirement for HTTP/2 PUSH for part delivery at the edge limited the CDNs that were compatible with this strategy. More seriously, however, there was no specification for chaining HTTP/2 PUSH between the origin and the CDN edge, so no CDN was able to deliver Apple’s strategy without a time-consuming second round trip from edge to origin.
Secondly, there was a lot of concern about the lack of interoperability with the chunked transfer encoding approach adopted in MPEG’s DASH-LL standard.
Since Apple made their original announcement, we and many others in the industry have been providing both public and private feedback to Apple. This culminated in 2 full-day workshops with Apple’s participation: one right after Mile High Video in Denver, and one after Demuxed in Cupertino.
I’m really happy to report that 2 weeks ago, Apple announced that they would be removing the requirement for HTTP/2 PUSH delivery of parts of media segments from their specification. We waited a little while to release this article as we wanted to give the dust a little time to settle, and to make sure that we understood the impact of these changes well.
How is the latest specification different?
The latest revision of the specification replaces HTTP/2 PUSH with a new prefetch behaviour, signalled by a new tag in the HLS playlist: EXT-X-PRELOAD-HINT. Otherwise, the specification remains mostly unchanged.
EXT-X-PRELOAD-HINT signals a part of the media that will become available in the near future. Clients are expected to request these parts in advance, and the CDN edge and underlying origin are expected to block on the request until it can be fulfilled “at line speed” (chunked transfer encoding is still forbidden).
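As an illustration, here is a minimal TypeScript sketch of reading an EXT-X-PRELOAD-HINT line. The TYPE, URI, BYTERANGE-START and BYTERANGE-LENGTH attribute names are taken from the current HLS draft and may differ in detail from the revision discussed here; the function names are hypothetical, and the attribute-list parser is deliberately simplified.

```typescript
interface PreloadHint {
  type: string;             // e.g. "PART" (or "MAP" for an init section)
  uri: string;              // resource the client should request ahead of time
  byteRangeStart?: number;  // optional offset within that resource
  byteRangeLength?: number; // optional length; absent means "to the end"
}

// Split an HLS attribute list (NAME=VALUE pairs), keeping quoted
// values intact even if they contain commas.
function parseAttributes(list: string): Map<string, string> {
  const attrs = new Map<string, string>();
  const re = /([A-Z0-9-]+)=("[^"]*"|[^,]*)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(list)) !== null) {
    attrs.set(m[1], m[2].replace(/^"|"$/g, "")); // strip surrounding quotes
  }
  return attrs;
}

// Returns null for lines that aren't a usable preload hint.
function parsePreloadHint(line: string): PreloadHint | null {
  const prefix = "#EXT-X-PRELOAD-HINT:";
  if (!line.startsWith(prefix)) return null;
  const attrs = parseAttributes(line.slice(prefix.length));
  const type = attrs.get("TYPE");
  const uri = attrs.get("URI");
  if (type === undefined || uri === undefined) return null; // both required
  const hint: PreloadHint = { type, uri };
  const start = attrs.get("BYTERANGE-START");
  const length = attrs.get("BYTERANGE-LENGTH");
  if (start !== undefined) hint.byteRangeStart = Number(start);
  if (length !== undefined) hint.byteRangeLength = Number(length);
  return hint;
}
```

A client would request hint.uri as soon as it parses the hint, expecting the CDN and origin to hold the response until the part exists.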
While HTTP/2 PUSH from edge to device is gone, HTTP/2 is still mandatory for edge delivery, though this is likely to be enforced only on Apple’s own devices (detecting HTTP/2 reliably in a browser can be a can of worms). The blocking playlist request is also still present in the specification, so the simplest possible client-side behaviour would now be as follows:
- Client loads the master playlist and picks a rendition
- Client loads the rendition playlist with the appropriate query parameters for low-latency delivery
- Client waits for the most recent EXT-X-PART in the manifest to contain INDEPENDENT=YES, then immediately downloads this part (as well as any initialisation segment if using CMAF media), allowing playback to start
- Client sees an EXT-X-PRELOAD-HINT tag at the end of the manifest and issues 2 concurrent blocking HTTP/2 requests: one to retrieve the next delta update of the rendition playlist, and one to request the upcoming part
- Repeat until either the heat death of the universe, or the football match ends
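Two of the steps above, choosing a starting part and building the blocking playlist request, might look roughly like the following TypeScript sketch. It assumes the _HLS_msn and _HLS_part delivery directives from the current HLS draft as the low-latency query parameters, and it only reads URI and INDEPENDENT from each EXT-X-PART tag; the helper names and example URLs are hypothetical.

```typescript
interface Part {
  uri: string;
  independent: boolean; // true when the tag carries INDEPENDENT=YES
}

// Collect EXT-X-PART entries in playlist order (simplified parsing).
function listParts(playlist: string): Part[] {
  const parts: Part[] = [];
  for (const line of playlist.split("\n")) {
    if (!line.startsWith("#EXT-X-PART:")) continue;
    const uri = /URI="([^"]*)"/.exec(line)?.[1];
    if (uri === undefined) continue;
    parts.push({ uri, independent: /INDEPENDENT=YES/.test(line) });
  }
  return parts;
}

// Start from the most recent part that begins with an independent frame,
// so decoding doesn't depend on earlier parts.
function startPart(parts: Part[]): Part | undefined {
  for (let i = parts.length - 1; i >= 0; i--) {
    if (parts[i].independent) return parts[i];
  }
  return undefined; // none yet: wait for the next playlist update
}

// Blocking playlist request: the server is expected to hold it until
// part `part` of media sequence number `msn` appears in the playlist.
function blockingPlaylistUrl(base: string, msn: number, part: number): string {
  const sep = base.includes("?") ? "&" : "?";
  return `${base}${sep}_HLS_msn=${msn}&_HLS_part=${part}`;
}
```

The client would fire the blocking playlist request and the hinted part request concurrently, then repeat once both complete.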
A side effect of this change is that Apple’s specification is now much more interoperable with MPEG DASH-LL. Parts signalled by EXT-X-PRELOAD-HINT can be described with byte offsets, just like normal segments.
When working in a mode that is interoperable with DASH-LL, an LL-HLS client can issue an open-ended byte range request starting at the first part of a segment in the manifest. An open-ended byte range request specifies only a starting offset and no ending offset (for example, a Range header of bytes=0-), meaning that you want “the rest of the object.”
The origin should block until the first part in the segment is available, then release it via the CDN to the client. The client keeps that open-ended range request running, waiting for the next part to be delivered while transferring the just-downloaded part into the MSE buffer. This effectively replicates the chunked transfer approach used in DASH-LL, just with batched part responses released at full speed over a range request.
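The Range header values involved are simple to construct. This TypeScript sketch (the rangeHeader name is hypothetical) produces either an open-ended range, as described above, or a closed range when a part’s length is known; note that HTTP byte ranges include the end offset.

```typescript
// Build a Range header value. With no length the range is open-ended
// ("rest of the object"), letting the origin keep the response open
// and release later parts of the segment on it.
function rangeHeader(start: number, length?: number): string {
  if (!Number.isInteger(start) || start < 0) throw new RangeError("bad start offset");
  if (length === undefined) return `bytes=${start}-`;
  if (!Number.isInteger(length) || length <= 0) throw new RangeError("bad length");
  return `bytes=${start}-${start + length - 1}`; // end offset is inclusive
}
```

In a browser this value would go in the Range header of a fetch() request, for example fetch(url, { headers: { Range: rangeHeader(0) } }).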
The simplest implementation I described earlier will work, but issuing an open range request for the first part of a segment and leaving that request running greatly reduces the number of round trips required during low-latency streaming. This should significantly improve the stability and resilience of the stream while keeping the player’s request rate low.
I threw together a little diagram to illustrate this trick in action with a 2s segment made up of four 500ms parts. One day I’ll have enough artistic talent to animate it too!
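To put rough numbers on the saving, this TypeScript sketch counts requests per rendition per minute for a 2s segment made up of four 500ms parts. It assumes the client still makes one blocking playlist request per part in both modes, and ignores rendition switches, retries and initialisation segments; the figures are illustrative arithmetic, not measurements.

```typescript
interface RequestCounts {
  playlist: number;
  media: number;
  total: number;
}

// Requests per rendition per minute under the stated assumptions.
function requestsPerMinute(segmentSec: number, partsPerSegment: number, openRange: boolean): RequestCounts {
  const segmentsPerMinute = 60 / segmentSec;
  // Assumption: one blocking playlist request per part in either mode.
  const playlist = segmentsPerMinute * partsPerSegment;
  // Either one request per part, or one open-ended range request per segment.
  const media = segmentsPerMinute * (openRange ? 1 : partsPerSegment);
  return { playlist, media, total: playlist + media };
}
```

Under those assumptions, requesting every part individually works out to 240 requests per minute, while one open range request per segment brings that down to 150, with media requests dropping from 120 to 30.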
So does it solve the challenges?
On the face of it, this specification change certainly seems to resolve most of the outstanding issues that the industry had identified. Let’s work through them one by one.
HTTP/2 PUSH isn’t supported on most CDNs
HTTP/2 PUSH is no longer required, but HTTP/2 at the edge is still required, which is much more widely supported by CDNs.
Multiple round trips are required from edge to origin for each part, because HTTP/2 PUSH relies on the Link header, which doesn’t cascade; this causes performance issues for edges with high latency back to the origin
With HTTP/2 PUSH no longer required, origins must instead block a request for a future part and release it at full line speed once it is available. This removes the need for either a (non-existent) specification for chained HTTP/2 PUSH or an extra round trip. While this isn’t the simplest task from an origin’s perspective, it is certainly easier than the extra effort required to support HTTP/2 PUSH. There are still concerns about the impact of large numbers of blocking connections on scaling and resource availability at modern CDN edges, which are designed to accept a request and service it as quickly as possible.
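On the origin side, the “block, then release at line speed” behaviour can be pictured as a registry of held requests. The following TypeScript sketch is a hypothetical, in-memory illustration rather than a real origin or CDN API: requests for parts that don’t exist yet are parked, and the packager’s publish call releases every waiting request with the complete part in one go. A real origin would also need timeouts, eviction of old parts and limits on held connections, which is exactly the scaling concern above.

```typescript
// Callback standing in for "write this body to the held HTTP response".
type Deliver = (data: Uint8Array) => void;

class PartRegistry {
  private parts = new Map<string, Uint8Array>(); // completed parts (never evicted here)
  private waiting = new Map<string, Deliver[]>(); // held requests per part URI

  // Handle a request, including requests for parts not yet produced
  // (for example ones named by EXT-X-PRELOAD-HINT).
  request(uri: string, deliver: Deliver): void {
    const ready = this.parts.get(uri);
    if (ready !== undefined) {
      deliver(ready); // already complete: respond immediately
      return;
    }
    const queue = this.waiting.get(uri) ?? [];
    queue.push(deliver);
    this.waiting.set(uri, queue); // hold the request open
  }

  // Called by the packager once a part is complete: every held request
  // receives the whole part at once rather than a trickle of chunks.
  publish(uri: string, data: Uint8Array): void {
    this.parts.set(uri, data);
    for (const deliver of this.waiting.get(uri) ?? []) deliver(data);
    this.waiting.delete(uri);
  }

  pendingCount(uri: string): number {
    return this.waiting.get(uri)?.length ?? 0;
  }
}
```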
Playlists and Parts have to be served from the same CDN edge hostname
With the removal of HTTP/2 PUSH, this is no longer a requirement.
The specification was not compatible with DASH-LL, resulting in double caching if you used both technologies
The specification should now be able to share the same CMAF media that is served for DASH-LL delivery. To a client, the open byte range request that the player makes is functionally very similar to the chunked transfer encoded response that DASH-LL expects, just with subtly different pausing behaviour, as data is released in larger, part-sized bursts by the origin and CDN.
It’s also worth mentioning that using open range requests in an interoperable mode results in fewer, larger objects being stored in CDN caches, which can also be directly re-used as parts transition into segments earlier in the playlist. This is likely to result in significantly improved cache efficiency at the CDN edge, and potentially cost savings depending on the CDN in question.
Request rate is too high, even with HTTP/2 PUSH
The expected request rate is slightly higher than under the previous revision of the specification, because part and playlist requests are now made separately rather than parts being pushed alongside the playlist. When a single open range request is used for all the parts in a segment, this increase is negligible, though the overall rate is still significantly higher than with traditional HLS.
So is bandwidth estimation in a browser now solved?
Sadly not. While these changes certainly improve the specification significantly, and remove a lot of the barriers to implementation, there still remain open questions over how bandwidth estimation will work on browser based devices.
While the removal of HTTP/2 PUSH makes it slightly easier for browsers to measure network performance using the Resource Timing API, significant limitations remain that restrict traditional bandwidth estimation techniques.
The Resource Timing API is currently unable to detect the periods during a long-running transfer when no data is being sent: you can only see when the first byte of a response was received and when the last byte was received.
In the LL-HLS use case, this is fine if you make a request for every part, as you’ll be able to measure performance from the start of the part being sent until its final byte, but it becomes an issue when using the open byte range requests I described above.
During requests that have pauses in data transfer, the Resource Timing API counts the pause time as part of the response time, so the measured response time will be much higher than the time actually spent transferring data, rendering this approach useless for bandwidth estimation.
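To illustrate the problem, the TypeScript sketch below compares a Resource Timing style estimate (total bytes divided by the time from first byte to last byte) with a burst-aware estimate that skips idle gaps. It assumes the player can timestamp each chunk as it arrives, for example by reading a fetch() response body with a ReadableStream reader and calling performance.now(); the ChunkSample shape and the idle-gap threshold are hypothetical choices. Chunk boundaries seen by script don’t necessarily match what happened on the network, so this is a heuristic, not a solution.

```typescript
// Hypothetical record of one chunk arriving on a long-running response.
interface ChunkSample {
  timeMs: number; // arrival time of the chunk
  bytes: number;  // size of the chunk
}

// Resource Timing style: total bytes over responseStart..responseEnd,
// which includes the idle gaps while the origin waited for the next part.
// Bits per millisecond equals kbit/s.
function naiveKbps(responseStartMs: number, responseEndMs: number, totalBytes: number): number {
  const elapsed = responseEndMs - responseStartMs;
  return elapsed > 0 ? (totalBytes * 8) / elapsed : NaN;
}

// Burst-aware: only count intervals between closely spaced chunks, treating
// longer gaps as the server waiting rather than a slow network. The bytes of
// the first chunk in each burst are excluded, since their transfer time is unknown.
function burstKbps(samples: ChunkSample[], idleGapMs: number): number {
  let activeMs = 0;
  let bytes = 0;
  for (let i = 1; i < samples.length; i++) {
    const gap = samples[i].timeMs - samples[i - 1].timeMs;
    if (gap <= idleGapMs) {
      activeMs += gap;
      bytes += samples[i].bytes;
    }
  }
  return activeMs > 0 ? (bytes * 8) / activeMs : NaN;
}
```

For two bursts of three 1250-byte chunks 10 ms apart, separated by a 480 ms wait, the naive figure is roughly 115 kbit/s, while the burst-aware figure is 1000 kbit/s.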
As I’ve previously observed, Apple doesn’t really have a problem to solve in their own ecosystem of devices, because AVPlayer has direct access to performance data from the underlying HTTP/2 stack. This means they can accurately measure either part download time or the performance of each burst of a part delivered over a range request. In my opinion, this is why Apple continues to insist that parts be “delivered at line speed”: it makes it easy for them to keep reusing their current bandwidth estimation with minor changes.
So when are we going to see LL-HLS in the wild?
While this change in the specification certainly makes it easier for adoption within the industry, I’m expecting it’ll be a while until we see any implementations in the wild. I think the industry will suffer from a little “once bitten twice shy” attitude here.
Many companies have already spent significant time building, testing, and in some cases shipping support for Apple’s original revision of their LL-HLS specification, while Apple was still receiving feedback on their approach. With the latest version of the specification being fairly different, I suspect the industry may hold off a little longer this time until Apple declares this specification a final version.
As of right now, Apple hasn’t posted a client or server reference implementation for the latest version of the LL-HLS specification, and there’s no timeline for official support in iOS or Safari. It’s likely that this revision will get the same treatment as the previous version, with support shipped to iOS beta devices but hidden behind a feature flag that prevents apps using it from being accepted onto the App Store.
As always, we remain committed to delivering a low-latency streaming experience that works across as many devices as possible. The latest version of this specification certainly helps accelerate that process.