
A story of an unknown low-tier device and its MSE issues / London Video Tech 2023 notes

⚠️ WARNING: This is an early version of the post and will be improved over time.

A slide about final approach to mitigate MSE issues on a low-tier device

Prerequisite

I've been working with a low-tier device at DAZN for quite a while. Because I had accumulated so many helpful materials, I decided to share them with the wider video community. It took me only 2 weeks to prepare, including several dry runs at DAZN to collect feedback, refine the structure and update the slides.

The goal of the talk was:

  • to highlight specifics of working with low-tier devices,
  • to demonstrate their MSE issues,
  • to show how to mitigate them.

Because I had been part of the video community for some time, I knew about London Video Technology. I attended LVT, The Summer of Streaming S06E01, in summer 2022, but I had never applied to speak myself.

Luckily, it's actually a small world, especially in video streaming: nearly everyone knows everyone. With the help of Ant Stansbridge, who works with me at DAZN, I met Phil Cluff and got a chance to present my talk.

On October 3, I went to BBC Broadcasting House, where LVT, Upping the Auntie - S07E03, took place. It was a pleasure to meet Phil Cluff and Alan Robinson in person. I had people from DAZN supporting me: Luke Belfield (Engineering Manager), Ash Byrom (Staff Engineer) and Ant Stansbridge (Principal Engineer). Thank you so much for being there for me 🧡

It was my first talk in English and I am very, very happy with the way it went. I can say that it went much more smoothly than the one I gave in Russian.

Let's walk through all materials that I have so far.


What a low-tier device is 📺

For me it's a device with limited resources, e.g. low energy consumption, low-tier hardware, etc.

Low-tier devices include Smart TVs (e.g. Samsung TVs with Tizen OS, LG TVs with webOS, Panasonic, etc.), dongles (such as Chromecast) and various set-top boxes, or STBs (there are many UK providers and almost every one of them has at least one STB).

Analysis of the common bits

Price

Low-tier devices are cheap, although the price really depends on the type of device; for example, Smart TVs can be more expensive than STBs. The lower prices are mainly down to low-tier hardware: a weak CPU, insufficient RAM, slow storage, and so on.

Performance vs energy consumption

This low-tier hardware brings low performance. On the one hand, it can be beneficial for users: low energy consumption may help people save on their bills. On the other hand, users get low video quality (mostly SD and/or HD).

Maintainability

Developers have to work with bespoke Web APIs: even though the API signatures are identical to those in, for example, desktop web browsers, they don't behave the same way, which is a bit frustrating. So as a developer, you have to accept it as a risk and keep an eye on it.

What MSE issues we can face 🤒

What is MSE

First of all, let's define what MSE, or Media Source Extensions, is:

Media Source Extensions, or MSE, is a set of APIs that allows player developers to play back audio and video content as well as show text content to the viewer

I've also attached the diagram from the MSE specification that shows what MSE includes: a MediaSource instance with 3 SourceBuffer instances, one for each media type (i.e. video, audio and text). Under the hood, there are video and audio decoders that are responsible for decoding the audio and video information and playing the content to the end user.

The diagram with MSE components from MSE specification
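
To make these moving parts concrete, here is a minimal MSE bootstrap sketch. It is not code from the talk: the MIME type, segment names and fetch logic are purely illustrative.

```typescript
// A minimal, illustrative MSE bootstrap: one MediaSource, one video
// SourceBuffer, and a couple of hard-coded segment URLs.
const mimeCodec = 'video/mp4; codecs="avc1.42E01E"';

async function startPlayback(video: HTMLVideoElement): Promise<void> {
  if (!MediaSource.isTypeSupported(mimeCodec)) {
    throw new Error(`Unsupported MIME type: ${mimeCodec}`);
  }

  const mediaSource = new MediaSource();
  video.src = URL.createObjectURL(mediaSource);
  await new Promise<void>((resolve) =>
    mediaSource.addEventListener('sourceopen', () => resolve(), { once: true })
  );

  const sourceBuffer = mediaSource.addSourceBuffer(mimeCodec);

  // Append the initialisation segment followed by the first media segment.
  for (const url of ['init.mp4', 'segment1.m4s']) {
    const data = await (await fetch(url)).arrayBuffer();
    sourceBuffer.appendBuffer(data);
    await new Promise<void>((resolve) =>
      sourceBuffer.addEventListener('updateend', () => resolve(), { once: true })
    );
  }

  await video.play();
}
```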

Examples from Samsung, Panasonic, etc

Here I'd like to mention 4 examples of MSE issues I've worked on before.

The stalled event dispatched on HTMLVideoElement towards the end of buffered ranges

Many Living Room devices, such as old Samsung TVs, dispatch the stalled event near the end of buffered ranges. It can easily be mitigated by introducing a safe gap at the end of buffered ranges.
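
One possible way to implement such a safe gap (a sketch, not the exact approach from the talk; the 0.5-second value is illustrative) is to clamp any target position so it never lands right at the end of a buffered range:

```typescript
// Clamp a target position so it stays a safe gap away from the end of
// the buffered range. The 0.5s gap is an illustrative value.
const SAFE_GAP_SECONDS = 0.5;

function clampToBufferedEnd(video: HTMLVideoElement, target: number): number {
  const buffered = video.buffered;
  if (buffered.length === 0) {
    return target;
  }
  const end = buffered.end(buffered.length - 1);
  return Math.min(target, end - SAFE_GAP_SECONDS);
}
```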

The SourceBuffer.remove call with a small time range throws an error

This behaviour of SourceBuffer's remove method could be observed with shaka-player on Samsung TVs. It can be mitigated by introducing a threshold for small time ranges.
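
A minimal sketch of such a threshold (the 0.1-second value and the removeRange wrapper are illustrative, not shaka's actual implementation):

```typescript
// Skip SourceBuffer.remove calls when the requested range is smaller than
// a device-friendly threshold. The 0.1s value is illustrative.
const MIN_REMOVE_RANGE_SECONDS = 0.1;

function removeRange(sourceBuffer: SourceBuffer, start: number, end: number): void {
  if (end - start <= MIN_REMOVE_RANGE_SECONDS) {
    return; // the range is too small for this device, skip the removal
  }
  sourceBuffer.remove(start, end);
}
```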

Simultaneous SourceBuffer.appendBuffer calls on audio and video SourceBuffer instances can compete with each other, leading to an unhealthy buffer

This issue depends on the MSE/EME player implementation. When audio and video SourceBuffer instances are managed separately, some Living Room devices and STBs cannot cope with simultaneous SourceBuffer.appendBuffer calls correctly, which leads to an unhealthy buffer.

It can be mitigated by introducing some kind of "manager" which handles SourceBuffer['appendBuffer'] calls in one place (such as StreamingEngine in shaka-player).
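
A hedged sketch of what such a "manager" could look like: a small queue that serialises appendBuffer calls across both SourceBuffer instances. The AppendQueue class below is purely illustrative and is not shaka's StreamingEngine.

```typescript
// Serialise appendBuffer calls across audio and video SourceBuffers by
// chaining them through a single promise queue.
class AppendQueue {
  private queue: Promise<void> = Promise.resolve();

  append(sourceBuffer: SourceBuffer, data: ArrayBuffer): Promise<void> {
    this.queue = this.queue.then(
      () =>
        new Promise<void>((resolve, reject) => {
          sourceBuffer.addEventListener('updateend', () => resolve(), { once: true });
          sourceBuffer.addEventListener('error', () => reject(new Error('append failed')), {
            once: true,
          });
          sourceBuffer.appendBuffer(data);
        })
    );
    return this.queue;
  }
}
```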

MediaSource.isTypeSupported calls return true for any given MIME type of the media

This happens in particular on Samsung TVs 2.3 and 2.4. Although it's not a specification-compliant approach, passing `${mimeCodec};width=${width}` (where width is a representation width) to MediaSource['isTypeSupported'] instead of plain mimeCodec can mitigate the issue.
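
A short sketch of this workaround (the codec string and width below are illustrative):

```typescript
// Probe support with the representation width appended to the capability
// string. Non-standard, but it makes some devices reject representations
// they cannot actually decode.
function isTypeSupportedWithWidth(mimeCodec: string, width: number): boolean {
  return MediaSource.isTypeSupported(`${mimeCodec};width=${width}`);
}

// Usage with an illustrative codec string and representation width.
const supported = isTypeSupportedWithWidth('video/mp4; codecs="avc1.640028"', 1920);
```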

MSE issues of an unknown device 🧪

Introduction to the problem

The issue I encountered was that playback wouldn't start on this particular device, and I couldn't understand why, because playback started successfully on a couple of other devices.

Effectiveness

Each issue can be solved in different ways. Since one solution can be more effective than another, effectiveness has to be measured to understand which solution to choose. I decided to use the pass rate of one functional test which was run 100 times.

The functional test had a particular structure:

  1. Create player
  2. Load content
  3. Wait for start of playback
  4. Assert that playback status is playing

When playback reaches the playing status, the test passed. Otherwise, it failed.
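
A sketch of what this test could look like (Jest-style globals are assumed; createPlayer, load, waitForPlaybackStart and getPlaybackStatus are hypothetical stand-ins for the real player API):

```typescript
// An illustrative functional test mirroring the 4 steps above.
test('playback reaches the playing status', async () => {
  const player = createPlayer();                              // 1. Create player
  await player.load('https://example.com/manifest.mpd');      // 2. Load content
  await player.waitForPlaybackStart({ timeoutMs: 30_000 });   // 3. Wait for start of playback
  expect(player.getPlaybackStatus()).toBe('playing');         // 4. Assert playback status is playing
});
```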

Definitions for the diagrams

During my talk I used a lot of diagrams to better picture the problems and solutions.

Definitions
  1. Video element events (such as loadedmetadata, seeking, canplay and others) are shown as purple lines.
  2. Long blue rectangles represent the process of appending segments, e.g. audio or video.
  3. The timeline includes both audio and video source buffers, with information about when segments were being appended.

At the bottom, you will also see buffered ranges, i.e. information about the segments that are already appended to the source buffers.

  1. When you see a short green rectangle, a segment has been added to the source buffer. For example, the first video segment was appended here.
  2. When you see a short red rectangle, a segment has been removed from the source buffer. For example, the first audio segment was removed here.

Problem 1. Previous segments are automatically removed

Given that playback hadn't started, I decided to collect the logs and put them into a diagram. I spotted that before the potential start of playback, there were multiple audio appends (audio 1 and audio 2). And when I looked at the buffered ranges at the start and the end of the append buffer process, I saw that the buffer of the first appended audio segment had disappeared.

Diagram with an automatically removed segment from source buffer

Because I'm somewhat familiar with shaka-player, I compared this to the way shaka works and saw that it appends less frequently, so I decided to delay the second audio segment append. The main question here is how long to delay appends for.

Solution 1. Delay appending segments

Given that I had a functional test, I ran it several times and picked several events as candidates for the moment when appending segments could continue. There were 3 of them. The loadedmetadata event turned out to be too early to continue appends, as it would not let playback start. When we set the start time, a seeking event is dispatched on the video element, and it showed only 90% effectiveness. The canplay and canplaythrough events showed 100% effectiveness, so I picked canplay as the earlier of the two.

Diagram with segment appends delayed
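
A minimal sketch of the delaying approach (resumeAppends is a hypothetical hook into the segment pipeline, not the player's actual code): after the first segments have been appended, further appends are held back until canplay fires.

```typescript
// Hold further segment appends until the video element dispatches canplay.
function delayAppendsUntilCanPlay(
  video: HTMLVideoElement,
  resumeAppends: () => void
): void {
  video.addEventListener('canplay', () => resumeAppends(), { once: true });
}
```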

Problem 2. Not enough data to start playback

The second issue I saw was related to the start time we set. For different types of content we set it differently.

In the picture below there are 2 scenarios.

In the first scenario, the start time is set closer to the beginning of the last appended segment, so there is enough data to start playback. In this case playback will start with the delaying approach and it will work consistently.

In the second scenario, the start time is set closer to the end of the last appended segment, so there isn't enough data to start playback.

Diagrams showing how start of playback depends on position of start time in relation to buffered ranges

As we're currently delaying appending more than one segment, a stalled event will be dispatched and playback won't start.

Diagram with segment appends delayed when there is not enough data

Solution 2.1. Device-specific threshold

One way to mitigate the issue is to introduce a device-specific threshold that defines what "enough data" means. If the amount of buffer ahead is less than or equal to the threshold, it's treated as not enough data. Otherwise, there is enough data to start playback.

You can use known limitations of the device and binary search to choose the optimal value.

Once the threshold is chosen, it is used when the seeking event is dispatched on the video element. At this point, a decision has to be made whether to continue delaying segment appends or to allow another audio/video segment to be appended.

Diagram with device-specific threshold with not enough data
Diagram with device-specific threshold with enough data
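
A minimal sketch of the threshold check (the 2-second value is illustrative; the real value is device-specific and was chosen with binary search):

```typescript
// Measure the buffer ahead of the current position against a
// device-specific threshold. The 2s value is illustrative.
const DEVICE_BUFFER_AHEAD_THRESHOLD_SECONDS = 2;

function hasEnoughDataToStart(video: HTMLVideoElement): boolean {
  const { buffered, currentTime } = video;
  for (let i = 0; i < buffered.length; i++) {
    if (buffered.start(i) <= currentTime && currentTime <= buffered.end(i)) {
      return buffered.end(i) - currentTime > DEVICE_BUFFER_AHEAD_THRESHOLD_SECONDS;
    }
  }
  return false; // the start time is not buffered at all
}
```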

Testing showed that this approach didn't change the effectiveness when there is enough data: it is still 100%. When there is not enough data, it is 92% effective. Can we make it better?

Solution 2.2. Waiting event and video element ready state

Another way to mitigate the issue is to use the waiting event and the video element's ready state. The waiting event is dispatched when playback has stopped because of a temporary lack of data. The readyState property indicates the readiness state of the video.

In this solution, the waiting event is the trigger to stop delaying segment appends when readyState is less than HAVE_FUTURE_DATA. The readyState check is part of the equation because some low-tier STBs may dispatch waiting events by accident, so the platform has to ignore those events.

Diagrams with solution 2.2, case with enough data
Diagrams with solution 2.2, case with not enough data
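
A hedged sketch of this trigger (resumeAppends is again a hypothetical hook into the segment pipeline):

```typescript
// Resume appends on waiting, but only when readyState confirms there
// really is a temporary lack of data (some STBs fire waiting by accident).
function resumeAppendsOnWaiting(
  video: HTMLVideoElement,
  resumeAppends: () => void
): void {
  video.addEventListener('waiting', () => {
    if (video.readyState < HTMLMediaElement.HAVE_FUTURE_DATA) {
      resumeAppends();
    }
  });
}
```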

Based on test runs, this approach demonstrated 100% effectiveness for both test cases: not enough data and enough data.

So that's it, that easy?

Problem 3. No waiting event

All previous player changes were tested in isolation, meaning there was no UI, no analytics and no third-party scripts loaded, all of which usually exist in the full application. But when I started testing the changes in an environment closer to the full application, I saw that the previous solution still didn't work.

After pulling some logs it was clear that no waiting event was dispatched on the video element before the stalled event.

Diagrams with problem 3

So I started thinking about how to mitigate this issue and what could be used instead of the waiting event.

Solution 3.1. Stalled event

There were a couple of available options, and the stalled event was one of them. This event was eventually dispatched on the video element, so why not use it?

Diagrams with solution 3.1

Although it was a possible candidate, it was completely ineffective (0% in the "not enough data" case) for our in-house video player, so I decided to try another solution.

Solution 3.2. Timeout after seeking or stalled if earlier

Another way to work around this is to use a timeout. Isn't it always a solution to any problem? 😅

The challenge with a timeout is that it has to be small enough while still being effective.

I defined it as the smallest observable time after the start time is set (and therefore after the seeking event is dispatched on the video element).

Diagrams with solution 3.2
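
A hedged sketch of this fallback (the 250 ms value and the resumeAppends hook are illustrative, not the values from the talk): resume appends after a short timeout following seeking, or earlier if stalled fires first.

```typescript
// Resume appends RESUME_TIMEOUT_MS after seeking, or earlier if a stalled
// event arrives first. The timeout value is illustrative.
const RESUME_TIMEOUT_MS = 250;

function resumeAppendsAfterSeekingOrStalled(
  video: HTMLVideoElement,
  resumeAppends: () => void
): void {
  video.addEventListener(
    'seeking',
    () => {
      let resumed = false;
      const resumeOnce = () => {
        if (!resumed) {
          resumed = true;
          resumeAppends();
        }
      };
      const timer = setTimeout(resumeOnce, RESUME_TIMEOUT_MS);
      video.addEventListener(
        'stalled',
        () => {
          clearTimeout(timer);
          resumeOnce();
        },
        { once: true }
      );
    },
    { once: true }
  );
}
```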

The best timeout candidate was 90% effective, which wasn't too bad, but it wasn't the best option, so it wasn't chosen as the final solution.

Solutions summary

The effectiveness of all the solutions mentioned in my talk is summarised in this table:

Solution                                            Enough data    Not enough data
1. Delay appending segments                         100%           0%
2.1. Device-specific threshold                      100%           92%
2.2. Waiting event and video element ready state    100%           0%*
3.1. Stalled event                                  100%           0%
3.2. Timeout after seeking or stalled if earlier    100%           90%

* measured in the environment closer to the full application, where the waiting event is not dispatched (see problem 3)

Initially, the concept of using device-specific thresholds seemed unfavorable, but it turned out to be the most effective approach compared to all other suggestions.

The strategy involving the waiting event initially seemed highly promising, but it eventually became a limiting factor: on the target device the event was not dispatched in the environment closer to the full application, and that proved hard to work around.

Final solution

So the final solution looks like the following state machine (a code sketch follows the list of steps):

State machine with final solution
  1. Before the start of playback, segments for both audio and video start appending
  2. Once a segment for either audio or video has been appended, start delaying segment appends
  3. When the start time is set on the video element, an event listener is added for the seeking event
  4. When the seeking event is dispatched, the amount of buffered data is measured against the device-specific threshold
  5. If there is NOT enough data on the video element, stop delaying appends and start again from step 2
  6. If there is enough data on the video element, an event listener is added for the canplay event
  7. When the canplay event is dispatched, stop delaying appends
  8. Playback has started 🟢
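
To tie the steps together, here is a hedged sketch of the state machine (appendNextSegment and hasEnoughData are hypothetical hooks standing in for the real player internals; hasEnoughData could be the threshold check from solution 2.1):

```typescript
// An illustrative startup state machine following the 8 steps above.
type StartupState = 'appending' | 'delaying' | 'waitingForCanPlay' | 'started';

function manageStartup(
  video: HTMLVideoElement,
  appendNextSegment: () => void,
  hasEnoughData: (video: HTMLVideoElement) => boolean
): () => StartupState {
  let state: StartupState = 'appending';

  // Steps 1-2: the first audio/video segments are appended, then further
  // appends are delayed.
  const startDelaying = (): void => {
    state = 'delaying';
    // Steps 3-4: once the start time is set, seeking fires and we measure
    // the buffered data against the device-specific threshold.
    video.addEventListener('seeking', onSeeking, { once: true });
  };

  const onSeeking = (): void => {
    if (hasEnoughData(video)) {
      // Steps 6-7: enough data, so wait for canplay and then resume appends.
      state = 'waitingForCanPlay';
      video.addEventListener(
        'canplay',
        () => {
          state = 'started'; // Step 8: playback starts 🟢
          appendNextSegment();
        },
        { once: true }
      );
    } else {
      // Step 5: not enough data, resume appends and go back to delaying.
      state = 'appending';
      appendNextSegment();
      startDelaying();
    }
  };

  startDelaying();
  return () => state; // expose the current state for debugging or telemetry
}
```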

Conclusion ⭐️

Lessons learnt:

  1. Aggressive strategies (appending as many segments as possible) may not work on low-tier devices
  2. A generic approach may not necessarily be effective on low-tier devices
  3. Do end-to-end testing (full app) at early stages ⭐️

I'd like to stress that conducting functional testing in an environment closest to the end user is essential. It aids developers in detecting issues early on, thereby saving valuable time for the business.

  1. Google slides - https://docs.google.com/presentation/d/1M99IYUyWb0I3OJDng3CppqksxkXCHyIeMVkE81lSNSI/edit?usp=sharing
  2. Diagrams - https://www.tldraw.com/r/k08aBuV4b_maWj1Xv3DX7?viewport=-575%2C92%2C1513%2C910&page=page%3AA5up7ZSQODMZj5XCdf7h8
  3. Media Source Extensions Spec - https://www.w3.org/TR/media-source-2/
  4. shaka-player - MSE/EME OSS player - https://github.com/shaka-project/shaka-player

Written by Alexey Berezin who loves London 🏴󠁧󠁢󠁥󠁮󠁧󠁿, players ⏯ and TypeScript 🦺 Follow me on Twitter