Must Fix: a suitable bitrate ladder (content dependent)
One should prepare video content as a set of different bitrate tracks, with each of those tracks representing a different quality level. The selection of different bitrates is called a bitrate ladder.
A potential ladder could be (this should only be regarded as an example of what a ladder might look like, not as a recommendation to use this particular one):
Resolution (16:9) | Bitrate (H264) |
---|---|
416x234 | 145 kb/s |
640x360 | 365 kb/s |
768x432 | 730 kb/s |
768x432 | 1100 kb/s |
960x540 | 2000 kb/s |
1280x720 | 4500 kb/s |
1920x1080 | 6000 kb/s |
1920x1080 | 7800 kb/s |
Choosing a good ladder is important for quality and efficient delivery. What is 'good' depends on the capabilities of the end users' devices, network capacity, the codec that is used, and the content itself. Ideally, you adjust the bitrate ladder per asset in your library (as some content requires higher bitrates to achieve the same quality, and other content can be encoded more efficiently).
Choosing a bitrate ladder is subject to different opinions and research. For example, the above ladder is taken from the Apple HLS Authoring Specification. A more in-depth look at bitrate ladders can be found in Optimal Design of Encoding Profiles for ABR Streaming, a paper that Yuri Reznik presented at the Packet Video Workshop during ACM MMSys 2018.
A short summary of Reznik's paper is that the first thing to explore would be the quality versus bitrate curve of your content, which, as noted earlier, will differ per asset. That is, for some assets introducing a higher bitrate won't offer significant gains in quality (as measured by PSNR and SSIM).
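As a rough illustration of the diminishing returns described above, the sketch below drops rungs that add less than a chosen quality gain over the previously kept rung. This is a simple heuristic for illustration only, not the method from Reznik's paper. The function name `prune_low_gain_rungs`, the 1 dB threshold, and the PSNR values are all made up for the example; real values would come from measuring your own encodes.

```python
from typing import List, Tuple


def prune_low_gain_rungs(
    curve: List[Tuple[int, float]], min_gain_db: float
) -> List[Tuple[int, float]]:
    """Keep only rungs that improve quality by at least min_gain_db
    over the previously kept rung.

    curve holds (bitrate in kb/s, PSNR in dB) pairs for one asset.
    """
    kept: List[Tuple[int, float]] = []
    for bitrate, quality in sorted(curve):
        # Always keep the lowest rung; keep others only if the gain is large enough.
        if not kept or quality - kept[-1][1] >= min_gain_db:
            kept.append((bitrate, quality))
    return kept


# Hypothetical measurements for a single asset (illustrative numbers only).
measured = [
    (145, 30.0), (365, 34.0), (730, 37.0), (1100, 38.5),
    (2000, 40.5), (4500, 42.5), (6000, 42.9), (7800, 43.1),
]

pruned = prune_low_gain_rungs(measured, min_gain_db=1.0)
print([bitrate for bitrate, _ in pruned])
```

With these made-up numbers the two highest bitrates are dropped, because they add well under 1 dB over the 4500 kb/s rung. For a more demanding asset the curve would flatten later and more rungs would survive.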
The second thing to determine is the steps between the bitrates that you want to offer. These steps will often be bigger for Live content than for VOD.
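To get a feel for the step sizes in a ladder, you can compute the ratio between each bitrate and the one below it. The minimal sketch below does this for the example ladder shown earlier; the helper name `step_ratios` is hypothetical.

```python
from typing import List


def step_ratios(bitrates_kbps: List[int]) -> List[float]:
    """Return the ratio between each bitrate and the next lower one."""
    ordered = sorted(bitrates_kbps)
    return [round(hi / lo, 2) for lo, hi in zip(ordered, ordered[1:])]


# Bitrates (kb/s) from the example ladder above.
ladder_kbps = [145, 365, 730, 1100, 2000, 4500, 6000, 7800]

print(step_ratios(ladder_kbps))
```

For the example ladder the steps range from roughly 1.3x at the top to about 2.5x at the bottom, which shows that a ladder does not need to use uniform steps.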
Finally, you can optimize a bitrate ladder for the characteristics of your network. This step is a bit more challenging, as you'll need a model for your network bandwidth and client behavior. In Reznik's paper, a simple approach based on an LTE model is used, with a client that will always choose the highest possible bitrate within the constraints of estimated available bandwidth.
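The client behavior described above (always pick the highest bitrate that fits the estimated bandwidth) can be sketched in a few lines. The function name `select_bitrate` is hypothetical, and falling back to the lowest rung when nothing fits is an assumption made for this sketch, not something taken from the paper.

```python
from typing import List


def select_bitrate(ladder_kbps: List[int], estimated_bandwidth_kbps: float) -> int:
    """Pick the highest rung that does not exceed the estimated bandwidth."""
    candidates = [b for b in ladder_kbps if b <= estimated_bandwidth_kbps]
    # Assumption: if even the lowest rung does not fit, play the lowest rung anyway.
    return max(candidates) if candidates else min(ladder_kbps)


# Bitrates (kb/s) from the example ladder above.
ladder_kbps = [145, 365, 730, 1100, 2000, 4500, 6000, 7800]

print(select_bitrate(ladder_kbps, 3000))
```

Real players usually add safety margins and buffer-based logic on top of this, so treat the sketch as a model of the simplified client only.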
Note
The aspect ratio must remain the same across your entire bitrate ladder.
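A quick way to verify this for a given ladder is to reduce every resolution to an exact fraction and check that only one distinct value remains. The sketch below does so for the resolutions of the example ladder; the helper name `aspect_ratios` is hypothetical.

```python
from fractions import Fraction
from typing import Iterable, Set, Tuple


def aspect_ratios(resolutions: Iterable[Tuple[int, int]]) -> Set[Fraction]:
    """Return the set of distinct width:height ratios as exact fractions."""
    return {Fraction(width, height) for width, height in resolutions}


# Resolutions from the example ladder above.
ladder = [(416, 234), (640, 360), (768, 432), (960, 540), (1280, 720), (1920, 1080)]

ratios = aspect_ratios(ladder)
print(ratios)  # a single entry means the whole ladder shares one aspect ratio
```

Note that this check is exact. If an encoder rounds a dimension to an even pixel count, the ratio may be off by a tiny amount, in which case a small tolerance would be needed instead of strict equality.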
Audio
The need for an audio specific bitrate ladder is less obvious, since audio can be encoded at high quality using relatively low bitrates (compared to video). That is, the difference between a 128 kb/s encoded AAC stereo track and a 64 kb/s version of that same track might not be worth complicating your setup for when streaming video.
This is different from a setup where you expand the audio that you offer beyond stereo, to include surround sound. You may even use a variety of codecs for your surround sound offerings (e.g., Dolby EC-3 and DTS:X). However, it is recommended to follow the Apple HLS Authoring Specification and make all of your audio offerings (different language and audio description tracks) available in the same combinations of codec and bitrate, e.g. (where 'AD' stands for an accessibility track that offers audio description):
Codec |
Language |
Bitrate |
---|---|---|
AAC-LC |
English |
128 kb/s |
AAC-LC |
English (AD) |
128 kb/s |
AAC-LC |
Spanish (dubbed) |
128 kb/s |
Dolby EC-3 |
English |
384 kb/s |
Dolby EC-3 |
English (AD) |
384 kb/s |
Dolby EC-3 |
Spanish (dubbed) |
384 kb/s |
Note
When multiple languages are made available, all the audio profiles (codec and bitrate combinations) must be present for each language for Origin's HLS output to be compliant with the Apple HLS Authoring Specification.
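The requirement above can be checked mechanically: collect the set of (codec, bitrate) profiles per language and report any language that lacks a profile offered elsewhere. The sketch below is one way to do that. The function name `missing_profiles` is hypothetical, and it assumes that an audio description track counts as its own "language" entry, as in the example table.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Profile = Tuple[str, int]  # (codec, bitrate in kb/s)


def missing_profiles(tracks: Iterable[Tuple[str, str, int]]) -> Dict[str, Set[Profile]]:
    """Map each language to the audio profiles it lacks.

    tracks holds (codec, language, bitrate in kb/s) tuples.
    An empty result means every language offers every profile.
    """
    by_language: Dict[str, Set[Profile]] = defaultdict(set)
    for codec, language, bitrate in tracks:
        by_language[language].add((codec, bitrate))
    all_profiles: Set[Profile] = set()
    for profiles in by_language.values():
        all_profiles |= profiles
    return {
        language: all_profiles - profiles
        for language, profiles in by_language.items()
        if all_profiles - profiles
    }


# Tracks from the example table above.
tracks = [
    ("AAC-LC", "English", 128),
    ("AAC-LC", "English (AD)", 128),
    ("AAC-LC", "Spanish (dubbed)", 128),
    ("Dolby EC-3", "English", 384),
    ("Dolby EC-3", "English (AD)", 384),
    ("Dolby EC-3", "Spanish (dubbed)", 384),
]

print(missing_profiles(tracks))  # an empty dict: nothing is missing
```

Removing, for instance, the Spanish Dolby EC-3 track from the list would make the function report that profile as missing for "Spanish (dubbed)".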
Exception: radio (with audio-only streams)
When audio-only streams are offered, it may become more worthwhile to offer stereo tracks at several bitrates, so that end users can enjoy these streams even on very limited connections. In such cases, HE-AAC might be used for lower bitrates and AAC-LC for higher ones, perhaps even with different sample rates:
Bitrate (kb/s) | Sample rate (kHz) | Audio codec |
---|---|---|
24 | 24 | HE-AAC |
64 | 32 | HE-AAC |
96 | 48 | HE-AAC |
128 | 48 | AAC-LC |
320 | 48 | AAC-LC |
384 | 48 | Dolby AC-3 |
Should Fix: all tracks are compliant with a CMAF media profile
CMAF media profiles are an important feature of MPEG's CMAF specification. They define specific configurations of media and content. These profiles are mainly defined by MPEG and the CTA-WAVE Content Specification Task Force.
Media profiles of a visual track may for example define the following:
Maximum frame width and height
Video Usability Information (VUI) usage
Codec usage and profile of the codec (e.g., AVC high profile)
Other codec specific settings
A brand name (a four-character code) to identify the profile used
Some example media profiles are 'cfhd' for High Definition AVC video and 'caac' for AAC audio.
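To see which brands a track declares, you can inspect the `ftyp` box at the start of its MP4 header. The sketch below parses that box from raw bytes using the standard ISO BMFF layout (32-bit size, box type, major brand, minor version, then a list of four-character compatible brands). It assumes the media profile brand is signaled among the compatible brands, and it does not handle the 64-bit size variant. The function name `read_ftyp_brands` and the sample bytes are made up for illustration.

```python
import struct
from typing import List, Tuple


def read_ftyp_brands(data: bytes) -> Tuple[str, List[str]]:
    """Return (major_brand, compatible_brands) from a leading 'ftyp' box."""
    if len(data) < 16:
        raise ValueError("too short to contain an ftyp box")
    size, box_type = struct.unpack(">I4s", data[:8])
    if box_type != b"ftyp":
        raise ValueError("first box is not 'ftyp'")
    if size < 16 or size > len(data):
        raise ValueError("unsupported or truncated ftyp box")
    major_brand = data[8:12].decode("ascii")
    # Bytes 12..16 hold minor_version; compatible brands follow, 4 bytes each.
    compatible = [data[i:i + 4].decode("ascii") for i in range(16, size - 3, 4)]
    return major_brand, compatible


# Hand-built sample box for illustration; real data would be read from a file.
payload = b"cmfc" + struct.pack(">I", 0) + b"iso6" + b"cfhd"
sample = struct.pack(">I", 8 + len(payload)) + b"ftyp" + payload

major, brands = read_ftyp_brands(sample)
print(major, brands)
print("cfhd" in brands)
```

For a real track, you would pass the first bytes of the file or init segment instead of `sample`, for example `open(path, "rb").read(1024)`.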
Note that full CMAF compliance involves further constraints, not mentioned above, that tracks need to adhere to as well (typically, encoders will make sure this is the case):
Same aspect ratio for all renditions within a switching set (a "group" of video tracks that share a number of characteristics so that a player can switch between them without issue)
Same color space and color transfer characteristics for all renditions within a switching set
Visual tracks only contain samples for display, without padding of the image using black pixels to fit the aspect ratio
Bit-depth and chroma format do not change within tracks
Should Fix: timed metadata is carried in a separate sparse track
If all Timed Metadata for a stream is contained in a separate sparse track, Origin can rely on a single source of information for such metadata. Otherwise, Origin needs to scan all media tracks for potential Timed Metadata, which is less efficient and may also result in conflicting information (when certain Timed Metadata is present in one track, but not in another and vice versa).
A sparse track does not represent a continuous stream of data, but an intermittent one. This is an ideal fit for Timed Metadata, which occurs intermittently, as opposed to the tracks that store audio and video, which should be continuous.
Should Fix: add an audio description track (for the visually impaired)
To increase accessibility of your content you should add audio description tracks where possible. An audio description track is a track that contains not only the regular audio, but also a spoken description of what is happening in the video.
For VOD, please refer to the "Configuring Audio description track" how-to to learn how to add an audio description track (note that it is important to follow the steps exactly).
Should Fix: avoid transcoding of subtitles when using advanced styling
If your subtitles contain cues with advanced styling, do take into consideration that this styling is stripped when Unified Packager or Unified Origin is used to transcode these subtitles from one format into another, i.e., TTML into WebVTT or vice versa.
That is, if you need advanced styling to be present in your output, make sure that your source subtitles are already encoded in the format that is required (i.e., TTML, WebVTT, or both).
Note
Captions embedded in the video track (i.e., CEA 608/708) should only be used if there is a clear business case/need for it. Otherwise using them is not recommended.