Adaptive Bitrate (ABR) Streaming
Adaptive bitrate (ABR) streaming uses source content that is encoded at multiple bitrates. The most commonly used video codecs are H.264/AVC and H.265/HEVC. The most commonly used audio codec is AAC. See the Factsheet for an overview of supported codecs and formats.
Since a presentation consists of many different assets (audio, video, subtitles), we use a manifest file that describes all of these assets.
Creation of a server manifest file
The server manifest is used by the webserver module. It holds information about the available streams, DRM options, etc. The webserver module uses the information in the server manifest file to create the different client manifests / playlists and to apply any selected encryption.
When you are using an encoder (please see the Factsheet for an overview of supported encoders) to produce VoD Smooth Streaming presentations, you always have to re-create the server manifest file: USP stores additional information in the server manifest file that may not be available in the one generated by the encoder.
It is not necessary to create the playlists (.ismc, .m3u8, .mpd, .f4m). All these files are created on-the-fly by the webserver module.
The server manifest file comes in two flavours: a server manifest filename ending with the extension .ism is used for VOD presentations, while for LIVE presentations an extension of .isml is used (note the additional character 'l' for live).
Note
Please make sure that none of the tracks you use as input is 'empty' as this will cause an error (i.e., '415 FMP4_MISSING_TFRA') when a client manifest is requested through Origin.
Using fragmented-MP4 as source
Fragmented-MP4 video is a file following the ISO base media file format with one or more movie fragment boxes. Unlike a progressive MP4, a fragmented MP4 is already prepared for ABR (Adaptive Bitrate) playout. This is the preferred format.
There are many variations of fragmented MP4, e.g. the MPEG-DASH segment format, Microsoft's Smooth Streaming (Protected Interoperable File Format (PIFF)), UltraViolet's Common File Format, Adobe's F4F and CMAF (see also: How to package CMAF).
In its simplest form the command for creating a VoD server manifest file is:
#!/bin/bash
mp4split -o tears-of-steel.ism \
tears-of-steel-avc1-1000k.ismv
You can also specify multiple input filenames. Say you have the audio track in tears-of-steel-aac-64k.isma and two video files, tears-of-steel-avc1-400k.ismv and tears-of-steel-avc1-750k.ismv:
#!/bin/bash
mp4split -o tears-of-steel.ism \
tears-of-steel-aac-64k.isma \
tears-of-steel-avc1-400k.ismv \
tears-of-steel-avc1-750k.ismv
See Selecting specific tracks from an input file for more advanced options.
The video files may be stored remotely as well. The webserver is able to fetch the media data from any HTTP server (e.g. S3, Azure). Just specify the input files using fully qualified URLs:
#!/bin/bash
mp4split -o tears-of-steel.ism \
http://storage.server/tears-of-steel/tears-of-steel-aac-64k.isma \
http://storage.server/tears-of-steel/tears-of-steel-avc1-400k.ismv \
http://storage.server/tears-of-steel/tears-of-steel-avc1-750k.ismv
See Cloud Storage for more information.
Using CMAF trickplay as a source
New in version 1.8.3.
For fast-forward and scrubbing, a track containing only sync samples, packaged with --trickplay, can be added to the server manifest in addition to the regular ABR tracks.
#!/bin/bash
mp4split -o tears-of-steel.ism \
tears-of-steel-aac-64k.cmfa \
tears-of-steel-avc1-400k.cmfv \
tears-of-steel-avc1-750k.cmfv \
tears-of-steel-trickplay.cmfv
In HLS, the trickplay track is signaled as #EXT-X-I-FRAME-STREAM-INF in the master playlist. In DASH, the track is advertised in the MPD in a separate AdaptationSet marked with an EssentialProperty, conforming to DASH-IF Interoperability Points section 3.2.9 Trick Mode Support.
Using CMAF Tiled Thumbnails as a source
New in version 1.10.15.
An alternative method that provides thumbnails for fast-forward and scrubbing is offered by DASH-IF Interoperability Points section 6.2.6 Tiles of thumbnail images. In this case, a CMAF track is prepared with --trickplay --fourcc=jpeg, which combines multiple thumbnails into a tiled grid and stores the result in JPEG format.
The tiled thumbnails CMAF track can then be added to the server manifest, in addition to the regular ABR tracks:
#!/bin/bash
# Generate a tiled thumbnails track from a low-bitrate video track
mp4split --trickplay --fourcc=jpeg -o tears-of-steel-tiled-thumbnails.cmfv \
tears-of-steel-avc1-400k.cmfv
# Generate a server manifest for audio, video and tiled thumbnails tracks
mp4split -o tears-of-steel.ism \
tears-of-steel-aac-64k.cmfa \
tears-of-steel-avc1-400k.cmfv \
tears-of-steel-avc1-750k.cmfv \
tears-of-steel-avc1-1000k.cmfv \
tears-of-steel-tiled-thumbnails.cmfv
Note
Tiled Thumbnails are only supported for DASH. Other playout formats, such as HLS, do not support this feature.
By default, the DASH client manifest that Origin generates for a stream with tiled thumbnails uses a SegmentTimeline for the tiled thumbnails track. This is supported by the DASH.js reference player from version 3.2.1 onwards.
To be compatible with earlier versions of the DASH.js reference player, the generated MPD should use $Number$ instead of $Time$ based identifiers. The DASH-specific version of the --[iss|hls|hds|mpd].minimum_fragment_length option can be used to switch to a $Number$ based SegmentTemplate (see the example below).
Note that using a $Number$ based SegmentTemplate puts more requirements on the content:
The ABR tracks should be prepared with a constant GOP size (for DASH, 2 seconds is recommended)
All the different bitrates should have the same GOP size
The --mpd.minimum_fragment_length option should be an exact multiple of the GOP size
The audio segments should have the same length as the video segments
For example, because the GOP size of the Tears of Steel example files is 4 seconds, we can use:
# Generate a server manifest from audio, video and tiled thumbnails
mp4split -o tears-of-steel.ism \
--mpd.minimum_fragment_length=4 \
tears-of-steel-aac-64k.cmfa \
tears-of-steel-avc1-400k.cmfv \
tears-of-steel-avc1-750k.cmfv \
tears-of-steel-avc1-1000k.cmfv \
tears-of-steel-tiled-thumbnails.cmfv
Using progressive-MP4 as source
You can also use progressive-MP4 files as input. Progressive MP4 files have a single index and require more work and memory for Origin to process. For very large files it is recommended to use fragmented-MP4 instead.
When all the tracks (bitrates) are contained in one MP4 file, no server manifest is needed. When the tracks (bitrates) are in individual MP4 files, a server manifest must be created to indicate which files belong together:
#!/bin/bash
mp4split -o tears-of-steel.ism \
tears-of-steel-aac-64k.mp4 \
tears-of-steel-avc1-400k.mp4 \
tears-of-steel-avc1-750k.mp4
Enabling fragmented MP4 HLS output
--hls.fmp4
New in version 1.8.3.
Output of fMP4 HLS using Unified Origin for VOD is simply enabled by using the --hls.fmp4 option. When this option is used, Origin will output fMP4 HLS instead of HLS Transport Streams. The protocol version signaled in the Master Playlist will be '6', unless specified otherwise (which we do not recommend).
#!/bin/bash
mp4split -o tos-fmp4-hls.ism \
--hls.fmp4 \
tears-of-steel-aac-128k.cmfa \
tears-of-steel-avc1-1500k.cmfv \
tears-of-steel-en.cmft
Use tos-fmp4-hls-origin.sh to set up a stream of the Tears of Steel demo content with support for fMP4 HLS (using the packaged CMAF content as input).
Attention
A number of options normally available for the configuration of the HLS output should not be used in conjunction with the --hls.fmp4 option, as they are specifically intended for a Transport Stream output. These options are the following:
--hls.no_discontinuities
--hls.optimized
--hls.inline_drm
--hls.no_multiplex
--hls.pass_sei
--hls.psi
--hls.no_audio_only
--hls.no_elementary
Semantically, these options should be considered "off", except for --hls.no_audio_only, --hls.no_elementary and --hls.no_multiplex, which default to "true".
Using HTTP Live Streaming (HLS) as source
New in version 1.7.2: Origin can ingest HLS TS streams (although using (f)MP4 sources for ingest is strongly recommended). If HLS TS is ingested, Media Playlists are used for creating the timeline and determining the URLs of the media segments.
Attention
Mixing HLS TS and (f)MP4 sources for ingest is not supported, and if HLS TS is provided as ingest the segments are expected to be TS, so elementary audio streams (e.g., .aac) or plain-text WebVTT subtitles (.webvtt or .vtt) are not supported.
Create a server manifest file with the URL(s) of Media Playlist(s) as input, so that mp4split fetches the playlist(s) and extracts all information necessary to create the server manifest file:
#!/bin/bash
mp4split -o hls.ism \
https://demo.unified-streaming.com/k8s/features/stable/video/tears-of-steel/tears-of-steel.ism/tears-of-steel-audio_eng=64008-video_eng=401000.m3u8
Note
If instead of a Media Playlist a Master Playlist is used as input when creating the server manifest, only the first Media Playlist listed in the Master Playlist is ingested.
Requirements
General requirements from HTTP Live Streaming section 3 Media segments, specifically:
Transport Stream segments MUST contain a single MPEG-2 program
There MUST be a Program Association Table (PAT) and a Program Map Table (PMT) at the start of each segment
Additionally,
A media segment that contains video MUST start with an Instantaneous Decoder Refresh (IDR) coded picture and enough information to completely initialize a video decoder
Media segments MUST be decodable without information from other segments (the EXT-X-INDEPENDENT-SEGMENTS setting must be used), e.g. the PES payload may not spill into the next segment
The duration value of the EXTINF tag MUST be accurate
For streams containing both audio and video (multiplexed), the PTS of the first audio Access Unit (AU) MUST be greater than or equal to the PTS of the first video AU
Using an I-frame Playlist for HLS TS ingest if not all video segments start with IDR
If the video segments of an ingested HLS TS Media Playlist all start with a keyframe (the EXT-X-INDEPENDENT-SEGMENTS setting must be used), we can use that playlist to ingest the video.
If this is not the case (that is: the keyframes are positioned arbitrarily in the MPEG TS) then all segments need to be scanned to determine their timestamp.
For performance reasons, this cannot be done on-the-fly.
However, the exact IDR boundaries can be signaled in a separate I-frame playlist (using the EXT-X-I-FRAME-STREAM-INF tag and EXT-X-BYTERANGE). If the video of an HLS TS stream has arbitrarily positioned keyframes, an I-frame Playlist must be used to ingest the video (instead of the Media Playlist referencing the actual video segments).
If the HLS TS stream you want to ingest has arbitrarily positioned keyframes and no I-frame Playlist, one can be easily generated from the Media Playlist with the original video segments.
For example (using Unified Packager):
#!/bin/bash
mp4split --package_hls -o iframe-playlist.m3u8 \
--create_iframe_playlist \
original.m3u8
mp4split -o playout.ism \
iframe-playlist.m3u8
This approach can also be used to fix inaccurate EXTINF values typically found in older HLS TS content (protocol versions < 4).
Overriding and adding properties
All track properties are generated based on the input track. It is possible to override some of them, but note that this is not necessary in the general case.
The following track properties can be overridden:
--track_name
By default, the track_name is generated according to rules that ensure tracks with identical names are part of a set that a player can seamlessly switch between. The type of the track (audio/video/text) and, for example, the language field are used to generate the name of the track. Also taken into account are the codec, the kind of the track, the sample rate, and so on. When there are many switching sets, an additional postfix (_1, _2) is added.
#!/bin/bash
mp4split -o presentation.ism \
audio.mp4 --track_name=audio \
commentary.mp4 --track_name=audio_commentary
--track_language
See --track_language.
--track_description
Adds a user-friendly description of the track.
When set, it overrides the defaults for the LABEL attribute (for alternative audio tracks in HDS) and the NAME attribute (for media tracks in HLS, client manifest version 4 or higher required). For DASH client manifests, it adds a Label element to the track representation.
#!/bin/bash
mp4split -o presentation.ism \
tears-of-steel-aac-128k.mp4 \
--track_language=eng \
--track_description="English audio"
--track_bitrate
See --track_bitrate.
--track_role
See --track_role.
Options for VOD and LIVE streaming
The webserver module supports the following options. If an option is preceded by iss., hls., hds. or mpd. then it is specific to that playout format.
--[iss|hls|hds|mpd].disable
Disables playout of a specific format. The default is to allow playout to all formats (HTTP Smooth Streaming, HTTP Live Streaming, HTTP Dynamic Streaming and MPEG-DASH).
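For example, a minimal sketch (reusing the Tears of Steel input files from the examples above, and assuming the disable options are passed as plain switches) that restricts playout to MPEG-DASH and HLS only:
#!/bin/bash
# Disable Smooth Streaming and HDS playout; DASH and HLS remain enabled
mp4split -o tears-of-steel.ism \
--iss.disable \
--hds.disable \
tears-of-steel-aac-64k.isma \
tears-of-steel-avc1-400k.ismv \
tears-of-steel-avc1-750k.ismv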
--[iss|hls|hds|mpd].minimum_fragment_length
The default duration for media fragments is determined by the keyframe interval (GOP size) used in the encoded video.
For HTTP Smooth Streaming and MPEG-DASH using a closed GOP of (approximately) two seconds works well. The GOP size is set by the encoder during the encoding process.
HLS is optimized for media segments with a duration of around eight seconds.
If your media segments are encoded with two-second GOPs, then this option concatenates multiple GOPs into a single media segment. Setting minimum_fragment_length=8 instructs the Origin to concatenate four GOPs into a single media fragment. An example:
#!/bin/bash
mp4split -o video.ism \
--hls.client_manifest_version=3 \
--hls.minimum_fragment_length=8 \
....
Note
When using this option for DASH, the MPD that is generated defaults to a SegmentTemplate that indexes segments based on $Number$ instead of $Time$. You can override this behavior by also specifying mpd.segment_template=time. For more information about the difference between the two, and why $Time$ is preferred in most cases, please read our blog post about this topic: SegmentTimeline blog.
--[iss|hls|hds|mpd].base_path
Add the given path/URL to every URL used by the generated playlists and manifests. Defaults to empty, making all the requests relative to the manifest file.
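As a sketch, the following adds a base path to all URLs in the generated MPD; the http://cdn.example.com URL is only a placeholder:
#!/bin/bash
# Prefix every URL in the DASH client manifest with a CDN location
mp4split -o tears-of-steel.ism \
--mpd.base_path=http://cdn.example.com/tears-of-steel/ \
tears-of-steel-aac-64k.isma \
tears-of-steel-avc1-400k.ismv \
tears-of-steel-avc1-750k.ismv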
--no_inband_parameter_sets
Codec parameters can be carried in the SampleEntry or in the NAL units in the samples. Carriage of codec parameters in the SampleEntry is preferred. This corresponds to codec configurations such as 'avc1' and 'hvc1' (i.e., as opposed to 'avc3' and 'hev1').
For AVC encoded content this option may be used to convert any avc3 encoded content to avc1 encoded output. Note that any content that has been processed by Remix will be output as avc3 by default, so using this option is necessary if you require avc1 output when using a Remix workflow, although in general the use of this option is not recommended.
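As a sketch, assuming the option is passed as a plain switch, converting avc3 input to avc1 output could look like this:
#!/bin/bash
# Carry the parameter sets in the SampleEntry instead of in-band (avc3 -> avc1)
mp4split -o tears-of-steel.ism \
--no_inband_parameter_sets \
tears-of-steel-avc1-400k.ismv \
tears-of-steel-avc1-750k.ismv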
MPEG-DASH
The following options are specific to MPEG-DASH streaming:
Note
For DASH output, the timescale of the media is rescaled based on frame rate when the source uses a 10MHz timescale. When the frame rate is not signaled, the media is rescaled to 90kHz, because that timescale fits all common frame rates.
--mpd.min_buffer_time
The value of the @minBufferTime attribute in the MPD element (in seconds).
Note
The value of the minimum buffer time does not provide any instructions to the client on how long to buffer the media.
The value however describes how much buffer a client should have to avoid stalling under ideal network conditions. As such, MBT is not describing the burstiness or jitter in the network, it is describing the burstiness or jitter in the content encoding. It is a property of the content.
DASH-IF Interoperability Points section 3.2.8 Bandwidth and Minimum Buffer Time.
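For example, a minimal sketch that sets @minBufferTime to 10 seconds (the value is only illustrative; choose one that matches the burstiness of your content encoding):
#!/bin/bash
mp4split -o tears-of-steel.ism \
--mpd.min_buffer_time=10 \
tears-of-steel-aac-64k.isma \
tears-of-steel-avc1-400k.ismv \
tears-of-steel-avc1-750k.ismv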
--mpd.inline_drm
Specify the content protection in the MPD.
This has the advantage that a player can quickly issue a license request, without having to load (or wait) on the initialization segment to become available.
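A sketch, assuming the option is passed as a plain switch and that your usual DRM configuration options are specified alongside it (omitted here):
#!/bin/bash
# Signal the content protection directly in the MPD
# (the DRM/encryption options themselves are omitted in this sketch)
mp4split -o tears-of-steel.ism \
--mpd.inline_drm \
tears-of-steel-aac-64k.isma \
tears-of-steel-avc1-400k.ismv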
--mpd.profile
The default is to select the profile that fits the presentation best, which is the approach that we strongly recommend.
This option forces the origin to signal a certain profile, regardless of whether the content matches the chosen profile or not. We therefore advise against using it. With the DVB Profile as the only exception, the origin won't actually make the content compliant.
The valid URNs for this option are the following:
| Profile / Interoperability Point URN | Section |
|---|---|
| urn:mpeg:dash:profile:mp2t-main:2011 | ISO/IEC 23009-1 section 8.4 |
| urn:mpeg:dash:profile:isoff-live:2011 | ISO/IEC 23009-1 section 8.6 |
| urn:com:dashif:dash264 | DASH-IF DASH-AVC/264 section 6.3 |
| urn:hbbtv:dash:profile:isoff-live:2012 | HBB Profile |
| urn:dvb:dash:profile:dvb-dash:2014 | DVB Profile (DVB-DASH specification (ETSI TS 103 285)) |
DVB DASH (urn:dvb:dash:profile:dvb-dash:2014)
When this profile is selected, a presentation that meets the requirements imposed by the DVB Profile (DVB-DASH specification (ETSI TS 103 285)) is written:
The content is offered using Inband Storage for SPS/PPS. That is, the codec signaled in the sample entry is avc3 and each IDR frame starts with one or more SPS/PPS NAL units.
A default of 15 seconds is used for MPD@SuggestedPresentationDelay.
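For example, a sketch that forces signaling of the DVB profile using the URN from the table above (keeping in mind that forcing a profile is generally not recommended):
#!/bin/bash
mp4split -o tears-of-steel.ism \
--mpd.profile=urn:dvb:dash:profile:dvb-dash:2014 \
tears-of-steel-aac-64k.isma \
tears-of-steel-avc1-400k.ismv \
tears-of-steel-avc1-750k.ismv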
--mpd.segment_template
The indexing mode for SegmentTemplate. Defaults to "time", except when mpd.minimum_fragment_length is specified (then it defaults to "number").
| Indexing mode | URL template | Derivation of MPD start time and duration |
|---|---|---|
| time | $Time$ | SegmentTimeline |
| number | $Number$ | @startNumber and @duration |
| number_timeline | $Number$ | SegmentTimeline and @startNumber |
Note
If the last segment of a track would be less than 0.5 seconds long it is appended to the previous segment. Using a $Number$-based timeline this can cause issues if a player does not take into account the value for endNumber, which communicates the last segment number for a given track if it deviates from what can be expected based on the media's length and the segments' average duration.
In such cases one of the following workarounds may be used (if switching to a $Time$-based timeline isn't feasible, which would be the recommended solution):
Set mpd.minimum_fragment_length to a value that fits an exact number of audio frames (which have a length of 1024/48000 seconds if you use 48KHz audio) and an exact number of GOPs (this is impossible if a non-integer frame rate is used)
Ignore the 404s associated with numbered segments
Get the player to recognize and use the endNumber value in the MPD
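As a sketch, the following selects the recommended $Time$-based timeline explicitly, even when mpd.minimum_fragment_length is set (which would otherwise switch the default to "number"); the input filenames are reused from the CMAF examples above:
#!/bin/bash
mp4split -o tears-of-steel.ism \
--mpd.minimum_fragment_length=4 \
--mpd.segment_template=time \
tears-of-steel-aac-64k.cmfa \
tears-of-steel-avc1-400k.cmfv \
tears-of-steel-avc1-750k.cmfv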
HLS - HTTP Live Streaming
The following options are specific to HTTP Live Streaming:
--hls.client_manifest_version
Output protocol version of the client manifest (.m3u8) file. Defaults to 1, specifying protocol version 1. For a detailed overview see here.
Version 3 adds floating point durations, which may be more accurate.
Version 4 adds media groups (instead of multiplexed streams) and enables WebVTT support. It also adds EXT-X-I-FRAMES-ONLY variants to the master playlist.
Note
Please note that only functionality that breaks backwards compatibility puts requirements on the protocol version that must be signaled; see the 'Protocol Version Compatibility' paragraph of the HLS specification.
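For example, a sketch that requests protocol version 4 output so that media groups and WebVTT subtitles can be used:
#!/bin/bash
mp4split -o tears-of-steel.ism \
--hls.client_manifest_version=4 \
tears-of-steel-aac-64k.isma \
tears-of-steel-avc1-400k.ismv \
tears-of-steel-avc1-750k.ismv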
--hls.no_audio_only
Do not include the audio only track in the variant m3u8 playlist file. (Defaults to false.)
--hls.no_discontinuities
Output MPEG-TS streams with no discontinuities. (Defaults to true.)
--hls.optimized
Output MPEG-TS streams with optimized PES placement. (Defaults to false.)
Optimized PES placement reduces the overhead introduced by the MPEG-TS container, especially for low bitrate video streams. The additional complexity introduced in the MPEG-TS stream is fully compatible with HLS, but for achieving maximum compatibility with older legacy devices the default is false.
--hls.pass_sei
Wrap SEI messages into ID3 Tags.
--hls.subtitles_subformat
This option configures the subformat for tracks of type SUBTITLES and overrides the default of using WebVTT output. It is intended to support specific legacy implementations that combine TTML with HLS TS, like SMPTE-TT using bitmap images.
When set to "SMPTETT" (case sensitive) an additional SUBFORMAT attribute is added to the Master Playlist, signaling that the subtitles are formatted as SMPTE-TT XML. This also disables the conversion to WebVTT that Origin would normally apply, and simply passes through your TTML input (therefore, if your TTML input is not compatible with SMPTE-TT, your output won't be either). Also note you need a custom player supporting this format, as support for it is not part of the HLS specification.
--hls.no_elementary
By default we output elementary streams for "aac", "ac3" and "ec3" audio. If this option is specified then the elementary streams are muxed into an MPEG-TS stream instead.
--hls.no_multiplex
By default, Origin muxes video of a variant with the default audio track of the audio group that is associated with the variant. Another default is that even separate audio tracks are delivered in TS segments. The latter is incompatible with the HLS specification, which requires separate audio tracks to be delivered as elementary streams (or in a fMP4 / CMAF container, which is not relevant for Origin's HLS TS output).
If this option is specified then the HLS TS streams will be fully compliant with the HLS specification.
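A sketch, assuming the option is passed as a plain switch, that delivers separate audio tracks as elementary streams instead of muxing them into the video variants:
#!/bin/bash
mp4split -o tears-of-steel.ism \
--hls.no_multiplex \
tears-of-steel-aac-64k.isma \
tears-of-steel-avc1-400k.ismv \
tears-of-steel-avc1-750k.ismv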
HDS - HTTP Dynamic Streaming
The following options are specific to HTTP Dynamic Streaming:
--hds.client_manifest_version
Output version of the client manifest file (.f4m). (Defaults to 1). Version 2 adds support for Late Binding Audio.
When the client manifest version is set to 2 and the dvr window length is set to 60 seconds or higher, a <dvrInfo> tag is written in the manifest.
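For example, a sketch that enables Late Binding Audio support in the F4M:
#!/bin/bash
mp4split -o tears-of-steel.ism \
--hds.client_manifest_version=2 \
tears-of-steel-aac-64k.isma \
tears-of-steel-avc1-400k.ismv \
tears-of-steel-avc1-750k.ismv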
--hds.inline_drm
Inline the DRM meta data blob in the manifest file (instead of a URL).
--hds.no_onfi
The onFI message is added to any stream that has video or is an audio only stream (manifest version 1).
The messages are not added to an alternate audio stream (manifest version 2), since the OSMF framework stalls on alternate audio tracks that have SCRIPT tags. Note that this is okay, since the onFI messages are present in the main presentation, and for Radio/Audio only presentations you use manifest version 1.
The timestamps in an onFI message are aligned to seconds and the message is written once per second. The system-date sd (dd-mm-yyyy) element is only present when UTC timing is present. The system-time st (hh:mm:ss.sss) element is always present.
Setting this flag skips inserting any onFI message at all.
HSS - Smooth Streaming
The following options are specific to ISS Smooth Streaming:
--iss.client_manifest_version
Output version of the client manifest file. Defaults to 20, specifying version 2.0. Version 2.2 client manifests add support for compressed timelines, which greatly reduces the size of the manifest files.
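For example, a sketch that requests a version 2.2 client manifest with compressed timelines (assuming the value follows the same pattern as the default, i.e. 20 for 2.0 and 22 for 2.2):
#!/bin/bash
mp4split -o tears-of-steel.ism \
--iss.client_manifest_version=22 \
tears-of-steel-aac-64k.isma \
tears-of-steel-avc1-400k.ismv \
tears-of-steel-avc1-750k.ismv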
Progressive download
See Progressive download and Download to own for HTTP Live Streaming (HLS).