Packaging for Unified Origin¶
Fragmented MP4 is also known as PIFF, ISMV or CMAF (there is a specific section on how to package CMAF further below). Converting from an MP4 file to a fragmented MP4 file (.ismv):
#!/bin/bash
mp4split -o tears-of-steel-avc1-750k.ismv \
  tears-of-steel-avc1-750k.mp4
The other way around, from fragmented MP4 / PIFF / ISMV / CMAF to MP4:
#!/bin/bash
mp4split -o tears-of-steel-avc1-750k.mp4 \
  tears-of-steel-avc1-750k.ismv
Or from Adobe's F4F to MP4:
#!/bin/bash
mp4split -o tears-of-steel-avc1-750k.mp4 \
  tears-of-steel-avc1-750k.f4f
You can also use the server manifest file as input. All the audio and video streams referenced in the manifest file are combined into one MP4 file.
#!/bin/bash
mp4split -o tears-of-steel-avc1-750k.mp4 \
  tears-of-steel-avc1.ism
It is also possible to package multiple tracks into a single MP4. When doing this, the order in which the tracks are specified on the command-line defines the order of the tracks within the MP4 (which can be useful for the approach described in Define default track when preparing content (track order)):
#!/bin/bash
mp4split -o tos_avc1-sorted.ismv \
  tears-of-steel-avc1-1000k.mp4 \
  tears-of-steel-avc1-1500k.mp4 \
  tears-of-steel-avc1-750k.mp4 \
  tears-of-steel-avc1-400k.mp4
How to package CMAF¶

New in version 1.8.3.

The process of packaging CMAF is very similar to using mp4split to package ISMV. Depending on the extension of the output, mp4split will decide how the content is packaged. In the case of CMAF, you can choose between .cmfv for video, .cmfa for audio and .cmft for text streams.
The command-line that you use is very straightforward, as shown in the examples for packaging CMAF audio, video and text streams below:
#!/bin/bash
mp4split -o tears-of-steel-aac-128k.cmfa tears-of-steel-aac-128k.mp4
mp4split -o tears-of-steel-avc1-2200k.cmfv tears-of-steel-avc1-2200k.mp4
mp4split -o tears-of-steel-en.cmft tears-of-steel-en.srt
You can use tos-cmaf-packaging.sh to package the Tears of Steel demo content.
If you are packaging CMAF to support fMP4 HLS and want to follow Apple's recommendation of delivering 6-second segments, make sure that the CMAF files that contain your media are fragmented with this length in mind, because each fragment in an MP4 equals a 'play-out' segment when content is statically packaged (when using Origin, the segment length can be changed on-the-fly for each play-out format).
You can make sure your content is fragmented to your liking by using the
--fragment_duration option while packaging CMAF (per default, Unified
Packager will fragment the MP4 according to the GOP structure of the media).
See below for an example.
However, do note that the specified length should be a multiple of the media's GOP length, as segments can only contain full GOPs. So, in case you are ingesting media with 1.92-second GOPs and want to aim for a segment duration of 6 seconds, you should specify a fragment duration of 576/100.
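As a sketch of this arithmetic (the 48-frames-at-25-fps figures below are our own illustration of a 1.92-second GOP, not part of the original example), the largest whole-GOP duration that fits a 6-second target can be computed with shell arithmetic:

```shell
# Illustration of the rule above: fragments must contain whole GOPs,
# so pick the largest multiple of the GOP length that fits the target.
# Assumed input: 1.92-second GOPs (48 frames at 25 fps), 6-second target.
gop_num=48   # frames per GOP
gop_den=25   # frame rate (fps), so GOP length = 48/25 = 1.92 s
target=6     # desired segment duration in seconds

gops=$(( (target * gop_den) / gop_num ))   # whole GOPs per fragment: 3
frag_num=$(( gops * gop_num ))             # 3 * 48 = 144
echo "--fragment_duration=${frag_num}/${gop_den}"   # 144/25 s = 5.76 s
```

Passing the resulting 144/25 (equal to 576/100) to --fragment_duration keeps every fragment an exact multiple of the GOP length.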
In the example below, the GOP length is 48 frames at 24 fps, or 2 seconds. We use 4 GOPs per fragment to align segment boundaries with audio (because 375 blocks of 1024 samples at 48 kHz is precisely 8 seconds):
#!/bin/bash
mp4split -o tos-8s-aac-128k.cmfa \
  --fragment_duration=384000/48000 \
  tears-of-steel-aac-128k.mp4
mp4split -o tos-8s-avc1-2200k.cmfv \
  --fragment_duration=192/24 \
  tears-of-steel-avc1-2200k.mp4
mp4split -o tos-8s-en.cmft \
  --fragment_duration=8/1 \
  tears-of-steel-en.srt
If you want to override or add any properties to a track when packaging CMAF, you can do so using the familiar options described in Overriding and adding track properties.
Solving WebVTT sync issues¶
If your workflow for fMP4 HLS involves one or more WebVTT files that are or were part of an HLS Transport Stream playout scenario, chances are that a 10-second time offset is signaled in the WebVTT (using the X-TIMESTAMP-MAP header).

If this is the case, the subtitles will end up out of sync, as the default when packaging CMAF is to not use a time offset. To synchronize the WebVTT timeline with the other media, you can simply remove the X-TIMESTAMP-MAP header from the WebVTT file.
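A minimal sketch of that fix (the subs.vtt file name and its contents are placeholders): since the timestamp map occupies a line of its own in the WebVTT header, it can be filtered out with grep:

```shell
# Create a placeholder WebVTT with a timestamp-map header (illustration only)
printf 'WEBVTT\nX-TIMESTAMP-MAP=MPEGTS:900000,LOCAL:00:00:00.000\n\n00:00:01.000 --> 00:00:03.000\nHello\n' > subs.vtt

# Drop the offending header line, leaving all cues untouched
grep -v 'X-TIMESTAMP-MAP' subs.vtt > subs-fixed.vtt
```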
Synchronizing the other media to the WebVTT timeline is also possible, but not
recommended. To do so, offset all other media when packaging by adding the option
The packager supports the following options.
Create a (progressive) MP4 that references a fragmented MP4 file, for "progressive download" to older players. Unified Origin will resolve media data references on playout.
New in version 1.7.27.
Like "--use_dref", this creates an MP4 that references an MP4 file, but without explicitly referencing sub-samples, resulting in a (considerably) smaller video MP4.

Do not use it if you want to offer a download-to-own option, because that requires sub-sample data.
Do not write the output.
The output timescale used for media. Defaults to the timescale of the original media, or to 10 MHz when the "piff" brand is used.
The target duration of each fragment, expressed as a fraction "X/Y" of seconds (or in milliseconds "X"); the default is 2 seconds. Behaviour is similar to #EXT-X-TARGETDURATION in HLS or maxSegmentDuration in MPEG-DASH.

When sync-samples are present, each fragment starts with a sync-sample and contains as many additional sync samples as will fit into the fragment duration.
This parameter can be useful to align fragment boundaries across different codecs, or to tune the fragment duration for a specific playout format. For example, Apple recommends 6-second segments in HLS, while 2 seconds is common in Smooth Streaming.
Sets the 'brand'. Common options are "piff", "iso6", "ccff", "dash" and "cmfc". The default is "iso6", but with timescale=10000000 (10 MHz) the default is "piff". When packaging CMAF (i.e., when the extension of the output is .cmfv, .cmfa or .cmft), the default is "cmfc".
When creating (progressive) MP4 files with negative composition times, "iso4" is used as the brand. When using "iso2", negative composition time offsets are disabled and an edit list is used to compensate for the ct_offset.
By using the
--brand option, you overrule the default major brand for the
given output. When using the option more than once on the same command-line, any
brands specified after the first will be added as compatibility brands.
For instance, the first example below will result in a CMAF-file with "cmfc" as its major brand and "iso9" as a compatibility brand, whereas the compatibility brand will be "dash" in the second example:
mp4split -o example.cmfv --brand=cmfc --brand=iso9 example.mp4
mp4split -o example.cmfv --brand=cmfc --brand=dash example.mp4
When generating a fragmented or progressive MP4 file (.mp4, .isma, .ismv or .ismt) from an input track, its track properties are based on the properties of the input track. It is possible to add or override some of these properties, but in most cases, this is not necessary.
When generating a fragmented or progressive MP4, the track properties that can be added or overridden are the following:
By default, track_language is taken from the input track's media info. In case you do need or want to set the language for a track, make sure to use the correct RFC 5646 language tags. These tags consist of two-letter or three-letter primary language codes, optionally extended with additional subtags such as scripts.
Only DASH and HLS offer support for RFC 5646. For output formats that do not support it, a tag's first two or three characters are parsed according to ISO 639-1 or ISO 639-2/T. If this does not result in a valid language tag, und is used. To make sure that a valid fallback option is available, it is good practice to specify a macrolanguage when possible. For example: signal Cantonese Chinese using zh-yue rather than yue, so that 'Chinese' is used as the fallback option.
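The fallback can be illustrated with a quick sketch (our own illustration of the rule above, not mp4split's actual code):

```shell
# For output formats without RFC 5646 support, only the first two or three
# characters (the primary subtag) are parsed; zh-yue therefore degrades to
# zh ('Chinese'), while a bare yue would not map and would fall back to und.
tag="zh-yue"
primary="${tag%%-*}"   # strip everything after the first '-'
echo "$primary"        # prints: zh
```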
For example, specifying languages with two letter ISO 639-1 language codes:
For languages that do not have two letter language codes but do have ISO 639-2/T or ISO 639-3 codes:
For languages as used in different regions:
en-UK for English as used in the UK
nl-BE for Dutch (Flemish) as used in Belgium
pt-BR for Portuguese as used in Brazil
For additional scripting tags for languages:
sr-Cyrl for Serbian using the Cyrillic script
zh-Hans for Chinese using the simplified script
In addition to the language tag, HLS requires the presence of a language name. For tags that are part of ISO 639-1 or ISO 639-2/T, mapping of the tag to a name is automatic. For all other language tags, --track_description should be used to signal the name.
#!/bin/bash
mp4split -o audio-en.mp4 --track_language=en
mp4split -o audio-nl-be.mp4 --track_language=nl-be \
  --track_description="Vlaams Nederlands"
mp4split -o audio-zh-yue-hant.mp4 --track_language=zh-yue-hant \
  --track_description="Cantonese Chinese using Traditional script"
Overrides the average bitrate of a track.
By default track_bitrate is the average bitrate (either from the metadata info
of the input track, or calculated from the source samples). You can also set
max, so that the maximum/peak bitrate is used.
#!/bin/bash
mp4split -o tos-override-bitrates.ismv \
  tears-of-steel-aac-64k.mp4 --track_bitrate=64000 \
  tears-of-steel-aac-128k.mp4 --track_bitrate=128000 \
  tears-of-steel-avc1-400k.mp4 --track_bitrate=400000 \
  tears-of-steel-avc1-750k.mp4 --track_bitrate=750000 \
  tears-of-steel-avc1-1000k.mp4 --track_bitrate=1000000
Sets the role of a track and can be used to further distinguish it, next to bitrate and language. The exact meaning of a role can be dependent on the kind of track it is added to (video, audio or text). All of the roles specified in urn:mpeg:dash:role:2011 can be used. Most of them are listed in the table below:
main: main media intended for presentation if no other information is provided.
alternate: media that is an alternative to the main media of the same type.
supplementary: media that is supplementary to media content of a different media component type.
commentary: media content component with commentary.
caption: media content component with captions (typically containing a description of music and other sounds, in addition to a transcript of the dialog).
subtitle: media content component with subtitles.
description: track containing a textual description (intended for audio synthesis) or an audio description, describing the visual component.
metadata: media component containing information intended to be processed by application-specific elements.
forced-subtitle: textual information meant for display when no other text representation is selected.
#!/bin/bash
mp4split -o example-audio.isma \
  example-audio.mp4 --track_role=main --track_language=eng
mp4split -o example-commentary.isma \
  example-commentary.mp4 --track_role=alternate --track_language=eng
New in version 1.7.31.
Adds a SchemeIdUri/Value pair to the 'kind' box when packaging a (f)MP4. This box should describe the intended purpose of the track. Similar to the --track_role option described above, the --track_kind option can be used to further distinguish a track, besides its bitrate and language.
Specifying the parameters of this option is done like so:

--track_kind=<SchemeIdUri>@<Value>

Here, <SchemeIdUri> and <Value> should be replaced with parameters of choice, preferably from the about:html-kind scheme defined by W3C HTML5 or the urn:mpeg:dash:role:2011 scheme defined by MPEG-DASH (although the latter can be signaled more easily using the --track_role option).
When packaging for Unified Origin, the main use case of the --track_kind option is adding and properly signaling tracks that provide accessibility features, such as captions for the hard of hearing or an audio description of the video track for the visually impaired. The first can be signaled using the --track_role option (i.e., --track_role=caption), whereas the about:html-kind scheme can be used to signal the latter (i.e., --track_kind="about:html-kind@main-desc").
Take for example the situation in which the "about:html-kind@main-desc" kind is present in a track that has been added to a server manifest. Unified Origin will then add the following parameters for this track when generating a DASH client manifest (.mpd) and HLS main playlist (.m3u8) for playout:
<Accessibility schemeIdUri="urn:tva:metadata:cs:AudioPurposeCS:2007" value="1">
</Accessibility>
<Role schemeIdUri="urn:mpeg:dash:role:2011" value="alternate">
</Role>
Use case walkthrough¶
As an example, consider a use case that starts from three (progressive) MP4s. One contains the video as well as the main audio track (English), while the other two contain one audio track each: one with alternate audio (Welsh), the other with a broadcast-mix audio description for the visually impaired (English).
To start, fragmented MP4s need to be created from the input files (for a command that uses the --track_kind option, go to the fourth step below). First, extract and fragment the main English audio track, setting its language explicitly:
#!/bin/bash
mp4split -o english-audio.isma \
  english.mp4 \
  --track_type=audio \
  --track_language=eng
Second, package the alternate Welsh audio track. As with the track above, the language property is corrected here as well:
#!/bin/bash
mp4split -o welsh-audio.isma \
  welsh-audio.mp4 \
  --track_language=cym
Third, use the --track_type option once more to extract and fragment the video:
#!/bin/bash
mp4split -o english-video.ismv \
  english.mp4 \
  --track_type=video
Fourth, use the --track_kind option when packaging the alternate audio track that contains the audio description in English. Because language, bitrate and codec are identical to the fMP4 that contains the main audio track (english-audio.isma), their 'kind' is what distinguishes them. As indicated above, "about:html-kind@main-desc" should be used for audio description tracks:
#!/bin/bash
mp4split -o english-ad.isma \
  english-ad-audio.mp4 \
  --track_language=eng \
  --track_kind="about:html-kind@main-desc"
Finally, generate a server manifest that includes all of the fragmented MP4s created above:
#!/bin/bash
mp4split -o presentation.ism \
  --hls.client_manifest_version=4 \
  english-video.ismv \
  english-audio.isma \
  welsh-audio.isma \
  english-ad.isma \
  --track_description="English (describes video)"
When the server manifest has been generated, everything is ready to stream the video using Unified Origin, which will include the proper signaling of the audio description track in the client manifest, as explained earlier.
The first step is to package all the source content into the format that is used by Unified Origin. This is the fragmented-MP4 format.
The example uses this Source Content.
#!/bin/bash
mp4split -o video_400k.ismv \
  tears-of-steel-avc1-400k.mp4 \
  tears-of-steel-aac-64k.mp4
mp4split -o video_800k.ismv \
  tears-of-steel-avc1-750k.mp4 \
  tears-of-steel-he-aac.mp4
mp4split -o video.ismv \
  tears-of-steel-avc1-400k.mp4 \
  tears-of-steel-avc1-750k.mp4 \
  tears-of-steel-dts-384k.mp4 \
  tears-of-steel-ac3-448k.mp4
Now that we have packaged all the audio and video, the following step is to create the two progressive download files. Instead of creating a completely new MP4 video file we will create an MP4 video that only contains the necessary index and references the actual movie data that is stored in the fragmented-MP4 format.
#!/bin/bash
mp4split -o video_400k.mp4 --use_dref \
  video_400k.ismv
#!/bin/bash
mp4split -o video_800k.mp4 --use_dref \
  video_800k.ismv
As a last step we create the server manifest file. This is an XML file that contains the media information about all the tracks and is used by the USP webserver module.
#!/bin/bash
mp4split -o video.ism \
  video.ismv \
  video_400k.ismv \
  video_800k.ismv
At this point we have six files stored for our presentation:
video_400k.ismv: AAC-LC audio, 400 kbps video
video_800k.ismv: HE-AAC audio, 800 kbps video
video.ismv: 400/750 kbps video, DTS, AC3
video_400k.mp4: AAC-LC audio, 400 kbps video
video_800k.mp4: HE-AAC audio, 800 kbps video
video.ism: USP server manifest file
The USP webserver module makes the following URLs available. Note that except for the progressive download URLs, they are all virtual and do not exist on disk:
HTTP Live Streaming: http://www.example.com/usp/video.ism/video.m3u8
HTTP Dynamic Streaming: http://www.example.com/usp/video.ism/video.f4m