Packaging for Unified Origin

Package MP4 to fragmented-MP4 and back

Fragmented MP4 is also known as PIFF, ISMV or CMAF (a specific section on How to package CMAF follows below). Converting an MP4 file to a fragmented MP4 file (.ismv):


mp4split -o example.ismv \
  example.mp4

The other way around, from fragmented MP4 / PIFF / ISMV / CMAF to MP4:


mp4split -o example.mp4 \
  example.ismv

Or from Adobe’s F4F to MP4:


mp4split -o example.mp4 \
  example.f4f

You can also use the server manifest file as input. All the audio and video streams referenced in the manifest file are combined into one MP4 file.


mp4split -o example.mp4 \
  example.ism

How to package CMAF

New in version 1.8.3.

The process of packaging CMAF is very similar to using mp4split to package ISMV. Depending on the extension of the output, mp4split will decide how the content is packaged. In the case of CMAF, you can choose between .cmfv for video, .cmfa for audio and .cmft for text streams.
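The extension-to-stream-type mapping described above can be sketched in plain shell (the track_type variable and output name here are illustrative, not mp4split parameters):

```shell
# Pick the CMAF extension matching a stream type, as mp4split does
# based on the output file name: .cmfv video, .cmfa audio, .cmft text.
track_type=audio
case "$track_type" in
  video) ext=cmfv ;;
  audio) ext=cmfa ;;
  text)  ext=cmft ;;
esac
echo "tears-of-steel.$ext"   # tears-of-steel.cmfa
```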

The command line that you use is very straightforward, as shown in the examples for packaging CMAF audio, video and text streams below:


mp4split -o tears-of-steel-128k.cmfa tears-of-steel-128k.mp4
mp4split -o tears-of-steel-5.cmfv tears-of-steel-5.mp4
mp4split -o tears-of-steel-en.cmft tears-of-steel-en.ismt

These commands package the Tears of Steel demo content to CMAF.


If you want to override or add any properties to a track when packaging CMAF, you can do so using the familiar options described in Overriding and adding track properties.

Solving WebVTT sync issues


If your workflow for fMP4 HLS involves one or more WebVTT files that are or were part of an HLS Transport Stream play-out scenario, chances are that a 10-minute time offset is signaled in the WebVTT (using the EXT-X-TIMESTAMP-MAP tag).

If this is the case, the subtitles will be out of sync, because the default when packaging CMAF is to not use a time offset. To synchronize the WebVTT timeline with the other media, simply remove the EXT-X-TIMESTAMP-MAP tag from the WebVTT file.

Synchronizing the other media to the WebVTT timeline is also possible, but not recommended. To do so, offset all other media when packaging by adding the option --timestamp_offset=10.
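Removing the tag can be done with a one-line sed command, assuming the offset occupies its own header line in the file (the file names here are hypothetical):

```shell
# Create a tiny WebVTT sample carrying the 10-minute offset header
# (600 s at the 90 kHz MPEG-TS clock = 54,000,000 ticks).
printf 'WEBVTT\nX-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:54000000\n\n00:00.000 --> 00:02.000\nHello\n' > subs.vtt

# Drop the timestamp-map line so the WebVTT timeline starts at zero,
# matching CMAF's default of no time offset.
sed '/X-TIMESTAMP-MAP/d' subs.vtt > subs-synced.vtt
```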

Options for fragmented-MP4 packaging

The packager supports the following options.


--use_dref

Create a (progressive) MP4 that references a fragmented MP4 file, for "progressive download" to older players. Unified Origin will resolve media data references on playout.


--use_dref_no_subs

New in version 1.7.27.

Like --use_dref, this creates an MP4 that references an MP4 file, but without explicitly referencing sub-samples, resulting in a (considerably) smaller video MP4.


Do not use this option if you want to offer a download-to-own option, because that requires sub-sample data.


--dry_run

Do not write the output.


--timescale

The output timescale used for media. Defaults to the timescale of the original media, or to 10MHz when the "piff" brand is used.


--fragment_duration

The target duration of each fragment (in milliseconds), default 2000. When sync samples are present, the fragments for the streams are aligned. This parameter can be useful for optimizing the fragment duration for a specific playout format, e.g. HLS, which recommends a fragment duration of 8 seconds. In rare cases it can also be useful for aligning audio or video fragments, although it is highly recommended to start with all sources GOP aligned from the outset.
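To relate the fragment duration (given in milliseconds) to the media timescale: a fragment spans ms × timescale / 1000 ticks. For the defaults (2000 ms at the 10MHz "piff" timescale), plain shell arithmetic shows:

```shell
# Convert the default 2000 ms fragment duration into 10 MHz timescale ticks.
ms=2000
timescale=10000000
ticks=$(( ms * timescale / 1000 ))
echo "$ticks"   # 20000000
```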


--brand

Sets the 'compatibility brand'. Options: "piff", "iso6", "ccff" and "dash". Default is "iso6", but with timescale=10000000 (10MHz) the default is "piff".

When creating (progressive) MP4 files with negative composition times, "iso4" is used as brand. When using "iso2", negative composition time offsets are disabled and an edit list is used to compensate for the ct_offset.

Overriding and adding track properties

When generating a fragmented or progressive MP4 file (.mp4, .isma, .ismv or .ismt) from an input track, its track properties are based on the properties of the input track. It is possible to add or override some of these properties, but in most cases, this is not necessary.


You can also set a name and description for each track, but that is only possible when generating a server manifest. See --track_name and --track_description.

When generating a fragmented or progressive MP4, the track properties that can be added or overridden are the following:


--track_language

By default track_language is taken from the input track's media info. In case you do need or want to set the language for a track, make sure to use the correct RFC 5646 language tags. These tags can consist of two-letter codes, three-letter codes, extended language subtags, and scripts.


Only DASH and HLS offer support for RFC 5646. For output formats that do not support it, a tag's first two or three characters are parsed according to ISO 639-1 or ISO 639-2/T. If this does not result in a valid language tag, und is used. To make sure that a valid fallback option is available, it is good practice to specify a macrolanguage when possible. For example: signal Cantonese Chinese using zh-yue rather than yue, so that 'Chinese' is used as the fallback option.
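The fallback described above amounts to taking the tag's primary subtag. In shell (the tag values are arbitrary examples):

```shell
# Derive the fallback primary subtag from an RFC 5646 tag, mirroring how
# formats without RFC 5646 support read only the first two or three letters.
tag="zh-yue"
fallback="${tag%%-*}"   # strip everything after the first hyphen
echo "$fallback"        # zh
```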

For example, specifying languages with two-letter ISO 639-1 language codes:

  • en for English
  • nl for Dutch
  • es for Spanish

For languages that do not have two-letter language codes but do have ISO 639-2/T or ISO 639-3 codes:

  • haw for Hawaiian
  • yue for Cantonese

For languages as used in different regions:

  • en-gb for English as used in the UK
  • nl-be for Dutch; Flemish as used in Belgium
  • pt-br for Portuguese as used in Brazil

For additional scripting tags for languages:

  • sr-cyrl for Serbian using the Cyrillic script
  • zh-hans for Chinese using the simplified script


In addition to the language tag, HLS requires the presence of a language name. For tags that are part of ISO 639-1 or ISO 639-2/T, mapping the tag to a name is automatic. For all other language tags, --track_description should be used to signal the name.


mp4split -o audio.mp4 \
  audio-en.mp4 --track_language=en \
  audio-nl-be.mp4 --track_language=nl-be \
  --track_description="Vlaams Nederlands" \
  audio-zh-yue-hant.mp4 --track_language=zh-yue-hant \
  --track_description="Cantonese Chinese using Traditional script"


--track_bitrate

Overrides the average bitrate of a track.

By default track_bitrate is the average bitrate (either taken from the metadata of the input track, or calculated from the source samples). You can also set this option to max, so that the maximum/peak bitrate is used.


mp4split -o output.ismv \
 input1.mp4 --track_type=audio  --track_bitrate=24000 \
 input2.mp4 --track_type=audio  --track_bitrate=48000 \
 input3.mp4 --track_type=video  --track_bitrate=31000 \
 input4.mp4 --track_type=video  --track_bitrate=86000 \
 input5.mp4 --track_type=video  --track_bitrate=156000


--track_role

Sets the role of a track and can be used to further distinguish it, next to bitrate and language. The exact meaning of a role can depend on the kind of track it is added to (video, audio or text). All of the roles specified in urn:mpeg:dash:role:2011 can be used. Most of them are listed in the table below:

--track_role=  Description
main           Main media intended for presentation if no other information is provided.
alternate      Media that is an alternative to the main media of the same type.
supplementary  Media that is supplementary to media content of a different media component type.
commentary     Media content component with commentary.
caption        Media content component with captions (typically containing descriptions of music and other sounds, in addition to a transcript of the dialog).
subtitle       Media content component with subtitles.
description    Track containing an audio or textual description of the visual component (intended for audio synthesis).
metadata       Media component containing information intended to be processed by application-specific elements.

mp4split -o output.ismv \
  audio.mp4 --track_role=main --track_language=eng \
  commentary.mp4 --track_role=alternate --track_language=eng


--track_kind

New in version 1.7.31.

Adds a SchemeIdUri/Value pair to the 'kind' box when packaging an (f)MP4. This box should describe the intended purpose of the track. Similar to the --track_role option described above, the --track_kind option can be used to further distinguish a track, besides its bitrate and language.

Specifying the parameters of this option is done like so:


--track_kind=<SchemeIdUri>@<Value>
Where <SchemeIdUri> and <Value> should be replaced with parameters of choice, preferably from the about:html-kind scheme defined by W3C HTML5 or the urn:mpeg:dash:role:2011 scheme defined by MPEG-DASH (although the latter can be signaled more easily using the --track_role option).

In addition, urn:tva:metadata:cs:AudioPurposeCS:2007 can be used, in which case value '1' signals content for the visually impaired and value '2' signals content for the hard of hearing.

When packaging for Unified Origin, the main use case of the --track_kind option is adding and properly signaling tracks that provide accessibility features, such as captions for the hard of hearing or an audio description of the video track for the visually impaired. The former can be signaled using urn:tva:metadata:cs:AudioPurposeCS:2007@2, whereas the about:html-kind scheme can be used to signal the latter with about:html-kind@main-desc.

Take for example the situation in which the about:html-kind@main-desc ‘kind’ is present in a track that has been added to a server manifest. Unified Origin will then add the following parameters for this track when generating a DASH client manifest (.mpd) and HLS main playlist (.m3u8) for playout:

MPEG-DASH (.mpd)


HLS (.m3u8)


Use case walkthrough

As an example, consider a use case that starts from three (progressive) MP4s. One contains the video as well as the main audio track (English), while the other two contain an audio track each: one with alternate audio (Welsh), the other with a broadcast-mix audio description for the visually impaired (English).

To start, fragmented MP4s need to be created from the input files (for a command that uses the --track_kind option, see the fourth step below).

First, use the --track_type option to extract and fragment the audio track from the file that contains both video and audio. To correct the language property, --track_language is used as well:


mp4split -o english-audio.isma \
  english.mp4 \
  --track_type=audio \
  --track_language=eng

Second, package the alternate Welsh audio track. As with the track above, the language property is corrected here as well:


mp4split -o welsh-audio.isma \
  welsh-audio.mp4 \
  --track_language=cy

Third, use the --track_type option once more to extract and fragment the video:


mp4split -o english-video.ismv \
  english.mp4 \
  --track_type=video

Fourth, use the --track_kind option when packaging the alternate audio track that contains the audio description in English. Because the language, bitrate and codec are identical to those of the fMP4 that contains the main audio track (english-audio.isma), their 'kind' is what distinguishes them. As indicated above, about:html-kind@main-desc should be used for audio description tracks:


mp4split -o english-ad.isma \
  english-ad-audio.mp4 \
  --track_language=eng \
  --track_kind=about:html-kind@main-desc

Finally, generate a server manifest that includes all of the fragmented MP4s created above. To ensure that the audio description track is signaled with a unique name and description, both are set explicitly:


mp4split -o blockbuster.ism \
  --hls.client_manifest_version=4 \
  english-video.ismv \
  english-audio.isma \
  welsh-audio.isma \
  english-ad.isma \
  --track_name="english_ad" \
  --track_description="English Audio Description"

When the server manifest has been generated, everything is ready to stream the video using Unified Origin, which will include the proper signaling of the audio description track in the client manifest, as explained earlier.

Packaging Smooth Streaming with track selection

Say you have one MP4 video and would like to store the audio and video track in separate fragmented files:


mp4split -o example-64k.isma \
  example.mp4 --track_type=audio

mp4split -o example-800k.ismv \
  example.mp4 --track_type=video

Generating the required server manifest file:


mp4split -o example.ism \
  example-64k.isma \
  example-800k.ismv

The track selection options always come after the input file. Next to --track_type you can also use --track_id to specify a specific track. Say you have two input files, example-audio.mp4 (containing 4 audio tracks) and example-video.mp4 (containing 4 video tracks) and you want to create a fragmented output file containing the first track of the audio and the last track of the video.


mp4split -o example.ismv \
  example-audio.mp4 --track_id=1 \
  example-video.mp4 --track_id=4

Packaging with track order and defaults

New in version 1.7.17.

It is possible to place tracks in the manifest in a specific order. This order is determined by the order in which the tracks are added on the packaging command line. For playout formats that support it, e.g. HLS, this in turn also means that the chosen track can be set to DEFAULT=YES.


# create a sorted .isma file
mp4split -o audio_sort.isma \
  swe_audio.mp4 \
  eng_audio.mp4 \
  dan_audio.mp4

# create ismv with video and sorted audio
mp4split -o video_audio.ismv \
  video1-4.mp4 \
  audio_sort.isma

# create a sorted subtitle file
mp4split -o sorted_subtitles.ismt \
  swe_sub.dfxp \
  eng_sub.dfxp \
  dan_sub.dfxp

# combine into manifest
mp4split -o sorted_manifest.ism \
  video_audio.ismv \
  sorted_subtitles.ismt

In the example above, both the Swedish audio track and the Swedish subtitle track would therefore be the first track in their respective groups, and in the HLS manifest both would be set to DEFAULT=YES.

The HLS manifest (sorted_manifest.ism/.m3u8) would look like this (additional tracks intentionally omitted):

# AUDIO groups

# SUBTITLES groups
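As an illustration only (not verbatim Origin output; group IDs, names and URIs are hypothetical), the relevant EXT-X-MEDIA entries of such a playlist might look like this, with the first track of each group marked DEFAULT=YES:

```
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",NAME="swe_audio",LANGUAGE="sv",DEFAULT=YES,URI="..."
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",NAME="eng_audio",LANGUAGE="en",DEFAULT=NO,URI="..."
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",NAME="dan_audio",LANGUAGE="da",DEFAULT=NO,URI="..."

#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="textstream",NAME="swe_sub",LANGUAGE="sv",DEFAULT=YES,URI="..."
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="textstream",NAME="eng_sub",LANGUAGE="en",DEFAULT=NO,URI="..."
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="textstream",NAME="dan_sub",LANGUAGE="da",DEFAULT=NO,URI="..."
```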

Packaging content for delivery by Unified Origin

The first step is to package all the source content into the format that is used by Unified Origin. This is the fragmented-MP4 format.

The example uses this Source Content.


mp4split -o video_400k.ismv \
  video_400k.mp4

mp4split -o video_800k.ismv \
  video_800k.mp4

mp4split -o video.ismv \
  video_200k.mp4 \
  video_600k.mp4 \
  audio_dts.mp4 \
  audio_ac3.mp4

Now that we have packaged all the audio and video, the following step is to create the two progressive download files. Instead of creating a completely new MP4 video file we will create an MP4 video that only contains the necessary index and references the actual movie data that is stored in the fragmented-MP4 format.


mp4split -o video_400k.mp4 --use_dref \
  video_400k.ismv

mp4split -o video_800k.mp4 --use_dref \
  video_800k.ismv

As a last step we create the server manifest file. This is an XML file that contains the media information about all the tracks and is used by the USP webserver module.


mp4split -o video.ism \
  video.ismv \
  video_400k.ismv \
  video_800k.ismv

At this point we have six files stored for our presentation.

File Description
video_400k.ismv AAC-LC, 400 kbps video
video_800k.ismv HE-AAC, 800 kbps video
video.ismv 200/600 kbps video, DTS, AC3, EAC3
video_400k.mp4 AAC-LC, 400 kbps video
video_800k.mp4 HE-AAC, 800 kbps video
video.ism USP server manifest file

The USP webserver module makes the following URLs available. Note that all these URLs are virtual. They do not exist on disk.

Playout format URL
Smooth Streaming
HTTP Live Streaming
HTTP Dynamic Streaming
Progressive download
Progressive download

Please download the sample script, which creates the various server manifests discussed above. The sample content is Tears of Steel.