Packaging for Unified Origin

Package MP4 to fragmented-MP4 and back

Fragmented-MP4 is also known as PIFF and ISMV. To convert an MP4 file to a fragmented MP4 file (.ismv):

#!/bin/bash

mp4split -o example.ismv \
  example.mp4

The other way around, from fragmented MP4 / PIFF / ISMV to MP4:

#!/bin/bash

mp4split -o example.mp4 \
  example.ismv

Or from Adobe's F4F to MP4:

#!/bin/bash

mp4split -o example.mp4 \
  example.f4f

You can also use the server manifest file as input. All the audio and video streams referenced in the manifest file are combined into one MP4 file.

#!/bin/bash

mp4split -o example.mp4 \
  example.ism
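
For batch conversions, the output name can be derived from the input name. The following is a minimal sketch and not part of the packager: fmp4_name and print_fragment_commands are hypothetical helper names, trailer.mp4 is a made-up second input, and the sketch assumes every input name ends in an extension. It prints the mp4split commands instead of running them, so they can be reviewed first.

```shell
#!/bin/bash

# Hypothetical helper: swap the final extension of an input name for .ismv.
# Assumes the name ends in an extension such as .mp4.
fmp4_name() {
  printf '%s.ismv\n' "${1%.*}"
}

# Print one mp4split command per input file (nothing is executed).
print_fragment_commands() {
  local input
  for input in "$@"; do
    printf 'mp4split -o %s %s\n' "$(fmp4_name "$input")" "$input"
  done
}

# trailer.mp4 is a made-up file name used only for illustration.
print_fragment_commands example.mp4 trailer.mp4
```

Printing first keeps the sketch harmless; once the list looks right, the same loop could call mp4split directly.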

Options for fragmented-MP4 packaging

The packager supports the following options.

--use_dref

Create a (progressive) MP4 that references a fragmented MP4 file, for "progressive download" to older players. Unified Origin will resolve media data references on playout.

--use_dref_no_subs

New in version 1.7.27.

Like "--use_dref", creates an MP4 that references an MP4 file, but without explicitly referencing sub-samples, resulting in a (considerably) smaller video MP4.

Note

Do not use this option if you want to offer a download-to-own option, because that requires sub-sample data.

--dry_run

Do not write the output.

--timescale

The output timescale used for media. Defaults to the timescale of the original media, or to 10 MHz when the "piff" brand is used.

--fragment_duration

The target duration of each fragment in milliseconds (default: 2000). When sync-samples are present, the fragments for the streams are aligned. This parameter can be useful for optimizing the fragment duration for a specific playout format, e.g. HLS, which recommends a fragment duration of 8 seconds. In rare cases it can also be useful for aligning audio or video fragments, although it is highly recommended to start with all sources GOP aligned from the outset.

--brand

Sets the 'compatibility brand'. Options: "piff", "iso6", "ccff" and "dash". The default is "iso6", but with timescale=10000000 (10 MHz) the default is "piff".

When creating (progressive) MP4 files with negative composition times, "iso4" is used as the brand. When using "iso2", negative composition time offsets are disabled and an edit list is used to compensate for the ct_offset.
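
To get a feel for how --timescale and --fragment_duration relate, the arithmetic can be sketched in a few lines of shell. The helper names below are hypothetical, and the fragment count is only an estimate: the packager aligns fragments to sync-samples, so actual fragment boundaries may differ from the target duration.

```shell
#!/bin/bash

# Hypothetical helper: convert a duration in milliseconds to timescale ticks.
# $1 = duration in ms, $2 = timescale in ticks per second
ms_to_ticks() {
  echo $(( $1 * $2 / 1000 ))
}

# Hypothetical helper: estimate the number of fragments, rounded up.
# $1 = total media duration in ms, $2 = target fragment duration in ms
estimate_fragments() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# The default 2000 ms fragment expressed in the 10 MHz timescale.
ms_to_ticks 2000 10000000      # prints 20000000

# A 60-second clip at the default and at an 8-second target duration.
estimate_fragments 60000 2000  # prints 30
estimate_fragments 60000 8000  # prints 8
```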

Overriding and adding track properties

When generating a fragmented or progressive MP4 file (.mp4, .isma, .ismv or .ismt) from an input track, its track properties are based on the properties of the input track. It is possible to add or override some of these properties, but in most cases, this is not necessary.

Note

You can also set a name and description for each track, but that is only possible when generating a server manifest. See --track_name and --track_description.

When generating a fragmented or progressive MP4, the track properties that can be added or overridden are the following:

--track_language

By default, track_language is taken from the input track's media info. If you need or want to set the language for a track, make sure to use the correct three-character ISO 639-2/T language code, e.g. spa for Spanish, eng for English, nld for Dutch, et cetera.

#!/bin/bash

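# input.mp4 is a placeholder for the source file; track options follow their input file
mp4split -o audio.mp4 \
  input.mp4 --track_language=eng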

--track_bitrate

Overrides the average bitrate of a track.

By default track_bitrate is the average bitrate (either from the metadata info of the input track, or calculated from the source samples). You can also set this to max, so that the maximum/peak bitrate is used.

#!/bin/bash

mp4split -o output.ismv \
  input1.mp4 --track_type=audio --track_bitrate=24000 \
  input2.mp4 --track_type=audio --track_bitrate=48000 \
  input3.mp4 --track_type=video --track_bitrate=31000 \
  input4.mp4 --track_type=video --track_bitrate=86000 \
  input5.mp4 --track_type=video --track_bitrate=156000
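
When no bitrate is available from the metadata, an average bitrate is usually derived from the amount of sample data and the track duration. The sketch below illustrates that arithmetic only. It reflects the general definition of an average bitrate and is not a description of how mp4split computes the value internally; average_bitrate is a hypothetical helper and the input numbers are made up.

```shell
#!/bin/bash

# Hypothetical helper: average bitrate in bits per second.
# $1 = total sample data in bytes, $2 = track duration in milliseconds
average_bitrate() {
  echo $(( $1 * 8 * 1000 / $2 ))
}

# 360000 bytes of audio over 120 seconds.
average_bitrate 360000 120000   # prints 24000
```

A value computed this way could then be passed explicitly, e.g. as --track_bitrate=24000.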

--track_role

Sets the role of a track and can be used to further distinguish it, next to bitrate and language. The exact meaning of a role can be dependent on the kind of track it is added to (video, audio or text). All of the roles specified in urn:mpeg:dash:role:2011 can be used. Most of them are listed in the table below:

--track_role=   Description
main            main media intended for presentation if no other information is provided.
alternate       media that is an alternative to the main media of the same type.
supplementary   media that is supplementary to media content of a different media component type.
commentary      media content component with commentary.
caption         media content component with captions (typically containing description of music and other sounds, in addition to transcript of dialog).
subtitle        media content component with subtitles.
description     track containing audio of textual description of visual component (intended for audio synthesis).
metadata        media component containing information intended to be processed by application specific elements.

#!/bin/bash

# main.mp4 is a placeholder for the main audio source; track options follow their input file
mp4split -o audio.mp4 \
  main.mp4 --track_role=main --track_language=eng \
  commentary.mp4 --track_role=alternate --track_language=eng

--track_kind

New in version 1.7.31.

Adds a SchemeIdUri/Value pair to the 'kind' box when packaging a (f)MP4. This box should describe the intended purpose of the track. Similar to the --track_role option described above, the --track_kind option can be used to further distinguish a track, besides its bitrate and language.

Specifying the parameters of this option is done like so:

--track_kind="<SchemeIdUri>@<Value>"

Where the <SchemeIdUri> and <Value> should be replaced with parameters of choice, preferably from the about:html-kind scheme defined by W3C HTML5 or the urn:mpeg:dash:role:2011 scheme defined by MPEG-DASH (although the latter can be signaled more easily using the --track_role option).

In addition, urn:tva:metadata:cs:AudioPurposeCS:2007 can be used, in which case value '1' signals content for the visually impaired and value '2' signals content for the hard of hearing.

When packaging for Unified Origin, the main use case of the --track_kind option is adding and properly signalling tracks that provide accessibility features, such as captions for the hard of hearing or an audio description of the video track for the visually impaired. The first can be signaled using urn:tva:metadata:cs:AudioPurposeCS:2007@2, whereas the about:html-kind scheme can be used to signal the latter with about:html-kind@main-desc.
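
Because scheme URIs themselves contain colons, it can help to compose and inspect the "<SchemeIdUri>@<Value>" string with small helpers. The following is a minimal sketch with hypothetical function names; splitting at the last '@' is an assumption made for this illustration, not a statement about how the packager parses the value.

```shell
#!/bin/bash

# Hypothetical helper: compose a --track_kind argument.
# $1 = SchemeIdUri, $2 = Value
track_kind_arg() {
  printf '%s=%s@%s\n' '--track_kind' "$1" "$2"
}

# Hypothetical helpers: split "<SchemeIdUri>@<Value>" at the last '@'.
kind_scheme() { printf '%s\n' "${1%@*}"; }
kind_value()  { printf '%s\n' "${1##*@}"; }

track_kind_arg 'about:html-kind' 'main-desc'
# prints --track_kind=about:html-kind@main-desc

kind_scheme 'urn:tva:metadata:cs:AudioPurposeCS:2007@2'
# prints urn:tva:metadata:cs:AudioPurposeCS:2007
kind_value 'urn:tva:metadata:cs:AudioPurposeCS:2007@2'
# prints 2
```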

Take for example the situation in which the about:html-kind@main-desc 'kind' is present in a track that has been added to a server manifest. Unified Origin will then add the following parameters for this track when generating a DASH client manifest (.mpd) and HLS main playlist (.m3u8) for playout:

MPEG-DASH (.mpd)

<Accessibility
  schemeIdUri="urn:tva:metadata:cs:AudioPurposeCS:2007"
  value="1">
</Accessibility>
<Role
  schemeIdUri="urn:mpeg:dash:role:2011"
  value="alternate">
</Role>

HLS (.m3u8)

CHARACTERISTICS="public.accessibility.describes-video",AUTOSELECT=YES

Use case walkthrough

As an example, consider a use case that starts from three (progressive) MP4s. One contains the video as well as the main audio track (English), while the other two contain one audio track each: one with alternate audio (Welsh), the other with a broadcast mix audio description for the visually impaired (English).

To start, fragmented MP4s need to be created from the input files (for a command that uses the --track_kind option, go to the fourth step below).

First, use the --track_type option to extract and fragment the audio track from the file that contains video and audio. To correct the language property, --track_language is used as well:

#!/bin/bash

mp4split -o english-audio.isma \
  english.mp4 \
  --track_type=audio \
  --track_language=eng

Second, package the alternate Welsh audio track. As with the track above, the language property is corrected here as well:

#!/bin/bash

mp4split -o welsh-audio.isma \
  welsh-audio.mp4 \
  --track_language=cym

Third, use the --track_type option once more to extract and fragment the video:

#!/bin/bash

mp4split -o english-video.ismv \
  english.mp4 \
  --track_type=video

Fourth, use the --track_kind option when packaging the alternate audio track that contains the audio description in English. Because language, bitrate and codec are identical to the fMP4 that contains the main audio track (english-audio.isma), their 'kind' is what distinguishes them. As indicated above, about:html-kind@main-desc should be used for audio description tracks:

#!/bin/bash

mp4split -o english-ad.isma \
  english-ad-audio.mp4 \
  --track_language=eng \
  --track_kind="about:html-kind@main-desc"

Finally, generate a server manifest that includes all of the fragmented MP4s created above. To ensure that the audio description track is signaled with a distinct description, --track_description is set explicitly for it:

#!/bin/bash

mp4split -o blockbuster.ism \
  --hls.client_manifest_version=4 \
  english-video.ismv \
  english-audio.isma \
  welsh-audio.isma \
  english-ad.isma \
  --track_description="English Audio Description"

When the server manifest has been generated, everything is ready to stream the video using Unified Origin, which will include the proper signaling of the audio description track in the client manifest, as explained earlier.
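
The five steps of the walkthrough can also be collected in one script. The sketch below is one possible arrangement, not an official script: run and package_blockbuster are hypothetical names, and the DRY_RUN variable is an assumption of this sketch (it is unrelated to the packager's --dry_run option). By default the commands are only printed; setting DRY_RUN=0 executes them and stops at the first failure.

```shell
#!/bin/bash

# Print each command by default; set DRY_RUN=0 to execute it instead.
# The printed form is for review only (shell quoting is not preserved).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the five walkthrough steps.
package_blockbuster() {
  run mp4split -o english-audio.isma english.mp4 \
    --track_type=audio --track_language=eng || return 1
  run mp4split -o welsh-audio.isma welsh-audio.mp4 \
    --track_language=cym || return 1
  run mp4split -o english-video.ismv english.mp4 \
    --track_type=video || return 1
  run mp4split -o english-ad.isma english-ad-audio.mp4 \
    --track_language=eng \
    --track_kind="about:html-kind@main-desc" || return 1
  run mp4split -o blockbuster.ism \
    --hls.client_manifest_version=4 \
    english-video.ismv english-audio.isma welsh-audio.isma \
    english-ad.isma --track_description="English Audio Description"
}

package_blockbuster
```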

Packaging Smooth Streaming with track selection

Say you have one MP4 video and would like to store its audio and video tracks in separate fragmented files:

#!/bin/bash

mp4split -o example-64k.isma \
  example.mp4 --track_type=audio

mp4split -o example-800k.ismv \
  example.mp4 --track_type=video

Generating the required server manifest file:

#!/bin/bash

mp4split -o example.ism \
  example-64k.isma \
  example-800k.ismv

The track selection options always come after the input file they apply to. Besides --track_type, you can also use --track_id to select a specific track. Say you have two input files, example-audio.mp4 (containing 4 audio tracks) and example-video.mp4 (containing 4 video tracks), and you want to create a fragmented output file containing the first audio track and the last video track:

#!/bin/bash

mp4split -o example.ismv \
  example-audio.mp4 --track_id=1 \
  example-video.mp4 --track_id=4

Packaging with track order and defaults

New in version 1.7.17.

It is possible to place tracks in the manifest in a specific order. This order is determined by the order in which the tracks are added on the packaging command line. For playout formats that support it, e.g. HLS, the first track in each group is then also signaled with DEFAULT=YES.

#!/bin/bash

# create a sorted .isma file
mp4split -o audio_sort.isma \
  swe_audio.mp4 \
  eng_audio.mp4 \
  dan_audio.mp4 \
  nor_audio.mp4

# create ismv with video and sorted audio
mp4split -o video_audio.ismv \
  video1-4.mp4 \
  audio_sort.isma

# create a sorted subtitle file
mp4split -o sorted_subtitles.ismt \
  swe_sub.dfxp \
  eng_sub.dfxp \
  dan_sub.dfxp \
  nor_sub.dfxp

# combine into manifest
mp4split -o sorted_manifest.ism \
  video_audio.ismv \
  sorted_subtitles.ismt

Therefore, in the example above, the Swedish audio track and the Swedish subtitle track are each the first track in their respective groups, and both are set to DEFAULT=YES in the HLS manifest.

The HLS manifest (sorted_manifest.ism/.m3u8) would look like this (additional tracks intentionally omitted):

# AUDIO groups
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio-aacl-139",NAME="Swedish",LANGUAGE="sv",AUTOSELECT=YES,DEFAULT=YES
...

# SUBTITLES groups
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="textstream",NAME="Swedish",LANGUAGE="sv",AUTOSELECT=YES,DEFAULT=YES,URI="sorted_manifest-textstream_swe=1000.m3u8"
...
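
One way to check which renditions ended up as the default is to filter the main playlist. The following is a minimal sketch: default_renditions is a hypothetical helper, it assumes TYPE is the first attribute and NAME is quoted (as in the lines above), and the English entry in the sample input is made up for illustration.

```shell
#!/bin/bash

# Hypothetical helper: read an HLS main playlist on stdin and print the TYPE
# and NAME of every EXT-X-MEDIA rendition marked DEFAULT=YES.
default_renditions() {
  grep '^#EXT-X-MEDIA:' | grep 'DEFAULT=YES' |
    sed -n 's/^#EXT-X-MEDIA:TYPE=\([^,]*\),.*NAME="\([^"]*\)".*/\1 \2/p'
}

# The English line is a made-up non-default rendition.
default_renditions <<'EOF'
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio-aacl-139",NAME="Swedish",LANGUAGE="sv",AUTOSELECT=YES,DEFAULT=YES
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio-aacl-139",NAME="English",LANGUAGE="en",AUTOSELECT=YES,DEFAULT=NO
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="textstream",NAME="Swedish",LANGUAGE="sv",AUTOSELECT=YES,DEFAULT=YES,URI="sorted_manifest-textstream_swe=1000.m3u8"
EOF
```

For the sample input, only the two Swedish renditions are listed, one per group.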

Packaging content for delivery by Unified Origin

The first step is to package all the source content into the format that is used by Unified Origin. This is the fragmented-MP4 format.

The example uses this Source Content.

#!/bin/bash

mp4split -o video_400k.ismv \
  video_400k.mp4 \
  audio_aac-lc.mp4

mp4split -o video_800k.ismv \
  video_800k.mp4 \
  audio_he-aac.mp4

mp4split -o video.ismv \
  video_200k.mp4 \
  video_600k.mp4 \
  audio_dts.mp4 \
  audio_ac3.mp4 \
  audio_eac3.mp4

Now that we have packaged all the audio and video, the following step is to create the two progressive download files. Instead of creating a completely new MP4 video file we will create an MP4 video that only contains the necessary index and references the actual movie data that is stored in the fragmented-MP4 format.

#!/bin/bash

mp4split -o video_400k.mp4 --use_dref \
  video_400k.ismv

#!/bin/bash

mp4split -o video_800k.mp4 --use_dref \
  video_800k.ismv

As a last step we create the server manifest file. This is an XML file that contains the media information about all the tracks and is used by the USP webserver module.

#!/bin/bash

mp4split -o video.ism \
  video.ismv \
  video_400k.ismv \
  video_800k.ismv

At this point we have six files stored for our presentation.

File             Description
video_400k.ismv  AAC-LC, 400 kbps video
video_800k.ismv  HE-AAC, 800 kbps video
video.ismv       200/600 kbps video, DTS, AC3, EAC3
video_400k.mp4   AAC-LC, 400 kbps video
video_800k.mp4   HE-AAC, 800 kbps video
video.ism        USP server manifest file

The USP webserver module makes the following URLs available. Note that all these URLs are virtual. They do not exist on disk.

Playout format          URL
Smooth Streaming        http://www.example.com/usp/video.ism/Manifest
HTTP Live Streaming     http://www.example.com/usp/video.ism/video.m3u8
HTTP Dynamic Streaming  http://www.example.com/usp/video.ism/video.f4m
MPEG DASH               http://www.example.com/usp/video.ism/video.mpd
Progressive download    http://www.example.com/usp/video_400k.mp4
Progressive download    http://www.example.com/usp/video_800k.mp4
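
The streaming URLs above follow a regular pattern, so they can be generated for any server manifest. The sketch below simply mirrors that pattern; playout_urls is a hypothetical helper, and reusing the manifest name as the client manifest file name is taken from the table rather than being a stated requirement.

```shell
#!/bin/bash

# Hypothetical helper: list the virtual streaming URLs for a server manifest.
# $1 = base URL of the directory, $2 = manifest name without the .ism extension
playout_urls() {
  local base="${1%/}"   # drop a single trailing slash, if present
  local name="$2"
  printf '%s/%s.ism/Manifest\n' "$base" "$name"
  printf '%s/%s.ism/%s.m3u8\n'  "$base" "$name" "$name"
  printf '%s/%s.ism/%s.f4m\n'   "$base" "$name" "$name"
  printf '%s/%s.ism/%s.mpd\n'   "$base" "$name" "$name"
}

playout_urls http://www.example.com/usp video
```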

Please download the advanced-usp.sh sample script, which creates the various server manifests discussed above. The sample content is Tears of Steel.