There are many subtitle and caption formats to choose from. HTTP Smooth Streaming supports the XML-based Timed Text Markup Language (TTML) format and the SMPTE-TT, EBU-TT and DFXP profiles that are derived from it. HTTP Live Streaming supports Web Video Text Tracks (WebVTT), a plain-text format that is based on SRT. MPEG-DASH supports both WebVTT and TTML (as well as the aforementioned derivatives).
Unified Packager can fragment and package TTML and WebVTT into (fragmented) MP4,
using the stpp and wvtt codecs, respectively. It can also convert
plain-text SubRip Text (.srt) and WebVTT
into TTML. When packaging DASH offline, an additional option is to
add TTML or WebVTT as a sidecar (but note that sidecar
subtitles are not supported by Unified Origin).
The preferred format for subtitles is DFXP (TTML), as it is the most constrained format and therefore the least likely to cause issues when creating the media files (.ismt).
MP4Split supports reading SubRip Text files. It assumes the input file is encoded in ASCII unless the file starts with a Byte Order Marker (BOM) that describes how the input should be transformed to Unicode.
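Because a missing BOM makes MP4Split treat the input as ASCII, it can help to check an SRT file before converting it. The sketch below is one possible way to do that with standard shell tools. The helper names has_utf8_bom and add_utf8_bom are hypothetical, and the sketch assumes the file is already UTF-8 encoded, so that adding the mark only declares the encoding and does not change the text.

```shell
#!/bin/bash
# Return 0 if the file starts with the UTF-8 byte order mark (EF BB BF).
has_utf8_bom() {
  [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Copy input ($1) to output ($2), prepending a UTF-8 BOM if it is missing.
# Assumption: the input is already UTF-8; $1 and $2 must be different files.
add_utf8_bom() {
  if has_utf8_bom "$1"; then
    cat "$1" > "$2"
  else
    { printf '\357\273\277'; cat "$1"; } > "$2"
  fi
}
```

For example, add_utf8_bom video.srt video_bom.srt writes a marked copy that can then be passed to mp4split in place of the original.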
The output TTML has a default styling and layout, which in general will work well. You are free to change this, but keep in mind that most players have limited capabilities and Unified Packager expects valid TTML.
#!/bin/bash
mp4split -o video.ttml \
  video.srt --track_language="nld"
When you create a TTML file from an SRT without specifying a language, the language defaults to English,
i.e. the TTML file contains the XML attribute xml:lang="en".
Make sure that you use the correct extension for the TTML files (.ttml or .dfxp).
Generating a TTML file from WebVTT is similar to the conversion from SubRip files, except that the input is always interpreted as Unicode (regardless of any BOM), because WebVTT is UTF-8 by definition.
#!/bin/bash
mp4split -o video.ttml \
  video.webvtt --track_language=spa
A limited set of markup components in WebVTT’s cue payloads can be converted to their TTML equivalents. This allows for portability across formats regarding the basic styling that is supported by most devices and players.
Supported WebVTT cue components
|<b>|Bolds the textual content|
|<i>|Italicises the text|
|<u>|Underlines the textual content|
||Specifies a line strike through on the text|
Here is an example of a regular WebVTT file with some cue point component elements:
WebVTT cue point example:
WEBVTT

1
00:00:15.000 --> 00:00:18.000
At the <u>left</u> we can see...

2
00:00:18.167 --> 00:00:20.083 position:35% line:20 align:left
At the <u>right</u> we can see the...

3
00:00:20.083 --> 00:00:22.000
...the <c.highlight>head-snarlers</c>

4
00:00:22.000 --> 00:00:24.417
Everything is safe.
<i>Perfectly</i> safe.
Result after converting to TTML:
<?xml version="1.0" encoding="utf-8"?>
<tt xmlns="..." xml:lang="en">
  <head>...</head>
  <body>
    <div style="default" xml:lang="en">
      <p begin="00:00:15.000" end="00:00:18.000" region="speaker">
        At the <span tts:textDecoration="underline">left</span> we can see...
      </p>
      <p begin="00:00:18.167" end="00:00:20.083" region="speaker">
        At the <span tts:textDecoration="underline">right</span> we can see the...
      </p>
      <p begin="00:00:20.083" end="00:00:22.000" region="speaker">
        ...the &lt;c.highlight&gt;head-snarlers&lt;/c&gt;
      </p>
      <p begin="00:00:22.000" end="00:00:24.417" region="speaker">
        Everything is safe.<br />
        <span tts:fontStyle="italic">Perfectly</span> safe.
      </p>
    </div>
  </body>
</tt>
The settings (cue 2) are ignored when converting to TTML and unrecognized styling in the payload is escaped (cue 3).
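The mapping rule for cue payloads can be illustrated with a few sed substitutions. This is only a rough sketch of the behaviour described above, not how mp4split implements it: the function name convert_cue_payload is hypothetical, the mapping of bold text to tts:fontWeight="bold" is an assumption (the example above only shows underline and italic), and line breaks and cue settings are not handled.

```shell
#!/bin/bash
# Convert one line of WebVTT cue payload (stdin) to TTML inline markup (stdout).
# Step 1: escape every XML special character, so unknown tags become plain text.
# Step 2: turn the escaped <u>, <i> and <b> tags back into TTML <span> elements.
convert_cue_payload() {
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' \
      -e 's|&lt;u&gt;|<span tts:textDecoration="underline">|g' \
      -e 's|&lt;i&gt;|<span tts:fontStyle="italic">|g' \
      -e 's|&lt;b&gt;|<span tts:fontWeight="bold">|g' \
      -e 's|&lt;/[uib]&gt;|</span>|g'
}
```

Feeding the payload of cue 3 through this function leaves the c.highlight tag escaped, while the underlined word in cue 1 ends up in a span with tts:textDecoration="underline".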
Fragmented MP4 (.ismt) files that contain captions or subtitles can be created from WebVTT or TTML files.
New in version 1.7.31.
(Web)VTT is packaged as specified by ISO/IEC 14496-30:2014 - Web Video Text Tracks, using the
WVTTSampleEntry (wvtt). This format allows WebVTT-specific cue settings
to define individual subtitle positioning, region and styling information.
Playout only works for HLS and DASH, in players that support the wvtt codec.
#!/bin/bash
mp4split -o subtitles.ismt --fragment_duration=10000 \
  subtitles.webvtt --track_language=spa
When packaging WebVTT subtitles, using the
--track_language option is necessary
because (unlike TTML) WebVTT files do not define a language attribute. The
--fragment_duration option specifies the fragment length in milliseconds.
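Since the fragment length is given in milliseconds, it can be useful to relate it to the length of the subtitle file. The sketch below reads the latest cue end time from a WebVTT file and estimates how many fragments a given duration would produce. The helper names vtt_end_ms and fragment_count are hypothetical, and the estimate assumes fragments are cut at fixed intervals starting at time zero, which this documentation does not confirm.

```shell
#!/bin/bash
# Print the latest cue end time in a WebVTT file, in milliseconds.
# Timing lines look like "start --> end [settings]", so the end time is field 3.
vtt_end_ms() {
  awk '/-->/ {
         n = split($3, t, /[:.]/)            # hh:mm:ss.ttt or mm:ss.ttt
         ms = t[n] + 1000 * t[n-1] + 60000 * t[n-2]
         if (n == 4) ms += 3600000 * t[1]
         if (ms > max) max = ms
       }
       END { print max + 0 }' "$1"
}

# Estimate the fragment count: total length ($1) divided by the
# fragment duration ($2), both in milliseconds, rounded up.
fragment_count() {
  echo $(( ($1 + $2 - 1) / $2 ))
}
```

For the cue point example shown earlier, the last cue ends at 24417 ms, so fragment_count 24417 10000 prints 3.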
Besides packaging WebVTT as a fragmented MP4, packaging it as a progressive MP4 is possible as well:
#!/bin/bash
mp4split -o subtitles.mp4 \
  subtitles.webvtt --track_language=spa
TTML samples (either DFXP, EBU-TT, SMPTE-TT or CFF-TT) are stored in a subtitle
track that uses the XMLSubtitleSampleEntry
(stpp) with timing (@begin and
@end attributes) relative to the start of the track. Packaging them as
fragmented MP4 (.ismt) files is done like so:
#!/bin/bash
mp4split -o video.ismt \
  video.ttml
This command creates a file with a single track, which is why the TTML input file should contain only one language. If you have a single TTML file that contains multiple languages then you will have to extract separate TTML files for each language first.
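A quick way to check whether a TTML file mixes languages is to list the distinct xml:lang values it declares. The sketch below does this with grep, under the assumption that languages are declared with double-quoted xml:lang attributes; the function name ttml_languages is hypothetical, and the check is a text-level heuristic, not an XML parser.

```shell
#!/bin/bash
# Print each distinct xml:lang value found in a TTML file, one per line.
# More than one line of output suggests the file contains several languages.
ttml_languages() {
  grep -o 'xml:lang="[^"]*"' "$1" | sed 's/^xml:lang="//; s/"$//' | sort -u
}
```

If ttml_languages video.ttml prints more than one language, split the file into one TTML file per language before packaging.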
MPEG-DASH players (e.g. the DASH-JS 1.4 reference player) support this format for both VOD and Live playback. Other players that support only WebVTT (e.g. the pre-release Google Shaka player) may benefit from adding TTML or WebVTT sidecar subtitles for MPEG-DASH, or from using the wvtt codec.
The ISO 14496-30 format is the preferred format. When packaging for HTTP Smooth Streaming (HSS), the
TTML is stored in a similar, but incompatible way in a fragmented MP4
(.ismt) container. The samples are stored in a text track and use
dfxp as their format. The timing of the @begin and
@end attributes is relative to the start of the sample. If you want to
write this older format, then you have to add
--brand=piff to the command line.
Note that Unified Origin supports both formats and adjusts the timing when necessary.
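The difference between the two timing models comes down to an offset: a time that is relative to the start of the track becomes relative to the sample once the sample's start time is subtracted. The sketch below works this out in milliseconds. The helper names ms_to_clock and track_to_sample_relative are hypothetical, and the code only illustrates the arithmetic; it is not taken from Unified Origin.

```shell
#!/bin/bash
# Format a millisecond count ($1) as a TTML clock time (hh:mm:ss.mmm).
ms_to_clock() {
  printf '%02d:%02d:%02d.%03d\n' \
    $(( $1 / 3600000 )) $(( $1 / 60000 % 60 )) $(( $1 / 1000 % 60 )) $(( $1 % 1000 ))
}

# Rebase a track-relative time onto the sample that contains it.
# $1: cue time relative to the start of the track (ms)
# $2: start time of the sample within the track (ms)
track_to_sample_relative() {
  ms_to_clock $(( $1 - $2 ))
}
```

For a cue that begins at 22500 ms inside a sample that starts at 20000 ms, track_to_sample_relative 22500 20000 prints 00:00:02.500.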
New in version 1.7.12.
While ISMT is the format of choice for streaming subtitles, occasionally it may be desirable or necessary to expose raw unsegmented subtitles to the player. In these cases, a WebVTT or TTML sidecar file can be added to the MPD.
For instance, to cater for the (pre-release) Google Shaka player (which supports WebVTT rather than fragmented TTML), the following commands expose German subtitles as WebVTT (as well as fragmented TTML):
#!/bin/bash
mp4split -o subtitles.ttml \
  subtitles_deu.webvtt --track_language=ger

mp4split --package-mpd -o subtitles.ismt subtitles.ttml

mp4split --package-mpd -o movie.mpd \
  [audio/video] \
  subtitles.ismt \
  subtitles_deu.webvtt --track_language=ger
This adds an adaptation set with MIME type text/vtt:
<AdaptationSet contentType="text" lang="de" mimeType="text/vtt">
  <Representation id="textstream_ger=0" bandwidth="0">
    <BaseURL>subtitles_deu.webvtt</BaseURL>
  </Representation>
</AdaptationSet>
When you add sidecar subtitles, they are added as-is. That is, Unified Packager
won't read any metadata from the file. This means that metadata that is of importance
for the file should be passed on the command line (like the track's language).
To create text samples it is important that Unified Packager can derive
correct timing information from the TTML source. While the TTML spec is liberal (and
sometimes ambiguous) in this respect, Packager assumes timing in