There are many subtitle and caption formats to choose from. HTTP Dynamic Streaming, HTTP Smooth Streaming and MPEG DASH support the TTML format (SMPTE-TT, EBU-TT or DFXP), while HTTP Live Streaming supports WebVTT.
MP4Split supports various conversions for preparing and packaging subtitles. It can package TTML and convert SubRip Text (.srt) and WebVTT into TTML to enable packaging of these formats. When packaging DASH, there's the additional option to add TTML or WebVTT as a sidecar.
The preferred format is DFXP (TTML), as it is the most constrained format and therefore the least likely to cause issues when creating the media files (.ismt).
MP4Split supports reading SubRip Text files. It assumes the input file is encoded in ASCII unless a Byte Order Mark (BOM) is present, in which case the file is read as Unicode.
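To see which of the two cases applies before converting, a small shell function can inspect the first bytes of the file. This is a minimal sketch that assumes only the common UTF-8 and UTF-16 byte order marks matter; `detect_bom` is a hypothetical helper name, not part of MP4Split.

```shell
#!/bin/bash
# Print which byte order mark (BOM), if any, a subtitle file starts with.
# Hypothetical helper: checks only the UTF-8 and UTF-16 BOMs.
detect_bom() {
  local hex
  # Dump the first three bytes as one lowercase hex string, e.g. "efbbbf".
  hex=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
  case "$hex" in
    efbbbf*) echo "utf-8" ;;
    fffe*)   echo "utf-16le" ;;
    feff*)   echo "utf-16be" ;;
    *)       echo "none" ;;
  esac
}
```

For example, `detect_bom video.srt` printing `none` means MP4Split will treat the file as ASCII, so it should not contain other characters. Prepending a UTF-8 BOM (in bash: `printf '\xef\xbb\xbf' | cat - video.srt > video_bom.srt`, with a hypothetical output name) should make MP4Split read the file as Unicode, assuming the content itself is UTF-8.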
The output TTML has default styling and layout that generally work well. You are free to change them, but keep in mind that most players support only a limited set of TTML styling features.
#!/bin/bash
mp4split -o video.ttml \
  video.srt
When you create a TTML file from an SRT file, the language defaults to English, i.e. the TTML file contains the XML attribute xml:lang="en". You can either edit the xml:lang attribute manually, so that the track language is recognized when you convert the TTML into an ISMT, or pass the language when creating a server manifest file. More on that later.
Make sure that you use the correct extension for the TTML files (.ttml).
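Editing the attribute by hand can be scripted with sed. A minimal sketch, assuming the generated TTML contains the literal text xml:lang="en" (as in the TTML example later in this section, where it appears on both <tt> and <div>); `set_ttml_lang` is a hypothetical helper, and the language value is assumed to be a BCP 47 tag such as `de`.

```shell
#!/bin/bash
# Hypothetical helper: switch a generated TTML file from English to another language.
set_ttml_lang() {
  local file=$1 lang=$2
  # Only touch files that carry the .ttml extension the text asks for.
  case "$file" in
    *.ttml) ;;
    *) echo "set_ttml_lang: expected a .ttml file, got: $file" >&2; return 1 ;;
  esac
  # Replace every xml:lang="en" (e.g. on <tt> and <div>); sed keeps a .bak copy.
  sed -i.bak "s/xml:lang=\"en\"/xml:lang=\"${lang}\"/g" "$file"
}
```

Usage: `set_ttml_lang video.ttml de` before converting video.ttml into an ISMT.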
Generating a TTML file from a WebVTT file works the same way (for example, to add WebVTT subtitles to an HLS presentation):
#!/bin/bash
mp4split -o video.ttml \
  video.webvtt
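When a directory holds many subtitle files, the two conversions above can be run in a loop. This is a sketch that only reuses the `mp4split -o OUTPUT INPUT` form shown above; `convert_subtitles_to_ttml` and the `MP4SPLIT` variable are hypothetical names (the variable merely allows a dry run with `MP4SPLIT=echo` and is not an MP4Split setting).

```shell
#!/bin/bash
# Hypothetical helper: convert every .srt and .webvtt file in a directory to TTML.
# The body runs in a subshell, so "shopt -s nullglob" does not leak to the caller.
convert_subtitles_to_ttml() (
  shopt -s nullglob                 # skip a pattern entirely if nothing matches
  tool=${MP4SPLIT:-mp4split}        # override only for dry runs
  for f in "$1"/*.srt "$1"/*.webvtt; do
    # Swap the extension: video.srt -> video.ttml
    "$tool" -o "${f%.*}.ttml" "$f" || exit 1
  done
)
```

Note that if both video.srt and video.webvtt exist, the second conversion overwrites video.ttml.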
MP4Split also supports importing WebVTT cue components, which add information to the cue text itself. These components resemble HTML elements and add styling and semantics to the text: each has a start and an end tag, and the formatting is applied to the text in between. The table below lists the WebVTT cue components supported in the latest release:
Supported WebVTT cue components

| Component | Description |
| <b> | Bolds the textual content |
| <i> | Italicises the text |
| <u> | Underlines the textual content |
|  | Specifies a line strike-through on the text |
Here is an example of a regular WebVTT file that contains some cue component elements:
WebVTT cue component example:
WEBVTT

1
00:00:15.000 --> 00:00:18.000
At the <u>left</u> we can see...

2
00:00:18.167 --> 00:00:20.083
At the <u>right</u> we can see the...

3
00:00:20.083 --> 00:00:22.000
...the <c.highlight>head-snarlers</c>

4
00:00:22.000 --> 00:00:24.417
Everything is safe.
<i>Perfectly</i> safe.
Result after converting to TTML:
<?xml version="1.0" encoding="utf-8"?>
<tt xmlns="..." xml:lang="en">
  <head>...</head>
  <body>
    <div style="default" xml:lang="en">
      <p begin="00:00:15.000" end="00:00:18.000" region="speaker">
        At the <span tts:textDecoration="underline">left</span> we can see...
      </p>
      <p begin="00:00:18.167" end="00:00:20.083" region="speaker">
        At the <span tts:textDecoration="underline">right</span> we can see the...
      </p>
      <p begin="00:00:20.083" end="00:00:22.000" region="speaker">
        ...the <c.highlight>head-snarlers</c>
      </p>
      <p begin="00:00:22.000" end="00:00:24.417" region="speaker">
        Everything is safe.<br />
        <span tts:fontStyle="italic">Perfectly</span> safe.
      </p>
    </div>
  </body>
</tt>
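The tag-to-span mapping in this example can be approximated with sed to preview how cue text will look. This is only a rough sketch of the mapping visible above (<u> to tts:textDecoration="underline", <i> to tts:fontStyle="italic"), not MP4Split's converter: it ignores timing, line breaks and XML escaping. The <b> rule uses the standard TTML tts:fontWeight="bold" attribute; that MP4Split emits exactly this for <b> is an assumption, and `vtt_cue_text_to_ttml` is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical preview filter: rewrite WebVTT cue tags (stdin) into TTML spans (stdout).
vtt_cue_text_to_ttml() {
  sed -e 's#<u>#<span tts:textDecoration="underline">#g' \
      -e 's#<i>#<span tts:fontStyle="italic">#g' \
      -e 's#<b>#<span tts:fontWeight="bold">#g' \
      -e 's#</[uib]>#</span>#g'   # close any of the three with </span>
}
```

Like the real output above, this leaves <c.highlight>...</c> untouched.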
A WebVTT file is always interpreted as UTF-8, regardless of any BOM (Byte Order Mark).
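Because there is no fallback encoding, it can help to confirm that a WebVTT file really is valid UTF-8 before converting it. A minimal sketch using `iconv`, which exits with an error on byte sequences that are not valid in the source encoding; `is_valid_utf8` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the file decodes cleanly as UTF-8.
is_valid_utf8() {
  # Converting UTF-8 to UTF-8 is a no-op for valid input and fails otherwise.
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}
```

If the check fails, the file can be re-encoded first, for example `iconv -f ISO-8859-1 -t UTF-8 legacy.webvtt > video.webvtt`, assuming (hypothetically) that the source file is Latin-1.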
The format of TTML in a fragmented MP4 (.ismt) container conforms to the MPEG specification (ISO 14496-30) and can store any of the TTML formats (EBU-TT, SMPTE-TT, TTML-TT or CFF-TT). The samples are stored in a subtitle track and use the XMLSubtitleSampleEntry ('stpp') as format. The TTML timing attributes (such as @end) are relative to the start of the track.
#!/bin/bash
mp4split -o video.ismt \
  video.ttml
MPEG-DASH players (e.g. the DASH-JS 1.4 reference player) support this format for both VOD and live playback. Other players that support only WebVTT (e.g. the pre-release Google Shaka player) may benefit from adding TTML or WebVTT sidecar subtitles for MPEG-DASH.
The ISO 14496-30 format is the preferred format. When packaging for HTTP Smooth Streaming (HSS), the TTML is stored in a similar but incompatible way in a fragmented MP4 (.ismt) container: the samples are stored in a text track and use 'dfxp' as the sample entry format, and the timing attributes are relative to the start of the sample. To write this older format, add --brand=piff to the command line.
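The two variants differ only in that extra option. A minimal sketch of a wrapper that picks one; `make_ismt` and the `MP4SPLIT` dry-run variable are hypothetical, and placing `--brand=piff` between the output and the input is an assumption, since the text above only says to add it to the command line.

```shell
#!/bin/bash
# Hypothetical wrapper: build video.ismt from video.ttml in either layout.
#   iso  -> ISO 14496-30 ('stpp'), the preferred format
#   piff -> older Smooth Streaming layout ('dfxp'); flag position is an assumption
make_ismt() {
  local ttml=$1 format=${2:-iso}
  local tool=${MP4SPLIT:-mp4split}   # MP4SPLIT=echo gives a dry run
  local out="${ttml%.ttml}.ismt"
  case "$format" in
    iso)  "$tool" -o "$out" "$ttml" ;;
    piff) "$tool" -o "$out" --brand=piff "$ttml" ;;
    *)    echo "make_ismt: unknown format '$format'" >&2; return 2 ;;
  esac
}
```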
Note that Unified Origin supports both formats and changes the timing when necessary.
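The timing change amounts to subtracting the sample's start time from each track-relative time. A worked sketch of that arithmetic, assuming clock times in the HH:MM:SS.mmm form used in the examples above (exactly three fractional digits) and a hypothetical sample that starts at 00:00:12.000; the function names are hypothetical and this is not Unified Origin's implementation.

```shell
#!/bin/bash
# Convert HH:MM:SS.mmm to whole milliseconds (10# avoids octal parsing of "08").
clock_to_ms() {
  local h m s ms
  IFS=':.' read -r h m s ms <<< "$1"
  echo $(( (10#$h * 3600 + 10#$m * 60 + 10#$s) * 1000 + 10#$ms ))
}

# Convert milliseconds back to HH:MM:SS.mmm.
ms_to_clock() {
  local t=$1
  printf '%02d:%02d:%02d.%03d\n' \
    $(( t / 3600000 )) $(( t / 60000 % 60 )) $(( t / 1000 % 60 )) $(( t % 1000 ))
}

# Track-relative time ('stpp' style) -> time relative to the sample start ('dfxp' style).
# Assumes the time is not earlier than the sample start.
track_to_sample_time() {
  local t start
  t=$(clock_to_ms "$1")
  start=$(clock_to_ms "$2")
  ms_to_clock $(( t - start ))
}
```

For example, `track_to_sample_time 00:00:15.000 00:00:12.000` gives 00:00:03.000: a cue that begins 15 s into the track begins 3 s into a sample starting at 12 s.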
New in version 1.7.12.
While ISMT is the format of choice for streaming subtitles, occasionally it may be desirable or necessary to expose raw unsegmented subtitles to the player. In these cases, a WebVTT or TTML sidecar file can be added to the MPD.
For instance, to cater to the (pre-release) Google Shaka player, which supports WebVTT rather than fragmented TTML, the following commands expose German subtitles as WebVTT (as well as fragmented TTML):
#!/bin/bash
mp4split -o subtitles.ttml \
  subtitles_deu.webvtt --track_language=ger
mp4split --package-mpd -o subtitles.ismt subtitles.ttml
mp4split --package-mpd -o movie.mpd \
  [audio/video] \
  subtitles.ismt \
  subtitles_deu.webvtt --track_language=ger
This adds an adaptation set with MIME type text/vtt to the MPD:
<AdaptationSet contentType="text" lang="de" mimeType="text/vtt">
  <Representation id="textstream_ger=0" bandwidth="0">
    <BaseURL>subtitles_deu.webvtt</BaseURL>
  </Representation>
</AdaptationSet>
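To confirm that the sidecar actually ended up in the generated MPD, the BaseURL of each text/vtt adaptation set can be listed with awk. A minimal sketch, assuming a pretty-printed MPD with one element per line and the mimeType on the AdaptationSet element, as in the output above; `list_vtt_sidecars` is a hypothetical helper and not a full XML parser.

```shell
#!/bin/bash
# Hypothetical helper: print the BaseURL of every text/vtt adaptation set in an MPD.
list_vtt_sidecars() {
  awk '
    /<AdaptationSet/    { in_vtt = ($0 ~ /mimeType="text\/vtt"/) }  # entering a set
    /<\/AdaptationSet>/ { in_vtt = 0 }                               # leaving it
    in_vtt && /<BaseURL>/ {
      line = $0
      sub(/.*<BaseURL>/, "", line)     # drop everything up to the opening tag
      sub(/<\/BaseURL>.*/, "", line)   # drop the closing tag and the rest
      print line
    }
  ' "$1"
}
```

Usage: `list_vtt_sidecars movie.mpd` should print subtitles_deu.webvtt for the commands above.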
To create text samples, it is important that the packager can derive correct timing information from the TTML source. While the TTML specification is liberal (and sometimes ambiguous) in this respect, the packager assumes timing in