Packaging Subtitles

There are many subtitle and caption formats to choose from. HTTP Dynamic Streaming, HTTP Smooth Streaming and MPEG-DASH support TTML (SMPTE-TT, EBU-TT or DFXP). HTTP Live Streaming supports WebVTT.

MP4Split supports various conversions for preparing and packaging subtitles. It can package TTML and convert SubRip Text (.srt) and WebVTT into TTML to enable packaging of these formats. When packaging DASH, there's the additional option to add TTML or WebVTT as a sidecar.

The preferred format is DFXP (TTML), as it is the most constrained format and therefore the least likely to cause issues when creating the media files (.ismt).

Converting SubRip Text to Timed Text Markup Language (TTML)

MP4Split can read SubRip Text files. It assumes the input file is ASCII-encoded unless a byte order mark (BOM) is present, in which case the file is read as Unicode.
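The BOM rule above can be sketched as follows. This is only an illustration of the idea, not mp4split's implementation: the function names are hypothetical, and which Unicode encodings mp4split actually recognises is an assumption (the sketch checks UTF-8 and UTF-16 only).

```python
import codecs

# BOM signatures checked in this sketch. Which Unicode encodings mp4split
# really recognises is an assumption; only UTF-8 and UTF-16 are covered here.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),   # codec strips the BOM while decoding
    (codecs.BOM_UTF16_LE, "utf-16"),  # codec reads the BOM to pick byte order
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_srt_encoding(data: bytes) -> str:
    """Return a Python codec name: Unicode if a BOM is present, else ASCII."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding
    return "ascii"


def decode_srt(data: bytes) -> str:
    """Decode raw SRT bytes; raises UnicodeDecodeError for non-ASCII input without a BOM."""
    return data.decode(guess_srt_encoding(data))
```

A practical consequence: an SRT file containing accented characters but no BOM would not be treated as Unicode under this rule, so saving such files with a BOM before conversion is the safer route.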

The output TTML has default styling and layout that generally work well. You are free to change them, but keep in mind that most players have limited capabilities.



mp4split -o video.ttml \

When you create a TTML file from an SRT file, the language defaults to English, i.e. the TTML file contains the XML attribute xml:lang="en". One option is to modify the xml:lang attribute manually, so that the track language is recognized when you convert the TTML into an ISMT. The other option is to pass the language when creating a server manifest file. More on that later in this section.
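If you choose the manual route, the edit can be scripted. The following is a minimal sketch, assuming the attribute is written with double quotes as in the output shown in this section; the function name is hypothetical and the language value "de" is only an example.

```python
import re

# Matches xml:lang attributes written with double quotes, e.g. xml:lang="en".
_XML_LANG = re.compile(r'xml:lang="[^"]*"')


def set_ttml_language(ttml: str, lang: str) -> str:
    """Return the TTML text with every xml:lang value replaced by `lang`."""
    # A callable replacement avoids backslash interpretation in `lang`.
    return _XML_LANG.sub(lambda _match: f'xml:lang="{lang}"', ttml)


if __name__ == "__main__":
    source = '<tt xmlns="..." xml:lang="en"><div style="default" xml:lang="en"></div></tt>'
    print(set_ttml_language(source, "de"))
```

A plain text substitution is used here rather than an XML parser so that the rest of the file (namespaces, styling, whitespace) is left byte-for-byte unchanged.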


Make sure that you use the correct extension for the TTML files (.ttml).

Converting WebVTT to Timed Text Markup Language (TTML)

Generating a TTML file from a WebVTT file is done like this (e.g. to enable you to add WebVTT to an HLS presentation):


mp4split -o video.ttml \

We have added support for importing WebVTT cue components, which add further information to the cue text itself. These components are similar to HTML elements and can be used to add styling and semantics to the text. Each component has a start and an end tag, and the formatting is applied to the text in between. The table below lists the WebVTT cue components that are supported in the latest release:

Supported WebVTT cue components

Name       Description
<b></b>    Bolds the text
<i></i>    Italicises the text
<u></u>    Underlines the text
<s></s>    Strikes through the text

Here is an example of a regular WebVTT file with some cue components:

WebVTT cue component example:


WEBVTT

00:00:15.000 --> 00:00:18.000
At the <u>left</u> we can see...

00:00:18.167 --> 00:00:20.083
At the <u>right</u> we can see the...

00:00:20.083 --> 00:00:22.000
...the <c.highlight>head-snarlers</c>

00:00:22.000 --> 00:00:24.417
Everything is safe.
<i>Perfectly</i> safe.

Result after converting to TTML:

<?xml version="1.0" encoding="utf-8"?>
<tt xmlns="..." xml:lang="en">
  <body>
    <div style="default" xml:lang="en">
      <p begin="00:00:15.000" end="00:00:18.000" region="speaker">
        At the <span tts:textDecoration="underline">left</span> we can see...
      </p>
      <p begin="00:00:18.167" end="00:00:20.083" region="speaker">
        At the <span tts:textDecoration="underline">right</span> we can see the...
      </p>
      <p begin="00:00:20.083" end="00:00:22.000" region="speaker">
        ...the &lt;c.highlight&gt;head-snarlers&lt;/c&gt;
      </p>
      <p begin="00:00:22.000" end="00:00:24.417" region="speaker">
        Everything is safe.<br />
        <span tts:fontStyle="italic">Perfectly</span> safe.
      </p>
    </div>
  </body>
</tt>
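The mapping visible in the example above can be sketched in a few lines. This is an illustration, not mp4split's implementation: the function name is hypothetical, the span attributes for `<u>` and `<i>` are taken from the output above, and the attributes for `<b>` and `<s>` are assumptions based on the TTML styling vocabulary. Unsupported components such as `<c.highlight>` are escaped, as in the output above.

```python
import html
import re

# <u> and <i> follow the converted output shown above; the entries for
# <b> and <s> are assumptions based on TTML styling attributes.
SPAN_ATTRS = {
    "b": 'tts:fontWeight="bold"',
    "i": 'tts:fontStyle="italic"',
    "u": 'tts:textDecoration="underline"',
    "s": 'tts:textDecoration="lineThrough"',
}

# Only the four supported components are recognised as tags.
_TAG = re.compile(r"<(/?)([bius])>")


def cue_line_to_ttml(line: str) -> str:
    """Convert one line of WebVTT cue text into TTML inline markup."""
    parts = []
    pos = 0
    for match in _TAG.finditer(line):
        # Text between supported tags is XML-escaped, which also neutralises
        # unsupported components such as <c.highlight>.
        parts.append(html.escape(line[pos:match.start()], quote=False))
        if match.group(1):
            parts.append("</span>")
        else:
            parts.append(f"<span {SPAN_ATTRS[match.group(2)]}>")
        pos = match.end()
    parts.append(html.escape(line[pos:], quote=False))
    return "".join(parts)
```

The sketch works line by line and does not check that tags are balanced; a cue with an unclosed `<i>` would produce an unclosed `<span>`.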


A WebVTT file is always interpreted as UTF-8, regardless of any byte order mark (BOM).

Creating the media files (.ismt)

The format of TTML in a fragmented MP4 (.ismt) container conforms to the MPEG specification (ISO/IEC 14496-30) and allows storing any of the TTML formats (EBU-TT, SMPTE-TT, TTML-TT or CFF-TT) [1]. The samples are stored in a subtitle track and use XMLSubtitleSampleEntry (stpp) as the format. The timing of the @begin and @end attributes is relative to the start of the track.


mp4split -o video.ismt \

MPEG-DASH players (e.g. the DASH-JS 1.4 reference player) support this format for both VOD and live playback. Other players that support only WebVTT (e.g. the pre-release Google Shaka player) may benefit from adding TTML or WebVTT sidecar subtitles for MPEG-DASH.

The ISO/IEC 14496-30 format is the preferred format. When packaging for HTTP Smooth Streaming (HSS), the TTML is stored in a similar, but incompatible, way in a fragmented MP4 (.ismt) container: the samples are stored in a text track and use SampleEntry (dfxp) as the format, and the timing of the @begin and @end attributes is relative to the start of the sample. To write this older format, add --brand=piff to the command line.
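The difference between the two timing conventions comes down to a fixed offset per sample. The sketch below illustrates that arithmetic only; the function names and the sample start time of 16 seconds are hypothetical, and cues that begin before their sample starts are not handled.

```python
def timestamp_to_ms(ts: str) -> int:
    """Parse an HH:MM:SS.mmm timestamp into milliseconds."""
    hours, minutes, seconds = ts.split(":")
    whole, millis = seconds.split(".")
    return ((int(hours) * 60 + int(minutes)) * 60 + int(whole)) * 1000 + int(millis)


def ms_to_timestamp(ms: int) -> str:
    """Format milliseconds as an HH:MM:SS.mmm timestamp."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def track_to_sample_relative(ts: str, sample_start_ms: int) -> str:
    """Rebase a track-relative time (stpp style) onto the start of its sample (dfxp style)."""
    offset = timestamp_to_ms(ts) - sample_start_ms
    if offset < 0:
        raise ValueError("cue starts before the sample; not handled in this sketch")
    return ms_to_timestamp(offset)


def sample_to_track_relative(ts: str, sample_start_ms: int) -> str:
    """Rebase a sample-relative time (dfxp style) onto the start of the track (stpp style)."""
    return ms_to_timestamp(timestamp_to_ms(ts) + sample_start_ms)


if __name__ == "__main__":
    # A cue at 18.167 s in a sample assumed to start at 16 s into the track.
    print(track_to_sample_relative("00:00:18.167", 16000))
    print(sample_to_track_relative("00:00:02.167", 16000))
```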

Note that Unified Origin supports both formats and changes the timing when necessary.

Adding TTML or WebVTT sidecar subtitles for MPEG-DASH

New in version 1.7.12.

While ISMT is the format of choice for streaming subtitles, occasionally it may be desirable or necessary to expose raw unsegmented subtitles to the player. In these cases, a WebVTT or TTML sidecar file can be added to the MPD.

For instance, to cater for the (pre-release) Google Shaka player (which supports WebVTT rather than fragmented TTML), the following commands expose German subtitles as WebVTT (as well as fragmented TTML):


mp4split -o subtitles.ttml \
  subtitles_deu.webvtt --track_language=ger

mp4split --package-mpd -o subtitles.ismt subtitles.ttml

mp4split --package-mpd -o movie.mpd \
  [audio/video] \
  subtitles.ismt \
  subtitles_deu.webvtt --track_language=ger

This adds an adaptation set with mime type text/vtt (or application/ttml+xml for TTML):

<AdaptationSet contentType="text" lang="de" mimeType="text/vtt">
  <Representation id="textstream_ger=0" bandwidth="0">


[1] To create text samples, it is important that the packager can derive correct timing information from the TTML source. While the TTML spec is liberal (and sometimes ambiguous) in this respect, the packager assumes timing in HH:MM:SS.mmm format in the @begin and @end attributes of the tt/body/div/p elements.
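A TTML source can be checked against this assumption before packaging. The following is a minimal sketch using Python's standard library (the `{*}` namespace wildcard requires Python 3.8 or later); the function name is hypothetical and the check covers only the tt/body/div/p path and the HH:MM:SS.mmm form described in the footnote.

```python
import re
import xml.etree.ElementTree as ET

# HH:MM:SS.mmm, the form the footnote says the packager assumes.
_CLOCK_TIME = re.compile(r"\d{2}:\d{2}:\d{2}\.\d{3}")


def find_bad_timings(ttml: str) -> list:
    """Return (attribute, value) pairs on tt/body/div/p that are missing or malformed."""
    root = ET.fromstring(ttml)
    bad = []
    # {*} matches any (or no) namespace, so the TTML namespace URI need not be spelled out.
    for p in root.findall("./{*}body/{*}div/{*}p"):
        for name in ("begin", "end"):
            value = p.get(name)
            if value is None or not _CLOCK_TIME.fullmatch(value):
                bad.append((name, value))
    return bad
```

An empty list means every paragraph under tt/body/div carries both attributes in the expected form; values such as "15s" are valid TTML but would be reported here.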