Cloud Storage: Reducing Latency

Introduction

The webserver is stateless: each request for a fragment triggers reading the server manifest and requesting (sample) ranges from the storage backend.

When source content is stored on HTTP storage, this results in communication overhead between the origin and the HTTP storage location.

Using a cache on the origin for the requests normally made against the content on HTTP storage minimizes latency and improves throughput.

cdns --> shield-cache --> origin --> http storage (s3)

The origin makes multiple calls to the storage for the manifest and sample indexes to find and retrieve the samples it needs to create the output. When the source content is large, the content index will also be large. Without caching there will be significantly more requests, increasing the overall latency of the response.
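
To illustrate, the backend traffic for a single fragment request may look roughly like the following. This is a sketch only; the bucket URL, object paths and byte ranges are hypothetical and merely indicate the kind of requests the origin issues against storage when nothing is cached:

# Illustration only: the origin first fetches the server manifest, then issues
# byte-range requests against the fragmented MP4s to locate and read samples.
curl -s "http://your-bucket.s3.eu-central-1.amazonaws.com/tears-of-steel/tears-of-steel.ism"
curl -s -H "Range: bytes=0-65535" \
  "http://your-bucket.s3.eu-central-1.amazonaws.com/tears-of-steel/tears-of-steel-avc1-400k.cfmv"
curl -s -H "Range: bytes=1048576-1310719" \
  "http://your-bucket.s3.eu-central-1.amazonaws.com/tears-of-steel/tears-of-steel-avc1-400k.cfmv"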

The following setup describes how to lower latency. If you do not experience latency problems (for instance, when storage and webserver(s) are close to each other), you do not need to use this setup.

Local cache overview

By using a local cache populated with additional index files created from the stored content, the number of requests can be reduced significantly. This reduces fragment latency and increases throughput. An additional benefit is a reduction in CPU usage on the origin server.

Schematically this looks like the following:

cdns -> shield-cache -> origin -> storage-cache -> (http) storage

The server manifest and index file are cached locally, pointing to the audio and video source placed in the storage:

cache       [http]     storage
              |
.ism          |
  --> .mp4 -- | -->   audio/video source

Prerequisites

Index files need to be created alongside the existing content assets. These are small files containing the necessary metadata, referencing the actual movie data in the original (fragmented) video.

The index MP4s must be placed in the same bucket as the .ism and .cfmv/.cfma files and reference the media directly in that bucket (no URI of any kind). As the references in the MP4 are relative, the origin resolves them against the content accessible on the same path.

An index MP4 should be created for each fragmented MP4 (audio, video or text).

In addition to mod_smooth_streaming you will need to enable the following modules:

  • mod_headers

  • mod_proxy

  • mod_proxy_http

  • mod_cache

  • mod_cache_disk
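
How these modules are enabled depends on your distribution. As a sketch for Rocky Linux (matching the example configuration below), the modules are loaded with LoadModule directives, typically in a file under /etc/httpd/conf.modules.d/; the module file paths are the usual defaults and may differ on your system:

# Load the required modules (skip any that are already loaded)
LoadModule headers_module modules/mod_headers.so
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule cache_module modules/mod_cache.so
LoadModule cache_disk_module modules/mod_cache_disk.so

On Debian/Ubuntu the equivalent would be a2enmod headers proxy proxy_http cache cache_disk followed by a webserver reload.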

A new virtual host will be created on the origin to proxy the S3 content through to the default virtual host and will selectively cache the index files locally.

Enabling subrequests and connection reuse

The Apache directive UspEnableSubreq on must be added to the VirtualHost and <Proxy> sections, and ProxySet directives must be configured for each storage; see the storage proxy configuration documentation for details on how to set this up.

The cache gains more efficiency if the storage cache itself supports HTTP keepalive [1] and the origin is configured to use it. When it does, connections between the origin and the storage cache can be pooled and reused. This is configured as follows:

<Proxy "http://storage1.example.com/">
  ProxySet connectiontimeout=5 enablereuse=on keepalive=on retry=0 timeout=30 ttl=300
</Proxy>

Creating the index MP4 files

To create the data reference (dref) MP4 files, use the --use_dref_no_subs option with mp4split as below:

#!/bin/bash

mp4split -o tears-of-steel-avc1-400k-index.mp4 \
  --use_dref_no_subs tears-of-steel-avc1-400k.cfmv

mp4split -o tears-of-steel-avc1-1000k-index.mp4 \
  --use_dref_no_subs tears-of-steel-avc1-1000k.cfmv

mp4split -o tears-of-steel-aac-128k-index.mp4 \
  --use_dref_no_subs tears-of-steel-aac-128k.cfma

Please make sure you use a filename pattern that you can later match against, such as the '-index' suffix used here.

Note

Creating one dref MP4 for all tracks within a stream is possible as well, and may be even more efficient (simply add all tracks as input when creating the dref MP4).
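
As a sketch of that approach, all tracks can be passed to a single mp4split invocation; the combined output filename below is an assumption and should again be something you can match against later:

#!/bin/bash

mp4split -o tears-of-steel-index.mp4 \
  --use_dref_no_subs \
  tears-of-steel-avc1-400k.cfmv \
  tears-of-steel-avc1-1000k.cfmv \
  tears-of-steel-aac-128k.cfma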

Creating a new manifest

A new manifest is necessary to reference the new index MP4s; the origin can then fetch it from the local cache for subsequent requests:

#!/bin/bash

mp4split -o tears-of-steel.ism \
  tears-of-steel-avc1-400k-index.mp4 \
  tears-of-steel-avc1-1000k-index.mp4 \
  tears-of-steel-aac-128k-index.mp4
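
As noted in the prerequisites, the index MP4s and the new server manifest must be uploaded to the same bucket and path as the original .cfmv/.cfma files. A minimal sketch using the AWS CLI (the bucket name and prefix are assumptions, matching the proxy example below):

#!/bin/bash

# Upload the new manifest and index MP4s next to the existing media files
aws s3 cp tears-of-steel.ism s3://your-bucket/tears-of-steel/
aws s3 cp tears-of-steel-avc1-400k-index.mp4 s3://your-bucket/tears-of-steel/
aws s3 cp tears-of-steel-avc1-1000k-index.mp4 s3://your-bucket/tears-of-steel/
aws s3 cp tears-of-steel-aac-128k-index.mp4 s3://your-bucket/tears-of-steel/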

Apache configuration

You will need to add an additional Virtual Host to your origin configuration and a new listen port of 8081 to your main configuration.
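
For example, on Rocky Linux the extra port is added to the main configuration (typically /etc/httpd/conf/httpd.conf):

Listen 8081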

The new Virtual Host will act as a reverse proxy that caches only the manifest and index files; the default Virtual Host can then use the cached files instead.

Below is an example of the default Virtual Host and the new Virtual Host available on port 8081. Please note this example is for Rocky Linux.

As the HTTP storage returns a status code of 206 for the requested S3 content, we remove the Range header from requests for the manifest and index files only, allowing them to enter the local Apache cache.

You can temporarily uncomment the debug logging on the caching Virtual Host to ensure your mod_cache setup is working correctly and that the right items return the correct status code and are in turn being cached.

<VirtualHost *:80>
  Header set Access-Control-Allow-Origin "*"
  ServerAdmin webmaster@localhost
  ServerName origin-proxy
  DocumentRoot /var/www/origin

  # Location redirecting all requests to Internal Cache
  <Location "/">
    UspHandleIsm on
    UspEnableSubreq on
    IsmProxyPass "http://localhost:8081/"
  </Location>

  <Proxy "http://localhost:8081/">
    ProxySet connectiontimeout=5 enablereuse=on keepalive=on retry=0 timeout=30 ttl=300
  </Proxy>

  ## Alternate method of configuring forward proxy
  #ProxySet http://localhost:8081/ connectiontimeout=5 enablereuse=on keepalive=on retry=0 timeout=30 ttl=300

  ErrorLog /var/log/apache2/origin-error.log
  CustomLog /var/log/apache2/origin-access.log combined
  LogLevel warn
</VirtualHost>

<VirtualHost *:8081>
  ServerName origin-cache

  # The cache directory (which must exist)
  CacheRoot /var/cache/apache2
  CacheEnable disk /
  CacheDirLevels 5
  CacheDirLength 3
  CacheDefaultExpire 7200
  CacheIgnoreNoLastMod On
  CacheIgnoreCacheControl On
  CacheIgnoreQueryString On
  # The max size of your index files
  CacheMaxFileSize 1000000000

  # This allows for the full dref mp4 index to be cached locally
  CacheQuickHandler off

  # Unset range to cache the index mp4s and server manifest
  # Set Cache-Control s-maxage to 7days to support caching of files when S3
  # Authentication Headers(S3UseHeaders) are enabled. If the response contains
  # an "Authorization:" header, it must also contain an "s-maxage",
  # "must-revalidate" or "public" option in the "Cache-Control:" header, or it
  # won't be cached.
  <LocationMatch ".*\.(?i:ism|mp4)$">
    RequestHeader unset Range
    Header set Cache-Control "s-maxage=604800"
  </LocationMatch>

  <Location "/">
    ProxyPass http://your-bucket.s3.eu-central-1.amazonaws.com/ connectiontimeout=5 timeout=10 ttl=300 keepalive=on retry=0
    ProxyPassReverse http://your-bucket.s3.eu-central-1.amazonaws.com/
  ## Add S3 Authentication if needed (requires mod_ssl and mod_unified_s3_auth)
  #  SSLProxyEngine on
  #  S3AccessKey Your-AWS-AccessKey
  #  S3SecretKey Your-AWS-SecretKey
  #  S3Region your-buckets-region
  #  S3UseHeaders on
  </Location>

  ## Alternate method of configuring reverse proxy
  #ProxyPass / http://s3.eu-central-1.amazonaws.com/ connectiontimeout=5 timeout=10 ttl=300 keepalive=on retry=0
  #ProxyPassReverse / http://s3.eu-central-1.amazonaws.com/

  ErrorLog /var/log/apache2/cache-error.log
  CustomLog /var/log/apache2/cache-access.log combined
  #LogLevel debug # Use this to check your files are being cached
  LogLevel warn
</VirtualHost>
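
Once both Virtual Hosts are in place, one way to verify that the manifest and index MP4s are cached (in addition to the debug logging mentioned above) is to request one of them through the caching Virtual Host and then list the entries in the disk cache. A sketch, assuming the content path used in the earlier examples and that htcacheclean is installed:

#!/bin/bash

# Request the server manifest via the caching Virtual Host ...
curl -s -o /dev/null "http://localhost:8081/tears-of-steel/tears-of-steel.ism"

# ... then list the URLs currently stored in the disk cache
htcacheclean -a -p /var/cache/apache2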

Note

When S3UseHeaders on is configured, a Cache-Control header must also be set. If the response contains an "Authorization:" header (which it does), it must also contain an "s-maxage", "must-revalidate" or "public" option in the "Cache-Control:" header, or it won't be cached. For more information, please see http://httpd.apache.org/docs/2.4/caching.html.

Note

Another approach to secure access to your storage is to configure the S3 bucket(s) to only accept requests coming through a VPC endpoint, which does not require HTTPS and thus mitigates the overhead of signing each request to S3: https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints.html.