Cloud Storage Proxy

New in version 1.10.27.

Attention

Eventually ProxyPass will replace IsmProxyPass, meaning that the full Apache 'Proxy' functionality will be used. An intermediate workflow, outlined below, uses Proxy with subrequests to better handle media requests.

[Figure: storage_proxy.svg]

Introduction

Content requested by HTTP clients is often stored remotely and must be accessed through HTTP(S) requests. Historically, the Origin has relied on cURL to handle these requests. While this is a robust solution, it has two distinct disadvantages:

  • Each request requires establishing a new connection between webservers. In terms of network latency and request handling, this is the most performance-limiting step.

  • cURL doesn't pass through client headers.

Both are resolved by using Apache subrequests to handle all upstream HTTP(S) requests.

For backwards compatibility, the cURL functionality remains available and unchanged.

See Configuration below on how to set up the use of subrequests.

Benefits

The key benefits include performance gains, greater flexibility and opportunities for setup optimization.

Performance Gains

Using Apache's subrequests

Apache subrequests establish and maintain a pool of connections between webservers, 'intelligently' reusing or culling them as required.

To achieve performance gains, subrequests must be enabled and individual <Proxy> sections defined for each backend storage server. See below.

Throughput Performance Gains with Remote Storage

A fairly optimal setup may achieve performance improvements of around 10-20% while less optimized setups will see even greater gains. These improvements are largely due to caching of DNS lookups.

Adding Caching Layers to Improve Performance

Adding a further caching layer between the Origin and storage, populated with index files of stored content, significantly reduces the number of requests. This leads to a reduction in fragment latency and an increase in throughput.

Using subrequests rather than cURL requests will always be more efficient because of Apache's internal caching mechanisms (for DNS and other TCP managerial processes). The storage cache will be even more efficient if it supports HTTP keepalive [1] and the Origin is correctly configured for this, as connections between the Origin and storage cache can then be pooled and reused. See Cloud Storage Reducing Latency for further information.

Flexibility and Optimization

Adding Custom Headers

When subrequests are used, our webserver modules propagate request headers transparently. Headers added at web frontends are passed through Origin or Remix and arrive at storage backends. Custom headers can be added, removed or modified as required using Apache's mod_headers module, to provide additional information that aids troubleshooting or to implement server-side logic for setup optimization, for example:

  • Informing users whether an asset has been delivered between servers

  • Rate limiting bandwidth on your origin server

  • Restricting CDN traffic

  • Collecting statistics

  • Using headers to include/exclude proxy requests into a billing system

  • Controlling the routing of the mp4 proxy requests according to routing policy rules

Trace-ID Headers

Multiple proprietary and standardized methods are available for tracing requests through web services. Unified Streaming references the W3C Trace Context standard [2].

Amazon X-Amzn-Trace-Id Header

Amazon's Application Load Balancer [3] defines an X-Amzn-Trace-Id header to identify when many similar requests are received from the same client within a short time. If there are many layers in the Amazon stack, the header can also be used to track a unique request across all the layers.

Google X-Cloud-Trace-Context Header

Google's Cloud Trace [4] is a distributed tracing system for Google Cloud that collects latency data from applications and displays it near real-time in the Google Cloud Console.

Microsoft Request-Id Header

Microsoft Azure has supported the Request-Id and Correlation-Context headers for some time. However, these will be deprecated in favor of the upcoming Trace Context standard.

W3C Trace Context

W3C has recently published a draft of their Trace Context standard, which is co-authored by several Google, Dynatrace and Microsoft employees. It is intended as a replacement for Microsoft's Request-Id and Correlation-Context headers (see HTTP Correlation Protocol [5]).

Forwarded: header (RFC 7239)

RFC 7239 [6] defines the Forwarded header, which allows proxy components to disclose information lost in the proxying process.

To manage (e.g. add, remove or modify) tracing headers used by Apache, it is recommended to use subrequests alongside <Proxy> sections, and directives from mod_headers.
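As a minimal sketch of this recommendation, the fragment below manages two tracing-related headers inside a <Proxy> section. The storage URL, the node identifier _origin1 and the header name X-Request-Id are illustrative assumptions, not names used by Origin. The second directive additionally assumes that mod_unique_id is loaded and that its UNIQUE_ID variable is visible to the subrequest, which should be verified in your own setup.

```apache
<Proxy "http://storage.example.com/">
  # RFC 7239: append this hop to any Forwarded header already present.
  # "_origin1" is a hypothetical obfuscated node identifier.
  RequestHeader append Forwarded "by=_origin1;proto=http"

  # Only add a request ID when the client did not send one, and only
  # when UNIQUE_ID is actually set (assumes mod_unique_id is loaded).
  RequestHeader setifempty X-Request-Id "%{UNIQUE_ID}e" env=UNIQUE_ID
</Proxy>
```

Using append rather than set keeps the entries added by earlier proxies intact, and the env=UNIQUE_ID condition avoids sending a placeholder value when the variable is missing.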

Amazon S3 Authentication Using Headers

Authentication is sometimes required when accessing Amazon S3 buckets.

To aid workflow simplification, provide greater flexibility and offer improvements to user setups, the Amazon S3 API has been integrated into Origin. This enables authentication to be handled by the Apache Proxy. A separate module mod_unified_s3_auth handles the configuration and signing of authentication logic.

Firstly, this enables AWS authentication parameters to be placed at the more logical point, where the S3 bucket is defined. Secondly, the signing method has changed: signing is now performed using the header approach rather than the query parameter approach, which is a better fit for the use of headers as described below.

origin --> storage-proxy+cache --> mod_unified_s3_auth --> storage (s3)
             (ism & drefs)

See Using S3 with Authentication for further details.

Requirements

To use subrequests in Origin, you require:

  • Apache 2.4, with the following modules enabled:

    • mod_proxy

    • mod_proxy_http

    • mod_ssl

  • mod_smooth_streaming 1.10.22, or later

  • mod_unified_s3_auth 1.10.22, or later (for AWS S3 authentication)

Installation

Install Apache and Unified Origin as usual (see How to Configure (Unified Origin) for more information).

Ensure mod_proxy, mod_proxy_http and mod_smooth_streaming are enabled, and that apachectl configtest shows no errors.

If you require Amazon S3 authentication install and enable mod_unified_s3_auth.
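The modules listed above are typically enabled with LoadModule lines similar to the following sketch. The module file paths and the identifier smooth_streaming_module are assumptions that depend on your distribution and installation; on Debian-based systems the equivalent lines are usually managed through a2enmod rather than written by hand.

```apache
# Proxy modules required for subrequests (paths are distribution-dependent)
LoadModule proxy_module       modules/mod_proxy.so
LoadModule proxy_http_module  modules/mod_proxy_http.so

# Needed when a remote storage server is accessed over https
LoadModule ssl_module         modules/mod_ssl.so

# Optional: only needed to add, modify or remove request headers
LoadModule headers_module     modules/mod_headers.so

# Unified Origin module (identifier and path assumed, check your installation)
LoadModule smooth_streaming_module modules/mod_smooth_streaming.so
```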

Tip

Use apachectl to test configurations.

Configuration

To configure subrequests:

  • Add a UspEnableSubreq on directive

  • Add <Proxy> sections for target URLs

Custom HTTP headers can also be added, if mod_headers is enabled.

Adding UspEnableSubreq directives

UspEnableSubreq on directs Origin to use subrequests. It should be placed in a <Location> section. (See Location in the Apache documentation for further details.)

This should be combined with the directives enabling the use of the Unified Streaming module.

For example:

<Location "/">
  UspHandleIsm on
  UspEnableSubreq on
</Location>

To enable remote storage access, IsmProxyPass must also be added.

This can be done in either a <Location> or a <Directory> section, where <Location> is preferred as of 1.10.28:

<Location "/your-bucket">
  IsmProxyPass http://your-bucket.s3.amazonaws.com/
</Location>

or:

<Directory "/var/www/test/your-bucket">
  IsmProxyPass http://your-bucket.s3.amazonaws.com/
</Directory>

Note

Traditionally <Directory> has been used to access remote storage (as can be seen in the Dynamic Manifests section) with the path being a virtual path: it should not actually exist on disk for the mapping to remote storage to work.

However, according to the Apache documentation, <Location> is a better fit: remote storage does not relate to the local filesystem, which is what <Directory> implies. With <Location>, a 'virtual' path is no longer needed.

Alternatively, the directives can be combined into a single <Location> when all content is remotely stored in for instance S3, which is the most common use case:

<Location "/">
  UspHandleIsm on
  UspEnableSubreq on
  IsmProxyPass http://your-bucket.s3.amazonaws.com/
</Location>

For locations and directories where UspEnableSubreq is enabled, Origin issues HTTP requests to remote storage objects by building internal subrequests, and dispatching these directly into Apache's proxy handler.

Adding <Proxy> sections for target URLs

When the rewrite rules send a subrequest internally as a proxy request, it is handled by workers in Apache. There are two built-in workers: the default forward proxy worker and the default reverse proxy worker. These are not configurable.

Additional workers can be configured explicitly, using <Proxy> sections with ProxySet directives. These should be defined for each of your remote storage servers. This enables connection reuse and HTTP keep-alive for the defined remote storage servers.

For example, for a remote storage server at http://storage.example.com/, add the following to the <VirtualHost>:

<Proxy "http://storage.example.com/">
  ProxySet connectiontimeout=5 enablereuse=on keepalive=on retry=0 timeout=30 ttl=300
</Proxy>

Individual settings are explained below.

If the server is reachable via both http and https, you must add a separate <Proxy> section for each.

Note

Where a <Proxy> section refers to https, you must also add the SSLProxyEngine on directive to your <VirtualHost> section.
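A server reachable over both schemes could be configured along the lines of the sketch below. It reuses the storage.example.com placeholder and the ProxySet values from the example above; the remaining <VirtualHost> directives are omitted.

```apache
<VirtualHost *:80>
  # Required because one of the <Proxy> sections below targets https
  SSLProxyEngine on

  # Worker for plain http requests to the storage server
  <Proxy "http://storage.example.com/">
    ProxySet connectiontimeout=5 enablereuse=on keepalive=on retry=0 timeout=30 ttl=300
  </Proxy>

  # Separate worker for https requests to the same server
  <Proxy "https://storage.example.com/">
    ProxySet connectiontimeout=5 enablereuse=on keepalive=on retry=0 timeout=30 ttl=300
  </Proxy>
</VirtualHost>
```

Each section defines its own worker, so the http and https connections are pooled independently.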

The above ProxySet parameters are customized. The most important is enablereuse=on, which enables connection reuse and gives the greatest performance improvement.

For more information about the ProxySet directive, see proxyset in the Apache documentation.

Description of ProxySet key=value parameters

connectiontimeout (default: timeout)

We recommend 5 seconds, which should be more than enough for most cases, including when connecting to far away Amazon S3 buckets.

If you know your storage is "close", in network terms, this setting can be lowered. However setting this too low can lead to an increase in errors when establishing connections.

disablereuse (default: Off)

We recommend keeping this off; see enablereuse below.

enablereuse (default: On)

We recommend keeping this on (or not setting it at all), because reusing connections greatly improves performance.

keepalive (default: Off)

We recommend setting this to on unless you know that TCP connections are kept open indefinitely by the network between your origin and storage.

retry (default: 60)

We recommend 0. This means errors are reported immediately to the subrequest handler instead of keeping the pool workers occupied.

timeout (default: ProxyTimeout)

We recommend 30 seconds, as the upstream default is 60 seconds, which is a long time to wait for data to be retrieved.

If you know that the connection to your storage is fast, this setting can be lowered. However setting this too low can lead to more errors when downloading storage content.

ttl (default: n/a)

We recommend 300 seconds as the upstream default does not keep inactive connections. Keeping inactive connections means they can be reused for HTTP Keep-Alive, which improves performance.
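As an illustration of how these parameters can be tuned, the sketch below shows a worker for storage that is known to be "close" and fast. The hostname and the lowered connectiontimeout and timeout values are assumptions chosen for the example, not recommendations; suitable values depend on your own network measurements.

```apache
# Hypothetical storage server in the same data center as the Origin
<Proxy "http://storage.internal.example.com/">
  # connectiontimeout and timeout lowered from the recommended 5 and 30 seconds;
  # the other parameters keep the recommended values described above
  ProxySet connectiontimeout=2 enablereuse=on keepalive=on retry=0 timeout=10 ttl=300
</Proxy>
```

If errors start to appear when establishing connections or downloading content, raise the values back towards the recommended ones.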

Adding custom HTTP headers

You can add custom HTTP headers to subrequests, using Apache's mod_headers. Use the RequestHeader directive inside the appropriate <Proxy> section.

For example:

<Proxy "http://storage.example.com/">
  ProxySet connectiontimeout=5 enablereuse=on keepalive=on retry=0 timeout=30 ttl=300
  RequestHeader set MyHeader1 "%D %t"
  RequestHeader set MyHeader2 "Hello"
</Proxy>

This will add two custom headers to requests for http://storage.example.com/:

  • MyHeader1 which contains the duration and the time of the request

  • MyHeader2 which contains the fixed string Hello

Trace-ID Headers can be set similarly.

For more information about the possible uses of the RequestHeader directive, see requestheader.

In the above case, headers are only added to requests for media fragments, as <Proxy> is only used for media fragments. If headers are required on manifest requests, they may be added in a proxy, for instance as outlined in Header Authorization. Alternatively, if local caching is used as outlined in Cloud Storage Reducing Latency, the headers may be set in the caching virtual host so that they are added when proxying the request to the remote storage.

Removing request headers

As this configuration causes the Origin to act as a proxy towards the storage backend, request headers will be passed through. In some cases this can negatively affect the response of the storage backend, for example when an inappropriate Accept-Encoding header is set.

To avoid this, mod_headers can be used to remove any unwanted request headers from the proxy request.

For example:

<Proxy "http://storage.example.com/">
  ProxySet connectiontimeout=5 enablereuse=on keepalive=on retry=0 timeout=30 ttl=300
  RequestHeader unset Accept-Encoding
</Proxy>

Troubleshooting

Apache subrequests are performed using internal proxy requests, and are handled by Apache's workers. By default, no messages about the activities of these workers and proxy requests appear in the Apache log, except for (fatal) errors.

To help with troubleshooting requests, it is advisable to turn up Apache's LogLevel for the mod_proxy_http module to at least trace4. Add the following line to the appropriate VirtualHost section, after the other configuration for logging:

LogLevel proxy_http:trace4

Then tell Apache to reload its configuration, or restart it. The additional mod_proxy_http messages will then appear in the file specified by the ErrorLog directive in your VirtualHost section, typically something like /var/log/apache2/myvirtualhost-error.log.
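The general level and the per-module level can also be combined in a single directive, as in the following sketch. The log file path is the illustrative one mentioned above. Ordering matters when separate lines are used, because a LogLevel without a module name resets the level for all modules; this is why the per-module setting must come after the other logging configuration.

```apache
<VirtualHost *:80>
  ErrorLog /var/log/apache2/myvirtualhost-error.log

  # Keep all modules at warn, but raise mod_proxy_http to trace4
  LogLevel warn proxy_http:trace4
</VirtualHost>
```

Remember to lower the level again after troubleshooting, as trace4 logs several lines per proxied request.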

For example, if media is retrieved from Amazon S3, the log messages will look like the following:

[Tue Feb 01 12:52:22.150234 2022] [proxy_http:trace1] [pid 67975:tid 140427176965888] mod_proxy_http.c(62): [client 127.0.0.1:56444] HTTP: canonicalising URL //usp-auth-v4-2.s3-eu-central-1.amazonaws.com/oceans.mp4
[Tue Feb 01 12:52:22.150441 2022] [proxy_http:trace1] [pid 67975:tid 140427176965888] mod_proxy_http.c(1985): [client 127.0.0.1:56444] HTTP: serving URL http://usp-auth-v4-2.s3-eu-central-1.amazonaws.com/oceans.mp4
[Tue Feb 01 12:52:22.183174 2022] [proxy_http:trace3] [pid 67975:tid 140427176965888] mod_proxy_http.c(1361): [client 127.0.0.1:56444] Status from backend: 206
[Tue Feb 01 12:52:22.183226 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1016): [client 127.0.0.1:56444] Headers received from backend:
[Tue Feb 01 12:52:22.183243 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] x-amz-id-2: 3aEuz5gEaxmkfVvlT/kQhFc00kmcsDP1be07L2WPaFZ6bxlTPV+lguKsEmEhgBWyHmTtMz0etQ4=
[Tue Feb 01 12:52:22.183260 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] x-amz-request-id: 5ZYNMXRNDE3TT7CE
[Tue Feb 01 12:52:22.183273 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] Date: Tue, 01 Feb 2022 11:52:23 GMT
[Tue Feb 01 12:52:22.183288 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] Last-Modified: Fri, 26 Jan 2018 13:25:16 GMT
[Tue Feb 01 12:52:22.183342 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] ETag: "49cdbf517193fe6796f73a535e62e1f1-2"
[Tue Feb 01 12:52:22.183357 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] Accept-Ranges: bytes
[Tue Feb 01 12:52:22.183369 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] Content-Range: bytes 0-65535/30172842
[Tue Feb 01 12:52:22.183381 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] Content-Type: video/mp4
[Tue Feb 01 12:52:22.183392 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] Server: AmazonS3
[Tue Feb 01 12:52:22.183403 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] Content-Length: 65536
[Tue Feb 01 12:52:22.183424 2022] [proxy_http:trace3] [pid 67975:tid 140427176965888] mod_proxy_http.c(1724): [client 127.0.0.1:56444] start body send
[Tue Feb 01 12:52:22.208999 2022] [proxy_http:trace2] [pid 67975:tid 140427176965888] mod_proxy_http.c(1870): [client 127.0.0.1:56444] end body send

In this example:

  • A subrequest is done to retrieve the remote storage URL http://usp-auth-v4-2.s3-eu-central-1.amazonaws.com/oceans.mp4

  • The HTTP status returned by the remote storage is 206, which means "OK, partial content"

  • The reply headers are logged, including x-amz-request-id and x-amz-id-2, which can be used for contacting Amazon Support [7].

In particular, when errors occur, the HTTP status and x-amz-request-id headers can be useful when diagnosing the root cause. Similarly, other cloud vendors such as Azure and Google Cloud will return identifying headers in response to requests.

Note that many HTTP requests can be "in flight" simultaneously. If you want to inspect one particular request, filter the log for the specific [client 127.0.0.1:ppppp] line containing the URL you are interested in, where ppppp is a unique local port number assigned to each individual connection.

Example configuration

Here is an example configuration file containing some of the above settings, which can be used as a foundation for building your own setup.

<VirtualHost *:80>
  ServerAdmin admin@localhost
  ServerName server.localhost

  DocumentRoot /var/www/origin

  <Directory />
    Require all granted
  </Directory>

  AddHandler smooth-streaming.extensions .ism .isml .mp4

  # Root location for handling local server manifests
  # enabling subrequests here allows it to be applied to the whole site.
  <Location "/">
    UspHandleIsm on
    UspEnableSubreq on
  </Location>

  # Alternate location redirecting to S3 storage
  <Location "/your-bucket/">
    IsmProxyPass "http://your-bucket.s3.eu-central-1.amazonaws.com/"
  </Location>

  # Proxy location and timeout parameters for apache workers when using UspEnableSubreq
  <Proxy "http://your-bucket.s3.eu-central-1.amazonaws.com/">
    ProxySet connectiontimeout=5 enablereuse=on keepalive=on retry=0 timeout=30 ttl=300
  </Proxy>

  # Alternate method of configuring proxy if preferred
  #ProxySet http://your-bucket.s3.eu-central-1.amazonaws.com/ connectiontimeout=5 enablereuse=on keepalive=on retry=0 timeout=30 ttl=300

  Options -Indexes

  # If not specified, the global error log is used
  ErrorLog /var/log/apache2/features.unified-streaming.com-error.log
  CustomLog /var/log/apache2/features.unified-streaming.com-access.log combined
  LogLevel warn

  HostnameLookups Off
  UseCanonicalName On
  ServerSignature On
  LimitRequestBody 0

  Header always set Access-Control-Allow-Headers "origin, range"
  Header always set Access-Control-Allow-Methods "GET, HEAD, OPTIONS"
  Header always set Access-Control-Allow-Origin "*"

</VirtualHost>

Notes