Cloud Storage Proxy¶
New in version 1.10.27.
ProxyPass will replace IsmProxyPass, meaning that full Apache 'Proxy' functionality will be utilised. An intermediate workflow is outlined below, which uses Proxy with subrequests to better handle media.
Often content requested by HTTP clients is stored remotely and must be accessed through HTTP(S) requests. Historically, the Origin has relied on cURL to handle these requests. While this is a robust solution, it has two distinct disadvantages:
- Each request requires establishing a new connection between webservers. This is the most performance-limiting step, both in network latency and in request processing.
- cURL doesn't pass through client headers.
These are resolved by using Apache subrequests to handle all upstream HTTP(s) requests.
For backwards compatibility, the cURL functionality remains unchanged and is still available.
See Configuration below on how to set up the use of subrequests.
The key benefits include performance gains, greater flexibility and opportunities for setup optimization.
Using Apache's subrequests¶
Apache subrequests establish and maintain a pool of connections between webservers, 'intelligently' reusing or culling them as required.
To achieve performance gains, subrequests must be enabled and individual
<Proxy> sections defined for each backend storage server. See below.
Throughput Performance Gains with Remote Storage
A fairly optimal setup may achieve performance improvements of around 10-20% while less optimized setups will see even greater gains. These improvements are largely due to caching of DNS lookups.
Adding Caching Layers to Improve Performance
Adding a further caching layer between the Origin and storage, populated with index files of stored content, significantly reduces the number of requests. This leads to a reduction in fragment latency and an increase in throughput.
Using subrequests rather than cURL requests will always be more efficient because of Apache's internal caching mechanisms (for DNS and other TCP managerial processes). The storage cache will be even more efficient if it supports HTTP keepalive and the Origin is correctly configured for this, as connections between the Origin and storage cache can then be pooled and re-used. See Cloud Storage Reducing Latency for further information.
Flexibility and Optimization¶
Adding Custom Headers¶
When utilizing subrequests (see Adding custom HTTP headers), our
webserver modules propagate request headers transparently: headers can be added
at web frontends and passed through Origin or Remix to arrive at
storage backends. Custom headers can be added, removed or modified as required
with the mod_headers module, to provide additional information
that aids troubleshooting or to implement server-side logic for setup optimization, for example:
- Informing users whether an asset has been delivered between servers
- Rate limiting bandwidth on your origin server
- Restricting CDN traffic
- Collecting statistics
- Using headers to include/exclude proxy requests into a billing system
- Controlling the routing of the mp4 proxy requests according to routing policy rules
There are multiple proprietary and standardized methods available for tracing requests through web services. Unified Streaming references the W3C Trace Context standard.
- Amazon X-Amzn-Trace-Id Header
- Amazon's Application Load Balancer defines an X-Amzn-Trace-Id header to identify when many similar requests are received from the same client within a short time. If there are many layers in the Amazon stack, the header can also be used to track a unique request across all the layers.
- Google X-Cloud-Trace-Context Header
- Google's Cloud Trace is a distributed tracing system for Google Cloud that collects latency data from applications and displays it in near real-time in the Google Cloud Console.
- Microsoft Request-Id Header
- Microsoft Azure has supported the Request-Id and Correlation-Context headers for some time; however, these will be deprecated in favor of the upcoming Trace Context standard.
- W3C Trace Context
- W3C has recently published a draft of their Trace Context standard, which is co-authored by several Google, Dynatrace and Microsoft employees. It is intended as a replacement for Microsoft's Request-Id and Correlation-Context headers (see HTTP Correlation Protocol).
- Forwarded: header (RFC 7239)
- RFC 7239 defines the Forwarded header, which allows proxy components to disclose information lost in the proxying process.
To manage (e.g. add, remove or modify) tracing headers used by Apache, it is
recommended to use subrequests alongside
<Proxy> sections and mod_headers.
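As a rough sketch of how a tracing header might be attached to upstream requests, the fragment below sets a request identifier inside a <Proxy> section. It assumes mod_headers and mod_unique_id are both loaded; the header name X-Request-Id and the storage URL are illustrative choices, not names required by Origin.

```apache
# Assumption: mod_unique_id is enabled, so UNIQUE_ID is available as an
# environment variable for each request.
<Proxy "http://storage.example.com/">
  # Only set the header when the client or a frontend has not already sent one,
  # so that an existing trace identifier is passed through unchanged.
  RequestHeader setifempty X-Request-Id "%{UNIQUE_ID}e"
</Proxy>
```

Using setifempty rather than set keeps an identifier that was assigned earlier in the chain, which is usually what is wanted when correlating logs across layers.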
Amazon S3 Authentication Using Headers¶
Authentication is sometimes required when accessing Amazon S3 buckets.
To aid workflow simplification, provide greater flexibility and offer
improvements to user setups, the Amazon S3 API has been integrated into Origin.
This enables authentication to be handled by the Apache Proxy. A separate module
mod_unified_s3_auth handles the configuration and signing of authentication
This has two effects. First, AWS authentication parameters can be placed at the more logical point, where the S3 bucket is defined. Second, the signing method has changed: signing is now performed using the header approach rather than the query parameter approach, which is a better fit for the use of headers as described below.
origin --> storage-proxy+cache --> mod_unified_s3_auth --> storage (s3) (ism & drefs)
See Using S3 with Authentication for further details.
To use subrequests in Origin, you require:
- Apache 2.4, with the following modules enabled:
- mod_smooth_streaming 1.10.22, or later
- mod_unified_s3_auth 1.10.22, or later (for AWS S3 authentication)
Install Apache and Unified Origin as usual (see How to Configure (Unified Origin) for more information).
enabled, and that
apachectl configtest shows no errors.
If you require Amazon S3 authentication, install and enable mod_unified_s3_auth.
Use apachectl to test configurations.
To configure subrequests:
- Add <Proxy> sections for target URLs
Custom HTTP headers can also be added, if
mod_headers is enabled.
UspEnableSubreq on directs Origin to use subrequests. It should be placed in a
<Location> section (see Location in the Apache documentation for more information).
This should be combined with the directives enabling the use of the Unified Streaming module.
<Location "/">
  UspHandleIsm on
  UspEnableSubreq on
</Location>
To enable remote storage access, IsmProxyPass also needs to be added.
This can be done in either a <Location> or a <Directory> directive, where
<Location> is the preferred directive as of 1.10.28:
<Location "/your-bucket">
  IsmProxyPass http://your-bucket.s3.amazonaws.com/
</Location>

<Directory "/var/www/test/your-bucket">
  IsmProxyPass http://your-bucket.s3.amazonaws.com/
</Directory>
<Directory> has been used to access remote storage (as can
be seen in the Dynamic Manifests section) with the path being a virtual
path: it should not actually exist on disk for the mapping to remote storage
to work. However, according to the Apache documentation, <Location> is a
better fit, as remote storage does not relate to the local filesystem, which
<Directory> implies. With <Location>, no 'virtual' path is needed anymore.
Alternatively, the directives can be combined into a single <Location> section
when all content is stored remotely, for instance in S3, which is the most common case:
<Location "/">
  UspHandleIsm on
  UspEnableSubreq on
  IsmProxyPass http://your-bucket.s3.amazonaws.com/
</Location>
For locations and directories where
UspEnableSubreq is enabled, Origin
issues HTTP requests to remote storage objects by building internal
subrequests, and dispatching these directly into Apache's proxy handler.
<Proxy> sections for target URLs¶
When the rewrite rules send a subrequest internally as a proxy request, it is handled by a worker in Apache. There are two built-in workers: the default forward proxy worker and the default reverse proxy worker. These are not configurable.
Additional workers can be configured explicitly, using
<Proxy> sections with
ProxySet directives. These should be defined for each of your remote storage
servers. This enables connection reuse and HTTP keep-alive
for the defined remote storage servers.
For example, for a remote storage server at http://storage.example.com/,
add the following:
<Proxy "http://storage.example.com/">
  ProxySet connectiontimeout=5 enablereuse=on keepalive=on retry=0 timeout=30 ttl=300
</Proxy>
Individual settings are explained below.
If the server is reachable via both http and https, you must add a separate
<Proxy> section for each. If a
<Proxy> section refers to
https, you must also add the
SSLProxyEngine on directive to your configuration.
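A minimal sketch of the two-scheme case, assuming the same illustrative storage host as above and that mod_ssl is loaded; the ProxySet values are simply copied from the earlier example and may need tuning per scheme.

```apache
# Required as soon as any proxied URL uses https (needs mod_ssl).
SSLProxyEngine on

# Worker for plain HTTP access to the storage server.
<Proxy "http://storage.example.com/">
  ProxySet connectiontimeout=5 enablereuse=on keepalive=on retry=0 timeout=30 ttl=300
</Proxy>

# Separate worker for HTTPS access to the same server.
<Proxy "https://storage.example.com/">
  ProxySet connectiontimeout=5 enablereuse=on keepalive=on retry=0 timeout=30 ttl=300
</Proxy>
```

Each scheme gets its own worker and connection pool, so the settings of one section do not apply to the other.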
The ProxySet parameters can be customized, the most important being
enablereuse=on, which enables connection reuse and gives the greatest performance gain.
For more information about the
ProxySet directive, see ProxySet in the Apache documentation.
Description of ProxySet key=value parameters¶
- connectiontimeout (default: timeout)
We recommend 5 seconds, which should be more than enough for most cases, including when connecting to far away Amazon S3 buckets.
If you know your storage is "close", in network terms, this setting can be lowered. However setting this too low can lead to an increase in errors when establishing connections.
- disablereuse (default: Off)
- We recommend keeping this off, see below.
- enablereuse (default: On)
- We recommend keeping this on (or not setting it at all); reusing connections greatly improves performance.
- keepalive (default: Off)
- We recommend setting this to on unless you know that TCP connections are kept open indefinitely by the network between your origin and storage.
- retry (default: 60)
- We recommend 0. This means errors will be immediately reported to the subrequest handler instead of keeping the pool workers occupied.
- timeout (default: ProxyTimeout)
We recommend 30 seconds, as the upstream default is 60 seconds, which is a long time to wait for data to be retrieved.
If you know that the connection to your storage is fast, this setting can be lowered. However, setting this too low can lead to more errors when downloading storage content.
- ttl (default: n/a)
- We recommend 300 seconds as the upstream default does not keep inactive connections. Keeping inactive connections means they can be reused for HTTP Keep-Alive, which improves performance.
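To illustrate how these parameters might be adjusted, the sketch below assumes a storage server that is "close" in network terms. The host name and the lowered connectiontimeout and timeout values are hypothetical and only show the shape of such a change; they are not recommendations.

```apache
# Hypothetical nearby storage host; values are illustrative only.
<Proxy "http://storage.internal.example.com/">
  # Shorter connect and read timeouts than the general recommendation,
  # on the assumption that the storage is on a fast local network.
  ProxySet connectiontimeout=2 enablereuse=on keepalive=on retry=0 timeout=10 ttl=300
</Proxy>
```

If errors increase after lowering these values, reverting to the recommended 5 and 30 seconds is the safer choice.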
Adding custom HTTP headers¶
You can add custom HTTP headers to subrequests, using Apache's
RequestHeader directive inside the appropriate <Proxy> section:
<Proxy "http://storage.example.com/">
  ProxySet connectiontimeout=5 enablereuse=on keepalive=on retry=0 timeout=30 ttl=300
  RequestHeader set MyHeader1 "%D %t"
  RequestHeader set MyHeader2 "Hello"
</Proxy>
This will add two custom headers to requests for http://storage.example.com/:
- MyHeader1, which contains the duration and the time of the request
- MyHeader2, which contains the fixed string Hello
Trace-ID Headers can be set similarly.
For more information about the possible uses of the RequestHeader directive, see the Apache mod_headers documentation.
In the above case, headers are only added to requests for media fragments, as
<Proxy> is only used for media fragments. If headers are required on
manifest requests, they may be added in a proxy, for instance as outlined in
Header Authorization. Alternatively, if local caching is used as
outlined in Cloud Storage Reducing Latency, the headers may be set in the
caching virtual host so they are added when proxying the request to the remote storage.
Removing request headers¶
As this configuration causes the Origin to act as a proxy towards the storage
backend, request headers will be passed through. In some cases this can negatively affect
the response of the storage backend, for example when the client sets
Accept-Encoding. To avoid this,
mod_headers can be used to remove any unwanted request
headers from the proxy request.
<Proxy "http://storage.example.com/">
  ProxySet connectiontimeout=5 enablereuse=on keepalive=on retry=0 timeout=30 ttl=300
  RequestHeader unset Accept-Encoding
</Proxy>
Apache subrequests are performed using internal proxy requests and handled by Apache's workers. By default, no messages about the activities of these workers and proxy requests appear in the Apache log, except for (fatal) errors.
To help with troubleshooting requests, it is advisable to turn up Apache's
LogLevel for the
mod_proxy_http module to at least
trace4. Add the
following line to the appropriate
VirtualHost section, after the other
configuration for logging:
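A line of roughly the following form would raise the level for mod_proxy_http only. This is a sketch that assumes the rest of the server should stay at warn; adjust the global level to match your existing configuration.

```apache
# Keep the global log level at warn, but log mod_proxy_http at trace4.
LogLevel warn proxy_http:trace4
```

Because trace4 is verbose, it is usually worth reverting this once troubleshooting is finished.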
Then tell Apache to reload its configuration, or restart it. The additional
mod_proxy_http messages will then appear in the file specified by the
ErrorLog directive in your
VirtualHost section, typically something like
For example, if media is retrieved from Amazon S3, the log messages will look like the following:
[Tue Feb 01 12:52:22.150234 2022] [proxy_http:trace1] [pid 67975:tid 140427176965888] mod_proxy_http.c(62): [client 127.0.0.1:56444] HTTP: canonicalising URL //usp-auth-v4-2.s3-eu-central-1.amazonaws.com/oceans.mp4
[Tue Feb 01 12:52:22.150441 2022] [proxy_http:trace1] [pid 67975:tid 140427176965888] mod_proxy_http.c(1985): [client 127.0.0.1:56444] HTTP: serving URL http://usp-auth-v4-2.s3-eu-central-1.amazonaws.com/oceans.mp4
[Tue Feb 01 12:52:22.183174 2022] [proxy_http:trace3] [pid 67975:tid 140427176965888] mod_proxy_http.c(1361): [client 127.0.0.1:56444] Status from backend: 206
[Tue Feb 01 12:52:22.183226 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1016): [client 127.0.0.1:56444] Headers received from backend:
[Tue Feb 01 12:52:22.183243 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] x-amz-id-2: 3aEuz5gEaxmkfVvlT/kQhFc00kmcsDP1be07L2WPaFZ6bxlTPV+lguKsEmEhgBWyHmTtMz0etQ4=
[Tue Feb 01 12:52:22.183260 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] x-amz-request-id: 5ZYNMXRNDE3TT7CE
[Tue Feb 01 12:52:22.183273 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] Date: Tue, 01 Feb 2022 11:52:23 GMT
[Tue Feb 01 12:52:22.183288 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] Last-Modified: Fri, 26 Jan 2018 13:25:16 GMT
[Tue Feb 01 12:52:22.183342 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] ETag: "49cdbf517193fe6796f73a535e62e1f1-2"
[Tue Feb 01 12:52:22.183357 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] Accept-Ranges: bytes
[Tue Feb 01 12:52:22.183369 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] Content-Range: bytes 0-65535/30172842
[Tue Feb 01 12:52:22.183381 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] Content-Type: video/mp4
[Tue Feb 01 12:52:22.183392 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] Server: AmazonS3
[Tue Feb 01 12:52:22.183403 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] Content-Length: 65536
[Tue Feb 01 12:52:22.183424 2022] [proxy_http:trace3] [pid 67975:tid 140427176965888] mod_proxy_http.c(1724): [client 127.0.0.1:56444] start body send
[Tue Feb 01 12:52:22.208999 2022] [proxy_http:trace2] [pid 67975:tid 140427176965888] mod_proxy_http.c(1870): [client 127.0.0.1:56444] end body send
In this example:
- A subrequest is done to retrieve the remote storage URL
- The HTTP status returned by the remote storage is 206, which means "OK, partial content"
- The reply headers are logged, including x-amz-id-2, which can be used when contacting Amazon Support.
In particular, when errors occur, the HTTP status and
headers can be useful when diagnosing the root cause. Similarly, other cloud
vendors such as Azure and Google Cloud will return identifying headers in
response to requests.
Note that many HTTP requests can be "in flight" simultaneously. If you want to
inspect one particular request, filter the log for the specific
[client 127.0.0.1:ppppp] lines containing the URL you are interested in, where
ppppp is a unique local port number assigned to each individual connection.
Here is an example configuration file containing some of the above settings, which can be used as a foundation for building your own setup.
<VirtualHost *:80>
  ServerAdmin admin@localhost
  ServerName server.localhost
  DocumentRoot /var/www/origin

  <Directory />
    Require all granted
    Satisfy Any
  </Directory>

  AddHandler smooth-streaming.extensions .ism .isml .mp4

  # Root location for handling local server manifests
  # enabling subrequests here allows it to be applied to the whole site.
  <Location "/">
    UspHandleIsm on
    UspEnableSubreq on
  </Location>

  # Alternate location redirecting to S3 storage
  <Location "/your-bucket/">
    IsmProxyPass "http://your-bucket.s3.eu-central-1.amazonaws.com/"
  </Location>

  # Proxy location and timeout parameters for apache workers when using UspEnableSubreq
  <Proxy "http://your-bucket.s3.eu-central-1.amazonaws.com/">
    ProxySet connectiontimeout=5 enablereuse=on keepalive=on retry=0 timeout=30 ttl=300
  </Proxy>

  # Alternate method of configuring proxy if preferred
  #ProxySet http://your-bucket.s3.eu-central-1.amazonaws.com/ connectiontimeout=5 enablereuse=on keepalive=on retry=0 timeout=30 ttl=300

  Options -Indexes

  # If not specified, the global error log is used
  ErrorLog /var/log/apache2/features.unified-streaming.com-error.log
  CustomLog /var/log/apache2/features.unified-streaming.com-access.log combined
  LogLevel warn
  HostnameLookups Off
  UseCanonicalName On
  ServerSignature On
  LimitRequestBody 0

  Header always set Access-Control-Allow-Headers "origin, range"
  Header always set Access-Control-Allow-Methods "GET, HEAD, OPTIONS"
  Header always set Access-Control-Allow-Origin "*"
</VirtualHost>