Cloud Storage Proxy
New in version 1.10.27.
Attention
Eventually ProxyPass will replace IsmProxyPass, meaning full Apache 'Proxy' functionality will be utilised. An intermediate workflow is outlined below, which uses Proxy with subrequests to better handle media requests.
Introduction
Often content requested by HTTP clients is stored remotely and must be accessed through HTTP(s) requests. Historically, the Origin has relied on cURL to handle these requests. While this provides a robust solution, it has two distinct disadvantages:
Each request requires establishing a new connection between webservers. This is the most performance-limiting step, in terms of network latency and the request process.
cURL doesn't pass through client headers.
These are resolved by using Apache subrequests to handle all upstream HTTP(s) requests.
For backwards compatibility, the cURL functionality remains unchanged and is still available.
See Configuration below on how to set up the use of subrequests.
Benefits
The key benefits include performance gains, greater flexibility and opportunities for setup optimization.
Performance Gains
Using Apache's subrequests establishes and maintains a pool of connections between webservers, 'intelligently' reusing or culling them as required.
To achieve performance gains, subrequests must be enabled and individual <Proxy> sections defined for each backend storage server. See below.
Throughput Performance Gains with Remote Storage
A fairly optimal setup may achieve performance improvements of around 10-20% while less optimized setups will see even greater gains. These improvements are largely due to caching of DNS lookups.
Adding Caching Layers to Improve Performance
Adding a further caching layer between the Origin and storage, populated with index files of stored content, significantly reduces the number of requests, leading to a reduction in fragment latency and an increase in throughput.
Using subrequests rather than cURL requests will always be more efficient because of Apache's internal caching mechanisms (for DNS and other TCP managerial processes). The storage cache will be even more efficient if it supports HTTP keepalive [1] and the Origin is correctly configured for this, as connections between the Origin and storage cache can then be pooled and re-used. See Cloud Storage Reducing Latency for further information.
Flexibility and Optimization
Adding Custom Headers
When utilizing subrequests (see Adding custom HTTP headers), our webserver modules propagate request headers transparently: headers added at web frontends are passed through Origin or Remix and arrive at storage backends. Custom headers can be added, removed or modified as required using Apache's mod_headers module, to provide additional information to aid troubleshooting or to implement server-side logic for setup optimization, for example:
Informing users whether an asset has been delivered between servers
Rate limiting bandwidth on your origin server
Restricting CDN traffic
Collecting statistics
Using headers to include/exclude proxy requests into a billing system
Controlling the routing of the mp4 proxy requests according to routing policy rules
Trace-ID Headers
There are multiple proprietary and standardized methods available for tracing requests through web services. Unified Streaming references the W3C Trace Context standard [2].
- Amazon X-Amzn-Trace-Id Header
Amazon's Application Load Balancer [3] defines an X-Amzn-Trace-Id header, to identify when many similar requests are received from the same client within a short time. If there are many layers in the Amazon stack, the header can also be used to track a unique request across all the layers.
- Google X-Cloud-Trace-Context Header
Google's Cloud Trace [4] is a distributed tracing system for Google Cloud that collects latency data from applications and displays it in near real-time in the Google Cloud Console.
- Microsoft Request-Id Header
Microsoft Azure has supported the Request-Id and Correlation-Context headers for some time; however, these will be deprecated in favor of the upcoming Trace Context standard.
- W3C Trace Context
W3C has recently published a draft of their Trace Context standard, which is co-authored by several Google, Dynatrace and Microsoft employees. It is intended as a replacement for Microsoft's Request-Id and Correlation-Context headers (see HTTP Correlation Protocol [5]).
- Forwarded: header (RFC 7239)
RFC 7239 [6] defines the Forwarded header, which allows proxy components to disclose information lost in the proxying process.
To manage (e.g. add, remove or modify) tracing headers used by Apache, it is recommended to use subrequests alongside <Proxy> sections, and directives from mod_headers.
Amazon S3 Authentication Using Headers
Authentication is sometimes required when accessing Amazon S3 buckets.
To simplify workflows, provide greater flexibility and offer improvements to user setups, the Amazon S3 API has been integrated into Origin. This enables authentication to be handled by the Apache Proxy. A separate module, mod_unified_s3_auth, handles the configuration and signing of authentication logic.
This has two consequences. First, AWS authentication parameters can be placed at the more logical point, where the S3 bucket is defined. Second, the signing method has changed: signing is now performed using the header approach, not the query parameter approach, providing a better fit for the use of headers as described below.
origin --> storage-proxy+cache --> mod_unified_s3_auth --> storage (s3)
(ism & drefs)
See Using S3 with Authentication for further details.
Requirements
To use subrequests in Origin, you require:
Apache 2.4, with the following modules enabled:
mod_proxy
mod_proxy_http
mod_ssl
mod_smooth_streaming 1.10.22, or later
mod_unified_s3_auth 1.10.22, or later (for AWS S3 authentication)
Installation
Install Apache and Unified Origin as usual (see How to Configure (Unified Origin) for more information).
Ensure mod_proxy, mod_proxy_http and mod_smooth_streaming are enabled, and that apachectl configtest shows no errors.
If you require Amazon S3 authentication, install and enable mod_unified_s3_auth.
Tip
Use apachectl to test configurations.
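As a rough sketch of what enabling the required modules can look like when they are loaded with explicit LoadModule directives: the module file paths below are assumptions and differ per distribution, and the smooth_streaming_module identifier is assumed here as well, so check both against your installed packages. On Debian-style systems the a2enmod helper is the usual route instead.

```apache
# Assumed paths; adjust to where your distribution installs Apache modules.
LoadModule proxy_module            modules/mod_proxy.so
LoadModule proxy_http_module       modules/mod_proxy_http.so
LoadModule ssl_module              modules/mod_ssl.so
# Only needed if custom request headers are added or removed.
LoadModule headers_module          modules/mod_headers.so
# Assumed identifier and path for the Unified Streaming module.
LoadModule smooth_streaming_module modules/mod_smooth_streaming.so
```

After changing which modules are loaded, running apachectl configtest again confirms that the configuration still parses.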
Configuration
To configure subrequests:
Add a UspEnableSubreq on directive
Add <Proxy> sections for target URLs
Custom HTTP headers can also be added, if mod_headers is enabled.
Adding UspEnableSubreq directives
UspEnableSubreq on directs Origin to use subrequests. It should be placed in a <Location> section. (See Location in the Apache documentation for further details.)
This should be combined with the directives enabling the use of the Unified Streaming module.
For example:
<Location "/">
  UspHandleIsm on
  UspEnableSubreq on
</Location>
To enable remote storage access, IsmProxyPass also needs to be added. This can be done in either a <Location> or a <Directory> section, where <Location> is preferred as of 1.10.28:
<Location "/your-bucket">
  IsmProxyPass http://your-bucket.s3.amazonaws.com/
</Location>
or:
<Directory "/var/www/test/your-bucket">
  IsmProxyPass http://your-bucket.s3.amazonaws.com/
</Directory>
Note
Traditionally <Directory> has been used to access remote storage (as can be seen in the Dynamic Manifests section), with the path being a virtual path: it should not actually exist on disk for the mapping to remote storage to work.
However, according to the Apache documentation, <Location> is a better fit, as remote storage does not relate to the local filesystem, which <Directory> implies. With <Location> there is no 'virtual' path anymore.
Alternatively, the directives can be combined into a single <Location> when all content is stored remotely, for instance in S3, which is the most common use case:
<Location "/">
  UspHandleIsm on
  UspEnableSubreq on
  IsmProxyPass http://your-bucket.s3.amazonaws.com/
</Location>
For locations and directories where UspEnableSubreq is enabled, Origin issues HTTP requests to remote storage objects by building internal subrequests, and dispatching these directly into Apache's proxy handler.
Adding <Proxy> sections for target URLs
When the rewrite rules send a subrequest internally as a proxy request, it is handled by workers in Apache. There are two built-in workers: the default forward proxy worker and the default reverse proxy worker. These are not configurable.
Additional workers can be configured explicitly, using <Proxy> sections with ProxySet directives. These should be defined for each of your remote storage servers. This enables connection reuse and HTTP keep-alive for the defined remote storage servers.
For example, for a remote storage server at http://storage.example.com/, add the following to the <VirtualHost>:
<Proxy "http://storage.example.com/">
  ProxySet connectiontimeout=5 enablereuse=on keepalive=on retry=0 timeout=30 ttl=300
</Proxy>
Individual settings are explained below.
If the server is reachable via both http and https, you must add a separate <Proxy> section for each.
Note
Where a <Proxy> section refers to https, you must also add the SSLProxyEngine on directive to your <VirtualHost> section.
The above ProxySet parameters are customized. The most important is enablereuse=on, which enables connection reuse and gives the greatest performance improvements.
For more information about the ProxySet directive, see ProxySet in the Apache documentation.
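As noted above, storage that is reachable over both http and https needs one <Proxy> section per scheme, plus SSLProxyEngine on. A minimal sketch of that arrangement follows, reusing the storage.example.com placeholder host and the ProxySet values from the earlier example. The rest of the <VirtualHost> is omitted here, so treat this as a fragment rather than a complete configuration.

```apache
<VirtualHost *:80>
  # Required because one of the <Proxy> sections below targets https.
  SSLProxyEngine on

  # Worker for plain http requests to the storage server.
  <Proxy "http://storage.example.com/">
    ProxySet connectiontimeout=5 enablereuse=on keepalive=on retry=0 timeout=30 ttl=300
  </Proxy>

  # Separate worker for https requests to the same server.
  <Proxy "https://storage.example.com/">
    ProxySet connectiontimeout=5 enablereuse=on keepalive=on retry=0 timeout=30 ttl=300
  </Proxy>
</VirtualHost>
```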
Description of ProxySet key=value parameters
- connectiontimeout (default: timeout)
We recommend 5 seconds, which should be more than enough for most cases, including when connecting to far away Amazon S3 buckets.
If you know your storage is "close", in network terms, this setting can be lowered. However setting this too low can lead to an increase in errors when establishing connections.
- disablereuse (default: Off)
We recommend keeping this off; see enablereuse below.
- enablereuse (default: On)
We recommend keeping this on (or not setting it at all), as reusing connections greatly improves performance.
- keepalive (default: Off)
We recommend setting this to on unless you know that TCP connections are kept open indefinitely by the network between your origin and storage.
- retry (default: 60)
We recommend 0, which means errors will be reported immediately to the subrequest handler instead of keeping the pool workers occupied.
- timeout (default: ProxyTimeout)
We recommend 30 seconds, as the upstream default is 60 seconds, which is a long time to wait for data to be retrieved.
If you know that the connection to your storage is fast, this setting can be lowered. However setting this too low can lead to more errors when downloading storage content.
- ttl (default: n/a)
We recommend 300 seconds as the upstream default does not keep inactive connections. Keeping inactive connections means they can be reused for HTTP Keep-Alive, which improves performance.
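To tie the recommendations above together, the following sketch annotates each parameter of the recommended ProxySet line and adds a second worker with lower timeouts for storage that is "close" in network terms. The nearby-storage.example.com host and the values connectiontimeout=2 and timeout=10 are purely illustrative assumptions, not recommendations from this documentation. Suitable values depend on your own network.

```apache
# Recommended values from the descriptions above.
<Proxy "http://storage.example.com/">
  # connectiontimeout=5  wait at most 5 seconds for a connection to be established
  # enablereuse=on       reuse pooled connections to the storage server
  # keepalive=on         send TCP keep-alive probes on inactive connections
  # retry=0              report errors immediately, without a retry back-off period
  # timeout=30           wait at most 30 seconds for data from the storage server
  # ttl=300              allow inactive connections to be reused for up to 300 seconds
  ProxySet connectiontimeout=5 enablereuse=on keepalive=on retry=0 timeout=30 ttl=300
</Proxy>

# Hypothetical variant for nearby storage: shorter timeouts (assumed values).
<Proxy "http://nearby-storage.example.com/">
  ProxySet connectiontimeout=2 enablereuse=on keepalive=on retry=0 timeout=10 ttl=300
</Proxy>
```

If lowering the timeouts leads to more connection or download errors in the error log, that suggests the values are too low for the storage in question.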
Adding custom HTTP headers
You can add custom HTTP headers to subrequests, using Apache's mod_headers. Use the RequestHeader directive inside the appropriate <Proxy> section.
For example:
<Proxy "http://storage.example.com/">
  ProxySet connectiontimeout=5 enablereuse=on keepalive=on retry=0 timeout=30 ttl=300
  RequestHeader set MyHeader1 "%D %t"
  RequestHeader set MyHeader2 "Hello"
</Proxy>
This will add two custom headers to requests for http://storage.example.com/:
MyHeader1, which contains the duration and the time of the request
MyHeader2, which contains the fixed string Hello
Trace-ID Headers can be set similarly.
For more information about the possible uses of the RequestHeader directive, see RequestHeader in the Apache documentation.
In the above case, headers are only added to requests for media fragments, as <Proxy> is only used for media fragments. If headers are required on manifest requests, they may be added in a proxy, for instance as outlined in Header Authorization. Alternatively, if local caching is used as outlined in Cloud Storage Reducing Latency, the headers may be set in the caching virtual host so that they are added when proxying the request to the remote storage.
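As an illustration of setting a Trace-ID style header, the sketch below adds a request identifier to proxied storage requests. It rests on several assumptions: mod_unique_id is enabled so that the UNIQUE_ID environment variable exists and is visible to the proxied subrequest, the X-Request-Id header name is a hypothetical choice rather than something Origin requires, and the setifempty action needs Apache 2.4.7 or later. Headers such as a W3C traceparent that already arrive from the client are propagated as described earlier and are not touched here.

```apache
<Proxy "http://storage.example.com/">
  ProxySet connectiontimeout=5 enablereuse=on keepalive=on retry=0 timeout=30 ttl=300
  # Keep an incoming X-Request-Id if a client or frontend already sent one;
  # otherwise fall back to the per-request UNIQUE_ID from mod_unique_id.
  RequestHeader setifempty X-Request-Id "%{UNIQUE_ID}e"
</Proxy>
```

Whether the identifier actually reaches the storage backend can be checked in the storage access logs, where available.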
Removing request headers
As this configuration causes the Origin to act as a proxy towards the storage backend, request headers will be passed through. In some cases this can negatively affect the response of the storage backend, for example when an inappropriate Accept-Encoding header is passed on.
To avoid this, mod_headers can be used to remove any unwanted request headers from the proxy request.
For example:
<Proxy "http://storage.example.com/">
  ProxySet connectiontimeout=5 enablereuse=on keepalive=on retry=0 timeout=30 ttl=300
  RequestHeader unset Accept-Encoding
</Proxy>
Troubleshooting
Apache subrequests are performed using internal proxy requests, and handled by Apache's workers. By default, no messages about the activities of these workers and proxy requests appear in the Apache log, except for (fatal) errors.
To help with troubleshooting requests, it is advisable to turn up Apache's LogLevel for the mod_proxy_http module to at least trace4. Add the following line to the appropriate VirtualHost section, after the other configuration for logging:
LogLevel proxy_http:trace4
Then tell Apache to reload its configuration, or restart it. The additional mod_proxy_http messages will then appear in the file specified by the ErrorLog directive in your VirtualHost section, typically something like /var/log/apache2/myvirtualhost-error.log.
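In context, the logging part of a VirtualHost might look like the following sketch. It assumes a general level of warn, as used in the example configuration on this page, and relies on LogLevel accepting a per-module level after the general one. Other directives in the VirtualHost are omitted.

```apache
<VirtualHost *:80>
  # Log file that receives the additional mod_proxy_http messages.
  ErrorLog /var/log/apache2/myvirtualhost-error.log
  # Keep general logging at warn, but raise mod_proxy_http to trace4.
  LogLevel warn proxy_http:trace4
</VirtualHost>
```

Because trace4 is verbose, it may be worth lowering the level again once troubleshooting is finished.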
For example, if media is retrieved from Amazon S3, the log messages will look like the following:
[Tue Feb 01 12:52:22.150234 2022] [proxy_http:trace1] [pid 67975:tid 140427176965888] mod_proxy_http.c(62): [client 127.0.0.1:56444] HTTP: canonicalising URL //usp-auth-v4-2.s3-eu-central-1.amazonaws.com/oceans.mp4
[Tue Feb 01 12:52:22.150441 2022] [proxy_http:trace1] [pid 67975:tid 140427176965888] mod_proxy_http.c(1985): [client 127.0.0.1:56444] HTTP: serving URL http://usp-auth-v4-2.s3-eu-central-1.amazonaws.com/oceans.mp4
[Tue Feb 01 12:52:22.183174 2022] [proxy_http:trace3] [pid 67975:tid 140427176965888] mod_proxy_http.c(1361): [client 127.0.0.1:56444] Status from backend: 206
[Tue Feb 01 12:52:22.183226 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1016): [client 127.0.0.1:56444] Headers received from backend:
[Tue Feb 01 12:52:22.183243 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] x-amz-id-2: 3aEuz5gEaxmkfVvlT/kQhFc00kmcsDP1be07L2WPaFZ6bxlTPV+lguKsEmEhgBWyHmTtMz0etQ4=
[Tue Feb 01 12:52:22.183260 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] x-amz-request-id: 5ZYNMXRNDE3TT7CE
[Tue Feb 01 12:52:22.183273 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] Date: Tue, 01 Feb 2022 11:52:23 GMT
[Tue Feb 01 12:52:22.183288 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] Last-Modified: Fri, 26 Jan 2018 13:25:16 GMT
[Tue Feb 01 12:52:22.183342 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] ETag: "49cdbf517193fe6796f73a535e62e1f1-2"
[Tue Feb 01 12:52:22.183357 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] Accept-Ranges: bytes
[Tue Feb 01 12:52:22.183369 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] Content-Range: bytes 0-65535/30172842
[Tue Feb 01 12:52:22.183381 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] Content-Type: video/mp4
[Tue Feb 01 12:52:22.183392 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] Server: AmazonS3
[Tue Feb 01 12:52:22.183403 2022] [proxy_http:trace4] [pid 67975:tid 140427176965888] mod_proxy_http.c(1039): [client 127.0.0.1:56444] Content-Length: 65536
[Tue Feb 01 12:52:22.183424 2022] [proxy_http:trace3] [pid 67975:tid 140427176965888] mod_proxy_http.c(1724): [client 127.0.0.1:56444] start body send
[Tue Feb 01 12:52:22.208999 2022] [proxy_http:trace2] [pid 67975:tid 140427176965888] mod_proxy_http.c(1870): [client 127.0.0.1:56444] end body send
In this example:
A subrequest is done to retrieve the remote storage URL http://usp-auth-v4-2.s3-eu-central-1.amazonaws.com/oceans.mp4
The HTTP status returned by the remote storage is 206, which means "OK, partial content"
The reply headers are logged, including x-amz-request-id and x-amz-id-2, which can be used for contacting Amazon Support [7].
In particular, when errors occur, the HTTP status and x-amz-request-id headers can be useful when diagnosing the root cause. Similarly, other cloud vendors such as Azure and Google Cloud will return identifying headers in response to requests.
Note that many HTTP requests can be "in flight" simultaneously. If you want to inspect one particular request, filter the log for the specific [client 127.0.0.1:ppppp] line containing the URL you are interested in, where ppppp is a unique local port number assigned to each individual connection.
Example configuration
Here is an example configuration file containing some of the above settings, which can be used as a foundation for building your own setup.
<VirtualHost *:80>
  ServerAdmin admin@localhost
  ServerName server.localhost
  DocumentRoot /var/www/origin

  <Directory />
    Require all granted
    Satisfy Any
  </Directory>

  AddHandler smooth-streaming.extensions .ism .isml .mp4

  # Root location for handling local server manifests
  # enabling subrequests here allows it to be applied to the whole site.
  <Location "/">
    UspHandleIsm on
    UspEnableSubreq on
  </Location>

  # Alternate location redirecting to S3 storage
  <Location "/your-bucket/">
    IsmProxyPass "http://your-bucket.s3.eu-central-1.amazonaws.com/"
  </Location>

  # Proxy location and timeout parameters for apache workers when using UspEnableSubreq
  <Proxy "http://your-bucket.s3.eu-central-1.amazonaws.com/">
    ProxySet connectiontimeout=5 enablereuse=on keepalive=on retry=0 timeout=30 ttl=300
  </Proxy>

  # Alternate method of configuring proxy if preferred
  #ProxySet http://your-bucket.s3.eu-central-1.amazonaws.com/ connectiontimeout=5 enablereuse=on keepalive=on retry=0 timeout=30 ttl=300

  Options -Indexes

  # If not specified, the global error log is used
  ErrorLog /var/log/apache2/features.unified-streaming.com-error.log
  CustomLog /var/log/apache2/features.unified-streaming.com-access.log combined
  LogLevel warn

  HostnameLookups Off
  UseCanonicalName On
  ServerSignature On
  LimitRequestBody 0

  Header always set Access-Control-Allow-Headers "origin, range"
  Header always set Access-Control-Allow-Methods "GET, HEAD, OPTIONS"
  Header always set Access-Control-Allow-Origin "*"
</VirtualHost>
Notes