Shield caching with Nginx and Apache

This tutorial describes how to build an origin-edge setup with Apache and Nginx on Ubuntu. Nginx with cache locking is needed because Apache's own cache locking works only as a hint and is not reliable.

The information here is a guideline; other ways of doing this are possible too.

Lua may be used on both Apache and Nginx to script extra features, but if this is not required it may be left out.

Apache

Origin configuration

To set up Apache you will need to create a vhost config file in /etc/apache2/sites-enabled, for instance as 'origin.conf':

Listen 0.0.0.0:82
<VirtualHost *:82>
  ServerAdmin webmaster@localhost
  ServerName origin.unified-streaming.com

  DocumentRoot /var/www/vod
  <Directory />
    Options FollowSymLinks
    AllowOverride None
  </Directory>

  <Location />
    UspHandleIsm on
  </Location>

  ErrorLog /var/log/apache2/origin.unified-streaming.com-error.log

  # Possible values include: debug, info, notice, warn, error, crit,
  # alert, emerg.
  LogLevel warn

  CustomLog /var/log/apache2/origin.unified-streaming.com-access.log common
</VirtualHost>

At this point you will have Apache configured as an origin with /var/www/vod as DocumentRoot. A good way to test whether everything works is to use the setup from Verify Your Setup.

Running that tutorial will create a directory called /var/www/usp-evaluation with content, which can be renamed to /var/www/vod. Then adjust the URLs in 'index.html' (inside /var/www/vod) to 'origin.unified-streaming.com' and make sure your /etc/hosts file maps your IP address to origin.unified-streaming.com.
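To check the /etc/hosts mapping before starting the servers, a small helper along the following lines may be useful. This is only a sketch: the function name has_host_mapping is hypothetical, and the address 192.0.2.10 in the usage comment is a placeholder that stands in for your server's real IP address.

```shell
# has_host_mapping HOSTNAME [HOSTS_FILE]
# Succeeds (exit status 0) if HOSTNAME appears as a name in the hosts file.
# HOSTS_FILE defaults to /etc/hosts. The function name is hypothetical.
has_host_mapping() {
  awk -v host="$1" '
    $1 ~ /^#/ { next }                  # skip full-line comments
    {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break            # ignore a trailing inline comment
        if ($i == host) found = 1
      }
    }
    END { exit !found }
  ' "${2:-/etc/hosts}"
}

# Usage (not executed here; 192.0.2.10 is a placeholder address):
#   has_host_mapping origin.unified-streaming.com ||
#     echo "192.0.2.10 origin.unified-streaming.com" | sudo tee -a /etc/hosts
```

The check only reads the file; appending an entry is left as an explicit, separate step because it needs root privileges.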

You can then start the webserver as follows:

#!/bin/bash

sudo service apache2 restart

Apache should now be listening on port 82 as an origin.

Nginx

Set up a cache and log directory for Nginx:

sudo mkdir -p /var/cache/nginx
sudo chown -R www-data:www-data /var/cache/nginx

sudo mkdir -p /var/log/nginx
sudo chown -R www-data:www-data /var/log/nginx

Configuration

Nginx will listen on port 80 and sit in front of the Apache origin. You will need the following configuration (nginx.conf in /usr/local/nginx/conf):

# same user as Apache
user www-data;

# most sources suggest 1 per core
worker_processes 2;

working_directory /var/www;
error_log /var/log/nginx/error.log;
pid /var/tmp/nginx.pid;

# worker_processes * worker_connections = maxclients
events {
  worker_connections 256;
}

http {
  default_type  application/octet-stream;

  log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                  '$status $body_bytes_sent "$http_referer" '
                  '"$http_user_agent" "$http_x_forwarded_for"';

  access_log /var/log/nginx/access.log main;

  sendfile on;
  tcp_nopush on;
  keepalive_timeout 65;

  log_format cache '***$time_local '
                   '$upstream_cache_status '
                   'Cache-Control: $upstream_http_cache_control '
                   'Expires: $upstream_http_expires '
                   '"$request" ($status) '
                   '"$http_user_agent" ';

  access_log  /var/log/nginx/cache.log cache;

  proxy_buffering on;
  proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=edge-cache:10m inactive=20m max_size=1g;
  proxy_temp_path /var/cache/nginx/tmp;
  proxy_cache_lock on;
  proxy_cache_use_stale updating;
  proxy_bind 0.0.0.0;
  proxy_cache_valid 200 302 10m;
  proxy_cache_valid 301      1h;
  proxy_cache_valid any      1m;

  upstream origin {
    server origin.unified-streaming.com:82;
    keepalive 32;
  }

  server {
    listen 0.0.0.0:80;
    server_name edge.unified-streaming.com;

    location / {
      proxy_pass http://origin;
      proxy_cache edge-cache;

      proxy_http_version 1.1;
      proxy_set_header Connection "";

      add_header X-Cache-Status $upstream_cache_status;
      add_header X-Handled-By $proxy_host;
    }

    location /server-status {
      stub_status on;
    }
  }
}

Schematically:

viewers -> cdns -> nginx:80 -> origin:82
                 [cache+lock]

If all is correct, you now have an edge-origin setup with caching and cache locks through Nginx. Start Nginx as follows:

#!/bin/bash

sudo /usr/local/sbin/nginx

(This assumes you installed Nginx in /usr/local/sbin; see the Nginx documentation.)
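Once Nginx is running, the X-Cache-Status header added in the configuration above shows whether a response came from the cache. The following sketch extracts that header from a set of response headers; the function name cache_status is hypothetical, and the usage comment assumes that curl is installed and that edge.unified-streaming.com resolves to your edge.

```shell
# cache_status: read HTTP response headers on stdin and print the value of
# the first X-Cache-Status header (for example MISS, HIT or UPDATING).
# The function name is hypothetical.
cache_status() {
  tr -d '\r' |
    awk -F': *' '!seen && tolower($1) == "x-cache-status" { print $2; seen = 1 }'
}

# Usage (not executed here): request the same URL twice through the edge.
#   url=http://edge.unified-streaming.com/video/oceans/oceans.ism/Manifest
#   curl -s -o /dev/null -D - "$url" | cache_status
#   curl -s -o /dev/null -D - "$url" | cache_status
```

With the cache working, the first request would normally report MISS and a repeat within the cache validity period would report HIT.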

Additional Nginx settings

To set up the Nginx cache lock properly, you need the following settings.

  • In the upstream section of the nginx.conf add this line:
keepalive {CONNECTIONLIMIT};

where CONNECTIONLIMIT is the maximum you want to set for keepalive.

  • In the location section add these two lines:
proxy_http_version 1.1;
proxy_set_header Connection "";

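With these settings Nginx reuses connections to the origin instead of opening a new one for every request, so far fewer sockets are left in the TIME_WAIT state. When the number of idle keepalive connections exceeds CONNECTIONLIMIT, Nginx closes the least recently used ones.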

Nginx Rate Limiting

Note

This section is an extract from an Nginx Blog

Rate limiting provides the capability to limit the number of requests per user that can be made within a given period of time. Rate limiting can help to prevent DDoS attacks, but most commonly it is used to protect the upstream server (origin) from being overloaded with too many simultaneous requests.

Nginx rate limiting uses the leaky bucket algorithm to process requests based upon a first‑in‑first‑out (FIFO) method. Once the number of requests exceeds the given threshold, the remaining requests are simply discarded and a failure response is returned (a 5xx status, 503 by default), leaving your upstream origin protected.
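The effect of this on a stream of requests can be illustrated with a deliberately simplified model. The sketch below is not Nginx's implementation: it assumes no burst allowance, takes request arrival times in milliseconds, and accepts a request only if at least 1000/rate milliseconds have passed since the last accepted one. The function name simulate_limit and the sample timestamps are hypothetical.

```shell
# simulate_limit RATE_PER_SECOND TIMESTAMP_MS...
# Simplified model of rate limiting without bursts: a request is accepted
# only if it arrives at least 1000/RATE ms after the last accepted request.
# Rejected requests do not move the reference point.
simulate_limit() {
  sl_interval=$(( 1000 / $1 ))
  shift
  sl_last=""
  for sl_ts in "$@"; do
    if [ -z "$sl_last" ] || [ $(( sl_ts - sl_last )) -ge "$sl_interval" ]; then
      echo "$sl_ts accepted"
      sl_last=$sl_ts
    else
      echo "$sl_ts rejected"
    fi
  done
}

# Example: 10 requests per second, five requests arriving at these times (ms).
simulate_limit 10 0 50 100 120 250
```

In this model the requests at 50 ms and 120 ms are rejected because each arrives less than 100 ms after the previously accepted request.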

This can be configured using two main directives: limit_req_zone and limit_req.

# All of the following goes inside the http block.
limit_req_zone $binary_remote_addr zone=mylimit:10m rate=10r/s;

upstream origin {
  server origin.unified-streaming.com:82;
  keepalive 32;
}

server {
  listen 0.0.0.0:80;
  server_name edge.unified-streaming.com;

  location / {
    # Apply the limit defined by the mylimit zone to this location.
    limit_req zone=mylimit;
    proxy_pass http://origin;
    proxy_cache edge-cache;

    proxy_http_version 1.1;
    proxy_set_header Connection "";

    add_header X-Cache-Status $upstream_cache_status;
    add_header X-Handled-By $proxy_host;
  }
}

The limit_req_zone directive defines the parameters for rate limiting while limit_req enables rate limiting within the context where it appears.

The limit_req_zone directive is typically defined in the http block, which allows it to be used in multiple contexts and locations.

It takes the following three parameters:

  • Key – Defines the request characteristic against which the limit is applied. In the example it is the Nginx variable $binary_remote_addr, which holds a binary representation of a client’s IP address. This means we are limiting each unique IP address to the request rate defined by the third parameter.
  • Zone – Defines the shared memory zone used to store the state of each IP address and how often it has accessed a request‑limited URL. Keeping the information in shared memory means it can be shared among the Nginx worker processes. The definition has two parts: the zone name identified by the zone= keyword, and the size following the colon. State information for about 16,000 IP addresses takes 1 megabyte, so our zone can store about 160,000 addresses.
  • Rate – Sets the maximum request rate. In the example, the rate cannot exceed 10 requests per second. Nginx actually tracks requests at millisecond granularity, so this limit corresponds to 1 request every 100 milliseconds (ms). Because we are not allowing for bursts (see Nginx Rate Limiting), this means that a request is rejected if it arrives less than 100ms after the previous permitted one.

The limit_req_zone directive sets the parameters for rate limiting and the shared memory zone, but it does not actually limit the request rate. For that you need to apply the limit to a specific location or server block by including a limit_req directive there. In the example, we are rate limiting requests to /.

So now each unique IP address is limited to 10 requests per second for /, or more precisely, cannot make a request for that URL within 100ms of its previous one.
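The two figures used above (100 ms between requests and about 160,000 addresses) follow from simple arithmetic, sketched here for other rates and zone sizes. The function names are hypothetical, and the figure of roughly 16,000 addresses per megabyte is the approximation quoted above, not an exact value.

```shell
# min_interval_ms RATE_PER_SECOND
# Minimum spacing in milliseconds between accepted requests when no burst
# is allowed (integer division).
min_interval_ms() {
  echo $(( 1000 / $1 ))
}

# zone_capacity SIZE_IN_MEGABYTES
# Approximate number of IP addresses a zone can track, assuming about
# 16,000 addresses per megabyte as stated in the text.
zone_capacity() {
  echo $(( $1 * 16000 ))
}

min_interval_ms 10   # rate=10r/s
zone_capacity 10     # zone=mylimit:10m
```

For the example configuration this prints 100 and 160000.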

Note

For further details and use cases please see the Nginx Blog mentioned earlier and Nginx Rate Limiting documentation.

URLs

A typical browser session (Silverlight) in Firefox would look like this:

[Image: player.png]

The URLs would be:

http://edge.unified-streaming.com/flash.html?file=http://edge.unified-streaming.com/video/oceans/oceans.ism/oceans.f4m

http://edge.unified-streaming.com/silver.html?file=http://edge.unified-streaming.com/video/oceans/oceans.ism/Manifest

The above can be achieved by editing 'index.html' in /var/www/vod to use the desired URL and, when DNS is not used, by mapping the IP address of the edge correctly in your local /etc/hosts.

Debugging

Using Firefox with Firebug, it is possible to see what the edge is doing with its cache:

[Images: flash.png, silverlight.png]

With this it is possible to quickly check the behaviour of the edge(s)/origin.

Log files

The log files generated by the different setups are also of value. Locations of the different log files are as follows.

Apache

/var/log/apache2

Nginx

/var/log/nginx

The config files of each server define the access, error and cache logs. Tools such as Munin can pick these up to produce load graphs.
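For a quick look at cache behaviour without extra tooling, the cache log defined in the Nginx configuration above can be summarised from the shell. The sketch below assumes the 'cache' log format exactly as configured in this tutorial, where $time_local expands to a timestamp followed by a space and a time zone offset, so the cache status is the third whitespace-separated field. The function name cache_summary is hypothetical.

```shell
# cache_summary [LOG_FILE...]
# Count log lines per $upstream_cache_status value (HIT, MISS, UPDATING, ...).
# Assumes the 'cache' log_format from this tutorial, in which the status is
# the third whitespace-separated field. Reads stdin when no file is given.
cache_summary() {
  awk '{ count[$3]++ } END { for (s in count) print s, count[s] }' "$@" | sort
}

# Usage (not executed here):
#   cache_summary /var/log/nginx/cache.log
```

A high share of HIT lines relative to MISS suggests the edge is absorbing most of the load before it reaches the origin.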

Load testing

For load testing, various tools can be used, ranging from Apache Bench (AB) to Weighttp to custom-built setups.