Unified Origin - Recommendations for VOD¶
In VOD workflows, Unified Origin is placed between the storage and a Content Delivery Network (CDN). To ensure robustness and increase performance, we recommend a shield cache between Origin and the CDN, and a reverse proxy cache between Origin and the storage:
player --> cdn --> unified origin --> (remote) storage
Building a large scale VOD platform based on Unified Origin presents several challenges:
- Storing large collections of media content in a reliable, secure and cost effective way
- Making these stored collections available to Origin in a secure and reliable manner, with sufficiently low latency and high throughput
- Preparing the media content and configuring Origin in a way that ensures that all required output formats are delivered according to their specification (e.g., DASH, HLS and Smooth)
- Scaling Origin and the infrastructure around it so that it can meet the demand of the expected maximum of concurrent users (and beyond)
- Securing Origin to protect your content and the availability of your service
- Integrating one or multiple CDNs to ensure efficient and timely delivery at high volumes and throughout different regions
- Differentiating between the most popular content and your 'long tail' content to achieve the best performance at scale while offering customers access to a vast collection of media content
Most VOD services rely on a vast library of media content to offer their customers a wide variety of assets to stream. Storing such a library in a secure, reliable and cost-effective manner is as challenging as it is important. Among our customers, object-based cloud storage is a popular solution, and we believe it to be a good choice for a number of reasons:
- Scale as you go, with virtually unlimited storage capacity
- High durability and fault tolerance (e.g., 'eleven nines' of durability for AWS S3), making the risk of losing media assets to disk corruption virtually non-existent
- Fast access and well-defined APIs, as well as bandwidth rate limiting
- Relatively inexpensive
- Built-in security and authentication mechanisms
- When Origin runs in the same cloud environment, traffic between Origin and the storage is often free of charge
Origin can work with different object-based cloud storage solutions such as AWS S3 and OpenStack Swift.
Other storage methods can be used as well, but at this point we have no concrete guidelines defined for them. In case you have any questions, please contact us.
Naming of the content can be important as well, as object key names may determine how content is partitioned across the underlying infrastructure of a remote cloud storage environment.
Put caching proxy between remote storage and Origin: cache dref MP4s and ISMs¶
Remote object-based storage generally offers high throughput, but relatively high latency as well. That's why you want to decrease the number of round trips between the remote storage and Origin as much as possible. This can be done by putting a reverse caching proxy between the remote storage and Origin:
player --> cdn --> unified origin --> cache proxy --> (cloud) storage
For every request that Origin needs to serve, it reads the server manifest first and after that the index of the relevant media. Only after it has read the index does it know where in the source it should look for the media samples it needs. This means that several round trips happen before Origin fetches any media data.
To make the most efficient use of this caching proxy, we recommend a specific setup for your stream: instead of creating a server manifest that references your source content directly, dref MP4s should be used as an intermediary. These dref MP4s contain all of the index information from the source files, but none of the media data, making them very small and ideal for caching.
The setup that we recommend creates dref MP4s for all source content first, then uses these dref MP4s to create the server manifests (ISMs) and finally stores the source content, dref MP4s and ISMs alongside each other on the remote storage. Then, the caching proxy is configured such that it will cache the dref MP4s and the ISMs.
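The caching rule implied by this setup can be sketched in a few lines. This is a minimal illustration, assuming a naming convention where dref MP4s end in `_dref.mp4` and server manifests end in `.ism` (adapt the suffixes to your own layout); in practice the rule would live in your proxy's configuration rather than in application code:

```python
# Sketch: which upstream requests should the caching proxy cache?
# Assumption: dref MP4s are named "*_dref.mp4" and server manifests "*.ism".
# Small index/manifest objects are cached; bulky media data is passed through.

CACHEABLE_SUFFIXES = (".ism", "_dref.mp4")

def is_cacheable(path: str) -> bool:
    """Return True for dref MP4s and server manifests, False for media data."""
    return path.lower().endswith(CACHEABLE_SUFFIXES)
```

With this rule, a request for `/vod/tears_dref.mp4` or `/vod/tears.ism` is served from the cache after the first fetch, while requests for the media data itself always go through to the storage.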
Please refer to Object Storage Reducing Latency to learn more about this setup, and how you can configure an NGINX-based caching proxy to make use of it.
The setup with the cloud storage and Unified Origin for VOD can be extended with an intermediate proxy.
Securing access to your remote storage¶
Another important aspect of cloud storage is securing access to it. We recommend restricting access to your remote storage by requiring requests to be signed with the proper signature, or by only allowing requests from the virtual private cloud that Origin is running in (e.g., by making use of AWS's VPC Endpoint functionality, as described in our blog How to deploy a redundant VOD setup on AWS with Unified Origin).
The advantage of the latter option is that it avoids the overhead of the additional TLS handshakes that signed requests require, and there is no need to share security credentials with the modules that sign the requests.
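The principle behind signed requests can be illustrated with a short, self-contained sketch. Note that this is not the actual AWS Signature Version 4 algorithm (which canonicalizes headers and derives scoped signing keys); it only demonstrates the idea that the storage can verify a request using a shared secret that is never sent over the wire:

```python
# Simplified sketch of request signing for storage access control.
# NOT the real AWS SigV4 algorithm; illustrative only.
import hashlib
import hmac
import time

def sign_request(method: str, path: str, secret_key: str,
                 expires_in: int = 3600) -> dict:
    """Produce a signed request: the signature covers method, path and expiry."""
    expiry = int(time.time()) + expires_in
    message = f"{method}\n{path}\n{expiry}".encode()
    signature = hmac.new(secret_key.encode(), message, hashlib.sha256).hexdigest()
    return {"path": path, "expires": expiry, "signature": signature}

def verify_request(method: str, signed: dict, secret_key: str) -> bool:
    """Storage-side check: recompute the signature and reject expired requests."""
    if signed["expires"] < time.time():
        return False
    message = f"{method}\n{signed['path']}\n{signed['expires']}".encode()
    expected = hmac.new(secret_key.encode(), message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["signature"])
```

Because the signature also covers the expiry time, a leaked URL only grants access for a limited window, and tampering with any signed field invalidates the request.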
For VOD, when working with large enough content libraries caching all content becomes unfeasible. This means that the number of requests for relatively unpopular content will determine the load on Origin, as these are the requests that will hit Origin directly.
As a correctly configured Unified Origin instance is bound by I/O rather than CPU or memory, a simple approach to determine when to scale is to consider the outgoing link bandwidth. Given an instance with a maximum link of 10 Gb/s or 1 Gb/s, and assuming 2 Mbit/s streams, the maximum number of concurrent users would be 5000 or 500 respectively, if each of these users requested a different video stream.
These numbers do not take the link to the remote storage into consideration. As that link may reduce the final maximum throughput by 10-20 percent, 4000 or 400 would be a more realistic number for a 10 Gb/s or 1 Gb/s link, still assuming 2 Mbit/s streams.
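This back-of-the-envelope calculation can be captured in a few lines. The 2 Mbit/s stream bitrate and the 10-20 percent storage-link overhead are the assumptions from this section; substitute your own measurements:

```python
# Worst-case Origin capacity estimate: every user requests a different
# (uncached) stream, so the outgoing link is the bottleneck.

def max_concurrent_users(link_gbps: float, stream_mbps: float = 2.0,
                         storage_overhead: float = 0.2) -> int:
    """Concurrent users one Origin instance can serve at a given bitrate,
    discounting the throughput lost to the remote storage link."""
    effective_mbps = link_gbps * 1000 * (1 - storage_overhead)
    return int(effective_mbps // stream_mbps)
```

For example, `max_concurrent_users(10)` yields 4000 and `max_concurrent_users(1)` yields 400, matching the figures above; with `storage_overhead=0.0` the theoretical maximums of 5000 and 500 come back.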
When the number of concurrent users approaches the theoretical limit calculated above, additional cloud instances or containers running Origin can be spun up, with a load balancer distributing the requests across them.
In some cases other factors may degrade the performance of Unified Origin. For example, in virtualized environments the interrupt load generated by many concurrent requests may cause frequent VM exits in the hypervisor.
Scaling an on-premises deployment may also be considered, for example using bare metal servers with optimized NICs.
In addition, correctly configuring and tuning the Apache web server that Origin runs on is important for scalability.
Load balancing across different Origins¶
In a multi-Origin setup, a load balancer is an important component: it distributes incoming requests across the available Origin instances, providing both scalability and redundancy.
In VOD scenarios, as opposed to Live, it is often unfeasible to let CDNs cache all content in all relevant output formats from Origin for three reasons:
- Content libraries are too large, thus the costs involved would be very high
- VOD content is often made available for a long period of time, thus, again, the costs involved would be very high
- Most requests are for a small portion of highly popular content
Considering this, it is important to identify your most popular content and differentiate it from the large portion of content that gets requested much less frequently. This large portion of less popular content is often called the 'long tail'.
For example: your most popular content may be responsible for 80% of requests while representing only 20% of your content, with the other 80% of your content generating only 20% of the requests. In a graph showing the distribution of popularity of your content, that 80% would be represented as the 'long tail' of the distribution.
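As a minimal sketch, the head/tail split can be computed from per-asset request counts. The numbers in the usage example are purely illustrative:

```python
# Given per-asset request counts, find the smallest fraction of the
# catalogue ('head') that accounts for a target share of all requests;
# everything else is the 'long tail'.

def head_fraction(request_counts, target_share=0.8):
    """Fraction of assets (most popular first) covering target_share of requests."""
    counts = sorted(request_counts, reverse=True)
    total = sum(counts)
    running = 0
    for i, count in enumerate(counts, start=1):
        running += count
        if running >= target_share * total:
            return i / len(counts)
    return 1.0
```

With a catalogue of 100 assets where 20 popular titles get 40 requests each and the remaining 80 get 2 each, `head_fraction(counts)` returns 0.2: the top 20% of assets covers 80% of the requests, as in the example above.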
Differentiating between the most popular content and the long tail is not always easy. In addition, you need to take into account that the distribution of popularity may change over time: certain content from the long tail can become popular and vice versa.
In general, you may want to approach this topic in two ways:
- Ensure that content that you believe will be popular is cached on the CDN before making it available (this can be done by using a tool like Unified Capture to 'pull it through' the CDN).
- Consider storing your long tail content on slower, cheaper storage to save costs.
In general we recommend only the first approach, although the second may be warranted in certain setups as well.
- Use Docker to quickly deploy Your own Video on Demand demo
- Check our tutorial on Getting started with VOD
- Read our blog on How to deploy a redundant VOD setup on AWS with Unified Origin
- See all of our features in action on Unified Streaming Demo
- Read our Unified Origin - VOD documentation to learn more
- Check our Troubleshooting VOD Streaming guide when you run into a problem
- Sharpen your skills through our training program and get Unified Certified