Panda Kaltura AWS Cluster – the Elastic Compute Cloud
The Amazon Elastic Compute Cloud (EC2) is a web service that provides resizable computing capacity in the cloud. You pay only for what you use, and you can scale capacity up or down automatically.
Instances can be launched from an Amazon Machine Image (AMI). An AMI contains both software (the operating system and other applications) and configuration. You can launch many instances from one AMI. Amazon and third parties provide many AMIs prebuilt for specific needs such as media streaming (Wowza Media Server), database servers, application servers, and more. You can also create an AMI from your own instances. An AMI is associated with your account, can be copied between regions, and you can control its visibility.
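Launching instances from an AMI can be sketched with a small boto3-style helper. This is a hypothetical sketch, not the tooling from this post: the function name, the placeholder AMI id, and the injected `ec2` client (in practice `boto3.client("ec2")`) are all my own, chosen so the sketch stays self-contained.

```python
def launch_from_ami(ec2, ami_id, instance_type="m1.xlarge", count=1):
    """Launch `count` instances from an AMI.

    `ec2` is expected to behave like boto3.client("ec2"); passing it in
    keeps this sketch free of AWS credentials.
    """
    response = ec2.run_instances(
        ImageId=ami_id,
        InstanceType=instance_type,
        MinCount=count,
        MaxCount=count,
    )
    # Return the new instance ids so callers can tag or track them.
    return [i["InstanceId"] for i in response["Instances"]]
```

With a real client, the returned ids could then be tagged per role (front, batch, and so on).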
Kaltura instances deployment process
We described the installation process of our Kaltura Cluster in this post. We launch all our instances from a template instance which contains the Kaltura code along with installation and maintenance scripts. We create the template instance from the template AMI and then use it to deploy all the other instances with a web tool we developed. At Panda OS we use EC2 for our Kaltura Cluster. Below you will find a description of the instance and instance storage types we use, as well as general considerations on using the EC2 service.
Amazon offers three storage options:
- Amazon Elastic Block Store (EBS)
- Amazon EC2 Instance Store
- Amazon Simple Storage Service (S3)
Which storage option you choose for your instances has implications for cost, backup, and flexibility. Usually a combination of storage options is needed. Below I describe the storage configuration we use on the Panda Kaltura AWS Cluster.
EBS volumes are recommended over instance store since they can be managed independently of the instance they are connected to. We use EBS volumes as local storage for most instances in the Panda Kaltura cluster.
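Creating an EBS volume and attaching it to an instance can be sketched in the same boto3 style. Again this is an assumed helper, not the post's tooling; the device name and availability zone are placeholders, and the client is injected.

```python
def attach_new_volume(ec2, instance_id, availability_zone, size_gb,
                      device="/dev/sdf"):
    """Create an EBS volume, wait until it is available, then attach it.

    `ec2` is expected to behave like boto3.client("ec2").
    """
    vol = ec2.create_volume(AvailabilityZone=availability_zone, Size=size_gb)
    volume_id = vol["VolumeId"]
    # The volume must leave the "creating" state before it can be attached.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id,
                      Device=device)
    return volume_id
```

Because the volume has a lifecycle of its own, it survives the instance (unless deleted on termination) and can later be detached and attached elsewhere.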
The EC2 instance store provides instances with temporary storage that is physically attached to the host computer. Data in the instance store does not persist when the associated instance is stopped or terminated, so it is a good fit for instances that only need temporary, low-cost storage, such as transcoding servers.
We use S3 to store the various flavors of the videos and as a source for the Cloudfront CDN. We will discuss S3 in a future post.
Instances in the Panda Kaltura AWS Cluster
All available instance types can be found here. This is the recommended setup; if you want to lower costs, several roles can be combined on one instance. For example, all the batch roles can reside on a single instance. Another great feature of EC2 is that resizing an instance takes minutes: you stop the instance, change its instance type, and start it again. This can also be automated with the Auto Scaling tool so that instances are added or removed according to load.
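The stop, resize, start cycle can be sketched as follows. This is a hedged boto3-style sketch of the workflow described above, not our actual script; it only applies to EBS-backed instances, and the `ec2` client is injected.

```python
def resize_instance(ec2, instance_id, new_type):
    """Stop an EBS-backed instance, change its type, and start it again.

    `ec2` is expected to behave like boto3.client("ec2").
    """
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    # The instance type can only be changed while the instance is stopped.
    ec2.modify_instance_attribute(InstanceId=instance_id,
                                  InstanceType={"Value": new_type})
    ec2.start_instances(InstanceIds=[instance_id])
```

Note the waiter between stop and modify: the type change is rejected while the instance is still in the stopping state.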
Front instances
The main front instance holds the data warehouse, the Kaltura API, the Admin Console and KMC, and the Sphinx indexing service. We usually use an m1.xlarge instance with a 40GB EBS volume. Additional fronts are launched as needed with the same specification. All front servers are behind a load balancer.
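Putting a newly launched front behind the balancer can be sketched with the classic Elastic Load Balancing API (boto3 "elb" client style). The helper and the balancer name in the test are hypothetical; the client is injected.

```python
def register_fronts(elb, load_balancer_name, instance_ids):
    """Register front instances with a classic Elastic Load Balancer.

    `elb` is expected to behave like boto3.client("elb").
    """
    elb.register_instances_with_load_balancer(
        LoadBalancerName=load_balancer_name,
        Instances=[{"InstanceId": i} for i in instance_ids],
    )
```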
There are several roles for batch servers, which can be split across individual servers or combined, depending on usage.
- Transcoding batch – transcodes entries. It is important that the transcoding batch has multiple cores so that several workers can transcode the flavors of an entry simultaneously. We use a high-CPU instance for this role.
- Rest batch – handles all the other batch tasks (email sending, etc.). We use an m1.medium instance for the rest batch.
- Drop folder batch – usually not necessary, handles fetching videos from drop folders.
We use master and slave database instances. The master is an m1.large with a 50GB volume, and the slave is an m1.medium with a 50GB volume.
The NFS instance holds the Kaltura logs and web directory. We use an m1.large with an 80GB EBS volume.
Wowza / Red5 instances
A Wowza instance used for live streaming and VOD can be purchased from the AWS Marketplace. Red5 can be installed easily on a standard EC2 instance.
We use a single ElastiCache node with the cache.m1.small node type.
Cost and cost reduction will be discussed in detail in a separate post. A medium setup includes:
- Front instance
- Template instance
- Database instance
- NFS instance
- Memcache node
- Two batch servers
- 3TB of CloudFront bandwidth
- 1TB of S3 storage
- 1TB of Glacier storage
- Load balancers, Elastic IPs and some API requests
Some general considerations on using the EC2 service:
- By default, you can run up to 20 instances and have 40 instances across all states (you can request a limit increase).
- You can stop instances that use Amazon EBS volumes as their root device. When an instance is stopped you do not pay for hourly usage or data transfer, but you are billed for the EBS volume usage.
- Each time you transition an instance from stopped to started, you are charged for a full instance hour.
- An instance does not keep its public IP address across a stop and start, and outside a VPC it loses its private IP address as well; an Elastic IP keeps the public address stable.
- When an instance is stopped you can attach or detach EBS volumes, and create AMIs from the instance.
- Terminating an instance also deletes any attached EBS volumes whose DeleteOnTermination flag is set (the default for the root volume).
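Two of the notes above combine naturally: since a stopped EBS-backed instance can be imaged, a backup step can be sketched as stop, wait, create image. As before, this is an assumed boto3-style helper with an injected client, not the post's actual workflow.

```python
def image_from_instance(ec2, instance_id, name):
    """Stop an EBS-backed instance, then create an AMI from it.

    Imaging a stopped instance avoids filesystem inconsistencies in the
    snapshot. `ec2` is expected to behave like boto3.client("ec2").
    """
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    image = ec2.create_image(InstanceId=instance_id, Name=name)
    return image["ImageId"]
```

This is essentially how a template AMI like the one described earlier can be refreshed after maintenance on the template instance.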
In the next post we will describe our private cloud setup which covers the Virtual Private Cloud, Security Groups, Elastic IPs and Load Balancers.