Building an RTMP re-streaming cluster with NGINX using Heat autoscaling for our Christmas tree project

OpenStack

About a week ago we were discussing how to handle turning the monitoring displays on and off every day. This is a boring, repetitive task, so we all agreed it should be automated as soon as possible.

I quickly scavenged an old Raspberry Pi that would do the trick. Some more colleagues joined in and we started expanding the scope. Everyone brought Christmas decorations, a power distribution unit, home automation sets and a lot of Christmas lights. The next evening our manager got wind of this and immediately provided a purchase order to go and buy a Christmas tree. I bought it the same evening and set it up the next morning. Project Christmas was born!

The OPS team manager, Rosco Nap, built an API in Go that sends SNMP commands to the PDU. Bjarn Bronsveld, our trainee, made a first version of the Twitter bot in NodeJS within minutes, so the tree would respond to tweets containing the word cloud (which is a lot of tweets, by the way…). It was then that we realised that we should share this idea with the rest of the world via a video stream. Of course this has been done many times before, but who cares. It’s fun to do, right?

Requirements

Since we like to think big (and we have near-unlimited resources at our disposal) I decided we needed to build our own re-streaming cluster.
Some of the requirements were:

  • Most importantly, it runs in our office, so it must be secure, even if it is in a separate network.
      • We can’t expose our internal Mac mini that is capturing the video
          • The Mac mini will push the stream to an external server that will redistribute the stream
      • We don’t want to expose the Go API externally
          • That’s where the NodeJS Twitter bot comes in
  • It should be able to handle massive volumes of visitors, just in case.
  • Goes without saying, but it should be highly available
  • It should scale automatically depending on load
  • It should contain a web interface for easy playback

The design

We would not want to initiate a project without a proper design right?
(Don’t worry, we did not write a PID)

Alongside the cluster we run four main components:

  • The NodeJS Twitter bot, monitoring Twitter for the selected hashtags, mentions or keywords
  • The Go API, controlling the PDU power outlets via SNMP
  • Open Broadcaster Software on a Mac, streaming to our NGINX streaming backend
  • The NGINX streaming backend, which accepts the published stream from our office, pushes it to YouTube and allows playback from the cluster

We have chosen to use two floating IPs on two separate LBaaS instances. We include them both in the DNS record to get a simple form of HA via DNS.
This is because Heat updates can be somewhat disruptive if not done carefully. The LBaaS instances always run HA at CloudVPS without you having to do anything. If you delete or misconfigure one, however, the other is also deleted or misconfigured…

The same problem arises with the autoscaling groups (groups of VPSes). If you misconfigure them with Heat, it all goes, uhm, down.

Building the cluster with Heat

So now we had a backend we could pull the stream from. All we needed was a bunch of instances that could handle the frontend and re-stream the RTMP stream. All the scripts we used for the autoscaling cluster are available on our GitHub.

Unattended install of instances with a user data script

We started by creating a user data script for OpenStack. Basically it is a bash script that runs the first time an instance is started.
That way our instances are stateless and do not require any manual configuration. This is a requirement for the autoscaling part:
if you had to configure each instance manually, autoscaling would not work.

If you look at our user data script you will see that we go through a couple of steps (a rough sketch follows below):

  • Updating the apt cache and installing the required packages
  • Downloading, extracting and building a version of NGINX that includes the RTMP module
  • Downloading the frontend HTML. Ours is pretty simple, but you could of course pull your app from git.
  • Creating the NGINX config file
  • Restarting NGINX to apply the config
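
To give an idea, below is a minimal sketch of how such a user data script can be embedded in a Heat server resource. The real script is in our GitHub repo; the package names, NGINX version and paths here are assumptions for illustration only.

```yaml
resources:
  app_server:
    type: OS::Nova::Server
    properties:
      image: { get_param: image }
      flavor: { get_param: flavor }
      key_name: { get_param: key_name }
      user_data_format: RAW
      user_data: |
        #!/bin/bash
        set -e
        # 1. Update the apt cache and install the build dependencies
        apt-get update
        apt-get install -y build-essential libpcre3-dev libssl-dev zlib1g-dev git wget
        # 2. Download, extract and build NGINX with the RTMP module
        cd /usr/local/src
        wget -q http://nginx.org/download/nginx-1.9.9.tar.gz && tar xzf nginx-1.9.9.tar.gz
        git clone https://github.com/arut/nginx-rtmp-module.git
        cd nginx-1.9.9
        ./configure --add-module=../nginx-rtmp-module
        make && make install
        # 3. Download the frontend HTML (replace with your own app or a git pull)
        mkdir -p /usr/local/nginx/html
        # 4. Write the NGINX config file, including an rtmp {} block (omitted here for brevity)
        # 5. (Re)start NGINX to apply the config
        /usr/local/nginx/sbin/nginx
```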

Defining Heat resources

In 00_registry.yml we defined the resources that we call on in 01_master.yaml. This includes defining the LBaaS resource as well as two types of instances.

One instance type has a floating IP attached and is used for our deploy server. The deploy server is only used as a troubleshooting tool, but it could also be used to deploy new commits of your application to the app servers.
The other type includes the app instances in an LBaaS pool. This makes sure that LBaaS knows where it can forward traffic to as the number of instances scales.
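
As a rough sketch, such a registry is simply a Heat environment file that maps our own resource type names to nested templates (the type and file names below are assumptions, the real ones are in the repo):

```yaml
# 00_registry.yml: map custom resource types to nested templates
resource_registry:
  CloudVPS::Restream::AppServer: app_server.yaml        # instance that joins the LBaaS pools
  CloudVPS::Restream::DeployServer: deploy_server.yaml  # instance with a floating IP attached
  CloudVPS::Restream::LoadBalancer: loadbalancer.yaml   # LBaaS pool, VIP and floating IP
```

The app server template then contains an OS::Neutron::PoolMember resource per pool, so every new instance registers itself with the loadbalancers automatically.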

Putting it all together

Finally we put it all together in 01_master.yaml. We started off by defining all the parameters needed to deploy the autoscaling cluster. We have defined some defaults, which might not work for you. Don’t worry, you can also modify them when running the command. The parameters all have descriptions, or are pretty much self-explanatory.
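
A parameter definition in the master template looks roughly like this (the names and defaults below are just an illustration of the pattern, not our exact values):

```yaml
parameters:
  image:
    type: string
    description: Image used for all instances
    default: Ubuntu 14.04 LTS      # assumed default, override it on deploy if needed
  flavor:
    type: string
    description: Flavor used for the app servers
    default: Standard 1GB
  key_name:
    type: string
    description: Name of the SSH keypair injected into the instances
```

Any of these can be overridden at deploy time with the -P/--parameters option of heat stack-create.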

After that it is time to actually create resources. We started off by creating the security groups that only allow the necessary ports.
We chose not to further secure the internal network. If your data is more sensitive than a publicly available RTMP stream, you might want to add more security groups.
Next are the internal network, subnet and router, so all instances can connect to each other internally. All instances can then be reached via the LBaaS instances, which we create next. We use the same resource type we defined earlier, with different parameters.
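
Trimmed down, that part of the template is a handful of standard Neutron resources. A sketch (the CIDR, rule set and public_net parameter are assumptions):

```yaml
resources:
  web_security_group:
    type: OS::Neutron::SecurityGroup
    properties:
      description: Only allow the ports the re-streamers need
      rules:
        - { protocol: tcp, port_range_min: 80,   port_range_max: 80 }    # HTTP
        - { protocol: tcp, port_range_min: 443,  port_range_max: 443 }   # HTTPS
        - { protocol: tcp, port_range_min: 1935, port_range_max: 1935 }  # RTMP

  internal_net:
    type: OS::Neutron::Net

  internal_subnet:
    type: OS::Neutron::Subnet
    properties:
      network_id: { get_resource: internal_net }
      cidr: 10.0.0.0/24

  router:
    type: OS::Neutron::Router
    properties:
      external_gateway_info: { network: { get_param: public_net } }

  router_interface:
    type: OS::Neutron::RouterInterface
    properties:
      router_id: { get_resource: router }
      subnet_id: { get_resource: internal_subnet }
```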

Now that we have security groups, a network and loadbalancers, it is time to create the actual instances. We create the instances by nesting them in an AutoScalingGroup and defining the common parameters. We specify which port should be added to which LBaaS pool. This ensures that every instance started via the AutoScalingGroup gets its HTTP, HTTPS and RTMP ports added to the LBaaS instances. OpenStack takes care of all the boring stuff like naming and IP numbering.
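
A sketch of that nesting, using the app server type registered in 00_registry.yml (the type name, pool wiring and group sizes are assumptions based on the description above):

```yaml
resources:
  app_server_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 2
      max_size: 10
      resource:
        type: CloudVPS::Restream::AppServer    # nested template that creates the server and its pool members
        properties:
          image: { get_param: image }
          flavor: { get_param: flavor }
          key_name: { get_param: key_name }
          network: { get_resource: internal_net }
          security_group: { get_resource: web_security_group }
          http_pool_id: { get_attr: [loadbalancer_1, http_pool_id] }    # pool IDs exposed by the LBaaS template
          https_pool_id: { get_attr: [loadbalancer_1, https_pool_id] }
          rtmp_pool_id: { get_attr: [loadbalancer_1, rtmp_pool_id] }
          # tag the instances so the Ceilometer alarms can group them per stack
          metadata: { "metering.stack": { get_param: "OS::stack_id" } }
```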

Without adding anything else, OpenStack Heat would create a set of instances that would never scale up or down. For that we need to add a scale-up and a scale-down policy. In this cluster we have defined that scaling up or down adds or removes one instance at a time.
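
In Heat those policies are plain OS::Heat::ScalingPolicy resources, roughly like this (the cooldown value is an assumption):

```yaml
resources:
  scaleup_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: { get_resource: app_server_group }
      cooldown: 300              # wait 5 minutes before scaling again
      scaling_adjustment: 1      # add one instance at a time

  scaledown_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: { get_resource: app_server_group }
      cooldown: 300
      scaling_adjustment: -1     # remove one instance at a time
```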

The policy is triggered by a Ceilometer alarm (Ceilometer is the OpenStack metering component). The alarm we defined is a CPU utilisation alarm: if the utilisation reaches 80%, an additional instance is spawned; if it drops below 15%, one is removed.
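
The alarms that pull the trigger look roughly like this, following the standard Heat autoscaling examples (the periods and metadata query are assumptions; the thresholds match what we described above):

```yaml
resources:
  cpu_alarm_high:
    type: OS::Ceilometer::Alarm
    properties:
      description: Scale up if the average CPU utilisation is above 80%
      meter_name: cpu_util
      statistic: avg
      period: 300
      evaluation_periods: 1
      threshold: 80
      comparison_operator: gt
      alarm_actions:
        - { get_attr: [scaleup_policy, alarm_url] }
      matching_metadata: { 'metadata.user_metadata.stack': { get_param: "OS::stack_id" } }

  cpu_alarm_low:
    type: OS::Ceilometer::Alarm
    properties:
      description: Scale down if the average CPU utilisation is below 15%
      meter_name: cpu_util
      statistic: avg
      period: 600
      evaluation_periods: 1
      threshold: 15
      comparison_operator: lt
      alarm_actions:
        - { get_attr: [scaledown_policy, alarm_url] }
      matching_metadata: { 'metadata.user_metadata.stack': { get_param: "OS::stack_id" } }
```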

We have not loadtested this yet, so all of you visiting will have to prove whether this will be sufficient for our workload ;)

Finally we deploy a single instance with a floating IP that functions as a deploy server and as an entry point for us to debug the autoscaling cluster.
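
In Heat that boils down to a port, a server and a floating IP tied together, something like this (resource names are illustrative):

```yaml
resources:
  deploy_port:
    type: OS::Neutron::Port
    properties:
      network: { get_resource: internal_net }
      security_groups: [ { get_resource: web_security_group } ]

  deploy_server:
    type: OS::Nova::Server
    properties:
      image: { get_param: image }
      flavor: { get_param: flavor }
      key_name: { get_param: key_name }
      networks: [ { port: { get_resource: deploy_port } } ]

  deploy_floating_ip:
    type: OS::Neutron::FloatingIP
    properties:
      floating_network: { get_param: public_net }
      port_id: { get_resource: deploy_port }
```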

Challenges we encountered

In particular, integrating the RTMP stream into a web interface (without Flash, of course) has proven to be quite a challenge.
RTMP does not play nicely with HTML5. Our colleague Huib managed to set up a transcoder to HLS and integrate that into an HTML5 page. Transcoding to HLS introduces a lot of delay due to the design of HLS, which buffers the video stream into downloadable chunks.

Besides the delay, we noticed that a lot of (mobile) browsers did not fully support the HTML5 video tag.
So, since we are ops guys, we decided not to try and build that ourselves. Instead we pushed the stream to YouTube and integrated their player.

We did keep the RTMP streaming cluster, because we can. Also, the NGINX instances host the web app with the YouTube stream.

Wannahaves for version 2.0

  • Make the API and the Twitter bot run on an HA setup. The API can easily be made HA with two Pis and keepalived. The Twitter bot would be more of a challenge, since it would trigger twice if we ran it twice. This will require something like Corosync/Pacemaker
  • It should include an SSL cert (shame on me)
  • The resources should be divided over availability zones (it is included, but does not yet work)
  • Add IPv6 support
  • Load test with JMeter or Tsung

How can we see this amazing Christmas tree?

Go to http://kerstcloud.nl, or open rtmp://kerstcloud.nl/live/cloudvpstree as a network stream in your favorite player, like VLC, for even lower latency!

CloudVPS wishes you all a Merry Christmas and a Happy New Year!

Cees Moerkerken

P.S. To this day, the IR transmitter for the TV still doesn’t work…