Building a Custom Terraform Provider for Wavefront

At Space Ape we’re increasingly turning to Golang for creating tools and utilities, for example – De-comming EC2 Instances With Serverless and Go. Inevitably we’ll need to interact with our metric provider – Wavefront. To this end, our colleague Louis has been working on a Go client for interacting with the Wavefront API, which allows us to query Wavefront and create resources such as Alerts and Dashboards. Up until now, we’ve been configuring these components by hand, which worries us – what happens if they disappear or are changed? How do we revert to a known good version or restore a lost Dashboard?

We were set to start creating our own tool for managing Wavefront resources, but as luck would have it Hashicorp released version 0.10.0 of Terraform which splits providers out from the main Terraform code base and allows you to load custom (not managed by Hashicorp) providers without recompiling Terraform.

So we set about creating a custom provider and have so far implemented Alerts, Alert Targets and Dashboards and fully intend to continue adding functionality to both the SDK and the provider in the future.

Now creating an Alert is as simple as:

resource "wavefront_alert" "a_terraform_managed_alert" {
 name                   = "Terraform Managed Alert"
 target                 = "test@example.com"
 condition              = "ts()"
 display_expression     = "ts()"
 minutes                = 4
 resolve_after_minutes  = 4
 additional_information = "This alert is triggered because..."
 severity               = "WARN"

 tags = [
   "terraform",
 ]
}

You can find the latest released version, complete with binary here.

Creating our own provider for Wavefront means that we get all the benefits of Terraform; resource graphs, plans, state, versioning and locking with just a little bit of effort required by us. Hashicorp has made a number of helper methods which means that writing and testing the provider is relatively simple.

Another benefit to writing a provider is that we can use the import functionality of Terraform to import our existing resource into state. Hopefully, Hashicorp will improve this to generate Terraform code soon, in the meantime, it shouldn’t be too difficult to script turning a state file (JSON) into Terraforms HCL.

Using the Provider

Terraform is clever enough to go and fetch the officially supported providers for you when you run terraform init. Unfortunately, with custom providers, it’s a little bit more complicated. You need to build the binary (We upload the compiled binary with each of our releases) and place it in the ~/.terraform.d/plugins/darwin_amd64/ (or equivalent for your system). Now when we run terraform init it will be able to find the plugin. After this the setup is pretty simple:

provider "wavefront" {
 address = "foo.wavefront.com"
 token   = "wavefront_token"
}

You can export the address and token as an environment variable (WAVEFRONT_ADDRESS and WAVEFRONT_TOKEN respectively) to avoid committing them to Source Control (We highly recommend you do this for the token!).

Writing your own Provider

If you fancy having a go at writing your own provider then this blog post by Hashicorp is a good way to get started. I’d also recommend taking a look at the Hashicorp supported providers and using them as a reference when writing your own.

 

How to load test a realtime multiplayer mobile game with AWS Lambda and Akka

Tencent’s Kings of Glory is one of the top grossing games worldwide in 2017 so far.

Over the last 12 months, we have seen a number of team-based multiplayer games hit the market as companies look to replicate the success of Tencent’s King of Glory (known as Arena of Valor in the west) which is one of the top grossing games in the world in 2017.

Even our partners Supercell has recently dipped into the genre with Brawl Stars, which offers a different take on the traditional MOBA (Multiplayer-Online-Battle-Arena) formula.

Supercell’s Brawl Stars offers a different experience to the traditional MOBA format, it is built with mobile in mind and prefers simple controls & maps, as well as shorter matches.

Here at Space Ape Games, we have been exploring ideas for a competitive multiplayer game, which is still in prototype so I can’t talk about it here. However, I can talk about how we use AWS Lambda to load test our homegrown networking stack.

Why Lambda?

The traditional approach of using EC2 servers to drive the load testing has several problems:

  • slow to start : any sizeable load test would require many EC2 instances to generate the desired load. Since it costs you to keep these EC2 instances around, it’s likely that you’ll only spawn them when you need to run a load test. Which means there’s a 10–15 mins lead time before every test just to wait for the EC2 instances to be ready.
  • wastage : when the load test is short-lived (say, < 1 hour) you can incur a lot of wastage because EC2 instances are billed by the hour with a minimum charge for one hour (per-second billing is coming to non-Windows EC2 instances in Oct 2017, which would address this problem).
  • hard to deploy updates : to update the load test code itself (perhaps to introduce new behaviours to bot players), you need to invest in the infrastructure for updating the load test code on the running EC2 instances. Whilst this doesn’t have to be difficult, after all, you probably already have a similar infrastructure in place for your game servers. Nonetheless, it’s yet another distraction that I would happily avoid.

AWS Lambda addresses all of these problems.

It does introduce its own limitations — especially the 5 min execution time limit. However, as I have written before, you can work around this limit by writing your Lambda function as a recursive function and taking advantage of container reuse to persist local state from one invocation to the next.

I’m a big fan of the work the Nordstrom guys have done with the serverless-artillery project. Unfortunately we’re not able to use it here because the game (the client app written in Unity3D) converses with the multiplayer server in a custom protocol via TCP, and in the future that conversation would happen over Reliable UDP too.

Akka

Our multiplayer server is written in Scala with the Akka framework. To help us optimize our implementation we collect lots of metrics about the Akka system as well as the JVM — GC, heap, CPU usage, memory usage, etc.

The Kamon framework is a big help here, it made quick work of getting us insight into the running of the Akka system — no. of actors, no. of messages, how much time a message spends waiting in the mailbox, how much time we spend processing each message, etc.

All of these data points are sent to Wavefront, via Telegraf.

We collect lots of metrics about the Akka system and the JVM.

We also have a standalone Akka-based load test client that can simulate many concurrent players. Each player is modelled as an actor, which simulates the behaviour of the Unity3D game client during a match:

  1. find a multiplayer match
  2. connect to the multiplayer server and authenticate itself
  3. play a 4 minute match, sending inputs at 15 times a second
  4. report “client side” telemetries so we can collect the RTT (Round-Trip Time) as experienced by the client, and use these telemetries as a qualitative measure for our networking stack

In the load test client, we use the t-digest algorithm to minimise the memory footprint required to track the RTTs during a match. This allows us to simulate more concurrent players in a memory-constrained environment such as a Lambda function.

AWS Lambda + Akka

We can run the load test client inside a Java8 Lambda function and simulate 100 players per invocation. To simulate X concurrent players, we can create X/100 concurrent executions of the function via SNS (which has an one-invocation-per-message policy).

To create a gradual ramp up in load, a recursive Orchestrator function will gradually dial up the no. of current executions by publishing more messages into SNS, each triggering a new recursive load test client function.

LoadTest function that is triggered by API Gateway allows us to easily kick off a load test from a Jenkins pipeline.

Using the push-pull pattern (see this post for detail), we can track the progress of all the concurrent load test client functions. When they have all finished simulating their matches, we’ll kick off the Aggregator function.

The Aggregator function would collect the RTT metrics published by the load test clients and produce a report detailing the various percentile RTTs.

{
  "loadTestId": "62db5790-da53-4b49-b673-0f60e891252a",
  "status": "completed",
  "successful": 43,
  "failed": 2,
  "metrics": {    
    "client-interval": {      
      "count": 7430209,
      "min": 0,
      "max": 140,
      "percentile80": 70.000000193967,
      "percentile90": 70.00001559848,
      "percentile99": 71.000000496589,
      "percentile99point9": 80.000690623146,
      "percentile99point99": 86.123610689566
    },    
    "RTT": {      
      "count": 744339,
      "min": 70,
      "max": 320,
      "percentile80": 134.94761466541,
      "percentile90": 142.64720935496,
      "percentile99": 155.30086042676,
      "percentile99point9": 164.46137375328,
      "percentile99point99": 175.90215268392
    }
  }
}

If you would like to learn more about the technical challenges in developing successful mobile games, come join us for an evening of talks, drinks, food and networking in our office on the 12th Oct.

We’re running a free event in partnership with AWS where we will talk about:

  • the opportunities and challenges in building a realtime multiplayer game
  • data science and machine learning
  • serverless with AWS Lambda (by Dr Steve Turner from AWS)

Get your free ticket here!