Devops at Scale: Videos

A big thank you to everyone who came to our offices to see our speakers last Wednesday night at the Devops at Scale event. A big thank you, also, to everyone involved in organising everything for the big night!

We’ve uploaded the videos of our talks, so if you weren’t able to come or are interested in what folk had to say, here they all are!

Steve Lowe: Devops at Scale: A Cultural Change

Sam Pointer: Smashing the Monolith for Fun and Profit: Telemetry-led Infrastructure at Hive

Louis McCormack: Monitoring at Scale

Upcoming Event: Devops at Scale

We’re working with Burns Sheehan, Wavefront and Hive to host an event at Space Ape HQ on Wednesday April 12th on the theme of Devops at Scale.

The evening will explore topics focussing on the adoption of DevOps at scale, hearing from businesses and individuals who have successfully driven these new DevOps approaches.

Richard Haigh and Steve Lowe will be speaking from Betfair, telling the story of their shift to DevOps and how pushing for attitudinal change drives effective DevOps implementation. From Hive, Sam Pointer will talk about how they’ve used a telemetry-first approach to break apart a monolithic application and implement infrastructure transformation at scale. Finally, Louis McCormack of Space Ape Games will take a look at the challenges of monitoring everything, when “everything” keeps changing.

You can learn more at the event page, where you can also sign up to attend.


Space Ape Live Ops Boot Camp – Part 2 (GDC Edition)

(This is the second in our series of posts on Live Ops.  Part 1 can be found here:

Last week at the annual Games Developers Conference (GDC) in San Francisco, Space Ape’s first product guy, Joe Raeburn, took to the stage to share what we’ve learned about Live Ops.

The full video will be available in the GDC Vault soon, but we thought we’d share a summary of the talk and annotated slides for those who can’t get access.

Apart from some quality trolling of Kiwis and puns of goats the point of the presentation was to help frame how studios should think about Live Ops.  Joe’s personal experience was formed at his previous company where he was a Product Manager on the hugely successful Sims Social.  That game rocketed to over 60m users in a matter of months but the studio paid a high price with 75% of the studio’s headcount consumed with running the game.  At Space Ape we nearly fell into the same trap with around the same number focussed on operating our live games in 2015.  

The solution:  we radically transformed how we develop and operate games to comply with Joe’s first commandment:

“LEAN COMMANDMENT 1: Thou Shalt Go Lean … or thine studio shalt be encumbered with unsustainable weight and die.”

Today, we are 100 people, with a little over one third of the studio working on live games that more than pay the bills, freeing up the majority of developers to work on new transformative games.

Joe’s talk shows how we quantify the impact of Lean Live Ops and how we designed systems to ensure that the most desirable content we wanted players to chase, was able to be produced cheaply without reliance on developers.   He also shows how we’ve carried this philosophy through to the design of our new games Super Karts and Fastlane: Road to Revenge.

The full slideshare presentation can be found here:

Space Ape Live Ops Boot Camp

We spend a lot of time swapping notes with other game developers about one thing or another, but by far, the most common questions we get are about Live Ops. How are our teams organised? What tools and tech did we build vs buy? How impactful is it really? How do we avoid cannibalisation screwing up an already successful game? What stats do we track? How transferable is what we do to other genres?

In three years our games have generated over $80m revenue.  Depending on how you account for it, Live Ops initiatives generated between one and two-thirds of that revenue. At Space Ape Live Ops is more of a pervasive philosophy than a discrete team or tool. We don’t really distinguish in our sprints or dev teams between game features and tools, or between community marketing and events because Live Ops underpins everything we do.  

I thought it would be nice to pull together some of the presentations we’ve shared on this topic over the years into one place.  Combined, they paint a good picture of how we manage our games over the long haul in a very efficient way, freeing up front line developers to work on new concepts.

Plan Now, Live Forever: Engineering Mobile Games for Live Ops from Day 1 

Download the slides here

Designing Successful Live Ops Systems in Free to Play Gacha Economises 

Live Ops Lead Andrew Munden (formerly Live Ops at Kabam and Aeria) shares the content strategies that work in gacha collection games as well as how to build a manageable content furnace and balance player fatigue in a sustainable way.

A Brief History of In-Game Targeting 

Analytics lead Fred Easy (ex Betfair, Playfish/EA) will share the evolution of his offer targeting technology from its belt and braces beginnings to sophisticated value based targeting and the transition to a dynamic in-session machine learning approach.

Under the Hood: Rival Kingdom’s CMS tools

Game changing content is introduced to Rival Kingdoms every month, with in-game events at least every week. Product Manager Mitchell Smallman (formerly Rovio, Next Games) and Steven Hsiao (competitive StarCraft player turned community manager turned Live Ops lead) will demonstrate the content management tools that allow them to keep the game fresh for players without developer support.

Games First Helsinki (April 2016) 

Product Owner Joe Raeburn and Game Designer John Collins talk to the Finnish gaming scene about how we transitioned Samurai Siege into Live Ops mode and laid the foundation of our current Live Ops platform. A good overview of the journey to Live Ops and introduction of the high-level concepts that underpin our approach: the Tools, the Toybox and the Treadmill.

GDC Europe  (June 2016)

Life After Launch: How to Grow Mobile Games with In-Game Events. COO Simon Hade talks generally about Live Ops, the impact it has had on the business and shares more detail about the Space Ape Toybox for in-game events.

The Great British Big Data Game Analytics Show & Tell (February 2015)

Space Ape Analytics Stack. A somewhat outdated but still relevant overview of the original data stack used in Samurai Siege.

I’d love to hear from studios who have used any of the content we shared, or have different approaches that could help us improve. We get inspired by the stories that come out of Finland about the gaming ecosystem’s willingness to share and collaborate and how that ultimately benefits everyone. Hopefully, this will inspire other London developers to do likewise.

Custom Inspec Resources

When developing Chef cookbooks, a good test suite is an invaluable ally. It confers the power of confidence, confidence to refactor code or add new functionality and be…confident that you haven’t broken anything.

But when deciding how to test cookbooks, there is a certain amount of choice. Test Kitchen is a given, there really is no competition. But then do you run unit tests, integration tests, or both? Do you use Chefspec, Serverspec or Inspec? At Spaceape we have settled on writing unit tests only where they make sense, and concentrating on integration tests: we want to test the final state of servers running our cookbooks rather than necessarily how they get there. Serverspec has traditionally been our framework of choice but, following the lead of the good folks at Chef, we’ve recently started using Inspec.

Inspec is the natural successor to Serverspec. We already use it to test for security compliance against the CIS rulebook, so it makes sense for us to try and converge onto one framework. As such we’ve been writing our own custom Inspec resources and, with it being a relatively new field, wanted to share our progress.

The particular resource we’ll describe here is used to test our in-house Redis cookbook, sag_redis. It is a rather complex cookbook that actually uses information stored in Consul to build out Redis farms that register themselves with a Sentinel cluster. We’ll forego all that complexity here and just concentrate on how we go about testing the end state.

In the following example, we’ll be using Test Kitchen with the kitchen-vagrant plugin.

Directory Structure:

Within our sag_redis cookbook, we’ll create an inspec profile. This is a set of files that describe what should be tested, and how. The directory structure of an inspec profile is hugely important, if you deviate even slightly then the tests will fail to run. The best way to ensure compliance is to use the Inspec CLI, which is bundled with later versions of the Chef DK.

Create a directory test/integration then run:

inspec init profile default

This will create an Inspec profile called ‘default’ consisting in a bunch of files, some of which can be unsentimentally culled (the example control.rb for instance). As a bare minimum, we need a structure that looks like this:

└── integration
│       ├── default
│       │   ├── controls
│       │   ├── inspec.yml
│       │   └── libraries

The default inspec.yml will need to be changed, that should be self-evident. The controls directory will house our test specs, and the libraries directory is a good place to stick the custom resource we are about to write.

The Resource:

First, lets take a look at what an ‘ordinary’ Inspec matcher looks like:

describe user('redis') do
  it { should exist }
  its('uid') { should eq 1234 }
  its('gid') { should eq 1234 }

Fairly self-explanatory and readable (which incidentally was one of the original goals of the Inspec project). The purpose of writing a custom resource is to bury a certain amount of complexity in a library, and expose it in the DSL as something akin to the above.

The resource we’ll write will be used to confirm that on-disk Redis configuration is as we expect. It will parse the config file and provide methods to check each of the options contained therein. In DSL it should look something like this:

describe redis_config('my_redis_service') do
  its('port') { should eq(6382) }
  its('az') { should eq('us-east-1b') }

So, in the default/libraries directory, we’ll create a file called redis_config.rb with the following contents:

class RedisConfig < Inspec.resource(1)
  name 'redis_config'

  desc '
    Check Redis on-disk configuration.

  example '
    describe redis_config('dummy_service_6') do
      its('port') { should eq('6382') }
      its('slave-priority') { should eq('69') }

  def initialize(service)
    @service = service
    @path = "/etc/redis/#{service}"
    @file = inspec.file(@path)

      @params = Hash[*@file.content.split("\n")
                           .reject{ |l| l =~ /^#/ or l =~ /^save/ }
                           .collect { |v| [ v.chomp.split ] }
        rescue StandardError
          return skip_resource "#{@file}: #{$!}"

  def exists?

  def method_missing(name)


There’s a fair bit going on here.

The resource is initialised with a single parameter – the name of the Redis service under test. From this we derive the @path of the its on-disk configuration. We then use this @path to initialise another Inspec resource: @file.

Why do this, why not just use a common-or-garden ::File object and be done with it? There is a good reason, and this is important: the test is run on the host machine, not the guest. If we were to use ::File then Inspec would check the machine running Test Kitchen, not the VM being tested. By using the Inspec file resource, we ensure we are checking the file at the given path on the Vagrant VM.

The remainder of the initialize function is dedicated to parsing the on-disk Redis config into a hash (@params) of attribute:value pairs. The ‘save’ lines that configure bgsync snapshotting are unique in that they have more than one value after the parameter name, so we ignore them. If we wanted to test these options we’d need to write a separate function.

The exists? function acts on our Inspec file resource, returning a boolean. Through some Inspec DSL sleight-of-hand this allows us to use the matcher it { should exist } (or indeed it { should_not exist } ).

The final function delegates all missing methods to the @params hash, so we are able to reference the config options directly as ‘port’ or ‘slave-priority’, for instance.

The Controls:

In Inspec parlance, the controls are where we describe the tests we wish to run.

In the interests of keeping it simple, we’ll write a single test case in default/controls/redis_configure_spec.rb that looks like this:

describe redis_config(“leaderboard_service") do
  it { should exist }
  its('slave-priority') { should eq('50') }
  its('rdbcompression') { should eq('yes') }
  its('dbfilename') { should eq('leaderboard_service.rdb') }

The Test:

Now we just need to instruct Test Kitchen to actually run the test.

The .kitchen.yml file in the base of our sag_redis cookbook looks like this:

  name: vagrant
  require_chef_omnibus: 12.3.0
  provision: true
    - vagrant.rb

  name: chef_zero

  name: inspec

  - name: ubuntu-14.04
      box: ubuntu64-ami
        memory: 1024

  - name: default
        environment: test
      - role[sag_redis_default]

Obviously this is quite subjective, but the important points to note are that we set the verifier to be inspec and we provide the name: default to the particular suite we wish to test (recall that our Inspec profile is called ‘default’).

And thats it! Now we can just run kitchen test and our Inspec custom resource will check that our Redis services are configured as we expect.

Space Ape are hiring for Devops

Here on the Space Ape Devops Team, we’ve been busy building out the tech for our next generation of mobile games and now it’s time to bring some fresh faces onto the team to help continue our journey. If you’re a passionate technologist, Devops engineer or infrastructure wrangler then we’d love to hear from you.

Being a Devop at Space Ape is an important role. On our existing titles, you’ll be responsible for maintaining the quality of our players’ experience, working with the team to roll out new features and upgrades and finding new ways to optimise the stacks. On our new titles, you’ll be working with the development team to build out new stacks, solve new problems and prepare for big scale launches.

Along the way, you’ll learn how we use tools to build and update our stacks and roll them out without impacting our players and developers. You’ll also learn how we write those tools in Ruby, Angular and sometimes Go. Eventually, you’ll learn what it is that our teams need and start bringing fresh new ideas for how we can make things better; perhaps improving our containerisation platform, serverless workloads or the security of our platforms.

If you’re interested, have a poke round some of our other posts and drop your details in on our careers page where you can find out a bit more about the Devops role and the technology we use.

Trajectory prediction with Unity Physics

In some recent prototyping work, we needed to display a prediction for a projectile trajectory in the game. You’ve probably seen something similar in many games before, such as Angry Birds:


The tutorial from Angry Birds 2. Note the dotted line, showing you the predicted trajectory of your bird, if you released the slingshot now.

Our prototype game was in Unity, and the projectile was set up using Unity’s physics engine. We had several requirements for the prediction:

  • Immediate. Player input can change from frame to frame, and the prediction needs to stay in sync with it.
  • Accurate. The time of flight could be several seconds, and any small error will accumulate to product significantly incorrect results.
  • Simulates drag. We’re using drag on our rigidbody, which many solutions do not account for.

I assumed this sort of problem came up often and searched online to see what popular implementations were out there. They generally fell into three groups:

  • Accurate, but slow. These solutions introduce an invisible projectile clone into the world and launch it along the flight path, recording its motion over time. As there’s no way to step the Unity physics simulation along yourself, you have to wait for this prediction in real time. This means that a three-second flight takes three seconds to fully predict. This is far too slow – the prediction would constantly lag behind the player’s changing inputs.
  • Doesn’t include drag. There are some good, accurate solutions, but most will specifically rule out drag.
  • Inaccurate. Some combination of incorrect equations, assumptions, and approximations meant that with longer flight times and more drag (or different gravity) the prediction would be wrong.

Perhaps the perfect solution for us is out there, but I hadn’t found it. By combining existing solutions and running some tests, I came up with my own implementation, which is presented below.

 public static Vector2[] Plot(Rigidbody2D rigidbody, Vector2 pos, Vector2 velocity, int steps)
     Vector2[] results = new Vector2[steps];
     float timestep = Time.fixedDeltaTime / Physics2D.velocityIterations;
     Vector2 gravityAccel = Physics2D.gravity * rigidbody.gravityScale * timestep * timestep;
     float drag = 1f - timestep * rigidbody.drag;
     Vector2 moveStep = velocity * timestep;
     for (int i = 0; i < steps; ++i)
         moveStep += gravityAccel;
         moveStep *= drag;
         pos += moveStep;
         results[i] = pos;
     return results;

This function plots the trajectory of a rigidbody under the effect of Unity’s physics by simulating some FixedUpdate iterations and returning the positions of the projectile at each iteration. It uses the global Physics2D.gravity setting, and takes into account rigidbody drag and gravityScale. Note that the mass of the rigidbody is irrelevant.

float timestep = Time.fixedDeltaTime / Physics2D.velocityIterations;

The code attempts to produce the same results as running the normal Unity physics iterations. To do this, it must also run as an iterative solution. A common error here is to assume that one iteration is run every FixedUpdate(). Instead, the number of iterations to be performed is accessible and tweakable – it’s Physics2D.velocityIterations. This helps us compute the timestep.

Vector2 gravityAccel = Physics2D.gravity * rigidbody.gravityScale * timestep * timestep;

We take into account the rigidbody’s gravityScale property when computing the effect of gravity. We found that we wanted a different amount of gravity and drag on each object, so this per-body setting was really helpful.

float drag = 1f - timestep * rigidbody.drag;

Drag acts as a reduction on moveStep in each iteration. We can compute it upfront and then apply it to each step of the iteration, producing a cumulative effect.

for (int i = 0; i < steps; ++i)
     moveStep += gravityAccel;
     moveStep *= drag;
     pos += moveStep;
     results[i] = pos;

Finally, the main loop. Each iteration, you’ll move due to gravity, reduce the movement due to drag, and then accumulate and store the new position in the results.

This solution worked well for us. While not exhaustively tested, we used it for projectiles that had lots of different velocities and drags, and it proved accurate each time, even after 4-5 seconds of flight.



There’s a lot of computation involved for long trajectories. With default settings, you have to run the loop 400 times for each second of flight you want to predict. We only have one projectile to predict in our prototype, so we’re just running one prediction, which doesn’t cost very much. If you used this to predict lots of projectiles for lots of different launchers in a large scale game, perhaps this would begin to be a problem for you.

Also, it’s only simulating the trajectory, and not actually running the physics engine or simulating anything else in the game. This means it doesn’t predict collisions or collision resolution. If you render this path as-is, it’ll just clip through walls or other obstacles in the world, which obviously isn’t actually what will happen when the projectile is launched.

These drawbacks weren’t a problem for our prototype, so it turned out to be pretty useful code. We share this now in the hope that someone else out there is faced with the same requirements and finds it useful too.


Possible Future Upgrades

I have some ideas around the drawbacks of this method. This is the main area for improvement, as the actual functionality is fine.

For performance, no profiling or optimisation work has been performed. I’ve just laid things out in the way that made sense to me. It’s hard to guess at optimisations, but perhaps a little profiling would reveal some simple speedups. The bigger step would be to push this code out to a native dll and get down to nitty-gritty c++ optimisation – perhaps with SIMD instructions. You can’t parallelise the steps (each iteration of the loop depends on the result of the previous) but you could parallelise multiple projectile predictions – e.g. if you have many projectiles, run 4 or 8 predictions in parallel.

The other big upgrade is around prediction. For some games a true prediction would be really valuable – for example, visualising the outcome of collisions and reactions in a pool table game. You’d want to see the predicted path of the ball, even after several bounces. This isn’t going to happen with any simple model if you have any in-depth physics properties. You’d need a big shift in your approach – to run the physics engine yourself. I’d find an appropriate existing physics engine and build it into the game/Unity, which is a shame as it’s duplicating the work that Unity’s already done. But after doing that, you’d have control over the physics simulation and how you update it.

You’d try for a setup where you’d be able to clone the existing simulation and run some update ticks – to, essentially, look into the future – you’ll be tracking the future state of the simulation, assuming no inputs change. This would have to be a separate simulation, as you wouldn’t want the actual state of the pool table to change – just to compute the predicted future state. This will be even more expensive that just running the basic prediction code we had above – it’s the full physics simulation.

Go Wavefront!

Long ago we took the decision to outsource our metrics platform. We generate a lot of metrics, and we came to realise that our solution at the time, Graphite, was not up to the task. Instead of spending in-house resource building a new platform, we decided to find an external partner, so we could focus on our core competency – running mobile games.

We eventually settled on Wavefront, in private beta at the time. Even in these early stages of their development, we were wildly impressed with the product. The responsiveness of the graphs in-browser, and the stability of the metric ingestion platform particularly impressed us.

This was over 2 years ago. Since then we have grown alongside Wavefront and watched as they came out of stealth mode, and continuously added to their bevy of features to offer the world-beating product they have today.

We contributed largely to their Ruby client, which has been open-sourced and continues to improve. But now we’re happy to announce another OSS project, go-wavefront.

Go-wavefront is a set of Golang libraries and a bundled CLI for interacting with the Wavefront API. It also includes a simple Writer library for sending metrics. It was borne out of an itch we needed to scratch to integrate the smattering of Go applications we have with our metric provider. We hope that in opening it up to the wider Wavefront and Golang community we can improve what we have, and be better able to keep up with the new features Wavefront throw at us.

As a cute-but-probably-useless gimmick, the CLI can plot a Wavefront graph live in the terminal window! Check it out, all feedback and pull requests are welcome.


Introducing ComposeECS

The DevOps team here at Space Ape have just open-sourced a small Ruby gem that provides a mechanism to convert Docker Compose specifications to AWS EC2 Container Service task definitions – we’ve called it ComposeECS.

We run the majority of our infrastructure on AWS, using source-controlled CloudFormation templates to manage each of our stacks. Over time we’ve built up a toolchain to help us manage changes to our CloudFormation stacks known internally as ApeStack which incorporates CFNDSL and our internal conventions and processes.

Toward the end of 2015 we started to build out a new platform that would support deployments of containerised applications where it made sense. As heavy users of AWS, EC2 Container Service (ECS) was the obvious choice for running containers in the cloud. The potential advantages of deep integration with AWS services like ELB and IAM have significant implications when it comes to integrating the new platform with our existing stack.

As a team, we are huge fans of specifying configuration in YAML. It’s then perhaps no surprise that we much prefer the syntax of Docker Compose definitions over the JSON-based ECS task definitions. We wanted to be able to specify our task definitions in YAML alongside our CloudFormation templates. Furthermore, we wanted to construct ECS services and their supporting infrastructure with a single command. To this end, we wrote ComposeECS.

ComposeECS reads any Docker Compose file and translates supported attributes(including volume definitions) into an equivalent ECS task definition JSON – sanity-checking your attributes and ensuring compatibility with the ECS task definition specification as it does so. The advantages ComposeECS provides include:

  • Container definitions are more readable and therefore easier to maintain.
  • Docker Compose definitions written to run in our local Docker environment now run on ECS with little modification.
  • Unlike services translated with the ECS-CLI, which supports Docker Compose deployment, our services are CloudFormation-managed whilst still taking advantage of the Docker Compose syntax.

We’re very pleased with ComposeECS. However, it is but one of many hurdles on the path to a production-ready Space Ape container platform. For us, many questions remain around service discovery; efficient and reliable deployment; injection of configuration; handling scale and how to effectively monitor our cluster. We look forward to sharing more from our journey.

If you’re interested in integrating ComposeECS into your toolchain, or contributing, head over to the project Github page.