Vault Configuration as Code

Here at Space Ape we use Vault extensively. All of our instances authenticate with Vault using the EC2 auth backend which allows us to restrict the scope of secrets any instance has access to.

Behind Vault, we use Consul as a backend to persist our secrets with a good level of durability and make use of Consul’s snapshot feature to create backups, which means we can restore both Consul and Vault from the backup if the worst case occurred.

Where we’ve struggled with Vault is in managing the configuration: which policies, roles, auth backends do we have? Which of our AWS accounts are setup for the EC2 auth and how do we update or replicate any of these configurations? If we had to set up a new instance of Vault, or recover an existing one, how long would it take us to get everything setup? Probably a lot longer than it should.

This isn’t something we accept elsewhere in our estate: We use CloudFormation to manage precisely how out AWS infrastructure looks; we use Chef to manage exactly how our instances are setup and applications are configured. All of this is configuration is stored in Git. In short we treat our configuration as code.

For those looking to manage configuration in Vault, help is at hand. In November 2016 Hashicorp’s Seth Vargo penned a blog post that caught our interest – Codifying vault policies and configuration – in which he describes how to use the Vault API to apply configuration from files. There a few things we can learn from Seth’s post:

  • The API calls are idempotent
  • The script ignores the response as you’ll often get non-200 responses (for instance if a mount already exists)
  • He maps the directory structure to the API, this makes it easy to rewrite the code in any language without having to change your directory structure.
  • API calls need to be applied in the correct order (e.g. An Auth backend must exist before you can apply configuration to it.
  • You can integrate this into your CI lifecycle.

A couple of things that are missing:

  • Code testing
  • Verifying the result of our API call was successful.

Taking Seth’s blog post as our starting point, we set out to implement configuration-managed Vault clusters using the API.

We use a lot of Ruby here so it makes sense to create a gem to apply our configuration for us and we can take the opportunity to apply unit tests. We can use Jenkins to test applying our actual configuration.

Requirements

  • Code should be tested
  • We should verify that our config has been applied correctly
  • We want a CI pipeline for our configuration.

We quickly realised that a lot of the process is repeated for each API endpoint:

  1. Locate files containing the configuration
  2. Parse the files containing the configuration
  3. Apply the configuration
  4. Verify the configuration

We have a Setup class that handles creating an instance of the Vault Client and locating the relevant files for each configuration type.

We created a Base class that our implementation (policies, auths backends etc) classes can inherit that will parse, apply and verify configuration.

Setup class

To create a Vault client it’s as simple as using the  Vault gem and providing the usual configuration details such as the address and a token.

We also have methods to locate the relevant files for any configuration item, such as policies. We simply need to supply the path to the directory in which the configuration files reside.

Base class

In the Base class we start by parsing files that the Setup class located for us. We accept hcl, yaml or json files and parse them into a hash.

We then call apply and verify methods which are implemented in classes specific to the configuration item such as Policies or Auths.

Policies

Applying policies is a good starting point as they represent a lot of our configuration and are referenced by other sections of configuration.

We save time by inheriting the the Base class we discussed above and we have an instance of the Setup class so that we can locate the files we need and have a Vault client to use.

We then implement the apply and verify methods. For a Policy the apply method is very simple, it simply uses the name of the file as the name of the policy and the contents of the file, which we translate into Json, as the body of the policy

client.sys.put_policy(name, hash.to_json)

Next we verify that our Policy was correctly applied. The first step of this is to request the policy from Vault which we can simply ask for by name (also filename).

client.sys.policy(name)

Then we can:

  1. Check that we received a Policy and not an error or an empty blob of Json.
  2. That the Json we receive matches the Json that we sent. We use the JsonCompare gem to verify each key value pair that is returned.
└── sys
    └── policy
        └── admins.hcl

The directory structure in which we store our policies. /sys/policy/admins would be the api path to post a policy if you wanted to use the API directly.

path "*" {
 policy = "sudo"
}

A really bad example of a Vault role that admin.hcl might contain. We parse this as HCL to post to /sys/policy/admins.

Testing

One of our requirements was to write tests. Below are our tests for policies.

require "spec_helper"

describe Spaceape::VaultSetup::Policy do
  subject do
    Spaceape::VaultSetup::Policy.new(
      Spaceape::VaultSetup::Setup.new(
        vault_address: "http://vault:8200",
        ssl_verify: false,
        config_dir: "spec/fixtures/main",
        vault_token: vault_token
      ),
      false
    )
  end

  let(:test_policy) do
    {
      "path": {
         "auth/app-id/map/user-id/*":
           {
             "policy": "write"
           }
        }
     }
   end

  it "applies and verifies a policy" do
    subject.apply("test-policy", test_policy)
      expect { subject.verify("test-policy", test_policy) }
        .to_not raise_error
  end

  it "identifies invalid policy" do
    subject.apply("test-policy", test_policy)
    wrong_role = test_policy.dup
    wrong_role[:path] = "/auth/app-id/map/uuuuuu/*"
    expect { subject.verify("test-policy", wrong_role) }
      .to raise_error(Spaceape::VaultSetup::ItemMismatchError)
  end

  it "applies all policies in config_dir" do
    subject.apply_items(subject.policy_files)
    expect(subject.client.sys.policies)
      .to include("test-policy2", "test-policy")
  end
end

From the test above you can see that can see that we test against a vault server at vault:8200. We run these tests in in Docker and make use of Docker compose so we can create a Vault server in dev mode and then a ruby container, with our code mounted in a volume, to run our tests.

Auths and Mounts

Policies were easy – we parse the file, make a single API call to apply the policy and another to verify it. Auths and Mounts are a bit more complicated. There are essentially three parts to each:

  1. Enable the Auth/Mount
  2. Tune the Auth Mount
  3. Configure the Auth Mount

Enabling is pretty simple you pass the name (this is what you want to call it), the type (such as secret, github or pki) and an optional description.

We store this information in sys/auth/<name>.ext. The API endpoint is sys/auth/<name>.

└── sys
    └── auth
        └── github-spaceape.json

This contents of this file may look like:

{
  "type": "github",
  "description": "spaceape github",
  "config": {
    "max_lease_ttl": "87600h",
    "default_lease_ttl": "3h"
  }
}

Notice it contains the type and description which we covered above. It also includes a config key, this is actually the tuning we can apply to the Auth/Mount. This is applied to the API endpoint sys/auth/<name>/tune so it seems to make sense to store it in this file.

So far so good, but now we come onto configuring the Auth or Mount. There’s no standard pattern here and they sometimes require secrets. We decided to exclude any secrets from the config. These can be applied as manual steps later. We can however apply some configuration.

For example we can set the organisation for the Github auth, but we don’t wouldn’t want to set the AWS credentials for the EC2 auth backend.

The API endpoint for applying configuration to Auths is auth/<name>/config and Mounts is <name>/config/<config_item>. We decided to group our mounts under a mounts directory, veering slightly from the file structure matching the API path.

Our directory structure now looks a little like this:

└── sys
|   └── auth
|   |   └── github-spaceape.json
|   └── mounts
|       └── spaceape-pki
└── auth
|   └── github-spaceape
|       └── config.json
└── mounts
    └── spaceape-pki
        └── config
        |   └── urls.json
        └── roles
            └── example-role.json

This is where mapping the file path as the API comes into it’s own: we can handle any of the Auths or Mounts without having explicitly write code for the exact type, we just have to get the structure correct.

Gotchas

There are a few things to look out for.

  1. When verifying our changes were applied Vault sometimes gives you more back than you expect. We just verify the fields we pass in.
  2. Time based fields (like the various ttl fields) are not always returned in the same format, you may get the time in seconds, or days and hour, etc. We found the chronic_duration gem useful for parsing the times for easy comparison.
  3. You may find some configuration on an Auth or Mount may have to be applied in a specific order, this is where we would have to write custom code to handle that particular type of Auth or Mount. Perhaps a configuration file could define the order in which to apply certain configuration.

Continuous Integration

When we check in code, a Jenkins job is triggered which will run our tests. As mentioned earlier we run our tests inside of Docker containers, this means that we don’t have to worry about having clashing versions of gems from other Ruby based builds we have on Jenkins.

More interesting to us is that we can now test our actual Vault configuration. So when we add a new policy we know it applies correctly. Again we use Jenkins to do this. Each time we commit a change to our Vault configuration git repository we trigger a build which attempts to apply the configuration to an instance of Vault running in Dev mode. If any of the configuration fails we can be prompted through Slack to see what caused it.

It’s still up to us to apply the changes to the production instance of Vault after the Jenkins tests have run successfully. This is mainly because we don’t want to give privileged Vault tokens out to Jenkins.

Final Words

The process we’ve described above for managing Vault configuration is just one way you could go about solving this problem. From our experience, it works – we can test our configuration and apply it in a repeatable and programmatic way.

It is however still a work in progress and the will doubtless be a few problems to overcome as we continue our development. We hope to Open Source the code in the future, but right now we feel there are still some improvements, for instance at the moment we test against Vault 0.6.5 (the latest release is 0.7.3) and we’ve only tested against a handful of Mounts and Authentication backends.