On Wednesday 13th January some members of our Devops team attended the CoreOS London Meetup hosted at BlackRock.
At Space Ape we are taking steps towards running some of our services in containers and are very interested in CoreOS and the ecosystem around it.
We started off with an overview of Kubernetes installation on AWS and covered some of the problems you may run into. One to note was pulling Kubernetes from GitHub. If you decide to pull from GitHub over a lot of nodes in a VPC you’ll probably run into API limits, to get around this you can host Kubernetes in an S3 bucket (or other endpoint you control)
and pull from there. This approach is also preferable as it allows you more granular control over the version of Kubernetes you are deploying.
Ric talked us through their CoreOS setup and how they’ve worked to implement AWS best practises in their cluster. To start with they run everything in a VPC and ensure that they deploy the cluster over multiple availability zones (AZ). Within each AZ they deploy at least two subnets, one which will contain their Elastic Load Balancers (ELB) and a second which contains the CoreOS nodes and shared storage nodes.
The CoreOS nodes are setup in autoscaling groups which has allowed them to scale the fleet up and down automatically. On top of this they’ve got the Kubernetes replication controller deploying the containers around the cluster, ensuring they’ve always got the desired amount. The autoscaling groups can either be scaled manually or they can
make use of cloudwatch alarms. One improvement they are working on is making use of custom metrics for scaling (e.g. container count) as right now they depend on the metrics from the hypervisors.
Outside of the autoscaling groups they will deploy the Kubernetes master node. This allows them to control the cluster and to ensure that the node will not get destroyed during a scaling event.
The approach to load balancers was very interesting. While Kubernetes can control ELB creation, ngineered opted not to use this and instead use ELBs for public access which forward the traffic to haproxy instances. These haproxy instances run on all of the CoreOS instances and forward the traffic to the right container via IP addresses defined for that service by Kubernetes. It was a good example of controlling costs while still ensuring that the setup was flexible.
It was really good to see a production Kubernetes cluster being deployed on AWS and to get an insight into the challenges that you could face doing it. It was helpful to get an insight into what works and what doesn’t and how careful design will help with cost control.
The slides from this talk are available on Google Docs.
The second talk of the evening was by Joseph Schorr from CoreOS who was telling us about the security work CoreOS have been doing at all levels of the stack.
Joseph started off talking about system compromise and asking at what level can you now trust the system? How do you know if the hardware/bootloaders have been compromised?
To combat this uncertainty CoreOS have been working on signing all levels of the system boot process using keys that are stored in a trusted platform module (TPM). At a high level this allows for a set of keys to be embedded in hardware on the system. First the TPM is verified to ensure it’s authentic. Once that has taken place each component in the system can be loaded, with a signature being checked at each stage. If the signature is validated the component will be loaded. If there are any failures the boot will halt.
This system ensures that by the time the OS has been loaded you’ve got a verified trail of each step of the process which can later be audited.
Within your trusted OS environment you can now deploy your containers. It would be expected that you would build your containers from within a trusted environment and have them signed. Using these signed containers you are able to verify and deploy them into your trusted OS environment.
This is a great step forward for security and integrity of the OS and should give administrators a lot more confidence in the systems they are deploying. CoreOS have blogged on this system providing far more technical detail on how it works.
Joseph then went on to talk about Quay.io and the steps they’ve been implementing to provide security insights into the containers they are hosting. Quay.io are using their new tool Clair to scan all of the containers they host against the CVE database (and common vendor databases) and report when your container contains an insecure package. Each layer of the container is scanned so you will get alerted on insecure packages that you might not know are present on the system. Alerts are triggered via webhooks which allow you to receive notifications in a number of different places.
If you’re using Quay.io to host your container images using the Clair scanning seems like something worth using right away.
The slides from this talk are also available on Google Docs.
This talk really showed in todays world with exploits getting more and more sophisticated, security needs to be thought of at all levels of the system from the hardware up. It also showed that through the use of clever tooling it’s possible to be alerted to potential problems in your infrastructure very soon after they are made public and how containers can help you to resolve those problems and quickly prove they are resolved.
Thanks to BlackRock for hosting, to Ric and Joseph for speaking and for the organisers for organising an enjoyable evening. We’re looking forward to the next one.