I’ve been an active user of AWS for over six years now. The application I work on – a vertical niche app for the mining industry – is hosted there. But as the lead developer – and sole DevOps type – of a small team with an aggressive development schedule, I haven’t ever managed to explore CloudFormation properly, or even organise our servers properly. The most I’d managed to do to keep them organised and under control was to use OpsWorks and Chef to manage configuration and deployment of our servers.
This Christmas period, however, we brought in a number of university students on work experience programs. Most of them went to help the mining engineers on their research programs – but I got one to help me tame our AWS environments. He did a lot of background research on AWS CloudFormation for me, and this week I’ve taken the time to put it to use. I’m going to write a few posts covering some of what I learnt – not just about CloudFormation, but how to configure networks ‘properly’ in AWS – for my own edification and later reference. If that helps some reader as well, I can live with that. 😉
What is CloudFormation?
Put simply, CloudFormation is a way of managing textual descriptions of AWS resources, and a set of tools to take those descriptions and configure AWS resources – such as servers, networks, and load balancers.
Being text-based, you can put the files into source control (very important!). With the tooling, you can do things like:
- create new ‘stacks’ of resources quickly
- apply changes to the stacks in a consistent fashion
- test configuration changes prior to applying them to your production system
- automatically check for ‘drift’ (changes to your infrastructure that isn’t in the files)
- automate the above using a build server
You know, all the good sensible adult things you should do with a production system that your livelihood depends on.
What’s the problem I’m solving
First – the app I’m working on isn’t a massive resource hog. It’s an industry-vertical niche web-based engineering app, with less than a thousand active users as a target (we aren’t there yet). It doesn’t need lots of servers, or to cope with typical load patterns. In fact, a lot of the time the app just sits idle, with nobody using it. When they do use it, that use can be fairly sporadic (a few requests per hour) – or they can do large-scale batch jobs that we have to bring up a hundred+ servers to manage. In other words, I’m not looking to solve problems common to consumer-facing sites.
I need to manage approximately 10 servers, each running a slightly different configuration of our app. This includes test environments, a ‘public’ demo site, and then several instances that run the same version but include different run-time plugins based on customer needs. (These plugins model different types of mining equipment, some of which are proprietary to the individual customer). The test environments get updated regularly, while the customer sites get updated roughly quarterly, as feature sets get finished. We don’t need staggered deployments that can be done with no downtime – but we do need to have a controlled process.
We also practice multi-tenancy, at least for customers that don’t have proprietary models. This is done by host-based routing.
What’s my goal here?
Well, besides simply putting as much of my infrastructure into CloudFormation as possible, I also want to improve my network layout to be more secure. So I’m also learning about private & public subnets, load balancers, and the things you need to do to make that all work. This is going to be a series of posts tackling such things as:
I’m not going to get all of my infrastructure into CloudFormation. Some of it you can’t (e.g. SSH keys). Some of it already exists outside of CloudFormation, and I don’t want to destroy it just to recreate it (e.g. my Route53 Hosted Zones). Some of it I just won’t get around to at this time. And some of it I do manage I won’t want to write up. But there’ll be a lot of notes here, and it’ll be in a generally agnostic approach.
I can’t promise that the advice here is “good”. In particular – I’m not a network guy. I know enough to get me in trouble, and some decent rules-of-thumb, but there may be some obvious mistakes here. All I can say is that this is what I’m doing, and it’s hopefully working for me.
I’ll update this page with links to the individual articles as they get written.