This is a technical post. You have been warned. Thar be dragons.

At SparkPost the SRE team is lean and built for efficiency. We are constantly hunting for ways to improve by promoting self-service among the various development teams. Before we dive into some cool tech we’ve developed, let me give you a bit of background. Our world is built entirely on AWS. We live and breathe clouds. We love the flexibility provided by “easy” autoscaling, “easy” infrastructure provisioning, and the “joys” of cloud computing. The SRE team is full of old graybeard sysadmins who believe in oddball things like “The Principle of Least Privilege”, have discussions about our Model M keyboards, and talk about how much we love vi. (We have one person who uses nano, but let’s ignore that for now…)

So… Least Privilege? 

“The principle means giving a user account or process only those privileges which are essential to perform its intended function.” us-cert.gov

In the AWS world, permissions are handled via IAM, an extremely powerful and complex tool. To ease you into the fluffy internet clouds, AWS provides predefined IAM roles and policies. Problem is, many of these predefined permissions are written generically and ignore the Principle of Least Privilege. For example, you might see something like this:

This works great if you need to get stuff out of SSM. However, it allows the instance using the policy to get everything out of SSM. Not ideal. Not Least Privilege.
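To make the difference concrete, here is a rough sketch (not a copy of any AWS managed policy) that contrasts a wildcard SSM read policy with one scoped to a single parameter path. The region, account ID, and `/myservice/` path are hypothetical placeholders, and `scoped_ssm_policy` is a helper invented for this illustration.

```python
import json

# Hypothetical broad policy: lets the caller read EVERY SSM parameter.
BROAD_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["ssm:GetParameter", "ssm:GetParameters"],
            "Resource": "*",
        }
    ],
}


def scoped_ssm_policy(region, account_id, path_prefix):
    """Build a read-only policy limited to parameters under one path.

    All arguments are illustrative; substitute your own values.
    """
    prefix = path_prefix.strip("/")
    resource = f"arn:aws:ssm:{region}:{account_id}:parameter/{prefix}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["ssm:GetParameter", "ssm:GetParameters"],
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder region, account ID, and path.
    policy = scoped_ssm_policy("us-west-2", "123456789012", "/myservice/")
    print(json.dumps(policy, indent=2))
```

The only change between the two is the `Resource` field, which is exactly the part a generic predefined policy cannot know in advance.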

Your response may be “Dave! Just go create custom policies for each service, you don’t need to use those scary predefined policies!” Problem solved! Blog over!

Our goal as an SRE team is to build tooling and pipelines to empower developers to work on their own. The last thing I want is my team operating as a blocker for the fast-moving development going on around us.

Remember me talking about SRE being a lean and efficient team? How about that self-service stuff? SRE was getting beaten up by the constant requests for custom IAM policies. This is good: the development teams are embracing least privilege! However, it’s also a huge time suck. So SRE decided to build a toolset to provide “IAM as Code”. We’ve been through a couple of iterations of the tool. I’m going to walk you through the process and (SPOILERS) give you our tooling.

Phase One – A wild bash script appears

A few beers (okay, maybe more than a few) and a bunch of hours of bash scripting later, we had a tool! Using Rundeck to automagically deploy IAM roles and policies, and GitHub PRs to gatekeep bad policies, this (frankly… awful) series of bash scripts was working! We had a POC running! Teams could write their own IAM policies and roles from scratch and commit them to GitHub, and all the SRE team needed to do was review PRs and merge code. Easy peasy.

We ran this way for a few months, ironing out the process, and saw HUGE returns on our investment: well over 200 permission request PRs in those first few months. HOURS of saved back and forth between development teams and SRE. Lots of fist bumps. The tool had problems, though. It had loads of bugs (beer related), it wasn’t doing much error checking, and it took forever to run.
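As an illustration of the kind of error checking that was missing, here is a minimal sketch of a pre-merge check that flags wildcard actions and resources in a policy document. This is not part of our tooling; `find_wildcards` and its output format are made up for this example.

```python
def find_wildcards(policy):
    """Return findings for Allow statements that use wildcard actions or resources."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM also accepts a single statement object
        statements = [statements]
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        for key in ("Action", "Resource"):
            values = stmt.get(key, [])
            if isinstance(values, str):  # a single string is allowed as well
                values = [values]
            for value in values:
                # "*" matches everything; "service:*" matches every action in a service.
                if value == "*" or (key == "Action" and value.endswith(":*")):
                    findings.append(f"statement {index}: {key} {value!r} is a wildcard")
    return findings


if __name__ == "__main__":
    # Hypothetical policy of the sort a reviewer would want to push back on.
    risky = {"Statement": [{"Effect": "Allow", "Action": "ssm:*", "Resource": "*"}]}
    for finding in find_wildcards(risky):
        print(finding)
```

A check like this only catches the obvious cases. It does not replace a human reviewing the PR.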

Phase Two – Weyland-Yutani Corporation “Building Better Worlds”

SparkPost already utilizes Terraform heavily for the rest of our infrastructure. Why not do the same with IAM? 

So we did. Enter DIYIAM! The result is a MUCH cleaner tool wrapped in Terraform. We get all the great stuff that comes with Terraform, now managing IAM. We intentionally kept IAM in its own repo, separate from our Terraform repos for other infrastructure, to allow more granular control of permissions and auditing. Reviewing and validating IAM permissions is considerably easier when they live in a repository isolated from “Everything Else”.

Want to take a look?

This is very much written as an internal tool. No warranties. No guarantees. Feel free to create issues on our GitHub.

The tool assumes you’re using S3/DynamoDB to manage Terraform state. It handles the state creation for you once you define the account, bucket, and table.
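For readers who have not set up remote state before, the sketch below shows roughly what an S3/DynamoDB backend definition looks like, rendered in Terraform's JSON syntax. It is not how DIYIAM generates its configuration; the `s3_backend_config` helper, the bucket and table names, and the state key are all hypothetical. The arguments used (`bucket`, `key`, `region`, `dynamodb_table`, `encrypt`) are standard settings of Terraform's S3 backend.

```python
import json


def s3_backend_config(bucket, table, region, key="iam/terraform.tfstate"):
    """Return a Terraform JSON-syntax backend block for S3 state with DynamoDB locking."""
    return {
        "terraform": {
            "backend": {
                "s3": {
                    "bucket": bucket,         # S3 bucket that stores the state file
                    "key": key,               # object path of the state file (hypothetical default)
                    "region": region,
                    "dynamodb_table": table,  # DynamoDB table used for state locking
                    "encrypt": True,          # server-side encryption of the state object
                }
            }
        }
    }


if __name__ == "__main__":
    # Placeholder names; saving the output as backend.tf.json would let Terraform read it.
    config = s3_backend_config("example-iam-state", "example-iam-locks", "us-west-2")
    print(json.dumps(config, indent=2))
```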

Make sure you follow the setup instructions in the README!

GitHub

~ Dave