Hashi-cloud




After reviewing my approach, I decided I'd try to figure out what advantages the cloud could offer me beyond what a colo does. This is going to utilize mainly Hashicorp tools, to give them a test drive.

Workspace

Tool installation

To start this craziness, I spun up a Manjaro Cinnamon VM in Virtualbox in order to provide a temporary work environment, since it will have the latest and greatest packages, and a sane DE Linux-based setup. Next, I installed the two tools that I should be using for this - packer and terraform:

$ cd ~/Downloads
$ wget https://releases.hashicorp.com/packer/1.3.3/packer_1.3.3_linux_amd64.zip
$ wget https://releases.hashicorp.com/terraform/0.11.10/terraform_0.11.10_linux_amd64.zip
$ for i in ./*.zip; do unzip "${i}"; done
$ for i in ./packer ./terraform; do sudo mv "${i}" /usr/local/bin/; done

Project Structure

Since we are going to be using Packer and Terraform for sure, let’s configure a sane Project Structure

$ cd ~/Documents; mkdir hashidemo; cd hashidemo
$ mkdir -p packer/{files,scripts,templates,logs} tfvars
$ ssh-keygen -t ed25519

Manual Workflow

First I want to deploy a base Apache server on DigitalOcean, as a POC.

Create DO Space

Packer

To find the available images, pipe the output of the list images API call to python -m json.tool | grep -e 'distribution\|slug\|name'.
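
For reference, the call being piped looks something like this (just a sketch, assuming DIGITALOCEAN_API_TOKEN is already exported; the endpoint and the type/per_page parameters come from DigitalOcean's v2 API):

$ curl -s -H "Authorization: Bearer ${DIGITALOCEAN_API_TOKEN}" \
    "https://api.digitalocean.com/v2/images?type=distribution&per_page=200" \
    | python -m json.tool | grep -e 'distribution\|slug\|name'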

$ cd packer
$ cat << 'EOF' > ./build_machine_image.sh
#!/usr/bin/env bash

set -e

packer build \
  -var "api_token=$DIGITALOCEAN_API_TOKEN" \
  ./templates/base.json | tee ./logs/packer_output.txt
EOF
$ chmod +x ./build_machine_image.sh
$ cat << 'EOF' > templates/base.json
{
  "builders": [
    {
      "type": "digitalocean",
      "api_token": "{{user `api_token`}}",
      "region": "nyc1",
      "image": "fedora-28-x64",
      "ssh_username": "root",
      "size": 512mb
    }
  ],
  "provisioners": [
    {
      "type": "shell",
      "inline": [
        "sudo dnf update -y",
        "sudo dnf install -y httpd"
      ]
    }
  ]
}
EOF
$ export DIGITALOCEAN_API_TOKEN=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
$ ./build_machine_image.sh

This will return “A snapshot was created: ‘packer-XXXXXXXXX’ (ID: YYYYYYYYYY)”. That ID is the identifier for your newly built machine image.

To be fair though, this is not an actual machine image, but rather a snapshot on DigitalOcean’s “Images” feature. Also, the built fedora-28 atomic image totalled 1.24G in size after this process, as opposed to the 692MB that a stock qcow2 image takes up.

You can put that image ID into a variable file manually. There are definitely ways to do this programmatically, but we’re doing MVP-level work here, and I’m not talking “Most Valuable Programmer”…

$ cd ..
$ cat tfvars/packer_machine_image.tfvars
packer_built_machine_image = "YYYYYYYYYYY"

Terraform

Next, we need to set up the terraform config that defines the totality of our infrastructure. This should be easy with a pre-existing image and only a couple of VMs, but let’s see how far we can get.

$ cat << 'EOF' > main.tf
variable "packer_built_machine_image" {}
variable "do_token" {}

provider "digitalocean" {
    token = "${var.do_token}"
}

resource "digitalocean_droplet" "packer-image-1" {
    image = "${var.packer_built_machine_image}"
    name = "packer-image-1"
    region = "nyc1"
    size = "512mb"
    ssh_keys = [
        "XXXXXXXXXXXXXXXXXXXXXXXX"]
}
EOF
$ cat << 'EOF' > apply.sh
#!/usr/bin/env bash

printf "\n\n\t\033[35;1mTerraform Apply\033[0m\n\n"

terraform get

terraform apply \
    -var-file=tfvars/packer_machine_image.tfvars \
    -var "do_token=${DIGITALOCEAN_API_TOKEN}"\
    -auto-approve=false
EOF
$ chmod +x ./apply.sh
$ terraform init
$ ./apply.sh

You can get your SSH key fingerprint from the GUI (Account -> Security) or from the API.
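
If you’d rather stay in the terminal, here’s a sketch of the API route (again assuming the token is exported; the /v2/account/keys endpoint is DigitalOcean’s SSH key listing):

$ curl -s -H "Authorization: Bearer ${DIGITALOCEAN_API_TOKEN}" \
    "https://api.digitalocean.com/v2/account/keys" \
    | python -m json.tool | grep -e 'fingerprint\|name'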

POC success

This constitutes a success for this particular POC. It is by no means the end of working with Hashicorp tooling. While I was impressed by the ease of use, I am very concerned by what feels like a dearth of best practices and the lack of a solid foundation in the tooling provided. That being said, both tools worked just fine.

My biggest reservation about Terraform would be the necessity of a state file. My biggest issue with Packer is that it stores images when it sees fit, instead of offering options to the operator. And they’re both distributed as precompiled binaries instead of as something that an operations-leaning person could maintain, which makes me doubt their mindset coming into this.

Round 2

Here I want to start diving into the stuff that I chose this route in order to do. Namely, to use terraform to create infrastructure as code (in all senses of the phrase: not just VMs, and not ad hoc) and to use packer to create immutable images for that infrastructure as code.

Terraform

Multiple VMs

First, let’s spin up multiple VMs all running our shitty image:

Oh how I wish they would’ve just used YAML

$ cat << 'EOF' > round2.tf
variable "packer_built_machine_image" {}
variable "do_token" {}

provider "digitalocean" {
    token = "${var.do_token}"
}

resource "digitalocean_droplet" "packer-image-1" {
    count = '3'
    image = "${var.packer_built_machine_image}"
    name = "packer-image-${count.index + 1}"
    region = "nyc1"
    size = "512mb"
    ssh_keys = [
        "XXXXXXXXXXXXXXXXXXXXXXXX"]
}
EOF

Load Balancer

Now I want a load balancer to load-balance those multiple VMs all running our shitty image:

Terraform doesn’t take single quotes? Wow…..

$ cat << 'EOF' > round2.tf
variable "packer_built_machine_image" {}
variable "do_token" {}

provider "digitalocean" {
    token = "${var.do_token}"
}

resource "digitalocean_droplet" "packer-image" {
    count = "1"
    image = "${var.packer_built_machine_image}"
    name = "packer-image-${count.index + 1}"
    region = "nyc1"
    size = "512mb"
    ssh_keys = [
        "XXXXXXXXXXXXXXXXXXXXXXXX"]
}

resource "digitalocean_loadbalancer" "packer-image-lb" {
    name = "packer-image-lb"
    region = "nyc1"
    forwarding_rule {
        entry_port = 80
        entry_protocol = "http"

        target_port = 80
        target_protocol = "http"
    }

    healthcheck {
        port = 22
        protocol = "tcp"
    }

    droplet_ids = ["${digitalocean_droplet.packer-image.*.id}"]
}
EOF

Here the unfortunately named “splat” syntax is being used. This is because our digitalocean_droplet.packer-image is no longer a single object, but a list of droplets. The droplet_ids assignment at the very end creates a list of all of their IDs and assigns them to the load balancer.

TIL there is a type system in terraform.

Block Storage

Many times, I’ll have a service that requires a data directory for large file storage - pictures and videos and the like. This is most likely not going to be based on object-based storage, as I’ll mainly be transitioning legacy apps into the cloud, but rather block-based storage that is shared among various instances of one layer of the app.

While mounting a volume to more than one droplet sounds great in theory, this does not work well with the design of volumes / block storage. The design is that the droplet will see the storage as a locally attached drive, exactly the same as a physical hard drive placed into the computer. For the same reasons you cannot place a drive in one computer and also attach it to the computer next to it, block storage cannot be treated that way either. The operating systems are simply not built to function that way, the data would be corrupted.

jarland, DO Mod
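
So, one volume per droplet it is. For the record, if I do end up needing per-instance block storage, my reading is that the terraform side would look roughly like this (a sketch only - the digitalocean_volume resource and the droplet’s volume_ids argument are my understanding of the DO provider, not something I ran as part of this demo):

$ cat << 'EOF' > block_storage.tf
resource "digitalocean_volume" "packer-image-data" {
    region = "nyc1"
    name = "packer-image-data"
    size = 100
}

resource "digitalocean_droplet" "packer-image-data-node" {
    image = "${var.packer_built_machine_image}"
    name = "packer-image-data-node"
    region = "nyc1"
    size = "512mb"
    ssh_keys = [
        "XXXXXXXXXXXXXXXXXXXXXXXX"]
    volume_ids = ["${digitalocean_volume.packer-image-data.id}"]
}
EOF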

Round 3

Sweet. Now, after hacking on terraform for a bit, let’s take a step back and analyze how we should be doing this.

You will never see this step performed by any team larger than one where you can reliably remember everyone’s name - and here the team involves everyone from the guy cutting the check to the hosting provider, to the devs, to the salespeople. So, basically, the Dunbar number.

Terraform

Secrets

So my whole goal of IaC is to be able to push code up to a git repo, and have it hosted there instead of living in documentation or somewhere else. However, that means I need to provide a way not only for myself to keep secrets out of that code, but for other people to integrate their own as well.

For a token that is not pushed up as part of the build or as part of the code, environment variables are fairly safe to use. However, keep in mind that everyone gets their own, and that you shouldn’t use them on a shared login. Right? Right.

One of the cool things about terraform is that we can specify variables as environment variables, as long as the prefix TF_VAR_ is prepended to the variable name. So in this case, we can rip the switch that sets the var out of apply.sh, and as long as we export TF_VAR_do_token=XXXXXXXX, we should be OK.
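
Concretely, that looks something like this (same apply.sh as before, just with the -var switch ripped out):

$ export TF_VAR_do_token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
$ cat << 'EOF' > apply.sh
#!/usr/bin/env bash

printf "\n\n\t\033[35;1mTerraform Apply\033[0m\n\n"

terraform get

# do_token is picked up from TF_VAR_do_token automatically
terraform apply \
    -var-file=tfvars/packer_machine_image.tfvars \
    -auto-approve=false
EOF
$ ./apply.sh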

Modules

Code is meant to be reused. That’s why Open Source works so well. (see what I did there?) Modules are terraform’s way of creating reusable code.
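
For illustration, here is a minimal sketch of what a module could look like in this setup (the modules/web_droplet path, its variable names, and round3.tf are all just made up for this example):

$ mkdir -p modules/web_droplet
$ cat << 'EOF' > modules/web_droplet/main.tf
variable "image" {}
variable "name_prefix" {}
variable "instance_count" {
    default = 1
}

resource "digitalocean_droplet" "web" {
    count = "${var.instance_count}"
    image = "${var.image}"
    name = "${var.name_prefix}-${count.index + 1}"
    region = "nyc1"
    size = "512mb"
}
EOF
$ cat << 'EOF' > round3.tf
module "web" {
    source = "./modules/web_droplet"
    image = "${var.packer_built_machine_image}"
    name_prefix = "packer-image"
    instance_count = 3
}
EOF
$ terraform get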

Variable File

As we’re running it right now, terraform takes another argument - the one specifying where the variables file is - that we should be able to get rid of. That is dumb, as it should be automatic.

Terraform will load all of the *.tf files in your target directory, so it’s current best practice to include all of your variables in a file called variables.tf. This gives you a single point of reference later on for everything that you can configure in your infrastructure. However, the syntax is a bit different:

variable "packer_built_machine_image" {
    default = 12345678
    description = "The machine image ID to use in the creation of droplets"
}

Alternatively, we can include a terraform.tfvars file or a *.auto.tfvars file that will be gathered automatically in the same format (key=value) that was specified initially.
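
So the manual step from earlier collapses down to something like this, and the -var-file switch can come out of apply.sh as well:

$ cat << 'EOF' > terraform.tfvars
packer_built_machine_image = "YYYYYYYYYYY"
EOF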

Validating

If you’ve gotten this far and already failed a couple of times, I’m sure you know that Terraform does its own validating before running a change, and if something fails then you gotta go fix it before anything will work.
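
You can also run those checks yourself before attempting an apply - terraform validate for the configuration syntax, and terraform plan for a dry run of the change:

$ terraform validate
$ terraform plan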

Hidden directory

When I went to move all of my scripts to a terraform subdirectory, I found that everything broke! Specifically, my digitalocean plugin. This was due to the fact that, unbeknownst to me, terraform had created a .terraform/ directory in my pwd. So after I moved that as well and cd'd to the correct location, I was able to run everything just fine.

Packer

Secrets

My favorite part of terraform was the ability to export the DigitalOcean API token as an environment variable. Unlike terraform, packer does not require that we prefix the environment variable with any particular string, but it does require that we define it in the template using user variables:

{
    "variables": {
        "do_token": "{{env `TF_VAR_do_token`}}"
    },
    "builders": [
    ...
    ],
    "provisioners": [
    ...
    ]
}
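
Then, assuming the builder’s api_token is pointed at that user variable ("api_token": "{{user `do_token`}}"), the build no longer needs a -var switch at all:

$ export TF_VAR_do_token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
$ packer validate ./packer/templates/base.json
$ packer build ./packer/templates/base.json | tee ./packer/logs/packer_output.txt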

Glue scripts

So, I was hoping (against hope) that two tools made by the same company to accomplish different stages of a very opinionated pipeline would have some sort of well-thought-out integration with each other. Well, it turns out that they don’t. So, like every other sysadmin before me, after my aspirations of sitting back on my ass while well-written tools take over the heavy lifting - giving me time to dedicate to architecture that follows best practices and adheres to established paradigms - I’m back to writing my own scripts to cover for someone else’s scripts.

All in all, it’s like these two tools were written by two different organizations. Their conventions are not the same, nor do they interoperate well.
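
As an example of the kind of glue I mean, here’s a sketch that scrapes the snapshot ID out of the packer log (the “A snapshot was created” line quoted back in the manual workflow) and drops it into the tfvars file that terraform reads. The script name is made up, and the paths assume the project layout from above:

$ cat << 'EOF' > glue_packer_to_terraform.sh
#!/usr/bin/env bash

set -e

# Pull the numeric snapshot ID out of the packer output
snapshot_id="$(grep -oE 'ID: [0-9]+' ./packer/logs/packer_output.txt | awk '{print $2}')"

# Hand it to terraform via the tfvars file
printf 'packer_built_machine_image = "%s"\n' "${snapshot_id}" \
    > ./tfvars/packer_machine_image.tfvars
EOF
$ chmod +x ./glue_packer_to_terraform.sh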

Project Structure

So I know that I’m much more familiar with Ansible, yet I can’t help but feel there is very little to this file structure. In the end, I ended up with the following, which worked just fine, if not better, for the little demo I put together:

[manjaro@manjaro-cinnamon hashidemo]$ ls *
packer:
base.json

terraform:
round1.tf.bak  terraform.tfstate
round2.tf      terraform.tfvars

Which is quite piddling to say the least. Maybe it’s because this is only a demo, but it seems like these tools are not very complex in the ways in which they utilize files in a directory structure. I would be interested in seeing how this fleshes itself out with multiple full stacks of applications being implemented in one file.

