Auto-scaling a Docker Swarm

Setting up a Docker Swarm (“docker swarm mode”, if you want to be more accurate) is pretty much a trivial process. You need 3 nodes: create a swarm on one of them and join the other two nodes to it. Simple. What if you wanted 100 swarm nodes? How fast can you provision new nodes to scale up your Docker Swarm?

Budget friendly

Maybe you’re well funded, have plenty of income, and can afford to run auto-scaling groups on AWS with tens, hundreds or even thousands of instances. If you are, you should probably look into something more reliable and battle-tested, like Amazon Auto Scaling groups or their Google Compute Engine counterpart.

Or maybe you’re bootstrapping on a very limited budget, but still need to be ready to scale up when you get a traffic spike. Adding a few servers by hand still takes quite some time, even if you’re starting off from a basic template.

"Can you set up new Docker Swarm nodes quickly, while still being cost friendly?" via @TitPetric

Click to Tweet

You can consider provisioning new nodes on DigitalOcean. If you haven’t signed up yet, you can use this affiliate link to get $10 in credit. That’s enough for a node for a month, or, if you’re dealing with elastic loads, for about 30 instances for an hour. Sounds good? It is.

Want to learn more about Docker and DevOps?
Check out my book 12 Factor Applications with Docker and Go.

DigitalOcean CLI

DigitalOcean provides a CLI, doctl, which allows you to perform operations like provisioning droplets, listing running instances and ultimately destroying them. Creating a Docker Swarm is a basic iteration of the following steps:

  1. Provision a new node with Docker,
  2. Create a new swarm, or join an existing one

Did you think scaling a Docker Swarm would be this simple? It really is.
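
If you haven’t used doctl before, it needs a one-time authentication with a DigitalOcean API token; after that, the basic operations look something like this (the droplet ID below is hypothetical):

# one-time setup: doctl prompts for your DigitalOcean API token
doctl auth init

# list running droplets, then delete one by its ID
doctl compute droplet list
doctl compute droplet delete -f 12345678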

When it comes to Amazon, they provide something called “AMIs”, or “Amazon Machine Images”. These are similar to Docker images, and DigitalOcean provides comparable images of common deployments. To save time configuring our own image, we can use an existing one:

# doctl compute image list | grep -i docker
23219707    Docker 17.03.0-ce on 14.04     docker
24445730    Docker 17.04.0-ce on 16.04     docker-16-04

As we want the latest and greatest, we’ll go with the currently available docker-16-04 image, and use it to spin up new droplets (a droplet being DigitalOcean’s version of a VM instance).

Provisioning DigitalOcean instances

Provisioning a new DigitalOcean instance is very simple:

doctl compute droplet create $NAME -v \
	--image docker-16-04 \
	--size 2gb \
	--region ams3 \
	--ssh-keys ${SSH_KEY}

With the doctl compute droplet create command we can create any size of droplet supported by DigitalOcean. We can set the size, enable private networking, spin droplets up in different regions, and provide a script to be executed by cloud-init when the instance spins up.

We can provision a Docker Swarm with cloud-init and automate this process.

When it comes to automating the provisioning of a Docker Swarm, three operations are important to us:

  1. Creating a new swarm,
  2. Adding a new manager,
  3. Adding a new worker

I created a set of scripts available on the 12FA book GitHub repository. Consider checking it out to follow along with this article. I’ll be explaining parts of what the scripts do as we go along.

"How fast can you provision new nodes to scale up your Docker Swarm?" via @TitPetric

Click to Tweet

Managers

Manager nodes handle the main orchestration part of the swarm cluster. You can have as many manager nodes as you like, but a majority of them must be up for the swarm to function correctly. So, out of 3 manager nodes, one can fail without impacting the swarm; out of 5, two can fail. In general, N managers tolerate the failure of (N-1)/2 of them (rounded down), which is why you should keep an odd number of manager nodes to keep your swarm highly available.
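
To put some numbers on the majority rule (this table just follows from the math above):

Managers    Majority    Failures tolerated
1           1           0
3           2           1
5           3           2
7           4           3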

To add a manager node, run:

./add-manager.sh

There are a few things being done here to set up a manager node. First, a tag is created with doctl compute tag create swarm. This allows us to tag our manager nodes with the tag swarm. The script uses another command to get the IP of a live manager node:

IPADDR=$(doctl compute droplet list --tag-name swarm --format PublicIPv4 --no-header | head -n1)

Depending on the output of this command, we can determine whether any manager droplets are already running. If none are running, we create a new swarm; if at least one manager is running, we join its swarm. Creating a swarm is done by using the --user-data-file option when creating a droplet:

doctl compute droplet create $DROPLET -v --wait \
	--image docker-16-04 \
	--size 2gb \
	--tag-name $TAG \
	--enable-private-networking \
	--region ams3 \
	--ssh-keys ${SSH_KEY} \
	--user-data-file ./cloud-init/create.sh

Compared to the previous example, there are a few subtle differences in how we’re creating a new droplet for the swarm:

We’re using the --wait option to wait until the droplet is fully created. As we’re creating the first manager node, it will take some time before it comes online; during this time we can’t get a token to join the swarm, so we can’t add any other manager or worker nodes. This takes about a minute.

We tag the droplet with --tag-name, so that a look at the tags tells us exactly how many managers or workers are running at any given time.

We enable private networking with --enable-private-networking. This means that instances within the same region (ams3) are reachable over a private LAN, available on the interface eth1. We will, however, be using the public IP of the droplets for communication. This allows us to run nodes in different regions while they remain part of the same Docker Swarm.
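
If you did want region-local traffic, the droplet’s private address is exposed by the same metadata service the scripts below use for the public one; this one-liner is just a hedged example, not something the scripts in this article run:

# query the droplet's private (eth1) IPv4 address from the metadata service
curl -s http://169.254.169.254/metadata/v1/interfaces/private/0/ipv4/address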

The --user-data-file is the most important part of setting up a Docker Swarm. To create a swarm, a number of commands must run immediately after the machine comes online. The script ./cloud-init/create.sh runs the following commands:

#!/bin/bash
ufw allow 2377/tcp
export PUBLIC_IPV4=$(curl -s http://169.254.169.254/metadata/v1/interfaces/public/0/ipv4/address)
docker swarm init --advertise-addr "${PUBLIC_IPV4}:2377"

This script is passed to cloud-init. It runs on the server after it has finished spinning up and setting up any software included in the image.

What this script does is pretty straightforward: it opens up the firewall with ufw to allow traffic on port 2377, which is the port used for communication between swarm nodes. It then uses curl and the DigitalOcean Metadata service to get the public IP of the instance. Using this information, a Docker Swarm is created.
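
One additional note: port 2377 only covers swarm management traffic. Docker Swarm also uses port 7946 (TCP/UDP) for node discovery and port 4789 (UDP) for overlay network traffic, so if ufw is denying inbound traffic by default, you’ll want to open those as well:

# additional swarm ports (assumes ufw denies inbound traffic by default)
ufw allow 7946/tcp
ufw allow 7946/udp
ufw allow 4789/udp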

"Creating an elastic Docker Swarm with @DigitalOcean, step by step" via @TitPetric

Click to Tweet

The process takes about 2 minutes until the first manager node is created. After that, you can add additional manager or worker nodes faster, and concurrently. The join token is passed to the instance in a cloud-init script and doesn’t change between runs, so you can literally spin up tens of instances in the span of a few minutes.
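
For example, once the first manager is up, nothing stops you from adding several nodes in parallel. A minimal sketch, using the worker script covered below and assuming the scripts tolerate concurrent runs:

# spin up 5 workers concurrently (sketch; assumes ./add-worker.sh is
# safe to run in parallel, which this article doesn't verify)
for i in $(seq 1 5); do
	./add-worker.sh &
done
wait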

Adding additional manager nodes is pretty much the same, apart from one small detail: the script passed with --user-data-file changes, so that the new node joins the existing cluster instead of creating a new one:

#!/bin/bash
ufw allow 2377/tcp
export PUBLIC_IPV4=$(curl -s http://169.254.169.254/metadata/v1/interfaces/public/0/ipv4/address)
docker swarm join --advertise-addr "${PUBLIC_IPV4}:2377" \
	--token "[token-code]" \
	"[manager-ip]:2377"

The difference is subtle: we are issuing a docker swarm join command with two additional pieces of information. The first is the [token-code]. This token is generated by connecting to a live manager node and requesting a join token:

TOKEN=$(ssh $IPADDR docker swarm join-token -q manager)

If for some reason this token can’t be retrieved (for example, because your swarm isn’t created yet and you still need to wait a bit), adding a new manager will fail. Just retry; usually the first node only needs a few minutes.

The second piece of information is the [manager-ip]. This is the $IPADDR of the manager that produced the join token for our swarm. Docker will connect to that address to join the existing swarm.
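
Putting the two together, an add-manager script has to look up a live manager, fetch the token, and substitute both placeholders into the join script before passing it as user data. Here’s a minimal sketch; the join.sh file name and the sed templating are assumptions, and the repository may do this differently:

# fetch a live manager IP and a manager join token, then template the
# cloud-init join script (file names and placeholders are assumptions)
IPADDR=$(doctl compute droplet list --tag-name swarm --format PublicIPv4 --no-header | head -n1)
TOKEN=$(ssh $IPADDR docker swarm join-token -q manager)
sed -e "s/\[token-code\]/${TOKEN}/" \
	-e "s/\[manager-ip\]/${IPADDR}/" \
	./cloud-init/join.sh > /tmp/join-manager.sh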

Workers

Worker nodes don’t get a management interface to the swarm, only an execution one. This means that you can’t perform any swarm actions from the workers; the managers schedule containers for execution on them. They work, they don’t manage.

To add a worker node, run:

./add-worker.sh

If no manager exists yet, you’ll end up with a descriptive error:

To add a worker node, a manager node must exist first

You can have as many workers as you like, or even none at all. The join token is passed to the instance in a cloud-init script, just like when adding new managers. The only difference is that the token is specifically created for workers:

TOKEN=$(ssh $IPADDR docker swarm join-token -q worker)

While you could technically scale your swarm by running a large number of managers, this adds strain to the system. Manager nodes should be used to provide redundancy, while worker nodes should be added to provide capacity.

"Automate adding new workers to Docker Swarm and scale up to handle your load" via @TitPetric

Click to Tweet

Removing managers

Removing a manager first drains it and demotes it to a worker; the DigitalOcean droplet is then purged, and finally the dead node is removed from the swarm.

So:

  1. Drain Docker containers from the manager,
  2. Demote the manager to a worker,
  3. Purge the droplet,
  4. Remove the node from the swarm

I’m trying to make this process as graceful for the system as possible. By setting the availability to drain, we’re telling Docker Swarm that this node shouldn’t be used for containers anymore; new containers will be scheduled on other nodes.

We demote the manager to a worker so we can safely destroy the node. This is a managed way to reduce the number of managers without causing an outage; when demoting managers this way, the process should never result in the failure of the swarm.

When we purge the droplet, we shut down the instance along with any data it may have held, effectively reducing the capacity of your instance pool. Any containers it ran will continue to be scheduled on the remaining instances.

Finally, we remove the now-dead worker node from the swarm. The node can’t remove itself, which is why we purged the instance first: we connect to one of the remaining managers and use docker node rm [instance] to purge it from the swarm.
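
Condensed into commands, the sequence looks roughly like this, where NODE is the hostname of the manager being removed and MANAGER is any remaining manager (a sketch of what remove-manager.sh does, not its literal contents):

# 1. drain, 2. demote, 3. purge the droplet, 4. remove the node
ssh $MANAGER docker node update --availability drain $NODE
ssh $MANAGER docker node demote $NODE
doctl compute droplet delete -f $NODE
ssh $MANAGER docker node rm $NODE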

All this can be done by running:

./remove-manager.sh

For example:

# ./list-swarm.sh
ID                           HOSTNAME          STATUS  AVAILABILITY  MANAGER STATUS
uoks2o8ce27dl0w1iz9upd3xz    swarm-1493615119  Ready   Active        Leader
x1njngdyj53tbhxjtlk8ie4fh    swarm-1493615341  Ready   Active        Reachable
ybp04oaayxhv82agvfrfrraja *  swarm-1493615112  Ready   Active        Reachable
# ./remove-manager.sh
Leaving swarm: swarm-1493615112
Manager swarm-1493615112 demoted in the swarm.
Purging droplet: swarm-1493615112
swarm-1493615112
# ./list-swarm.sh
ID                           HOSTNAME          STATUS  AVAILABILITY  MANAGER STATUS
uoks2o8ce27dl0w1iz9upd3xz *  swarm-1493615119  Ready   Active        Leader
x1njngdyj53tbhxjtlk8ie4fh    swarm-1493615341  Ready   Active        Reachable

As the node is removed gracefully, the fault-tolerance constraints above don’t come into play. The swarm size simply goes down, and with it the availability guarantees.

Removing workers

Removing a worker is just as simple as removing a manager:

./remove-worker.sh

You can remove any number of workers without causing an outage, as long as the remaining nodes have enough capacity to run the containers that were scheduled on the removed workers.

"Having the ability to scale-back and keep your costs low with Docker Swarm is priceless" via @TitPetric

Click to Tweet

Destroying everything

Just run ./destroy.sh and it will wipe all your swarm manager and worker instances from existence.
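
A plausible sketch of what such a script does (the actual script may differ, and the tag name is assumed) is to list every droplet carrying the swarm tag and delete them all:

# delete all droplets tagged as swarm members (sketch; tag name assumed)
doctl compute droplet list --tag-name swarm --format ID --no-header | \
	xargs -r -n1 doctl compute droplet delete -f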

Other

There are a few utility scripts to facilitate the functionality presented above:

  • ./list-managers.sh - lists running manager nodes,
  • ./list-workers.sh - lists running worker nodes,
  • ./list-swarm.sh - prints the output of docker node ls, showing the nodes in the swarm,
  • ./list.sh - shows all running DigitalOcean droplets,
  • ./ssh-key.sh - provides an SSH key to DigitalOcean for logging in to instances,
  • ./ssh.sh - runs a command on all manager nodes,
  • ./ssh-one.sh - runs a command on a single manager node

For example, if you want to run uname on all manager nodes, you can do:

# ./ssh.sh uname -r
> swarm-1493616101
4.4.0-75-generic
> swarm-1493616219
4.4.0-75-generic
> swarm-1493616226
4.4.0-75-generic

Testing the swarm

After spinning up the managers, you can create a service on them:

# ./ssh-one.sh docker service create --replicas 10 --name sonyflake titpetric/sonyflake
> swarm-1493616101
ld66ax987n5nyypm1itegy6io
# ./ssh-one.sh docker service ps sonyflake --format '{{.Node}}' \| sort \| uniq -c
> swarm-1493616101
      4 swarm-1493616101
      3 swarm-1493616219
      3 swarm-1493616226

As you can see, the service created 10 containers, distributed between the available nodes.
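
If you need more capacity for a service, scaling it is one command away; for example, doubling the replicas:

# scale the sonyflake service from 10 to 20 replicas
./ssh-one.sh docker service scale sonyflake=20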

And removing a manager:

# ./remove-manager.sh
Leaving swarm: swarm-1493616101
swarm-1493616101
Manager swarm-1493616101 demoted in the swarm.
Purging droplet: swarm-1493616101
swarm-1493616101
# ./ssh-one.sh docker service ps sonyflake --format '{{.Node}}' -f 'desired-state=running' \| sort \| uniq -c
> swarm-1493616219
      5 swarm-1493616219
      5 swarm-1493616226

I added the -f option to filter containers based on their desired state, listing only running ones. Here we see that the containers have been rescheduled on the remaining managers without issue. As long as there is one manager left, we can remove managers without the swarm failing.

# ./remove-manager.sh
Leaving swarm: swarm-1493616219
swarm-1493616219
Manager swarm-1493616219 demoted in the swarm.
Purging droplet: swarm-1493616219
swarm-1493616219
# ./ssh-one.sh docker service ps sonyflake --format '{{.Node}}' -f 'desired-state=running' \| sort \| uniq -c
> swarm-1493616226
     10 swarm-1493616226
# ./list-swarm.sh
ID                           HOSTNAME          STATUS  AVAILABILITY  MANAGER STATUS
naoyvnd0nu1vxegi0mplxy1sf *  swarm-1493616226  Ready   Active        Leader

Conclusion

With this set of scripts it’s possible to give your swarm elasticity. If your load is elastic, you can use the provided scripts to add and remove worker and manager nodes as needed, to process your workloads faster. Depending on the monitoring rules you set up, you can build your own system that grows and shrinks your Docker Swarm based on any number of inputs, from CPU usage to the number of items in your worker queue.
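
As a rough illustration of that idea, a naive autoscaler could be a cron job that compares the load average on a manager against its core count and adds a worker when it’s overloaded. The threshold policy here is an assumption, not part of the repository:

#!/bin/bash
# naive autoscaler sketch: add a worker when the first manager's
# 1-minute load average exceeds its CPU core count (assumed policy)
IPADDR=$(doctl compute droplet list --tag-name swarm --format PublicIPv4 --no-header | head -n1)
CORES=$(ssh $IPADDR nproc)
LOAD=$(ssh $IPADDR "cut -d' ' -f1 /proc/loadavg")
if awk -v l="$LOAD" -v c="$CORES" 'BEGIN { exit !(l > c) }'; then
	./add-worker.sh
fi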

Just as an interesting fact: to test out all of the above (with a lot of trial and error), I spent about $1.75, the average price of a cup of coffee in Slovenia.

While I have you here...

It would be great if you bought one of my books. I promise you’ll learn a lot more if you do, and buying a copy supports me in writing more about similar topics. Say thank you and buy my books.

Feel free to send me an email if you want to book my time for consultancy/freelance services. I'm great at APIs, Go, Docker, VueJS and scaling services, among many other things.