Setting up your own Docker swarm
Scaling a service has traditionally been the domain of system operators, who installed servers, and of developers, who tweaked the software once the load grew high enough to warrant scaling. Soon enough you’d be looking at tens or even hundreds of instances, which took a lot of time to manage. With the release of Docker 1.12, orchestration is built in - you can scale to as many instances as your hosts allow. And setting up a Docker swarm is easy-peasy.
Initialize swarm
First off - I’m starting with a clean Docker 1.12.0 installation. I’ll be creating a swarm with a few simple steps:
root@swarm1:~$ docker swarm init
Swarm initialized: current node (4i0lko1qdwqp4x1aqwn6o7obh) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join \
--token SWMTKN-1-0445f42yyhu8z4k3lgxsbnusnd8ws83urf56t02rv1vdh1zqlj-9ycry5kc20rnw5cbxhyduzg1f \
10.55.0.248:2377
To add a manager to this swarm, run the following command:
docker swarm join \
--token SWMTKN-1-0445f42yyhu8z4k3lgxsbnusnd8ws83urf56t02rv1vdh1zqlj-09swsjxdz80bfxbc1aed6mack \
10.55.0.248:2377
I now have a swarm consisting of exactly one manager node. You can attach additional swarm workers, or add new managers for high availability. If you’re running a swarm cluster with only one manager and several workers, you’re risking an interruption of service if the manager node fails.
“In Docker Swarm, the Swarm manager is responsible for the entire cluster and manages the resources of multiple Docker hosts at scale. If the Swarm manager dies, you must create a new one and deal with an interruption of service.”
As we’re interested in setting up a two-node swarm cluster, it makes sense to make both nodes in the swarm managers. If one goes down, the other would take its place.
root@swarm2:~# docker swarm join \
> --token SWMTKN-1-0445f42yyhu8z4k3lgxsbnusnd8ws83urf56t02rv1vdh1zqlj-09swsjxdz80bfxbc1aed6mack \
> 10.55.0.248:2377
This node joined a swarm as a manager.
To list the nodes in the swarm, run docker node ls.
root@swarm2:~# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
4i0lko1qdwqp4x1aqwn6o7obh swarm1 Ready Active Leader
9gyk5t22ngndbwtjof80hpg54 * swarm2 Ready Active Reachable
Creating a service
As you can see, a newly joined manager node is added automatically but not promoted to leader. Let’s start a service that performs something which can be scaled over both hosts - pinging google.com, for example. I want to have 5 instances of this service available from the start, by using the --replicas flag.
root@swarm2:~# docker service create --replicas 5 --name helloworld alpine ping google.com
31zloagja1dlkt4kaicvgeahn
As the service started without problems, we just get back the ID of the newly created service.
By using docker service ls we can get more information about the running service.
root@swarm2:~# docker service ls
ID NAME REPLICAS IMAGE COMMAND
31zloagja1dl helloworld 5/5 alpine ping google.com
Of course, as we’re talking orchestration, the service’s tasks in the examples are split between the swarm1 and swarm2 nodes. You can still use docker ps -a on individual nodes to inspect single containers, but there’s also the handy docker service ps [name].
root@swarm1:~# docker service ps helloworld
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR
5fxtllouvmd91tmgzoudtt7a4 helloworld.1 alpine swarm1 Running Running 7 minutes ago
cqvgixx3djhvtiahba971ivr7 helloworld.2 alpine swarm2 Running Running 7 minutes ago
99425nw3r4rf5nd66smjm13f5 helloworld.3 alpine swarm2 Running Running 7 minutes ago
1dj3cs7v5ijc93k9yc2p42bhj helloworld.4 alpine swarm1 Running Running 7 minutes ago
0hy3yzwqzlnee10gat6w2lnp2 helloworld.5 alpine swarm1 Running Running 7 minutes ago
Testing fault tolerance
As we connected two managers to run our service, let’s just bring one of them down. I’m going to power off swarm1, the current leader, hoping that the swarm will do the following:
- elect a new leader (swarm2),
- start up additional helloworld containers to cover the outage
root@swarm1:~# poweroff
Connection to 10.55.0.248 closed by remote host.
Connection to 10.55.0.248 closed.
First off, let’s list the cluster state.
root@swarm2:~# docker node ls
Error response from daemon: rpc error: code = 2 desc = raft: no elected cluster leader
Uh oh, this was slightly unexpected. After bringing swarm1 back up, I’m seeing that swarm2 was promoted to leader. But it’s not exactly the fail-over I imagined: while swarm1 was offline, the ping service only ran at 2/5 replicas and didn’t automatically scale onto swarm2 as expected.
root@swarm2:~# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
4i0lko1qdwqp4x1aqwn6o7obh swarm1 Ready Active Reachable
9gyk5t22ngndbwtjof80hpg54 * swarm2 Ready Active Leader
root@swarm2:~# docker service ps helloworld
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR
4x0zgeiucsizvmys5orih2bru helloworld.1 alpine swarm1 Running Running 3 minutes ago
5fxtllouvmd91tmgzoudtt7a4 \_ helloworld.1 alpine swarm1 Shutdown Complete 3 minutes ago
cqvgixx3djhvtiahba971ivr7 helloworld.2 alpine swarm2 Running Running 21 minutes ago
99425nw3r4rf5nd66smjm13f5 helloworld.3 alpine swarm2 Running Running 21 minutes ago
5xzldwvoplqpg1qllg28kh2ef helloworld.4 alpine swarm1 Running Running 3 minutes ago
1dj3cs7v5ijc93k9yc2p42bhj \_ helloworld.4 alpine swarm1 Shutdown Complete 3 minutes ago
avm36h718yihd5nomy2kzhy7m helloworld.5 alpine swarm1 Running Running 3 minutes ago
0hy3yzwqzlnee10gat6w2lnp2 \_ helloworld.5 alpine swarm1 Shutdown Complete 3 minutes ago
So, what went wrong? After a bit of reading, I arrived at the following explanation of how Docker uses the Raft consensus algorithm for leader election:
Consensus is fault-tolerant up to the point where quorum is available. If a quorum of nodes is unavailable, it is impossible to process log entries or reason about peer membership. For example, suppose there are only 2 peers: A and B. The quorum size is also 2, meaning both nodes must agree to commit a log entry. If either A or B fails, it is now impossible to reach quorum.
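To put some numbers on that - with quorum being a simple majority, floor(N/2) + 1 out of N managers:
- 1 manager: quorum 1, tolerates 0 failures
- 2 managers: quorum 2, tolerates 0 failures
- 3 managers: quorum 2, tolerates 1 failure
- 5 managers: quorum 3, tolerates 2 failures
In other words, my two-manager swarm was no more fault tolerant than a single manager - losing either node left the remaining one unable to reach quorum, hence the “no elected cluster leader” error above.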
Adding an additional manager to enable fault tolerance
So, if you have three managers, one manager can fail, and the remaining two still represent a majority, which can decide which of the remaining managers gets elected as the leader. I quickly add a swarm3 node to the swarm. You can retrieve the credentials for adding nodes by issuing docker swarm join-token [type], where type can be either worker or manager.
root@swarm2:~# docker swarm join-token manager
To add a manager to this swarm, run the following command:
docker swarm join \
--token SWMTKN-1-0445f42yyhu8z4k3lgxsbnusnd8ws83urf56t02rv1vdh1zqlj-09swsjxdz80bfxbc1aed6mack \
10.55.0.238:2377
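As an aside, join tokens are per role: the same command hands out the worker token as well, and there’s a flag to rotate a token you suspect has leaked. Both of these worked on my 1.12 install, but double-check the flags against your version:
docker swarm join-token worker
docker swarm join-token --rotate manager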
And we run the manager join command on our swarm3 machine.
root@swarm3:~# docker swarm join \
> --token SWMTKN-1-0445f42yyhu8z4k3lgxsbnusnd8ws83urf56t02rv1vdh1zqlj-09swsjxdz80bfxbc1aed6mack \
> 10.55.0.238:2377
This node joined a swarm as a manager.
root@swarm3:~# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
4i0lko1qdwqp4x1aqwn6o7obh swarm1 Ready Active Reachable
9gyk5t22ngndbwtjof80hpg54 swarm2 Ready Active Leader
b9dyyc08ehtnl62z7e3ll0ih3 * swarm3 Ready Active Reachable
Yay! Our swarm3 is ready. I cleared out the container inventory to start with a clean swarm.
Scaling our service with fault tolerance
I deleted the service with docker service rm helloworld, and cleaned up the containers with docker ps -a -q | xargs docker rm. Now I can start the service again from zero.
root@swarm1:~# docker service create --replicas 5 --name helloworld alpine ping google.com
5gmrllue1sgdwl1yd5ubl16md
root@swarm1:~# docker service scale helloworld=10
helloworld scaled to 10
root@swarm1:~# docker service ps helloworld
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR
2hb76h8m7oop9pit4jgok2jiu helloworld.1 alpine swarm1 Running Running about a minute ago
5lxefcjclasna9as4oezn34i8 helloworld.2 alpine swarm3 Running Running about a minute ago
95cab7hte5xp9e8mfj1tbxms0 helloworld.3 alpine swarm2 Running Running about a minute ago
a6pcl2fce4hwnh347gi082sc2 helloworld.4 alpine swarm2 Running Running about a minute ago
61rez4j8c5h6g9jo81xhc32wv helloworld.5 alpine swarm1 Running Running about a minute ago
2lobeil8sndn0loewrz8n9i4s helloworld.6 alpine swarm1 Running Running 20 seconds ago
0gieon36unsggqjel48lcax05 helloworld.7 alpine swarm1 Running Running 21 seconds ago
91cdmnxarluy2hc2fejvxnzfg helloworld.8 alpine swarm3 Running Running 21 seconds ago
02x6ppzyseak8wsdcqcuq545d helloworld.9 alpine swarm3 Running Running 20 seconds ago
4gmn24kjfv7apioy6t8e5ibl8 helloworld.10 alpine swarm2 Running Running 21 seconds ago
root@swarm1:~#
And powering off swarm1 gives us:
root@swarm2:~# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
4i0lko1qdwqp4x1aqwn6o7obh swarm1 Ready Active Unreachable
9gyk5t22ngndbwtjof80hpg54 * swarm2 Ready Active Leader
b9dyyc08ehtnl62z7e3ll0ih3 swarm3 Ready Active Reachable
and additional containers have spawned, just as intended:
root@swarm2:~# docker service ps helloworld
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR
bb8nwud2h75xpvkxouwt8rftm helloworld.1 alpine swarm2 Running Running 26 seconds ago
2hb76h8m7oop9pit4jgok2jiu \_ helloworld.1 alpine swarm1 Shutdown Running 2 minutes ago
5lxefcjclasna9as4oezn34i8 helloworld.2 alpine swarm3 Running Running 2 minutes ago
95cab7hte5xp9e8mfj1tbxms0 helloworld.3 alpine swarm2 Running Running 2 minutes ago
a6pcl2fce4hwnh347gi082sc2 helloworld.4 alpine swarm2 Running Running 2 minutes ago
8n1uonzp2roy608kd6v888y3d helloworld.5 alpine swarm3 Running Running 26 seconds ago
61rez4j8c5h6g9jo81xhc32wv \_ helloworld.5 alpine swarm1 Shutdown Running 2 minutes ago
17czblq9saww4e2wok235kww8 helloworld.6 alpine swarm2 Running Running 26 seconds ago
2lobeil8sndn0loewrz8n9i4s \_ helloworld.6 alpine swarm1 Shutdown Running about a minute ago
6f3tm5vvhq07kwqt3zu0xr5mi helloworld.7 alpine swarm3 Running Running 26 seconds ago
0gieon36unsggqjel48lcax05 \_ helloworld.7 alpine swarm1 Shutdown Running about a minute ago
91cdmnxarluy2hc2fejvxnzfg helloworld.8 alpine swarm3 Running Running about a minute ago
02x6ppzyseak8wsdcqcuq545d helloworld.9 alpine swarm3 Running Running about a minute ago
4gmn24kjfv7apioy6t8e5ibl8 helloworld.10 alpine swarm2 Running Running about a minute ago
Moving the services away from a specific node (drain)
With this setup we can tolerate the failure of one manager node. But what if we want a somewhat more graceful procedure for removing containers from one node? We can set a node’s availability to drain, which reschedules its tasks onto the other active nodes.
root@swarm2:~# docker node update --availability drain swarm3
swarm3
root@swarm2:~# docker service ps helloworld | grep swarm3
5lxefcjclasna9as4oezn34i8 \_ helloworld.2 alpine swarm3 Shutdown Shutdown 19 seconds ago
8n1uonzp2roy608kd6v888y3d \_ helloworld.5 alpine swarm3 Shutdown Shutdown 19 seconds ago
6f3tm5vvhq07kwqt3zu0xr5mi \_ helloworld.7 alpine swarm3 Shutdown Shutdown 19 seconds ago
91cdmnxarluy2hc2fejvxnzfg \_ helloworld.8 alpine swarm3 Shutdown Shutdown 19 seconds ago
02x6ppzyseak8wsdcqcuq545d \_ helloworld.9 alpine swarm3 Shutdown Shutdown 19 seconds ago
root@swarm2:~# docker service ps helloworld | grep swarm2 | wc -l
10
All the containers on swarm3 were shut down and started up on the remaining active node, swarm2. Let’s scale the example down to only one instance.
root@swarm2:~# docker service scale helloworld=1
helloworld scaled to 1
root@swarm2:~# docker service ps helloworld | grep swarm2 | grep -v Shutdown
17czblq9saww4e2wok235kww8 helloworld.6 alpine swarm2 Running Running 7 minutes ago
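Scaling down (like draining) only stops the task containers - on my 1.12 setup they stick around on each node in an exited state until someone removes them. A quick sweep on a node could look something like this (a sketch using the standard status filter, adjust to taste):
docker ps -a -q -f status=exited | xargs docker rm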
So cleaning up the containers is still very much in the domain of the sysadmin. I started swarm1 back up and scaled our service to 20 instances.
root@swarm2:~# docker service scale helloworld=20
helloworld scaled to 20
root@swarm2:~# docker service ls
ID NAME REPLICAS IMAGE COMMAND
5gmrllue1sgd helloworld 2/20 alpine ping google.com
root@swarm2:~# docker service ls
ID NAME REPLICAS IMAGE COMMAND
5gmrllue1sgd helloworld 10/20 alpine ping google.com
root@swarm2:~# docker service ls
ID NAME REPLICAS IMAGE COMMAND
5gmrllue1sgd helloworld 16/20 alpine ping google.com
root@swarm2:~# docker service ls
ID NAME REPLICAS IMAGE COMMAND
5gmrllue1sgd helloworld 20/20 alpine ping google.com
As you can see here, it does take some time for the instances to start up. Let’s see how they are distributed.
root@swarm3:~# docker service ps -f "desired-state=running" helloworld | grep swarm | awk '{print $4}' | sort | uniq -c
10 swarm1
10 swarm2
Enabling and scaling to a new node
As we put swarm3 into drain availability, we don’t have any instances running on it. Let’s fix that very quickly by putting it back into active availability mode.
root@swarm3:~# docker node update --availability active swarm3
swarm3
Swarm doesn’t reshuffle already-running tasks onto the newly activated node, so we need to scale our service to populate swarm3.
root@swarm3:~# docker service scale helloworld=30
helloworld scaled to 30
root@swarm3:~# docker service ps -f "desired-state=running" helloworld | grep swarm | awk '{print $4}' | sort | uniq -c
10 swarm1
10 swarm2
10 swarm3
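As a side note, this scale-up trick is the practical way to get tasks onto a newly activated node in 1.12. Later Docker releases (1.13 and up, if I recall correctly) added a --force flag to docker service update, which restarts the service’s tasks and lets the scheduler spread them across all active nodes without changing the replica count - worth verifying against the docs for your version:
docker service update --force helloworld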
It takes a bit of getting used to, but docker service is a powerful way to scale out your microservices. It might be slightly more tricky when it comes to data volumes (mounts), but that’s the subject of another post.
Closing words
Keep in mind, if you’re provisioning swarm managers, you need a majority to resolve failures gracefully. That means you should have an odd number of managers, N >= 3. A cluster of N managers can tolerate the failure of (N-1)/2 of them: 3 managers = 1 failed node, 5 managers = 2 failed nodes, 7 managers = 3 failed nodes, and so on.
A worker, in comparison, doesn’t replicate the manager state, and you can’t create or query services from a worker - you can only do that from one of the manager nodes, and the commands are carried out by the leader node.
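And if you ever need to turn a worker into a manager (or the other way around), there are commands for that as well - both were present in 1.12, but check docker node --help on your version:
docker node promote swarm3
docker node demote swarm3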