Setting up your own Docker swarm
Scaling a service has traditionally been the domain of system operators, who installed servers, and of developers, who tweaked the software once the load grew high enough to warrant scaling. Soon enough you’d be looking at tens or even hundreds of instances, which took a lot of time to manage. With the release of Docker 1.12, orchestration is built in - you can scale to as many instances as your hosts allow. And setting up a Docker swarm is easy-peasy.
Initialize swarm
First off - I’m starting with a clean Docker 1.12.0 installation. I’ll be creating a swarm with a few simple steps:
root@swarm1:~$ docker swarm init
Swarm initialized: current node (4i0lko1qdwqp4x1aqwn6o7obh) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join \
--token SWMTKN-1-0445f42yyhu8z4k3lgxsbnusnd8ws83urf56t02rv1vdh1zqlj-9ycry5kc20rnw5cbxhyduzg1f \
10.55.0.248:2377
To add a manager to this swarm, run the following command:
docker swarm join \
--token SWMTKN-1-0445f42yyhu8z4k3lgxsbnusnd8ws83urf56t02rv1vdh1zqlj-09swsjxdz80bfxbc1aed6mack \
10.55.0.248:2377
I now have a swarm consisting of exactly one manager node. You can attach additional swarm workers, or add new managers for high availability. If you’re running a swarm cluster with only one manager and several workers, you’re risking an interruption of service if the manager node fails.
“In Docker Swarm, the Swarm manager is responsible for the entire cluster and manages the resources of multiple Docker hosts at scale. If the Swarm manager dies, you must create a new one and deal with an interruption of service.”
As we’re interested in setting up a two-node swarm cluster, it makes sense to make both nodes in the swarm managers. If one goes down, the other would take its place.
root@swarm2:~# docker swarm join \
> --token SWMTKN-1-0445f42yyhu8z4k3lgxsbnusnd8ws83urf56t02rv1vdh1zqlj-09swsjxdz80bfxbc1aed6mack \
> 10.55.0.248:2377
This node joined a swarm as a manager.
To list the nodes in the swarm, run docker node ls.
root@swarm2:~# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
4i0lko1qdwqp4x1aqwn6o7obh swarm1 Ready Active Leader
9gyk5t22ngndbwtjof80hpg54 * swarm2 Ready Active Reachable
Creating a service
As you can see, a newly joined manager node is added automatically but not promoted to leader. Let’s start a service that performs something which can be scaled over both hosts - pinging google.com, for example. I want to have 5 instances of this service available from the start, by using the --replicas flag.
root@swarm2:~# docker service create --replicas 5 --name helloworld alpine ping google.com
31zloagja1dlkt4kaicvgeahn
As the service started without problems, we just get back the ID of the newly created service.
By using docker service ls we can get more information about the running service.
root@swarm2:~# docker service ls
ID NAME REPLICAS IMAGE COMMAND
31zloagja1dl helloworld 5/5 alpine ping google.com
Of course, as we’re talking orchestration, the service’s tasks in the examples are split between the swarm1 and swarm2 nodes. You can still use docker ps -a on individual nodes to inspect single containers, but there’s also the handy docker service ps [name].
root@swarm1:~# docker service ps helloworld
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR
5fxtllouvmd91tmgzoudtt7a4 helloworld.1 alpine swarm1 Running Running 7 minutes ago
cqvgixx3djhvtiahba971ivr7 helloworld.2 alpine swarm2 Running Running 7 minutes ago
99425nw3r4rf5nd66smjm13f5 helloworld.3 alpine swarm2 Running Running 7 minutes ago
1dj3cs7v5ijc93k9yc2p42bhj helloworld.4 alpine swarm1 Running Running 7 minutes ago
0hy3yzwqzlnee10gat6w2lnp2 helloworld.5 alpine swarm1 Running Running 7 minutes ago
Testing fault tolerance
As we connected two managers to run our service, let’s just bring one of them down. I’m going to power off swarm1, the current leader, hoping that the swarm will do the following:
- elect a new leader (swarm2),
- start up additional helloworld containers to cover the outage
root@swarm1:~# poweroff
Connection to 10.55.0.248 closed by remote host.
Connection to 10.55.0.248 closed.
First off, let’s list the cluster state.
root@swarm2:~# docker node ls
Error response from daemon: rpc error: code = 2 desc = raft: no elected cluster leader
Uh oh, this was slightly unexpected. After bringing swarm1 back up, I’m seeing that swarm2 was promoted to leader. But it’s not exactly the fail-over I imagined: while swarm1 was offline, the ping service only ran at 2/5 replicas and didn’t automatically scale onto swarm2 as expected.
root@swarm2:~# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
4i0lko1qdwqp4x1aqwn6o7obh swarm1 Ready Active Reachable
9gyk5t22ngndbwtjof80hpg54 * swarm2 Ready Active Leader
root@swarm2:~# docker service ps helloworld
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR
4x0zgeiucsizvmys5orih2bru helloworld.1 alpine swarm1 Running Running 3 minutes ago
5fxtllouvmd91tmgzoudtt7a4 \_ helloworld.1 alpine swarm1 Shutdown Complete 3 minutes ago
cqvgixx3djhvtiahba971ivr7 helloworld.2 alpine swarm2 Running Running 21 minutes ago
99425nw3r4rf5nd66smjm13f5 helloworld.3 alpine swarm2 Running Running 21 minutes ago
5xzldwvoplqpg1qllg28kh2ef helloworld.4 alpine swarm1 Running Running 3 minutes ago
1dj3cs7v5ijc93k9yc2p42bhj \_ helloworld.4 alpine swarm1 Shutdown Complete 3 minutes ago
avm36h718yihd5nomy2kzhy7m helloworld.5 alpine swarm1 Running Running 3 minutes ago
0hy3yzwqzlnee10gat6w2lnp2 \_ helloworld.5 alpine swarm1 Shutdown Complete 3 minutes ago
So, what went wrong? After a bit of reading, I arrived at the following explanation of how Docker uses the Raft consensus algorithm for leader election:
Consensus is fault-tolerant up to the point where quorum is available. If a quorum of nodes is unavailable, it is impossible to process log entries or reason about peer membership. For example, suppose there are only 2 peers: A and B. The quorum size is also 2, meaning both nodes must agree to commit a log entry. If either A or B fails, it is now impossible to reach quorum.
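To put some numbers on that - with quorum being a simple majority, floor(N/2) + 1 out of N managers:
- 1 manager: quorum 1, tolerates 0 failures
- 2 managers: quorum 2, tolerates 0 failures
- 3 managers: quorum 2, tolerates 1 failure
- 5 managers: quorum 3, tolerates 2 failures
In other words, my two-manager swarm was no more fault tolerant than a single manager - losing either node left the remaining one unable to reach quorum, hence the “no elected cluster leader” error above.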
Adding an additional manager to enable fault tolerance
So, if you have three managers, one manager can fail, and the remaining two still represent a majority, which can decide which of the remaining managers gets elected as the leader. I quickly add a swarm3 node to the swarm. You can retrieve the credentials for adding nodes by issuing docker swarm join-token [type], where type can be either worker or manager.
root@swarm2:~# docker swarm join-token manager
To add a manager to this swarm, run the following command:
docker swarm join \
--token SWMTKN-1-0445f42yyhu8z4k3lgxsbnusnd8ws83urf56t02rv1vdh1zqlj-09swsjxdz80bfxbc1aed6mack \
10.55.0.238:2377
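As an aside, join tokens are per role: the same command hands out the worker token as well, and there’s a flag to rotate a token you suspect has leaked. Both of these worked on my 1.12 install, but double-check the flags against your version:
docker swarm join-token worker
docker swarm join-token --rotate manager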
And we run the manager join command on our swarm3 machine.
root@swarm3:~# docker swarm join \
> --token SWMTKN-1-0445f42yyhu8z4k3lgxsbnusnd8ws83urf56t02rv1vdh1zqlj-09swsjxdz80bfxbc1aed6mack \
> 10.55.0.238:2377
This node joined a swarm as a manager.
root@swarm3:~# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
4i0lko1qdwqp4x1aqwn6o7obh swarm1 Ready Active Reachable
9gyk5t22ngndbwtjof80hpg54 swarm2 Ready Active Leader
b9dyyc08ehtnl62z7e3ll0ih3 * swarm3 Ready Active Reachable
Yay! Our swarm3 is ready. I cleared out the container inventory to start with a clean swarm.
Scaling our service with fault tolerance
I deleted the service with docker service rm helloworld, and cleaned up the containers with docker ps -a -q | xargs docker rm. Now I can start the service again from zero.
root@swarm1:~# docker service create --replicas 5 --name helloworld alpine ping google.com
5gmrllue1sgdwl1yd5ubl16md
root@swarm1:~# docker service scale helloworld=10
helloworld scaled to 10
root@swarm1:~# docker service ps helloworld
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR
2hb76h8m7oop9pit4jgok2jiu helloworld.1 alpine swarm1 Running Running about a minute ago
5lxefcjclasna9as4oezn34i8 helloworld.2 alpine swarm3 Running Running about a minute ago
95cab7hte5xp9e8mfj1tbxms0 helloworld.3 alpine swarm2 Running Running about a minute ago
a6pcl2fce4hwnh347gi082sc2 helloworld.4 alpine swarm2 Running Running about a minute ago
61rez4j8c5h6g9jo81xhc32wv helloworld.5 alpine swarm1 Running Running about a minute ago
2lobeil8sndn0loewrz8n9i4s helloworld.6 alpine swarm1 Running Running 20 seconds ago
0gieon36unsggqjel48lcax05 helloworld.7 alpine swarm1 Running Running 21 seconds ago
91cdmnxarluy2hc2fejvxnzfg helloworld.8 alpine swarm3 Running Running 21 seconds ago
02x6ppzyseak8wsdcqcuq545d helloworld.9 alpine swarm3 Running Running 20 seconds ago
4gmn24kjfv7apioy6t8e5ibl8 helloworld.10 alpine swarm2 Running Running 21 seconds ago
root@swarm1:~#
And powering off swarm1 gives us:
root@swarm2:~# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
4i0lko1qdwqp4x1aqwn6o7obh swarm1 Ready Active Unreachable
9gyk5t22ngndbwtjof80hpg54 * swarm2 Ready Active Leader
b9dyyc08ehtnl62z7e3ll0ih3 swarm3 Ready Active Reachable
and additional containers have spawned, just as intended:
root@swarm2:~# docker service ps helloworld
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR
bb8nwud2h75xpvkxouwt8rftm helloworld.1 alpine swarm2 Running Running 26 seconds ago
2hb76h8m7oop9pit4jgok2jiu \_ helloworld.1 alpine swarm1 Shutdown Running 2 minutes ago
5lxefcjclasna9as4oezn34i8 helloworld.2 alpine swarm3 Running Running 2 minutes ago
95cab7hte5xp9e8mfj1tbxms0 helloworld.3 alpine swarm2 Running Running 2 minutes ago
a6pcl2fce4hwnh347gi082sc2 helloworld.4 alpine swarm2 Running Running 2 minutes ago
8n1uonzp2roy608kd6v888y3d helloworld.5 alpine swarm3 Running Running 26 seconds ago
61rez4j8c5h6g9jo81xhc32wv \_ helloworld.5 alpine swarm1 Shutdown Running 2 minutes ago
17czblq9saww4e2wok235kww8 helloworld.6 alpine swarm2 Running Running 26 seconds ago
2lobeil8sndn0loewrz8n9i4s \_ helloworld.6 alpine swarm1 Shutdown Running about a minute ago
6f3tm5vvhq07kwqt3zu0xr5mi helloworld.7 alpine swarm3 Running Running 26 seconds ago
0gieon36unsggqjel48lcax05 \_ helloworld.7 alpine swarm1 Shutdown Running about a minute ago
91cdmnxarluy2hc2fejvxnzfg helloworld.8 alpine swarm3 Running Running about a minute ago
02x6ppzyseak8wsdcqcuq545d helloworld.9 alpine swarm3 Running Running about a minute ago
4gmn24kjfv7apioy6t8e5ibl8 helloworld.10 alpine swarm2 Running Running about a minute ago
Moving the services away from a specific node (drain)
With this setup we can tolerate the failure of one manager node. But what if we want a somewhat more graceful procedure for removing containers from one node? We can set a node’s availability to drain, which reschedules its tasks onto the other active nodes.
root@swarm2:~# docker node update --availability drain swarm3
swarm3
root@swarm2:~# docker service ps helloworld | grep swarm3
5lxefcjclasna9as4oezn34i8 \_ helloworld.2 alpine swarm3 Shutdown Shutdown 19 seconds ago
8n1uonzp2roy608kd6v888y3d \_ helloworld.5 alpine swarm3 Shutdown Shutdown 19 seconds ago
6f3tm5vvhq07kwqt3zu0xr5mi \_ helloworld.7 alpine swarm3 Shutdown Shutdown 19 seconds ago
91cdmnxarluy2hc2fejvxnzfg \_ helloworld.8 alpine swarm3 Shutdown Shutdown 19 seconds ago
02x6ppzyseak8wsdcqcuq545d \_ helloworld.9 alpine swarm3 Shutdown Shutdown 19 seconds ago
root@swarm2:~# docker service ps helloworld | grep swarm2 | wc -l
10
All the containers on swarm3 were shut down and started up on the remaining active node, swarm2. Let’s scale the example down to only one instance.
root@swarm2:~# docker service scale helloworld=1
helloworld scaled to 1
root@swarm2:~# docker service ps helloworld | grep swarm2 | grep -v Shutdown
17czblq9saww4e2wok235kww8 helloworld.6 alpine swarm2 Running Running 7 minutes ago
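Scaling down (like draining) only stops the task containers - on my 1.12 setup they stick around on each node in an exited state until someone removes them. A quick sweep on a node could look something like this (a sketch using the standard status filter, adjust to taste):
docker ps -a -q -f status=exited | xargs docker rm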
So cleaning up the containers is still very much in the domain of the sysadmin. I started swarm1 back up and scaled our service to 20 instances.
root@swarm2:~# docker service scale helloworld=20
helloworld scaled to 20
root@swarm2:~# docker service ls
ID NAME REPLICAS IMAGE COMMAND
5gmrllue1sgd helloworld 2/20 alpine ping google.com
root@swarm2:~# docker service ls
ID NAME REPLICAS IMAGE COMMAND
5gmrllue1sgd helloworld 10/20 alpine ping google.com
root@swarm2:~# docker service ls
ID NAME REPLICAS IMAGE COMMAND
5gmrllue1sgd helloworld 16/20 alpine ping google.com
root@swarm2:~# docker service ls
ID NAME REPLICAS IMAGE COMMAND
5gmrllue1sgd helloworld 20/20 alpine ping google.com
As you can see here, it does take some time for the instances to start up. Let’s see how they are distributed.
root@swarm3:~# docker service ps -f "desired-state=running" helloworld | grep swarm | awk '{print $4}' | sort | uniq -c
10 swarm1
10 swarm2
Enabling and scaling to a new node
As we put swarm3 into drain availability, we don’t have any instances running on it. Let’s fix that very quickly by putting it back into active availability mode.
root@swarm3:~# docker node update --availability active swarm3
swarm3
Swarm doesn’t reshuffle already-running tasks onto the newly activated node, so we need to scale our service to populate swarm3.
root@swarm3:~# docker service scale helloworld=30
helloworld scaled to 30
root@swarm3:~# docker service ps -f "desired-state=running" helloworld | grep swarm | awk '{print $4}' | sort | uniq -c
10 swarm1
10 swarm2
10 swarm3
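As a side note, this scale-up trick is the practical way to get tasks onto a newly activated node in 1.12. Later Docker releases (1.13 and up, if I recall correctly) added a --force flag to docker service update, which restarts the service’s tasks and lets the scheduler spread them across all active nodes without changing the replica count - worth verifying against the docs for your version:
docker service update --force helloworld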
It takes a bit of getting used to, but docker service is a powerful way to scale out your microservices. It might be slightly more tricky when it comes to data volumes (mounts), but that’s the subject of another post.
Closing words
Keep in mind, if you’re provisioning swarm managers, you need a majority to resolve failures gracefully. That means you should have an odd number of managers, N >= 3. A cluster of N managers can tolerate the failure of (N-1)/2 of them: 3 managers = 1 failed node, 5 managers = 2 failed nodes, 7 managers = 3 failed nodes, and so on.
A worker, in comparison, doesn’t replicate the manager state, and you can’t create or query services from a worker - you can only do that from one of the manager nodes, and the commands are carried out by the leader node.
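And if you ever need to turn a worker into a manager (or the other way around), there are commands for that as well - both were present in 1.12, but check docker node --help on your version:
docker node promote swarm3
docker node demote swarm3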