Waiting on Goroutines

Go is a programming language with basic concurrency syntax built in. Maybe calling it basic isn’t exactly right - simplicity is the more accurate description. After all, to run a function independently of the main execution, all you have to do is prefix the invocation with the go keyword - that function call will now run on its own goroutine.

For people not familiar with threading or concurrency in general, it may take a while to figure out how this behaves. Running a function in a goroutine has consequences for the rest of your program. The way you access data is a big one, but a more basic one, and one that is often misunderstood, is that a Go program doesn’t wait for your goroutines to finish: when main returns, the program exits, whether or not other goroutines are still running. You have to arrange for the invoking function to wait for some sort of signal before it continues.

Christian Rebischke wrote the following tweet:

TIL today: a Go program doesn’t wait for your go routines to finish. Wait groups are nice.

It’s accurate, but what struck me was: “hey, he’s using wait groups, that’s not really the most common way to do this, is it?” It’s also not as if there are only two ways to do it, or that there’s always one correct way. So let’s see how we can wait on a goroutine or a group of goroutines.

The channel

The most primitive way to wait on a goroutine is to use a channel. The intent of a channel is to be a communication primitive. To paraphrase Rob Pike (in lieu of tracking down a citation):

Don’t communicate by sharing memory, share memory by communicating

I think this is actually written in the godoc for the sync/atomic package. Let’s assume the attribution checks out; the line is widely credited to him.

We can use channels to wait for values. So the most primitive way of waiting for a goroutine might be as follows:

finished := make(chan bool) // the messaging channel

greet := func() {
	time.Sleep(time.Second)
	fmt.Println("Hello 👋")
	finished <- true // we send to the channel when done
}

go greet()

<-finished // we are waiting for greet to finish
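The same channel can wait on more than one goroutine: the parent simply receives once for every goroutine it started. A minimal sketch, where greetAll and the count n are illustrative names, not from the original:

```go
package main

import (
	"fmt"
	"time"
)

// greetAll starts n goroutines and waits for all of them by
// receiving exactly n values from a shared channel.
func greetAll(n int) int {
	finished := make(chan bool)

	for i := 0; i < n; i++ {
		go func(id int) {
			time.Sleep(10 * time.Millisecond)
			fmt.Println("Hello 👋 from", id)
			finished <- true // signal that this goroutine is done
		}(i)
	}

	done := 0
	for i := 0; i < n; i++ {
		<-finished // one receive per goroutine started
		done++
	}
	return done
}

func main() {
	fmt.Println("finished:", greetAll(3))
}
```

The caller has to know exactly how many receives to perform, and that bookkeeping is what gets tedious as the number of goroutines grows.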

The context

The context value in Go was introduced to add cancelation to functions. While timeouts for various clients (database, http,…) are the usual use case, it’s also common to provide external signals that cancel a context.

For this reason, we can use context.WithCancel as we would a channel:

ctx, cancel := context.WithCancel(context.Background())

greet := func() {
	defer cancel() // we cancel the ctx when done

	time.Sleep(time.Second)
	fmt.Println("Hello 👋")
}

go greet()

<-ctx.Done() // wait here :)

The whole thing seems like just a wrapper around a channel, but it does a bit more than that: it also propagates cancelation from the parent context to its children. For example, you could combine a timeout and manual cancelation, so that something is cancelled either when it finishes naturally or when a pre-set timeout expires, whichever comes first.

ctx := context.Background()
ctx, stop := context.WithTimeout(ctx, time.Second)
defer stop() // release the timeout's resources
ctx, cancel := context.WithCancel(ctx)
defer cancel()

So, when the timeout is reached, the final context will also be cancelled. Just a bit of trivia there, in case you ever need it.

The lock

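An uncommon way to produce the same result would be to use a sync.Mutex. We “get” the lock in the parent function, and “release” it in the goroutine when it’s done. This works because a Go mutex isn’t tied to the goroutine that locked it: one goroutine may lock it and another may unlock it.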

var lock sync.Mutex

greet := func() {
	defer lock.Unlock() // we unlock when done

	time.Sleep(time.Second)
	fmt.Println("Hello 👋")
}

// get the lock before invoking greet
lock.Lock()

go greet()

// try to get the lock again - this only succeeds
// once greet() has finished and unlocked it
lock.Lock()

We can already see from all the verbosity that locks don’t make sense here. Their use case is firmly cemented in protecting access to shared memory.

The group

With channels, contexts and locks, it gets progressively more annoying to handle multiple goroutines and wait for them. But, as the tweet already mentioned, let’s look at wait groups. A wait group can be thought of as a counter: you call Add(X), and then each of X calls to Done() subtracts one. When the counter reaches 0, Wait() returns.

var wg sync.WaitGroup

greet := func() {
	defer wg.Done() // we mark this goroutine as done when it returns

	time.Sleep(time.Second)
	fmt.Println("Hello 👋")
}

// how many functions will we call?
wg.Add(1)

go greet()

wg.Wait() // wait here :)
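Where wait groups pay off is with several goroutines: Add takes the total up front, every goroutine calls Done once, and a single Wait covers all of them. A sketch under the assumption that each goroutine writes its result into its own slice index (squareAll is an illustrative name):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// squareAll computes the squares of nums concurrently, one goroutine per
// element, and uses a WaitGroup to wait until every goroutine has finished.
func squareAll(nums []int) []int {
	var wg sync.WaitGroup
	results := make([]int, len(nums))

	wg.Add(len(nums)) // one Done() expected per goroutine
	for i, n := range nums {
		go func(i, n int) {
			defer wg.Done()
			time.Sleep(10 * time.Millisecond)
			results[i] = n * n // each goroutine writes only its own index
		}(i, n)
	}

	wg.Wait() // returns once the counter drops back to 0
	return results
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3, 4})) // [1 4 9 16]
}
```

Because no two goroutines touch the same index, and Wait establishes that all writes happen before the slice is returned, no extra locking is needed here.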

Breaking down wait groups

Wait groups aren’t very complex to implement on your own. The actual implementation in the sync package is more involved, but as an exercise we can build a working wait group from atomic operations and a channel.

What are atomic operations?

An atomic operation in our case would be an increment of a value. It’s atomic because the read, modify and write happen as one indivisible step, so two increments can’t interleave and lose an update, regardless of how many goroutines you have running. This is implemented in low-level code, often with dedicated CPU instructions.
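A quick way to see the guarantee is to have many goroutines bump one shared counter. A minimal sketch (countAtomically is an illustrative name): with atomic.AddInt64 the result is always exactly n, whereas a plain counter++ from many goroutines would be a data race that can lose increments.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countAtomically starts n goroutines that each increment a shared
// counter once with atomic.AddInt64, then returns the final value.
// Each increment is one indivisible read-modify-write, so none are lost.
func countAtomically(n int) int64 {
	var counter int64
	var wg sync.WaitGroup

	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			atomic.AddInt64(&counter, 1)
		}()
	}
	wg.Wait()

	return atomic.LoadInt64(&counter)
}

func main() {
	fmt.Println(countAtomically(1000)) // always 1000
}
```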

Atomic operations let us implement the Add() and Done() functionality of wait groups without a mutex. Since atomic.AddInt64 also returns the new value, Done() can check it for zero and use a channel to signal the parent function that all the wait group tasks have completed.

Increment State atomically for Add(),
Decrement State atomically for Done(),
Signal the channel from Done() when we reach zero,
Wait by reading from channel in Wait()

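import "sync/atomic"

// WG is a minimal, single-use wait group built from an atomic
// counter and a channel.
type WG struct {
	state int64         // first field, so it stays 64-bit aligned for atomics on 32-bit platforms
	wait  chan struct{} // closed once state drops to zero
}

func NewWG() *WG {
	return &WG{
		wait: make(chan struct{}),
	}
}

// Add increments the counter by x.
func (wg *WG) Add(x int64) {
	atomic.AddInt64(&wg.state, x)
}

// Done decrements the counter. The call that brings it to zero closes
// the channel: unlike a send, a close never blocks Done(), and it
// releases every caller of Wait(), not just one.
func (wg *WG) Done() {
	if atomic.AddInt64(&wg.state, -1) == 0 {
		close(wg.wait)
	}
}

// Wait blocks until the channel has been closed.
func (wg *WG) Wait() {
	<-wg.wait
}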

Error groups

Besides wait groups, there’s one more approach that’s quite common in the wild, for running multiple goroutines that may produce errors. The golang.org/x/sync/errgroup package handles this case.

var group errgroup.Group

greet := func() error {
	time.Sleep(time.Second)
	fmt.Println("Hello 👋")
	return nil
}

group.Go(greet)

// get the first error from the group
if err := group.Wait(); err != nil {
	log.Fatal("We have an error: ", err)
}

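But we can also use the group’s cancelation functionality. With errgroup.WithContext, the returned context is canceled the first time a function passed to group.Go returns a non-nil error, or the first time Wait returns, whichever happens first. The catch is that ctx.Err() only tells us the context was canceled; it doesn’t carry the actual error. I’m not sure this is a deal breaker, but let’s take a look: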

errFinished := errors.New("done")

// the produced context is meant to be used in the
// passed functions to group.Go for cancelation
group, ctx := errgroup.WithContext(context.Background())

greet := func() error {
	time.Sleep(time.Second)
	fmt.Println("Hello 👋")
	return errFinished
}

group.Go(greet)

// if we don't return an error, this would cause
// a deadlock - no running goroutines
<-ctx.Done()

// get the first error from the group
if err := group.Wait(); err != nil {
	log.Fatalf("We have an error, err=%s, ctx.err=%s", err, ctx.Err())
}

The last example isn’t great. You wouldn’t return errors as the expected result, as the intent of returning errors is… returning errors.

But you could pick up that group context and use it in your functions, so you can cancel long-running goroutines, or stop all of them as soon as any one of them returns an error.

Remember that cancelation propagates from a parent context to its children? You can easily give an error group a parent context with a timeout, and the group’s context will honor it.

Now, the function signature errgroup expects is func() error. This makes it a bit inconvenient to pass the context on to the invoked functions. As the consumer of the errgroup package, you’re left to either wrap the functions yourself, or capture the ctx in a closure.

Imagine if the context were passed implicitly:

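// proposed addition: errgroup.Group has no Context() method today,
// so this assumes the group keeps the context from WithContext
func (g *Group) GoContext(fn func(context.Context) error) {
	g.Go(func() error {
		return fn(g.Context())
	})
}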

That would work well enough for a lot of use cases where cancelation is required. If it’s something you’d like to see added to errgroup: I opened a request for this on the Go issue tracker. If you have examples where this addition would improve the quality of your code, let me know by commenting on issue #39312.
