Real-world optimisation

Don’t optimize prematurely, but have a plan for how you’re going to do it. With the services I write, a lot of the code favours being clear, simple and readable over actually being optimized for speed. Sometimes the difference between that code and the optimized code is just a short refactor.

One of the services I write is a comments service for the Slovenian national TV & Radio station, RTV Slovenia. On a typical day, the Elastic APM transactions end up looking something like this:

What this tells us is that we optimized the hot path down to a 1ms average, with a 99th percentile time of 4ms. The 99th percentile means that 99% of requests took less than 4ms, while 1% of requests took longer.
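
To make that concrete, here is a minimal sketch of how such a percentile could be computed from raw durations using the nearest-rank method. It's purely illustrative; Elastic APM aggregates these numbers for us, and this helper is not part of the service code.

import (
    "math"
    "sort"
    "time"
)

// p99 returns the smallest duration such that at least 99% of the
// samples are at or below it (nearest-rank percentile).
func p99(durations []time.Duration) time.Duration {
    if len(durations) == 0 {
        return 0
    }
    sorted := append([]time.Duration(nil), durations...)
    sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
    idx := int(math.Ceil(0.99*float64(len(sorted)))) - 1
    return sorted[idx]
}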

We can also see that the list call has the biggest impact. When considering all requests, the impact is the cumulative time spent serving all requests against an endpoint. It’s calculated by multiplying the average duration by the transactions per minute.
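
As a rough sketch of that calculation, with made-up numbers rather than the actual APM figures (assuming the standard time package):

// Impact ≈ average duration × transactions per minute,
// i.e. the cumulative time an endpoint consumes per minute.
// Both values below are hypothetical, for illustration only.
avgDuration := 1 * time.Millisecond // average "list" transaction duration
tpm := 1200.0                       // transactions per minute
impact := time.Duration(float64(avgDuration) * tpm) // ≈ 1.2s of work per minute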

To optimize something, you must first understand how it works. The comments API has two listing endpoints, list and thread. The list endpoint returns all comments and their replies in a selected order, while the thread endpoint returns only the replies to a single comment, or a list of top-level comments without their replies.

For each reply, the list endpoint retrieves the parent comment for display. This step of the API call had a pretty suboptimal O(N) situation: the service retrieved the parent comment for each reply in the list individually, issuing one database query per reply.

for key, comment := range result.Comments {
    if comment.SelfID > 0 {
        reply, err := svc.comments.Get(ctx, comment.SelfID, currentUser)
        if err == nil && reply != nil {
            reply.Comments = &types.CommentListThreadShort{
                CommentListShort: &types.CommentListShort{
                    Count: threadCounts[comment.SelfID],
                },
            }
            result.Comments[key].Parent = reply
        }
    }
}

There are a few code smells in this code:

  • The err value is never logged or returned,
  • The naming of reply is misleading, as it actually holds a parent comment,
  • Cosmetic issues like nesting depth.

So, my bet is that by collecting all the valid SelfIDs beforehand, I can optimize this from O(N) database queries down to O(1), a single multi-get, speeding up the service and improving the code quality as I go.

I need to:

  • Collect the SelfID values together into a slice,
  • Issue a “multi-get” query against the database,
  • Read the parents from a simple map[ID]Comment

Here is where the “planning” part comes in: I already had a multi-get call for comments implemented, because of how the service structures comment metadata and then retrieves the comment bodies in a second step. The normalized comment data meant I had to implement a multi-get for all the main comment list responses, and now I can re-use it here to retrieve all the parent comments.

parents := make(map[int64]*types.Comment)
parentComments := []*types.Comment{}
parentIDs := []int64{}

for _, comment := range result.Comments {
    if comment.SelfID > 0 {
        parentIDs = append(parentIDs, comment.SelfID)
    }
}

if err := svc.comments.Comments(ctx, &parentComments, parentIDs, currentUser); err != nil {
    return err
}

for _, parent := range parentComments {
    parents[parent.ID] = parent
}

Before the main loop, we first collect all valid SelfID values into parentIDs, retrieve all the comments with those IDs into the parentComments slice, and finally fill out a map, parents, with the results. This is also safer than the previous code, as possible errors are now returned.

We could reduce allocations further by avoiding repeated growth during the append calls, but given the very limited response sizes, pre-allocating the parentIDs slice would just extend the code without much benefit. It may still be a valid optimisation technique for other cases; a sketch follows below.
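
For reference, the pre-allocated variant is a one-line change to the collection loop above; this is just a sketch, reusing result.Comments from the service code:

// Reserve capacity up front so append never needs to grow the backing array.
parentIDs := make([]int64, 0, len(result.Comments))
for _, comment := range result.Comments {
    if comment.SelfID > 0 {
        parentIDs = append(parentIDs, comment.SelfID)
    }
}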

Finally, by restructuring the initial loop a bit to read from the parents map, we fill out the values we need without any additional database lookups. All the data has been pre-fetched with a single DB query.

for key, comment := range result.Comments {
    if comment.SelfID == 0 {
        continue
    }
    if parent, ok := parents[comment.SelfID]; ok {
        parent.Comments = &types.CommentListThreadShort{
            CommentListShort: &types.CommentListShort{
                Count: threadCounts[comment.SelfID],
            },
        }
        result.Comments[key].Parent = parent
    }
}

After the deploy, we can take a look at the logged transaction durations. There has been a strong improvement across the board, as we optimized the endpoint with the highest impact. Overall, the optimisation reduced the 95th and 99th percentiles noticeably, all with little effort.

All in all, I probably didn’t need to optimize this, since the service itself doesn’t accrue any significant operational penalty. With only a few percent of CPU in use across the service group, the change itself doesn’t have a real impact on cost, but it immediately improves the user experience (lower latency) and reduces technical debt.

After running the service for some time, we can see that the API call latency dropped to an average of 20ms (down from 33ms), and the 99th percentile decreased to 48ms (down from 79ms). Overall, we optimized the service to respond about 39% faster ((33 - 20) / 33 ≈ 39%), and that’s amazing. At a bigger scale, these could be significant gains. And they all come from a single endpoint.

It’s clear that the code we started with was just waiting for a refactor down the line. If you clean up the code regularly and optimize small inefficiencies like this, hopefully outgrowing your infrastructure becomes harder to do. After all, we don’t code for self-punishment, so it’s best to avoid these situations with regular maintenance work.
