Purging cached items from NGINX with Lua

If you have any experience scaling up a service, you will already know NGINX as a reliable workhorse for proxying HTTP requests across multiple backend servers. There’s one thing the open source version of NGINX doesn’t do, however: you can’t issue a PURGE request against a cached resource to delete it from the cache. That is, until we bring in Lua.

First off, we need to understand how NGINX names its cache files. The file name of a cache entry is simply the MD5 hash of its cache key. By default, the cache key is made up of the upstream server and the request URI. For example:

I have a file:

cache1/7/c0/c2789c49835d6c5266484488f005fc07

and in this file, there’s a line starting with KEY:

KEY: http://dev/upload/upload.txt

If you take the value of KEY and run it through MD5, you get c2789c49835d6c5266484488f005fc07, which is exactly the file name above. The folders are determined by the levels parameter of the proxy_cache_path directive that declares the cache in NGINX:

proxy_cache_path /tmp/cache1 levels=1:2 keys_zone=cache1:10m
                 max_size=100m inactive=60m;

So in this case, levels=1:2 generates two levels of folders, taken from the end of the MD5 hash. The first folder is named after the last (1) character, and the second after the two (2) characters before it. Since the hash ends in c07, the file is stored under 7/c0/.
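
The naming scheme above can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not code taken from NGINX; the function names cache_file_path and cache_key_hash are made up for this example.

```python
import hashlib
from pathlib import PurePosixPath


def cache_key_hash(cache_key: str) -> str:
    """MD5 hex digest of the cache key, used as the cache file name."""
    return hashlib.md5(cache_key.encode("utf-8")).hexdigest()


def cache_file_path(cache_root: str, levels: str, md5_hex: str) -> PurePosixPath:
    """Build the on-disk path of a cache entry from its MD5 hex digest.

    Each number in `levels` consumes that many characters from the END of
    the digest, moving leftwards, and becomes one directory name.
    """
    path = PurePosixPath(cache_root)
    end = len(md5_hex)
    for width in (int(n) for n in levels.split(":")):
        path /= md5_hex[end - width:end]
        end -= width
    return path / md5_hex


# The digest quoted above, placed under levels=1:2.
print(cache_file_path("/tmp/cache1", "1:2", "c2789c49835d6c5266484488f005fc07"))
# prints /tmp/cache1/7/c0/c2789c49835d6c5266484488f005fc07
```

Passing the output of cache_key_hash("http://dev/upload/upload.txt") as the third argument gives the path for a key rather than for a known digest.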

Given this information we realize two things:

  • Deleting individual cache items is simple, as you know the URL beforehand
  • You could even delete them by wildcard, but since there is no index, we’d have to read the KEY line of every cache file
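
To show why the wildcard case is costly, here is a rough Python sketch that walks the cache folder and reads the KEY line of every file. It assumes the KEY line sits within the first few kilobytes of the file, after some binary header data; the helper names and the 4096-byte probe size are choices made for this example, not anything NGINX defines.

```python
import fnmatch
import os
from typing import Iterator, Optional


def read_cache_key(path: str, probe_bytes: int = 4096) -> Optional[str]:
    """Return the value of the line starting with 'KEY: ', or None."""
    with open(path, "rb") as fh:
        head = fh.read(probe_bytes)
    marker = b"\nKEY: "
    start = head.find(marker)
    if start == -1:
        return None
    start += len(marker)
    end = head.find(b"\n", start)
    if end == -1:
        return None  # KEY line not terminated within the probed bytes
    return head[start:end].decode("utf-8", errors="replace")


def find_cached(cache_root: str, pattern: str) -> Iterator[str]:
    """Yield the paths of cache files whose KEY matches a shell-style pattern.

    Every file has to be opened, because the file name is only a hash.
    """
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            key = read_cache_key(path)
            if key is not None and fnmatch.fnmatchcase(key, pattern):
                yield path
```

Something like find_cached("/tmp/cache1", "http://dev/upload/*") would list the candidates for deletion. Note that fnmatch treats ? and [ as wildcard characters too, so URLs with query strings need extra care.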

But let’s stick to just the first thing here. I want to delete this cached item, so the next request will refresh it from the back-end service.

Handling a PURGE request with Lua

For our purposes, handling a PURGE request with Lua is straightforward. This is the part of the configuration that invokes the Lua script which handles cache deletions:

location /upload {
	if ($request_method = PURGE) {

		set $lua_purge_path "/tmp/cache1/";
		set $lua_purge_levels "1:2";
		set $lua_purge_upstream "http://dev";

		content_by_lua_file $site_root/lua/purge.lua;
	}
	proxy_pass http://dev;
	proxy_cache cache1;
}

I abbreviated the location block for my upload folder a bit, leaving in the relevant parts. As you can see, three variables are declared for the Lua script. These values are difficult or impossible to obtain from within Lua, but they are just configuration: the cache location on disk, the levels parameter needed to build the file path, and the upstream, which combined with the request URI forms the cache KEY from above.

In the case of a PURGE request, the Lua script handles the request instead of it being passed on to the upstream. Unlike the NGINX proxy_cache_bypass directive, this only clears the cached item; it does not request a new copy from the back-end, which would repopulate the cache.

So the only thing left to do is write some Lua code which deletes individual cache items.

Lua code for deleting a single NGINX proxy cached item

This is a bit longer, so I put it up as a gist for those interested:

Here is a breakdown of what it does:

  1. creates a cache KEY by combining $lua_purge_upstream and the request URI,
  2. creates an MD5 hash of the cache KEY,
  3. builds the file path in line with $lua_purge_levels (lines 33-47 of the gist),
  4. removes the file if it exists on disk,
  5. prints OK when it’s done.
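
For readers who want to follow the steps without opening the gist, here is a minimal Python sketch of the same sequence. It is not the Lua script itself, only the logic from the list above; the function name purge and its parameters are invented for this example and stand in for the three $lua_purge_* variables plus the request URI.

```python
import hashlib
import os


def purge(cache_root: str, levels: str, upstream: str, request_uri: str) -> bool:
    """Delete the cache file for one URL; return True if a file was removed."""
    # 1. The cache KEY is the upstream followed by the request URI.
    cache_key = upstream + request_uri

    # 2. The file name is the MD5 hex digest of the KEY.
    digest = hashlib.md5(cache_key.encode("utf-8")).hexdigest()

    # 3. The folders come from the end of the digest, one per levels entry.
    folders = []
    end = len(digest)
    for width in map(int, levels.split(":")):
        folders.append(digest[end - width:end])
        end -= width
    path = os.path.join(cache_root, *folders, digest)

    # 4. Remove the file if it exists on disk.
    try:
        os.remove(path)
    except FileNotFoundError:
        return False

    # 5. Report success, the equivalent of printing OK.
    return True
```

A call such as purge("/tmp/cache1/", "1:2", "http://dev", "/upload/upload.txt") mirrors the configuration shown earlier. The process running it needs write access to the cache folder, which is also true for the NGINX worker running the Lua script.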

To test it I used the curl command line utility like this:

# curl -X PURGE http://peer.lan/upload/upload.txt
OK

As the file disappears from disk, the next request will be a cache miss, causing NGINX to repopulate the cache by performing another request against the upstream. This is preferable to cache-busting mechanisms on the front end, like appending “?t=[timestamp]” or “?rand=[random int]” to the URL. I wish it weren’t so, but that is exactly how some frameworks, such as jQuery, circumvent caching on the servers.

From the jQuery documentation for the cache option: “Setting cache to false will only work correctly with HEAD and GET requests. It works by appending “_={timestamp}” to the GET parameters.”

