Purging cached items from NGINX with Lua
Anyone at all familiar with scaling up a service will know NGINX as a strong workhorse when it comes to proxying HTTP requests between multiple backend servers. There's one thing the open source NGINX version doesn't do, however: you can't issue a PURGE request against a cached resource to delete it from the cache. That is, until we bring in Lua.
First off, a bit of understanding is needed regarding how NGINX constructs its cache files. A cache filename is, plainly, just an MD5 hash of the cache key. The cache key is constructed from the upstream server and the request URI. For example:
I have a file:

cache1/7/c0/c2789c49835d6c5266484488f005fc07

and in this file, there's a line starting with KEY:

KEY: http://dev/upload/upload.txt

If you take that value of KEY and run it through md5, you will get c2789c49835d6c5266484488f005fc07, which is the exact filename above. The folders are determined by the levels parameter when declaring a proxy cache in NGINX.
proxy_cache_path /tmp/cache1 levels=1:2 keys_zone=cache1:10m max_size=100m inactive=60m;
So in this case, the levels value of 1:2 generates two folders, based on the last characters of the md5 hash. The first folder is named after the last one (1) character, and the second folder after the two (2) characters before that. Given the last three characters c07, the file will be stored under the folders 7/c0/ based on the 1:2 levels value.
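Since the levels logic is easy to get wrong, here it is sketched in a few lines of Python. The helper names (`levels_path`, `cache_file_path`) are my own, for illustration only:

```python
import hashlib

def levels_path(digest, levels="1:2"):
    """Build the relative cache path for an MD5 digest, NGINX-style.

    Each component of levels names a subfolder, sliced off the END of
    the digest working backwards: for 1:2 that's the last character,
    then the two characters before it.
    """
    parts, pos = [], len(digest)
    for n in (int(c) for c in levels.split(":")):
        parts.append(digest[pos - n:pos])
        pos -= n
    return "/".join(parts + [digest])

def cache_file_path(key, levels="1:2"):
    """MD5 the cache KEY, then lay the file out under the levels folders."""
    return levels_path(hashlib.md5(key.encode()).hexdigest(), levels)

print(levels_path("c2789c49835d6c5266484488f005fc07"))
# 7/c0/c2789c49835d6c5266484488f005fc07
```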
Given this information we realize two things:
- Deleting individual cache items is simple, as you know the URL beforehand
- You could even delete them by wildcard, but the lack of an index means we’d have to read every cache item
But let’s stick to just the first thing here. I want to delete this cached item, so the next request will refresh it from the back-end service.
Handling a PURGE request with Lua
For our purposes, handling a PURGE request with Lua is trivial. This is the part of the configuration that enables invoking a Lua script to handle cache deletions:
location /upload {
    if ($request_method = PURGE) {
        set $lua_purge_path "/tmp/cache1/";
        set $lua_purge_levels "1:2";
        set $lua_purge_upstream "http://dev";
        content_by_lua_file $site_root/lua/purge.lua;
    }
    proxy_pass http://dev;
    proxy_cache cache1;
}
I abbreviated my upload folder location here a bit, leaving in the relevant parts. As you can see, three variables are declared for Lua. These three variables are what's difficult or impossible to get at from within Lua, but they are generally just configuration values: the cache location on disk, the levels parameter required to build the filename, and the upstream, which combined with the request URI produces the cache KEY from above.
What's happening here is that in the case of a PURGE request, the Lua script is invoked for that request instead of passing the request on to the proxy. In contrast with the NGINX proxy option proxy_cache_bypass, we only clear the cached item, without requesting a new one from the back-end, which would repopulate the cache. So the only thing left to do is write some Lua code which will delete individual assets.
Lua code for deleting a single NGINX proxy cached item
This is a bit longer, so I just put up a Gist for all you interested people:
To give you a breakdown, it:
- creates a cache KEY by combining $lua_purge_upstream and the request URI,
- creates an MD5 hash of the cache KEY,
- creates a filename in line with $lua_purge_levels (lines 33-47),
- removes the file if it exists on disk,
- prints OK when it's done
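The Gist itself is written in Lua, but the same steps can be sketched in Python. The `purge()` helper and its arguments are my own stand-ins for the nginx variables above, not the actual script:

```python
import hashlib
import os

def purge(purge_path, levels, upstream, request_uri):
    """Delete a single NGINX-cached item, mirroring the purge.lua steps."""
    # 1. Build the cache KEY: upstream + request URI
    key = upstream + request_uri
    # 2. MD5 the key to get the cache filename
    digest = hashlib.md5(key.encode()).hexdigest()
    # 3. Build the on-disk path according to the levels parameter,
    #    slicing folder names off the end of the digest
    parts, pos = [], len(digest)
    for n in (int(c) for c in levels.split(":")):
        parts.append(digest[pos - n:pos])
        pos -= n
    path = os.path.join(purge_path, *parts, digest)
    # 4. Remove the file if it exists on disk
    if os.path.exists(path):
        os.remove(path)
    # 5. Report OK
    return "OK"

# e.g. purge("/tmp/cache1/", "1:2", "http://dev", "/upload/upload.txt")
```

In the real script the configuration values arrive via ngx.var, and the response is written back to the client; the file-path logic is the part worth getting right.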
To test it I used the curl command line utility like this:

# curl -X PURGE http://peer.lan/upload/upload.txt
OK
As the file disappears from disk, the next request will be a cache miss, causing NGINX to repopulate the cache by performing another request against the upstream. As it stands, this is preferable to cache-busting mechanisms on the front end, like appending "?t=[timestamp]" or "?rand=[random int]" to the URL. I wish it weren't so, but this is exactly how some frameworks like jQuery circumvent caching on the servers. As the jQuery documentation puts it: setting cache to false will only work correctly with HEAD and GET requests, and it works by appending "_={timestamp}" to the GET parameters.
If you’d like to leave me a comment, please say hello on this reddit thread.