Batch resolving of promises
A lot of my development ideas stem from repetitive workloads or from an optimization standpoint, and in both cases I tend to obsess over inefficient code structures. I've literally had dreams that gave me answers, which I then implemented during the day. If only we could code while we sleep. In retrospect, the Pareto principle applied to that subconsciously influenced code base: 80% of its usage was fine, while the other 20% fell outside the scope of the problem I was trying to solve and introduced new problems. More about that some other time.
We have a fairly complex setup over at RTV Slovenia. The landing page, which receives the majority of all requests, is built from a variety of data sources: news items, comment counts, menu items, static content, recent items from the social section of the web site, video news, and a lot of relationships between them. Together they make it the most complex part of the web site by a wide margin.
There is little chance of rewriting it, but optimizing it without throwing it away is an interesting logical problem. One of the common optimization techniques we use is grouping data together where possible. With a traditional programming flow this is often tedious to do globally, so the optimizations happen on lower levels instead: to display the items for one section, we fetch all the items, then all the related comment counts, then all the related video news, and so on. But we display about 10 sections of news items. We could fetch all news items in one bulk request, but that would take some significant refactoring, and there are still the other data sources to worry about.
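To make the cost of per-section batching concrete, here is a small sketch in Python (the author's code base is PHP; Python is used only for illustration). FakeDB, fetch_many and render_section are hypothetical stand-ins, and each fetch_many call models one batched query. Batching inside a section still costs three queries per section, so ten sections cost thirty.

```python
class FakeDB:
    """Hypothetical data source that counts how many batched queries it serves."""

    def __init__(self):
        self.queries = 0

    def fetch_many(self, table, ids):
        # One call models one batched query for all the given ids.
        self.queries += 1
        return {i: f"{table}:{i}" for i in ids}


def render_section(db, item_ids):
    # Batching happens only within one section: one query per data type.
    items = db.fetch_many("news", item_ids)
    comments = db.fetch_many("comment_count", item_ids)
    videos = db.fetch_many("videonews", item_ids)
    return items, comments, videos


db = FakeDB()
sections = [[s * 10 + i for i in range(5)] for s in range(10)]  # 10 sections, 5 items each
for ids in sections:
    render_section(db, ids)
print(db.queries)  # 30: three batched queries per section, ten sections
```

Batching the same three data types across the whole page would need only three queries, which is the gap the promise approach described next tries to close.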
The concept of Futures and promises seems like a good fit for this model. A Promise is defined as a deferred value; in practice, it is an object whose value can be resolved at a later time. That seems perfect. All we need to add to extend this model is:
1) Promise relationships
2) Batch resolving of multiple promises
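As a baseline, the deferred-value part can be sketched in a few lines of Python. The class and method names are illustrative and are not taken from the author's PHP code: a Promise starts out empty, records what needs to be fetched, and receives its value later.

```python
class Promise:
    """A deferred value: created empty, resolved at some later point."""

    _UNSET = object()  # sentinel, so that None can still be a legitimate value

    def __init__(self, key):
        self.key = key  # what needs to be fetched, e.g. a news item id
        self._value = Promise._UNSET

    @property
    def resolved(self):
        return self._value is not Promise._UNSET

    def resolve(self, value):
        self._value = value

    def value(self):
        if not self.resolved:
            raise RuntimeError(f"promise for {self.key!r} is not resolved yet")
        return self._value


headline = Promise(42)
print(headline.resolved)  # False
headline.resolve("Headline")
print(headline.value())  # Headline
```

The two extensions in the list above, relationships and batch resolving, build on top of this object.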
When I say "relationships", I'm approaching this from a data-driven standpoint, not from program flow per se. I don't want to use the then() method to trigger the resolving of promises, and I don't want to keep track of promises in a sea of closures as they resolve.
A Promise containing more Promises, which in turn contain more Promises, is a good way to specify relationships between promises. It's not quite that simple: you still need some program flow when creating promises, but nothing that a getPromises() method and some recursion can't solve. Each Promise defines a set of child Promises to be resolved; a news item, for example, would create a comment count promise and return it from getPromises().
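A Python sketch of such a promise tree and the recursive walk could look like this. get_promises() mirrors the post's getPromises(); SectionPromise, NewsItemPromise, CommentCountPromise and walk() are hypothetical names chosen for the example. Nothing is fetched yet. The point is that the relationships live in the data, so one recursive walk finds every promise on the page.

```python
class Promise:
    def __init__(self, key):
        self.key = key
        self.value = None  # filled in later by a batch resolver

    def get_promises(self):
        # Child promises this one depends on; none by default.
        return []


class CommentCountPromise(Promise):
    pass


class NewsItemPromise(Promise):
    def __init__(self, key):
        super().__init__(key)
        # The relationship is data, not control flow: a news item owns its comment count.
        self.comments = CommentCountPromise(key)

    def get_promises(self):
        return [self.comments]


class SectionPromise(Promise):
    def __init__(self, key, item_ids):
        super().__init__(key)
        self.items = [NewsItemPromise(i) for i in item_ids]

    def get_promises(self):
        return list(self.items)


def walk(promise):
    """Yield a promise and, recursively, every promise nested inside it."""
    yield promise
    for child in promise.get_promises():
        yield from walk(child)


page = [SectionPromise("top", [1, 2]), SectionPromise("sport", [3])]
all_promises = [p for root in page for p in walk(root)]
print([type(p).__name__ for p in all_promises])
```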
We stray from the traditional use of Promises here. Used this way, Promise objects can be resolved in batches, which the common Promise programming patterns don't give you, and the data relationships between them stay intact.
All that is left to do is to resolve the promises in the final data tree. My approach was to traverse the tree and collect the promises into a final list, grouped by class name. This makes it possible to resolve a list of promises with a resolveAll($promises) method defined in each specific promise class. This is the batching function: it takes all the promises of the same type and resolves them with one function call, taking care of both fetching the data and resolving the promises. In MySQL you would do this with a query using the SET type, or you could use memcache::get or redis::mget.
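The traversal and grouping step can be sketched in Python, with resolve_all playing the role of the post's resolveAll($promises). FakeStore and its mget method are hypothetical stand-ins for something like redis::mget, and the key prefixes are made up for the example. Promises are bucketed by class name, and each bucket is resolved with a single call to its class's resolve_all.

```python
from collections import defaultdict


class FakeStore:
    """Hypothetical key-value store; each mget call is one round trip."""

    def __init__(self, data):
        self.data = data
        self.calls = 0

    def mget(self, keys):
        self.calls += 1
        return [self.data.get(k) for k in keys]


STORE = FakeStore({"news:1": "Headline one", "news:2": "Headline two",
                   "comments:1": 4, "comments:2": 0})


class Promise:
    prefix = ""

    def __init__(self, key):
        self.key = key
        self.value = None

    def get_promises(self):
        return []

    @classmethod
    def resolve_all(cls, promises):
        # One batched fetch for every promise of this class.
        values = STORE.mget([f"{cls.prefix}:{p.key}" for p in promises])
        for p, v in zip(promises, values):
            p.value = v


class CommentCountPromise(Promise):
    prefix = "comments"


class NewsItemPromise(Promise):
    prefix = "news"

    def __init__(self, key):
        super().__init__(key)
        self.comments = CommentCountPromise(key)

    def get_promises(self):
        return [self.comments]


def resolve_tree(roots):
    # Walk the tree (iteratively here) and bucket promises by class name.
    by_class = defaultdict(list)
    stack = list(roots)
    while stack:
        p = stack.pop()
        by_class[type(p).__name__].append(p)
        stack.extend(p.get_promises())
    # One resolve_all call per promise type.
    for promises in by_class.values():
        type(promises[0]).resolve_all(promises)


items = [NewsItemPromise(1), NewsItemPromise(2)]
resolve_tree(items)
print(items[0].value, items[0].comments.value, STORE.calls)  # Headline one 4 2
```

Two news items with their comment counts cost two store round trips, one per promise type. Adding more sections would add promises to the existing buckets, not more round trips.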
You can check out my attempt at a solution here:
So, while the landing page would still need significant refactoring, this is a step in the right direction. The resulting data tree is perfect because it is resolved with no data duplication and the maximum amount of batching. Whatever data source you use, chances are it only adds one SQL query to fetch all of its results. And optimizing one SQL query is much easier than having to optimize 20 of them across your complete application stack. It is also nice to reduce the number of SQL queries you're working with in case you need to implement sharding, move the database, or make some other data management change.
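The "no data duplication" property can be sketched by having resolve_all deduplicate keys before it queries. In this hypothetical Python example, FakeDB.select_in stands in for a single SELECT ... WHERE id IN (...) query. The same news item appears in two sections of the page but is requested only once.

```python
class FakeDB:
    """Hypothetical database that records every batched query it receives."""

    def __init__(self, rows):
        self.rows = rows
        self.queries = 0
        self.keys_requested = []

    def select_in(self, ids):
        # Stands in for one "SELECT ... WHERE id IN (...)" query.
        self.queries += 1
        self.keys_requested.extend(ids)
        return {i: self.rows[i] for i in ids if i in self.rows}


DB = FakeDB({1: "Headline one", 2: "Headline two", 3: "Headline three"})


class NewsItemPromise:
    def __init__(self, key):
        self.key = key
        self.value = None

    @classmethod
    def resolve_all(cls, promises):
        unique_ids = sorted({p.key for p in promises})  # each id is fetched once
        rows = DB.select_in(unique_ids)
        for p in promises:
            p.value = rows.get(p.key)


# Item 2 appears in two sections of the landing page.
top = [NewsItemPromise(1), NewsItemPromise(2)]
sport = [NewsItemPromise(2), NewsItemPromise(3)]
NewsItemPromise.resolve_all(top + sport)
print(DB.queries, DB.keys_requested)  # 1 [1, 2, 3]
```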
The approach is sequential: your data tree is ready as soon as the resolve / resolveAll calls finish. Depending on your data source, there is an opportunity to fetch data asynchronously. If you're consuming API responses over HTTP, SQL queries over MySQL, or any other data over a non-blocking connection, the resolving could be adapted to take advantage of this.
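One possible asynchronous variant is sketched here with Python's asyncio. This is an assumption; the post does not prescribe a mechanism. Each promise class keeps its single batched call, but the batches for different classes run concurrently. asyncio.sleep stands in for a non-blocking SQL or HTTP call, and the class names are hypothetical.

```python
import asyncio
import time


class Promise:
    def __init__(self, key):
        self.key = key
        self.value = None


class NewsItemPromise(Promise):
    @classmethod
    async def resolve_all(cls, promises):
        await asyncio.sleep(0.05)  # stands in for one non-blocking batched query
        for p in promises:
            p.value = f"news {p.key}"


class VideoPromise(Promise):
    @classmethod
    async def resolve_all(cls, promises):
        await asyncio.sleep(0.05)  # stands in for one non-blocking API call
        for p in promises:
            p.value = f"video {p.key}"


async def resolve_batches(batches):
    # Still one call per promise type, but the calls now overlap in time.
    await asyncio.gather(*(cls.resolve_all(ps) for cls, ps in batches.items()))


news = [NewsItemPromise(1), NewsItemPromise(2)]
videos = [VideoPromise(7)]
start = time.perf_counter()
asyncio.run(resolve_batches({NewsItemPromise: news, VideoPromise: videos}))
elapsed = time.perf_counter() - start  # close to one sleep, not two, since they overlap
print(news[1].value, videos[0].value)  # news 2 video 7
```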
Fetching the data this way is a nice optimization, but to really get the benefits it needs to be implemented across your whole MVC solution. The goal is to come as close as possible to complete coverage, so that none of your data calls get duplicated.
Some thought needs to go into how your MVC framework can live with this data model, and where it should be avoided. The thing to keep in mind is that this is basically an efficient model for fetching data while keeping the relationships between the data. It is somewhat a superset of DAO / DAL logic, since it approaches the data from a global viewpoint rather than from the viewpoint of a specific data structure.
P.S. A significant pitfall here is also the PHP engine. I'm sure performance would increase dramatically if this were running on a JVM. While the benchmark is not bad, the 95th percentile shows significant overhead in the initial runs, before PHP does some of its pre-allocation magic to speed things up.