So the change that brought the performance benefit I mentioned earlier has already been merged. It’s the one you posted before this.
Both PRs have similar performance characteristics. We were able to fix the existing cache using some new APIs and some clever ways of using the cache. The main motivation for the new cache was that the ristretto cache implementation couldn’t keep consistency.
On our 21 million dataset, we ran 63 queries 1 billion times and took the average of all the query times. Across all the queries combined, we saw an improvement of around 23%. For the larger queries (average around 3ns), we saw an improvement of around 200% (now around 1ns on average), i.e. roughly a 3x speedup.
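To make the percentages concrete, here is a small sketch of how a drop from about 3 to about 1 maps to "200%". It assumes the improvement is measured as extra throughput (old time divided by new time, minus one); the function names are made up for illustration and are not part of the benchmark tooling.

```python
def speedup_percent(old_time: float, new_time: float) -> float:
    """Improvement expressed as extra throughput: (old / new - 1) * 100."""
    return (old_time / new_time - 1.0) * 100.0


def latency_reduction_percent(old_time: float, new_time: float) -> float:
    """The same change expressed as a reduction in time per query."""
    return (1.0 - new_time / old_time) * 100.0


# Larger queries: average went from about 3 to about 1 (same unit).
print(speedup_percent(3.0, 1.0))                      # 200.0
print(round(latency_reduction_percent(3.0, 1.0), 1))  # 66.7
```

So the same result can be read as a 200% improvement, a 3x speedup, or about a 67% reduction in query time, depending on the convention.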
In this PR, we have introduced a new parameter in the cache settings, keep-updates. If it is set to true, the cache is not invalidated after upserts. It is off by default for now while we test it further. It has some performance consequences during heavy mutations.
It has passed most reviews, and we hope to do a final review tomorrow and publish it.
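As a rough illustration of the difference keep-updates makes, here is a minimal sketch of the two behaviours. It is not the actual implementation: the class and method names are hypothetical, and it assumes that "not invalidated" means the cached entry is overwritten with the new value on upsert rather than dropped.

```python
from typing import Dict, Optional


class SketchCache:
    """Hypothetical in-memory cache, used only to illustrate keep-updates."""

    def __init__(self, keep_updates: bool = False) -> None:
        # Mirrors the default described above: the flag is off unless set.
        self.keep_updates = keep_updates
        self._entries: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        """Return the cached value, or None on a cache miss."""
        return self._entries.get(key)

    def fill(self, key: str, value: str) -> None:
        """Populate the cache, e.g. after a read that went to disk."""
        self._entries[key] = value

    def on_upsert(self, key: str, new_value: str) -> None:
        if self.keep_updates:
            # Assumed behaviour: keep the entry and refresh it in place.
            # This adds work to the write path.
            self._entries[key] = new_value
        else:
            # Default behaviour: invalidate, so the next read is a miss.
            self._entries.pop(key, None)


default_cache = SketchCache()
default_cache.fill("k", "v1")
default_cache.on_upsert("k", "v2")
print(default_cache.get("k"))  # None

kept_cache = SketchCache(keep_updates=True)
kept_cache.fill("k", "v1")
kept_cache.on_upsert("k", "v2")
print(kept_cache.get("k"))  # v2
```

Under this assumption, reads after an upsert stay warm, but every mutation pays for the extra cache write, which would be consistent with the cost we see during heavy mutations.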