How large should I set the cache size?

Thanks for bringing this up, and I really appreciate the PRs you sent.

Right now, the cache size is configured as the number of items you want to keep in the cache. If you can estimate that number for your workload, set the value accordingly.
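
If it helps with picking a number, here is a rough sketch in Go of one way to estimate it, assuming you have a memory budget in mind and have measured an average item size yourself. `itemsForBudget` and both of its parameters are hypothetical names for illustration and are not part of the actual configuration.

```go
package main

// itemsForBudget estimates how many items fit into a memory budget.
// Both inputs are assumptions: budgetBytes is the memory you are willing
// to give the cache, and avgItemBytes is an average item size that you
// have measured for your own workload.
func itemsForBudget(budgetBytes, avgItemBytes int64) int64 {
	if budgetBytes <= 0 || avgItemBytes <= 0 {
		return 0 // no meaningful estimate without positive inputs
	}
	return budgetBytes / avgItemBytes
}
```

For example, `itemsForBudget(1<<30, 4096)` (a 1 GiB budget with items averaging 4 KiB) returns 262144. Treat the result as a starting point and adjust it based on the hit ratio you observe.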

We’re working on improving this: the plan is to make the cache size adapt automatically to the total available memory. That will be part of an upcoming update.

Some context about the issue (since you seem quite familiar with the code):
I tried setting the size of the list here, but the cache actually performed worse. There are a couple of possible reasons, though I haven’t investigated them deeply yet:

  • Issue 1: Ristretto tends to evict large keys quickly. A list of size 100 was already treated as a “big key”, which significantly reduced the cache’s usefulness.
  • Issue 2: Calculating the size of a posting list incurs some overhead, which cuts into the benefit of caching. The good news is that we can now fix this by precomputing and storing the size when the posting list is created.
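
To make the fix for Issue 2 concrete, here is a minimal sketch in Go of storing the size at creation time. The `postingList` type, its fields, and the byte constants are hypothetical stand-ins for illustration and are not the actual types or numbers in the codebase.

```go
package main

const (
	structOverheadBytes = 32 // assumed fixed per-list overhead, illustrative only
	bytesPerUID         = 8  // a uint64 occupies 8 bytes
)

// postingList is a hypothetical stand-in for a real posting list.
type postingList struct {
	uids []uint64
	size int64 // approximate size in bytes, computed once at creation
}

// newPostingList computes the approximate size a single time, so that
// nothing has to walk the list later just to learn how big it is.
// The slice is not copied; if it were modified after creation, the
// stored size would need to be updated as well.
func newPostingList(uids []uint64) *postingList {
	return &postingList{
		uids: uids,
		size: structOverheadBytes + int64(len(uids))*bytesPerUID,
	}
}

// Size returns the stored value in constant time.
func (pl *postingList) Size() int64 { return pl.size }
```

With this in place, the cache can read `Size()` when an item is inserted instead of recomputing it each time. Under the assumed constants, a list of 100 UIDs reports 832 bytes.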

If you’d like to hop on a call to discuss this further, just let me know.