Thanks for reading and thanks for the many suggestions.
Method 2: I should write a bit more. For each element x in sorted B, loop over the |A| lists and check whether x is inside each one by binary search. I did make a mistake here. The running time should be |B||A| ignoring log factors for binary search, which looks quite bad in the case where |B| = N, i.e., no overlap.
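To make the cost concrete, here is a rough Go sketch of that per-element check. It assumes each posting list in A is a sorted []uint64 of UIDs and B is also sorted; all names here are illustrative, not from the actual code.

```go
package sketch

import "sort"

// whichList is a sketch of method 2 as I read it: for each UID x in sorted B,
// scan the |A| posting lists and test membership by binary search, returning
// the index of the first list containing x (or -1 if none does). In the
// no-overlap worst case every x misses every list, so we pay |A| binary
// searches per element of B, i.e. |B||A| up to log factors.
func whichList(lists [][]uint64, b []uint64) []int {
	out := make([]int, len(b))
	for i, x := range b {
		out[i] = -1
		for j, l := range lists {
			// Binary search for x within the sorted list l.
			k := sort.Search(len(l), func(m int) bool { return l[m] >= x })
			if k < len(l) && l[k] == x {
				out[i] = j
				break
			}
		}
	}
	return out
}
```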
Method 4: I didn’t like |A| network calls either, but the other three seem really bad for pagination.
Clarification about what I wrote about method 4: I was thinking of what you proposed, except not batched up over all the lists. I thought it was obvious and didn’t write much about it. What I mean by “worker can trim the results and return early. It can iterate over RocksDB until… and return” is: iterate over RocksDB, intersect with the index bucket until we get enough results (hence implicitly trimming the results), and return early once we have enough.
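Something like this minimal sketch is what I have in mind; the bucket slices stand in for the RocksDB iterator, and none of the names are the real worker API.

```go
package sketch

// intersectUntilFull walks index buckets in sorted value order, intersects
// each bucket's UID list with the query's UID set, and stops as soon as a
// page is full, so the trimming happens implicitly and the worker returns
// early. count is the page size.
func intersectUntilFull(buckets [][]uint64, querySet map[uint64]bool, count int) []uint64 {
	var out []uint64
	for _, bucket := range buckets {
		for _, uid := range bucket {
			if querySet[uid] {
				out = append(out, uid)
				if len(out) == count {
					return out // enough results: return early
				}
			}
		}
	}
	return out
}
```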
Extra memory overhead for method 3: I was thinking that maybe we still need to keep the sorted UIDs. In retrospect, we probably don’t. That said, the sorting by position in sorted B can be done in place, provided we don’t need to keep a copy of the UIDs sorted in the original order, leading to no memory overhead at all.
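A generic sketch of that in-place point, assuming the key for each UID is its position in sorted B found by binary search (method 3’s actual key may differ; this only illustrates that the reorder itself needs no extra copy):

```go
package sketch

import "sort"

// sortByPosInB reorders uids in place with sort.Slice, keyed by each UID's
// position in the sorted slice b, so no second copy of the UIDs is kept.
func sortByPosInB(uids []uint64, b []uint64) {
	pos := func(x uint64) int {
		return sort.Search(len(b), func(i int) bool { return b[i] >= x })
	}
	sort.Slice(uids, func(i, j int) bool {
		return pos(uids[i]) < pos(uids[j])
	})
}
```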
Overall method: I like what you propose. To me, it is method 4, but batching over the posting lists, which is great. As a first step, I might assume that the index and predicate values live on the same worker. Later on, I can update the code to do MutateOverNetwork.
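For the batched version, I imagine the per-worker task looking roughly like this; the struct and field names are hypothetical, not the actual task definition.

```go
package sketch

// indexTask is a hypothetical shape for the batched request: a single call
// per worker carries the query UIDs, the indexed predicate (so the worker
// knows which index buckets to iterate), and the pagination parameters,
// instead of one call per posting list.
type indexTask struct {
	Attr   string   // indexed predicate whose buckets the worker walks
	UIDs   []uint64 // sorted B: the UIDs to intersect against
	Offset int      // pagination offset, applied before trimming
	Count  int      // page size: worker trims and returns early at this many
}
```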