We love things to be sorted. And we have been trying to keep things sorted.
Here is yet another random idea that you have probably already considered:
What if the key is just (pred, sub, obj) instead of (pred, sub)? To get the “posting list” from RocksDB, we would seek to the prefix (pred, sub) and keep reading until the prefix changes. The iteration might be painful compared with a simple key-value fetch. But since RocksDB operates in blocks of about 4K, we can expect that once we hit one (pred, sub, obj) key, the other objs for that prefix should mostly be in the same block.
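To make the idea concrete, here is a minimal sketch of the seek-and-scan pattern. It does not use RocksDB at all: a sorted Python list stands in for the sorted key space, and `bisect` stands in for the iterator's seek. The names `SortedStore`, `put` and `posting_list` are made up for illustration and are not Dgraph or RocksDB APIs.

```python
import bisect


class SortedStore:
    """In-memory stand-in for a sorted key-value store (assumption: not RocksDB).

    Each key is a (pred, sub, obj) tuple and there is no separate value,
    so the posting list for (pred, sub) is a contiguous run of keys.
    """

    def __init__(self):
        self._keys = []  # kept sorted at all times

    def put(self, pred, sub, obj):
        key = (pred, sub, obj)
        i = bisect.bisect_left(self._keys, key)
        # Skip duplicates so each triple is stored once.
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def posting_list(self, pred, sub):
        prefix = (pred, sub)
        # "Seek": a 2-tuple sorts before every 3-tuple that starts with it,
        # so this lands on the first key with the given prefix.
        i = bisect.bisect_left(self._keys, prefix)
        objs = []
        # "Keep reading": stop as soon as the prefix no longer matches.
        while i < len(self._keys) and self._keys[i][:2] == prefix:
            objs.append(self._keys[i][2])
            i += 1
        return objs


store = SortedStore()
for triple in [("friend", 1, 7), ("friend", 1, 3), ("friend", 2, 9), ("name", 1, 42)]:
    store.put(*triple)

print(store.posting_list("friend", 1))  # [3, 7]
```

The point of the sketch is only the access pattern: one seek followed by a sequential read, instead of one point lookup that returns the whole list. Whether that sequential read is cheap in practice depends on how the real store lays keys out in blocks.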
A row scan seems really slow, though. I can try tweaking some settings. I suspect that a lot of other graph databases store RDF triples this way instead of doing what we’re doing here. If so, this might be one reason why Dgraph can shine, and it might help with some marketing, maybe.
The hope was that a row scan wouldn’t take much longer than point queries, so that we could get rid of the gotomic hash and the mutation layers and perhaps greatly simplify the codebase. Another benefit is that RocksDB could take care of super long posting lists automatically. Unfortunately, the benchmarks don’t look good.
(In case you wonder why point queries slowed down a bit: we force a flush of the memtable and force every query to re-read the data, since that is what a typical query on a typical database would have to do.)