I read the blogs on Dgraph and Badger.
Badger is based on the WiscKey paper, which implies two things:
keys => sorted, stored in LSM trees, persisted as “sst” files (SSTables)
values => append-only logs, persisted as “vlog” files
And a posting list means key-value pairs.
So when Dgraph claims that a posting list is sorted, what exactly is sorted? I assume only the keys, and not the values.
Or are the values sorted? If yes, is this Dgraph logic implemented on top of Badger?
I hope the values aren’t sorted; if they are, the whole point of sequential I/O for values, as presented in the WiscKey paper, is lost.
Posting lists are values that store sorted information. For example, in the case of indexes, the key would be the token + predicate (and some metadata), and the value is a sorted list of UIDs, encoded using various techniques that we use internally. For more information, you could check out https://blog.dgraph.io/post/datetime-indexes-dgraph/.
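To make the index case concrete, here is a toy sketch in Python. It is not Dgraph's actual key layout or UID encoding: `index_key` and `add_to_index` are hypothetical names, and the key here is just a (predicate, token) tuple without the metadata. It only illustrates that each key owns one posting list and that the list is kept sorted internally.

```python
import bisect


def index_key(predicate, token):
    # Hypothetical key layout; real Dgraph keys also carry metadata.
    return (predicate, token)


def add_to_index(index, predicate, token, uid):
    """Add uid to the posting list for (predicate, token), keeping it sorted."""
    plist = index.setdefault(index_key(predicate, token), [])
    pos = bisect.bisect_left(plist, uid)
    # Insert only if the UID is not already present.
    if pos == len(plist) or plist[pos] != uid:
        plist.insert(pos, uid)


index = {}
# UIDs arrive in arbitrary order.
for uid, name in [(0x30, "alice"), (0x10, "bob"), (0x20, "alice")]:
    add_to_index(index, "name", name, uid)

print(index[index_key("name", "alice")])
```

The UIDs for "alice" arrive as 0x30 and then 0x20, but the posting list stores them in ascending order.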
Values in the vlog are NOT sorted; only the memtables (and the SSTables they are flushed to) are sorted. Each value contains a sequence of UIDs (or a sequence of values) that is sorted within that value, not across values. I think you are assuming that keys are UIDs, which is not always the case. UIDs could be part of the key or part of the value, depending on whether the entry corresponds to data (the primary index, uid + predicate → posting list) or to indexes (the secondary indexes, token + predicate → posting list of UIDs).
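The distinction between "sorted keys", "unsorted vlog" and "sorted within each value" can be sketched with a toy store. This is only an illustration under simplifying assumptions, not Badger's implementation: `ToyWiscKeyStore` is a hypothetical class, a sorted Python list stands in for the LSM tree, and an in-memory list stands in for the vlog file.

```python
import bisect


class ToyWiscKeyStore:
    """Toy WiscKey-style layout: sorted keys, append-only value log."""

    def __init__(self):
        self.vlog = []      # append-only log of (key, value), in arrival order
        self.keys = []      # sorted keys (stand-in for the LSM tree)
        self.pointers = {}  # key -> offset of the latest value in the vlog

    def put(self, key, uids):
        value = sorted(set(uids))        # each posting list is sorted internally
        self.vlog.append((key, value))   # sequential append, never reordered
        if key not in self.pointers:
            bisect.insort(self.keys, key)
        self.pointers[key] = len(self.vlog) - 1

    def get(self, key):
        return self.vlog[self.pointers[key]][1]


store = ToyWiscKeyStore()
store.put(("name", "zoe"), [7, 3])
store.put(("name", "amy"), [9, 1, 5])

print(store.keys)                      # keys come back in sorted order
print([k for k, _ in store.vlog])      # vlog keeps arrival order
print(store.get(("name", "amy")))      # UIDs sorted within the value
```

Writes only ever append to `vlog`, so sequential I/O for values is preserved; the sorting cost is paid on the keys and inside each individual posting list.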
That cleared up a lot of my questions, thank you so much.
I have one last question remaining, related to the WAL. Since vlogs are append-only, they are equivalent to a WAL.
Why have two different vlog files, one in the /p (posting list) directory and one in the /w (write-ahead log) directory?
How is the sst inside the /w directory generated/stored? Since an sst is a sorted list, it is NOT append-only. Or is the sst in the /w directory append-only and not sorted?
Mostly for separation of concerns. The p directory deals with data that is stored in the database, i.e. the primary index and all the secondary indexes (i.e. all the posting lists), whereas the w (or zw in the case of Zero) directory mostly stores control information for the Raft logs. Both need to be durable.