I looked into the above issue. Most of the memory is taken by the SSTs of the Badger instance in the w directory, because its tables are opened with the LoadToRAM option. Ideally the w directory should not hold much data, since we take snapshots periodically and delete all earlier Raft entries (those with an index lower than the snapshot index).
But these entries (and the data belonging to them) are only marked as deleted; they are removed from disk only during compaction. Hence they take up a significant amount of memory.
I ran some experiments on the w directory after stopping the Dgraph cluster. Initially the size of w was ~6.5GB; after running badger flatten on it, the directory shrank to ~1.5GB, which can be considered reasonable.
Output of dgraph debug:
[Decoder]: Using assembly version of decoder
I0721 15:59:07.034752 7324 util_ee.go:126] KeyReader instantiated of type <nil>
Opening DB: ./wcopy
badger 2020/07/21 15:59:07 INFO: All 335 tables opened in 300ms
badger 2020/07/21 15:59:07 INFO: Replaying file id: 190 at offset: 38347976
badger 2020/07/21 15:59:07 INFO: Replay took: 3.115µs
rids: map[1:true]
gids: map[1:true]
Iterating with Raft Id = 1 Groupd Id = 1
Snapshot Metadata: {ConfState:{Nodes:[1] Learners:[] XXX_unrecognized:[]} Index:737765 Term:2 XXX_unrecognized:[]}
Snapshot Alpha: {Context:id:1 group:1 addr:"localhost:7080" Index:737765 ReadTs:984245 Done:false SinceTs:0 XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
Hardstate: {Term:2 Vote:1 Commit:770724 XXX_unrecognized:[]}
Checkpoint: 770724
Last Index: 770724 . Num Entries: 32958 .
Per the output above, we should have only around 32960 Raft entries in the w directory.
I made some changes to the badger info command and ran it as below:
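The estimate can be checked from the indexes in the debug output. Below is a minimal sketch; the helper name `entriesAfterSnapshot` is made up for illustration, and it assumes that only entries after the snapshot index need to be retained.

```go
package main

import "fmt"

// entriesAfterSnapshot is a hypothetical helper: it returns how many Raft
// entries lie after the snapshot index, up to and including the last index.
func entriesAfterSnapshot(snapshotIndex, lastIndex uint64) uint64 {
	if lastIndex < snapshotIndex {
		return 0
	}
	return lastIndex - snapshotIndex
}

func main() {
	// Values copied from the dgraph debug output above:
	// snapshot Index:737765, Last Index: 770724.
	fmt.Println(entriesAfterSnapshot(737765, 770724))
}
```

The result is in the same range as the reported "Num Entries: 32958", which is why roughly 32960 live keys is the expected count.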
go run main.go info --dir=/home/ashish/temp/wcopy --read-only=false --show-keys --with-prefix="000000000000000100000001" --history
I got the result below for the original w directory:
[Summary]
Total Number of keys: 1195879
Total Number of deleted keys: 737765
Total Number of discardEarilierKey: 0
Total Number of deletedAndPresent: 425154
Total Number of rest: 32960
badger 2020/07/22 10:28:38 INFO: Got compaction priority: {level:0 score:1.73 dropPrefixes:[]}
Running the same command on the flattened w directory gives the following result:
[Summary]
Total Number of keys: 32960
Total Number of deleted keys: 0
Total Number of discardEarilierKey: 0
Total Number of deletedAndPresent: 0
Total Number of rest: 32960
badger 2020/07/22 10:41:20 INFO: Got compaction priority: {level:0 score:1.73 dropPrefixes:[]}
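The two summaries can be cross-checked with a little arithmetic. The sketch below uses a hypothetical `summary` type (not part of Badger) that holds the printed counters. It checks that the four categories add up to the total and computes how many keys in the original directory are not live data, on the assumption that everything outside "rest" is removable by compaction.

```go
package main

import "fmt"

// summary is a hypothetical type mirroring the counters printed in [Summary].
type summary struct {
	total, deleted, discardEarlier, deletedAndPresent, rest int
}

// consistent reports whether the four categories add up to the total.
func (s summary) consistent() bool {
	return s.deleted+s.discardEarlier+s.deletedAndPresent+s.rest == s.total
}

// stale returns the number of keys outside the "rest" category, i.e.
// deletion markers and older versions shadowed by them.
func (s summary) stale() int {
	return s.total - s.rest
}

func main() {
	// Counters copied from the two summaries above.
	original := summary{total: 1195879, deleted: 737765, deletedAndPresent: 425154, rest: 32960}
	flattened := summary{total: 32960, rest: 32960}

	fmt.Println("original consistent:", original.consistent())
	fmt.Println("flattened consistent:", flattened.consistent())
	fmt.Printf("stale keys before flatten: %d (%.1f%% of all keys)\n",
		original.stale(), 100*float64(original.stale())/float64(original.total))
}
```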
Meaning of the keys:
deleted keys: Keys with a deletion marker.
deletedAndPresent: Keys that have a deletion marker but whose previous version, with data, is still present. These are responsible for the increased RAM usage.
discardEarlierKey: Keys marked with discardEarlierVersion.
rest: All remaining keys, i.e. those with no deletion marker on themselves or on a newer version.
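The categories above can be illustrated without Badger at all. Below is a minimal in-memory sketch; the `version` and `counts` types and the `classify` function are hypothetical stand-ins, and the sketch assumes versions arrive sorted by key with the newest version of each key first, so a deletion marker is seen before the older data it shadows.

```go
package main

import (
	"bytes"
	"fmt"
)

// version is a hypothetical stand-in for one version of a key.
type version struct {
	key     []byte
	deleted bool // carries a deletion marker
	discard bool // marked to discard earlier versions
}

// counts holds one counter per category described above.
type counts struct {
	deleted, discardEarlier, deletedAndPresent, rest int
}

// classify assumes versions are sorted by key, newest version first.
func classify(versions []version) counts {
	var c counts
	var prevDeleted []byte // key of the most recent deletion marker seen
	for _, v := range versions {
		switch {
		case v.deleted:
			c.deleted++
			prevDeleted = append(prevDeleted[:0], v.key...)
		case v.discard:
			c.discardEarlier++
		case bytes.Equal(prevDeleted, v.key):
			// Older version of a key that already has a deletion marker.
			c.deletedAndPresent++
		default:
			c.rest++
		}
	}
	return c
}

func main() {
	versions := []version{
		{key: []byte("a"), deleted: true}, // deletion marker for "a"
		{key: []byte("a")},                // older data for "a", still present
		{key: []byte("b")},                // live key
	}
	fmt.Printf("%+v\n", classify(versions))
}
```

In this toy input, key "a" contributes one deleted key and one deletedAndPresent key, while "b" falls under rest.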
I made the following changes in info.go:
func showKeys(db *badger.DB, prefix []byte) error {
	if len(prefix) > 0 {
		fmt.Printf("Only choosing keys with prefix: \n%s", hex.Dump(prefix))
	}
	txn := db.NewTransaction(false)
	defer txn.Discard()

	// Iterate over keys only (no value prefetch), including old versions
	// when --history is set.
	iopt := badger.DefaultIteratorOptions
	iopt.Prefix = prefix
	iopt.PrefetchValues = false
	iopt.AllVersions = opt.keyHistory
	// iopt.InternalAccess = opt.showInternal
	it := txn.NewIterator(iopt)
	defer it.Close()

	rest := 0
	totalKeys := 0
	deletedKeys := 0
	discardEarlier := 0
	deletedAndPresent := 0
	// prevkey holds the key of the most recent deletion marker seen.
	var prevkey []byte
	for it.Rewind(); it.Valid(); it.Next() {
		item := it.Item()
		if item.IsDeletedOrExpired() {
			deletedKeys++
			prevkey = item.KeyCopy(prevkey)
		} else if item.DiscardEarlierVersions() {
			discardEarlier++
		} else if bytes.Equal(prevkey, item.Key()) {
			// Older version of a key whose deletion marker was already seen.
			deletedAndPresent++
		} else {
			rest++
		}
		/*
			if err := printKey(item, false); err != nil {
				return errors.Wrapf(err, "failed to print information about key: %x(%d)",
					item.Key(), item.Version())
			}
		*/
		totalKeys++
	}
	fmt.Print("\n[Summary]\n")
	fmt.Println("Total Number of keys:", totalKeys)
	fmt.Println("Total Number of deleted keys:", deletedKeys)
	fmt.Println("Total Number of discardEarilierKey:", discardEarlier)
	fmt.Println("Total Number of deletedAndPresent:", deletedAndPresent)
	fmt.Println("Total Number of rest:", rest)
	return nil
}
Thanks to @ibrahim for the help with the above analysis.