How to clean up values for old versions of items

I noticed that it doesn’t matter whether I set WithNumVersionsToKeep() or not; Badger always stores all versions of the item. I can see this by observing the size of the vlog files. If I create a new database, set NumVersionsToKeep to 1, and then write a different value to the same key 100 times, the vlog size is the same as when I repeat the test with NumVersionsToKeep = 100.
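For reference, this is roughly the test I ran (simplified; the path and payload are placeholders):

opts := badger.DefaultOptions("/tmp/badger-test").WithNumVersionsToKeep(1)
db, err := badger.Open(opts)
if err != nil {
	log.Fatal(err)
}
defer db.Close()

val := make([]byte, 1024) // dummy payload
// write 100 different values to the same key
for i := 0; i < 100; i++ {
	copy(val, []byte(fmt.Sprintf("version-%d", i)))
	err := db.Update(func(txn *badger.Txn) error {
		return txn.Set([]byte("same-key"), val)
	})
	if err != nil {
		log.Fatal(err)
	}
}
// The vlog files on disk end up roughly the same size whether NumVersionsToKeep is 1 or 100.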

I update my keys often, so my database grows very large because of this. Is there a way to clean up the old data?

@adwinsky Thank you for the question. One of the Dgraph engineers will respond soon.

Hey @adwinsky, the value log (vlog) file serves as a write-ahead log (WAL) and also stores the values. The vlog grows because every write you perform is appended to the WAL, which is what allows Badger to recover in case of a crash.

You can run RunValueLogGC periodically to clean up value log files. Note that a call to RunValueLogGC doesn’t guarantee a value log file will be garbage collected: if a file contains enough stale data, it is rewritten; otherwise ErrNoRewrite (meaning GC didn’t result in any cleanup) is returned.
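For example, something along these lines (a rough sketch; the interval and discard ratio are up to you):

ticker := time.NewTicker(10 * time.Minute)
defer ticker.Stop()
for range ticker.C {
	// Keep collecting as long as GC rewrites a file; stop once nothing is reclaimed.
	for {
		if err := db.RunValueLogGC(0.5); err != nil {
			// err is usually badger.ErrNoRewrite, i.e. no file had enough stale data.
			break
		}
	}
}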

In my case the GC is very inefficient. I have a Kafka topic with 12 partitions, and I create one database per partition. Each database grows quite quickly (about 12*30 GB per hour), and the TTL for most of the events is 1h, so the size should stay roughly constant. For every partition I create a separate transaction and process read and write operations sequentially, with no concurrency; when the transaction gets too big I commit it, and in a separate goroutine I start RunValueLogGC(0.5). Most GC runs end with ErrNoRewrite. I even tried repeating RunValueLogGC until I got five ErrNoRewrite errors in a row (roughly the loop sketched below), but I was still running out of disk space quite quickly. My current fix is to patch the Badger GC so it runs on every fid before the head. This works fine, but it eventually becomes slow when I have too many log files.
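The retry loop I mentioned looked roughly like this (simplified):

noRewrites := 0
for noRewrites < 5 {
	err := db.RunValueLogGC(0.5)
	if err == nil {
		noRewrites = 0 // a value log file was rewritten, keep going
		continue
	}
	if err == badger.ErrNoRewrite {
		noRewrites++ // nothing was reclaimed this time
		continue
	}
	break // some other error, give up
}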

This is the database configuration I am using:

opts := badger.DefaultOptions(dir + "/" + name)
opts.SyncWrites = false
opts.Logger = nil
opts.ValueLogLoadingMode = options.FileIO

@adwinsky Try reducing the value log file size. The default size is 1 GB; you can try with 500 MB instead. Also set the discard ratio to 0.001, i.e. RunValueLogGC(0.001). This should make GC more aggressive.
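For example (a sketch; 500 MB written out in bytes):

opts.ValueLogFileSize = 500 << 20 // ~500 MB per value log file instead of the ~1 GB default

// ...and when running GC, use the lower discard ratio:
if err := db.RunValueLogGC(0.001); err != nil && err != badger.ErrNoRewrite {
	log.Printf("value log GC: %v", err)
}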