Hi,
I’m building a service with Go and gRPC (https://grpc.io/) that can query the pwned password list (Have I Been Pwned: Pwned Passwords). The source list comes as a single .txt file with a size of 30 GB in the format:
SHA1;Count-Of-Breaches
To make this data queryable, I built a gRPC service with a single function, CheckPassword, that returns a single bool, Leaked.
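To make the interface concrete, the service surface boils down to roughly the following in Go terms. Only CheckPassword and Leaked are the real names; the request/response types and field names below are just placeholders for what my .proto generates.

import "context"

// Rough Go shape of the service; everything except CheckPassword and Leaked is illustrative.
type CheckPasswordRequest struct {
	Sha1Hash string // lowercase hex SHA-1 of the password to check
}

type CheckPasswordResponse struct {
	Leaked bool // true if the hash appears in the pwned password list
}

type PwnedPasswordsServer interface {
	CheckPassword(ctx context.Context, req *CheckPasswordRequest) (*CheckPasswordResponse, error)
}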
Now comes the tricky part: I was looking for an embedded key-value database to store the 30 GB on the filesystem (like SQLite, but optimized for key-value pairs) and found RocksDB. As most of you probably know, that didn’t work out very well because of cgo. Then I found Badger and would like to use it as the persistence layer.
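The plan for the lookup side is simple: store every hash as a key with an empty value and treat “key exists” as “password leaked”. A minimal sketch of that read path, assuming an already opened *badger.DB and the Badger API as I understand it:

import (
	"strings"

	"github.com/dgraph-io/badger" // adjust the import path to your Badger version
)

// isLeaked reports whether the given hex SHA-1 exists as a key in the database.
func isLeaked(db *badger.DB, sha1Hex string) (bool, error) {
	leaked := false
	err := db.View(func(txn *badger.Txn) error {
		_, err := txn.Get([]byte(strings.ToLower(sha1Hex)))
		if err == badger.ErrKeyNotFound {
			// Key not present -> password is not in the leaked list.
			return nil
		}
		if err != nil {
			return err
		}
		leaked = true
		return nil
	})
	return leaked, err
}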
After deciding on Badger, I wrote a little import tool to convert the 30 GB text file into a Badger database. Once it started, I saw extremely low throughput for the conversion: about 800 KB of data per 30 seconds, which works out to nearly two weeks (~13 days) to fully import the data. So I think I made a terrible mistake somewhere, or I’m just not fully aware of how Badger works.
Here is the somewhat shortened code I used for my import tool:
for {
	// Read one "SHA1;Count-Of-Breaches" line.
	buf, _, err := r.ReadLine()
	if err != nil {
		if err == io.EOF {
			break
		}
		log.Fatalln(err)
	}

	// The first 40 characters are the hex SHA-1; store it as the key with an empty value.
	key := strings.ToLower(string(buf[:40]))

	// One read-write transaction per line.
	err = db.Update(func(txn *badger.Txn) error {
		return txn.Set([]byte(key), []byte{})
	})
	if err != nil {
		log.Fatalln(err)
	}
}
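My only guess so far is that every single line gets its own db.Update call, i.e. one full transaction commit per key. Would batching many keys into one transaction, roughly like the sketch below, be the intended way to bulk-load? This is written against the Badger API as I understand it: ErrTxnTooBig is handled by committing and starting a fresh transaction (and Commit’s signature may differ between Badger versions).

import (
	"bufio"
	"io"
	"strings"

	"github.com/dgraph-io/badger" // adjust the import path to your Badger version
)

// importBatched reuses one read-write transaction for many keys and only
// commits when Badger reports the transaction is full (or at the end of input).
func importBatched(db *badger.DB, r *bufio.Reader) error {
	txn := db.NewTransaction(true) // read-write transaction
	defer func() { txn.Discard() }()

	for {
		buf, _, err := r.ReadLine()
		if err == io.EOF {
			break
		}
		if err != nil {
			return err
		}
		if len(buf) < 40 {
			continue // skip malformed lines
		}

		key := []byte(strings.ToLower(string(buf[:40])))
		if err := txn.Set(key, []byte{}); err == badger.ErrTxnTooBig {
			// Transaction is full: commit it and retry this key in a fresh one.
			if err := txn.Commit(); err != nil {
				return err
			}
			txn = db.NewTransaction(true)
			if err := txn.Set(key, []byte{}); err != nil {
				return err
			}
		} else if err != nil {
			return err
		}
	}
	return txn.Commit()
}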
Does anyone have an idea what I did wrong? Thanks in advance!