Start from scratch without exporting to RDF (Badger)

Stream data using Badger

BEFORE ANY ATTEMPT, PLEASE BACK UP YOUR FILES.
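
If you don't already have a backup, a plain copy of the data directories is enough for this procedure (stop Alpha and Zero first so the files are consistent; the directory names below are the defaults used later in this post):

cp -r p p_backup_before
cp -r w w_backup_before
cp -r zw zw_backup_before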

Download the Badger binary and put it in /usr/local/bin/badger:

wget https://github.com/dgraph-io/dgraph/releases/download/v23.0.0/badger-linux-amd64.tar.gz
tar -xzf badger-linux-amd64.tar.gz
mv badger-linux-amd64 /usr/local/bin/badger
badger -h
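
Alternatively, if you have a Go toolchain available, you should be able to build the CLI from source (this assumes the Badger v4 module layout; ideally use the Badger version bundled with your Dgraph release so the on-disk format matches):

go install github.com/dgraph-io/badger/v4/badger@latest

The binary lands in $(go env GOPATH)/bin, usually ~/go/bin/badger.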

Commands to use

badger info --dir ./p

This will check the integrity of the data.

Check the Abnormalities part of the output:

Abnormalities:
2 extra files.
0 missing files.
0 empty files.
0 truncated manifests.

What matters are the missing and truncated counts; this is how you check for corrupted files.
If either is non-zero, luck is about all you can count on.
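
If the report is long, you can pull out just those counters (a plain grep over the command output; the exact wording may differ between Badger versions):

badger info --dir ./p 2>&1 | grep -E "extra|missing|empty|truncated"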

You can also flatten your DB before streaming it. This can help with streaming. (OPTIONAL)

badger flatten --dir ./p

Stream your data to a new DB

mkdir p_backup
mv p p_backup

badger stream --dir ./p_backup --out ./p

This will copy your data to a new place.
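
Before removing anything, it's worth running the same integrity check against the freshly streamed copy (same command as earlier, just pointed at the new directory):

badger info --dir ./p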

Now you can delete the p (already moved away by the mv above), w, zw and t directories.

rm -fr t w zw
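
If you'd rather keep a rollback path, moving these directories aside instead of deleting them works just as well (illustrative):

mkdir old_state
mv t w zw old_state/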

  1. After that, start a Zero group and an Alpha. The steps above should be done for each shard/group; a minimal single-node sketch follows this list.
  2. Stop them all. (Otherwise all nodes will show “nodeCount”: 0.)
  3. Start the cluster again.
    You should have all nodes intact.
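
As a minimal single-node sketch of step 1 (default ports, run from the directory that now contains the new p folder; addresses and flags are illustrative, so use your real deployment's flags):

dgraph zero --my=localhost:5080
dgraph alpha --my=localhost:7080 --zero=localhost:5080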

Why would I do this?

With a completely new DB, it avoids file conflicts, bad configs, etc. It’s like starting from scratch, but with the same data as before.

Another case is when you can’t start your cluster for any reason other than corruption.

If you encounter an issue similar to the one described below:

1 groups.go:954] Leader idx=0x1 of group=1 is connecting to Zero for txn updates
1 groups.go:966] Got Zero leader: zero1:5080
1 draft.go:770] Proposal with key: 368760529821136 already applied. Skipping index: 70557120. Delta: max_assigned:76962721 group_checksums:<key:1 value:17473684108039954740 >  Snapshot: <nil>.
2024/03/17 14:40:56 unexpected EOF
github.com/dgraph-io/dgraph/x.Check
	/home/runner/work/dgraph/dgraph/x/error.go:42
github.com/dgraph-io/dgraph/worker.(*node).processApplyCh.func1
	/home/runner/work/dgraph/dgraph/worker/draft.go:759
github.com/dgraph-io/dgraph/worker.(*node).processApplyCh
	/home/runner/work/dgraph/dgraph/worker/draft.go:818
runtime.goexit
	/opt/hostedtoolcache/go/1.19.12/x64/src/runtime/asm_amd64.s:1594
1 init.go:85] 

You can resolve it with the method I’ve shared above. It took me several hours to fix this (a very big dataset). I tried every possible solution but couldn’t move past the faulty snapshot. It was completely stuck. So, I backed up the ‘p’ folder and then deleted ‘p’. I performed the stream as mentioned in the tip above. I removed the ‘/w’ and ‘/zw’ paths (but kept Zero running). For safety, I made another backup. Then I started the Alpha again. Success.
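
In command form, the recovery described above looked roughly like this (paths are illustrative; Zero stayed running the whole time):

mv p p_backup
badger stream --dir ./p_backup --out ./p
rm -fr w zw

Then start the Alpha again and let it rejoin the running Zero.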

I hope this helps.