That node seems to be back to getting a constant stream of snapshots.
I0318 16:32:45.590937 19 draft.go:1169] ---> SNAPSHOT: {Context:id:1 group:1 addr:"ADDR_REDACTED:7080" Index:97963566 ReadTs:98317853 Done:false SinceTs:98269910 XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}. Group 1 from node id 0x1
I0318 16:32:45.591033 19 draft.go:180] Operation started with id: opSnapshot
I0318 16:32:47.537808 19 log.go:34] [2] [E] LOG Compact 5->6 (1, 1 -> 1 tables with 1 splits). [2370752 . 2370797 .] -> [2370804 .], took 2.266s
I0318 16:32:47.665330 19 log.go:34] [1] [E] LOG Compact 4->5 (1, 1 -> 1 tables with 1 splits). [2370759 . 2370722 .] -> [2370803 .], took 7.502s
I0318 16:32:49.909059 19 log.go:34] [2] [E] LOG Compact 5->6 (1, 1 -> 1 tables with 1 splits). [2370745 . 2370804 .] -> [2370812 .], took 2.371s
I0318 16:32:52.123233 19 log.go:34] [2] [E] LOG Compact 5->6 (1, 1 -> 1 tables with 1 splits). [2370602 . 2370812 .] -> [2370821 .], took 2.214s
I0318 16:32:52.681479 19 log.go:34] [1] [E] LOG Compact 5->6 (1, 1 -> 2 tables with 1 splits). [2364164 . 2349888 .] -> [2370823 2370824 .], took 2.139s
--
I0318 16:46:05.381449 19 log.go:34] [1] [E] LOG Compact 5->6 (1, 4 -> 4 tables with 2 splits). [2379826 . 2381349 2381198 2320302 2318146 .] -> [2381405 2381409 2381406 2381407 .], took 2.191s
I0318 16:46:07.902136 19 log.go:34] [3] [E] LOG Compact 4->5 (1, 1 -> 1 tables with 1 splits). [2381393 . 2381372 .] -> [2381445 .], took 4.263s
I0318 16:46:08.312027 19 log.go:34] [2] [E] LOG Compact 2->3 (1, 5 -> 6 tables with 2 splits). [2381408 . 2381425 2381367 2381369 2381365 2381368 .] -> [2381430 2381434 2381436 2381431 2381438 2381446 .], took 2.219s
I0318 16:46:10.033231 19 log.go:34] [0] [E] LOG Compact 5->6 (1, 11 -> 6 tables with 4 splits). [2375833 . 2368990 2368991 2320314 2318150 2297071 2285123 2285096 2278902 2236704 2197393 2378678 .] -> [2381455 2381456 2381441 2381440 2381453 2381454 .], took 3.23s
I0318 16:46:11.125481 19 log.go:34] [2] [E] LOG Compact 3->4 (1, 0 -> 1 tables with 1 splits). [2381446 . .] -> [2381471 .], took 2.813s
I0318 16:46:12.138428 19 snapshot.go:119] Snapshot writes DONE. Sending ACK
I0318 16:46:12.138550 19 snapshot.go:126] Populated snapshot with data size: 22 GiB
I0318 16:46:12.155827 19 draft.go:1175] ---> Retrieve snapshot: OK.
I0318 16:46:12.155874 19 draft.go:1181] ---> SNAPSHOT: {Context:id:1 group:1 addr:"ADDR_REDACTED:7080" Index:97963566 ReadTs:98317853 Done:false SinceTs:98269910 XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}. Group 1. DONE.
I0318 16:46:12.155978 19 draft.go:124] Operation completed with id: opSnapshot
W0318 16:46:12.156021 19 draft.go:1313] Raft.Ready took too long to process: Timer Total: 13m26.565s. Breakdown: [{disk 13m26.565s} {proposals 0s} {advance 0s}] Num entries: 0. MustSync: false
I0318 16:46:12.931363 19 log.go:34] [0] [E] LOG Compact 0->2 (5, 0 -> 5 tables with 1 splits). [2381396 2381404 2381428 2381444 2381452 . .] -> [2381459 2381461 2381464 2381470 2381481 .], took 2.898s
I0318 16:46:13.229421 19 log.go:34] 2 [commit: 97963566, lastindex: 97963566, lastterm: 103] starts to restore snapshot [index: 97995088, term: 103]
I0318 16:46:13.229447 19 log.go:34] log [committed=97963566, applied=97963566, unstable.offset=97963567, len(unstable.Entries)=0] starts to restore snapshot [index: 97995088, term: 103]
I0318 16:46:13.229470 19 log.go:34] 2 restored progress of 1 [next = 97995089, match = 0, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
I0318 16:46:13.229479 19 log.go:34] 2 restored progress of 2 [next = 97995089, match = 97995088, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
I0318 16:46:13.229484 19 log.go:34] 2 restored progress of 3 [next = 97995089, match = 0, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
I0318 16:46:13.229491 19 log.go:34] 2 [commit: 97995088] restored snapshot [index: 97995088, term: 103]
I0318 16:46:13.229522 19 draft.go:1151] Drain applyCh by reaching 0 before retrieving snapshot
I0318 16:46:13.229527 19 draft.go:1048] Drained 0 proposals
Is there anything else I can do here, or is that peer in a bad state indefinitely? If I were on hardware I would remove the node and add it back with a new raft idx… but in Kubernetes the StatefulSet ordinal is used as the raft idx, and that would cause real headaches if I removed one and its idx were blacklisted… (rough sketch of what I think that path looks like below).
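For reference, this is roughly what the "remove and re-add" route would look like if I end up going that way. Just a sketch, not something I've run: it assumes Zero's HTTP admin port (6080) is reachable in-cluster, and the service name and raft idx below are guesses you'd need to adjust for your own setup.

```python
import urllib.request

ZERO_ADMIN = "http://dgraph-zero-0.dgraph-zero:6080"  # hypothetical in-cluster address; adjust for your chart
STUCK_ALPHA_RAFT_IDX = 2   # the peer restoring the snapshot above appears to be raft idx 2
GROUP = 1

# Step 1: ask Zero to remove the stuck Alpha from its group.
# Zero blacklists the removed raft idx, so the replacement has to join with a new one.
url = f"{ZERO_ADMIN}/removeNode?group={GROUP}&id={STUCK_ALPHA_RAFT_IDX}"
with urllib.request.urlopen(url) as resp:
    print(resp.read().decode())

# Step 2 (manual): delete that pod's PVC (or wipe its p/ and w/ dirs) and restart it,
# so the Alpha rejoins clean -- which is exactly the awkward part when the StatefulSet
# ordinal is what determines the raft idx it comes back with.
```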