Critical bug in v21.12 permanently crashloops whole groups

Just an update here: the problems with v21.12.0 were too much to handle, and I had to roll back to v21.03.2 plus the manifest corruption patch. We found divergent data on peers in the same group and had to call it quits after that.

  • This new corruption/panic happens somewhere around the oracle's tracking of pending mutations. I feel like I am really close with my research in this comment, but I need someone with more knowledge of this part of the codebase to validate it.
  • The query performance of v21.12.0 is fantastic unless you are inserting data, in which case it is incredibly slow. Rolling back to v21.03 has fixed this: query times during ingestion are back to ~150 ms at the 50th percentile, compared with 2 to 4 s on v21.12.0. See V21.12 slow queries
  • Please release a v21.03.4 with badger fixes if you can.
  • I am so glad backup+restore was open sourced. I really want to use it on v21.03 until v21.12 is fixed… Maybe switch it over to OSS in a v21.03.4 release?