Bulk uploader not making equal shards

As the docs say:

> Using this compression setting (Snappy) provides a good compromise between the need for a high compression ratio and efficient CPU usage.

I think the right settings depend on the performance of your machine, the size of the dataset, and which indexes are used. There are no definitive numbers; you have to run the import several times and build up your own experience.
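One rough way to build that experience is to wrap each trial run in GNU time and compare wall-clock time and peak memory as you change one setting at a time. This is just a sketch; the file names are placeholders for your own dataset and schema:

```sh
# Time one trial import. With GNU time, -v prints elapsed wall time and
# "Maximum resident set size", which is what you want to compare across runs.
# data.rdf.gz and data.schema are placeholder names -- substitute your own files.
/usr/bin/time -v dgraph bulk -f data.rdf.gz -s data.schema
```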

I can’t find any documentation for --num_go_routines. As I remember, increasing it raises both import speed and memory consumption.
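For reference, here is a minimal sketch of how the flag might be set on a bulk load. The file names and the shard count are illustrative only, so verify the exact flags against `dgraph bulk --help` for your version:

```sh
# Hypothetical bulk-load invocation; data.rdf.gz and data.schema are
# placeholder file names. Raising --num_go_routines tends to speed up the
# import at the cost of higher memory usage, so tune it to your machine.
dgraph bulk \
  -f data.rdf.gz \
  -s data.schema \
  --num_go_routines=4 \
  --reduce_shards=2
```

Start with a small value, watch memory during the run, and raise it only while the import keeps getting faster without exhausting RAM.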