About that, I had not observed the question well. But the answer for that is simple.
When you make the program in py you need to create a specific logic for batches (you can check the Live loader logic as inspiration). You have to split your millions of records into smaller pieces. Something like 1k triples per transaction is ideal.
You can also create logic that balances mutations between each node in your cluster. This would help lessen the stress of the cluster as a whole.