Which batch upsert method is best?

I want to upsert data in batches and have currently implemented two methods:

There are 100,000 nodes and 240,000 edge relationships.

Machine: one 12-core CPU, 16 GB memory, running 1 Alpha and 1 Zero.

  1. First query all existing data, both nodes and edges. This reads 340,000 items and takes 2.5 minutes. It is implemented by assembling named query blocks such as query1…query100000. Based on the latest data I then decide what to keep: old relationships are deleted and new ones added, node content is compared to decide whether each node needs updating, and finally the edge deletions and additions are applied, about 400,000 statements in total. (Overall this takes 7.5 minutes, while a live import takes 5 minutes. If old data exists, it takes 2.5 minutes to query everything plus 0–5 minutes to create or delete.)

  2. Directly use upsert statements (query1…query100000 with many mutations) to upload and update all node content and edges, then return all edges, compare them, and delete the old ones.

Questions:

  1. Which method has the best performance?

  2. In the first method, which queries the full data set, is it possible to query in parallel? When I query in parallel I run out of memory, so I can only fetch 1,000 nodes and their relationships per query.

  3. Is there room for optimization in query1…query100000?

pprof.dgraph.samples.cpu.007.pb (505.9 KB)

Here is the pprof file for the batch query.

Query example:

```
{
  query13(func: eq(subnet.default.id, 1)) {
    uid
    dgraph.type
    expand(_all_) {
      uid
      dgraph.type
    }
  }
}
```

I removed the trace code (`span.Annotatef`) that had been added to the query code; now a batch query takes 80 s instead of the earlier 330 s.

This does not include any optimization for the context issue you mentioned.