Hey, I am testing the bulk loader of Dgraph v1.1 and I am running into two problems: data is being lost, and the Type System does not always work correctly.
I tested with reduce_shards set to 1 and 3.
With reduce_shards = 1, the Type System works correctly, but the resulting data size is obviously smaller than the original.
With reduce_shards = 3, the Type System does not work correctly, and the data size is the same (too small) as mentioned above.
I have no idea how to resolve this problem. Below are my test procedure and data.
I am uploading the file to Google Cloud and will provide the source link when the upload finishes.
UPDATE:
https://drive.google.com/open?id=1ndT1O1EllhL9FY814NCJwJgzWjdtC6zc
Dgraph version details:
[Decoder]: Using assembly version of decoder
Dgraph version : v1.1.0
Dgraph SHA-256 : 7d4294a80f74692695467e2cf17f74648c18087ed7057d798f40e1d3a31d2095
Commit SHA-1 : ef7cdb28
Commit timestamp : 2019-09-04 00:12:51 -0700
Branch : HEAD
Go version : go1.12.7
For Dgraph official documentation, visit https://docs.dgraph.io.
For discussions about Dgraph , visit http://discuss.hypermode.com.
To say hi to the community , visit https://dgraph.slack.com.
Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2018 Dgraph Labs, Inc.
When reduce_shards is 1:
a.schema:
a.rdf: 831 MB
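The exact contents of a.schema are not shown here. Judging from the queries below (eq(id, ...) needs an index on id, and type("Entity") needs a type definition), it would look roughly like the sketch that follows; the predicate types and index choices are my assumptions, not the actual file:

# Hypothetical sketch of what a.schema would need to contain; types and indexes are assumptions.
id: string @index(exact) .
name: [string] @index(term) .
desc: [string] .
Tiel: [uid] .

type Entity {
  id: string
  name: [string]
  desc: [string]
  Tiel: [uid]
}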
clear directory:
rm -rf /data/sdv2/dgraph/data/z && mkdir /data/sdv2/dgraph/data/z && rm -rf /data/sdv2/dgraph/data/0 && mkdir /data/sdv2/dgraph/data/0
start zero:
/data/sdv2/dgraph/opt/dgraph zero --idx 1 --replicas 1 --cwd /data/sdv2/dgraph/data/z --log_dir /data/sdv2/dgraph/data/z --my dl01:5080
do bulk load:
/data/sdv2/dgraph/opt/dgraph bulk \
  --files a.rdf \
  --schema a.schema \
  --format rdf \
  --map_shards 15 \
  --reducers 1 \
  --reduce_shards 1 \
  --num_go_routines 1 \
  --store_xids \
  --logtostderr \
  --v 10 \
  --log_dir log \
  --ignore_errors \
  --zero dl01:5080
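Since --ignore_errors makes the bulk loader skip quads it cannot parse, one check I would add here (my addition, not part of the original procedure) is comparing the number of quads in the source file against the quad/edge counts the loader reports in its progress output:

# Count the N-Quads in the source file (assuming one quad per line,
# with no blank or comment lines), then compare against the counts
# the bulk loader prints while running.
wc -l a.rdf

Re-running the bulk load once without --ignore_errors would also reveal whether any quads are being rejected at parse time.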
The output directory was created, but it is only 359 MB, much less than a.rdf (751 MB):
out/0/p: 359 MB
Copy p to the Alpha's working directory:
cp -r /data/sdv2/dgraph/home/out/0/p /data/sdv2/dgraph/data/0
start alpha:
/data/sdv2/dgraph/opt/dgraph alpha --idx 1 --lru_mb 2048 --zero dl01:5080 --port_offset 1 --cwd /data/sdv2/dgraph/data/0 --log_dir /data/sdv2/dgraph/data/0
start dgraph-ratel and test in the browser:
/data/sdv2/dgraph/opt/dgraph-ratel -addr dl01:5080
I know this data is in a.rdf:
_:Q103 <id> "Q103" .
_:Q103 <dgraph.type> "Entity" .
_:Q103 <name> "Supercalifragilisticexpialidocious" .
_:Q103 <name> "超級酷斃宇宙世界霹靂無敵棒" .
_:Q103 <desc> "song from the film and musical Mary Poppins" .
_:Q103 <Tiel> _:Q1860 .
_:Q1860 <id> "Q1860" .
_:Q1860 <dgraph.type> "Entity" .
_:Q1860 <name> "English" .
_:Q1860 <name> "英语" .
_:Q1860 <desc> "West Germanic language originating in England with linguistic roots in French, German and Vulgar Latin" .
_:Q1860 <desc> "起源於英格蘭的一種語言" .
_:Q1860 "English language" .
_:Q1860 "en" .
_:Q1860 "eng" .
_:Q1860 "英文" .
_:Q1860 "英語" .
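To narrow down whether the Q1860 node was dropped at load time or is simply not reachable through the eq(id, ...) index, something I would try (my addition, not from the original procedure) is comparing the number of distinct subjects in a.rdf with the number of nodes carrying an id predicate. The Alpha HTTP port below assumes the default 8080 plus the --port_offset 1 used above, and v1.1 expects a Content-Type header on /query:

# Distinct subjects in the source file.
awk '{print $1}' a.rdf | sort -u | wc -l

# Nodes with an id predicate, as seen by the running Alpha.
curl -s -H 'Content-Type: application/graphql+-' http://dl01:8081/query -d '
{
  q(func: has(id)) {
    count(uid)
  }
}'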
{
  # Type System works correctly
  q(func: type("Entity")) {
    count(uid)
  }
}
{
  "data": {
    "q": [
      {
        "count": 472755
      }
    ]
  }
}
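As a cross-check on that count (my addition; it assumes every Entity node has exactly one dgraph.type triple in a.rdf), the source file can be grepped directly:

# Number of dgraph.type "Entity" quads in the source file.
grep -c '<dgraph.type> "Entity"' a.rdf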
{
  # Type System works correctly
  q(func: eq(id, "Q103")) {
    expand(_all_)
  }
}
{
  # returned data
  "data": {
    "q": [
      {
        "desc": [
          "song from the film and musical Mary Poppins"
        ],
        "id": "Q103",
        "name": [
          "超級酷斃宇宙世界霹靂無敵棒",
          "Supercalifragilisticexpialidocious"
        ]
      }
    ]
  },
  …
}
{
  # returns nothing, but the Tiel edge is in a.rdf
  q(func: eq(id, "Q103")) {
    Tiel {
      name
    }
  }
}
{
  # returns nothing; the node "Q1860" was lost
  q(func: eq(id, "Q1860")) {
    id
    name
  }
}
When reduce_shards is 3:
clear directory:
rm -rf /data/sdv2/dgraph/data/z && mkdir /data/sdv2/dgraph/data/z && rm -rf /data/sdv2/dgraph/data/0 && mkdir /data/sdv2/dgraph/data/0 && rm -rf /data/sdv2/dgraph/data/1 && mkdir /data/sdv2/dgraph/data/1 && rm -rf /data/sdv2/dgraph/data/2 && mkdir /data/sdv2/dgraph/data/2
start zero:
/data/sdv2/dgraph/opt/dgraph zero --idx 1 --replicas 1 --cwd /data/sdv2/dgraph/data/z --log_dir /data/sdv2/dgraph/data/z --my dl01:5080
do bulk load:
/data/sdv2/dgraph/opt/dgraph bulk \
  --files a.rdf \
  --schema a.schema \
  --format rdf \
  --map_shards 15 \
  --reducers 3 \
  --reduce_shards 3 \
  --num_go_routines 3 \
  --store_xids \
  --logtostderr \
  --v 10 \
  --log_dir log \
  --ignore_errors \
  --zero dl01:5080
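With 3 reduce shards the predicates are split across three groups, so each Alpha only serves part of the data. One thing worth checking here (my addition; 6080 is Zero's default HTTP port) is Zero's /state endpoint, to confirm that all three groups registered and to see which group each tablet (id, name, desc, Tiel, dgraph.type) was assigned to:

# Inspect group membership and tablet assignment from Zero.
curl -s http://dl01:6080/state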
Three directories were created. du -sh out reports 360 MB, much less than a.rdf (751 MB):
out/0/p, out/1/p, out/2/p
Copy each p directory to the corresponding Alpha's working directory:
cp -r /data/sdv2/dgraph/home/out/0/p /data/sdv2/dgraph/data/0 && cp -r /data/sdv2/dgraph/home/out/1/p /data/sdv2/dgraph/data/1 && cp -r /data/sdv2/dgraph/home/out/2/p /data/sdv2/dgraph/data/2
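To see how the data ended up split across the three groups, the copied p directories can be compared (my addition):

# Size of each Alpha's p directory after the copy.
du -sh /data/sdv2/dgraph/data/0/p /data/sdv2/dgraph/data/1/p /data/sdv2/dgraph/data/2/p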
start alpha:
/data/sdv2/dgraph/opt/dgraph alpha --idx 1 --lru_mb 2048 --zero dl01:5080 --port_offset 1 --cwd /data/sdv2/dgraph/data/0 --log_dir /data/sdv2/dgraph/data/0
/data/sdv2/dgraph/opt/dgraph alpha --idx 2 --lru_mb 2048 --zero dl01:5080 --port_offset 2 --cwd /data/sdv2/dgraph/data/1 --log_dir /data/sdv2/dgraph/data/1
/data/sdv2/dgraph/opt/dgraph alpha --idx 3 --lru_mb 2048 --zero dl01:5080 --port_offset 3 --cwd /data/sdv2/dgraph/data/2 --log_dir /data/sdv2/dgraph/data/2
/data/sdv2/dgraph/opt/dgraph-ratel -addr dl01:5080
test in the browser at http://ip:8000/?local
The test queries are the same as in the reduce_shards = 1 case:
{
  # Type System does not work correctly
  q(func: type("Entity")) {
    count(uid)
  }
}
{
  "data": {
    "q": [
      {
        "count": 0
      }
    ]
  }
}
{
  # Type System does not work correctly
  q(func: eq(id, "Q103")) {
    expand(_all_)
  }
}
{
  # returns nothing, but node Q103 has dgraph.type "Entity" and the attributes id, name, desc
  "data": {
    "q": []
  },
  "extensions": …
}
{
  # returns nothing
  q(func: eq(id, "Q103")) {
    Tiel {
      name
    }
  }
}
{
  # returns nothing; the node "Q1860" was lost
  q(func: eq(id, "Q1860")) {
    id
    name
  }
}
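Finally, to rule out Ratel itself, the same type query can be sent directly to one Alpha's HTTP endpoint (my addition; the port assumes the default 8080 plus --port_offset 1, and v1.1 expects a Content-Type header on /query):

# Run the failing type query against the first Alpha directly.
curl -s -H 'Content-Type: application/graphql+-' http://dl01:8081/query -d '
{
  q(func: type("Entity")) {
    count(uid)
  }
}'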