Loading Wikidata into dgraph

Has anybody tried loading the Wikidata RDF dumps into dgraph?

The files are free to download here.

The format is an RDF-compatible serialization called Turtle (TTL).

However, when I run dgraphloader on it, I hit the following error:

root@261358fbb2df:/dgraph# dgraphloader -r wikidata-20170508-all-BETA.ttl.gz

Dgraph version   : v0.7.6
Commit SHA-1     : 5f7eb75
Commit timestamp : 2017-05-01 14:19:52 +1000
Branch           : release/v0.7.6


Processing wikidata-20170508-all-BETA.ttl.gz
2017/05/10 19:27:46 main.go:135: Error while parsing RDF: Invalid input: @ at lexText, on line:1 @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

For those curious, the first lines of the ttl file look like this:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix wikibase: <http://wikiba.se/ontology-beta#> .
@prefix wdata: <https://www.wikidata.org/wiki/Special:EntityData/> .
@prefix wd: <http://www.wikidata.org/entity/> .
@prefix wds: <http://www.wikidata.org/entity/statement/> .
@prefix wdref: <http://www.wikidata.org/reference/> .
@prefix wdv: <http://www.wikidata.org/value/> .
@prefix wdt: <http://www.wikidata.org/prop/direct/> .
@prefix p: <http://www.wikidata.org/prop/> .
@prefix ps: <http://www.wikidata.org/prop/statement/> .
@prefix psv: <http://www.wikidata.org/prop/statement/value/> .
@prefix psn: <http://www.wikidata.org/prop/statement/value-normalized/> .
@prefix pq: <http://www.wikidata.org/prop/qualifier/> .
@prefix pqv: <http://www.wikidata.org/prop/qualifier/value/> .
@prefix pqn: <http://www.wikidata.org/prop/qualifier/value-normalized/> .
@prefix pr: <http://www.wikidata.org/prop/reference/> .
@prefix prv: <http://www.wikidata.org/prop/reference/value/> .
@prefix prn: <http://www.wikidata.org/prop/reference/value-normalized/> .
@prefix wdno: <http://www.wikidata.org/prop/novalue/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix schema: <http://schema.org/> .
@prefix cc: <http://creativecommons.org/ns#> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix prov: <http://www.w3.org/ns/prov#> .

wikibase:Dump a schema:Dataset,
                owl:Ontology ;
        cc:license <http://creativecommons.org/publicdomain/zero/1.0/> ;
        schema:softwareVersion "0.0.5" ;
        schema:dateModified "2017-05-08T23:00:01Z"^^xsd:dateTime ;
        owl:imports <http://wikiba.se/ontology-1.0.owl> .

wdata:Q22 a schema:Dataset ;
        schema:about wd:Q22 ;
        schema:version "480270117"^^xsd:integer ;
        schema:dateModified "2017-04-30T16:35:36Z"^^xsd:dateTime ;
        wikibase:sitelinks "223"^^xsd:integer ;
        wikibase:statements "89"^^xsd:integer ;
        wikibase:identifiers "22"^^xsd:integer .

wd:Q22 a wikibase:Item ;
        rdfs:label "Scotland"@en-gb ;
        skos:prefLabel "Scotland"@en-gb ;
        schema:name "Scotland"@en-gb ;
        rdfs:label "Scotland"@en ;
        skos:prefLabel "Scotland"@en ;
        schema:name "Scotland"@en ;
        rdfs:label "Écosse"@fr ;
        skos:prefLabel "Écosse"@fr ;
        schema:name "Écosse"@fr ;
        rdfs:label "Scozia"@it ;
        skos:prefLabel "Scozia"@it ;
        schema:name "Scozia"@it ;

Can you convert the file to RDF N-Quad format before loading it? There should be online converters from TTL to RDF.

The only problem is that the unzipped TTL file is over 50 GB.

It would suck to do the whole conversion only to find out it still won’t work.

I’m wondering whether anybody has had any success with this so far.

You should be able to find programs that can convert the file while it is still gzipped and write gzipped output.

I will attempt that.

In the meantime, it does appear that loading TTL is supported by dgraph in general, for example:

https://github.com/dgraph-io/dgraph/issues/550

So I wonder what is going wrong with this one.

We don’t support TTL directly. We support the RDF N-Quad format; the two are similar but not the same.
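To illustrate the difference: an N-Quad (or N-Triples) line spells every IRI out in full and ends with ` .`, whereas Turtle starts with `@prefix` directives (exactly where dgraphloader stopped, on line 1) and then uses short names such as `wd:Q22`. The hypothetical helpers below sketch only the prefix-expansion part in Go. They handle a single triple whose three terms are already separated; a real converter must also handle the `;` and `,` abbreviations, multi-line statements, blank nodes and escapes, which is why a proper Turtle parser or converter is the safer route for the full dump.

```go
package main

import (
	"fmt"
	"strings"
)

const rdfType = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

// parsePrefix reads a one-line Turtle directive such as
// "@prefix wd: <http://www.wikidata.org/entity/> ." and returns the prefix
// name (without the colon) and its namespace IRI.
func parsePrefix(line string) (name, iri string, ok bool) {
	f := strings.Fields(line)
	if len(f) != 4 || f[0] != "@prefix" || f[3] != "." ||
		!strings.HasSuffix(f[1], ":") ||
		!strings.HasPrefix(f[2], "<") || !strings.HasSuffix(f[2], ">") {
		return "", "", false
	}
	iri = strings.TrimSuffix(strings.TrimPrefix(f[2], "<"), ">")
	return strings.TrimSuffix(f[1], ":"), iri, true
}

// expandTerm rewrites a prefixed name such as wd:Q22 as a full IRI
// <http://www.wikidata.org/entity/Q22>. The Turtle keyword "a" becomes
// rdf:type. Full IRIs and names with an unknown prefix are returned as-is.
func expandTerm(term string, prefixes map[string]string) string {
	if term == "a" {
		return rdfType
	}
	if strings.HasPrefix(term, "<") {
		return term
	}
	name, local, found := strings.Cut(term, ":")
	if !found {
		return term
	}
	ns, known := prefixes[name]
	if !known {
		return term
	}
	return "<" + ns + local + ">"
}

// toNTriple writes one fully spelled-out triple as an N-Triples line.
// Literals pass through, except that a prefixed datatype after ^^
// (e.g. "89"^^xsd:integer) is expanded as well.
func toNTriple(s, p, o string, prefixes map[string]string) string {
	if strings.HasPrefix(o, `"`) {
		if i := strings.LastIndex(o, `"^^`); i >= 0 {
			o = o[:i+3] + expandTerm(o[i+3:], prefixes)
		}
	} else {
		o = expandTerm(o, prefixes)
	}
	return fmt.Sprintf("%s %s %s .", expandTerm(s, prefixes), expandTerm(p, prefixes), o)
}
```

With the `wd` and `wikibase` prefixes from the dump header loaded via `parsePrefix`, `toNTriple("wd:Q22", "a", "wikibase:Item", prefixes)` yields `<http://www.wikidata.org/entity/Q22> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://wikiba.se/ontology-beta#Item> .`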

I have loaded a copy of Wikidata into my dgraph instance, and I looked at two different options. The first is to download RDF files that are already in the right N-Quad format; there are two places to get them:

  1. http://tools.wmflabs.org/wikidata-exports/rdf/exports.html
  2. Index of /wikidatawiki/entities/ (look for the files ending in .nt.gz or .nt.bz2)
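Because N-Triples puts exactly one statement on each line, files like these can be processed line by line, for example to keep only the predicates you actually need before loading. A minimal Go sketch, assuming the terms are separated by single spaces and each line ends with ` .`; the function names are hypothetical, and the filtering step is optional rather than something described in this thread:

```go
package main

import "strings"

// splitNTriple splits one N-Triples line into subject, predicate and object.
// IRIs and blank-node labels cannot contain spaces, so everything after the
// second space (minus the trailing " .") is the object, even a literal
// that contains spaces. Assumes single-space separators.
func splitNTriple(line string) (s, p, o string, ok bool) {
	line = strings.TrimSpace(line)
	if !strings.HasSuffix(line, " .") {
		return "", "", "", false
	}
	parts := strings.SplitN(strings.TrimSuffix(line, " ."), " ", 3)
	if len(parts) != 3 {
		return "", "", "", false
	}
	return parts[0], parts[1], parts[2], true
}

// keepPredicates returns a filter that accepts only well-formed triples
// whose predicate IRI is in the allow list.
func keepPredicates(allow ...string) func(string) bool {
	set := make(map[string]bool, len(allow))
	for _, p := range allow {
		set[p] = true
	}
	return func(line string) bool {
		_, p, _, ok := splitNTriple(line)
		return ok && set[p]
	}
}
```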

The option I ended up taking was to download the JSON dump from the second link above, parse it, and then ingest the entities with the Go client library.

@zb1 I have used the following converter: http://www.easyrdf.org/converter

How were you able to convert the JSON output to RDF?
