The blog post “Introducing Badger: A fast key-value store written purely in Go” describes why a completely new key-value store was written for Dgraph, one better suited to SSDs that stores values separately from keys (the WiscKey design).
As I understand it, a key lookup in Badger has logarithmic time cost.
Neo4j claims its Node Store allows speedy lookup by ID, by calculating the record's offset within the store file.
Wouldn't it be better to build a storage layer for Dgraph similar to Neo4j's, with constant-time access?
Is it possible to use an approach similar to Neo4j's in a distributed data store?
Could you point us to the Neo4j documentation?
We keep optimizing Badger wherever we see the opportunity. In general, I think Badger is highly optimized and works really well for Dgraph's use case, providing efficient lookups for the data and indexes we need.
Records are the format in which Neo4j represents nodes and relationships on disk. A node record is always 14 bytes, fixed size, and points to the node's first relationship and first property.
How is a node record implemented?
The node record lives on disk. It is loaded by the NodeStore and represented as a NodeRecord instance in Neo4j. These NodeRecords are then used to load information about the node into a NodeImpl object.
Why is a Node Record relevant to Neo4j?
Fixed-size blocks allow direct, fast access by internal id: for example, record #1000 is found at position 14,000 (1000 × 14). Whole regions of the store files are mapped into memory; the operating system makes portions of a file available in memory and takes care of syncing to disk, so node records can be accessed even faster. The node record is the database structure (and starting point) for the graph element of a node.
Neo4j’s storage is organized into record-based files, one per data structure – nodes, relationships, properties, labels, and so on. Each node and relationship record is directly addressable by its id.
The approach to data storage chosen in Neo4j has one very useful consequence: since all records are strictly the same size, accessing a record by identifier is cheap, because it requires no associative mapping from identifiers to record locations (hash table, tree, or the like); identifiers simply act as indexes into “arrays” of records.