Highlights
- JanusGraph is not a native graph database.
- JanusGraph is not self-contained and relies on third-party solutions such as different (mostly NoSQL) storage backends.
- If JanusGraph is used with Cassandra or HBase, it is a distributed database, but it does not have ACID transactions.
- If JanusGraph is used with BerkeleyDB, it has ACID transactions, but it is not distributed.
Native GraphQL support
- Dgraph: Yes. The only database in this comparison to natively support GraphQL, which lets it process GraphQL queries in parallel with high performance.
- JanusGraph: No. JanusGraph's query language is Gremlin. Reference
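To make the native GraphQL point concrete, here is a sketch of a query as Dgraph would serve it. The `Author`/`Post` types and the `queryAuthor` field are assumptions for illustration (Dgraph auto-generates a `query<Type>` field for each type in the deployed schema):

```graphql
# Hypothetical schema deployed to Dgraph:
#   type Author { name: String! @search(by: [hash])  posts: [Post] }
#   type Post   { title: String! }
query {
  queryAuthor(filter: { name: { eq: "Jane" } }) {
    name
    posts {
      title
    }
  }
}
```

Because the query is GraphQL end to end, no translation layer into another graph language is needed before execution.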
Distributed Graph database
- Dgraph : Distributed with the ability to use the same query everywhere as if querying a single database
- JanusGraph: JanusGraph is distributed only with Apache Cassandra and Apache HBase; BerkeleyDB JE is a non-distributed database. HBase gives preference to consistency at the expense of yield, and Cassandra gives preference to availability at the expense of harvest. Reference
Distributed ACID Transactions
- Dgraph:
- Supported and Jepsen-tested
- Synchronous replication with immediate consistency, meaning any client can read the latest write.
- Open Source
- Reference
- JanusGraph: JanusGraph transactions are not necessarily ACID. They can be so configured on BerkeleyDB, but they are not generally so on Cassandra or HBase, where the underlying storage system does not provide serializable isolation or multi-row atomic writes and the cost of simulating those properties would be substantial. Reference
Sharding
- Dgraph:
- Predicate-based sharding. Avoids the N+1 problem and network broadcasts when running a query in high-fanout scenarios. This ensures low-latency query execution, irrespective of the size of the cluster or the number of intermediate results. Reference
- Consistent production-level latencies and consistent queries. Reference
- Automatic sharding
- Sharding a single predicate on the roadmap
- JanusGraph:
- When JanusGraph is deployed on a cluster of multiple storage backend instances, the graph is partitioned across those machines. By default, JanusGraph uses a random partitioning strategy that randomly assigns vertices to machines.
- When the graph is small or accommodated by a few storage instances, it is best to use random partitioning for its simplicity. As a rule of thumb, one should strongly consider enabling explicit graph partitioning and configure a suitable partitioning heuristic when the graph grows into the 10s of billions of edges.
- Reference
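For orientation, explicit partitioning in JanusGraph is switched on through the graph configuration rather than at query time. A minimal sketch of a `janusgraph.properties` fragment, assuming the CQL (Cassandra) backend; the option names follow the JanusGraph configuration reference as best recalled, and the values are placeholders that must be sized for the target cluster, so verify both against the docs for your version:

```properties
# Storage backend that supports explicit graph partitioning
storage.backend=cql
storage.hostname=127.0.0.1

# Number of virtual partition blocks the graph is split into
# (placeholder value; tune to the cluster size)
cluster.max-partitions=32

# ID placement strategy used to assign vertices to partitions
ids.placement=simple
```

With the default random strategy none of this is needed, which is why the docs recommend it for smaller graphs.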
Consistent Replication
- Dgraph : Synchronous replication across all replicas
- JanusGraph:
- Only HBase has native support for strong consistency at the row level (Reference). Even so, the JanusGraph documentation explains the use of locks for data consistency on HBase here.
- Cassandra has specific configurations for replication. In general, higher consistency levels are more consistent and robust but have higher latency. Reference
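To make the Cassandra side concrete, replication is configured per keyspace. A sketch in CQL, assuming a keyspace named `janusgraph` and a single data center named `dc1` (both names are placeholders):

```sql
-- Keep three replicas of every row in data center dc1.
-- Higher replication factors are more robust but add write latency.
CREATE KEYSPACE janusgraph
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};
```

The consistency level a client then requests per operation (e.g. ONE vs. QUORUM) determines how many of those replicas must acknowledge a read or write.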
Linearizable Reads
- Dgraph : Strong (sequential) consistency across clients Reference
- JanusGraph:
- Apache Cassandra and Apache HBase are both eventually consistent storage backends, which means JanusGraph must obtain locks in order to ensure consistency. Because of the additional steps required to acquire a lock when committing a modifying transaction, locking is a fairly expensive way to ensure consistency and can lead to deadlock when many concurrent transactions try to modify the same elements in the graph. Reference
- JanusGraph first persists all graph mutations to the storage backend. If the primary persistence into the storage backend succeeds but secondary persistence into the indexing backends or the logging system fails, the transaction is still considered successful because the storage backend is the authoritative source of the graph. This can create inconsistencies with the indexes and logs. To automatically repair such inconsistencies, JanusGraph can maintain a transaction write-ahead log, which is enabled through the configuration. Reference
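The transaction write-ahead log mentioned above is a configuration switch rather than a default. A minimal sketch of the relevant `janusgraph.properties` fragment; the option name follows the JanusGraph configuration reference as best recalled, so verify it for your version:

```properties
# Record mutations in a write-ahead transaction log so that
# index/log inconsistencies can be repaired after a failure
tx.log-tx = true
```

A separate recovery process then replays the log to repair any secondary persistence that failed.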
Correctness and durability testing
- Dgraph: Jepsen-tested
- JanusGraph: It is not Jepsen-tested.
High availability
- Dgraph:
- Yes, HA Cluster Setup is explained here
- HA Cluster setup is available in Community Edition.
- JanusGraph:
- High availability depends on the backend configuration. Both HBase and Cassandra can be highly available.
- If an instance fails, i.e. is not properly shut down, JanusGraph still considers it to be active and expects its participation in cluster-wide operations, which subsequently fail because this instance did not participate in or acknowledge the operation. In this case, the user must manually remove the failed instance record from the cluster and then retry the operation. Reference
Transparent data encryption
- Dgraph: Yes, database files are encrypted at rest with a user-specified key
- JanusGraph: This depends on the backend storage system. HBase and Oracle Berkeley DB have encryption-at-rest options, although it is not documented how they can be used with JanusGraph. References for HBase and for Berkeley DB.
Query languages
- Dgraph:
- GraphQL
- GraphQL± (a variation of GraphQL that supports advanced features)
- JanusGraph: Gremlin Query Language
Management of runaway queries
- Dgraph:
- Context cancellation that works across clients and servers: a cancellation at the client level automatically cancels the query on all involved servers.
- OpenCensus integration, which allows distributed tracing all the way from the app to the Dgraph cluster and back.
- Open standards for query context cancellation and tracking
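The client-side deadline idea can be sketched generically in Python. This is not Dgraph client code, just an illustration of a caller abandoning a slow query once its deadline expires; in a real Dgraph client the deadline is attached to the gRPC context, so the servers cancel their work too:

```python
import concurrent.futures
import time

def slow_query():
    """Stand-in for a long-running query executing on remote servers."""
    time.sleep(1.0)
    return "rows"

cancelled = False
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_query)
    try:
        # Client-side deadline: give up after 0.2 s. With a real
        # context/deadline, cancellation would propagate server-side.
        future.result(timeout=0.2)
    except concurrent.futures.TimeoutError:
        cancelled = True

print("query cancelled:", cancelled)  # → query cancelled: True
```

Without server-side propagation (as in this local sketch), the abandoned work keeps running; context cancellation's value is precisely that the servers stop too.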
- JanusGraph:
- There is nothing in Gremlin Server that will list running queries. As for cancellation, according to standard TinkerPop semantics a Traversal should respect a request for interruption on a thread. These semantics are enforced by the TinkerPop process test suite. That said, it is still up to the graph provider to properly allow for that behavior. Reference
Backups
- Dgraph:
- Binary format
- Both full and incremental backups to local files, S3, and Google Cloud Storage (via MinIO)
- Live backups with no downtime
- Reference
- JanusGraph:
- JanusGraph acts as an abstraction layer on top of the storage backends and defers to them for administrative best practices. As a result, there is a lack of centralized documentation on backend administrative tasks. Reference
- Cassandra offers snapshot, incremental, and commit-log backups. Reference
- HBase backup offerings are summarized here.
Pricing and Free trial
- Dgraph:
- The open source version is under Apache 2.0, so it is free to use and modify.
- Enterprise pricing is based on the number of Dgraph instances, not on cores, RAM, disk, etc.
- JanusGraph:
- Open Source under the Apache 2 license
Appropriate as primary database to build apps/data platform on
- Dgraph : Dgraph is a general-purpose database with a graph backend.
- JanusGraph: The use case is determined by the storage backend; JanusGraph is a graph engine, not a graph database.
Open source
- Dgraph:
- Yes, Apache 2.0. GitHub
- Enterprise features are NOT Apache 2.0, but users can still read the source.
- The Dgraph open source version and enterprise version provide the same performance; they differ only in that the enterprise version has more features.
- Dgraph supports many open standards, like gRPC, Protocol Buffers, Go contexts, and OpenCensus integration for distributed tracing.
- JanusGraph : Open Source under the Apache 2 license
Protocols
- Dgraph:
- HTTP/HTTPS
- gRPC
- Protocol Buffers
- JanusGraph:
- HTTP/HTTPS
- WebSockets
- Reference
Point in time recovery
- Dgraph: On the roadmap
- JanusGraph: JanusGraph does not provide point in time recovery. It can be configured to keep a write-ahead log. Reference
Multi-region deployments
- Dgraph : Yes
- JanusGraph: Depends on the storage system used. Cassandra has multi-region deployment. Reference
SQL migration tool
- Dgraph : Yes
- JanusGraph: No. There are some suggestions on resources for doing this with Cassandra here.
Authentication and authorization
- Dgraph :
- JanusGraph: HTTP Basic authentication and authentication over websocket. Reference
Drivers
- Dgraph:
- Dgraph's drivers use gRPC, not REST.
- Any GraphQL-compatible client can be used.
- Dgraph's supported drivers cover the same languages as Neo4j's supported drivers: Java, JavaScript, Go, Python, and .NET.
- Dgraph's unofficial drivers are: Rust, Dart, Elixir.
- Reference
- JanusGraph:
- A list of TinkerPop drivers is available on TinkerPop's homepage.
- In addition to drivers, there are query languages for TinkerPop that make it easier to use Gremlin from different programming languages like Java, Python, or C#.
Multi-database features
- Dgraph: Multi-Tenancy on the roadmap
- JanusGraph: Edge Label Multiplicity
Graph Database As A Service (DBaaS)
- Dgraph: Hosted solution launching in mid-year 2020
- JanusGraph: No
Query execution plans
- Dgraph: Query planning on the roadmap
- JanusGraph: With JanusGraphManager, you can define a property in your configuration that defines how to access a graph.
Support for graph algorithms
- Dgraph:
- Shortest k-paths
- Edge traversal limit to determine cycles in graphs
- Others requested from community listed here
- JanusGraph: The JanusGraph documentation does not cover graph algorithms, but one could follow this Gremlin recipe for shortest path, for example.
Apache Spark integration
- Dgraph: No
- JanusGraph: Users can leverage Apache Hadoop and Apache Spark to configure JanusGraph for distributed graph processing. Reference
Kafka integration
- Dgraph: On the roadmap
- JanusGraph: There are no official plugins, but there are some integrations done by the community. Here is an example with HBase.
Import/export
- Dgraph:
- Using the Bulk Loader or Live Loader, Dgraph can read the data as-is with no modification needed.
- Supported data formats are JSON and RDF.
- Exporting the database is explained here.
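A small sketch of what the RDF input looks like: the helper below renders hypothetical records as N-Quad lines of the kind Dgraph's live/bulk loaders accept (the `name` predicate and the records are made up for the example):

```python
def to_nquads(rows):
    """Render (blank-node id, name) pairs as RDF N-Quad lines."""
    return "\n".join(f'_:{uid} <name> "{name}" .' for uid, name in rows)

# Hypothetical records to load
people = [("alice", "Alice"), ("bob", "Bob")]
rdf = to_nquads(people)
print(rdf)
# → _:alice <name> "Alice" .
#   _:bob <name> "Bob" .
```

The blank-node labels (`_:alice`) let the loader assign real UIDs while keeping references between lines consistent.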
- JanusGraph:
- Export: GraphML or GraphSON, Gremlin I/O library
- Import: Bulk loading, Gremlin I/O library