TL;DR: This blog post features a small demo project on GitHub, neo4j-uuid, and explains how to automatically assign UUIDs to nodes and relationships in Neo4j. A very brief introduction to Neo4j 1.9’s KernelExtensionFactory is included as well.
a little rant on Neo4j node/relationship ids
In a lot of use cases there is demand for storing a reference to a Neo4j node or relationship in a third party system. The first naive idea probably is to use the internal node/relationship id that Neo4j provides. Do not do that! Never!
You ask why? Well, Neo4j’s id is basically an offset in one of the store files Neo4j uses (with some math involved). Assume you delete a couple of nodes. This produces holes in the store files that Neo4j might reclaim when creating new nodes later on. And since the id is a file offset, there is a chance that a new node will get exactly the same id as a previously deleted node. If you don’t synchronously update all node id references stored elsewhere, you’re in trouble. If Neo4j were redeveloped from scratch, the getId() method would not be part of the public API.
As long as you use node ids only within, e.g., a single request of an application, there’s nothing wrong. To repeat myself: Never ever store a node id in a third party system. I have officially warned you.
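To make the id reuse problem concrete, here is a toy model of a store that recycles freed slots. It is not Neo4j’s store code; the class and method names are made up for illustration, and the real id allocation is more involved.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of a record store that recycles freed slots.
// NOT Neo4j code: class and method names are hypothetical.
class ToyIdAllocator {
    private final Deque<Long> freed = new ArrayDeque<>();
    private long nextUnused = 0;

    // Hand out a recycled id if one is available, otherwise a fresh one.
    long allocate() {
        return freed.isEmpty() ? nextUnused++ : freed.pop();
    }

    // Deleting a record makes its id available again.
    void release(long id) {
        freed.push(id);
    }

    public static void main(String[] args) {
        ToyIdAllocator ids = new ToyIdAllocator();
        long alice = ids.allocate();
        long bob = ids.allocate();
        ids.release(alice);          // "delete" alice
        long carol = ids.allocate(); // carol now gets alice's old id
        System.out.println("alice=" + alice + ", bob=" + bob + ", carol=" + carol);
    }
}
```

An external system that still holds alice’s id would now silently point at carol.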
UUIDs
Enough of ranting, let’s see what we can do to safely store node references in an external system. Basically we need an identifier that, in contrast to the node id, carries no semantics. A common approach to this is using Universally Unique Identifiers (UUIDs). The JDK offers a UUID implementation, so we could potentially use UUID.randomUUID(). Unfortunately random UUIDs are slow to generate. A preferred approach is to use the machine’s MAC address and a timestamp as the base for the UUID; this should provide enough uniqueness. There is a nice library out there at http://wiki.fasterxml.com/JugHome providing exactly what we need.
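For reference, the JDK-only variant looks roughly like the sketch below. It uses java.util.UUID exclusively; the class and helper names are made up for illustration. The time-based variant from the JUG library is not shown here, since it needs the extra dependency.

```java
import java.util.UUID;

// Minimal JDK-only sketch; class and method names are hypothetical.
class UuidBasics {
    // Generate a random (version 4) UUID using only the JDK.
    static UUID newRandomUuid() {
        return UUID.randomUUID();
    }

    // UUIDs survive a round trip through their string form, which is what
    // would typically be stored as a property or in an external system.
    static UUID roundTrip(UUID uuid) {
        return UUID.fromString(uuid.toString());
    }

    public static void main(String[] args) {
        UUID uuid = newRandomUuid();
        System.out.println(uuid + " (version " + uuid.version() + ")");
        System.out.println("round trip ok: " + uuid.equals(roundTrip(uuid)));
    }
}
```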
automatic UUID assignments
For convenience it would be great if all freshly created nodes and relationships were automatically assigned a uuid property, without having to do this explicitly. Fortunately Neo4j supports TransactionEventHandlers, a callback interface plugging into transaction handling. A TransactionEventHandler has a chance to modify or veto any transaction. It’s a sharp tool which can have a significant negative performance impact if used the wrong way.
I’ve implemented a UUIDTransactionEventHandler that performs the following tasks:
- populate a uuid property for each new node or relationship
- reject a transaction if a manual modification of a uuid is attempted, either assignment or removal
[getgit repoid=neo4j-uuid userid=sarmbruster path=”src/main/java/org/neo4j/extension/uuid/UUIDTransactionEventHandler.java” language=”java” startloc=20 stoploc=72]
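The two rules listed above can be modelled in plain Java as follows. This is only a sketch of the logic, not the project’s handler: Entity and PendingChanges are hypothetical stand-ins for what Neo4j hands to a TransactionEventHandler before commit, and the veto is modelled by throwing an exception.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Plain-Java model of the two rules. The nested types are hypothetical
// stand-ins, not Neo4j's TransactionEventHandler/TransactionData API.
class UuidRulesSketch {
    static final String UUID_PROPERTY = "uuid";

    // Stand-in for a node or relationship: just a bag of properties.
    static class Entity {
        final Map<String, Object> properties = new HashMap<>();
    }

    // Stand-in for the changes a transaction is about to commit.
    static class PendingChanges {
        final List<Entity> created = new ArrayList<>();
        final List<String> assignedKeys = new ArrayList<>();
        final List<String> removedKeys = new ArrayList<>();
    }

    // Runs before commit: veto manual uuid changes, then fill in uuids.
    static void beforeCommit(PendingChanges changes) {
        if (changes.assignedKeys.contains(UUID_PROPERTY)
                || changes.removedKeys.contains(UUID_PROPERTY)) {
            // Throwing here models rejecting the transaction.
            throw new IllegalStateException("manual modification of uuid is not allowed");
        }
        for (Entity entity : changes.created) {
            entity.properties.put(UUID_PROPERTY, UUID.randomUUID().toString());
        }
    }

    public static void main(String[] args) {
        PendingChanges changes = new PendingChanges();
        Entity node = new Entity();
        changes.created.add(node);
        beforeCommit(changes);
        System.out.println("uuid assigned: " + node.properties.containsKey(UUID_PROPERTY));
    }
}
```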
setting up using KernelExtensionFactory
There are two remaining tasks for full automation of UUID assignments:
- we need to set up autoindexing for uuid properties to have a convenient way to look up nodes or relationships by UUID
- we need to register UUIDTransactionEventHandler with the graph database
Since version 1.9 Neo4j has the notion of KernelExtensionFactory. Using KernelExtensionFactory you can supply a class that receives lifecycle callbacks when e.g. Neo4j is started or stopped. This is the right place for configuring autoindexing and setting up the TransactionEventHandler. Since the JVM’s ServiceLoader is used, KernelExtensionFactories need to be registered in a file META-INF/services/org.neo4j.kernel.extension.KernelExtensionFactory by listing all implementations you want to use:
[getgit repoid=neo4j-uuid userid=sarmbruster path=”src/main/resources/META-INF/services/org.neo4j.kernel.extension.KernelExtensionFactory” language=”java”]
KernelExtensionFactories can declare dependencies: to do so, declare an inner interface (“Dependencies” in the code below) that just has getters. Using proxies, Neo4j will implement this interface and supply you with the required dependencies. The dependencies are matched on the requested type; see Neo4j’s source code for which classes are supported as dependencies. KernelExtensionFactories must implement a newKernelExtension method that is supposed to return an instance of Lifecycle.
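How a getters-only interface can be implemented by a proxy and matched on type is sketched below with java.lang.reflect.Proxy. This illustrates the general technique only and is not Neo4j’s implementation; Config, Database and the satisfy() helper are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.List;

// Sketch of type-matched dependency injection via a dynamic proxy.
// All names here are hypothetical; this is not Neo4j's own code.
class DependencyProxySketch {
    // Stand-ins for services a kernel extension might ask for.
    static class Config { final String name = "config"; }
    static class Database { final String name = "database"; }

    // A getters-only interface, in the spirit of "Dependencies".
    interface Dependencies {
        Config getConfig();
        Database getDatabase();
    }

    // Implement the interface with a proxy: each getter is answered with
    // the first available object that matches the getter's return type.
    static <T> T satisfy(Class<T> dependenciesType, List<Object> available) {
        InvocationHandler handler = (proxy, method, args) -> {
            for (Object candidate : available) {
                if (method.getReturnType().isInstance(candidate)) {
                    return candidate;
                }
            }
            throw new IllegalStateException("no dependency of type " + method.getReturnType());
        };
        Object proxy = Proxy.newProxyInstance(
                dependenciesType.getClassLoader(), new Class<?>[] {dependenciesType}, handler);
        return dependenciesType.cast(proxy);
    }

    public static void main(String[] args) {
        List<Object> available = Arrays.<Object>asList(new Config(), new Database());
        Dependencies deps = satisfy(Dependencies.class, available);
        System.out.println(deps.getConfig().name + ", " + deps.getDatabase().name);
    }
}
```

The extension code only ever sees the interface; which concrete objects end up behind the getters is decided by whoever builds the proxy.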
For our UUID project we return an instance of UUIDLifeCycle:
[getgit repoid=neo4j-uuid userid=sarmbruster path=”src/main/java/org/neo4j/extension/uuid/UUIDLifeCycle.java” language=”java”]
Most of the code is pretty straightforward: l.44/45 set up autoindexing for the uuid property, and l.48 registers the UUIDTransactionEventHandler with the graph database. Less obvious is the code in the init() method. Neo4j’s NodeAutoIndexerImpl configures autoindexing itself and switches it on or off depending on the respective config option. However, we want autoindexing to be always switched on. Unfortunately NodeAutoIndexerImpl runs after our code and overrides our settings. That’s why l.37-40 tweak the config settings to force the desired behaviour of NodeAutoIndexerImpl.
looking up nodes or relationships for uuid
For completeness the project also contains a trivial unmanaged extension for looking up nodes and relationships using the REST interface, see UUIDRestInterface. When you send an HTTP GET to http://localhost:7474/db/data/node/<myuuid>, the node’s internal id is returned.
build system and testing
For building the project, Gradle is used; build.gradle is trivial. Of course a couple of tests are included. As a long-standing addict I’ve obviously used Spock for testing. See the test code here.
final words
A downside of this implementation is that each and every node and relationship gets indexed. Indexing always trades write performance for read performance. Keep that in mind. It might make sense to get rid of unconditional auto indexing and put some domain knowledge into the TransactionEventHandler, so that only those nodes that are really referenced from an external system get a uuid and are indexed.
7 replies on “assigning UUIDs to Neo4j nodes and relationships”
Hey, great post!
Using UUIDs as external stable identifiers is often the right thing to do.
There’s one variant to this approach that avoids the UUID index lookup. Automatically assign the UUID as above, but use pairs of (Neo4j node/relationship id, UUID) as external stable identifiers. To load a node, try to load it by id and then check if it has the correct UUID. If not, treat it as non-existing. With this kind of check, index lookups can be avoided at the cost of having to maintain more complex external stable identifiers.
Cheers!
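The (id, UUID) pair check described in this comment can be sketched as follows. This is a plain-Java model under the assumption that the graph is replaced by an in-memory map; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of resolving an external (id, uuid) reference.
// The map is a hypothetical stand-in for the graph: internal id -> uuid.
class PairLookupSketch {
    private final Map<Long, String> uuidById = new HashMap<>();

    void store(long id, String uuid) {
        uuidById.put(id, uuid);
    }

    void delete(long id) {
        uuidById.remove(id);
    }

    // The id gives direct access without an index; the uuid verifies that
    // the id has not been reused for a different node in the meantime.
    boolean exists(long id, String expectedUuid) {
        String actual = uuidById.get(id);
        return actual != null && actual.equals(expectedUuid);
    }

    public static void main(String[] args) {
        PairLookupSketch graph = new PairLookupSketch();
        graph.store(7L, "uuid-of-old-node");
        graph.delete(7L);
        graph.store(7L, "uuid-of-new-node"); // id 7 has been reused
        System.out.println("stale reference resolves: " + graph.exists(7L, "uuid-of-old-node"));
        System.out.println("fresh reference resolves: " + graph.exists(7L, "uuid-of-new-node"));
    }
}
```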
Hi Stefan,
thanks for your feedback. Indeed, that’s a good idea: storing a combo of Neo4j’s node/rel id plus uuid in external systems avoids the index overhead. Great catch!
Cheers,
Stefan
Hi,
first of all: cool post.
Suppose I only want special nodes to get UUIDs (e.g. user ids).
I came up with the following simple solution:
Create a static counter in our class and take an integer as the UUID for each node (always increment the id). This variable can then be synchronized by a special lock to make sure that no inconsistencies occur.
The problem is: What do we do when a user deletes his account? Then, I basically have an unused id. Keeping track of these unused ids in a static list is probably the only way to avoid any gaps.
What do you think about this approach?
Hi Jan, I guess the best solution would be to introduce a config option holding a list of labels requiring UUIDs. Labels are a new feature in Neo4j 2.0. If a new node carries one of the uuid-able labels, the TransactionEventHandler triggers UUID generation. What do you think?
Not sure, but having a global lock might hit you performance-wise in case of a lot of concurrent additions of new nodes.
Hi Stefan,
thanks for your answer.
I think your solution is very elegant and easy to maintain.
As you pointed out, a global lock would slow the performance down. So actually for each type of id (e.g. user id, comment id) 1 lock would be needed. Your suggestion sounds very good. Thanks.
Hi Stefan, I’m trying this with neo4j v2.1.2 and it fails with “Failed to commit transaction Transaction(7, owner:”qtp2018697538-66″)[STATUS_NO_TRANSACTION,Resources=1], transaction rolled back —> Transaction handler failed.” every time. Is this plugin still supported, because it’s really useful?
Thanks,
Mat
Hi Mat,
the current codebase is still based on Neo4j 1.x. I will provide fixes to play nicely with Neo4j 2.x, however I’m rather busy these days.
Cheers,
Stefan