assigning UUIDs to Neo4j nodes and relationships

TL;DR: This blog post presents a small demo project on GitHub, neo4j-uuid, and explains how to automatically assign UUIDs to nodes and relationships in Neo4j. It also includes a very brief introduction to Neo4j 1.9’s KernelExtensionFactory.

a little rant on Neo4j node/relationship ids

In many use cases you need to store a reference to a Neo4j node or relationship in a third-party system. The first naive idea is probably to use the internal node/relationship id that Neo4j provides. Do not do that! Never!

You ask why? Well, Neo4j’s id is basically an offset into one of the store files Neo4j uses (with some math involved). Assume you delete a couple of nodes. This produces holes in the store files that Neo4j might reclaim when creating new nodes later on. And since the id is a file offset, there is a chance that a new node will get exactly the same id as a previously deleted node. If you don’t synchronously update all node id references stored elsewhere, you’re in trouble. If Neo4j were redeveloped from scratch, the getId() method would not be part of the public API.
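The reuse problem can be illustrated with a toy allocator. This is only a simplified model, assuming freed ids are recycled before new ones are handed out; it is not Neo4j's actual id management, and ToyIdAllocator is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class IdReuseDemo {
    /** Toy allocator: hands out increasing ids, but recycles freed ids first. */
    static class ToyIdAllocator {
        private long nextId = 0;
        private final Deque<Long> freed = new ArrayDeque<>();

        long allocate() {
            return freed.isEmpty() ? nextId++ : freed.pop();
        }

        void free(long id) {
            freed.push(id);
        }
    }

    public static void main(String[] args) {
        ToyIdAllocator ids = new ToyIdAllocator();
        long alice = ids.allocate();  // first node
        ids.allocate();               // second node
        ids.free(alice);              // first node gets deleted
        long carol = ids.allocate();  // new node takes over the freed id
        // An external system that still stores alice's id now points at carol.
        System.out.println("alice=" + alice + " carol=" + carol); // prints "alice=0 carol=0"
    }
}
```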

As long as you use node ids only within, e.g., a single request of an application, there’s nothing wrong with that. To repeat myself: never ever store a node id in a third-party system. I have officially warned you.

UUIDs

Enough ranting, let’s see what we can do to safely store node references in an external system. Basically we need an identifier that, in contrast to the node id, carries no semantics. A common approach is to use Universally Unique Identifiers (UUIDs). The JDK ships a UUID implementation, so we could potentially use UUID.randomUUID(). Unfortunately random UUIDs can be slow to generate. A preferred approach is to use the machine’s MAC address and a timestamp as the basis for the UUID; this should provide enough uniqueness. There is a nice library at http://wiki.fasterxml.com/JugHome providing exactly what we need.
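For reference, here is a minimal sketch of the JDK-only variant. It uses just java.util.UUID, so it produces random (version 4) UUIDs rather than the time-based ones discussed above; UuidDemo and newUuid are hypothetical names.

```java
import java.util.UUID;

public class UuidDemo {
    /** Returns a fresh random (version 4) UUID in its canonical 36-character form. */
    static String newUuid() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        String uuid = newUuid();
        // The string form round-trips through UUID.fromString.
        UUID parsed = UUID.fromString(uuid);
        System.out.println(uuid + " version=" + parsed.version());
        // With the JUG library on the classpath, a time-based generator should be
        // obtainable via Generators.timeBasedGenerator(); it is left out here to
        // keep the sketch free of external dependencies.
    }
}
```

Storing the string form as a plain property value keeps it easy to index and to pass around in URLs.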

automatic UUID assignments

For convenience it would be great if all freshly created nodes and relationships were automatically assigned a uuid property, without having to do this explicitly. Fortunately Neo4j supports TransactionEventHandlers, a callback interface that plugs into transaction handling. A TransactionEventHandler has the chance to modify or veto any transaction. It’s a sharp tool that can have a significant negative performance impact if used the wrong way.

I’ve implemented a UUIDTransactionEventHandler that performs the following tasks:

  • populate a uuid property for each new node or relationship
  • reject a transaction if a manual modification of a uuid is attempted, either assignment or removal

[getgit repoid=neo4j-uuid userid=sarmbruster path=”src/main/java/org/neo4j/extension/uuid/UUIDTransactionEventHandler.java” language=”java” startloc=20 stoploc=72]
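Stripped of the Neo4j API, the two rules above boil down to roughly the following logic. This is a simplified, JDK-only sketch and not the project's handler: TxChanges and its fields are hypothetical stand-ins for the transaction data Neo4j passes to a handler before commit, and a random UUID is used in place of a time-based one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

public class UuidAssignmentSketch {
    static final String UUID_PROPERTY = "uuid";

    /** Hypothetical stand-in for the changes a transaction reports before commit. */
    static class TxChanges {
        /** Property maps of nodes/relationships created in this transaction. */
        final List<Map<String, Object>> createdEntities = new ArrayList<>();
        /** Property keys assigned or removed on entities that already existed. */
        final Set<String> touchedKeysOnExisting = new HashSet<>();
    }

    /** Vetoes manual uuid changes, then populates uuid on every new entity. */
    static void beforeCommit(TxChanges tx) {
        if (tx.touchedKeysOnExisting.contains(UUID_PROPERTY)) {
            // Throwing here models rejecting (rolling back) the transaction.
            throw new IllegalStateException("uuid must not be assigned or removed manually");
        }
        for (Map<String, Object> entity : tx.createdEntities) {
            // Assumption of this sketch: a fresh entity always gets a generated value.
            entity.put(UUID_PROPERTY, UUID.randomUUID().toString());
        }
    }

    public static void main(String[] args) {
        TxChanges tx = new TxChanges();
        Map<String, Object> node = new HashMap<>();
        node.put("name", "demo");
        tx.createdEntities.add(node);
        beforeCommit(tx);
        System.out.println(node.keySet().contains(UUID_PROPERTY)); // prints "true"
    }
}
```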

setting up using KernelExtensionFactory

There are two remaining tasks for full automation of UUID assignments:

  • we need to set up autoindexing for uuid properties to have a convenient way to look up nodes or relationships by UUID
  • we need to register UUIDTransactionEventHandler with the graph database

Since version 1.9 Neo4j has the notion of a KernelExtensionFactory. Using a KernelExtensionFactory you can supply a class that receives lifecycle callbacks, e.g. when Neo4j is started or stopped. This is the right place for configuring autoindexing and setting up the TransactionEventHandler. Since the JVM’s ServiceLoader is used, KernelExtensionFactories need to be registered in a file META-INF/services/org.neo4j.kernel.extension.KernelExtensionFactory by listing all implementations you want to use:

[getgit repoid=neo4j-uuid userid=sarmbruster path=”src/main/resources/META-INF/services/org.neo4j.kernel.extension.KernelExtensionFactory” language=”java”]

KernelExtensionFactories can declare dependencies. To do so, declare an inner interface (“Dependencies” in the code below) that has only getters. Using proxies, Neo4j will implement this interface and supply you with the required dependencies. The dependencies are matched on the requested type; see Neo4j’s source code for which classes are supported as dependencies. KernelExtensionFactories must implement a newKernelExtension method that is supposed to return an instance of Lifecycle.
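To make the proxy idea more concrete, here is a small JDK-only sketch of how a getter-only interface can be implemented at runtime by matching each getter's return type against a set of available objects. It only illustrates the general technique with java.lang.reflect.Proxy and is not Neo4j's implementation; Config, Database and resolve are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

public class DependencyProxySketch {
    /** Hypothetical services, standing in for whatever the host can provide. */
    static class Config { }
    static class Database { }

    /** Getter-only interface, in the spirit of the "Dependencies" interface. */
    interface Dependencies {
        Config getConfig();
        Database getDatabase();
    }

    /** Builds a proxy whose getters are answered by looking up their return type. */
    static <T> T resolve(Class<T> type, Map<Class<?>, Object> available) {
        // Note: every call on the proxy, including toString(), goes through this handler.
        InvocationHandler handler = (proxy, method, args) -> {
            Object dependency = available.get(method.getReturnType());
            if (dependency == null) {
                throw new IllegalArgumentException(
                        "no dependency of type " + method.getReturnType().getName());
            }
            return dependency;
        };
        Object instance = Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, handler);
        return type.cast(instance);
    }

    public static void main(String[] args) {
        Map<Class<?>, Object> available = new HashMap<>();
        Config config = new Config();
        available.put(Config.class, config);
        available.put(Database.class, new Database());

        Dependencies deps = resolve(Dependencies.class, available);
        System.out.println(deps.getConfig() == config); // prints "true"
    }
}
```

The getter names play no role in this sketch; only the return types decide which object is handed out, which mirrors the "matched on the requested type" behaviour described above.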

For our UUID project we return an instance of UUIDLifeCycle:

[getgit repoid=neo4j-uuid userid=sarmbruster path=”src/main/java/org/neo4j/extension/uuid/UUIDLifeCycle.java” language=”java”]

Most of the code is pretty straightforward: l.44/45 set up autoindexing for the uuid property, and l.48 registers the UUIDTransactionEventHandler with the graph database. Less obvious is the code in the init() method. Neo4j’s NodeAutoIndexerImpl configures autoindexing itself and switches it on or off depending on the respective config option. However, we want to have autoindexing always switched on. Unfortunately NodeAutoIndexerImpl runs after our code and overrides our settings. That’s why l.37-40 tweak the config settings to force the desired behaviour of NodeAutoIndexerImpl.

looking up nodes or relationships for uuid

For completeness the project also contains a trivial unmanaged extension for looking up nodes and relationships using the REST interface, see UUIDRestInterface. When you send an HTTP GET to http://localhost:7474/db/data/node/<myuuid>, the node’s internal id is returned.

build system and testing

The project is built with Gradle; build.gradle is trivial. Of course a couple of tests are included. As a long-standing addict I’ve obviously used Spock for testing. See the test code here.

final words

A downside of this implementation is that each and every node and relationship gets indexed. Indexing always trades write performance for read performance. Keep that in mind. It might make sense to get rid of unconditional autoindexing and put some domain knowledge into the TransactionEventHandler, so that uuids are assigned and indexed only for those nodes that are actually referenced from an external system.