assigning UUIDs to Neo4j nodes and relationships

TL;DR: This blog post features a small demo project on GitHub, neo4j-uuid, and explains how to automatically assign UUIDs to nodes and relationships in Neo4j. A very brief introduction to Neo4j 1.9’s KernelExtensionFactory is included as well.

a little rant on Neo4j node/relationship ids

In a lot of use cases there is demand for storing a reference to a Neo4j node or relationship in a third party system. The first naive idea probably is to use the internal node/relationship id that Neo4j provides. Do not do that! Never!

You ask why? Well, Neo4j’s id is basically an offset in one of the store files Neo4j uses (with some math involved). Assume you delete a couple of nodes. This produces holes in the store files that Neo4j might reclaim when creating new nodes later on. And since the id is a file offset, there is a chance that a new node will get exactly the same id as a previously deleted node. If you don’t synchronously update all node id references stored elsewhere, you’re in trouble: a stale reference then silently points to a different node. If Neo4j were redeveloped from scratch, the getId() method would not be part of the public API.
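
The reuse hazard can be modelled in a few lines. The following is only a toy sketch, assuming a store that hands out freed slots again; ToyNodeStore and its methods are made up for illustration and are not Neo4j's actual store code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Toy record store that hands out freed ids again; NOT Neo4j's real store code. */
class ToyNodeStore {
    private final Map<Long, String> records = new HashMap<>();
    private final Deque<Long> freeIds = new ArrayDeque<>();
    private long highestId = -1;

    /** Creates a node and returns its id, preferring a previously freed id. */
    long create(String name) {
        long id = freeIds.isEmpty() ? ++highestId : freeIds.pop();
        records.put(id, name);
        return id;
    }

    /** Deletes a node and marks its id as reusable. */
    void delete(long id) {
        if (records.remove(id) != null) {
            freeIds.push(id);
        }
    }

    /** Returns the name stored under the id, or null if there is none. */
    String get(long id) {
        return records.get(id);
    }

    public static void main(String[] args) {
        ToyNodeStore store = new ToyNodeStore();
        long alice = store.create("alice");   // an external system remembers this id
        store.delete(alice);
        long bob = store.create("bob");       // reuses the freed slot
        System.out.println(alice == bob);     // true
        System.out.println(store.get(alice)); // bob
    }
}
```

The external system still holds the id it got for "alice", but a lookup now returns "bob" without any error.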

As long as you use node ids only within a short-lived scope, e.g. a single request of an application, there’s nothing wrong with it. To repeat myself: never ever store a node id in a third party system. I have officially warned you.

UUIDs

Enough ranting, let’s see what we can do to safely store node references in an external system. Basically we need an identifier that, in contrast to the node id, carries no semantics. A common approach is to use Universally Unique Identifiers (UUIDs). The JDK offers a UUID implementation, so we could potentially use UUID.randomUUID(). Unfortunately random UUIDs are slow to generate. A preferred approach is to use the machine’s MAC address and a timestamp as the base for the UUID; this should provide enough uniqueness. There is a nice library out there at http://wiki.fasterxml.com/JugHome providing exactly what we need.
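
For reference, here is a minimal sketch of the JDK-only variant. It uses nothing but java.util.UUID; the class name UuidBasics is made up. The project itself uses the time-based generator from the library mentioned above (Generators.timeBasedGenerator()), which is not shown here because it needs the extra dependency.

```java
import java.util.UUID;

/** JDK-only sketch: random (version 4) UUIDs, stored and read back as strings. */
class UuidBasics {

    /** Creates a new random UUID in its canonical string form. */
    static String newUuidString() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        String uuid = newUuidString();
        UUID parsed = UUID.fromString(uuid);   // round-trips through the string form
        System.out.println(uuid.length());     // 36
        System.out.println(parsed.version());  // 4
    }
}
```

The string form is what would typically end up in a property value and in the external system.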

automatic UUID assignments

For convenience it would be great if all freshly created nodes and relationships were automatically assigned a uuid property, without having to do this explicitly. Fortunately Neo4j supports TransactionEventHandlers, a callback interface plugging into transaction handling. A TransactionEventHandler has a chance to modify or veto any transaction. It’s a sharp tool which can have a significant negative performance impact if used the wrong way.

I’ve implemented a UUIDTransactionEventHandler that performs the following tasks:

  • populate a uuid property for each new node or relationship
  • reject a transaction if a manual modification of a uuid is attempted, either assignment or removal

import com.fasterxml.uuid.Generators;
import com.fasterxml.uuid.impl.TimeBasedGenerator;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.event.TransactionData;
import org.neo4j.graphdb.event.TransactionEventHandler;
import org.neo4j.graphdb.index.Index;
import org.neo4j.graphdb.index.IndexManager;
import org.neo4j.graphdb.index.RelationshipIndex;

/**
 * - generates UUID properties for each new node and relationship
 * - rejects any modification to pre-existing uuids
 */
public class UUIDTransactionEventHandler implements TransactionEventHandler<Object> {

    public static final String UUID_PROPERTY_NAME = "uuid";
    public static final String UUID_INDEX_NAME = "uuid";

    private final TimeBasedGenerator uuidGenerator = Generators.timeBasedGenerator();
    private final GraphDatabaseService graphDatabaseService;
    private Index<Node> nodeUuidIndex;
    private RelationshipIndex relationshipUuidIndex;

    public UUIDTransactionEventHandler(GraphDatabaseService graphDatabaseService) {
        this.graphDatabaseService = graphDatabaseService;
    }

    @Override
    public Object beforeCommit(TransactionData transactionData) throws Exception {
        // veto manual changes to existing uuids first
        checkForUuidDeletion(transactionData.removedNodeProperties(), transactionData);
        checkForUuidAssignment(transactionData.assignedNodeProperties());
        checkForUuidDeletion(transactionData.removedRelationshipProperties(), transactionData);
        checkForUuidAssignment(transactionData.assignedRelationshipProperties());

        // then assign and index uuids for everything created in this transaction
        initIndexes();
        populateUuidsFor(transactionData.createdNodes(), nodeUuidIndex);
        populateUuidsFor(transactionData.createdRelationships(), relationshipUuidIndex);
        return null;
    }

    // the indexes are looked up lazily on first use
    private void initIndexes() {
        if (nodeUuidIndex == null) {
            IndexManager indexManager = graphDatabaseService.index();
            nodeUuidIndex = indexManager.forNodes(UUID_INDEX_NAME);
        }
        if (relationshipUuidIndex == null) {
            IndexManager indexManager = graphDatabaseService.index();
            relationshipUuidIndex = indexManager.forRelationships(UUID_INDEX_NAME);
        }
    }

    @Override
    public void afterCommit(TransactionData data, Object state) {
    }

    @Override
    public void afterRollback(TransactionData data, Object state) {
    }

    // checkForUuidDeletion, checkForUuidAssignment and populateUuidsFor are not part of
    // this excerpt; see the neo4j-uuid project for their implementation
}
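
Since the listing above does not show the check and populate helpers, here is a dependency-free sketch of the two rules they enforce. This is not the project's implementation: PendingTx, its fields, the exception type and its message are all hypothetical stand-ins for Neo4j's TransactionData. It also uses UUID.randomUUID() instead of the time-based generator to stay JDK-only, and it ignores details such as properties that vanish because their node was deleted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Model of a before-commit hook; all types here are hypothetical, not Neo4j's. */
class UuidCommitHook {
    static final String UUID_PROPERTY_NAME = "uuid";

    /** A pending transaction: created entities plus property keys changed on existing ones. */
    static class PendingTx {
        final List<Map<String, Object>> created = new ArrayList<>();
        final List<String> assignedKeys = new ArrayList<>();
        final List<String> removedKeys = new ArrayList<>();
    }

    /** Vetoes manual uuid changes, then stamps every created entity with a fresh uuid. */
    static void beforeCommit(PendingTx tx) {
        if (tx.assignedKeys.contains(UUID_PROPERTY_NAME)
                || tx.removedKeys.contains(UUID_PROPERTY_NAME)) {
            // throwing here stands in for rejecting the transaction
            throw new IllegalStateException("manual modification of uuid is not allowed");
        }
        for (Map<String, Object> entity : tx.created) {
            entity.put(UUID_PROPERTY_NAME, UUID.randomUUID().toString());
        }
    }

    public static void main(String[] args) {
        PendingTx tx = new PendingTx();
        Map<String, Object> node = new HashMap<>();
        tx.created.add(node);
        beforeCommit(tx);
        System.out.println(node.containsKey(UUID_PROPERTY_NAME)); // true

        PendingTx tampering = new PendingTx();
        tampering.removedKeys.add(UUID_PROPERTY_NAME);
        try {
            beforeCommit(tampering);
        } catch (IllegalStateException e) {
            System.out.println("vetoed"); // vetoed
        }
    }
}
```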

setting up using KernelExtensionFactory

There are two remaining tasks for full automation of UUID assignments:

  • we need to set up autoindexing for uuid properties to have a convenient way to look up nodes or relationships by UUID
  • we need to register UUIDTransactionEventHandler with the graph database

Since version 1.9 Neo4j has the notion of a KernelExtensionFactory. Using a KernelExtensionFactory you can supply a class that receives lifecycle callbacks, e.g. when Neo4j is started or stopped. This is the right place for configuring autoindexing and setting up the TransactionEventHandler. Since the JVM’s ServiceLoader is used, KernelExtensionFactories need to be registered in a file META-INF/services/org.neo4j.kernel.extension.KernelExtensionFactory by listing all implementations you want to use:

org.neo4j.extension.uuid.UUIDKernelExtensionFactory

KernelExtensionFactories can declare dependencies. To do so, declare an inner interface (“Dependencies” in the code) that just has getters. Using proxies, Neo4j will implement this interface and supply you with the required dependencies. The dependencies are matched on the requested type; see Neo4j’s source code for which classes are supported as dependencies. KernelExtensionFactories must implement a newKernelExtension method that is supposed to return an instance of Lifecycle.
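
To make the proxy idea more tangible, here is a rough sketch of how a getters-only interface can be satisfied by matching on return types. It relies only on java.lang.reflect.Proxy from the JDK. It is an illustration of the general mechanism, not Neo4j's resolver; the names DependencyProxyDemo, satisfy, getLog and getPort are made up.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

/** Satisfies a getters-only interface via a dynamic proxy; not Neo4j's actual resolver. */
class DependencyProxyDemo {

    /** Hypothetical dependencies interface: getters only, matched by return type. */
    interface Dependencies {
        StringBuilder getLog();
        Integer getPort();
    }

    /** Builds a proxy whose getters return the registered instance of their return type. */
    static <T> T satisfy(Class<T> dependenciesType, Object... available) {
        // simplification: match on the exact class only, no subtype lookup
        Map<Class<?>, Object> byType = new HashMap<>();
        for (Object o : available) {
            byType.put(o.getClass(), o);
        }
        InvocationHandler handler = (proxy, method, args) -> {
            Object dependency = byType.get(method.getReturnType());
            if (dependency == null) {
                throw new IllegalArgumentException("no dependency of type " + method.getReturnType());
            }
            return dependency;
        };
        Object instance = Proxy.newProxyInstance(
                dependenciesType.getClassLoader(), new Class<?>[]{dependenciesType}, handler);
        return dependenciesType.cast(instance);
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder("log");
        Dependencies deps = satisfy(Dependencies.class, log, 7474);
        System.out.println(deps.getLog() == log); // true
        System.out.println(deps.getPort());       // 7474
    }
}
```

The factory never has to look anything up itself; it only names the types it wants through the getter signatures.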

For our UUID project we return an instance of UUIDLifeCycle:

package org.neo4j.extension.uuid;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.event.TransactionEventHandler;
import org.neo4j.graphdb.index.IndexManager;
import org.neo4j.kernel.lifecycle.LifecycleAdapter;

import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/**
 * handle the setup of auto indexing for UUIDs and registers a {@link UUIDTransactionEventHandler}
 */
class UUIDLifeCycle extends LifecycleAdapter {

    private TransactionEventHandler transactionEventHandler;
    private GraphDatabaseService graphDatabaseService;
    private IndexManager indexManager;
    private ScheduledThreadPoolExecutor executor;
    private volatile ScheduledFuture<?> scheduledFuture;

    UUIDLifeCycle(GraphDatabaseService graphDatabaseService) {
        this.graphDatabaseService = graphDatabaseService;
        this.indexManager = graphDatabaseService.index();
    }

    @Override
    public void start() throws Throwable {
        transactionEventHandler = new UUIDTransactionEventHandler(graphDatabaseService);
        graphDatabaseService.registerTransactionEventHandler(transactionEventHandler);

        // retry index creation every 200 ms until it succeeds, then cancel the task
        executor = new ScheduledThreadPoolExecutor(1);
        scheduledFuture = executor.scheduleWithFixedDelay(new Runnable() {
            @Override
            public void run() {
                try {
                    indexManager.forNodes(UUIDTransactionEventHandler.UUID_INDEX_NAME);
                    indexManager.forRelationships(UUIDTransactionEventHandler.UUID_INDEX_NAME);
                    scheduledFuture.cancel(false);
                } catch (IllegalArgumentException e) {
                    // thrown if index creation fails while startup is still in progress
                }
            }
        }, 10, 200, TimeUnit.MILLISECONDS);
    }

    @Override
    public void stop() throws Throwable {
        // shut the executor down so its worker thread does not outlive the database
        if (executor != null) {
            executor.shutdownNow();
        }
        graphDatabaseService.unregisterTransactionEventHandler(transactionEventHandler);
    }

}

Most of the code is pretty straightforward: start() registers the UUIDTransactionEventHandler with the graph database, and stop() unregisters it again. Less obvious is the scheduled task in start(). Creating the indexes fails with an IllegalArgumentException while Neo4j’s startup is still in progress, so the task retries every 200 ms until both UUID indexes exist and then cancels itself. A word on Neo4j’s built-in autoindexing: NodeAutoIndexerImpl configures autoindexing itself and switches it on or off depending on the respective config option. However, we want UUID indexing to be always switched on, and since NodeAutoIndexerImpl runs after our code it would override our settings, so relying on the autoindexing config alone does not work.

looking up nodes or relationships for uuid

For completeness the project also contains a trivial unmanaged extension for looking up nodes and relationships using the REST interface, see UUIDRestInterface. When you send an HTTP GET to http://localhost:7474/db/data/node/<myuuid>, the node’s internal id is returned.

build system and testing

For building the project, Gradle is used; build.gradle is trivial. Of course a couple of tests are included. As a long-standing addict I’ve obviously used Spock for testing. See the test code here.

final words

A downside of this implementation is that each and every node and relationship gets indexed. Indexing always trades write performance for read performance. Keep that in mind. It might make sense to get rid of unconditional auto indexing and put some domain knowledge into the TransactionEventHandler, so that only those nodes that are actually referenced from an external system get a uuid and are indexed.

7 thoughts on “assigning UUIDs to Neo4j nodes and relationships”

  1. Stefan Plantikow

    Hey, great post!

    Using UUIDs as external stable identifiers is often the right thing to do.

    There’s one variant to this approach that avoids the UUID index lookup. Automatically assign the UUID as above but instead use pairs of (Neo4j node/relationship id, UUID) as external stable identifiers. To load a node, try to load it by id and then check if it has the correct UUID. If not, treat it as non-existing. With this kind of checksumming, index lookups can be avoided at the cost of having to maintain more complex external stable identifiers.

    Cheers!

  2. Stefan Armbruster Post author

    Hi Stefan,

    thanks for your feedback. Indeed that’s a good idea to store a combo of Neo4j’s node/rel id plus uuid in external systems and prevent the index overhead. Great catch!

    Cheers,
    Stefan

  3. Jan

    Hi,

    first of all: cool post.
    Suppose I only want special nodes to get UUIDs (e.g. user ids).
    I came up with the following simple solution:
    Create a static counter in our class and take an integer as the UUID for each node (always increment the id). This variable can then be synchronized by a special lock to make sure that no inconsistencies occur.

    The problem is: What do we do when a user deletes his account? Then, I basically have an unused id. Keeping track of these unused ids in a static list is probably the only way to avoid any gaps.

    What do you think about this approach?

  4. Stefan Armbruster Post author

    Hi Jan, I guess the best solution would be to introduce a config option holding a list of labels requiring UUIDs. Labels are a new feature in Neo4j 2.0. If a new node carries one of the uuid-able labels, the TransactionEventHandler triggers UUID generation. What do you think?

    Not sure, but having a global lock might hit you performance-wise in case of a lot of concurrent additions of new nodes.

  5. Jan

    Hi Stefan,

    thanks for your answer.
    I think your solution is very elegant and easy to maintain.
    As you pointed out, a global lock would slow the performance down. So actually for each type of id (e.g. user id, comment id) 1 lock would be needed. Your suggestion sounds very good. Thanks.

  6. mat

    Hi Stefan, I’m trying this with neo4j v2.1.2 and it fails with “Failed to commit transaction Transaction(7, owner:"qtp2018697538-66")[STATUS_NO_TRANSACTION,Resources=1], transaction rolled back ---> Transaction handler failed.” every time. Is this plugin still supported, because it’s really useful?
    Thanks,
    Mat

  7. Stefan Armbruster Post author

    Hi Mat,
    the current codebase is still based on Neo4j 1.x. I will provide fixes to play nicely with Neo4j 2.x, however I’m rather busy these days.
    Cheers,
    Stefan
