getting insight in Neo4j’s JMX beans when running embedded

February 20th, 2014 No comments

Neo4j exposes a lot of valuable information via JMX. Sometimes you want to gain some insight in some JMX beans when running Neo4j in embedded mode. For this it’s crucial to have the neo4j-jmx-[version].jar file on the classpath.

The following short Groovy script demos this by using the @Grab annotation to fetch dependencies:

For a full reference of all available JMX beans, see the Neo4j manual.

Categories: Uncategorized Tags:

quick tooling tip for hacking Cypher statements – Linux only

February 7th, 2014 No comments

When developing Cypher statements for a Neo4j based application there are multiple ways to do this.

A lot of people (including myself) love the new Neo4j browser shipped with 2.0 and subsequent releases. This is a nicely built locally running web application running in your browser. At the top users can easily type their Cypher code and see results after executing, either in tabular form or as a visualization enabling to click through.

Neo4j 2.0 Browser

Another way is to use the command line and either go with neo4j-shell or use the REST interface by a command line client like cURL or more conveniently httpie (which I’ve previously blogged about).

Typically while building a Cypher statement you take a lot of cycles to hack a little bit, test if it runs, hack a little bit, test, …. This cycle can be improved by automating execution as soon as the file containing the cypher statement has hanged.

Linux comes with a kernel feature called inotify that reports file system changes to applications. On Ubuntu/Debian there is a package called inotify-hookable available offering a convenient way to set up tracking for a specific file or directory and take a action triggered by a change in the file/directory.

Assume you want to quickly develop a complex cypher statement in $HOME/myquery.cql. Set up monitoring using:

inotify-hookable -c ~/myquery.cql -c "(~/neo4j-enterprise-2.0.1/bin/neo4j-shell < ~/myquery.cql)"

Using your text editor of choice open $HOME/myquery.cql and change your code. After saving the statement will be automatically executed and you get instantly feedback.

Categories: Uncategorized Tags: , , ,

using remote shell combined with Neo4j embedded

January 17th, 2014 No comments

Neo4j can be deployed in multiple ways. Either you can run it as a server in a separate process, just like a classic database, or you can use embedded mode where your application controls the lifecycle of the graph database. Both, embedded and server mode can be used to setup a HA scenario with Neo4j enterprise edition.

In cases where Neo4j is used in embedded mode, there is often a demand for having a maintenance channel to the database, e.g. for fixing wrong data. Nothing simpler than that, there’s an easy way to enable the remote shell together with embedded mode, see a example written in groovy:

The trick is to

  1. have the neo4j-shell-<version>.jar on your classpath and
  2. pass in the config option remote_shell_enabled='true'

With this in place you can use the bin/neo4j-shell from your Neo4j distribution and access your embedded instance.

 

Categories: Uncategorized Tags: ,

indexing in Neo4j – an overview

December 19th, 2013 2 comments

Neo4j as a graph database features indexing as the preferred way to find start points for graph traversals. Over the years multiple different indexing approach have been added. The goal of this article is to give an overview on this to avoid confusion esp. for those who just recently got started with Neo4j.

A graph database using a property graph model stores its data in nodes, relationships and properties. In Neo4j 2.0 this model was amended with labels.

no indexes in the beginning

In the very early days of Neo4j there was no index. The only way to walk through the graph was by linking `interesting` things to the reference node. The reference node or “node 0″ acted as a global entry point. Up till versions 1.9.x the GraphDatabaseService had a deprecated getReferenceNode method being a historic relict cleaned up in 2.0.

manual indexes

The Neo4j hackers realized at some point that users don’t want to take the error prone and cumbersome way to find start points for graph traversals via the reference node. At this point a feature called ‘manual indexing’ appeared on the plate. This was back in the days before 1.0 – at a dark age without Cypher and server mode. The only way to speak to your graph was using the Java API. Therefore manual indexing was to be performed by Java API. The main entry point is calling graphDatabaseService.index() to get access to the IndexManager, see here for an example. Any index operation has to be done explicitly and manually. This approach enabled abuse of indexes as well. As a general pattern the index should be seen mainly as a lookup service and not as a secondary datastore. In general the index should not contain any information not residing in the graph itself.

Querying manual indexes was added to Cypher, so to access a manual index you use

START n=node:Person(name='abc') RETURN n

With `node` you refer to a index on nodes, `Person` refers to the index named Person and `name` is the property within the index. With manual indexes you can index relationships as well. Indexing relationships is however a rare use case.

A pretty nice option for manual indexes is the fact that you can pass in options when the index is first created. This allows to configure a index for fulltext indexing or choose different analyzers, see http://docs.neo4j.org/chunked/stable/indexing-create-advanced.html.

automatic indexes

In Neo4j 1.4 a new feature was introduced: auto indexing. Under the hoods it’s a manual index with a fixed name (node_auto_index, relationship_auto_index) combined with a TransactionEventHandler that mirrors changes on a set of configured property names to the index. Typically auto indexing is setup in neo4j.properties. This approach removes lot of burden from manually mirroring your property changes to the index and it allows Cypher statement to implicitly modify the index.

START n=node:node_auto_index(name='abc') RETURN n

From Cypher perspective there is no difference to manual indexes aside that you have to use the predefined index names (node_auto_index here).

It’s important to know that a change to auto index configuration will not trigger reindexing of existing datasets. A commonly used trick is to set a property to its current value which forces reindexing.

Another shortcoming is that the configuration of property keys to be indexed is global. Assume you have persons with a name property and cities with a name property. Any query to n=node_auto_index(name='abc') can potentially return both persons and cities. Therefore you should choose distinct property keys for different semantics.

schema indexes

On of the most shiny new features in Neo4j are schema indexes. Schema indexes `feel` a lot like indexes as we’re used to from relational world. A schema index is declared based on a label for a certain property.

CREATE INDEX ON :Person(name);

The above statement will create a index for the name property on all nodes carrying the Person label. Very convenient is the fact that the index will automatically be populated with preexisting data.

Queries do no longer have to explicitly use a index, it’s more the behaviour we know from SQL. When there is a index that can make a query more performant it will use. Assume a query like

MATCH (p:Person {name: 'Stefan'}) RETURN p

In case of no index being set up this will look up all Person nodes and check if their name property matches Stefan. If a index is present it will be used transparently.

Constraints are used almost the same way as schema indexes. E.g. to ensure uniqueness on the name property for nodes having the Person label use

CREATE CONSTRAINT ON (p:Person) ASSERT person.name IS UNIQUE

Currently schema indexes cannot be spawned over multiple properties but you can have multiple indexes for the same label. In case you want to do combined searches, it’s workaround to aggregate into a combined property. E.g. if you have firstName and lastName and want to do a combined lookup you might introduce a property name consisting of firstName + lastName and index only the name property.

Schema indexes are way more simple to use compared to manual/autoindexes – so anyone starting with Neo4j should mainly look at schema indexes. To make a clear point on this, the reference manual mentions manual and auto index in a section called ‘legacy indexes’.

 

Categories: Uncategorized Tags:

running Neo4j graphgists locally with docker.io

November 25th, 2013 2 comments

Neo4j has a excellent tool for documenting graph models called graphgists. As the name suggests graphgists are typically stored as github gists in asciidoc format. Additionally to the regular asciidoc you can embed executable cypher and a Neo4j console in a graphgist. For most people it’s perfectly fine hosting their graphgists at github or dropbox. If you want to keep your graphgists private just use a secret gist.

However there are companies with an higher demand for security and privacy that don’t want to expose their stuff onto public networks. Graphgists itself is javascript based and works locally. For the console part it by default connects to http://console.neo4j.org to do the graph operations.

One of my colleagues at Neo Technology recently pointed me to docker.io, a nice LXC based tool to create, maintain, run and share lightweight containers. To get familiar with docker I’ve decided to set up a small project to make graphgists and Neo4j console available in a docker container. This approach allows to handle with your graphgists 100% locally – nothing leaves the container.

How the docker container is built up

Docker containers are assembled by a cookbook called a dockerfile. It specifies the container you want to inherit from and then issue couple of commands to apply your customizations. In my case we need to install a servlet container (tomcat7 here). Neo4j console’s source repo is https://github.com/neo4j-contrib/rabbithole. Based on a recent change it now allows to build a war file ready for deployment into servlet containers. Of course we could have installed maven and clone the repo and exec mvn war:war to build neo4j console. I’ve decided to provide and use a prebuilt war file located at bintray, this removes the need for downloaded a massive number of dependencies for the in-container maven installation. Neo4j console’s war file is deployed as console.war and therefore available in the console context of tomcat.

The graphgist repo is cloned into the location of tomcat’s root context and the location of CONSOLE_URL_BASE is adopted to the locally available neo4j console. Finally tomcat is started as a service. Here’s the full Dockerfile:

FROM quintenk/jdk7-oracle
MAINTAINER Stefan Armbruster 
# make sure the package repository is up to date

RUN update-java-alternatives -s java-7-oracle
RUN apt-get update && apt-get -y install tomcat7 git-core curl

# set JAVA_HOME for tomcat to oracle jdk 7
RUN sed -i -e 's/#\(JAVA_HOME\)=\(.*\)\/.*$/\1=\2\/java-7-oracle/' /etc/default/tomcat7

# web rabbithole and place it under context /console
RUN curl -L -o /var/lib/tomcat7/webapps/console.war "http://dl.bintray.com/sarmbruster/generic/rabbithole-2.0.0-RC1.war"

# place graphgists in root context
RUN rm -rf /var/lib/tomcat7/webapps/ROOT
RUN git clone https://github.com/neo4j-contrib/graphgist.git /var/lib/tomcat7/webapps/ROOT
# adopt CONSOLE_URL_BASE to point to locally installed rabbithole
RUN sed -i -e "s/\(var CONSOLE_URL_BASE =\).*/\1 '\/console';/" /var/lib/tomcat7/webapps/ROOT/js/console.js; 

# fire tomcat up
CMD service tomcat7 start && tail -f /var/log/tomcat7/catalina.out
EXPOSE 8080

How to use the docker container

Of course you need to install docker locally. The procedure differs among operating system and is documented here. Next is to pull the preconfigured container and run it:

sudo docker pull sarmbruster/neo4j_graphgist
sudo docker run -d -v <absolute_path_for_local_gists>:/var/lib/tomcat7/webapps/ROOT/gists:ro -p 8080:8080 sarmbruster/neo4j_graphgist

This procedure might take some time on first invocation – a good candidate for having a nice espresso.

The second command maps a local directory (it’s crucial to use absolute path) into a in-container directory for accessing the gists. Port 8080 is mapped to the in-container port 8080.

When done your local graphgist setup is finished point your browser to http://localhost:8080. To create a gist, open your favourite text editor and save a <myname>.adoc file in the local gist directory used above when starting docker. For some samples of graphgist files, see https://github.com/neo4j-contrib/graphgist/tree/master/gists.Pointing the browser to http://localhost:8080?myname (without .adoc) should render your graphgist.

Using docker ps and docker stop <containerId> can be used to stop the graphgist containter.

closing words

Since I’m absolutely new to docker there might be better and more elegant ways to achieve locally running graphgists. I’m looking forward to read your comments and feedback on this.

 

Categories: Uncategorized Tags: , , ,

for your convenience: command line cypher the comfortable way with httpie

November 24th, 2013 No comments

Today a tweet from @rafacm drove my attention to httpie, a cURL tool for humans. Using cURL is cumbersome and noisy, httpie makes it fairly easily usable.

install httpie

Httpie is python based therefore it’s trivial to install using easy_install/pip. On my Ubuntu 13.10 box I’ve used:

sudo pip install --upgrade httpie

It fetches the package and installs it locally.

running cypher statements

Running cypher statements is much more easy than using cURL. In cURL you have to manually assemble a json snippet to pass in, deal with content types. httpie makes it easy. The following code snippets are intended to be run on a single line, for readability I’ve splitted them.

To use the old-style non-transactional cypher endpoint just use

http -b -j localhost:7474/db/data/cypher
 query="START n=node(*) return n limit 2"

for non-parameterized queries and

http -b -j localhost:7474/db/data/cypher 
  query="MATCH (m:Movie {title:{title}}) return m" 
  params:='{"title":"The Matrix"}'

for parameterized queries. Httpie sets the ‘Accept’ header to json if option -j is used. In case  of -j httpie assembles the key-value request items internally into json format as well. Since the params contains a json map it needs to be assigned with “:=” instead of just “=”.

For the transactional Cypher endpoint in Neo4j 2.0 httpie can be used like this:

http -b -j localhost:7474/db/data/transaction/commit 
 statements:='[{"statement": "MATCH (m:Movie {title:{title}}) return m", 
  "parameters": {"title":"The Matrix"} }]'
Categories: Uncategorized Tags: , , ,

some experiments with ratpack and neo4j

September 25th, 2013 No comments

Back in May this year I’ve attended the Gr8conf in Copenhagen. As always this conference added couple of things to my personal “take-a-look-at-this” list. The most exciting thingy for me was ratpack, a lean toolkit for building web applications on the JVM. Ratpack is powered by Netty and provides an event driven network engine as opposed to classic servlet based containers like Tomcat or Jetty which bind threads to requests. In high load scenarios with a huge number of concurrent requests the thread based model suffers from thread blocking wheres Ratpack is almost non blocking. To get familiar with Ratpack I’ve decided to implement a server component for Neo4j based on Ratpack. The first goal was to have a cypher endpoint, just like the standard Neo4j offers. Secondary goals were some more features:

  • support for multiple output formats: json, html, csv, message pack
  • ability to get a list of currently running queries and a button to abort each one individually. This is IMHO a feature lacking in classic Neo4j server. Esp. people getting started with cypher tend to write queries that run very long and there is currently now way to abort them.

For the future I’d like to add some more features:

  • transactional cypher endpoint
  • tbd (if you have ideas, please send a comment)

The goal is by far not to create a full fledged alternative to the existing Neo4j server. This project should focus on maximum throughput and ease of use for a cypher-only server component. To get started I’ve cloned https://github.com/ratpack/example-ratpack-gradle-groovy-app. You’ll find my code at https://github.com/sarmbruster/neo4j-ratpack.

Handling Requests

In ratpack you either write inline handlers in src/ratpack/ratpack.groovy or, for more complex cases, write a handler class derived from AbstractHandler and register that in ratpack.groovy.

Ratpack features Google Guice as well, so we can register e.g. a GraphDatabaseService as injectable component. See Neo4jModule, we’re exposing and configuring a GraphDatabaseService, a Cypher ExecutionEngine, a guard (see below) and a QueryRegistry. Other components can refer to them using the @Inject constructor annotation.

The core piece of code is CypherHandler, it parses the cypher command and parameters out of the request, runs it and renders the result depending on the requested content type.

Terminate Queries

From tech perspective this was the most interesting part to write. Neo4j can be run with a optional guard. Since this feature is not part of the public API it is not officially documented and might therefore be changed without further notice – be warned. To enable the guard feature a config option execution_guard_enabled needs to be set to true. However you can get access to the guard by calling ((GraphDatabaseAPI)graphDb).dependencyResolver.resolveDependency(Guard.class). In neo4j-ratpack the guard is exposed as a guice component so any ratpack handler can just inject it.

Each query is registered with a QueryRegistry. Part of that process is setting up a VetoGuard that throws an exception based on a boolean flag. In case of an exception the query is aborted.

Load Tests

Next step was running some load tests to a standard Neo4j server and neo4j-ratpack in order to compare the performance of the server components. All tests were run on my ThinkPad x230 (i7-3520M, 2.9GHz, 16 GB RAM, Ubuntu 13.04). For simplicity load generation and the server itself were running on the same machine – which is by far not perfect, but a starting point.

The intention of these load tests is not measuring Neo4j itself – it focusses on the server component only.

Using jmeter I’ve run a cypher query

START person=node:person(firstName={firstName}) 
WITH person 
ORDER BY person.lastName LIMIT 10 
MATCH (uniCity)<-[:IS_LOCATED_IN]-(uni)<-[studyAt:STUDY_AT]-(person), 
    (company)<-[worksAt:WORKS_AT]-(person)-[:IS_LOCATED_IN]->(personCity), 
    (company)-[:IS_LOCATED_IN]->(companyCountry) 
RETURN person.firstName, person.lastName, person.birthday, person.creationDate, person.gender, person.browserUsed, person.locationIP, personCity.name, uni.name, studyAt.classYear, uniCity.name, company.name, worksAt.workFrom,companyCountry.name

with different parameters against a graph db consisting of 1.6M nodes, 7M relationships and 7M properties. Kudos to my colleague Alex who helped me setting up the dataset based from the LDBC project he’s involved with.

Exactly the same graph.db was used by both Neo4j server and neo4j-ratpack. No specific JVM tuning parameters were set. I’ve run the load test with a increasing number of concurrent threads and focussed on observing throughput and latency. The following diagrams were created using a python matplot script orginating from http://www.metaltoad.com/blog/plotting-your-load-test-jmeter. Please note, the latency is displayed in green on logarithmic axis, throughput is in blue on linear axis (ranges are different for the diagrams).

neo4jserver_jdk7

.
ratpack_jdk7

 

We’re observing a increasing rate of errors when going beyond 25k threads. Since the loadgenerator is colocated with the system to test this seems to be point where jmeter’s own memory and CPU consumption influences the system under test too much – so we’ll disregard the range above 25k.

The most interesting finding is that with ratpack the latency remains nearly constant in the range of [2.5k - 10k] threads whereas the standard neo4j server shows increasing latency. At 2.5k threads ratpack shows fully saturated CPU that’s why throughput decreases. With more or faster CPU we could improve both, latency and throughput. The explanation for the difference observed can be found in the different threading model. Neo4j server uses internally jetty which does blocking IO in opposite to ratpack using Netty. To verify this, I’ve taken threaddumps with yourkit:

threading telemetry of neo4j server

threading telemetry of neo4j server

threading telemetry of neo4j-ratpack

threading telemetry of neo4j-ratpack

It’s interesting to see that Neo4j server uses 10 worker threads per core (40 in total on my laptop). Most of the time, most of them are in blocked status indicated by the red color. Ratpack on the other side has 8 worker threads being mostly in ‘green’ aka runnable status. So ratpack indeed uses non blocking IO.

Conclusion

For cypher-only use cases with high concurrency requirements using ratpack instead of neo4j server might be an interesting alternative. However be aware, ratpack is bleeding edge, the current version is 0.9-SNAPSHOT.

 

Categories: Uncategorized Tags: , , ,

nice addition back in Neo4j 1.9.1: closeable ExecutionResult

September 16th, 2013 No comments

When using Cypher from Java code one instantiates a ExecutionEngine and calls execute to get a instance of ExecutionResult. ExecutionResult is an Iterable and therefore provides access to an iterator() method. Up to Neo4j 1.9 it is recommended to fully consume the iterator until hasNext() returns null, otherwise it’s not guaranteed that all resources are freed up again.

Since Neo4j 1.9.1 ExecutionResult implements ResourceIterable as well. This means the iterator has a close() method to free up bound resources without completely consuming the iterator.

I guess a lot of Neo4j users might not have explored that small but very helpful addition yet, so I think it’s worth mentioning.

Categories: Uncategorized Tags: ,

assigning UUIDs to Neo4j nodes and relationships

August 20th, 2013 7 comments

TL;DR: This blog post features a small demo project on github: neo4j-uuid and explains how to automatically assign UUIDs to nodes and relationships in Neo4j. A very brief introduction into Neo4j 1.9′s KernelExtensionFactory is included as well.

a little rant on Neo4j node/relationship ids

In a lot of use cases there is demand for storing a reference to a Neo4j node or relationship in a third party system. The first naive idea probably is to use the internal node/relationship id that Neo4j provides. Do not do that! Never!

You ask why? Well, Neo4j’s id is basically a offset in one of the store files Neo4j uses (with some math involved). Assume you delete couple of nodes. This produces holes in the store files that Neo4j might reclaim when creating new nodes later on. And since the id is a file offset there is a chance that the new node will have exactly the same id like the previously deleted node. If you don’t synchronously update all node id references stored elsewhere, you’re in trouble. If neo4j would be completely redeveloped from scratch the getId() method would not be part of the public API.

As long as you use node ids only inside e.g. a request of an application there’s nothing wrong. To repeat myself: Never ever store a node id in a third party system. I have officially warned you.

UUIDs

Enough of ranting, let’s see what we can do to safely store node references in an external system. Basically we need an identifier that has no semantics in contrast to the node id. A common approach to this is using Universally Unique Identifiers (UUID). Java JDK offers a UUID implementation, so we could potentially use UUID.randomUUID(). Unfortunately random UUIDs are slow to generate. A preferred approach is to use the machine’s MAC and a timestamp as base for the UUID – this should provide enough uniqueness. There a nice library out there at http://wiki.fasterxml.com/JugHome providing exactly what we need.

automatic UUID assignments

For convenience it would be great if all fresh created nodes and relationships get automatically assigned a uuid property without doing this explicitly. Fortunately Neo4j supports TransactionEventHandlers, a callback interface pluging into transaction handling. A TransactionEventHandler has a chance to modify or veto any transaction. It’s a sharp tool which can have significant negative performance impact if used the wrong way.

I’ve implemented a UUIDTransactionEventHandler that performs the following tasks:

  • populate a uuid property for each new node or relationship
  • reject a transaction if a manual modification of a uuid is attempted, either assigment or removal
 * 
    *
  • generates UUID properties for each new node and relationship
  • *
  • rejects any modification to pre-existing uuids
  • *
*/ public class UUIDTransactionEventHandler implements TransactionEventHandler { public static final String UUID_PROPERTY_NAME = "uuid"; public static final String UUID_INDEX_NAME = "uuid"; private final TimeBasedGenerator uuidGenerator = Generators.timeBasedGenerator(); private final GraphDatabaseService graphDatabaseService; private Index nodeUuidIndex; private RelationshipIndex relationshipUuidIndex; public UUIDTransactionEventHandler(GraphDatabaseService graphDatabaseService) { this.graphDatabaseService = graphDatabaseService; } @Override public Object beforeCommit(TransactionData transactionData) throws Exception { checkForUuidDeletion(transactionData.removedNodeProperties(), transactionData); checkForUuidAssignment(transactionData.assignedNodeProperties()); checkForUuidDeletion(transactionData.removedRelationshipProperties(), transactionData); checkForUuidAssignment(transactionData.assignedRelationshipProperties()); initIndexes(); populateUuidsFor(transactionData.createdNodes(), nodeUuidIndex); populateUuidsFor(transactionData.createdRelationships(), relationshipUuidIndex); return null; } private void initIndexes() { if (nodeUuidIndex == null) { IndexManager indexManager = graphDatabaseService.index(); nodeUuidIndex = indexManager.forNodes(UUID_INDEX_NAME); } if (relationshipUuidIndex==null) { IndexManager indexManager = graphDatabaseService.index(); relationshipUuidIndex = indexManager.forRelationships(UUID_INDEX_NAME); } } @Override public void afterCommit(TransactionData data, java.lang.Object state) { } @Override public void afterRollback(TransactionData data, java.lang.Object state) { }

setting up using KernelExtensionFactory

There are two remaining tasks for full automation of UUID assignments:

  • we need to setup autoindexing for uuid properties to have a convenient way to look up nodes or relationships by UUID
  • we need to register UUIDTransactionEventHandler with the graph database

Since version 1.9 Neo4j has the notion of KernelExtensionFactory. Using KernelExtensionFactory you can supply a class that receives lifecycle callbacks when e.g. Neo4j is started or stopped. This is the right place for configuring autoindexing and setting up the TransactionEventHandler. Since JVM’s ServiceLoader is used KernelExtenstionFactories need to be registered in a file META-INF/services/org.neo4j.kernel.extension.KernelExtensionFactory by listing all implementations you want to use:

org.neo4j.extension.uuid.UUIDKernelExtensionFactory

KernelExtensionFactories can declare dependencies, therefore declare a inner interface (“Dependencies” in code) below that just has getters. Using proxies Neo4j will implement this class and supply you with the required dependencies. The dependencies are match on requested type, see Neo4j’s source code what classes are supported for being dependencies. KernelExtensionFactories must implement a newKernelExtension method that is supposed to return a instance of LifeCycle.

For our UUID project we return a instance of UUIDLifeCycle:

package org.neo4j.extension.uuid;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.event.TransactionEventHandler;
import org.neo4j.graphdb.index.IndexManager;
import org.neo4j.kernel.lifecycle.LifecycleAdapter;

import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/**
 * handle the setup of auto indexing for UUIDs and registers a {@link UUIDTransactionEventHandler}
 */
class UUIDLifeCycle extends LifecycleAdapter {

    private TransactionEventHandler transactionEventHandler;
    private GraphDatabaseService graphDatabaseService;
    private IndexManager indexManager;
    private ScheduledFuture scheduledFuture;

    UUIDLifeCycle(GraphDatabaseService graphDatabaseService) {
        this.graphDatabaseService = graphDatabaseService;
        this.indexManager = graphDatabaseService.index();
    }

    @Override
    public void start() throws Throwable {
        transactionEventHandler = new UUIDTransactionEventHandler(graphDatabaseService);
        graphDatabaseService.registerTransactionEventHandler(transactionEventHandler);

        scheduledFuture = new ScheduledThreadPoolExecutor(1).scheduleWithFixedDelay(new Runnable() {
            @Override
            public void run() {
                try {
                    indexManager.forNodes(UUIDTransactionEventHandler.UUID_INDEX_NAME);
                    indexManager.forRelationships(UUIDTransactionEventHandler.UUID_INDEX_NAME);
                    scheduledFuture.cancel(false);
                } catch (IllegalArgumentException e) {
                    // thrown if index creation fails while startup is still in progress
                }

            }
        }, 10, 200, TimeUnit.MILLISECONDS);
    }

    @Override
    public void stop() throws Throwable {
        graphDatabaseService.unregisterTransactionEventHandler(transactionEventHandler);
    }

}

Most of the code is pretty much straight forward, l.44/45 set up autoindexing for uuid property. l48 registers the UUIDTransactionEventHandler with the graph database. Not that obvious is the code in the init() method. Neo4j’s NodeAutoIndexerImpl configures autoindexing itself and switches it on or off depending on the respective config option. However we want to have autoindexing always switched on. Unfortunately NodeAutoIndexerImpl is run after our code and overrides our settings. That’s we l.37-40 tweaks the config settings to force nice behaviour of NodeAutoIndexerImpl.

looking up nodes or relationships for uuid

For completeness the project also contains a trivial unmanaged extension for looking up nodes and relationships using the REST interface, see UUIDRestInterface. By sending a HTTP GET to http://localhost:7474/db/data/node/<myuuid> the node’s internal id returned.

build system and testing

For building the project, Gradle is used;  build.gradle is trivial. Of course couple of tests are included. As a long standing addict I’ve obviously used Spock for testing. See the test code here.

final words

A downside of this implementation is that each and every node and relationships gets indexed. Indexing always trades write performance for read performance. Keep that in mind. It might make sense to get rid of unconditional auto indexing and put some domain knowledge into the TransactionEventHandler to assign only those nodes uuids and index them that are really used for storing in an external system.

Categories: Uncategorized Tags: , , ,

configuring a Neo4j GraphDatabaseService via Spring

August 13th, 2013 2 comments

Starting with Neo4j 1.9 the constructors for EmbeddedGraphDatabase and HighlyAvailableGraphDatabase were deprecated. The recommended way to instantiate those from Java is by using factory classes as documented in the reference manual. In the case of HighlyAvailableGraphDatabase use a different factory class HighlyAvailableGraphDatabaseFactory.

If you’re using Spring for wiring components or Spring Data Neo4j then it’s up to the container to build up the GraphDatabaseService for you. Doing this with factories is a little bit more tricky but I’ve create two gists.

Creating a EmbeddedGraphDatabase

Creating a HighlyAvailableGraphDatabase

Categories: Uncategorized Tags: