some experiments with ratpack and neo4j

Back in May this year I attended Gr8conf in Copenhagen. As always, this conference added a couple of things to my personal “take-a-look-at-this” list. The most exciting one for me was Ratpack, a lean toolkit for building web applications on the JVM. Ratpack is powered by Netty and provides an event-driven network engine, as opposed to classic servlet-based containers like Tomcat or Jetty, which bind threads to requests. In high-load scenarios with a huge number of concurrent requests the thread-based model suffers from thread blocking, whereas Ratpack is almost non-blocking. To get familiar with Ratpack I decided to implement a server component for Neo4j based on it. The first goal was to have a Cypher endpoint, just like the one the standard Neo4j server offers. Secondary goals were some more features:

  • support for multiple output formats: json, html, csv, message pack
  • ability to get a list of currently running queries and a button to abort each one individually. This is IMHO a feature lacking in the classic Neo4j server. Especially people getting started with Cypher tend to write queries that run very long, and there is currently no way to abort them.

For the future I’d like to add some more features:

  • transactional cypher endpoint
  • tbd (if you have ideas, please send a comment)

The goal is by no means to create a full-fledged alternative to the existing Neo4j server. This project should focus on maximum throughput and ease of use for a Cypher-only server component. To get started I cloned https://github.com/ratpack/example-ratpack-gradle-groovy-app. You’ll find my code at https://github.com/sarmbruster/neo4j-ratpack.

Handling Requests

In ratpack you either write inline handlers in src/ratpack/ratpack.groovy or, for more complex cases, write a handler class derived from AbstractHandler and register that in ratpack.groovy.

Ratpack features Google Guice as well, so we can register e.g. a GraphDatabaseService as an injectable component. See Neo4jModule: there we’re exposing and configuring a GraphDatabaseService, a Cypher ExecutionEngine, a guard (see below) and a QueryRegistry. Other components can refer to them using the @Inject constructor annotation.

The core piece of code is CypherHandler. It parses the Cypher command and parameters out of the request, runs the query and renders the result depending on the requested content type.

Terminate Queries

From a technical perspective this was the most interesting part to write. Neo4j can be run with an optional guard. Since this feature is not part of the public API it is not officially documented and might therefore be changed without further notice – be warned. To enable the guard feature, the config option execution_guard_enabled needs to be set to true. Once it is enabled, you can get access to the guard by calling ((GraphDatabaseAPI)graphDb).dependencyResolver.resolveDependency(Guard.class). In neo4j-ratpack the guard is exposed as a Guice component so any Ratpack handler can just inject it.

Each query is registered with a QueryRegistry. Part of that process is setting up a VetoGuard that throws an exception based on a boolean flag. In case of an exception the query is aborted.
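
To illustrate the idea of a registry plus a flag-based veto, here is a minimal sketch in plain Java. It is not the code from neo4j-ratpack and it does not use Neo4j’s Guard API; all class and method names (AbortableQueryRegistry, RunningQuery, check) are made up for illustration. The assumption is simply that the running query calls check() from time to time, and that an exception thrown there aborts it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the registry idea; none of these names come from neo4j-ratpack.
class AbortableQueryRegistry {

    /** Thrown from inside the running query once its abort flag is set. */
    static class QueryAbortedException extends RuntimeException {
        QueryAbortedException(long id) { super("query " + id + " was aborted"); }
    }

    /** Handle for one running query: the statement text plus its abort flag. */
    static class RunningQuery {
        final long id;
        final String cypher;
        private final AtomicBoolean abortRequested = new AtomicBoolean(false);

        RunningQuery(long id, String cypher) { this.id = id; this.cypher = cypher; }

        /** The "veto" check: meant to be called periodically while the query does work. */
        void check() {
            if (abortRequested.get()) throw new QueryAbortedException(id);
        }
    }

    private final AtomicLong ids = new AtomicLong();
    private final Map<Long, RunningQuery> running = new ConcurrentHashMap<>();

    /** Registers a query before it starts and returns its handle. */
    RunningQuery register(String cypher) {
        RunningQuery q = new RunningQuery(ids.incrementAndGet(), cypher);
        running.put(q.id, q);
        return q;
    }

    /** Removes a query once it has finished or was aborted. */
    void unregister(RunningQuery q) { running.remove(q.id); }

    /** Flips the abort flag; returns false if no such query is running. */
    boolean abort(long id) {
        RunningQuery q = running.get(id);
        if (q == null) return false;
        q.abortRequested.set(true);
        return true;
    }

    int runningCount() { return running.size(); }
}
```

A handler would call register() before executing, unregister() in a finally block, and the “abort” button would simply call abort(id) from another request.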

Load Tests

The next step was running some load tests against a standard Neo4j server and neo4j-ratpack in order to compare the performance of the server components. All tests were run on my ThinkPad x230 (i7-3520M, 2.9GHz, 16 GB RAM, Ubuntu 13.04). For simplicity, load generation and the server itself were running on the same machine, which is far from perfect but a starting point.

The intention of these load tests is not to measure Neo4j itself; they focus on the server component only.

Using jmeter I’ve run a cypher query

with different parameters against a graph db consisting of 1.6M nodes, 7M relationships and 7M properties. Kudos to my colleague Alex, who helped me set up the dataset based on the LDBC project he’s involved with.

Exactly the same graph.db was used by both Neo4j server and neo4j-ratpack. No specific JVM tuning parameters were set. I ran the load test with an increasing number of concurrent threads and focussed on observing throughput and latency. The following diagrams were created using a Python matplotlib script originating from http://www.metaltoad.com/blog/plotting-your-load-test-jmeter. Please note that the latency is displayed in green on a logarithmic axis and throughput in blue on a linear axis (the ranges differ between the diagrams).

[Figure: neo4jserver_jdk7]

[Figure: ratpack_jdk7]

We’re observing an increasing rate of errors when going beyond 25k threads. Since the load generator is colocated with the system under test, this seems to be the point where jmeter’s own memory and CPU consumption influences the system under test too much, so we’ll disregard the range above 25k.

The most interesting finding is that with Ratpack the latency remains nearly constant in the range of [2.5k – 10k] threads, whereas the standard Neo4j server shows increasing latency. At 2.5k threads Ratpack shows a fully saturated CPU, which is why throughput decreases. With more or faster CPUs we could improve both latency and throughput. The explanation for the observed difference can be found in the different threading models. Neo4j server internally uses Jetty, which does blocking IO, in contrast to Ratpack, which uses Netty. To verify this, I’ve taken thread dumps with YourKit:

[Figure: threading telemetry of neo4j server]
[Figure: threading telemetry of neo4j-ratpack]

It’s interesting to see that Neo4j server uses 10 worker threads per core (40 in total on my laptop). Most of the time, most of them are in blocked status, indicated by the red color. Ratpack on the other hand has 8 worker threads that are mostly in ‘green’, aka runnable, status. So Ratpack indeed uses non-blocking IO.

Conclusion

For Cypher-only use cases with high concurrency requirements, using Ratpack instead of Neo4j server might be an interesting alternative. However, be aware that Ratpack is bleeding edge; the current version is 0.9-SNAPSHOT.

 

nice addition back in Neo4j 1.9.1: closeable ExecutionResult

When using Cypher from Java code one instantiates an ExecutionEngine and calls execute to get an instance of ExecutionResult. ExecutionResult is an Iterable and therefore provides access to an iterator() method. Up to Neo4j 1.9 it was recommended to fully consume the iterator until hasNext() returns false, otherwise it’s not guaranteed that all resources are freed up again.

Since Neo4j 1.9.1 ExecutionResult implements ResourceIterable as well. This means the iterator has a close() method to free up bound resources without completely consuming the iterator.
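
The pattern can be sketched without any Neo4j dependency. The class below is a hypothetical stand-in (CloseableRows is not a Neo4j type) for an iterator that holds a resource until it is either fully consumed or closed explicitly; Java’s try-with-resources then takes care of the early release.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a result iterator that holds a resource until exhausted or closed.
class CloseableRows implements Iterator<String>, AutoCloseable {
    private final Iterator<String> rows;
    private boolean open = true;

    CloseableRows(List<String> rows) { this.rows = rows.iterator(); }

    @Override public boolean hasNext() {
        boolean more = open && rows.hasNext();
        if (!more) close();          // fully consumed: release automatically
        return more;
    }

    @Override public String next() { return rows.next(); }

    /** Releases the resource early; safe to call more than once. */
    @Override public void close() { open = false; }

    boolean isOpen() { return open; }

    /** Reads only the first row, then relies on try-with-resources to close. */
    static String firstRow(CloseableRows result) {
        try (CloseableRows r = result) {
            return r.hasNext() ? r.next() : null;
        }
    }
}
```

With firstRow() the remaining rows are never touched, yet the iterator ends up closed, which is exactly what was not possible before the iterator gained a close() method.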

I guess a lot of Neo4j users might not have explored that small but very helpful addition yet, so I think it’s worth mentioning.

assigning UUIDs to Neo4j nodes and relationships

TL;DR: This blog post features a small demo project on github: neo4j-uuid and explains how to automatically assign UUIDs to nodes and relationships in Neo4j. A very brief introduction into Neo4j 1.9’s KernelExtensionFactory is included as well.

a little rant on Neo4j node/relationship ids

In a lot of use cases there is demand for storing a reference to a Neo4j node or relationship in a third party system. The first naive idea probably is to use the internal node/relationship id that Neo4j provides. Do not do that! Never!

You ask why? Well, Neo4j’s id is basically an offset in one of the store files Neo4j uses (with some math involved). Assume you delete a couple of nodes. This produces holes in the store files that Neo4j might reclaim when creating new nodes later on. And since the id is a file offset, there is a chance that a new node will have exactly the same id as a previously deleted node. If you don’t synchronously update all node id references stored elsewhere, you’re in trouble. If Neo4j were completely redeveloped from scratch, the getId() method would not be part of the public API.

As long as you use node ids only inside e.g. a request of an application there’s nothing wrong. To repeat myself: Never ever store a node id in a third party system. I have officially warned you.
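
The reuse effect is easy to reproduce with a toy model. The sketch below is not how Neo4j’s store files are implemented; SlotStore is a made-up class in which ids are slot numbers and freed slots are handed out again, which is the only property that matters here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy record store: ids are slot numbers, and freed slots are handed out again.
// This only models the reuse effect; it is not how Neo4j's store files work internally.
class SlotStore {
    private final Map<Long, String> records = new HashMap<>();
    private final Deque<Long> freeSlots = new ArrayDeque<>();
    private long nextSlot = 0;

    /** Stores a record in a freed slot if one exists, otherwise in a fresh slot. */
    long create(String payload) {
        long id = freeSlots.isEmpty() ? nextSlot++ : freeSlots.pop();
        records.put(id, payload);
        return id;
    }

    /** Deletes a record and marks its slot as reusable. */
    void delete(long id) {
        if (records.remove(id) != null) freeSlots.push(id);
    }

    String get(long id) { return records.get(id); }
}
```

If an external system remembered the id of “Alice”, and Alice is deleted before “Carol” is created, a later lookup by that id silently returns Carol.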

UUIDs

Enough ranting, let’s see what we can do to safely store node references in an external system. Basically we need an identifier that, in contrast to the node id, has no semantics. A common approach to this is using Universally Unique Identifiers (UUIDs). The JDK offers a UUID implementation, so we could potentially use UUID.randomUUID(). Unfortunately random UUIDs are slow to generate. A preferred approach is to use the machine’s MAC address and a timestamp as the base for the UUID; this should provide enough uniqueness. There is a nice library out there at http://wiki.fasterxml.com/JugHome providing exactly what we need.

automatic UUID assignments

For convenience it would be great if all freshly created nodes and relationships automatically got a uuid property assigned, without doing this explicitly. Fortunately Neo4j supports TransactionEventHandlers, a callback interface plugging into transaction handling. A TransactionEventHandler has a chance to modify or veto any transaction. It’s a sharp tool which can have a significant negative performance impact if used the wrong way.

I’ve implemented a UUIDTransactionEventHandler that performs the following tasks:

  • populate a uuid property for each new node or relationship
  • reject a transaction if a manual modification of a uuid is attempted, either assignment or removal

See src/main/java/org/neo4j/extension/uuid/UUIDTransactionEventHandler.java (lines 20-72) in the neo4j-uuid repository (github user sarmbruster).
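
For readers who just want the gist of the two tasks, here is a simplified model in plain Java. It deliberately does not use Neo4j’s TransactionEventHandler interface: entities are plain maps, PropertyChange is a made-up type, and throwing an exception stands in for vetoing the transaction. It also uses UUID.randomUUID() for brevity, whereas the approach preferred above is time-based.

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Simplified model of a "before commit" hook. The types here are hypothetical and
// stand in for Neo4j's transaction data; they are not the Neo4j API.
class UuidBeforeCommit {
    static final String UUID_PROPERTY = "uuid";

    /** One property change made by the transaction (newValue == null means removal). */
    static class PropertyChange {
        final String key;
        final Object newValue;
        PropertyChange(String key, Object newValue) { this.key = key; this.newValue = newValue; }
    }

    /**
     * Rejects manual changes to the uuid property, then stamps every entity created in
     * this transaction. Throwing here corresponds to vetoing the transaction.
     */
    static void beforeCommit(List<Map<String, Object>> created, List<PropertyChange> changes) {
        for (PropertyChange c : changes) {
            if (UUID_PROPERTY.equals(c.key)) {
                throw new IllegalStateException("manual change of uuid is not allowed");
            }
        }
        for (Map<String, Object> entity : created) {
            // Assumption: random UUIDs for brevity; the post prefers time-based ones.
            entity.put(UUID_PROPERTY, UUID.randomUUID().toString());
        }
    }
}
```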

setting up using KernelExtensionFactory

There are two remaining tasks for full automation of UUID assignments:

  • we need to setup autoindexing for uuid properties to have a convenient way to look up nodes or relationships by UUID
  • we need to register UUIDTransactionEventHandler with the graph database

Since version 1.9 Neo4j has the notion of a KernelExtensionFactory. Using a KernelExtensionFactory you can supply a class that receives lifecycle callbacks when e.g. Neo4j is started or stopped. This is the right place for configuring autoindexing and setting up the TransactionEventHandler. Since the JVM’s ServiceLoader is used, KernelExtensionFactories need to be registered in a file META-INF/services/org.neo4j.kernel.extension.KernelExtensionFactory by listing all implementations you want to use:

See src/main/resources/META-INF/services/org.neo4j.kernel.extension.KernelExtensionFactory in the neo4j-uuid repository (github user sarmbruster).

KernelExtensionFactories can declare dependencies. To do so, declare an inner interface (“Dependencies” in the code below) that just has getters. Using proxies, Neo4j will implement this interface and supply you with the required dependencies. The dependencies are matched on the requested type; see Neo4j’s source code for which classes are supported as dependencies. KernelExtensionFactories must implement a newKernelExtension method that is supposed to return an instance of LifeCycle.

For our UUID project we return an instance of UUIDLifeCycle:

See src/main/java/org/neo4j/extension/uuid/UUIDLifeCycle.java in the neo4j-uuid repository (github user sarmbruster).

Most of the code is pretty straightforward: l.44/45 set up autoindexing for the uuid property, and l.48 registers the UUIDTransactionEventHandler with the graph database. Less obvious is the code in the init() method. Neo4j’s NodeAutoIndexerImpl configures autoindexing itself and switches it on or off depending on the respective config option. However, we want to have autoindexing always switched on. Unfortunately NodeAutoIndexerImpl runs after our code and overrides our settings. That’s why l.37-40 tweak the config settings to force nice behaviour of NodeAutoIndexerImpl.

looking up nodes or relationships for uuid

For completeness the project also contains a trivial unmanaged extension for looking up nodes and relationships using the REST interface, see UUIDRestInterface. When you send an HTTP GET to http://localhost:7474/db/data/node/<myuuid>, the node’s internal id is returned.

build system and testing

For building the project, Gradle is used; build.gradle is trivial. Of course a couple of tests are included. As a long-standing addict I’ve obviously used Spock for testing. See the test code here.

final words

A downside of this implementation is that each and every node and relationship gets indexed. Indexing always trades write performance for read performance. Keep that in mind. It might make sense to get rid of unconditional auto indexing and put some domain knowledge into the TransactionEventHandler, so that only those nodes that are really referenced from an external system get a uuid assigned and indexed.

configuring a Neo4j GraphDatabaseService via Spring

Starting with Neo4j 1.9 the constructors for EmbeddedGraphDatabase and HighlyAvailableGraphDatabase were deprecated. The recommended way to instantiate those from Java is by using factory classes as documented in the reference manual. In the case of HighlyAvailableGraphDatabase use a different factory class HighlyAvailableGraphDatabaseFactory.

If you’re using Spring for wiring components, or Spring Data Neo4j, then it’s up to the container to build up the GraphDatabaseService for you. Doing this with factories is a little bit more tricky, but I’ve created two gists.

Creating an EmbeddedGraphDatabase

See gist 6028963 on github.

Creating a HighlyAvailableGraphDatabase

See gist 6222698 on github.

how to use Cypher over JDBC from Groovy

Groovy has a very convenient to use API for accessing databases over JDBC. My colleagues at Neo Technology brought up a JDBC driver for Cypher. It’s very easy to bring these two together. The only ugly thing here is that you have to use a class called “Sql” to emit Cypher statements.

See gist 6094267 on github.

Dependency management is done by using the @Grab annotations. However, there is some tweaking required. Not all required libraries are found on Maven central, so we need to add two repos, one for neo4j and one for restlet. Since we need to register a JDBC driver, the dependencies must be configured on system classloader level.

Update:

The Neo4j Cypher JDBC driver also allows the usage of parameterized Cypher, which JDBC users know by the term “prepared statement”. This is shown in l. 16. Since JDBC does not support named parameters you have to use numbers for the parameters, starting with “{1}”, and provide a list of parameter values.

With this small example you can access your Neo4j server very easily.

 

 

upgrading an old Neo4j database using Groovy

In a project I’m involved with there is still a very old Neo4j 1.0 database used. Now this database should be used with an up-to-date version of Neo4j (1.7.2 as of this writing).

According to the Neo4j docs, upgrades are done incrementally through every x.y version in between, by starting up and shutting down the DB with each of them. For upgrading, the config parameter allow_store_upgrade=true must be set.

I’ve found manually downloading each intermediate version too boring and hacked a short groovy script helping with upgrading the datastore, see https://gist.github.com/3011606.

The script must be configured with the right database directory, and for the desired target version of Neo4j you uncomment the matching @Grab annotation. So when going from 1.0 to 1.8 this script must be called 8 times, each time with the next @Grab activated.

For those who wonder what the @Grab annotation does: it accesses a maven repository, downloads the dependencies behind the scenes and adds them to the class path.

The upgrade itself is trivial, just fire up an EmbeddedGraphDatabase with allow_store_upgrade=true and shut it down afterwards.
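
The sequence of runs can be written down as a tiny helper. This is only a sketch of the counting, assuming one run per 1.x release line; UpgradePath and steps are made-up names, and the exact artifact versions to put into each @Grab are not derived here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the 1.x release lines to start up in turn.
class UpgradePath {
    /** Returns "1.(fromMinor+1)" up to and including "1.toMinor". */
    static List<String> steps(int fromMinor, int toMinor) {
        List<String> versions = new ArrayList<>();
        for (int minor = fromMinor + 1; minor <= toMinor; minor++) {
            versions.add("1." + minor);
        }
        return versions;
    }
}
```

UpgradePath.steps(0, 8) yields the eight release lines 1.1 through 1.8, matching the eight script runs mentioned above.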

update on Grails Neo4j GORM plugin

The milestone release 1.0.0.M2 of the Neo4j Grails GORM plugin was published a couple of days ago. The plugin provides a GORM compliant implementation backed by a Neo4j datastore. This means you can switch any Grails application to use Neo4j by simply exchanging the GORM plugin used.

Plugin documentation can be found at http://springsource.github.com/grails-data-mapping/neo4j/manual/index.html http://projects.spring.io/grails-data-mapping/neo4j/index.html. There is also a very minimal demo application available at http://neo4j-grails-demo.herokuapp.com/, see https://github.com/sarmbruster/neo4jsample for the source code. The demo app consists of three trivial domain classes with scaffolding controllers – nothing more for now.

Since the Neo4j Challenge is currently in progress, I’ve decided to participate. As already stated, the neo4jsample demo application is very minimal, but its intention is to serve as a starting point for your own Grails application using Neo4j as datastore backend. If you want to support this project in the Neo4j Challenge, please send a tweet.

project setup for Grails with customized plugins using git submodules

Starting with its initial version a couple of years ago, Grails comes with a very nice predefined project structure, e.g. domain classes go into grails-app/domain. Every artefact has its well defined location. Almost every Grails application doing a little more than a simple “Hello world” will to some extent use one or more of the 500+ available plugins. The easiest way is to add plugins by using

The plugin gets downloaded from the central repository and added into your application, either in application.properties or into grails-app/conf/BuildConfig.groovy (default for Grails 2.0). In both cases the application contains only a reference to the installed version number.

This approach works very nicely as long as you don’t have to change anything inside the plugin. Several times I’ve had to modify external plugins, either to fix bugs or because of special requirements not yet covered by the released version. To deal with this, there are a couple of approaches:

1) most naive:

The simplest thing to do is just opening the plugin’s source files in your IDE and making your modifications. This has some real downsides:

  • your changes will be lost whenever you install a new upstream version of the plugin
  • since Grails unpacks installed plugins by default into $HOME/.grails/<grailsversion>/projects/bullseye2/plugins/<pluginname>, your changes are not part of any SCM and only reside locally.

I’ve warned you! Don’t do this!

2) build customized plugins in a separate location

  • download the sources of the desired plugin, unpack it outside your main application
  • modify the plugins
  • create ‘your’ version: grails package-plugin
  • switch to your application and install the generated plugin: grails install-plugin <zipfile>

This approach annoys me, since you need to repeat the ‘package-plugin’/‘install-plugin’ cycle for each and every change in the plugin. So inline plugins to the rescue…

3) use inline plugins and copy the plugin’s sources into your application’s repo

Grails has the ability to use ‘inline plugins’. There are a couple of nice blog posts covering this, so I’m just using Burt Beckwith’s slides as a reference. You basically unzip the plugin’s zip artefact into some folder inside your application (e.g. a plugins folder), reference it in grails-app/conf/BuildConfig.groovy and add the extracted files to your application’s repo.
In that case you’ve basically created a diverging copy of the plugin. Whenever there’s a new release of the plugin, integrating this back into your app might become a pita. You need to reapply your changes on top of the version of the plugin. That’s doable, but requires some advanced git technique, is error prone and can consume a lot of time – I know what I’m talking about here, trust me! Another major downside is that your local fixes and improvements are not easy to contribute back to upstream.

4) use inline plugins with git submodule

At that point, I’m expecting that you’re already using git for your project. If not, NOW is the time to do this and familiarize yourself with git. It’s worth it, promised!

  • create a distinct folder inside your project, e.g. ‘plugins’ that will contain all customized plugins
  • Find out the location of the plugin’s scm. A lot of plugins host their sources now on github. If so, fork this repository on github. If it’s using svn, you could mirror it to github, see a nice blog post for this. Basically you have now a forked plugin available on github.
  • clone the forked repo into your project, I’m using the spring-security-ui plugin here as an example:

    The nice thing is that your application’s repo contains now a reference to the plugin’s repo and it even remembers the sha-id of the plugin you’re currently using.
  • now add plugins/grails-spring-security-ui as an inline plugin by adding to grails-app/conf/BuildConfig.groovy
  • push your changes
  • last but not least ask your collaborators on the project to pull your changes, let them do git submodule update --init. This command is only required once for each working copy.

The really nice thing about this setup is that you can easily share your plugin changes with the upstream repository via a pull request. And the other way round is also pretty easy: when the upstream author accepts your pull request and/or adds some new functionality, you can directly consume this by going to your plugin’s directory and use

NB: this results in a to-be-committed change in your application’s repo, since the sha-id of the referenced plugin repo has changed.

some notes
  • whenever you commit something to a submodule the parent repo will have a non-empty status since your copy now references another sha-id.
  • choose the URL of the custom plugin repo carefully. If you’re the only developer making changes in the plugin, you might use the repo’s public address and locally override the push URL with your private URL by

    Now there’s a difference in fetch and push URLs:

    Your collaborators will only get the public fetch URL for both and won’t be able to commit to your private repo (that’s why we’re calling it private, right?).

    The other scenario is when you want multiple developers commit to the plugin repo. In this case you need to grant them access to your private repo (or even set up a github organization for that) and use the private URL in the ‘git submodule add’ command.

Contributing your changes back to the plugin upstream repo is very well explained at Fork a repo.

Conclusion

With the approach explained you’re able to customize any plugin and track the plugin changes from your application’s repo. Contributing your bugfixes and improvements back to the upstream author is a piece of cake now, as is benefiting from upstream changes.

I’m eagerly waiting for your thoughts and for discussion on this setup. Combining this with git-flow looks like a kind of best practice for Grails projects – at least for me.

embedding the git/hg/svn’s revision number inside a grails application

When your application goes to production, you should be prepared for handling bugs. Users and customers will find them and hopefully report them. More than once I’ve been in a situation where a well-written bug report including a stacktrace didn’t help me much, because the stacktrace didn’t match the current development state of the code. Therefore it simplifies life if the bug report contains a build number or an SCM revision number. With this small recipe, you can easily add this information to your application.

Find out your current SCM revision

Every version control system does this differently, so here’s a short summary of what command might be used for what SCM:

SCM          command to get revision information
Mercurial    hg id -i -n -b -t
Git          git rev-parse HEAD
Subversion   svnversion

If your SCM of choice is not listed and you know the command, please let me know by posting a comment.

Create a template that contains the revision information

For the following I’ll use Mercurial; it should be easy to adapt this to the SCM of your choice. Using a Gant script, scripts/_Events.groovy, you can hook into the build process and easily generate a Grails template that just contains the output of one of the above commands.

This recreates grails-app/views/_version.gsp upon each build, so we’re sure to always have the most recent revision ids there. It is crucial that the file containing the revision information is itself not under version control. To exclude this file you have to list it in .hgignore, .gitignore or use svn propedit svn:ignore.
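
The same two steps, run the SCM command and overwrite the template with its output, can be sketched in plain Java. This is not the Gant script from the post; VersionTemplate and its method names are made up, and hooking it into the Grails build is left out. It assumes the SCM executable is on the PATH.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Grails: fetches the revision and writes the template.
class VersionTemplate {
    /** Runs an SCM command such as "hg", "id", "-i", "-n", "-b", "-t" and returns its first output line. */
    static String revision(String... command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).redirectErrorStream(true).start();
        try (BufferedReader out = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String first = null;
            String line;
            while ((line = out.readLine()) != null) {   // drain all output, keep the first line
                if (first == null) first = line.trim();
            }
            p.waitFor();
            return first == null ? "unknown" : first;
        }
    }

    /** Overwrites the template file with the given revision string. */
    static void write(Path template, String revision) throws IOException {
        if (template.getParent() != null) Files.createDirectories(template.getParent());
        Files.write(template, revision.getBytes(StandardCharsets.UTF_8));
    }
}
```

A build step would then call something like VersionTemplate.write(Paths.get("grails-app/views/_version.gsp"), VersionTemplate.revision("git", "rev-parse", "HEAD")).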

Using the _version.gsp template

The final step is to embed the previously created _version.gsp somewhere in your code. For my use case, I want to have the revision ids available on each and every page, so I’ll put it into the layout template grails-app/views/layouts/main.gsp:

Be sure not to omit the leading “/” in the g:render tag, otherwise _version.gsp is not found in case of a more deeply nested view path.

a perfect team: Grails Taggable plugin and JQuery Tagit

In a recent Grails project the customer asked for support of a tagging functionality for some domain classes. In order not to clutter the tagspace too much auto-complete should be available when editing the tags.

In fact there is a very simple and elegant solution for this. On the application’s side, there’s the Grails Taggable plugin available. For the frontend side, JQuery has a nice plugin called Tagit that takes care of editing tags and auto-completion. Both play together very well, and showing this is the intention of this post.

Setting up the ‘to-be-tagged’ domain class

After installing the taggable plugin the usual way using

setting up the ‘to-be-tagged’ domain classes is fairly trivial, the only thing left is to add ‘implements Taggable’ to the ‘to-be-tagged’ domain classes, e.g.

By marking the domain class with the taggable interface it gets injected some methods for manipulating its tags on instance level

  • addTag,
  • addTags,
  • getTags,
  • parseTags,
  • removeTag,
  • setTags

as well as some new methods on class level:

  • getAllTags,
  • getTotalTags,
  • countByTag.

Setting up JQuery Tagit in the Grails application

Since the Tagit plugin depends on JQueryUI for autocompletion, let’s first install this in the application. Using the resources plugin for managing our css/js/image resources is also a good idea:

Download and unzip the latest version of tagit (1.5 when authoring this post) to some temporary location. We basically need to copy three files contained in the zip:

  • tagit/js/tagit.js to <grailsapp>/web-app/js
  • one of tagit/css/tagit-<yourchoice>.css and tagit/css/ui-anim_basic_16x16.gif to <grailsapp>/web-app/css

Since we’ve added resources to our project let’s declare them as a module for the resources plugin. For more information on this concept, check out the docs of the Grails resources plugin.

Editing the view

Next thing is the frontend. To get started, let’s generate a controller and views for the given domain class:

For simplicity, we’ll only care about the edit view for the rest of this post. The code of the generated view is the starting point. First we need to make use of the declared resources and second we need to render the already existing tags and apply the tagit plugin.

$(function() { $("ul[name='tags']").tagit({select:true, tagSource: "${g.createLink(action: 'tags')}"}); });

Tags

  • ${it}

L.3 adds the js/css resources. L.9 requires some explanation: we’re decorating the ul tag that has the attribute name='tags' with the tagit widget. The parameter select:true is crucial since it passes back the chosen tags upon form submission as a multivalued select-box. Specifying a tagSource URL provides auto-completion. In l.15-19 we’re displaying the existing tags in a simple unordered list. N.B. the name attribute mirrors the name of the select-box being created.

Editing the controller

The ProductController must be modified to store given tags upon form submission and to provide auto-completion. For storing tags, the update action requires a single line change:

Since tags isn’t a real property, it is not covered by l.69, and an explicit call to setTags() is necessary.

For auto-completion a new action tags is introduced:

JQuery-UI auto-completion passes the partially entered tag in request parameter term. The code above searches for all tags starting with the given term and returns their name as JSON.
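
The logic of such an action boils down to a prefix filter plus JSON rendering. Here is a minimal sketch in plain Java rather than the Grails controller code; TagSuggestions is a made-up class, the tag list is passed in instead of being queried from the Taggable plugin, and matching case-insensitively is my assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper illustrating the auto-completion lookup; not the Grails action itself.
class TagSuggestions {
    /** Returns the tags that start with the given term, ignoring case, in input order. */
    static List<String> startingWith(List<String> allTags, String term) {
        String prefix = term.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (String tag : allTags) {
            if (tag.toLowerCase(Locale.ROOT).startsWith(prefix)) matches.add(tag);
        }
        return matches;
    }

    /** Renders the matches as a JSON array of strings, escaping only quotes and backslashes. */
    static String toJson(List<String> tags) {
        StringBuilder json = new StringBuilder("[");
        for (int i = 0; i < tags.size(); i++) {
            if (i > 0) json.append(',');
            json.append('"')
                .append(tags.get(i).replace("\\", "\\\\").replace("\"", "\\\""))
                .append('"');
        }
        return json.append(']').toString();
    }
}
```

For the request parameter term=gr and the tags groovy, Grails and neo4j, the response body would be the JSON array containing groovy and Grails.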

Result & Conclusion

With these very few lines of code a comfortable user interface for tagging with auto-completion could be established. Kudos to the authors of the JQuery Tagit and Grails Taggable plugins. These two plugins are a perfect match.