Categories
Uncategorized

Cypher gotchas: multiple-match vs comma operator

Even as a long term Neo4j user with a 10y+ experience I’ve stumbled over something being new to me. Therefore I thought it’s worth sharing.

Categories
Uncategorized

Cypher fun: finding the position of an element in an array

At a onsite workshop with a new potential customer in wonderful Zurich I was challenged with a requirement I’ve never had in the last few years with Cypher.

Since I cannot disclose any details of the use case or the data used let’s create an artificial example with a similar structure:

  • a restaurant has several rooms
  • each room has various tables
  • each table is occupied with 0 to 6 guests

The goal of the query is to find the most occupied table for each room. As an example, I’ve created a Neo4j console at http://console.neo4j.org/r/ufmpwi.

The challenging here is that we want to do a kind of local filtering: for a given room we need the most occupied table. Calling the reduce function to the rescue. The idea is to run the reduce with 3 state variables:

  • the index of the highest occupation so far
  • the current index (aka iteration number)
  • the value of highest occupation so far
RETURN reduce(x=[0,0,0], i IN [1,2,2,5,2,1] | 
   CASE WHEN i>x[2] THEN [x[1],x[1]+1,o] ELSE [x[0], x[1]+1,x[2]] END 
)[0]

Reduce allows just one single state variable to be used but we can use a three element array instead (which is one variable 🙂 ). When the current element is larger than the maximum so far (aka the 3rd element of the state), we update the first element to the current position. The second element (current index) is always incremented. If the current element is not larger then the maximum so far, we just increment the current count (2nd element) and keep the other values:

The full query is:

MATCH (:Restaurant)-[:HAS_ROOM]->(room)-[:HAS_TABLE]->(table)
OPTIONAL MATCH (guest)-[:SITS_AT]->(table)
WITH room, table, count(guest) AS guestsAtTable
WITH room, collect(table) AS tables, collect(guestsAtTable) AS occupation
RETURN room, tables[reduce(x=[0,0,0], o IN occupation | 
   CASE WHEN o>x[2]
      THEN [x[1], x[1]+1,o]
      ELSE [x[0], x[1]+1,x[2]] 
   END 
)[0]] AS mostOccupied, tables, occupation

The first line is pretty much obvious. Since there might be tables without guests OPTIONAL MATCH is required in line 2.

Cypher does not allow you to do direct aggregations of aggregations. Using multiple WITH helps here. In line 3 we first calculate counts of guests per table. Line 4 returns one line per room with two collections – one holding the tables, the other their occupation. Note that both collections do have the same order. Finally from line 5 on the reduce function is applied to find the most occupied table in each room.

Categories
Uncategorized

quick tooling tip for hacking Cypher statements – Linux only

When developing Cypher statements for a Neo4j based application there are multiple ways to do this.

A lot of people (including myself) love the new Neo4j browser shipped with 2.0 and subsequent releases. This is a nicely built locally running web application running in your browser. At the top users can easily type their Cypher code and see results after executing, either in tabular form or as a visualization enabling to click through.

Neo4j 2.0 Browser

Another way is to use the command line and either go with neo4j-shell or use the REST interface by a command line client like cURL or more conveniently httpie (which I’ve previously blogged about).

Typically while building a Cypher statement you take a lot of cycles to hack a little bit, test if it runs, hack a little bit, test, …. This cycle can be improved by automating execution as soon as the file containing the cypher statement has hanged.

Linux comes with a kernel feature called inotify that reports file system changes to applications. On Ubuntu/Debian there is a package called inotify-hookable available offering a convenient way to set up tracking for a specific file or directory and take a action triggered by a change in the file/directory.

Assume you want to quickly develop a complex cypher statement in $HOME/myquery.cql. Set up monitoring using:

inotify-hookable -c ~/myquery.cql -c "(~/neo4j-enterprise-2.0.1/bin/neo4j-shell < ~/myquery.cql)"

Using your text editor of choice open $HOME/myquery.cql and change your code. After saving the statement will be automatically executed and you get instantly feedback.

Categories
Uncategorized

running Neo4j graphgists locally with docker.io

Neo4j has a excellent tool for documenting graph models called graphgists. As the name suggests graphgists are typically stored as github gists in asciidoc format. Additionally to the regular asciidoc you can embed executable cypher and a Neo4j console in a graphgist. For most people it’s perfectly fine hosting their graphgists at github or dropbox. If you want to keep your graphgists private just use a secret gist.

However there are companies with an higher demand for security and privacy that don’t want to expose their stuff onto public networks. Graphgists itself is javascript based and works locally. For the console part it by default connects to http://console.neo4j.org to do the graph operations.

One of my colleagues at Neo Technology recently pointed me to docker.io, a nice LXC based tool to create, maintain, run and share lightweight containers. To get familiar with docker I’ve decided to set up a small project to make graphgists and Neo4j console available in a docker container. This approach allows to handle with your graphgists 100% locally – nothing leaves the container.

How the docker container is built up

Docker containers are assembled by a cookbook called a dockerfile. It specifies the container you want to inherit from and then issue couple of commands to apply your customizations. In my case we need to install a servlet container (tomcat7 here). Neo4j console’s source repo is https://github.com/neo4j-contrib/rabbithole. Based on a recent change it now allows to build a war file ready for deployment into servlet containers. Of course we could have installed maven and clone the repo and exec mvn war:war to build neo4j console. I’ve decided to provide and use a prebuilt war file located at bintray, this removes the need for downloaded a massive number of dependencies for the in-container maven installation. Neo4j console’s war file is deployed as console.war and therefore available in the console context of tomcat.

The graphgist repo is cloned into the location of tomcat’s root context and the location of CONSOLE_URL_BASE is adopted to the locally available neo4j console. Finally tomcat is started as a service. Here’s the full Dockerfile:

[getgit userid=”sarmbruster” repoid=”docker_neo4j_graphgist” path=”Dockerfile” language=”bash”]

How to use the docker container

Of course you need to install docker locally. The procedure differs among operating system and is documented here. Next is to pull the preconfigured container and run it:

sudo docker pull sarmbruster/neo4j_graphgist
sudo docker run -d -v <absolute_path_for_local_gists>:/var/lib/tomcat7/webapps/ROOT/gists:ro -p 8080:8080 sarmbruster/neo4j_graphgist

This procedure might take some time on first invocation – a good candidate for having a nice espresso.

The second command maps a local directory (it’s crucial to use absolute path) into a in-container directory for accessing the gists. Port 8080 is mapped to the in-container port 8080.

When done your local graphgist setup is finished point your browser to http://localhost:8080. To create a gist, open your favourite text editor and save a <myname>.adoc file in the local gist directory used above when starting docker. For some samples of graphgist files, see https://github.com/neo4j-contrib/graphgist/tree/master/gists.Pointing the browser to http://localhost:8080?myname (without .adoc) should render your graphgist.

Using docker ps and docker stop <containerId> can be used to stop the graphgist containter.

closing words

Since I’m absolutely new to docker there might be better and more elegant ways to achieve locally running graphgists. I’m looking forward to read your comments and feedback on this.

 

Categories
Uncategorized

for your convenience: command line cypher the comfortable way with httpie

Today a tweet from @rafacm drove my attention to httpie, a cURL tool for humans. Using cURL is cumbersome and noisy, httpie makes it fairly easily usable.

install httpie

Httpie is python based therefore it’s trivial to install using easy_install/pip. On my Ubuntu 13.10 box I’ve used:

sudo pip install --upgrade httpie

It fetches the package and installs it locally.

running cypher statements

Running cypher statements is much more easy than using cURL. In cURL you have to manually assemble a json snippet to pass in, deal with content types. httpie makes it easy. The following code snippets are intended to be run on a single line, for readability I’ve splitted them.

To use the old-style non-transactional cypher endpoint just use

http -b -j localhost:7474/db/data/cypher
 query="START n=node(*) return n limit 2"

for non-parameterized queries and

http -b -j localhost:7474/db/data/cypher 
  query="MATCH (m:Movie {title:{title}}) return m" 
  params:='{"title":"The Matrix"}'

for parameterized queries. Httpie sets the ‘Accept’ header to json if option -j is used. In case  of -j httpie assembles the key-value request items internally into json format as well. Since the params contains a json map it needs to be assigned with “:=” instead of just “=”.

For the transactional Cypher endpoint in Neo4j 2.0 httpie can be used like this:

http -b -j localhost:7474/db/data/transaction/commit 
 statements:='[{"statement": "MATCH (m:Movie {title:{title}}) return m", 
  "parameters": {"title":"The Matrix"} }]'
Categories
Uncategorized

nice addition back in Neo4j 1.9.1: closeable ExecutionResult

When using Cypher from Java code one instantiates a ExecutionEngine and calls execute to get a instance of ExecutionResult. ExecutionResult is an Iterable and therefore provides access to an iterator() method. Up to Neo4j 1.9 it is recommended to fully consume the iterator until hasNext() returns null, otherwise it’s not guaranteed that all resources are freed up again.

Since Neo4j 1.9.1 ExecutionResult implements ResourceIterable as well. This means the iterator has a close() method to free up bound resources without completely consuming the iterator.

I guess a lot of Neo4j users might not have explored that small but very helpful addition yet, so I think it’s worth mentioning.

Categories
Uncategorized

how to use Cypher over JDBC from Groovy

Groovy has a very convenient to use API for accessing databases over JDBC. My colleagues at Neo Technology brought up a JDBC driver for Cypher. It’s very easy to bring these two together. The only ugly thing here is that you have to use a class called “Sql” to emit Cypher statements.

[gist id=”6094267″]

Dependency management is done by using the @Grab annotations. However there is some tweaking required. Not all required libraries are found on Maven central, so we need to add to repos, one for neo4j and one for restlet. Since we need to register a JDBC driver, the dependencies must be configured on system classloader level.

Update:

Neo4j Cypher JDBC driver also allows the usage of parameterized Cypher which know JDBC users by the term “prepared statement”. This is shows in l. 16. Since JDBC does not support named parameters you have to use numbers for the parameters, starting with “{1}” and provide a list of parameter values.

With this small example you can access your Neo4j server very easily.