This autumn I’ve attended a couple of conferences. It’s always interesting to learn about new tools popping up. One that caught my attention is Apache Zeppelin. Zeppelin is a notebook-style application focussing on data analytics. In this article I’ll show how Zeppelin can be used to access data residing in your favourite graph database, Neo4j.
Downloading & Installing Zeppelin
Download the most recent binary Zeppelin package with all interpreters included from https://zeppelin.apache.org/download.html and extract the tgz archive. Start Zeppelin using `bin/zeppelin-daemon.sh start` and point your web browser to `http://localhost:8080`.
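The extract-and-start step can be sketched as a small shell helper. This is only a sketch under a few assumptions: the archive path in `ZEPPELIN_TGZ` and the function name `install_and_start_zeppelin` are hypothetical, and the archive is assumed to contain a single top-level directory.

```shell
# Hypothetical archive location: adjust it to the file you actually downloaded
# from the Zeppelin download page (the real file name contains the version).
ZEPPELIN_TGZ="${ZEPPELIN_TGZ:-$HOME/Downloads/zeppelin-bin-all.tgz}"

# Extract the archive into the given directory and start the Zeppelin daemon.
install_and_start_zeppelin() {
  target_dir="$1"
  mkdir -p "$target_dir"
  # Assumption: the archive has one top-level directory, which
  # --strip-components=1 drops so that bin/ ends up directly in target_dir.
  tar -xzf "$ZEPPELIN_TGZ" -C "$target_dir" --strip-components=1
  "$target_dir/bin/zeppelin-daemon.sh" start
}
```

Calling `install_and_start_zeppelin "$HOME/zeppelin"` should then leave the web UI reachable at `http://localhost:8080`.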
Connect Zeppelin to Neo4j
Bring up a Neo4j (>= 3.0) instance. For simplicity we’ll run it on localhost as well, e.g. using Docker: `docker run --rm -p 7474:7474 -p 7687:7687 neo4j:3.0.7-enterprise`. Be sure to set up a password. I’m using the most famous one, “123”, in this example; choose a better one for your own instances.
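If you prefer to set the password at container start rather than through the browser, a sketch like the following might help. It assumes the official Neo4j Docker image reads initial credentials from the `NEO4J_AUTH` environment variable in the form `user/password`; the variable names and the `start_neo4j` helper are made up for this example.

```shell
# Example password from this article; choose a better one for real instances.
NEO4J_PASSWORD="123"
NEO4J_IMAGE="neo4j:3.0.7-enterprise"

# Assumption: the image picks up initial credentials from NEO4J_AUTH.
start_neo4j() {
  docker run --rm \
    -p 7474:7474 -p 7687:7687 \
    -e NEO4J_AUTH="neo4j/${NEO4J_PASSWORD}" \
    "${NEO4J_IMAGE}"
}
```

Running `start_neo4j` starts the container in the foreground, with the HTTP port 7474 and the Bolt port 7687 exposed on localhost.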
Use the “+Create” button in the upper right corner. First select “jdbc” as the interpreter group and fill out the values according to this screenshot; you can delete the other options.
Please note that you don’t need to download the neo4j-jdbc driver manually. Zeppelin will fetch it from Maven Central if you supply `org.neo4j:neo4j-jdbc-bolt:3.0.1` as the artifact.
Setting up a Zeppelin note using Neo4j
Now it’s time to access Neo4j from a Zeppelin note. Create a new note by choosing “Create new note” from the “Notebook” drop-down in the header. Then click the gear-wheel icon on the top right (“Interpreter bindings”) and be sure to enable “neo4j %jdbc”:
When done, you can run a Cypher query using the following notation:
%neo4j match (actor:Person)-[:ACTED_IN]->(m:Movie) with m, count(*) as actors return toString(actors) as actorCount, toString(count(m)) as movieCount order by actorCount
Some notes on this:
- you need to have `%neo4j` at the beginning to indicate we’re using the previously configured neo4j interpreter
- for a reason I still need to investigate, Zeppelin does not display numeric values returned from Neo4j. Therefore you have to convert numeric results using `toString()`
See the result:
Zeppelin does not have a feature to display nodes and relationships, so a graph view would be nice. If I find some time I want to investigate this; maybe we can come up with an even richer integration.
Thanks to a feedback tweet on this post (thanks Andrea!), I’ve learned that there’s a pending pull request for a better Neo4j integration in Zeppelin. I have not tried it yet, but it includes a network-style visualization based on sigma.js. Database access to Neo4j does not rely on JDBC as in my approach, but uses the Bolt driver directly under the hood.