deep dive on fulltext indexing with Neo4j

In a previous blog post I explained the differences between the types of indexes available in Neo4j. A common requirement in many projects is fulltext indexing. With current versions of Neo4j (2.1.5 as of now) this can only be accomplished with manual indexes.

In this article I want to explain how you can use language specific analyzers for fulltext indexing and how to run regular expression searches against such indexes.

The reference manual's section on fulltext indexing mentions that you can provide a custom analyzer class by specifying a config parameter analyzer upon index creation. Its value is the fully qualified class name of the analyzer. There are two ways to create such a manual index, either using the Java API

or using the REST API (with the wonderful httpie HTTP command line client)
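As a rough sketch of the REST variant, written in Java instead of httpie, the code below builds a POST to /db/data/index/node with the analyzer passed as a config entry. The index name fulltext_de, the base URL and the analyzer class com.example.MyAnalyzer are placeholders chosen for illustration, and the JSON is assembled by hand on the assumption that no value needs escaping. It relies on java.net.http, so it needs Java 11 or later.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class FulltextIndexRequest {

    // Index configuration. In embedded mode the same map could be handed to
    // graphDb.index().forNodes(indexName, config) instead of being sent over REST.
    static Map<String, String> config(String analyzerClass) {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("type", "fulltext");
        config.put("provider", "lucene");
        config.put("analyzer", analyzerClass);
        return config;
    }

    // Hand-rolled JSON body; assumes names and values need no escaping.
    static String body(String indexName, Map<String, String> config) {
        StringJoiner entries = new StringJoiner(",", "{", "}");
        config.forEach((key, value) -> entries.add("\"" + key + "\":\"" + value + "\""));
        return "{\"name\":\"" + indexName + "\",\"config\":" + entries + "}";
    }

    // Builds (but does not send) the index creation request.
    static HttpRequest request(String baseUrl, String indexName, String analyzerClass) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/db/data/index/node"))
                .header("Content-Type", "application/json")
                .header("Accept", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(indexName, config(analyzerClass))))
                .build();
    }

    public static void main(String[] args) {
        // com.example.MyAnalyzer is a hypothetical analyzer class name.
        HttpRequest request = request("http://localhost:7474", "fulltext_de", "com.example.MyAnalyzer");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body("fulltext_de", config("com.example.MyAnalyzer")));
    }
}
```

To actually send the request, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) against a running server.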

Lucene provides an optional set of language specific analyzers. These analyzers have some knowledge of the language they're operating on and use that for word stemming; see the internals of the GermanAnalyzer for details. As an example, the German word for houses, “Häuser”, is stemmed to its singular form “Haus”. Consequently a query for “Haus” retrieves occurrences of both “Haus” and “Häuser”.
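To make the effect of stemming concrete, here is a deliberately tiny toy stemmer. It is not Lucene's German stemming algorithm, just an illustration under simplified rules (lowercase, fold umlauts, strip one plural-like suffix) of why the indexed form of “Häuser” and the query term “Haus” can end up identical.

```java
import java.util.Locale;

class ToyGermanStemmer {

    // Toy rules only: lowercase, fold umlauts, strip one plural-like suffix.
    // Real analyzers apply far more careful, language aware rules.
    static String stem(String word) {
        String s = word.toLowerCase(Locale.ROOT)
                .replace('\u00e4', 'a')   // a-umlaut
                .replace('\u00f6', 'o')   // o-umlaut
                .replace('\u00fc', 'u');  // u-umlaut
        for (String suffix : new String[] {"er", "en", "e"}) {
            // Keep at least three characters so short words are left alone.
            if (s.endsWith(suffix) && s.length() - suffix.length() >= 3) {
                return s.substring(0, s.length() - suffix.length());
            }
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(stem("H\u00e4user")); // prints haus
        System.out.println(stem("Haus"));        // prints haus
    }
}
```

Because both the indexed text and the query text pass through the same analyzer, both sides are reduced to the same term and therefore match.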

The language specific analyzers reside in an optional jar file called lucene-analyzers-3.6.2.jar that does not ship with Neo4j by default. Therefore copy lucene-analyzers-3.6.2.jar into Neo4j’s plugins folder.

When trying to use e.g. Lucene’s GermanAnalyzer directly using

you get back an HTTP status 500. The log files show a strange exception, java.lang.InstantiationException. The reason for this exception is that Neo4j tries to instantiate the analyzer class using a no-arg default constructor. Unfortunately Lucene’s language specific analyzers don’t have such a constructor, see the javadocs. The solution is to write a thin analyzer class with a default constructor. Internally that class uses the Lucene provided analyzer as a delegate.
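The failure and the fix can be reproduced without Lucene or Neo4j. In the sketch below ToyAnalyzer, VersionedAnalyzer and WrappedAnalyzer are hypothetical stand-ins: the first mimics an analyzer base class, the second an analyzer whose only constructor takes an argument, and the third the thin wrapper with a default constructor that delegates all work. The reflective call stands in for whatever the server does internally, which is an assumption.

```java
import java.util.Locale;

class InstantiationDemo {

    // Instantiates via the no-arg constructor, as a plugin loader might do.
    @SuppressWarnings("deprecation")
    static String instantiate(Class<? extends ToyAnalyzer> type) {
        try {
            return type.newInstance().analyze("Haus");
        } catch (InstantiationException | IllegalAccessException e) {
            return e.getClass().getName();
        }
    }

    public static void main(String[] args) {
        System.out.println(instantiate(VersionedAnalyzer.class)); // prints java.lang.InstantiationException
        System.out.println(instantiate(WrappedAnalyzer.class));   // prints haus
    }
}

// Stand-in for an analyzer base class (hypothetical, reduced to one method).
abstract class ToyAnalyzer {
    abstract String analyze(String text);
}

// Mimics a language specific analyzer: its only constructor needs an argument.
class VersionedAnalyzer extends ToyAnalyzer {
    private final String version;

    public VersionedAnalyzer(String version) {
        this.version = version;
    }

    @Override
    String analyze(String text) {
        return text.toLowerCase(Locale.ROOT);
    }
}

// Thin wrapper: public default constructor, all work goes to the delegate.
class WrappedAnalyzer extends ToyAnalyzer {
    private final ToyAnalyzer delegate = new VersionedAnalyzer("3.6");

    public WrappedAnalyzer() {
    }

    @Override
    String analyze(String text) {
        return delegate.analyze(text);
    }
}
```

A real wrapper would extend Lucene's Analyzer, create the language specific analyzer with a fixed Version in its default constructor, and forward the token stream calls to it.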

In order to simplify the process of setting this up I’ve created a small project on GitHub called neo4j-fti. It contains the mentioned wrappers in package org.neo4j.contrib.fti.analyzers for all languages that have a Lucene analyzer. It also provides a kernel extension to Neo4j that automatically creates fulltext indexes based on a config option. In Neo4j’s configuration you need to set:

Additionally this project features an example of how to use regular expressions to search an index. Using the Java API you need to pass a Lucene RegexQuery based on a Term holding your regular expression. The RegexQuery class isn’t part of lucene-core either, so be sure to have lucene-queries in your Neo4j’s plugins folder as well. This example is exposed in an unmanaged extension using the following code snippet:

Assuming an index named fulltext_de has been configured using the German analyzer (see above), use the following commands, again with httpie, to create a node, add it to the fulltext index and perform a regular expression index query:
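The same three steps can be sketched in Java rather than httpie. The requests are only built, not sent. The property name description, the sample value and the node URI in main are placeholders, and the /regex/{index}/{field}/{regex} path is an assumption about where the unmanaged extension is mounted on your server. It relies on java.net.http, so it needs Java 11 or later.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class RegexSearchRequests {

    static final String BASE = "http://localhost:7474";

    // Builds a JSON POST; the json argument is sent as-is.
    static HttpRequest postJson(String url, String json) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .header("Accept", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // Step 1: create a node with a single string property.
    static HttpRequest createNode(String key, String value) {
        return postJson(BASE + "/db/data/node", "{\"" + key + "\":\"" + value + "\"}");
    }

    // Body for step 2; nodeUri is the "self" URI returned when the node was created.
    static String indexEntryJson(String nodeUri, String key, String value) {
        return "{\"key\":\"" + key + "\",\"value\":\"" + value + "\",\"uri\":\"" + nodeUri + "\"}";
    }

    // Step 2: add the node to the manual index.
    static HttpRequest addToIndex(String index, String nodeUri, String key, String value) {
        return postJson(BASE + "/db/data/index/node/" + index, indexEntryJson(nodeUri, key, value));
    }

    // Step 3: regex query; the mount point /regex is an assumption.
    static HttpRequest regexQuery(String index, String field, String regex) {
        return HttpRequest.newBuilder(URI.create(BASE + "/regex/" + index + "/" + field + "/" + regex))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        String nodeUri = BASE + "/db/data/node/1"; // placeholder, use the URI from step 1
        System.out.println(createNode("description", "Haus").uri());
        System.out.println(addToIndex("fulltext_de", nodeUri, "description", "Haus").uri());
        System.out.println(regexQuery("fulltext_de", "description", "h.*s").uri());
    }
}
```

If the last request comes back with a 404, the extension is most likely not registered under that path in the server configuration.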

4 thoughts on “deep dive on fulltext indexing with Neo4j”

  1. luigi

    Hello alo!

    I downloaded Httpiee, got HTTP/1.1 500 Server Error though:
    could you please help?

    thank you!

    gg4u-2:httpie gg4u$ http -v -j localhost:7474/db/data/index/node name=topic config:='{"analyzer":"org.apache.lucene.analysis.en"}'
    POST /db/data/index/node HTTP/1.1
    Accept: application/json
    Accept-Encoding: gzip, deflate
    Content-Length: 74
    Content-Type: application/json; charset=utf-8
    Host: localhost:7474
    User-Agent: HTTPie/0.8.0

    {
        "config": {
            "analyzer": "org.apache.lucene.analysis.en"
        },
        "name": "topic"
    }

    HTTP/1.1 500 Server Error
    Cache-Control: must-revalidate,no-cache,no-store
    Connection: close
    Content-Length: 0
    Content-Type: text/html; charset=ISO-8859-1
    Date: Tue, 19 May 2015 04:01:10 GMT
    Server: Jetty(9.2.4.v20141103)

  2. luigi

    also add that :
    http localhost:7474/regex/fulltext_de/description/h.*s

    does not look to exist :
    http localhost:7474/regex/
    (error 404)
    am I missing smtg?
