Grails Searchable plugin: fighting down OOME when rebuilding the search index

In one of my projects we store blob data (Word documents, PDFs, …) in a Grails domain class. All of this data should be indexed by a search engine and exposed through a convenient search interface. Pretty easy to solve using the Grails Searchable plugin – customer satisfied.

The project went live, and at first everything was fine. At some point, however, rebuilding the index started throwing an OutOfMemoryError (OOME). Nothing unusual in the Java world; increasing -Xmx helped for a while.

As the data set grew, we reached a point where -Xmx approached the amount of physical RAM in the machine. Adding more RAM was not an option here, so I decided to track down the cause.

After a debugging session I found out that rebuilding the index is performed in batches. Grails Searchable uses the Compass framework under the hood, and Compass uses a default batch size (its fetchCount) of 200. For whatever reason the Grails Searchable plugin overrides that with a fixed (!) value of 5000. This means that during indexing 5000 database rows (each one holding a megabyte-sized blob) are read into RAM and then stored in the index! As long as there are no blobs in the database, 5000 might be a good value: the larger the batch size, the faster indexing tends to run.
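To put the two batch sizes in perspective, here is a rough estimate of the memory needed for one batch. It assumes an average blob of about 1 MiB (an assumption; the post only says "megabyte-sized") and ignores Hibernate's per-entity and session overhead, so real usage will be higher:

```latex
\begin{aligned}
M_{\text{batch}} &\approx n_{\text{fetch}} \cdot \bar{s}_{\text{blob}} \\
n_{\text{fetch}} = 5000:\quad M_{\text{batch}} &\approx 5000 \cdot 1\,\text{MiB} \approx 4.9\,\text{GiB} \\
n_{\text{fetch}} = 200:\quad M_{\text{batch}} &\approx 200 \cdot 1\,\text{MiB} = 200\,\text{MiB}
\end{aligned}
```

With the plugin's fixed value, a single batch alone can need several gigabytes of heap, which is consistent with -Xmx creeping towards the machine's physical RAM.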

The best solution would be a configuration parameter for the fetchCount; I’ve filed a JIRA issue for this. Until it is fixed, there is a workaround: redefine the compassGpsDevice bean in the application’s resources.groovy with a smaller fetchCount.
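A minimal sketch of such an override in grails-app/conf/spring/resources.groovy follows. It assumes that the plugin defines compassGpsDevice as a Compass HibernateGpsDevice with the name and sessionFactory properties shown here; check the plugin descriptor of your Searchable version and copy any further properties or lifecycle settings it uses, because a bean defined in resources.groovy replaces the plugin's definition completely. The value 200 (Compass's default) is only an example; with large blobs an even smaller value may be needed.

```groovy
// grails-app/conf/spring/resources.groovy
import org.compass.gps.device.hibernate.HibernateGpsDevice

beans = {
    // Redefine the Searchable plugin's compassGpsDevice bean under the same name.
    // Assumption: the plugin's own definition uses HibernateGpsDevice with these
    // properties; compare with the plugin version you are using.
    compassGpsDevice(HibernateGpsDevice) {
        name = "hibernate"
        sessionFactory = ref("sessionFactory")
        // Rows fetched per indexing batch (the plugin hard-codes 5000).
        // 200 is Compass's default; lower it further if rows carry large blobs.
        fetchCount = 200
    }
}
```

A smaller fetchCount trades indexing speed for a bounded memory footprint per batch, which is the right trade-off when each row carries a large blob.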

Mission Accomplished.

2 thoughts on “Grails Searchable plugin: fighting down OOME when rebuilding the search index”

  1. Peter Ledbrook

    Wouldn’t it be better not to index blob columns, or at least index them separately? Or does this happen even if the associated property isn’t indexed (or stored)?

    By the way, keep an eye on the Elastic Search plugin – hopefully there will be some news in time for Groovy & Grails Exchange.

  2. Stefan Armbruster (post author)

    The blob’s contents should be in the index. But I don’t think it would matter: IMHO the trouble comes from Hibernate loading that many instances at once. They consume RAM regardless of whether the blob goes into the index or not.
    Regarding Elastic Search: Eagerly waiting for Groovy & Grails Exchange – CU next Thursday.
