At a onsite workshop with a new potential customer in wonderful Zurich I was challenged with a requirement I’ve never had in the last few years with Cypher.
Since I cannot disclose any details of the use case or the data used let’s create an artificial example with a similar structure:
- a restaurant has several rooms
- each room has various tables
- each table is occupied with 0 to 6 guests
The goal of the query is to find the most occupied table for each room. As an example, I’ve created a Neo4j console at http://console.neo4j.org/r/ufmpwi.
The challenging here is that we want to do a kind of local filtering: for a given room we need the most occupied table. Calling the reduce function to the rescue. The idea is to run the reduce with 3 state variables:
- the index of the highest occupation so far
- the current index (aka iteration number)
- the value of highest occupation so far
RETURN reduce(x=[0,0,0], i IN [1,2,2,5,2,1] |
CASE WHEN i>x THEN [x,x+1,o] ELSE [x, x+1,x] END
Reduce allows just one single state variable to be used but we can use a three element array instead (which is one variable 🙂 ). When the current element is larger than the maximum so far (aka the 3rd element of the state), we update the first element to the current position. The second element (current index) is always incremented. If the current element is not larger then the maximum so far, we just increment the current count (2nd element) and keep the other values:
The full query is:
OPTIONAL MATCH (guest)-[:SITS_AT]->(table)
WITH room, table, count(guest) AS guestsAtTable
WITH room, collect(table) AS tables, collect(guestsAtTable) AS occupation
RETURN room, tables[reduce(x=[0,0,0], o IN occupation |
CASE WHEN o>x
THEN [x, x+1,o]
ELSE [x, x+1,x]
)] AS mostOccupied, tables, occupation
The first line is pretty much obvious. Since there might be tables without guests
OPTIONAL MATCH is required in line 2.
Cypher does not allow you to do direct aggregations of aggregations. Using multiple
WITH helps here. In line 3 we first calculate counts of guests per table. Line 4 returns one line per room with two collections – one holding the tables, the other their occupation. Note that both collections do have the same order. Finally from line 5 on the reduce function is applied to find the most occupied table in each room.