Bulk loader still OOM during reduce phase

Yeah, that structure was a result of trying to import a document database (mongo) with one (flattened) document per node.

I managed to get the predicate count down to around 60k, but it seems there are quite a few places where we were using dictionaries, so some of these cases are left. Since it’s getting harder and harder to identify them reliably, and since this is still an evaluation, I left those as-is.
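To illustrate why the dictionaries are a problem, here is a minimal Python sketch. The field names (`title`, `scores`, `user_17`, ...) and the `flatten` helper are made up for the example and are not the actual data or import code; the assumption is simply that each flattened key path becomes one predicate.

```python
def flatten(doc, prefix=""):
    """Flatten a nested document into {predicate_name: value} pairs."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Nested dictionary: every key becomes part of a predicate name.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Hypothetical documents: "scores" is a dictionary keyed by user ID,
# so its keys are data, not a fixed set of field names.
docs = [
    {"title": "a", "scores": {"user_17": 3, "user_42": 5}},
    {"title": "b", "scores": {"user_99": 1}},
]

predicates = set()
for doc in docs:
    predicates.update(flatten(doc))

# One predicate per distinct dictionary key, plus "title".
print(sorted(predicates))
```

With only three dictionary keys this already yields four predicates, and the count grows with every new key that appears in the data.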

Unfortunately, I am now running into new kinds of issues caused by the choice of data transformation.

I won’t detail them as I have pretty much decided to abandon the flattened-document approach and am going to start working on a more node-intensive solution.
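As a rough idea of what "node-intensive" could mean here, the sketch below turns each dictionary entry into its own child node so that the set of predicates stays fixed. Everything in it is an assumption for illustration: the `to_nodes` helper, the `scores` field, and the `score.user` / `score.value` predicate names are hypothetical, and the triples are plain Python tuples rather than any particular loader format.

```python
def to_nodes(doc_id, doc, dict_field, key_pred, value_pred):
    """Return (subject, predicate, object) triples for one document.

    Each entry of the dictionary stored under dict_field becomes a
    child node instead of a uniquely named predicate.
    """
    triples = []
    for key, value in doc.items():
        if key == dict_field:
            for i, (k, v) in enumerate(value.items()):
                child = f"{doc_id}_{dict_field}_{i}"  # hypothetical node ID scheme
                triples.append((doc_id, dict_field, child))  # edge to child node
                triples.append((child, key_pred, k))         # former dictionary key
                triples.append((child, value_pred, v))       # former dictionary value
        else:
            triples.append((doc_id, key, value))
    return triples


# Hypothetical documents keyed by a made-up node ID.
docs = {
    "doc1": {"title": "a", "scores": {"user_17": 3, "user_42": 5}},
    "doc2": {"title": "b", "scores": {"user_99": 1}},
}

triples = []
for doc_id, doc in docs.items():
    triples += to_nodes(doc_id, doc, "scores", "score.user", "score.value")

# The predicate set no longer depends on how many users appear in the data.
predicates = {p for _, p, _ in triples}
print(sorted(predicates))
```

The trade-off is more nodes and edges per document in exchange for a small, stable schema.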

However, I was wondering whether there is any literature on how best to organize data to maximize the performance of queries that contain many conditions.