Advanced MongoDB Query Batching with DataLoader and Sift

Last week we dove into the gloriously efficient world of batching GraphQL queries with DataLoader.

In all of the examples we explored, the queries being batched were incredibly simple. For the most part, our queries consisted of _id lookups followed by a post-processing step that matched each result back to the _id that requested it.

This simplicity is great when working with examples, but the real world is a much more complicated, nuanced place.

Let’s dive into how we can work with more complicated MongoDB queries, and use sift.js to map those queries back to individual results in a batched set of query results.

A More Complicated Example

Instead of simply querying for patients or beds by _id, let’s set up a more complicated example.

Imagine that we’re trying to find out whether a patient has been in any of a particular set of beds in the past week. Our query might look something like this:


return Beds.find({
    patientId: patient._id,
    bedCode: { $in: bedCodes },
    createdAt: { $gte: moment().subtract(7, "days").toDate() }
});

If this query were used as a resolver within our GraphQL patient type, we would definitely need to batch this query to avoid N + 1 inefficiencies:


{
    patients {
        name
        recentlyInBed(bedCodes: [123, 234, 345]) {
            bedCode
            createdAt
        }
    }
}

Just like last time, our new query would be executed once for every patient returned by our patients resolver.
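To make the cost concrete, here is a minimal sketch that counts round trips. The Beds object is a hypothetical in-memory stand-in rather than a real collection, and the recentlyInBed function and patient ids are made up for illustration:

```javascript
// Hypothetical stand-in for the Beds collection; it only counts calls.
let queryCount = 0;
const Beds = {
    find(query) {
        queryCount += 1; // each call represents one round trip to MongoDB
        return [];       // the results are irrelevant for this illustration
    }
};

// Hypothetical resolver mirroring the unbatched query shown earlier.
function recentlyInBed(patient, bedCodes) {
    const oneWeekAgo = new Date(Date.now() - 7 * 24 * 60 * 60 * 1000);
    return Beds.find({
        patientId: patient._id,
        bedCode: { $in: bedCodes },
        createdAt: { $gte: oneWeekAgo }
    });
}

// One query fetches the patients (the "1"); one more runs per patient (the "N").
const patients = [{ _id: "p1" }, { _id: "p2" }, { _id: "p3" }];
patients.forEach(patient => recentlyInBed(patient, [123, 234, 345]));

console.log(queryCount); // 3
```

With three patients the resolver alone issues three queries, and that number grows linearly with the size of the patient list.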

Batching with DataLoader

Using DataLoader, we can write a loader that will batch these queries for us. We’d just need to pass our complete MongoDB query, built from our patient._id, bedCodes, and our createdAt date, to the loader’s load method:


return loaders.recentlyInBedLoader.load({
    patientId: patient._id,
    bedCode: { $in: bedCodes },
    createdAt: { $gte: moment().subtract(7, "days").toDate() }
});

Now let’s implement the recentlyInBedLoader function:


export const recentlyInBedLoader = new DataLoader(queries => {
    return Beds.find({ $or: queries }).then(beds => {
        // TODO: ???
    });
});

Because we passed our entire MongoDB query object into our data loader, we can execute all of our batched queries simultaneously by grouping them under a single $or query operator.
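As a rough illustration of what the batched filter looks like, assume two loader keys arrive in the same batch and that each key is a complete MongoDB filter like the resolver query earlier. The patient ids, bed codes, and the fixed date below are made up:

```javascript
// Two loader keys as they might arrive in one batch (illustrative values only).
const since = new Date("2024-01-01T00:00:00Z"); // fixed date so the output is stable
const queries = [
    { patientId: "p1", bedCode: { $in: [123, 234] }, createdAt: { $gte: since } },
    { patientId: "p2", bedCode: { $in: [345] }, createdAt: { $gte: since } }
];

// A document matches the combined filter if it matches at least one sub-query,
// so a single find returns the union of what the individual queries would return.
const combined = { $or: queries };

console.log(JSON.stringify(combined, null, 2));
```

The combined result is a flat list of beds for every patient in the batch, which is why a second step is needed to hand each query its own slice.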

But wait, how do we map the results of our batched query back to the individual queries we passed into our loader?

We could try to manually recreate the logic of our query:


return queries.map(query => {
    // Keep only the beds that satisfy every clause of this query.
    return beds.filter(bed => {
        return query.patientId == bed.patientId &&
            _.includes(query.bedCode.$in, bed.bedCode) &&
            query.createdAt.$gte <= bed.createdAt;
    });
});
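Here is a small worked version of that manual matching, run against made-up sample data. It assumes each loader key is a full MongoDB filter (with bedCode: { $in: [...] }), and it uses plain string ids for simplicity:

```javascript
// Sample documents standing in for the result of the batched Beds.find call.
const beds = [
    { patientId: "p1", bedCode: 123, createdAt: new Date("2024-01-05") },
    { patientId: "p1", bedCode: 999, createdAt: new Date("2024-01-05") },
    { patientId: "p2", bedCode: 345, createdAt: new Date("2024-01-06") },
    { patientId: "p2", bedCode: 345, createdAt: new Date("2023-12-01") }
];

const since = new Date("2024-01-01");
const queries = [
    { patientId: "p1", bedCode: { $in: [123, 234] }, createdAt: { $gte: since } },
    { patientId: "p2", bedCode: { $in: [345] }, createdAt: { $gte: since } },
    { patientId: "p3", bedCode: { $in: [123] }, createdAt: { $gte: since } }
];

// Hand-written equivalent of the MongoDB filter, one clause per field.
const matches = (query, bed) =>
    query.patientId === bed.patientId &&
    query.bedCode.$in.includes(bed.bedCode) &&
    query.createdAt.$gte <= bed.createdAt;

// One result array per query, in the same order as the queries.
const results = queries.map(query => beds.filter(bed => matches(query, bed)));

console.log(results.map(r => r.length)); // [ 1, 1, 0 ]
```

Note that the third query still gets an (empty) array. DataLoader expects the batch function to resolve to an array with the same length and order as the keys it was given, so a query with no matches cannot simply be dropped.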

This works, but it seems difficult to manage, maintain, and test, especially for more complex MongoDB queries. There has to be a better way!

Sift.js to the Rescue

Sift.js is a library that lets you filter in-memory arrays using MongoDB query objects. This is exactly what we need! We can rewrite our loader function using sift:


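export const recentlyInBedLoader = new DataLoader(queries => {
    return Beds.find({ $or: queries }).then(beds => {
        // sift(query) builds a predicate from the MongoDB query object;
        // filtering once per query keeps the results aligned with the keys.
        return queries.map(query => beds.filter(sift(query)));
    });
});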

Perfect!

Now we can write loaders with arbitrarily complex queries and easily map the batched results back to the individual queries sent into the loader.
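To show the idea without depending on the library, here is a toy matcher that interprets a query object at runtime. It is not sift and supports only plain equality, $in, and $gte; the toyMatcher name and the sample data are hypothetical, and the real library handles many more operators and edge cases:

```javascript
// Toy stand-in for sift: turns a query object into a predicate function.
// Supports plain equality plus the $in and $gte operators only.
function toyMatcher(query) {
    return doc =>
        Object.entries(query).every(([field, condition]) => {
            const value = doc[field];
            const isOperatorObject =
                condition !== null &&
                typeof condition === "object" &&
                !(condition instanceof Date);
            if (!isOperatorObject) {
                return value === condition; // plain equality on primitives
            }
            // Every operator listed for the field has to pass.
            return Object.entries(condition).every(([op, operand]) => {
                if (op === "$in") return operand.includes(value);
                if (op === "$gte") return value >= operand;
                throw new Error(`Unsupported operator: ${op}`);
            });
        });
}

const beds = [
    { patientId: "p1", bedCode: 123, createdAt: new Date("2024-01-05") },
    { patientId: "p1", bedCode: 999, createdAt: new Date("2024-01-05") },
    { patientId: "p2", bedCode: 123, createdAt: new Date("2024-01-05") }
];

const query = {
    patientId: "p1",
    bedCode: { $in: [123, 234] },
    createdAt: { $gte: new Date("2024-01-01") }
};

const matched = beds.filter(toyMatcher(query));
console.log(matched.length); // 1
```

The matching logic is driven entirely by the query object, so changing the query requires no change to the matching code. That is the property that makes this approach easier to maintain than a hand-written filter.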

Sift can actually be combined with DataLoader to write completely generic loader functions that can be used to batch and match any queries of any structure against a collection, but that’s a post for another day.