Batching GraphQL Queries with DataLoader

Written by Pete Corey on Aug 14, 2017.

One of the biggest drawbacks of an out-of-the-box GraphQL solution is its tendency to fall into the classic N+1 query problem, making ridiculous numbers of database queries. For example, consider the following GraphQL query:


{
    patients {
        name
        bed {
            code
        }
    }
}

We’re trying to grab all of the patients in our system, and for each patient, we also want their associated bed.

While that seems simple enough, the resulting database queries are anything but. Using the most obvious resolvers, our GraphQL server would ultimately make N+1 queries, where N represents the number of patients in our system.


const resolvers = {
    Query: {
        patients: (_root, _args, _context) => Patients.find({})
    },
    Patient: {
        bed: ({ bedId }, _args, _context) => Beds.findOne({ _id: bedId })
    }
};

Our application first queries for all patients (Patients.find), and then makes a Beds.findOne query for each patient it finds. Thus, we’ve made N (one bed query per patient) + 1 (the initial patients query) queries.
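To make that cost concrete, here’s a minimal sketch of the naive resolver flow against hypothetical in-memory Patients and Beds stand-ins (not the real collections), counting every query made:

```javascript
// Hypothetical in-memory stand-ins for the Patients and Beds collections.
const patientDocs = [
    { _id: "p1", name: "Ann", bedId: "b1" },
    { _id: "p2", name: "Ben", bedId: "b2" },
    { _id: "p3", name: "Cam", bedId: "b3" }
];
const bedDocs = { b1: { code: "A1" }, b2: { code: "B2" }, b3: { code: "C3" } };

let queryCount = 0;
const Patients = { find: () => { queryCount += 1; return patientDocs; } };
const Beds = { findOne: ({ _id }) => { queryCount += 1; return bedDocs[_id]; } };

// The naive resolver flow: one query for the patients,
// then one bed query per patient.
const result = Patients.find().map(patient => ({
    name: patient.name,
    bed: Beds.findOne({ _id: patient.bedId })
}));

console.log(queryCount); // 4 — N (3 beds) + 1 (patients)
```

With three patients we already pay for four round trips; with a thousand patients, a thousand and one.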

This is unfortunate.

We could easily write a traditional REST endpoint that fetches and returns this data to the client using exactly two queries and some post-query transformations:


return Patients.find({}).then(patients => {
    return Beds.find({ _id: { $in: _.map(patients, 'bedId') } }).then(beds => {
        let bedsById = _.keyBy(beds, '_id');
        return patients.map(patient => {
            return _.extend({}, patient, {
                bed: bedsById[patient.bedId]
            });
        });
    });
});

Despite its elegance, the inefficiency of the GraphQL solution makes it a no-go for many real-world applications.

Thankfully, there’s a solution! 🎉

Facebook’s dataloader package is the solution to our GraphQL inefficiency problems.

DataLoader is a generic utility to be used as part of your application’s data fetching layer to provide a consistent API over various backends and reduce requests to those backends via batching and caching.

There are many fantastic resources for learning about DataLoader, and even on using DataLoader in an Apollo-based project. For that reason, we’ll skip some of the philosophical questions of how and why DataLoader works and dive right into wiring it into our Apollo server application.

All we need to get DataLoader working in our application is to create our “batch”, or “loader” functions and drop them into our GraphQL context for every GraphQL request received by our server:


import loaders from "./loaders";
...
server.use('/graphql', function(req, res) {
    return graphqlExpress({
        schema,
        context: { loaders }
    })(req, res);
});

Continuing on with our current patient and bed example, we’ll only need a single loader to batch and cache our repeated queries against the Beds collection.

Let’s call it bedLoader and add it to our loaders.js file:


import DataLoader from "dataloader";

export const bedLoader = new DataLoader(bedIds => {
    // TODO: Implement bedLoader
});

Now that bedLoader is being injected into our GraphQL context, we can replace our resolvers’ calls to Beds.findOne with calls to bedLoader.load:


const resolvers = {
    Patient: {
        bed: ({ bedId }, _args, { loaders }) => loaders.bedLoader.load(bedId)
    }
};

DataLoader will magically aggregate all of the bedId values that are passed into our call to bedLoader.load, and pass them into our bedLoader DataLoader callback.

Our job is to write our loader function so that it executes a single query to fetch all of the required beds, and then returns them in order. That is, if bedIds is [1, 2, 3], we need to return bed 1 first, bed 2 second, and bed 3 third. If we can’t find a bed, we need to return undefined in its place:


export const bedLoader = new DataLoader(bedIds => {
    return Beds.find({ _id: { $in: bedIds } }).then(beds => {
        const bedsById = _.keyBy(beds, "_id");
        return bedIds.map(bedId => bedsById[bedId]);
    });
});

That’s it!
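The order-preserving contract is easy to check in isolation. Here’s the same keyBy-and-map step in plain JavaScript (with a hypothetical in-memory result set standing in for a real Beds query), including the undefined placeholder for a missing bed:

```javascript
// A plain-JS stand-in for lodash's _.keyBy.
const keyBy = (items, key) =>
    items.reduce((acc, item) => ({ ...acc, [item[key]]: item }), {});

// Beds returned by the database, in arbitrary order, with bed 2 missing.
const beds = [
    { _id: 3, code: "C" },
    { _id: 1, code: "A" }
];

const bedIds = [1, 2, 3];
const bedsById = keyBy(beds, "_id");

// Results must line up with the requested ids, one-for-one.
const results = bedIds.map(bedId => bedsById[bedId]);

console.log(results); // [{ _id: 1, code: "A" }, undefined, { _id: 3, code: "C" }]
```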

Now our system will make a single query to grab all of the patients in our system. For every patient we find, our bed resolver will fire and pass that patient’s bedId into our bedLoader DataLoader. Our bedLoader DataLoader will gather all of our bedId values, make a single query against the Beds collection, and return the appropriate bed to the appropriate bed resolver.
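To see roughly how that aggregation happens under the hood, here’s a toy batching loader — emphatically not the real dataloader implementation — that collects every key loaded during the current tick and then invokes its batch function once with all of them:

```javascript
// A toy batching loader: keys loaded in the same tick are collected,
// then batchFn is called once with the whole set.
function makeLoader(batchFn) {
    let queue = [];
    return {
        load(key) {
            return new Promise((resolve, reject) => {
                if (queue.length === 0) {
                    // Schedule a single flush once the current tick finishes.
                    Promise.resolve().then(() => {
                        const batch = queue;
                        queue = [];
                        batchFn(batch.map(entry => entry.key)).then(
                            values => batch.forEach((entry, i) => entry.resolve(values[i])),
                            error => batch.forEach(entry => entry.reject(error))
                        );
                    });
                }
                queue.push({ key, resolve, reject });
            });
        }
    };
}

// Three .load calls made in the same tick become a single batch call.
const batches = [];
const loader = makeLoader(keys => {
    batches.push(keys);
    return Promise.resolve(keys.map(key => `bed-${key}`));
});

Promise.all([loader.load(1), loader.load(2), loader.load(3)]).then(beds => {
    console.log(batches); // [[1, 2, 3]] — one batch
    console.log(beds);    // ["bed-1", "bed-2", "bed-3"]
});
```

The real dataloader adds caching and error handling on top of this idea, but the batch-per-tick scheduling is the heart of the trick.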

Thanks to DataLoader we can have the elegance of a GraphQL approach, combined with the efficiency and customizability of the manual approach.

What if Elixir were Homoiconic?

Written by Pete Corey on Aug 7, 2017.

Because of its fantastically powerful macro system, Elixir is sometimes mistakenly referred to as a homoiconic programming language.

That being said, let’s put on our day-dreaming hats and think about what Elixir would look like if it were homoiconic.

What is Homoiconicity?

Before we start throwing around the word “homoiconic” and exploring how it applies to Elixir, let’s take the time to talk about what it means.

Boiled down to its essence, “homoiconic” when referring to programming languages means that “code is data”. That is, the code used to express a program is written using the data structures of that language.

The archetypal homoiconic family of programming languages is the Lisp family. The Lisp family includes languages like Common Lisp, Scheme, Clojure, and so on.

In most Lisps, list data structures are represented by values within sets of parentheses, separated by spaces:


(1 2 3)

Similarly, programs are represented by keywords and values within sets of parentheses, separated by spaces. Here’s an example of a function that calculates the Fibonacci sequence written in Scheme:


(define (fib n)
    (cond
      ((= n 0) 0)
      ((= n 1) 1)
      (else
        (+ (fib (- n 1))
           (fib (- n 2))))))

If we view this code through a homoiconic lens, we can see that it’s really just a set of nested lists. At its highest level, we’re looking at a list of three elements. The first element is the keyword define, while the second and third elements are themselves lists.

This code is data, and this data is code.

Going deeper down the rabbit hole, we could write code (read: data) that takes code (read: data) as an argument and outputs new code (read: data). This type of function would be referred to as a macro.

Not only does homoiconicity give us powerful metaprogramming tools, but it’s also sublimely beautiful.

Is Elixir Homoiconic?

The Elixir programming language is not homoiconic. Elixir programs aren’t written using data structures from the language itself. That being said, Elixir does have an incredibly powerful macro system that gives us many of the benefits of a truly homoiconic language.

Macros operate on Elixir’s abstract syntax tree (AST), which is basically a data structure that represents the structure of a given piece of Elixir code.

To visualize that idea, here’s a simple piece of Elixir code followed by its AST equivalent:


if (foo) do
  bar
end

{:if, [context: Elixir, import: Kernel],
 [{:foo, [], Elixir}, [do: {:bar, [], Elixir}]]}

Much of Elixir’s syntax is actually constructed with macros that operate directly on these ASTs. In fact, if itself is a macro and is replaced at compile-time with a case statement!

We can generate an AST for any piece of Elixir code using quote:


ast = quote do
  if (foo) do
    bar
  end
end

We can then use Macro.to_string to convert our AST back into printable code:


ast
|> Macro.to_string
|> IO.puts

This would result in our original if statement being printed to the console.

If Elixir Were Homoiconic…

If Elixir were homoiconic, we would essentially be writing these abstract syntax trees by hand, bypassing the lexing and parsing phase of Elixir compilation.

Let’s quickly break down Elixir’s AST structure so we can better understand what we would be writing.

Elixir ASTs, unlike Lisp programs which are composed of nested lists, are composed of nested tuples. Each tuple contains three parts: the name of the function being called, any necessary metadata related to the function call, and any arguments being passed into that function.


{:if, [context: Elixir, import: Kernel],
 [{:foo, [], Elixir}, [do: {:bar, [], Elixir}]]}

Using our previous example of an if statement, we can see that the first tuple is calling the :if function with two arguments: {:foo, [], Elixir}, and [do: {:bar, [], Elixir}].

This type of representation of an Elixir program is very similar to a Lisp program, because Lisp code is essentially a textual representation of a program’s AST!


Using this newfound way of writing Elixir code, let’s write a basic GenServer module:


{:defmodule, [],
 [{:__aliases__, [], [:Stack]},
  [do: {:__block__, [],
    [{:use, [],
      [{:__aliases__, [], [:GenServer]}]},
     {:def, [],
      [{:handle_call, [],
        [:pop, {:_from, [], Elixir},
         [{:|, [],
           [{:h, [], Elixir},
            {:t, [], Elixir}]}]]},
       [do: {:{}, [],
         [:reply, {:h, [], Elixir},
          {:t, [], Elixir}]}]]},
     {:def, [],
      [{:handle_cast, [],
        [{:push, {:item, [], Elixir}}, {:state, [], Elixir}]},
       [do: {:noreply,
         [{:|, [], [{:item, [], Elixir}, {:state, [], Elixir}]}]}]]}]}]]}

Beautiful, isn’t it? No, I guess not.

In case you can’t grok what’s going on in the above code, it’s simply the basic implementation of a stack using GenServer as described by the Elixir documentation:


defmodule Stack do
  use GenServer

  def handle_call(:pop, _from, [h | t]) do
    {:reply, h, t}
  end

  def handle_cast({:push, item}, state) do
    {:noreply, [item | state]}
  end
end

It turns out that vanilla Elixir syntax is much easier to understand than our homoiconic representation.

Final Thoughts

If this has shown us anything, it’s that homoiconicity is something special.

It takes considerable upfront design work on the part of a language designer to create a homoiconic language that’s pleasant to use.

That being said, Elixir’s built-in macro system lets us take advantage of many of the benefits of a truly homoiconic language, while still giving us a syntax that is easy to use and understand.

Offline GraphQL Mutations with Redux Offline and Apollo

Written by Pete Corey on Jul 31, 2017.

Last week we started a deep dive into adding offline support to a React application using a GraphQL data layer powered by Apollo.

Thanks to out of the box features provided by Apollo Client, and a little extra help provided by Redux Offline and Redux Persist, we’ve managed to get our Apollo queries persisting through page loads and network disruptions.

Now let’s turn our attention to mutations.

How do we handle Apollo mutations made while our client is disconnected from the server? How do we store data and mutations locally and later sync those changes to the server once we regain connectivity?

Defining the Problem

Last week, we dealt with mostly infrastructure-level changes to add offline support for our Apollo queries. Adding support for offline mutations requires a more hands-on approach.

To help explain, let’s pretend that we’re building a survey application. After a user has filled out the questions in a survey, they can submit the survey to the server through a completeSurvey mutation:


mutation completeSurvey($surveyId: ID!, $answers: [String]) {
    completeSurvey(surveyId: $surveyId, answers: $answers) {
        _id
        answers
        completedAt
    }
}

We’re passing this mutation into a component and calling it as you would any other mutation in an Apollo-based application:


onCompleteSurvey = () => {
    let surveyId = this.props.data.survey._id;
    let answers = this.state.answers;
    this.props.completeSurvey(surveyId, answers);
};

export default graphql(gql`
    ...
`, {
    props: ({ mutate }) => ({
        completeSurvey: (surveyId, answers) => mutate({
            variables: { surveyId, answers } 
        })
    })
})(Survey);

Unfortunately, this mutation will fail if the client attempts to submit their survey while disconnected from the server.

To make matters worse, we can’t even capture these failures by listening for an APOLLO_MUTATION_ERROR action in a custom reducer. Network-level errors are swallowed before APOLLO_MUTATION_INIT is fired, and instead surface as an exception thrown by your mutation’s promise.

This is a problem.

Defining Success

Now that we’ve defined our problem, let’s try to define what a solution to this problem would look like.

“Offline support” is an amorphous blob of features, held together by a fuzzy notion of what the system “should do” while offline, and torn apart by what’s “actually possible”. What does it mean for our mutations to support network disruptions? Getting down to the details, what exactly should happen in our application when a user attempts to submit a survey while offline?

In our situation, it would be nice to mark these surveys as “pending” on the client. Once the user reconnects to the server, any pending surveys should automatically be completed in order via completeSurvey mutations. In the meantime, we could use this “pending” status to indicate the situation to the user in a friendly and meaningful way.

Now that we know what a successful offline solution looks like, let’s build it!

Enter Redux Offline

When it came to supporting offline queries, Redux Offline largely worked under the hood. None of the components within our application needed any modifications to support offline querying.

Unfortunately, that’s not the case with offline mutations.

To support offline mutations through Redux Offline, we’ll need to wrap all of our mutations in plain old Redux actions. These actions should define a meta field that Redux Offline uses to reconcile the mutations with the server, once reconnected.

Let’s add offline support to our completeSurvey mutation.


First, we’ll set up the Redux action and an action creator that we’ll use to complete our survey:


export const COMPLETE_SURVEY = 'COMPLETE_SURVEY';

export const completeSurvey = (survey, answers) => {
    const mutation = gql`
        mutation completeSurvey($surveyId: ID!, $answers: [String]) {
            completeSurvey(surveyId: $surveyId, answers: $answers) {
                _id
                answers
                completedAt
            }
        }
    `;
    return {
        type: COMPLETE_SURVEY,
        payload: { ...survey },
        meta: {
            offline: {
                effect: { mutation, variables: { surveyId: survey._id, answers } }
            }
        }
    };
};

The offline effect of this action contains our completeSurvey Apollo mutation, along with the surveyId and answers variables needed to populate the mutation.


To tell Redux Offline how to handle this offline effect object, we’ll need to add an effect callback to the Redux Offline configuration we previously defined in our store:


offline({
    ...config,
    ...,
    effect: (effect, action) => {
        return client.mutate({ ...effect }).then(({ data }) => data);
    }
})

At this point, we’ve instructed Redux Offline to manually trigger an Apollo mutation whenever we dispatch an action with an offline effect.

If Redux Offline detects that the client is disconnected from the server, it will throw this effect into a queue to be retried later.

Let’s refactor our Survey component to use our new action!


Now that our COMPLETE_SURVEY action is defined, we’ll inject a dispatcher for our new action into our React component and use it to replace our direct call to the completeSurvey mutation:


onCompleteSurvey = () => {
    let survey = this.props.data.survey;
    let answers = this.state.answers;
    this.props.completeSurvey(survey, answers);
};

export default connect(null, { completeSurvey })(Survey);

Now, instead of manually triggering the completeSurvey mutation through Apollo, we’re dispatching our COMPLETE_SURVEY action, which contains all of the information needed for Redux Offline to trigger our mutation either now or at some point in the future.


Once dispatched, if Redux Offline detects an active connection to the server, it will immediately carry out the effect associated with our COMPLETE_SURVEY action. This would kick off our completeSurvey mutation, and all would be right in the world.

However, if Redux Offline detects that it’s disconnected from the server, it stores the action in a queue it refers to as its “outbox” (offline.outbox in the Redux store). Once we reconnect to the server, Redux Offline works through its outbox in order, carrying out each queued mutation.
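The outbox behavior described above can be sketched in a few lines of plain JavaScript. This is a toy model of the idea, not Redux Offline’s actual implementation:

```javascript
// A toy model of Redux Offline's outbox: effects dispatched while offline
// are queued, then replayed in FIFO order once the connection comes back.
function makeOutbox(runEffect) {
    let online = true;
    const outbox = [];
    return {
        setOnline(isOnline) {
            online = isOnline;
            if (online) {
                // Drain the outbox in order on reconnect.
                while (outbox.length > 0) {
                    runEffect(outbox.shift());
                }
            }
        },
        dispatch(effect) {
            if (online) {
                runEffect(effect);
            } else {
                outbox.push(effect);
            }
        }
    };
}

// Usage: effects dispatched while offline replay in order on reconnect.
const executed = [];
const box = makeOutbox(effect => executed.push(effect));

box.dispatch("survey-1");  // online: runs immediately
box.setOnline(false);
box.dispatch("survey-2");  // offline: queued
box.dispatch("survey-3");  // offline: queued
box.setOnline(true);       // reconnect: queue drains in order

console.log(executed); // ["survey-1", "survey-2", "survey-3"]
```

Redux Offline layers retries, rollbacks, and persistence on top of this, but the queue-and-drain core is the same.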

Now that we’ve refactored our survey application to manage submissions through an action managed by Redux Offline, we’ve successfully added partial offline support to our application.

Now we need to indicate what’s happening to the user.

Managing Offline Data Locally

In order to inform the user that their survey is in a “pending” state, we’ll need to store these updated surveys somewhere on the client until their status is reconciled with the server.

Our first instinct might be to keep our updated surveys where we’re keeping the rest of our application’s data: in Apollo’s store. This would let us neatly retrieve our data through Apollo queries! Unfortunately, it’s very difficult to directly and arbitrarily update the contents of our Apollo store.

Instead, let’s store the surveys in a new section of our Redux store (under the surveys key) that lives alongside our Apollo store.

Creating and populating our new store with pending surveys is actually incredibly easy. First things first, let’s create a new reducer that listens for our COMPLETE_SURVEY action and stores the corresponding survey in our new store:


export default (state = [], action) => {
    switch (action.type) {
        case COMPLETE_SURVEY:
            return [...state, action.payload];
        default:
            return state;
    }
};

If you look back at our action creator function, you’ll see that the payload field of our COMPLETE_SURVEY action contains the entire survey object. When we handle the COMPLETE_SURVEY action, we simply concatenate that survey onto our list of already completed surveys.

Next, we’ll need to wire this new reducer into our Redux store:


import SurveyReducer from "...";

export const store = createStore(
    combineReducers({
        surveys: SurveyReducer,
        ...
    }),
    ...
);

Perfect. Now every survey submitted while offline will be added to the surveys array living in our Redux store.

Displaying Pending Surveys

We can display these pending surveys to our users by subscribing to the surveys field of our Redux store in our React components:


export default connect(state => {
    return {
        pendingSurveys: state.surveys
    };
})(PendingSurveyList);

Now we can access this tangential list of pendingSurveys from our component’s props and render them just as we would any other survey in our application:


render() {
    let pending = this.props.pendingSurveys;
    return (
        <div>
            <h2>Pending Surveys:</h2>
            {pending.map(survey => <Survey key={survey._id} survey={survey}/>)}
        </div>
    );
}

When we render this component, our users will see their list of pending surveys that have yet to be submitted to the server.

Great! 🎉

Manually Managing Survey State

Unfortunately, there’s a problem with this solution. Even if we’re online when we submit our survey, it will be added to our surveys store and shown as a pending survey in our UI.

This makes sense.

Because we’re using the COMPLETE_SURVEY action to handle all survey completions, our action will fire on every survey we submit. These surveys pile up in our surveys list and never get removed. Because we’re persisting and rehydrating our store to localStorage, these surveys will persist even through page reloads!

We need a way to remove surveys from our surveys store once they’ve been submitted to the server.

Thankfully, Redux Offline has a mechanism for handling this.

Let’s make a new action called COMPLETE_SURVEY_COMMIT. We can instruct Redux Offline to dispatch this action once our mutation has been executed by specifying it in the commit field of the meta.offline portion of our action creator function:


meta: {
    offline: {
        effect: { mutation, variables: { surveyId: survey._id, answers } },
        commit: { type: COMPLETE_SURVEY_COMMIT, meta: { surveyId: survey._id } }
    }
}

Now we need to update our surveys reducer to remove a survey from our surveys store whenever a COMPLETE_SURVEY_COMMIT action is handled:


switch (action.type) {
    ...
    case COMPLETE_SURVEY_COMMIT:
        return state.filter(survey => survey._id !== action.meta.surveyId);
}

That’s it!

Now our application is adding surveys to the surveys store when they’re submitted (or marked as submitted while offline), and removing them once our completeSurvey mutation is successfully executed.
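Putting the two reducer cases together, the pending-survey lifecycle can be exercised in isolation. Here’s a self-contained sketch with the action types inlined, outside of Redux itself:

```javascript
const COMPLETE_SURVEY = "COMPLETE_SURVEY";
const COMPLETE_SURVEY_COMMIT = "COMPLETE_SURVEY_COMMIT";

// The surveys reducer: add a pending survey on COMPLETE_SURVEY,
// remove it on COMPLETE_SURVEY_COMMIT.
const surveysReducer = (state = [], action) => {
    switch (action.type) {
        case COMPLETE_SURVEY:
            return [...state, action.payload];
        case COMPLETE_SURVEY_COMMIT:
            return state.filter(survey => survey._id !== action.meta.surveyId);
        default:
            return state;
    }
};

// A survey goes pending on submission...
let state = surveysReducer([], {
    type: COMPLETE_SURVEY,
    payload: { _id: "s1", answers: ["yes", "no"] }
});
console.log(state.length); // 1

// ...and is cleared once the mutation commits on the server.
state = surveysReducer(state, {
    type: COMPLETE_SURVEY_COMMIT,
    meta: { surveyId: "s1" }
});
console.log(state.length); // 0
```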

With that, we’ve achieved our definition of success.

If submitted offline, surveys will go into a “pending” state, which is visible to the user, and will eventually be synced with the server, in order, once a connection is re-established.

Credit Where Credit is Due

With a little bit of up-front planning and elbow grease, we’ve managed to add support for offline mutations to our React and Apollo-powered application. Combined with support for offline querying, we’ve managed to build out a reasonably powerful set of offline functionality!

To get a more holistic understanding of the overall solution described here, be sure to check out Manur’s fantastic “Redux Offline Examples” project on Github. The apollo-web project, in particular, was a major inspiration for this post and an invaluable resource for adding feature-rich offline support to my Apollo application.

He even includes more advanced features in his apollo-web project, such as reconciling locally generated IDs with server-generated IDs after a sync. Be sure to give the project a read through if you’re hungry for more details.

Thanks Manur, Apollo, Redux Offline, and Redux Persist!