Sorting By Ownership With MongoDB

Written by Pete Corey on Nov 16, 2015.

I sometimes find myself coming up against constraints or limitations imposed upon my software, either by the tools that I’m using or by my limited understanding of how to use those tools. In these situations we’re always given two options:

  1. Bend or reshape your solution to fit the constraint
  2. Maintain your design and overcome the limitation

A perfect example of this would be something as seemingly simple as sorting a collection of documents by ownership using MongoDB.

Let’s say we have a huge collection of documents in our database. An example document would look something like this:

{
  ownerId: "XuwWcLue9zom8DqEA",
  name: "Foo",
  ...
}

Each document is owned by a particular user (denoted by the ownerId field). On the front-end, we want to populate a table with these documents. The current user’s documents should appear first, sorted by name, followed by all other documents, also sorted by name.

Sorting by Ownership is Hard

There are a couple of things going on here that make this a difficult problem. First things first, “ownership” is a computed value. You can’t determine if a document belongs to a user until you receive some input from the user; specifically, their ID.

Unfortunately, while there are tools that let us attach computed values to our documents, we can’t search or sort on those fields at the database level. This also means that we can’t paginate our data based on those computed fields.

The second issue is the size of our imaginary collection. If our collection were smaller, we could just pull everything into memory and (painfully) sort the documents ourselves:

var userId = Meteor.userId();

// fetch() turns the cursor into an array so it can be sorted in memory
Collection.find({}).fetch().sort(function(a, b) {
  var aOwned = a.ownerId === userId;
  var bOwned = b.ownerId === userId;
  // Documents we own always come before documents we don't
  if (aOwned !== bOwned) {
    return aOwned ? -1 : 1;
  }
  // Within each group, sort by name
  return a.name < b.name ? -1 :
         a.name === b.name ? 0 : 1;
});

Unfortunately, we have a very large number of documents, so pulling them all into memory at once is infeasible. This means that we need to sort and paginate our data in the database, which brings us right back to the first issue: ownership is a computed value.

This leaves us with two options as application developers:

  1. Change our application design to better fall in line with the restrictions MongoDB imposes upon us. For example, we could show two separate tables - one of documents we own sorted by name, and another of documents we don’t own sorted by name.
  2. Fight back!

Let’s choose option #2.

Encoding Ownership In The Document

The fundamental problem that we’re facing here is that everything we want to sort on needs to live on the document we’re sorting. This means that if we want to sort on ownership, ownership for each user needs to be encoded into each document. This can be a little mind-bending to consider.

At first, you may be thinking that ownership is already encoded through the ownerId field. Unfortunately, ownerId only tells us the owner’s ID, not whether the current user’s ID matches that ID. We need to somehow store that calculation on the document to be able to use it in an actionable way.

One way to do this is to add a field to the document when it’s created. The name of this field is the owner’s ID, and within that field we store a simple object that holds an ownership flag:

{
  …
  "XuwWcLue9zom8DqEA": {
    "owner": 1
  }
}

This object can be inserted into each document automatically using a variety of hooking or data management techniques. Here’s how you would implement it if you were using matb33:collection-hooks:

Documents.before.insert(function(userId, doc) {
  // Key the flag by the document's owner rather than by whoever ran the insert
  doc[doc.ownerId] = {
    owner: 1
  };
  return doc;
});

This seems a little unconventional, but it opens up the path to our goal: sorting by ownership. Check out how we would construct our sorting query:


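// Meteor's array form of a sort specifier takes "desc"/"asc" directions;
// an array also keeps the dynamically built key in first position
var sort = [
  [this.userId + ".owner", "desc"],
  ["name", "asc"]
];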

Documents.find({
  ...
}, {
  sort: sort,
  ...
});

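Using this query, all documents we own will be returned first, sorted by their name, followed by all documents we don’t own, sorted by their name. This works because documents we don’t own have no ownership flag under our ID at all, and MongoDB sorts a missing field as if it were null, which ranks below any number. Sorting that field in descending order therefore puts our documents (with a flag of 1) at the top. Victory!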
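The ordering can be checked with a small in-memory simulation. This is a sketch, not MongoDB's sort implementation: it hardcodes the one assumption this sort needs (a missing field ranks below the number 1), and the helpers getPath, rank, compareValues, and sortByOwnership are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: read a dotted path such as "abc.owner" from a document,
// returning undefined as soon as any segment is missing.
function getPath(doc, path) {
  return path.split(".").reduce(function(value, key) {
    return value == null ? undefined : value[key];
  }, doc);
}

// Rank a value the way this one sort needs: a missing field (or null) sorts
// below every number. Only nulls and numbers are handled here.
function rank(value) {
  return value === undefined || value === null ? -Infinity : value;
}

function compareValues(x, y) {
  var rx = rank(x);
  var ry = rank(y);
  return rx < ry ? -1 : rx > ry ? 1 : 0;
}

// Mimics sorting on <userId>.owner descending, then on name ascending.
function sortByOwnership(docs, userId) {
  var ownerPath = userId + ".owner";
  return docs.slice().sort(function(a, b) {
    // Arguments swapped: descending order on the ownership flag
    var byOwner = compareValues(getPath(b, ownerPath), getPath(a, ownerPath));
    if (byOwner !== 0) {
      return byOwner;
    }
    // Ties broken by name, ascending
    return a.name < b.name ? -1 : a.name > b.name ? 1 : 0;
  });
}

var me = "XuwWcLue9zom8DqEA";
var docs = [
  { ownerId: "someoneElse", name: "Apple" },
  { ownerId: me, name: "Zebra", XuwWcLue9zom8DqEA: { owner: 1 } },
  { ownerId: me, name: "Foo", XuwWcLue9zom8DqEA: { owner: 1 } }
];

var ordered = sortByOwnership(docs, me).map(function(doc) { return doc.name; });
// ordered is ["Foo", "Zebra", "Apple"]
```

For a user who owns nothing, every document lacks the flag, so the first key ties everywhere and the result falls back to a plain name sort.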

Don’t Pollute the Document

There is a downside to the above approach.

By encoding the ownership calculation into the document itself, we’re polluting the document. This new nested object has no real purpose, other than to get around a technical limitation, and in many ways is just a duplication of the information held by ownerId.

A better solution would give us this same functionality without polluting the document. Thankfully, we can leverage the power of MongoDB aggregations to accomplish just that.

Our aggregation will operate in two steps. The first step will be to calculate the ownership flag and add it to each document we’re sorting. The second step is to sort our documents, first by this ownership flag and next by the document’s name.

We’ll use the $cond operator to calculate a new owned flag on each document by comparing the value of ownerId to the current user’s ID (which is passed into our aggregation). This calculated value is set on each returned document during the projection stage of our aggregation pipeline. Check it out:

Documents.aggregate([
    {
        $project: {
            owned: {$cond: [{$eq: ["$ownerId", this.userId]}, 1, 0]},
            name: "$name",
            ...
        }
    },
    {
        $sort: {
            owned: -1,
            name: 1
        }
    }
]);

We’re using Mongo’s aggregation framework within our Meteor application through the meteorhacks:aggregation package. Be sure to check out Josh Owens’s great article about using meteorhacks:aggregation to power your publications.

By building the owned field on the fly in our aggregation, we get all of the benefits of encoding our ownership information into the document, with none of the downsides of permanently polluting the document with this information.
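Because the whole point was to paginate in the database, it may help to see the same pipeline with paging stages added. This is a sketch, assuming a hypothetical ownershipPipeline helper, a zero-based page number, and a caller-chosen page size. $skip and $limit are standard aggregation stages; they go after $sort so that each page is a slice of the final ordering. The elided fields from the $project stage above are left out, so only _id, owned, and name come back.

```javascript
// Hypothetical helper that builds the ownership-sorting pipeline for one page.
// page is zero-based; pageSize is however many rows the table shows.
function ownershipPipeline(userId, page, pageSize) {
  return [
    {
      // Compute the owned flag on the fly: 1 if userId owns the document,
      // 0 otherwise. Only the listed fields (plus _id) survive $project.
      $project: {
        owned: { $cond: [{ $eq: ["$ownerId", userId] }, 1, 0] },
        name: "$name"
      }
    },
    // Owned documents first, then everything else, each group by name
    { $sort: { owned: -1, name: 1 } },
    // Paginate after sorting so each page is a slice of the final order
    { $skip: page * pageSize },
    { $limit: pageSize }
  ];
}

// e.g., on the server with meteorhacks:aggregation installed:
// Documents.aggregate(ownershipPipeline(this.userId, 2, 20));
```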

Don’t Let the Tool Use You

Every tool we use comes with a certain set of limitations and constraints. Sometimes these constraints exist for very good reasons, and trying to work around them can lead to very serious performance issues or security vulnerabilities. Other times, these constraints are just limitations of the technologies we’re using, or limitations in our understanding.

Originally, we thought MongoDB was the problem. By exploring alternative solutions and building a deeper understanding of the tool, we realized that we could use MongoDB to solve the problem!

When you’re facing limitations imposed by your tools, don’t immediately concede. Always try to understand why the limitation exists, and how you can (or can’t) overcome it.

Why I Can't Wait For ES6 Proxies

Written by Pete Corey on Nov 9, 2015.

Full ES6 support is just around the corner. In fact, nearly all of ES6 is available to us through compilers like Babel that transpile ES6 syntax into ES5 code. Unfortunately, one of the ES6 features I’m most excited about can’t be implemented in ES5. What feature is that? Proxies, of course!

Proxies make some incredibly exciting things possible. Imagine a Meteor method like the one below:

Meteor.methods({
  foo: function(bar) {
    return Bars.remove(bar._id);
  }
});

As I’ve talked about in the past, this method exposes our application to a serious security vulnerability. A user can pass in an arbitrary MongoDB query object in the _id field of bar like this:

Meteor.call("foo", {_id: {$gte: ""}});

This would delete all of the documents from our Bars collection. Uh oh! Imagine if we could automatically detect and prevent that from happening, and instead throw an exception that tells the client:

Meteor.Error: Tried to access unsafe field: _id

Our _id field would be accessible only after we check it:

Meteor.methods({
  foo: function(bar) {
    check(bar, {
      _id: String
    });
    return Bars.remove(bar._id);
  }
});

Any attempt to access a field on a user-provided object would throw an exception unless that field had been explicitly checked for safety. If this were possible, it could be used to prevent entire categories of security vulnerabilities!

With proxies, we can make this happen.

What is a Proxy?

An ES6 Proxy is basically a middleman between an object and the code trying to access that object. When we wrap an object with a proxy, we can oversee (and interfere with) every action taken on that object.

Proxies do this overseeing through “traps”. A trap is just a callback that’s called whenever a certain action is taken on the proxy object. For example, a get trap is triggered any time a piece of code tries to get the value of a field on the proxy. Likewise, a set trap is triggered any time you try to set the value of a field.
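To make traps concrete, here is a minimal standalone example for any environment with native Proxy support. The names loggingHandler, original, and proxied are just for illustration; the handler records each get and set and then forwards it to the wrapped object.

```javascript
// Record every get and set performed through the proxy
var log = [];

var loggingHandler = {
  // get trap: runs whenever code reads a property on the proxy
  get: function(target, field) {
    log.push("get " + String(field));
    return target[field];
  },
  // set trap: runs whenever code writes a property on the proxy.
  // Returning true tells the engine the assignment succeeded.
  set: function(target, field, value) {
    log.push("set " + String(field));
    target[field] = value;
    return true;
  }
};

var original = { name: "Foo" };
var proxied = new Proxy(original, loggingHandler);

proxied.name;          // triggers the get trap
proxied.ownerId = "x"; // triggers the set trap

// log is now ["get name", "set ownerId"], and original.ownerId is "x"
```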

In the above example, our proxy sees that we’re trying to access _id on the bar object, but because it knows that check hasn’t been called on that field yet, it throws an exception. If we had checked the field, the proxy would have let _id’s value pass through.

A rough sketch of this kind of proxy would look something like this:

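CheckProxy = {
  get: function(target, field) {
    // Refuse to hand back any field that hasn't been marked as checked
    if (!target.__checked ||
        !target.__checked[field]) {
      // String() also handles symbol keys, which can't be concatenated directly
      throw new Error("Tried to access unsafe field: " + String(field));
    }
    return target[field];
  }
};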

But how does the proxy know when a field has been checked? We have to explicitly tell the proxy that each field has been checked after we’ve determined that it’s safe to use. One way to do this is through a custom set trap:

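CheckProxy = {
  ...
  set: function(target, field, value) {
    if (field === "__checked") {
      // Assigning to __checked marks the named field as safe to read
      if (!target.__checked) {
        // Non-enumerable, so code that walks the object's keys won't trip over it
        Object.defineProperty(target, "__checked", {value: {}});
      }
      target.__checked[value] = true;
    }
    else {
      target[field] = value;
    }
    return true;
  }
};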
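Outside of Meteor, the two traps can be exercised together in plain JavaScript. This self-contained sketch uses a condensed version of the get and set traps, named checkHandler here (a hypothetical name) so it doesn't clash with the post's global. It shows an unchecked read throwing and a checked read passing through.

```javascript
var checkHandler = {
  // Refuse to read any field that hasn't been marked as checked
  get: function(target, field) {
    if (!target.__checked || !target.__checked[field]) {
      throw new Error("Tried to access unsafe field: " + String(field));
    }
    return target[field];
  },
  // Assigning to __checked marks a field as safe instead of storing a value
  set: function(target, field, value) {
    if (field === "__checked") {
      if (!target.__checked) {
        target.__checked = {};
      }
      target.__checked[value] = true;
    } else {
      target[field] = value;
    }
    return true;
  }
};

var bar = new Proxy({ _id: "abc123" }, checkHandler);

var firstError = null;
try {
  bar._id; // not marked yet, so the get trap throws
} catch (e) {
  firstError = e.message; // "Tried to access unsafe field: _id"
}

bar.__checked = "_id"; // tell the proxy _id has been checked
var id = bar._id;      // now the value passes through: "abc123"
```

Any other field, such as bar.ownerId, still throws, because only _id has been marked.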

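If we wanted to use our proxy as-is, there would be a good amount of manual work involved. We’d have to instantiate a new proxy object for each one of our object arguments, and then explicitly notify the proxy of every field we check. Note that check itself reads bar._id through the proxy, so the field has to be marked before check runs; if the check fails it throws, and the unchecked value never reaches Bars.remove:

Meteor.methods({
  foo: function(bar) {
    bar = new Proxy(bar, CheckProxy);
    // Mark _id first: check reads bar._id through the proxy
    bar.__checked = "_id";
    check(bar, {
      _id: String
    });
    return Bars.remove(bar._id);
  }
});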

This is too much work! It wouldn’t take long to lose diligence and fall back to not checking arguments at all.

Thankfully, we can hide all of this manual work through the magic of monkey patching.

The first thing we’ll do is patch our check method to tell our proxy whenever we check a field on an object:

_check = check;
check = function(object, fields) {
  if (object instanceof Object) {
    Object.keys(fields).forEach(function(field) {
      object.__checked = field;
    });
  }
  _check.apply(this, arguments);
};

Next, we just have to patch Meteor.methods to automatically wrap each Object argument in a proxy:

_methods = Meteor.methods;
Meteor.methods = function(methods) {
  _.each(methods, function(method, name, obj) {
    obj[name] = function() {
      // Wrap every object argument in a proxy; primitives pass through untouched
      var args = _.map(arguments, function(value) {
        return value instanceof Object ? new Proxy(value, CheckProxy) : value;
      });
      // Keep the method's this (the method invocation) and its return value
      return method.apply(this, args);
    };
  });
  _methods.apply(this, arguments);
};
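The argument-wrapping part of this patch doesn't depend on Meteor, so it can be tried in isolation. This is a sketch, assuming a hypothetical wrapObjectArgs helper and a stand-in handler called denyAll that refuses every read; it wraps each object argument in a proxy, passes primitives through, and preserves both this and the return value of the wrapped function.

```javascript
// Hypothetical helper: returns a version of fn whose object arguments are
// wrapped in proxies using the given handler. Primitives pass through as-is.
function wrapObjectArgs(fn, handler) {
  return function() {
    var args = Array.prototype.map.call(arguments, function(value) {
      return value instanceof Object ? new Proxy(value, handler) : value;
    });
    // Forward `this` (Meteor sets it to the method invocation) and the result
    return fn.apply(this, args);
  };
}

// Stand-in handler that refuses every field read, to make the wrapping visible
var denyAll = {
  get: function(target, field) {
    throw new Error("Tried to access unsafe field: " + String(field));
  }
};

var readId = wrapObjectArgs(function(bar, label) {
  return label + ": " + bar._id;
}, denyAll);

// readId({ _id: "abc" }, "bar") throws "Tried to access unsafe field: _id",
// because bar arrives wrapped and denyAll refuses every read.
```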

Whew, this is getting dense!

Thankfully, that’s all the patching we have to do. Now we can go back to our original method and still reap all of the benefits of automatic check enforcement for all object fields throughout all of our Meteor methods.

Shortcomings

ES6 Proxies are currently only supported in Firefox, which means that what I described above currently isn’t possible. Until proxy support comes to V8, Node.js, and finally Meteor, all we can do is wait and dream.

The implementation I described here is fairly unsophisticated. It only works when accessing fields within the first layer of an object. It also pollutes the provided object with a __checked field, which may wreak inadvertent havoc. In future versions of this idea, both of these issues could easily be solved.

I hope this post has given you a taste of the awesome power of proxies. Fire up your Firefox console and start experimenting!

Meteor Space Camp

Written by Pete Corey on Nov 2, 2015.

Late last month I had the chance to fulfill a childhood dream; I went to Space Camp! No, not that kind of space camp… I went to Meteor Space Camp!

Thanks to the hard work of Josh Owens and a handful of very generous sponsors and organizers, I had the opportunity to spend a weekend with fifty other passionate Meteorites in a beautiful cabin nestled in the Great Smoky Mountains.

When it came time for talks, I used the opportunity to discuss something near and dear to my heart - software security! I gave a quick presentation on the importance of always checking your arguments in your Meteor applications. We looked at example methods, publications, and collection validators and dove into how they could be exploited by malicious users. I capped things off with a quick demo (that didn’t completely go as planned) to make things more real.

I’ll be sure to post a video of the talk once it’s available, but in the meantime check out the slides for a quick teaser.

Being able to put faces to people I’ve met in the community was an invaluable experience. I feel like we solidified some real friendships, and made great new connections with awesome people in the community. The value of meeting people face to face and talking about something you love really can’t be overstated. If you didn’t make it out to Space Camp this year, read Katie Reed’s fantastic write-up to get a taste of what you missed.