Rewriting History

Written by Pete Corey on Sep 12, 2016.

If you’ve been following our blog, you’ll notice that we’ve been writing lots of what we’re calling “literate commit” posts.

The goal of a literate commit style post is to break down each Git commit into a readable, clear explanation of the code change. The idea is that this chronological narrative helps tell the story of how a piece of software came into being.

Combined with tools like git blame and git log, you can even generate detailed histories for small, focused sections of a codebase.
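
For example, git log’s -L flag walks the history of a specific range of lines within a file, and git blame annotates each line with the commit that last touched it (the path and line range here are purely illustrative):


# History of lines 1 through 25 of a file
git log -L 1,25:src/app.js

# Who last touched each line
git blame src/app.js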

But sometimes generating repositories with this level of historical narrative requires something that most Git users warn against: rewriting history.

Why Change the Past

Modifying a project’s revision history is usually considered bad practice, and in most cases that advice holds. However, there are certain situations where changing history is the right thing to do.

In our case, the main artifact of each literate commit project is not the software itself; it’s the revision history. The project serves as a lesson or tutorial.

In this situation, it might make sense to revise a commit message for clarity. Maybe we want to break a single, large commit into two separate commits so that each describes a smaller piece of history. Or, maybe while we’re developing we discover a small change that should have been included in a previous commit. Rather than making an “Oops, I should have done this earlier” commit, we can just change our revision history and include the change in the original commit.

It’s important to note that in these situations, it’s assumed that only one person will be working with the repository. If multiple people are contributing, editing revision history is not advised.
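
Also note that if you’ve already pushed the history you’re rewriting, the remote will reject your rewritten commits until you force push them. The --force-with-lease flag is a safer variant that refuses to overwrite work you haven’t seen:


git push --force-with-lease origin master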

In The Beginning…

Imagine that we have some boilerplate that we use as a base for all of our projects. Being good developers, we keep track of its revision history using Git, and possibly host it on an external service like GitHub.

Starting a new project with this base might look something like this:


mkdir my_project
cd my_project
git clone https://github.com/pcorey/base .
git remote remove origin
git remote add origin https://github.com/pcorey/my_project

We’ve cloned base into the my_project directory, removed its origin remote pointing to the base repository, and replaced it with one pointing to a new my_project repository.

Great, but we’re still stuck with whatever commits existed in the base project before we cloned it into my_project. Those commits most likely don’t contribute to the narrative of this specific project and should be changed.

One solution to this problem is to clobber the Git history by removing the .git folder, but this is the nuclear option. There are easier ways of accomplishing our goal.

The --root flag of the git rebase command lets us revise every commit in our project, including the root commit. This means that we can interactively rebase and reword the root commits created in the base project:


git rebase -i --root master

reword f784c6a First commit
# Rebase f784c6a onto 5d85358 (1 command(s))

Using reword tells Git that we’d like to use the commit, but we want to modify its commit message. In our case, we might want to explain the project we’re starting and discuss the base set of files we pulled into the repository.
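
When the rebase stops to reword, Git opens an editor containing the old message. We might replace the terse “First commit” with something more narrative (an illustrative message, not the project’s actual one):


Project setup

Pull in our standard base boilerplate as the starting
point for my_project's story.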

Splicing in a Commit

Next, let’s imagine that our project has three commits. The first sets up our project’s boilerplate, the second adds a file called foo.js, and the third updates that file:


git log --oneline

1d5f372 Updated foo.js
873641e Added foo.js
b3065c9 Project setup

What if we forgot to create a file called bar.js after we created foo.js? For maximum clarity, we want this file to be created in a new commit following 873641e. How would we do it?

Once again, interactive rebase comes to the rescue. While doing a root rebase, we can mark 873641e as needing editing:


git rebase -i --root master

pick b3065c9 Project setup
edit 873641e Added foo.js
pick 1d5f372 Updated foo.js

Because a root rebase recreates every commit, our existing commits receive new hashes as they’re applied. After the rebase stops at our edit point, HEAD will point at the newly created version of 873641e. Our git log looks like this:


git log --oneline

41817a4 Added foo.js
81df941 Project setup

We can now add bar.js and commit the change:


touch bar.js
git add bar.js
git commit -m "Added bar.js"

Reviewing our log, we’ll see our new commit sitting on top of the rewritten “Added foo.js” commit (41817a4):


git log --oneline

58f31fd Added bar.js
41817a4 Added foo.js
81df941 Project setup

Everything looks good. Now we can continue our rebase and check out our final revision history:


git rebase --continue
git log --oneline

b8b7b18 Updated foo.js
58f31fd Added bar.js
41817a4 Added foo.js
81df941 Project setup

We’ve successfully injected a commit into our revision history!

Revising a Commit

What if we notice a typo in our project that was introduced by our boilerplate? We don’t want a stray typo fix cluttering our Git history; it would distract from the overall narrative. How would we fix this situation?

Once again, we’ll harness the power of our interactive root rebase!


git rebase -i --root master

edit b3065c9 Project setup
pick 873641e Added foo.js
pick 1d5f372 Updated foo.js

After starting the rebase, our HEAD will point to the first commit, b3065c9. From there, we can fix our typo, and then amend the commit:


vim README.md
git add README.md
git commit --amend

Our HEAD is still pointing to the first commit, but now the typo fix is included in that commit’s set of changes!

We can continue our rebase and go about our business, pretending that the typo never existed.


git rebase --continue

With Great Power

Remember, young Time Lord: with great power comes great responsibility.

Tampering with revision history can lead to serious losses for your project if done incorrectly. It’s recommended that you practice any changes you plan to make in another branch before attempting them in master. If all goes wrong, a final fallback is to hard reset to origin/master:


git reset --hard origin/master
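
A concrete way to practice safely is to bookmark your current history with a throwaway branch before rebasing (the branch name here is illustrative):


git branch backup           # bookmark the current history
git rebase -i --root master # rewrite away
git reset --hard backup     # restore the bookmark if things go wrong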

While changing history can be dangerous, it’s a very useful skill to have. When your history is the main artifact of your work, it pays to ensure it’s as polished as possible.

Phoenix Todos - The User Model

This post is written as a set of Literate Commits. The goal of this style is to show you how this program came together from beginning to end.

Each commit in the project is represented by a section of the article. Click each section’s header to see the commit on GitHub, or check out the repository and follow along.

Written by Pete Corey on Sep 7, 2016.

Create Users Table

Let’s focus on adding users and authorization to our Todos application. The first thing we’ll need is to create a database table to hold our users and a corresponding users schema.

Thankfully, Phoenix comes with many generators that ease the process of creating things like migrations and models.

To generate our users migration, we’ll run the following mix command:


mix phoenix.gen.model User users email:string encrypted_password:string

We’ll modify the migration file the generator created for us and add NOT NULL restrictions on both the email and encrypted_password fields:


add :email, :string, null: false
add :encrypted_password, :string, null: false

We’ll also add a unique index on the email field to enforce uniqueness at the database level and to speed up queries:


create unique_index(:users, [:email])

Great! Now we can run that migration with the mix ecto.migrate command.
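
If we need to iterate on the migration during development, mix ecto.rollback will undo it:


mix ecto.migrate   # apply pending migrations
mix ecto.rollback  # undo the most recent migration, if needed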

priv/repo/migrations/20160901141548_create_user.exs

+defmodule PhoenixTodos.Repo.Migrations.CreateUser do
+  use Ecto.Migration
+
+  def change do
+    create table(:users) do
+      add :email, :string, null: false
+      add :encrypted_password, :string, null: false
+
+      timestamps
+    end
+
+    create unique_index(:users, [:email])
+  end
+end

Creating the Users Model

Now that we’ve created our users table, we need a corresponding User model. Phoenix actually did most of the heavy lifting for us when we ran the mix phoenix.gen.model command.

If we look in /web/models, we’ll find a user.ex file that holds our new User model. While the defaults generated for us are very good, we’ll need to make a few tweaks.

In addition to the :email and :encrypted_password fields, we’ll also need a virtual :password field.


field :password, :string, virtual: true

:password is virtual because it will be required by our changeset function, but will not be stored in the database.

Speaking of required fields, we’ll need to update our @required_fields and @optional_fields attributes to reflect the changes we’ve made:


@required_fields ~w(email password)
@optional_fields ~w(encrypted_password)

These changes to @required_fields break our auto-generated tests against the User model. We’ll need to update the @valid_attrs attribute in test/models/user_test.ex and replace :encrypted_password with :password:


@valid_attrs %{email: "user@example.com", password: "password"}

And with that, our tests flip back to green!
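
Running the suite confirms it:


mix test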

test/models/user_test.exs

+defmodule PhoenixTodos.UserTest do
+  use PhoenixTodos.ModelCase
+
+  alias PhoenixTodos.User
+
+  @valid_attrs %{email: "user@example.com", password: "password"}
+  @invalid_attrs %{}
+
+  test "changeset with valid attributes" do
+    changeset = User.changeset(%User{}, @valid_attrs)
+    assert changeset.valid?
+  end
+
+  test "changeset with invalid attributes" do
+    changeset = User.changeset(%User{}, @invalid_attrs)
+    refute changeset.valid?
+  end
+end

web/models/user.ex

+defmodule PhoenixTodos.User do
+  use PhoenixTodos.Web, :model
+
+  schema "users" do
+    field :email, :string
+    field :password, :string, virtual: true
+    field :encrypted_password, :string
+
+    timestamps
+  end
+
+  @required_fields ~w(email password)
+  @optional_fields ~w(encrypted_password)
+
+  @doc """
+  Creates a changeset based on the `model` and `params`.
+
+  If no params are provided, an invalid changeset is returned
+  with no validation performed.
+  """
+  def changeset(model, params \\ :empty) do
+    model
+    |> cast(params, @required_fields, @optional_fields)
+  end
+end

Additional Validation

While the default required/optional field validation is a good start, we know that we’ll need additional validations on our User models.

For example, we don’t want to accept email addresses without the "@" symbol. We can write a test for this in our UserTest module:


test "changeset with invalid email" do
  changeset = User.changeset(%User{}, %{
    email: "no_at_symbol",
    password: "password"
  })
  refute changeset.valid?
end

Initially this test fails, but we can quickly make it pass by adding basic regex validation to the :email field in our User.changeset function:


|> validate_format(:email, ~r/@/)

We can repeat this process for each additional validation we need, like checking password length and asserting email uniqueness.

test/models/user_test.exs

 ...
-  alias PhoenixTodos.User
+  alias PhoenixTodos.{User, Repo}
 ...
   end
+
+  test "changeset with invalid email" do
+    changeset = User.changeset(%User{}, %{
+      email: "no_at_symbol",
+      password: "password"
+    })
+    refute changeset.valid?
+  end
+
+  test "changeset with short password" do
+    changeset = User.changeset(%User{}, %{
+      email: "email@example.com",
+      password: "pass"
+    })
+    refute changeset.valid?
+  end
+
+  test "changeset with non-unique email" do
+    User.changeset(%User{}, %{
+      email: "email@example.com",
+      password: "password",
+      encrypted_password: "encrypted"
+    })
+    |> Repo.insert!
+
+    assert {:error, _} = User.changeset(%User{}, %{
+      email: "email@example.com",
+      password: "password",
+      encrypted_password: "encrypted"
+    })
+    |> Repo.insert
+  end
 end

web/models/user.ex

 ...
     |> cast(params, @required_fields, @optional_fields)
+    |> validate_format(:email, ~r/@/)
+    |> validate_length(:password, min: 5)
+    |> unique_constraint(:email, message: "Email taken")
   end

Hashing Our Password

You might have noticed that we had to manually set values for the encrypted_password field for our "changeset with non-unique email" test to run. This was to prevent the database from complaining about a not-null constraint violation.

Let’s remove those lines from our test and generate the password hash ourselves!

:encrypted_password was an unfortunate name choice. Our password is not being encrypted and stored in the database; that would be insecure. Instead, we’re storing a hash of the password.

We’ll use the comeonin package to hash our passwords, so we’ll add it as a dependency and an application in mix.exs:


def application do
  [...
   applications: [..., :comeonin]]
end

defp deps do
  [...
   {:comeonin, "~> 2.0"}]
end

Now we can write a private function that updates the :encrypted_password field on our User model whenever it’s given a valid changeset that changes the value of :password:


defp put_encrypted_password(changeset = %Ecto.Changeset{
  valid?: true,
  changes: %{password: password}
}) do
  changeset
  |> put_change(:encrypted_password, Comeonin.Bcrypt.hashpwsalt(password))
end

We’ll use pattern matching to handle the cases where a changeset is either invalid or doesn’t update the :password field:


defp put_encrypted_password(changeset), do: changeset

Isn’t that pretty? And with that, our tests are passing once again.

mix.exs

 ...
   applications: [:phoenix, :phoenix_html, :cowboy, :logger, :gettext,
-                :phoenix_ecto, :postgrex]]
+                :phoenix_ecto, :postgrex, :comeonin]]
 end
 ...
   {:cowboy, "~> 1.0"},
-  {:mix_test_watch, "~> 0.2", only: :dev}]
+  {:mix_test_watch, "~> 0.2", only: :dev},
+  {:comeonin, "~> 2.0"}]
 end

mix.lock

-%{"connection": {:hex, :connection, "1.0.4"}, +%{"comeonin": {:hex, :comeonin, "2.5.2"}, + "connection": {:hex, :connection, "1.0.4"}, "cowboy": {:hex, :cowboy, "1.0.4"},

test/models/user_test.exs

 ...
     email: "email@example.com",
-    password: "password",
-    encrypted_password: "encrypted"
+    password: "password"
   })
 ...
     email: "email@example.com",
-    password: "password",
-    encrypted_password: "encrypted"
+    password: "password"
   })

web/models/user.ex

 ...
     |> unique_constraint(:email, message: "Email taken")
+    |> put_encrypted_password
   end
+
+  defp put_encrypted_password(changeset = %Ecto.Changeset{
+    valid?: true,
+    changes: %{password: password}
+  }) do
+    changeset
+    |> put_change(:encrypted_password, Comeonin.Bcrypt.hashpwsalt(password))
+  end
+  defp put_encrypted_password(changeset), do: changeset
 end

Final Thoughts

Things are starting to look very different from our original Meteor application. While Meteor tends to hide complexity from application developers by keeping this kind of code tucked away inside the framework itself, Phoenix expects developers to write much of this boilerplate code themselves.

While Meteor’s methodology lets developers get off the ground quickly, Phoenix’s philosophy of hiding nothing ensures that there’s no magic in the air. Everything works just as you would expect; it’s all right in front of you!

Additionally, Phoenix generators ease most of the burden of creating this boilerplate code.

Now that our User model is in place, we’re in prime position to wire up our front-end authorization components. Check back next week to see those updates!

Querying Non-Existent MongoDB Fields

Written by Pete Corey on Sep 5, 2016.

We were recently contacted by one of our readers asking about a security vulnerability in one of their Meteor applications.

They noticed that when they weren’t authenticated, they were able to pull down a large number of documents from a collection through a publication they thought was protected.

The Vulnerability

The publication in question looked something like this:


Meteor.publish("documents", function() {
  return Documents.find({ 
    $or: [
      { userId: this.userId },
      { sharedWith: this.userId }
    ]
  });
});

When an unauthenticated user subscribes to this publication, their userId will be undefined, which is translated to null in the resulting MongoDB query.

This means that the query passed into Documents.find would look something like this:


{
  $or: [
    { userId: null },
    { sharedWith: null }
  ]
}

The $or query operator means that if either of these sub-queries matches a document, that document will be returned to the client.

The first sub-query, { userId: null }, most likely won’t match any documents. It’s unlikely that a Document would be created without a userId, so there should be no Documents with a null or nonexistent userId field.

The second sub-query is more interesting. Due to a quirk in how MongoDB handles null equality queries, the { sharedWith: null } sub-query will match all documents whose sharedWith field is either null or unset. That means this query will return all unshared documents.
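
A quick shell session makes this quirk concrete (the documents here are illustrative):


> db.documents.insert({ userId: 1, sharedWith: [2] })
> db.documents.insert({ userId: 3 })
> db.documents.find({ sharedWith: null })
{ "_id": ObjectId("..."), "userId": 3 }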

An unauthenticated user subscribing to the "documents" publication would be able to view huge amounts of private data.

This is a bad thing.

Fixing the Query

There are several ways we can fix this publication. The most straightforward is to simply deny unauthenticated users access:


Meteor.publish("documents", function() {
  if (!this.userId) {
    throw new Meteor.Error("permission-denied");
  }
  ...
});

Another fix would be to conditionally append the sharedWith sub-query if the user is authenticated:


Meteor.publish("documents", function() {
  let query = {
    $or: [{ userId: this.userId }]
  };
  if (this.userId) {
    query.$or.push({ sharedWith: this.userId });
  }
  return Documents.find(query);
});

This will only add the { sharedWith: this.userId } sub-query if this.userId resolves to a truthy value.

Final Thoughts

This vulnerability stems from a larger misunderstanding about MongoDB queries in general. Queries searching for a field equal to null will match all documents whose field in question equals null, as well as all documents that have no value for that field.

For example, the query { imaginaryField: null } will match every document in a collection except those whose imaginaryField holds a value other than null.
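
If you genuinely need to match only documents where a field is explicitly set to null, and not documents missing the field entirely, you can pair the equality check with $exists, or query on the BSON null type (type 10):


// Matches only documents where imaginaryField exists and is null
db.collection.find({ imaginaryField: { $exists: true, $eq: null } });
// The equivalent check, using the BSON type for null
db.collection.find({ imaginaryField: { $type: 10 } });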

This is a very subtle, but very dangerous edge case when it comes to writing MongoDB queries. Be sure to keep it in mind when designing the MongoDB queries used in your Meteor publications and methods.