Broccoli - Starting the Build and Eating Your Greens

This is the second part of a six-part series on how we rebuilt the Ember Guides from the ground up over the course of six months, converting them into an Ember app in the process. If you want to see the first part in this series, check it out here, and you can keep track of the posts by following the RSS feed at the top of the page.

Initial experiments

In the very early stages of the conversations about upgrading the Ember Guides to be a fully-fledged Ember app, Ryan Tablada (a.k.a. @rtablada) pointed me towards an experiment that he had started to get the ball rolling. It was called broccoli-blog-api and was designed to:

translate a directory (or set of directories) of Markdown documents into a static JSONAPI

Having worked extensively with Broccoli many years ago (before ember-cli was the official build system for Ember), I thought to myself "What's the worst that could happen?" and jumped straight into the code. The thing about Broccoli is that it's almost the opposite of "riding a bike": you very quickly forget everything about it if you haven't been using it for a while 😣

Why Broccoli or JSON:API

Anyone who has been following Ember for any reasonable amount of time knows that Ember Data works great with JSON:API, and if your backend already speaks JSON:API and follows the spec you are essentially ready to go! If you have ever needed to integrate a hand-rolled, bespoke API with Ember Data, you know that it is essentially just a process of translating its responses into JSON:API in JavaScript before they go into Ember Data. If you're using JSON:API upfront, things are a lot easier to deal with and you get to make use of the simplicity of Ember Data.

Broccoli is an asset pipeline that deals very effectively with the file system. It is all Just JavaScript™️, so it is in theory quite easy to work with. One of the issues that makes Broccoli more challenging to work with is the lack of documentation, or at least that used to be the case. Over the last few months Oli Griffiths has been very active in the Broccoli community and has recently published a Broccoli Tutorial. There is also much work going on behind the scenes to make Broccoli more straightforward to work with and a much more powerful tool. For example, Oli is currently working on an experiment to bring Broccoli 1.x support to ember-cli, which will (hopefully) make life much better for Windows developers. Jen Weber is also working on updating the ember-cli documentation, so it should soon be a bit easier to know how to get started adding to ember-cli with Broccoli 🎉

Having made these original decisions, we ultimately decided to build something called broccoli-static-site-json which, as you can see, has very similar goals to broccoli-blog-api:

Simple Broccoli plugin that parses collections of markdown files and exposes them as JSON:API documents in the output tree, under the specified paths. It also supports the use of front-matter to define meta-data for each markdown file.

Since the early days of broccoli-static-site-json things have gotten a tiny bit more complicated (more flexibility usually means more complexity), but to understand the basics of how effective Broccoli has been for this use case we can go back and look at the files at the very first commit on 7 Nov 2017. We are going to go into more detail below, but if you want to follow along you can find the main index file here.

The main plugin

The simple early experiment of broccoli-static-site-json had an index.js file (the only active file at the time) with a total of 119 lines of code. The lines that make up the build() function of the Broccoli plugin add up to just 50 lines of code, which is definitely small enough for us to deep dive into in this post. 💪

I'm going to give a very brief overview of the structure of a Broccoli plugin and then go into detail on each part of the main build() function.

Structure of a Broccoli plugin

Here is a basic example of a plugin:

const Plugin = require('broccoli-plugin');

class BroccoliStaticSiteJson extends Plugin {
  constructor(folder, options = {}) {
    // tell broccoli which "nodes" we're watching; broccoli-plugin also
    // reads the optional `annotation` from the options passed to super()
    super([folder], options);
    // keep our own copy of the options, falling back to the defaults
    this.options = {
      folder,
      contentFolder: 'content',
      ...options,
    };
  }

  build() {}
}

module.exports = BroccoliStaticSiteJson;

This isn't exactly the most basic example of a plugin as it has some of the business logic and API of broccoli-static-site-json exposed. It is not 100% obvious from the above example, but it is telling us that if we wanted to use this plugin we would do something like this:

const jsonTree = new StaticSiteJson('input', {
  contentFolder: 'output-jsons',
})

This is just setting the local folder and the contentFolder in the options hash for the StaticSiteJson class and will eventually be how we tell the plugin to look for Markdown files in the input folder and put the output JSON:API files in output-jsons. The contentFolder is optional and will default to content.
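To make the defaulting behaviour concrete, here is a small standalone sketch of how the options hash ends up being resolved. The resolveOptions helper is hypothetical: it only mirrors the object spread in the constructor and is not part of the plugin itself.

```javascript
// Hypothetical helper that mirrors the object spread used in the constructor.
// Later keys win, so anything passed in `options` overrides the defaults.
function resolveOptions(folder, options = {}) {
  return {
    folder,
    contentFolder: 'content',
    ...options,
  };
}

// No options: contentFolder falls back to the default.
const withDefault = resolveOptions('input');

// Explicit option: the caller's value replaces the default.
const withOverride = resolveOptions('input', { contentFolder: 'output-jsons' });

console.log(withDefault.contentFolder);
console.log(withOverride.contentFolder);
```

Because the spread comes last, a caller could also override folder by accident, which is probably worth keeping in mind if you borrow this pattern.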

When this is used in ember-cli or any other Broccoli pipeline the build() function is called. This is where most of the work happens.

The build() function

Let's show the whole build function and then break it down piece by piece. Note: I've removed some things that aren't necessary for the explanation of this process, like a few optional defensive programming steps. I just wanted to make this as easy to follow as possible.

build() {
  // build content folder if it doesn't exist
  if (!existsSync(join(this.outputPath, this.options.contentFolder))) {
    mkdirSync(join(this.outputPath, this.options.contentFolder));
  }

  // build pages file
  if (existsSync(join(this.options.folder, 'pages.yml'))) {
    let pages = yaml.safeLoad(readFileSync(join(this.options.folder, 'pages.yml'), 'utf8'));

    writeFileSync(join(this.outputPath, this.options.contentFolder, 'pages.json'), JSON.stringify(TableOfContentsSerializer.serialize(pages)));
  }

  // build the tree of MD files
  const paths = walkSync(this.inputPaths);

  const mdFiles = paths.filter(path => extname(path) === '.md');

  const fileData = mdFiles.map(path => ({
    path,
    content: readFileSync(join(this.options.folder, path)),
  })).map(file => ({
    path: file.path,
    ...yamlFront.loadFront(file.content),
  }));

  fileData.forEach((file) => {
    const directory = dirname(join(this.outputPath, this.options.contentFolder, file.path));
    if (!existsSync(directory)) {
      mkdirSync(dirname(join(this.outputPath, this.options.contentFolder, file.path)));
    }

    const serialized = ContentSerializer.serialize(file);

    writeFileSync(join(this.outputPath, this.options.contentFolder, `${file.path}.json`), JSON.stringify(serialized));
  });
}

This may seem a bit scary but don't worry, we will break it down and hopefully it will all become clear.

Creating the output folder

The first piece is just a bit of housekeeping. We want to make sure the output folder exists before we continue, and if it doesn't we need to create it:

// build content folder if it doesn't exist
if (!existsSync(join(this.outputPath, this.options.contentFolder))) {
  mkdirSync(join(this.outputPath, this.options.contentFolder));
}

One thing that you will notice right off the bat is that we are using functions like existsSync(), mkdirSync() and join(), which are all native Node.js functions. You can see where they are coming from if you look at the require statements at the top of the index.js file:

const { extname, join, dirname } = require('path');
const {
  readFileSync,
  writeFileSync,
  mkdirSync,
  existsSync,
} = require('fs');

You can read more about these functions in the official Node.js documentation for fs and path.

Creating the Table of Contents from the pages file

Before I started building broccoli-static-site-json, Ricardo Mendes (a.k.a. @locks) and Jared Galanis had begun the process of building the Markdown source directories that would allow us to manage different versions of the Ember Guides more effectively. One of the key aspects of this structure was that it included a pages.yml file that specified the Table of Contents (ToC) for any particular version of the Guides. What we needed to do as part of this process was to parse this YAML file and output a JSON:API-based file in the output directory. Here is the code for that:

// build pages file
if (existsSync(join(this.options.folder, 'pages.yml'))) {
  let pages = yaml.safeLoad(readFileSync(join(this.options.folder, 'pages.yml'), 'utf8'));

  writeFileSync(join(this.outputPath, this.options.contentFolder, 'pages.json'), JSON.stringify(TableOfContentsSerializer.serialize(pages)));
}

This snippet first checks to see if the input folder contains a pages.yml file, and if it does it loads it using js-yaml. After it loads the data it writes a serialised version of the file to the output folder. The serialisation is done using jsonapi-serializer with the following serialiser definition:

const TableOfContentsSerializer = new Serializer('page', {
  id: 'url',
  attributes: [
    'title',
    'pages',
  ],
  // jsonapi-serializer expects the exact, case-sensitive string 'camelCase'
  keyForAttribute: 'camelCase',
});
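To give a feel for what ends up in pages.json, here is a hand-rolled sketch of the transformation. It does not use jsonapi-serializer: the serializeTableOfContents function, the sample data and the 'pages' type are all assumptions made for illustration, and the real library output may differ in details.

```javascript
// Sample data standing in for a parsed pages.yml (assumed shape, illustration only).
const pages = [
  {
    title: 'Getting Started',
    url: 'getting-started',
    pages: [{ title: 'Quick Start', url: 'quick-start' }],
  },
];

// Hypothetical stand-in for the serializer: `url` becomes the id, and
// `title` and `pages` become the attributes of each resource object.
function serializeTableOfContents(sections) {
  return {
    data: sections.map(section => ({
      type: 'pages', // assumed type name
      id: section.url,
      attributes: {
        title: section.title,
        pages: section.pages,
      },
    })),
  };
}

const tocDocument = serializeTableOfContents(pages);
console.log(JSON.stringify(tocDocument, null, 2));
```

The important part is the mapping itself: the id option picks which field identifies the resource, and only the fields listed in attributes make it into the output.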

Building the tree of Markdown files

Next up is the main event: converting a nested structure of Markdown files into a nested structure of JSON:API documents. This one will be simpler to follow if we take it in bite-sized chunks. Let's start with just getting the Markdown files:

const paths = walkSync(this.inputPaths);

const mdFiles = paths.filter(path => extname(path) === '.md');

This code uses walkSync to list all of the files under inputPaths (the folder that we passed into the constructor), and then filters that list down to the paths that end with .md, which are our Markdown files.
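If you want to try the list-then-filter idea without installing anything, the sketch below does the same job using only Node's built-in modules. The listFiles function is a hypothetical stand-in for walkSync (it returns files only and differs from walk-sync in other details), and the temporary folder exists only so that the example runs on its own.

```javascript
const { mkdtempSync, mkdirSync, writeFileSync, readdirSync } = require('fs');
const { join, extname } = require('path');
const { tmpdir } = require('os');

// Hypothetical stand-in for walkSync: returns file paths relative to `root`.
function listFiles(root, prefix = '') {
  return readdirSync(join(root, prefix), { withFileTypes: true })
    .flatMap((entry) => {
      const relative = join(prefix, entry.name);
      // Recurse into directories, keep plain files as they are.
      return entry.isDirectory() ? listFiles(root, relative) : [relative];
    });
}

// Build a throwaway input folder so the sketch can be run on its own.
const root = mkdtempSync(join(tmpdir(), 'md-sketch-'));
mkdirSync(join(root, 'getting-started'));
writeFileSync(join(root, 'pages.yml'), '');
writeFileSync(join(root, 'getting-started', 'index.md'), '# Hello');

// Same filter as in the plugin: keep only paths with a .md extension.
const mdFiles = listFiles(root).filter(path => extname(path) === '.md');
console.log(mdFiles);
```

Note that the paths are relative to the input folder, which is what lets the plugin reuse them later when deciding where to write each output file.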

Next it's time to load each of those files into an array:

const fileData = mdFiles.map(path => ({
  path,
  content: readFileSync(join(this.options.folder, path)),
})).map(file => ({
  path: file.path,
  ...yamlFront.loadFront(file.content),
}));

Here we use Array.map() twice to convert a list of file names into a data structure that contains everything that we need. The first map() converts the file names into an array of objects that looks something like this:

[{
  path: '/getting-started/index.md',
  content: `---
            title: Getting Started
            ---
            Getting started with Ember is easy. Ember projects are created ...`
}, {
  path: '/getting-started/quick-start.md',
  content: `---
            title: Quick Start
            ---
            This guide will teach you how to build a simple ...`
}]

As you can see, each object remembers the path to the file that created it and has the full content of the file loaded. In the second map() function we then use yaml-front-matter to load the optional extra YAML metadata into the object. You can read more about what front-matter is and what it can be used for here.

After the second map() function the fileData array looks like this:

[{
  path: '/getting-started/index.md',
  title: 'Getting Started',
  __content: 'Getting started with Ember is easy. Ember projects are created ...'
}, {
  path: '/getting-started/quick-start.md',
  title: 'Quick Start',
  __content: 'This guide will teach you how to build a simple ...'
}]
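To show what that second map() step is doing conceptually, here is a deliberately tiny front-matter splitter. The loadFrontSketch function is hypothetical and only understands flat key: value lines, whereas yaml-front-matter parses full YAML, so treat it as an illustration of the idea and not as a replacement.

```javascript
// Minimal, hypothetical front-matter splitter. It only understands flat
// `key: value` lines; yaml-front-matter handles real YAML.
function loadFrontSketch(source) {
  const text = String(source);
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(text);

  // No front-matter block: everything is content.
  if (!match) {
    return { __content: text };
  }

  const attributes = {};
  match[1].split(/\r?\n/).forEach((line) => {
    const separator = line.indexOf(':');
    if (separator === -1) return;
    attributes[line.slice(0, separator).trim()] = line.slice(separator + 1).trim();
  });

  // Whatever follows the closing `---` is the Markdown body.
  return { ...attributes, __content: text.slice(match[0].length) };
}

const file = {
  path: 'getting-started/index.md',
  content: '---\ntitle: Getting Started\n---\nGetting started with Ember is easy.',
};

// Same shape as the second map(): keep the path, spread in the parsed data.
const parsed = { path: file.path, ...loadFrontSketch(file.content) };
console.log(parsed);
```

The metadata keys land at the top level of the object next to path, and the Markdown body ends up under __content, which is the key the serialiser picks up later.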

This leaves us finally ready to serialise into JSON:API. Next we need to loop over the fileData array and write our JSON files out to disk:

fileData.forEach((file) => {
  const directory = dirname(join(this.outputPath, this.options.contentFolder, file.path));
  if (!existsSync(directory)) {
    mkdirSync(dirname(join(this.outputPath, this.options.contentFolder, file.path)));
  }

  const serialized = ContentSerializer.serialize(file);

  writeFileSync(join(this.outputPath, this.options.contentFolder, `${file.path}.json`), JSON.stringify(serialized));
});

The first thing we do in this function is to make sure that the folder we want to write the file into actually exists. We need to check this on all files because we used walkSync earlier in this process and it is possible to have a very deeply nested folder structure.
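As an aside, newer versions of Node (10.12 and later) can create a whole chain of missing folders in one call. The sketch below is an alternative to the check-then-create approach and is not what the plugin did at the time; the temporary folder and file names are made up for the example.

```javascript
const { mkdtempSync, mkdirSync, writeFileSync, existsSync } = require('fs');
const { join, dirname } = require('path');
const { tmpdir } = require('os');

// Throwaway folder standing in for this.outputPath.
const outputPath = mkdtempSync(join(tmpdir(), 'output-sketch-'));
const target = join(outputPath, 'content', 'tutorial', 'routes', 'index.md.json');

// `recursive: true` creates every missing parent folder and does not throw
// if the directory already exists, so no existsSync() guard is needed.
mkdirSync(dirname(target), { recursive: true });
writeFileSync(target, JSON.stringify({ data: null }));

console.log(existsSync(target));
```

This matters most when a folder contains no Markdown files of its own but has subfolders that do, because in that case the parent folder would never have been created by an earlier iteration of the loop.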

Next we serialise the file object using another jsonapi-serializer and write the serialised document to disk. Here is the serialiser definition for the ContentSerializer which is only very slightly more complicated than the one for the Pages in the ToC:

const ContentSerializer = new Serializer('content', {
  id: 'path',
  attributes: [
    '__content',
    'title',
  ],
  keyForAttribute(attr) {
    switch (attr) {
      case '__content':
        return 'content';
      default:
        return attr;
    }
  },
});

In this case we use keyForAttribute() to rename __content to just content.
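The renaming is easy to see in isolation. The sketch below reuses the same keyForAttribute() function with a hand-rolled serializeContent stand-in. That stand-in and the 'contents' type are assumptions for illustration, not the output of jsonapi-serializer.

```javascript
// Same mapping function as in the ContentSerializer definition.
function keyForAttribute(attr) {
  switch (attr) {
    case '__content':
      return 'content';
    default:
      return attr;
  }
}

// Hypothetical stand-in for the serializer: copies the listed attributes
// across, passing each key through keyForAttribute() on the way.
function serializeContent(file) {
  const attributes = {};
  ['__content', 'title'].forEach((attr) => {
    attributes[keyForAttribute(attr)] = file[attr];
  });

  return {
    data: {
      type: 'contents', // assumed type name
      id: file.path,
      attributes,
    },
  };
}

const contentDocument = serializeContent({
  path: 'getting-started/index.md',
  title: 'Getting Started',
  __content: 'Getting started with Ember is easy.',
});

console.log(JSON.stringify(contentDocument, null, 2));
```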

Conclusion

I hope you enjoyed this deep dive into the guts of broccoli-static-site-json. If you are interested in other places that make use of this system, you can check out Ember Casper Template, which also happens to be what is powering this blog 🎉

As always, you can reach out to me on Twitter, or you can find me on the Ember Community Slack as @real_ate.