Leveraging Astro's Content Collections

Astro’s content-focused approach to building websites got a major improvement with its v2 release. If you’re not familiar with Astro, it is a web framework geared towards helping developers create highly performant, content-rich websites. It lets developers use their favorite UI framework to build components with an islands architecture, shipping the end user only the minimal JavaScript needed to interact with the site and progressively enhancing the site as needed.

Astro is a fantastic tool for building technical documentation sites and blogs because it provides markdown and MDX support out of the box, which enables a rich writing experience when you need more than base markdown. The React Docs leverage MDX to help the documentation writers provide the amazing experience we’ve all been enjoying with the new docs.

In Astro v2, the team launched Content Collections, which have significantly improved Astro’s already impressive developer experience (DX). In this post, we’re going to look into how Astro (and other frameworks) managed content before Content Collections, what Content Collections are, and some of the superpowers Content Collections give our websites.

How Is Content Managed in Projects?

A little bit of history…

Content management for websites has always been an interesting challenge. The question is typically: where should I store and manage my content? We have content management systems (CMS), like WordPress, that people have long used, and still use, to quickly build out websites. We also have headless CMSs like Contentful and Sanity that let writers enter their content while developers build out the site with modern web frameworks to display it. All these solutions have enabled us to manage our content in a meaningful way, especially when the content writers aren’t developers or technical writers. However, these tools and techniques can be limiting for writers who want to use rich, interactive content. For example, the React Docs use Sandpack to create interactive code samples. How can we achieve these same results in our projects?

The Power of MDX

This is where MDX comes in. We can create reusable markdown components that let writers progressively enhance their blog posts with interactive elements without writing custom code into their articles. For example, a HeaderLink component could let the writer add a custom click handler to a link that executes a script.

While this is a simple example, we could expand the idea to create charts, graphs, and other interactive elements that plain markdown can’t express. Most CMSs haven’t been upgraded to handle MDX yet, so to offer this kind of experience, we need to provide a good writing experience inside our own codebases.

The MDX Experience

Before Content Collections, we had two main approaches for structuring content in our projects.

The first was to write each new document as a markdown or MDX page in our pages directory, and let the file system router handle the routing and define pages for us. This makes it easy to map a blog post to a page quickly.

However, this leads to clutter: as our content grows, so does our pages directory. This can make it harder to find files or articles unless a clear naming convention is used, which can be hard to enforce and maintain. It also mixes implementation details with content documents, which can make the project harder to organize.

The second approach is to store our content in a separate directory, and then create a page to collect the data out of this directory and organize it. This is the approach the React Docs take. This model has the clear advantage that the content and implementation details are separated.

However, in this model, the page responsible for bringing the content together becomes a glue file that performs file system operations and joins the data together. This can be very brittle, as any refactor could break it. Astro supports this approach through its Astro.glob API, but it has some limitations we’ll go over a little later.

So… What Are Content Collections?

Content Collections enable you to better manage content files in your project. They provide a standard for organizing content, validating aspects of the content, and adding type-safety features to your content. Content Collections took the best parts of the separate-directory approach, similar to the React Docs, and did their best to eliminate its drawbacks.

You can leverage Content Collections by simply moving your content into the src/content directory of your project, in a folder named for the type of content it represents. Is it a blog post? Stick it in blog. Working with a newsletter? Toss it in newsletter. These folders are the “collections”. You can put either .md or .mdx files in these folders, and those are your “content entries”. Once your content is in this structure, you can use Astro’s new content APIs to query your data in a structured way and start using its superpowers.

Supercharging your Content!

Query Your Content like a Database

Astro’s content API provides two functions for querying your data: getCollection() and getEntryBySlug(). getCollection() takes two arguments: the collection name and an optional filter function. This lets you fetch all the content in a collection, or narrow it down to specific entries based on frontmatter values of your choosing. getEntryBySlug() takes the collection name and an entry’s slug and returns that specific entry. What’s particularly meaningful about these functions is that they return content with full TypeScript typings, so your editor and compiler can check how you use your entries. You no longer need to write and maintain your own file system glue logic.
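To make the filter argument concrete, here is a minimal sketch of a filter callback and a sort comparator you might use with getCollection(). It assumes a blog collection whose frontmatter has title, pubDate, and a hypothetical draft flag; the BlogEntry interface is a hand-written stand-in (loosely mirroring an Astro v2 entry) so the sketch runs outside Astro.

```typescript
// Hypothetical entry shape, loosely mirroring what getCollection('blog') returns in
// Astro v2: an id, a slug, the collection name, and the parsed frontmatter under `data`.
// In a real project you would use CollectionEntry<'blog'> from 'astro:content' instead.
interface BlogEntry {
  id: string;
  slug: string;
  collection: "blog";
  data: {
    title: string;
    pubDate: Date;
    draft?: boolean; // hypothetical frontmatter field
  };
}

// Filter callback for getCollection's second argument: drop entries marked as drafts.
function isPublished({ data }: BlogEntry): boolean {
  return data.draft !== true;
}

// Comparator that orders entries newest first by their parsed pubDate.
function byNewest(a: BlogEntry, b: BlogEntry): number {
  return b.data.pubDate.getTime() - a.data.pubDate.getTime();
}

// Hypothetical stand-in data so the sketch runs outside Astro.
const sampleEntries: BlogEntry[] = [
  {
    id: "first-post.md",
    slug: "first-post",
    collection: "blog",
    data: { title: "First post", pubDate: new Date(2023, 0, 10) },
  },
  {
    id: "wip.md",
    slug: "wip",
    collection: "blog",
    data: { title: "Work in progress", pubDate: new Date(2023, 3, 1), draft: true },
  },
  {
    id: "second-post.md",
    slug: "second-post",
    collection: "blog",
    data: { title: "Second post", pubDate: new Date(2023, 2, 5) },
  },
];

// The same steps as getCollection('blog', isPublished) followed by a sort.
const published = sampleEntries.filter(isPublished).sort(byNewest);
console.log(published.map((entry) => entry.slug)); // second-post, then first-post
```

Inside an Astro page, the equivalent would be `(await getCollection('blog', isPublished)).sort(byNewest)`, and a single entry could be fetched with `await getEntryBySlug('blog', 'first-post')`. Keeping the callbacks as small named functions makes them easy to reuse across pages.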

Configuring Content Entry Types

Collection entries can be configured to meet specific requirements. In src/content/config.ts, you can define collections and their schemas using Zod and then register them with the framework by exporting a collections object. This is extremely powerful because Astro can now validate your markdown to ensure all the required fields are defined, AND return those entries in their target format through the content API.
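For reference, a src/content/config.ts for a blog collection might look like the following. It is modeled on the blog starter that shipped with Astro v2, as we understand it; the field names (title, description, pubDate, heroImage) are illustrative assumptions, not requirements. Because it imports from the virtual astro:content module, this file only runs inside an Astro project.

```typescript
// src/content/config.ts
// `defineCollection` and `z` (Astro's re-export of Zod) come from 'astro:content'.
import { defineCollection, z } from "astro:content";

const blog = defineCollection({
  // Every entry in src/content/blog/ is validated against this schema.
  schema: z.object({
    title: z.string(),
    description: z.string(),
    // Accept either a string or a YAML date, and always hand back a Date object.
    pubDate: z
      .string()
      .or(z.date())
      .transform((val) => new Date(val)),
    // Optional fields are simply undefined when omitted from the frontmatter.
    heroImage: z.string().optional(),
  }),
});

// The exported key ('blog') must match the folder name under src/content/.
export const collections = { blog };
```

With this in place, entries returned by getCollection('blog') expose data.pubDate as a Date, and Astro reports an error for any entry missing title or description.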

When you used the Astro.glob API, you would get all frontmatter data as strings or numbers, requiring you to parse your data into other standard types. With Content Collections, you can put dates into your frontmatter and get them out as Date objects via the content API. You can remove all your previous validation and remapping code and convert it to Zod types in your collection config. And instead of having to run linters and tests to find issues, Astro will report collection errors as you create them, in your IDE or at server runtime.
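To make the contrast concrete, here is a hypothetical sketch of the hand-written validation and remapping an Astro.glob-based page might have carried. The toPost helper and its fields are illustrative, not part of Astro; in a collection, a Zod schema with a date transform takes over this job.

```typescript
// Hypothetical "before" code: with Astro.glob, each post's raw frontmatter had to be
// checked and converted by hand before the page could rely on it.
interface Post {
  title: string;
  pubDate: Date;
}

function toPost(frontmatter: Record<string, unknown>, file: string): Post {
  const { title, pubDate } = frontmatter;

  // Manual equivalent of z.string() for the title.
  if (typeof title !== "string") {
    throw new Error(`${file}: "title" must be a string`);
  }

  // Manual equivalent of a date transform: accept a Date or parse a string.
  const date = pubDate instanceof Date ? pubDate : new Date(String(pubDate));
  if (Number.isNaN(date.getTime())) {
    throw new Error(`${file}: "pubDate" is not a valid date`);
  }

  return { title, pubDate: date };
}

// Example: an ISO date string from frontmatter becomes a Date object.
const example = toPost({ title: "Hello", pubDate: "2023-04-18" }, "hello.md");
```

Every field needs its own check and conversion here, which is exactly the kind of code a collection schema lets you delete.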

Content Collection Gotchas

Content collections can only be top-level folders in the src/content directory, which means you can’t nest collections. However, you can organize content within a collection using subdirectories and use the filter function of the content API to create sub-selections. The main use case for this would be i18n translations for a collection: you can place each language’s entries in a subdirectory for that language, and use the filter function to select them at runtime for display.
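As a sketch of the i18n case, assume a hypothetical docs collection with entries stored under src/content/docs/en/ and src/content/docs/es/. An entry’s id is its file path relative to the collection folder, so a filter callback can select one language by id prefix. DocsEntry below is a hand-written stand-in so the sketch runs outside Astro.

```typescript
// Minimal stand-in for a collection entry; in Astro, `id` is the file path
// relative to the collection folder, e.g. "en/getting-started.md".
interface DocsEntry {
  id: string;
  slug: string;
}

// Returns a filter callback that keeps only entries in one language subdirectory.
function inLanguage(lang: string) {
  return ({ id }: DocsEntry): boolean => id.startsWith(`${lang}/`);
}

// Hypothetical entries from a docs collection with one subdirectory per language.
const docs: DocsEntry[] = [
  { id: "en/getting-started.md", slug: "en/getting-started" },
  { id: "es/getting-started.md", slug: "es/getting-started" },
  { id: "en/configuration.md", slug: "en/configuration" },
];

// Keeps the two entries stored under en/.
const englishDocs = docs.filter(inLanguage("en"));
```

In a page, the same callback would be passed as `await getCollection('docs', inLanguage('en'))`, typed against CollectionEntry<'docs'> rather than the stand-in interface.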

The other main "gotcha" is routing. Before, we relied on the file-based router to render our pages, but content entries have no explicit routes. To get your pages to render properly, you’ll need to use Astro’s dynamic routes to generate pages from your entries. If you’re in static mode (the default), you need to define a getStaticPaths() function on your catch-all route. If you’re in SSR mode, you’ll need to parse the route at runtime and query for the expected data.
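For static mode, the core of getStaticPaths() is mapping each entry to a params/props pair. Here is a minimal sketch of that mapping, assuming a catch-all route file such as src/pages/blog/[...slug].astro (the route name is an assumption). PostEntry is a hypothetical stand-in for CollectionEntry<'blog'> so the sketch runs outside Astro.

```typescript
// Minimal stand-in for a collection entry (use CollectionEntry<'blog'> in Astro).
interface PostEntry {
  slug: string;
  data: { title: string };
}

// The shape each item returned from getStaticPaths() takes:
// `params` fills the dynamic route segment, `props` is passed to the page.
interface StaticPath {
  params: { slug: string };
  props: { entry: PostEntry };
}

// One static path per entry. A rest parameter like [...slug] lets slugs contain "/".
function toStaticPaths(entries: PostEntry[]): StaticPath[] {
  return entries.map((entry) => ({
    params: { slug: entry.slug },
    props: { entry },
  }));
}

// Hypothetical entries, including one stored in a subdirectory.
const paths = toStaticPaths([
  { slug: "first-post", data: { title: "First post" } },
  { slug: "guides/setup", data: { title: "Setup" } },
]);
```

In the .astro route, getStaticPaths() would return `toStaticPaths(await getCollection('blog'))`, and the page would read `entry` from Astro.props and render its body with `const { Content } = await entry.render();`.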

Some Notes on Migrating from File-Based Routing

If you had a project using Astro before v2, you probably want to upgrade to content collections. Astro has a good guide on how to do this in its docs. There are two main gotchas to highlight for you.

The first is that layouts no longer need to be explicitly defined in the markdown files. Because your dynamic route page now applies the layout, this property is unnecessary. However, if you leave it in, that layout will still be applied to the page, wrapping it in two layouts, so be sure to remove the layout property from your frontmatter.

The second is that the content API moves the frontmatter properties into a new data property on the returned entries. Before, you might have had a line of code like post.frontmatter.pubDate; this now needs to be post.data.pubDate. Also, if pubDate used to be a string and your schema now turns it into a Date, you need to convert it back to a string wherever you display it, e.g. post.data.pubDate.toDateString().
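Here is a small before-and-after sketch of that change. GlobPost and CollectionPost are hypothetical types written for illustration; in a real project the "after" type would be CollectionEntry<'blog'> imported from 'astro:content'. Note that toDateString() formats the date in the local time zone.

```typescript
// Hypothetical "before": an Astro.glob result exposes frontmatter directly,
// and pubDate is a plain string.
interface GlobPost {
  frontmatter: { title: string; pubDate: string };
}

// Hypothetical "after": a collection entry exposes validated data under `data`,
// and the schema has already converted pubDate into a Date.
interface CollectionPost {
  data: { title: string; pubDate: Date };
}

function labelBefore(post: GlobPost): string {
  return `${post.frontmatter.title} (${post.frontmatter.pubDate})`;
}

function labelAfter(post: CollectionPost): string {
  // A Date has to be turned back into a string for display.
  return `${post.data.title} (${post.data.pubDate.toDateString()})`;
}

const beforeLabel = labelBefore({ frontmatter: { title: "Hello", pubDate: "2023-04-18" } });
// new Date(2023, 3, 18) is April 18, 2023 in local time.
const afterLabel = labelAfter({ data: { title: "Hello", pubDate: new Date(2023, 3, 18) } });
```

Here beforeLabel is "Hello (2023-04-18)" and afterLabel is "Hello (Tue Apr 18 2023)".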

Finally, you can remove any custom types you made before, because now you can get those directly from your collection config.

In summary…

Astro Content Collections are a great way to manage your website’s content, especially for content-focused, content-rich sites. I’ve put together some code demonstrating all the patterns and techniques described in this post that you can check out here. At This Dot, we love using the right tool for the right job. Astro is increasingly becoming one of our favorites for content site projects. We use it for our open source projects, framework.dev and starter.dev, and are always considering it for additional projects.