
What is a Monorepo and What Are the Advantages of Using It in Your Project?

This article was written over 18 months ago and may contain information that is out of date. Some content may be relevant but please refer to the relevant official documentation or available resources for the latest information.

Monorepos are very popular in the tech industry, and many large companies like Microsoft and Google use them. But what exactly is a monorepo, and would it work for your project?

In this article, we are going to explore what the monorepo architecture is, and learn about some of the commonly used tools for monorepos.


What is a monorepo?

A monorepo is a single version-controlled repository that contains multiple independent projects. Some monorepos might contain just two or three projects, while others contain hundreds of projects built with different technologies. This differs from the polyrepo structure, where projects are spread out amongst multiple repositories. It is also important to note that a monorepo is not the same as a monolith: a monolith is one giant project made up of tightly coupled sub-projects, while a monorepo holds multiple independent projects inside one repository.
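To make the distinction concrete, here is a rough sketch of the two layouts (the organization and project names are hypothetical, used only for illustration):

```text
# Polyrepo: one repository per project
acme/website    acme/api    acme/cli

# Monorepo: one repository, many independent projects
acme/platform
├── website/
├── api/
└── cli/
```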

Unified build and CI/CD process

In a monorepo structure, all of your CI/CD (continuous integration/continuous delivery) configuration lives in the same repository. This gives teams more flexibility: you can deploy your services together, or choose to deploy each project separately.
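As one illustration of deploying projects separately from a single repository, a pipeline can be scoped to one project by filtering on paths. This is a hypothetical GitHub Actions sketch, not taken from any specific repo; the `website` package name and `packages/website` path are assumptions:

```yaml
# Hypothetical workflow, e.g. .github/workflows/deploy-website.yml
name: deploy-website
on:
  push:
    branches: [main]
    paths:
      - "packages/website/**" # only run when the website project changes
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: yarn install
      - run: yarn workspace website build
      # a real pipeline would add a deploy step here
```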

Increased cross-team communication

In a polyrepo structure, you will have multiple teams working on different projects spread across many repositories. Oftentimes, these teams will not be aware of the status of repositories outside of their main one. With a monorepo structure, teams are more aware of the changes being made across the repository and might spot issues in other projects.

When all of your projects are under a single repository, it makes it easier to establish a consistent code style and set of guidelines across all projects. Some of the guidelines can include naming conventions, best practices for code reviews, branching policies, etc.

In situations where there is a breaking change on the main branch, a monorepo can make it easier to identify where the change came from and which team is responsible for fixing it. A monorepo can also help teams align on versioning strategies for their projects.

Steeper learning curve and issues of inclusivity for newer developers

Inclusivity of all developers on a team is really important to the success and outcome of a software project. Many diverse perspectives and experience levels lead to a stronger finished product. But a monorepo structure can be intimidating to novice developers. It might be their first time seeing a list of independent projects within the same repository, and they might not know how to contribute in a meaningful way, especially in an open source setting. It can also be difficult for newer developers to understand the Git history in a monorepo: a tool like git blame has to sift through a lot of unrelated commits just to generate that information for you. For these reasons, it is important for the team to guide newer developers through the monorepo structure and create issues where all levels can meaningfully contribute.

A look into the starter.dev monorepo

Let's take a look at a couple of This Dot Labs' open source monorepos and understand why the decision was made to use a monorepo.

Please note: We will not be exploring all of the possible tools used for monorepos. If you are interested in exploring more monorepo tools, then please check out this curated list.

What is starter.dev?

starter.dev makes it easy to set up a new project with zero configuration involved. Once you install the starter.dev package (npm i @this-dot/create-starter), you can run the npx @this-dot/create-starter command, and it will provide you with a list of starter kits to choose from. Each starter kit includes testing, linting, code examples, and a basic configuration setup.


starter.dev is an open source project that consists of a documentation website repository and a GitHub showcases repository dedicated to all of the different starter kits.

Why did the team choose a monorepo structure?

When it came time to plan out the project, the team had to decide whether to maintain each kit in isolation or create a folder for each kit under the same repository. The team chose a monorepo structure because it made it easier to build the CLI (command-line interface) while still allowing the kits to be built in isolation.

What are Yarn Workspaces?

The documentation website repository uses Yarn Workspaces, which allows you to set up multiple packages and install them all at once with a single yarn install command.

In the root directory, we have starters and packages directories. If we look at the starters directory, we will see all of the starter kits. Each starter kit has its own package.json, basic configuration, README, and code examples.
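A kit's manifest might look something like the following. This is a hypothetical sketch to show the shape of a per-kit package.json; the package name, version, and script commands are illustrative assumptions, not copied from the repository:

```json
{
  "name": "@this-dot/example-starter",
  "version": "0.1.0",
  "scripts": {
    "dev": "vite",
    "lint": "eslint .",
    "test": "vitest"
  }
}
```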


Inside the root package.json file, you will notice the "workspaces": ["packages/*"] key:

{
  "name": "starter.dev",
  "version": "0.1.0",
  "private": true,
  "scripts": {
    "lint": "yarn workspace website lint && yarn workspace @this-dot/create-starter lint",
    "build:cli": "yarn workspace @this-dot/create-starter build",
    "build:website": "yarn workspace website build",
    "start:cli": "yarn workspace @this-dot/create-starter start",
    "start:website": "yarn workspace website preview",
    "dev:website": "yarn workspace website dev",
    "watch:cli": "yarn workspace @this-dot/create-starter watch"
  },
  "workspaces": [
    "packages/*"
  ]
}

That will tell Yarn to import all of the packages that are listed inside of the packages folder.
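To see how the workspaces field wires things together, here is a minimal sketch that recreates that layout on disk. The demo-root and @demo/cli package names are hypothetical; running yarn install from the root of such a repo is what links the member packages together:

```shell
# Create a minimal workspace layout (hypothetical names, not from starter.dev).
mkdir -p demo/packages/cli

# Root manifest: "workspaces" tells Yarn where to find member packages.
cat > demo/package.json <<'EOF'
{
  "name": "demo-root",
  "private": true,
  "workspaces": ["packages/*"]
}
EOF

# A member package that `yarn install` (run from demo/) would pick up and link.
cat > demo/packages/cli/package.json <<'EOF'
{
  "name": "@demo/cli",
  "version": "0.0.1"
}
EOF
```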

Unified CI/CD pipeline with Amplify

The starter.dev GitHub showcases repository has a unified CI/CD pipeline and uses AWS Amplify. Inside the root directory, there is an Amplify YAML file where each application has its own set of build commands. Each project listed in the monorepo has its own deployment process:

version: 1
applications:
  - appRoot: next-react-query-tailwind
    frontend:
      phases:
        preBuild:
          commands:
            - nvm install --lts=gallium
            - yarn install
        build:
          commands:
            - yarn run build
      artifacts:
        baseDirectory: .next
        files:
          - "**/*"
      cache:
        paths:
          - node_modules/**/*

  - appRoot: angular-apollo-tailwind
    frontend:
      phases:
        preBuild:
          commands:
            - nvm install --lts=gallium
            - yarn install
            - yarn generate
        build:
          commands:
            - node -v
            - yarn run build
      artifacts:
        baseDirectory: dist/starter-dev-angular
        files:
          - "**/*"
      cache:
        paths:
          - node_modules/**/*

How does the team manage issues?

If you look at the issues tab, you will notice that all of the developers follow the same naming convention for issues: [starter kit] title for issue.

This works well for a variety of reasons:

  1. Any developer interested in contributing to the project can see at a quick glance which issues fit their skill and interest level.
  2. It will be easier to track the status of the issues on a project board and see which areas need more attention.

Conclusion

We have explored the monorepo structure and talked about a few of its features. We have also compared it to other architectures like monoliths and polyrepos to see how it differs.

Then we took an in-depth look into starter.dev and how it uses Yarn Workspaces to help organize the project.

I hope you enjoyed this article, and best of luck on your programming journey.

This Dot is a consultancy dedicated to guiding companies through their modernization and digital transformation journeys. Specializing in replatforming, modernizing, and launching new initiatives, we stand out by taking true ownership of your engineering projects.

We love helping teams with projects that have missed their deadlines or helping keep your strategic digital initiatives on course. Check out our case studies and our clients that trust us with their engineering.

