
Web Scraping with TypeScript and Node.js

Sometimes you'll find yourself wanting to use a set of data from a website, but the data won't be available through an API or in a downloadable format like CSV. In these cases, you may have to write a web scraper to download individual web pages and extract the data you want from their HTML. This guide will teach you the basics of writing a web scraper using TypeScript and Node.js, and will note several of the obstacles you might encounter during web scraping.

If you want to skip straight to the finished code example, check it out on GitHub.

Setup

First things first: we need to initialize our project and install the base dependencies. We'll be writing our web scraper in TypeScript and running it as Node.js scripts using ts-node. For simplicity, we'll create an index.ts file in the project root to work from. From the command line, run the following to get started:

mkdir my-web-scraper && cd my-web-scraper # create project directory
git init # initialize new git repository
echo "node_modules" >> .gitignore # do not track node_modules in git
npm init -y # initialize Node.js project
# install dependencies
npm install typescript ts-node
npm install --save-dev @types/node
touch index.ts # create an empty TypeScript file

Node.js doesn't run TypeScript files natively. Rather than use the TypeScript compiler to output new JavaScript files whenever we want to run the script, we'll use ts-node to run the TypeScript files directly. We'll go ahead and add this to our new package.json file as an npm script.

  "scripts": {
    "scrape": "ts-node ./index.ts"
  }

Now, we'll be able to run our scraper from index.ts with the command npm run scrape.

Fetching Websites

In our examples, we'll be using Axios to make our HTTP requests. If you'd prefer something else, like node-fetch, which mirrors the browser's Fetch API while native fetch support matures in Node.js, that's fine too.

npm install axios

Let's create our first function for fetching a given URL and returning the HTML from that page.

import axios, { AxiosError } from 'axios';

function fetchPage(url: string): Promise<string | undefined> {
  const HTMLData = axios
    .get<string>(url)
    .then(res => res.data)
    .catch((error: AxiosError) => {
      console.error(`There was an error with ${error.config?.url}.`);
      console.error(error.toJSON());
      return undefined;
    });

  return HTMLData;
}

This function will use Axios to create a promise to fetch a given URL, and return the HTML it gets back as a string. If there's an error, it will log that error to the console, and return undefined instead. Since you're probably going to be running this scraper from your command line throughout development, a healthy number of console.logs will help you make sure the script is running as expected.
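If you went with node-fetch instead, an equivalent fetchPage might look like this sketch (assuming node-fetch and its type definitions are installed; the error handling here is my own rough equivalent of the Axios version):

import fetch from 'node-fetch';

async function fetchPage(url: string): Promise<string | undefined> {
  try {
    const res = await fetch(url);
    if (!res.ok) {
      // fetch doesn't reject on HTTP error statuses, so check ourselves
      throw new Error(`HTTP ${res.status} for ${url}`);
    }
    return await res.text();
  } catch (error) {
    console.error(`There was an error with ${url}.`);
    console.error(error);
    return undefined;
  }
}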

Caching Scraped Pages

In the event that you're trying to scrape many, many static web pages in a single script, you might want to cache the pages locally as you download them. This will save you time and headache as you work on your scraper. You're much less likely to annoy the website you're scraping with high traffic and the bandwidth costs associated with it, and your scripts will probably run faster if they aren't limited by your Internet connection.

Let's go ahead and create a .cache folder in the project root. You probably won't want to keep cached files in your git history, so we'll want to add this folder to your .gitignore file.

mkdir .cache
echo ".cache" >> .gitignore

To cache our results, we'll first check if a cached version of the given page already exists. If so, we'll use that. If not, we'll fetch the page and save it to the .cache folder. For filenames, we're just going to base-64 encode the page's URL. If you prefer some other way to generate a unique filename, that's fine too. I've chosen base-64 encoded URLs because the encoding is easy, and the resulting files are very obviously temporary. We also have an optional function argument, ignoreCache, in case you've built up your cache but want to scrape fresh data anyway.

import { existsSync, mkdirSync } from 'fs';
import { readFile, writeFile } from 'fs/promises';
import { resolve } from 'path';

async function fetchFromWebOrCache(url: string, ignoreCache = false) {
  // If the cache folder doesn't exist, create it
  if (!existsSync(resolve(__dirname, '.cache'))) {
    mkdirSync(resolve(__dirname, '.cache'));
  }
  console.log(`Getting data for ${url}...`);
  // The cached file's name is the base-64 encoded URL
  const cachedFilePath = resolve(
    __dirname,
    `.cache/${Buffer.from(url).toString('base64')}.html`,
  );
  if (!ignoreCache && existsSync(cachedFilePath)) {
    console.log(`I read ${url} from cache`);
    const HTMLData = await readFile(cachedFilePath, { encoding: 'utf8' });
    return HTMLData;
  } else {
    console.log(`I fetched ${url} fresh`);
    const HTMLData = await fetchPage(url);
    if (!ignoreCache && HTMLData) {
      await writeFile(cachedFilePath, HTMLData, { encoding: 'utf8' });
    }
    return HTMLData;
  }
}
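For example, from inside an async function, the optional second argument controls whether the cache is bypassed:

// Reads from .cache if this page was downloaded before
const cachedHTML = await fetchFromWebOrCache('https://example.com/');
// Always fetches fresh, skipping any cached copy
const freshHTML = await fetchFromWebOrCache('https://example.com/', true);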

Extracting Data with jsdom

Now that we have HTML to work with, we want to extract the relevant data from it. To do this, we will use jsdom, a JavaScript implementation of the DOM. This will let us interact with the downloaded HTML in the exact same way as if we were working in a browser's console, giving access to methods like querySelector.

(If you prefer a syntax more like jQuery's, Cheerio is also a popular option.)

npm install jsdom
npm install --save-dev @types/jsdom

Now let's import jsdom and use it to return the Document object of our HTML string. Just modify the previous fetchFromWebOrCache to turn HTMLData into a DOM object, and return its window.document.

import { JSDOM } from 'jsdom';

async function fetchFromWebOrCache(url: string, ignoreCache = false) {
  // Get the HTMLData from fetching or from cache, as before
  const HTMLData = '<html>...</html>';
  const dom = new JSDOM(HTMLData);
  return dom.window.document;
}
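Putting those pieces together (and reusing the fs and path imports from the caching section), the complete function might look like this sketch; the added throw on a failed fetch is my own choice so callers always get a Document back:

import { JSDOM } from 'jsdom';

async function fetchFromWebOrCache(url: string, ignoreCache = false) {
  if (!existsSync(resolve(__dirname, '.cache'))) {
    mkdirSync(resolve(__dirname, '.cache'));
  }
  const cachedFilePath = resolve(
    __dirname,
    `.cache/${Buffer.from(url).toString('base64')}.html`,
  );
  let HTMLData: string | undefined;
  if (!ignoreCache && existsSync(cachedFilePath)) {
    console.log(`I read ${url} from cache`);
    HTMLData = await readFile(cachedFilePath, { encoding: 'utf8' });
  } else {
    console.log(`I fetched ${url} fresh`);
    HTMLData = await fetchPage(url);
    if (!ignoreCache && HTMLData) {
      await writeFile(cachedFilePath, HTMLData, { encoding: 'utf8' });
    }
  }
  if (!HTMLData) {
    throw new Error(`Failed to fetch ${url}`);
  }
  // Parse the HTML into a Document we can query like in a browser
  const dom = new JSDOM(HTMLData);
  return dom.window.document;
}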

Now that we're working with a Document instead of a string, we've got access to everything we'd have if we were working in the browser console. This makes it much easier to write code that extracts the pieces of a page that we want! For example, let's scrape whatever is on the front page of Hacker News right now. We'll write a function that accepts the Document of the Hacker News front page, finds all of the links, and gives us back the link text and URL as a JavaScript object.

Using your browser's developer tools, you can easily inspect an element on the page with desired data to figure out a selector path. In our example, we can right-click a link and choose Inspect to view it in DevTools. Then we right-click the DOM element, and choose "Copy > Copy selector" in Chrome or "Copy > CSS Selector" in Firefox, for example.

A copied selector will give you a string of text that selects only the element you copied it from in DevTools. And often that is useful! Just throw your selector into document.querySelector('selector'), and you're good to go. But in our case, we want all of the front page links. So we need a broader selector than copy-pasting from DevTools will give us. This is where you'll have to actually read through the HTML, classes, ids, etc., to figure out how to craft the right selector.
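For instance, a copied selector is often a long, brittle path that matches exactly one element, where what we really want is a short, class-based selector that matches them all. Both selectors below are hypothetical, just to show the difference:

// What DevTools might copy: one specific element, tied to its exact position
document.querySelector('#main > table > tbody > tr:nth-child(3) > td.title > a');
// What we actually want: every element sharing a common class
document.querySelectorAll('a.some-shared-class');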

Fortunately for us in this example, all of the links on the Hacker News feed have a unique class: titlelink. So we can use document.querySelectorAll('a.titlelink') to get all of them.

// Pass the scraped Document from news.ycombinator.com to this
// function to extract data about front page links.
function extractData(document: Document) {
  const writingLinks: HTMLAnchorElement[] = Array.from(
    document.querySelectorAll('a.titlelink'),
  );
  return writingLinks.map(link => {
    return {
      title: link.text,
      url: link.href,
    };
  });
}

This function is only a simple example, and would be different depending on what you want to get out of a page. When working with jsdom, remember that you're not working with arrays and objects but with NodeLists and Elements. To get useful data out of your selections, you'll often have to do things like convert a NodeList into an array as shown above.

Sometimes you'll have to get creative with your selections. I recently tried to scrape the information from an HTML table on a page with varying numbers of tables and no classes. Because the number of tables was always different, I couldn't reliably select from a list of tables by which number table it was. I had to select every table present on a page, then filter them by the text in the first cell to get precisely the one table I needed!

// Sometimes, web scraping is just hard...
// Here, `data` is the scraped page's Document
const table: HTMLTableElement = Array.from(
  data.querySelectorAll('table'),
).filter(t =>
  t.children[0].children[0].children[0].innerHTML.match(
    /Unique Text in First Cell which IDs the Table/,
  ),
)[0];
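Since we only want the first matching table, Array.prototype.find expresses the same idea a bit more directly than filter(...)[0]:

// Equivalent approach: find() returns the first match (or undefined)
const table = Array.from(data.querySelectorAll('table')).find(t =>
  t.children[0].children[0].children[0].innerHTML.match(
    /Unique Text in First Cell which IDs the Table/,
  ),
);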

Extracting Data with Regular Expressions

Unfortunately for us, not all pages on the Internet are well-structured and ready for scraping. Sometimes, they don't even try to use HTML tags properly. In these sad cases, you may need to turn to regular expressions (regex) to extract what you need. We won't need to resort to such extreme measures in our example of scraping Hacker News, but it's worth knowing that you might need to do this.

I'll give you a contrived example where you would need some regex, based on another site I recently scraped. Imagine the following badly-done HTML:

<div class="pokemon">
  Name: Pikachu<br />
  Number: 25<br />
  Type: Electric<br />
  Weakness: Ground
</div>

The various data attributes we care about aren't wrapped by their own HTML elements! Everything is just inside a div with some br tags to create line breaks. If I wanted to extract the data from this, I could use regex to find and match the text and patterns I expect to find. This can require trial and error, and I recommend using a tool like regex101 to test the regular expressions you come up with. In this example, we might write the following code:

const rawPokemonHTML = document.querySelector('.pokemon')?.innerHTML ?? '';
// Capture group 1 (the parenthesized part) holds the value we want
const name = rawPokemonHTML.match(/Name: (\w+)/)?.[1]; // 'Pikachu'
const num = rawPokemonHTML.match(/Number: (\d+)/)?.[1]; // '25'
// etc...

Saving Data

Once we've extracted our data from the HTML, we'll want to save it. This is basically the same as when we created a cache for the downloaded HTML files.

import { existsSync, mkdirSync } from 'fs';
import { writeFile } from 'fs/promises';
import { resolve } from 'path';

function saveData(filename: string, data: any) {
  if (!existsSync(resolve(__dirname, 'data'))) {
    mkdirSync(resolve(__dirname, 'data'));
  }
  return writeFile(
    resolve(__dirname, `data/${filename}.json`),
    JSON.stringify(data),
    { encoding: 'utf8' },
  );
}

Putting It All Together

Now that we've got all the necessary pieces, we're ready to build our JSON file of Hacker News front page stories. To see all of our code in one piece, check it out on GitHub.

async function getData() {
  const document = await fetchFromWebOrCache(
    'https://news.ycombinator.com/',
    true, // Hacker News is always changing, so ignore the cache!
  );
  const data = extractData(document);
  await saveData('hacker-news-links', data);
}

getData();

When we run our script from the command line, it will execute getData(). That function will fetch the HTML from Hacker News' front page, extract all of the links and their titles, and then save it to data/hacker-news-links.json. And while you probably don't need a list of links from Hacker News, this information should be enough to get you started with collecting some data from the web which you do care about.
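For reference, the saved file will contain an array in the shape produced by extractData, something like this (the entries here are hypothetical):

[
  { "title": "An interesting story", "url": "https://example.com/story" },
  { "title": "Another front page link", "url": "https://example.com/link" }
]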

