Web Scraping with TypeScript and Node.js

Sometimes you'll find yourself wanting to use a set of data from a website, but the data won't be available as an API or in a downloadable format like CSV. In these cases, you may have to write a web scraper to download individual web pages and extract the data you want from within their HTML. This guide will teach you the basics of writing a web scraper using TypeScript and Node.js, and will note several of the obstacles you might encounter during web scraping.

If you want to skip straight to the finished code example, check it out on GitHub.

Setup

First things first: we need to initialize our project and install the base dependencies. We'll be writing our web scraper in TypeScript and running it as Node.js scripts using ts-node. For simplicity, we'll create an index.ts file in the project root to work from. From the command line, run the following to get started:

mkdir my-web-scraper && cd my-web-scraper # create project directory
git init # initialize new git repository
echo "node_modules" >> .gitignore # do not track node_modules in git
npm init -y # initialize Node.js project
# install dependencies
npm install typescript ts-node
npm install --save-dev @types/node
touch index.ts # create an empty TypeScript file

Node.js doesn't run TypeScript files natively. Rather than use the TypeScript compiler to output new JavaScript files whenever we want to run the script, we'll use ts-node to run the TypeScript files directly. We'll go ahead and add this to our new package.json file as an npm script.

  "scripts": {
    "scrape": "ts-node ./index.ts"
  }

Now, we'll be able to run our scraper from index.ts with the command npm run scrape.

Fetching Websites

In our examples, we'll be using Axios to make our HTTP requests. If you'd prefer something else, such as Node Fetch, which mirrors the browser's Fetch API while native fetch support isn't yet ready in Node.js, that's fine too.

npm install axios

Let's create our first function for fetching a given URL and returning the HTML from that page.

import axios, { AxiosError } from 'axios';

function fetchPage(url: string): Promise<string | undefined> {
  const HTMLData = axios
    .get(url)
    .then(res => res.data)
    .catch((error: AxiosError) => {
      console.error(`There was an error with ${error.config?.url}.`);
      console.error(error.toJSON());
    });

  return HTMLData;
}

This function will use Axios to create a promise to fetch a given URL, and return the HTML it gets back as a string. If there's an error, it will log that error to the console, and return undefined instead. Since you're probably going to be running this scraper from your command line throughout development, a healthy number of console.logs will help you make sure the script is running as expected.
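
As a quick sanity check during development, you could call fetchPage with any URL and log how much HTML comes back. Here's a minimal sketch, using a placeholder URL:

// Hypothetical smoke test with a placeholder URL
fetchPage('https://example.com').then(html => {
  console.log(`Received ${html?.length ?? 0} characters of HTML`);
});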

Caching Scraped Pages

In the event that you're trying to scrape many, many static web pages in a single script, you might want to cache the pages locally as you download them. This will save you time and headache as you work on your scraper. You're much less likely to annoy the website you're scraping with high traffic and the bandwidth costs associated with it, and your scripts will probably run faster if they aren't limited by your Internet connection.

Let's go ahead and create a .cache folder in the project root. You probably won't want to keep cached files in your git history, so we'll want to add this folder to your .gitignore file.

mkdir .cache
echo ".cache" >> .gitignore

To cache our results, we'll first check whether a cached version of the given page already exists. If so, we'll use that. If not, we'll fetch the page and save it to the .cache folder. For filenames, we're just going to base-64 encode the page's URL. If you prefer some other way to generate a unique filename, that's fine too. I've chosen base-64 encoded URLs because it's easy and the resulting files are very obviously temporary. We also have an optional function argument, ignoreCache, in case you've already built up a cache but want to scrape fresh data anyway.

import { existsSync, mkdirSync } from 'fs';
import { readFile, writeFile } from 'fs/promises';
import { resolve } from 'path';

async function fetchFromWebOrCache(url: string, ignoreCache = false) {
  // If the cache folder doesn't exist, create it
  if (!existsSync(resolve(__dirname, '.cache'))) {
    mkdirSync(resolve(__dirname, '.cache'));
  }
  console.log(`Getting data for ${url}...`);
  // Cache files are named after the base-64 encoded URL
  const cachedFilePath = resolve(
    __dirname,
    `.cache/${Buffer.from(url).toString('base64')}.html`,
  );
  if (!ignoreCache && existsSync(cachedFilePath)) {
    console.log(`I read ${url} from cache`);
    const HTMLData = await readFile(cachedFilePath, { encoding: 'utf8' });
    return HTMLData;
  } else {
    console.log(`I fetched ${url} fresh`);
    const HTMLData = await fetchPage(url);
    if (!ignoreCache && HTMLData) {
      writeFile(cachedFilePath, HTMLData, { encoding: 'utf8' });
    }
    return HTMLData;
  }
}
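
To confirm the caching works, you could call the function twice with the same URL and watch the log output: the first call should report a fresh fetch, and the second should be read from the cache. A minimal sketch, again with a placeholder URL:

// Hypothetical check: the second call should be served from .cache
async function demoCache() {
  await fetchFromWebOrCache('https://example.com'); // logs "I fetched ... fresh"
  await fetchFromWebOrCache('https://example.com'); // logs "I read ... from cache"
}

demoCache();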

Extracting Data with jsdom

Now that we have HTML to work with, we want to extract the relevant data from it. To do this, we will use jsdom, a JavaScript implementation of the DOM. This will let us interact with the downloaded HTML in the exact same way as if we were working in a browser's console, giving access to methods like querySelector.

(If you prefer a syntax more like jQuery's, Cheerio is also a popular option.)

npm install jsdom
npm install --save-dev @types/jsdom

Now let's import jsdom and use it to return the Document object of our HTML string. Just modify the previous fetchFromWebOrCache to turn HTMLData into a DOM object, and return its window.document.

import { JSDOM } from 'jsdom';

async function fetchFromWebOrCache(url: string, ignoreCache = false) {
  // Get the HTMLData from fetching or from cache
  const HTMLData = '<html>...</html>'; // placeholder for the real HTML string
  const dom = new JSDOM(HTMLData);
  return dom.window.document;
}

Now that we're working with a Document instead of a string, we've got access to everything we'd have if we were working in the browser console. This makes it much easier to write code that extracts the pieces of a page that we want! For example, let's scrape whatever is on the front page of Hacker News right now. We'll write a function that accepts the Document of the Hacker News front page, finds all of the links, and gives us back the link text and URL as a JavaScript object.

Using your browser's developer tools, you can inspect an element on the page that contains the data you want and figure out a selector path for it. In our example, we can right-click a link and choose Inspect to view it in DevTools. Then we right-click the DOM element and choose "Copy > Copy selector" in Chrome or "Copy > CSS Selector" in Firefox, for example.

A copied selector will give you a string of text that selects only the element you copied it from in DevTools. And often that is useful! Just throw your selector into document.querySelector('selector'), and you're good to go. But in our case, we want all of the front page links. So we need a broader selector than copy-pasting from DevTools will give us. This is where you'll have to actually read through the HTML, classes, ids, etc., to figure out how to craft the right selector.

Fortunately for us in this example, all of the links on the Hacker News front page share a unique class: titlelink. So we can use document.querySelectorAll('a.titlelink') to get all of them. (If Hacker News ever changes its markup, you'll need to adjust this selector accordingly.)

// Pass the scraped Document from news.ycombinator.com to this
// function to extract data about front page links.
function extractData(document: Document) {
  const writingLinks: HTMLAnchorElement[] = Array.from(
    document.querySelectorAll('a.titlelink'),
  );
  return writingLinks.map(link => {
    return {
      title: link.text,
      url: link.href,
    };
  });
}

This function is only a simple example, and would be different depending on what you want to get out of a page. When working with jsdom, remember that you're not working with arrays and objects but with NodeLists and Elements. To get useful data out of your selections, you'll often have to do things like convert a NodeList into an array as shown above.
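
For instance (a small illustration, not part of the finished example), a NodeList has a forEach method but no map, so converting it to an array is usually the first step before transforming your selections:

const nodeList = document.querySelectorAll('a.titlelink');
// nodeList.map(...) would fail: NodeList has no .map method
const anchors = Array.from(nodeList); // a real array of elements
const titles = anchors.map(a => a.textContent);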

Sometimes you'll have to get creative with your selections. I recently tried to scrape information from an HTML table on a page with a varying number of tables and no classes. Because the number of tables was always different, I couldn't reliably pick the table I wanted by its position on the page. I had to select every table present on the page, then filter them by the text in the first cell to get precisely the one table I needed!

// Sometimes, web scraping is just hard...
// Here, `data` is the Document of the scraped page.
const table: HTMLTableElement = Array.from(
  data.querySelectorAll('table'),
).filter(t =>
  t.children[0].children[0].children[0].innerHTML.match(
    /Unique Text in First Cell which IDs the Table/,
  ),
)[0];

Extracting Data with Regular Expressions

Unfortunately for us, not all pages on the Internet are well-structured and ready for scraping. Sometimes, they don't even try to use HTML tags properly. In these sad cases, you may need to turn to regular expressions (regex) to extract what you need. We won't need to resort to such extreme measures in our example of scraping Hacker News, but it's worth knowing that you might need to do this.

I'll give you a contrived example where you would need some regex, based on another site I recently scraped. Imagine the following badly-done HTML:

<div class="pokemon">
  Name: Pikachu<br />
  Number: 25<br />
  Type: Electric<br />
  Weakness: Ground
</div>

The various pieces of data we care about aren't wrapped in their own HTML elements! Everything is just inside a div, with some br tags to create line breaks. If I wanted to extract the data from this, I could use regex to match the text patterns I expect to find. This can require some trial and error, and I recommend using a tool like regex101 to test the regular expressions you come up with. In this example, we might write the following code:

const rawPokemonHTML = document.querySelector('.pokemon')?.innerHTML ?? '';
// The capture group (index 1) holds just the value we want
const name = rawPokemonHTML.match(/Name: (\w+)/)?.[1];
const num = rawPokemonHTML.match(/Number: (\d+)/)?.[1];
// etc...
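
If you wanted to pull all of those fields into a single object, a slightly more defensive version might look like the sketch below. The parsePokemon helper isn't part of the original example; it just falls back to null when a field is missing:

// Hypothetical helper: extract each field, or null if it isn't found
function parsePokemon(element: Element) {
  const text = element.textContent ?? '';
  return {
    name: text.match(/Name: (\w+)/)?.[1] ?? null,
    number: text.match(/Number: (\d+)/)?.[1] ?? null,
    type: text.match(/Type: (\w+)/)?.[1] ?? null,
    weakness: text.match(/Weakness: (\w+)/)?.[1] ?? null,
  };
}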

Saving Data

Once we've extracted our data from the HTML, we'll want to save it. This is basically the same as when we created a cache for the downloaded HTML files.

import { existsSync, mkdirSync } from 'fs';
import { writeFile } from 'fs/promises';
import { resolve } from 'path';

function saveData(filename: string, data: any) {
  if (!existsSync(resolve(__dirname, 'data'))) {
    mkdirSync(resolve(__dirname, 'data'));
  }
  writeFile(resolve(__dirname, `data/${filename}.json`), JSON.stringify(data), {
    encoding: 'utf8',
  });
}

Putting It All Together

Now that we've got all the necessary pieces, we're ready to build our JSON file of Hacker News front page stories. To see all of our code in one piece, check it out on GitHub.

async function getData() {
  const document = await fetchFromWebOrCache(
    'https://news.ycombinator.com/',
    true, // Hacker News is always changing, so ignore the cache!
  );
  const data = extractData(document);
  saveData('hacker-news-links', data);
}

getData();

When we run our script from the command line, it will execute getData(). That function fetches the HTML from Hacker News' front page, extracts all of the links and their titles, and then saves them to data/hacker-news-links.json. And while you probably don't need a list of links from Hacker News, this should be enough to get you started collecting data from the web that you do care about.
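
For reference, each entry in the saved JSON file has the shape returned by extractData. A type alias like the one below (not part of the original code, purely illustrative) documents what ends up in the file:

// Illustrative only: the shape of each entry in data/hacker-news-links.json
type HackerNewsLink = {
  title: string;
  url: string;
};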