Skip to content

This Dot Blog

This Dot provides teams with technical leaders who bring deep knowledge of the web platform. We help teams set new standards, and deliver results predictably.

Newest First
Tags: NodeJS
Testing a Fastify app with the NodeJS test runner cover image

Testing a Fastify app with the NodeJS test runner

ant to simplify your testing process? Our new blog post on Node.js' built-in test runner is a great place to start. Learn how to test Fastify apps, including practical examples for testing an API server and SQL plugins....

Bun v1.0 cover image

Bun v1.0

On September 8, 2023, Bun version 1 was released as the first production-ready version of Bun, a fast, all-in-one toolkit for running, building, testing, and debugging JavaScript and TypeScript. Why a new JS runtime You may ask, we already have Node and Deno, so why would we need another javascript runtime, Well yes we had Node for a very long time, but developers face a lot of problems with it, and maybe the first problem is because it’s there for a very long time, it has been changing a lot between different versions and one of the biggest nightmares for JavaScript developers these days is upgrading the node version. Also, Node lacks support for Typescriptt. Zig programming language One of the main reasons that Bun is faster than Node, is the programming language it has been built with which is Zig. Zig is a very fast programming language, even faster than C) (here is some benchmarks), it focuses on performance and memory control. The reasons it’s faster than C is because of the LLVM optimizations it has, and also the way it handles the undefined behavior under the hood Developer Experience Bun delivers a better developer experience than Node on many levels. First, it’s almost fully compatible with Node so you can use Node packages without any issues. Also, you don’t need to worry about JS Common and ES Modules anymore, you can use both in the same file, yup you read that right, for example: ` Also, it has a built-in test framework similar to Jest or Vitest in the project so no need to install a different test framework with different bundler in the same project like Webpack or Vite ` Also, it supports JSX out-of-the-box ` Also, Bun has the fastest javascript package manager and the most efficient you can find as of the time of this post ` Bun Native APIs Bun supports the Node APIs but also they have fun and easy APIs to work with like * Bun.serve() : to create HTTP server * Bun.file() : to read and write the file system * Bun. password.hash(): to hash passwords * Bun.build(): to bundle files for the browser * Bun.FileSystemRouter(): a file system router And many more features Plugin system Bun also has an amazing plugin system that allows developers to create their own plugins and add them to the Bun ecosystem. ` Conclusion Bun is a very promising project, and it’s still in the early stages, but it has a lot of potential to be the next big thing in the JavaScript world. It’s fast, easy to use, and has a lot of amazing features. I’m very excited to see what the future holds for Bun and I’m sure it will be a very successful project....

How to automatically deploy your full-stack JavaScript app with AWS CodePipeline cover image

How to automatically deploy your full-stack JavaScript app with AWS CodePipeline

In our previous blog post we set up a horizontally scalable deployment for our full-stack javascript app. In this article we would like to show you how to set up AWS CodePipeline to automatically deploy changes to the application....

How to host a full-stack app with AWS CloudFront and Elastic Beanstalk cover image

How to host a full-stack app with AWS CloudFront and Elastic Beanstalk

You have an SPA with a NestJS back-end. What if your app is a hit? You need to be prepared to serve thousands of users? You might need to scale your API horizontally, which means you need to have more instances running behind a load balancer....

Building a Multi-Response Streaming API with Node.js, Express, and React cover image

Building a Multi-Response Streaming API with Node.js, Express, and React

Introduction As web applications become increasingly complex and data-driven, efficient and effective data transfer methods become critically important. A streaming API that can send multiple responses to a single request can be a powerful tool for handling large amounts of data or for delivering real-time updates. In this article, we will guide you through the process of creating such an API. We will use video streaming as an illustrative example. With their large file sizes and the need for flexible, on-demand delivery, videos present a fitting scenario for showcasing the power of multi-response streaming APIs. The backend will be built with Node.js and Express, utilizing HTTP range requests to facilitate efficient data delivery in chunks. Next, we'll build a React front-end to interact with our streaming API. This front-end will handle both the display of the streamed video content and its download, offering users real-time progress updates. By the end of this walkthrough, you will have a working example of a multi-response streaming API, and you will be able to apply the principles learned to a wide array of use cases beyond video streaming. Let's jump right into it! Hands-On Implementing the Streaming API in Express In this section, we will dive into the server-side implementation, specifically our Node.js and Express application. We'll be implementing an API endpoint to deliver video content in a streaming fashion. Assuming you have already set up your Express server with TypeScript, we first need to define our video-serving route. We'll create a GET endpoint that, when hit, will stream a video file back to the client. Please make sure to install cors for handling cross-origin requests, dotenv for loading environment variables, and throttle for controlling the rate of data transfer. You can install these with the following command: ` ` In the code snippet above, we are implementing a basic video streaming server that responds to HTTP range requests. Here's a brief overview of the key parts: 1. File and Range Setup: We start by determining the path to the video file and getting the file size. We also grab the range header from the request, which contains the range of bytes the client is requesting. 2. Range Requests Handling: If a range is provided, we extract the start and end bytes from the range header, then create a read stream for that specific range. This allows us to stream a portion of the file rather than the entire thing. 3. Response Headers: We then set up our response headers. In the case of a range request, we send back a '206 Partial Content' status along with information about the byte range and total file size. For non-range requests, we simply send back the total file size and the file type. 4. Data Streaming: Finally, we pipe the read stream directly to the response. This step is where the video data actually gets sent back to the client. The use of pipe() here automatically handles backpressure, ensuring that data isn't read faster than it can be sent to the client. With this setup in place, our streaming server is capable of efficiently delivering large video files to the client in small chunks, providing a smoother user experience. Implementing the Download API in Express Now, let's add another endpoint to our Express application, which will provide more granular control over the data transfer process. We'll set up a GET endpoint for '/download', and within this endpoint, we'll handle streaming the video file to the client for download. ` This endpoint has a similar setup to the video streaming endpoint, but it comes with a few key differences: 1. Response Headers: Here, we include a 'Content-Disposition' header with an 'attachment' directive. This header tells the browser to present the file as a downloadable file named 'video.mp4'. 2. Throttling: We use the 'throttle' package to limit the data transfer rate. Throttling can be useful for simulating lower-speed connections during testing, or for preventing your server from getting overwhelmed by data transfer operations. 3. Data Writing: Instead of directly piping the read stream to the response, we attach 'data' and 'end' event listeners to the throttled stream. On the 'data' event, we manually write each chunk of data to the response, and on the 'end' event, we close the response. This implementation provides a more hands-on way to control the data transfer process. It allows for the addition of custom logic to handle events like pausing and resuming the data transfer, adding custom transformations to the data stream, or handling errors during transfer. Utilizing the APIs: A React Application Now that we have a server-side setup for video streaming and downloading, let's put these APIs into action within a client-side React application. Note that we'll be using Tailwind CSS for quick, utility-based styling in our components. Our React application will consist of a video player that uses the video streaming API, a download button to trigger the download API, and a progress bar to show the real-time download progress. First, let's define the Video Player component that will play the streamed video: ` In the above VideoPlayer component, we're using an HTML5 video tag to handle video playback. The src attribute of the source tag is set to the video endpoint of our Express server. When this component is rendered, it sends a request to our video API and starts streaming the video in response to the range requests that the browser automatically makes. Next, let's create the DownloadButton component that will handle the video download and display the download progress: ` In this DownloadButton component, when the download button is clicked, it sends a fetch request to our download API. It then uses a while loop to continually read chunks of data from the response as they arrive, updating the download progress until the download is complete. This is an example of more controlled handling of multi-response APIs where we are not just directly piping the data, but instead, processing it and manually sending it as a downloadable file. Bringing It All Together Let's now integrate these components into our main application component. ` In this simple App component, we've included our VideoPlayer and DownloadButton components. It places the video player and download button on the screen in a neat, centered layout thanks to Tailwind CSS. Here is a summary of how our system operates: - The video player makes a request to our Express server as soon as it is rendered in the React application. Our server handles this request, reading the video file and sending back the appropriate chunks as per the range requested by the browser. This results in the video being streamed in our player. - When the download button is clicked, a fetch request is sent to our server's download API. This time, the server reads the file, but instead of just piping the data to the response, it controls the data sending process. It sends chunks of data and also logs the sent chunks for monitoring purposes. The React application collects these chunks and concatenates them, displaying the download progress in real-time. When all chunks are received, it compiles them into a Blob and triggers a download in the browser. This setup allows us to build a full-featured video streaming and downloading application with fine control over the data transmission process. To see this system in action, you can check out this video demo. Conclusion While the focus of this article was on video streaming and downloading, the principles we discussed here extend beyond just media files. The pattern of responding to HTTP range requests is common in various data-heavy applications, and understanding it can be a useful tool in your web development arsenal. Finally, remember that the code shown in this article is just a simple example to demonstrate the concepts. In a real-world application, you would want to add proper error handling, validation, and possibly some form of access control depending on your use case. I hope this article helps you in your journey as a developer. Building something yourself is the best way to learn, so don't hesitate to get your hands dirty and start coding!...

Implementing a Task Scheduler in Node Using Redis cover image

Implementing a Task Scheduler in Node Using Redis

Node.js and Redis are often used together to build scalable and high-performing applications. Although Redis has always been primarily an in-memory data store that allows for fast and efficient data access, over time, it has gained many useful features, and nowadays it can be used for things like rate limiting, session management, or queuing. With its excellent support for sorted sets, one feature to be added to that list can also be task scheduling. Node.js doesn't have support for any kind of task scheduling other than the built-in setInterval() and setTimeout() functions, which are quite simple and don't have task queuing mechanisms. There are third-party packages like node-schedule and node-cron of course. But what if you wanted to understand how this could work under the hood? This blog post will show you how to build your own scheduler from scratch. Redis Sorted Sets Redis has a structure called _sorted sets_, a powerful data structure that allows developers to store data that is both ordered and unique, which is useful in many different use cases such as ranking, scoring, and sorting data. Since their introduction in 2009, sorted sets have become one of the most widely used and powerful data structures in Redis. To add some data to a sorted set, you would need to use the ZADD command, which accepts three parameters: the name of the sorted set, the name of the member, and the score to associate with that member. When having multiple members, each with its own score, Redis will sort them by score. This is incredibly useful for implementing leaderboard-like lists. In our case, if we use a timestamp as a score, this means that we can order sorted set members by date, effectively implementing a queue where members with the most recent timestamp are at the top of the list. If the member name is a task identifier, and the timestamp is the time at which we want the task to be executed, then implementing a scheduler would mean reading the sorted list, and just grabbing whatever task we find at the top! The Algorithm Now that we understand the capabilities of Redis sorted sets, we can draft out a rough algorithm that will be implemented by our Node scheduler. Scheduling Tasks The scheduling task piece would include adding a task to the sorted set, and adding task data to the global set using the task identifier as the key. The steps are as follows: 1. Generate an identifier for the submitted task using the INCR command. This command will get the next integer sequence each time it's called. 2. Use the SET command to set task data in the global set. The SET command accepts a key and a string. The key must be unique, therefore it can be something like task:${taskId}, while the value can be a JSON representation of the task data. 3. Use the ZADD command to add the task identifier and the timestamp to the sorted set. The name of the sorted set can be something simple like sortedTasks, while the set member can be the task identifier and the score is the timestamp. Processing Tasks The processing part is an endless loop that checks if there are any tasks to process, otherwise it waits for a predefined interval before trying again. The algorithm can be as follows: 1. Check if we are still allowed to run. We need a way to stop the loop if we want to stop the scheduler. This can be a simple boolean flag in the code. 2. Use the ZRANGE command to get the first task in the list. ZRANGE accepts several useful arguments, such as the score range (the timestamp interval, in our case), and the offset/limit. If we provide it with the following arguments, we will get the first next task we need to execute. - Minimal score: 0 (the beginning of time) - Maximum score: current timestamp - Offset: 0 - Count: 1 3. If there is a task found: 3.1 Get the task data by executing the GET command on the task:${taskId} key. 3.2 Deserialize the data and call the task handler. 3.3 Remove the task data using the DEL command on the task:${taskId} key. 3.4 Remove the task identifier from the sorted set by calling ZREM on the sortedSets key. 3.5 Go back to point 2 to get the next task. 4. If there is no task found, wait for a predefined number of seconds before trying again. The Code Now to the code. We will have two objects to work with. The first one is RedisApi and this is simply a façade over the Redis client. For the Redis client, we chose to use ioredis, a popular Redis library for Node. ` The RedisApi function returns an object that has all the Redis operations that we mentioned previously, with the addition of isConnected, which we will use to check if the Redis connection is working. The other object is the Scheduler object and has three functions: - start() to start the task processing - stop() to stop the task processing - schedule() to submit new tasks The start() and schedule() functions contain the bulk of the algorithm we wrote above. The schedule() function adds a new task to Redis, while the start() function creates a findNextTask() function internally, which it schedules recursively while the scheduler is running. When creating a new Scheduler object, you need to provide the Redis connection details, a polling interval, and a task handler function. The task handler function will be provided with task data. ` That's it! Now, when you run the scheduler and submit a simple task, you should see an output like below: ` ` Conclusion Redis sorted sets are an amazing tool, and Redis provides you with some really useful commands to query or update sorted sets with ease. Hopefully, this blog post was an inspiration for you to consider using sorted sets in your applications. Feel free to use StackBlitz to view this project online and play with it some more....

State of Node.js Wrap-up cover image

State of Node.js Wrap-up

In this State of Node.js event, our panelists discussed updates, LTS releases and APIs with Node.js maintainers, technical steering committee members and collaborators, and much more. In this wrap-up, we will take a deeper look into these latest developments and explore what is on the horizon for Node.js. You can watch the full State of Node.js event on the This Dot Media YouTube Channel. Here is a complete list of the host and panelists that participated in this online event. Hosts: - Tracy Lee, CEO, This Dot Labs, @ladyleet - James Snell, Node.js Foundation Technical Steering Committee, @jasnell Panelists: - Beth Griggs, Senior Software Engineer, Red Hat, Node.js TSC Member, @BethGriggs_ - Matteo Collina, Co-Founder and CTO of Platformatic.dev, Node.js TSC member, @matteocollina - Michael Dawson, Node.js Lead, Red Hat and IBM, @mhdawson1 General state of Node.js Michael kicks off the conversation saying there are a lot of things happening with Node.js right now. There were over a billion downloads last year alone, and it is continuing to grow. Beth talked about the major release of Node v20 coming out in April. Node 14 end of life is coming at the end of April. Matteo talked about two micro conferences happening this year for Node.js. One will be in North America in Vancouver in May, and the other one is in September in Bilbao. Updates from specific working groups Michael talks about spinning up a uvwasi team. The wasi is the web assembly system interface. It’s not only used in Node, but in other projects like grain. It’s a key component of wasm support. Michael also talks about how the Node.js API team has been great for building long term contributors. If you’re interested in add-ons and native code, it is a friendly group to get involved with. Beth talks about other ways folks can contribute to Node.js. She talks about a redesign of the website that happened recently. The main website has been migrated over to Next.js. Matteo talks about a massive PR that is open right now about the new loader API. There is a lot of effort being put into this with a lot of contributors. This new loader will replace the - - hack. New Features Michael talks about the single executable application that enables bundling code into the Node.js binaries without having to build it. He also mentions process permissions. These are two big new experimental features right now. Beth talks about the built-in test runner. It allows you to throw some scripts together, and get some simple tests without having to deal with dependable warnings for a mod. End of Event Each panelist takes time to go over what they are currently doing on their own. Beth is working in security for releases, and takes time to talk about everything there. Michael is working with the Node API, which is a long-term working project. James is work on standard APIs and also bringing interoperability with Node, Bun, and Dino. Finally, Matteo is working on getting Platformatic going. Conclusion The conversation went in depth about the state of Node.js, and what is being done in the new releases as well as experimental updates. The panelists were very engaged, and were great at bringing up ways to get involved with the Node community. You can watch the full State of Node.js event on the This Dot Media Youtube Channel....

Introducing the express-typeorm-postgres Starter Kit cover image

Introducing the express-typeorm-postgres Starter Kit

At This Dot, we've been working with ExpressJS APIs for a while, and we've created a starter.dev kit for ExpressJS that you can use to scaffold your next backend project....

State of Deno: A Look at the Deno CLI, Node.js Compatibility and the Fresh Framework cover image

State of Deno: A Look at the Deno CLI, Node.js Compatibility and the Fresh Framework

In this State of Deno event, our panelists discussed the Deno CLI, Node.js compatibility for the npm ecosystem, and the Fresh framework. In this wrap-up, we will take a deeper look into these latest developments and explore what is on the horizon for Deno. You can watch the full State of Deno event on the This Dot Media YouTube Channel. Here is a complete list of the host and panelists that participated in this online event. Hosts: Tracy Lee, CEO, This Dot Labs, @ladyleet Panelists: Colin Ihrig, Software Engineer at Deno, @cjihrig Luca Casonato, Software Engineer at Deno, @lcasdev Bartek Iwańczuk, Software Engineer at Deno, @biwanczuk David Sherret, Software Engineer at Deno, @DavidSherret Table of Contents - Exploring the Deno CLI and its features - What is Deno? - Built in support for TypeScript - Built in toolchain - Deno install and upgrade commands - Deno permissions - Upcoming features - Deno products - Deno Deploy - Deno and Node.js compatibility - Future support for npm packages - The Deno to Node Transform library tool - Fresh framework - Conclusion - We look forward to seeing you at our next State of Deno! Exploring the Deno CLI and Its Features What is Deno? Deno is server-side runtime for JavaScript that also behaves similarly to a browser because it supports all of the same browser APIs on the server. This support provides access to existing knowledge, resources, and documentation for these browser APIs. The team at Deno works closely with browser vendors to make sure that new web APIs work well for both server-side runtime and browsers. Built In Support for TypeScript One of the advantages of Deno is that it ships with TypeScript by default. This removes the setup and configuration time, and reduces the barrier of entry for getting started with TypeScript. Deno also type checks your TypeScript code so you no longer have to use tsc. Built in Toolchain The Deno CLI comes with an entire toolchain which includes a formatter, Linter, package manager, vendor remote dependencies, editor integrations, and more. One of those tools is the Documentation Generator, which annotates library function comments, types, or interfaces with JSDocs comments to easily generate documentation. For a complete list of the Deno CLI tools, please visit their documentation page. Deno install and upgrade commands Deno install is another feature that allows you to take a script and install it using a global command. ` If there is a new version of Deno, you can just run the upgrade command and it will upgrade itself, which makes it a version manager for itself. ` Deno permissions Deno by default will not have file, network or environment access unless you grant those permissions by running a script with command line flags. ` Even with permissions granted, you can scope it to certain directories to allow it to only read and write in the directory of your choosing. If you run the program without permissions granted, the program will still prompt you for permission access before accessing a file. To learn more about Deno's permissions, please read through the documentation. Upcoming features Deno is currently working on improving performance in the areas of HTTP, FFI (Foreign Function Interface) and Node compatibility. There are also improvements being made on the Documentation Generator, to make sure that the docs provided are good and removing the need to maintain two separate docs. Deno Products Deno Deploy Deno deploy is a hosted offering that makes it easy to go from local development to production ready. This service integrates well with GitHub, and provides an option to pay only when users make requests to your services. Deno deploy has a dashboard that allows you to automate most of the processes and show you metrics, deployment statistics, CPU utilization, and network utilization. It also allows you to set up custom domains and provision certificates. Deno and Node.js compatibility Deno v1.15 will introduce a Node.js compatibility mode which will make it possible to run some of Node's programs in Deno. Node APIs like the http server will work in Deno as they would in Node.js. When it comes to the NPM ecosystem compatibility, the goal is to support the large number of packages built on Node.js. The Deno team is working on these compatibility issues, because it uses web APIs for most of their operations. All of these Web APIs were created after Node.js was conceived, meaning that Node implements a whole different set of APIs to do certain operations like performing network requests. This new Node.js API compatibility layer will be able to translate API calls to modern underlying APIs which is what the actual runtime implements. Future support for npm packages When it comes to supporting npm packages on Deno, the goal is to have a transpiler server that takes common.js code and translates that into ESM (ECMAScript module) code. The reason for this is because, just like browsers, Deno only supports ESM code. Deno uses the concept of npm specifiers to provide access to the npm package. Deno downloads the npm package and runs it from a global cache. common.js is also supported ,and it runs the code as it is. ` For npm packages, Deno will create a single copy of the downloaded package instead of multiple directories or sub-directories of modules. It is one global hash file, with no node_modules directory by default, and no need for a package.json by default. If a package requires a node_modules directory, then that directory can be created using a specifier flag. The following command will create a node_modules directory in the project directory, using symlink. ` The Deno to Node Transform library tool The Deno team built a tool to allow library authors to transform Deno packages to Node.js packages. Deno to Node Transform (DNT) takes the Deno code then builds it for Node and distributes it as an npm package. This allows library authors to take advantage of the Deno tool chain. This package can also be shipped on npm for Node.js users. Fresh framework Fresh is a new web framework for Deno, that makes use of the Deno toolchain ecosystem. Fresh uses JSX for templating, and it is similar to Next.js or Remix. A key difference between Fresh and Next.js or Remix, is that Fresh is built to be server-side rendered on the edge rather than server-side in a few locations. Another difference is that with Fresh, no JavaScript is shipped to the client by default, which makes it faster. Fresh handles the Server-side rendering, generates the HTML, sends the file to the client, and hydrates only the part of the page that requires JavaScript on the client by default. Here are some products that already use the Fresh framework: - Deno - merch.deno.com - Deno Deploy To learn more about how to build apps with the Fresh framework, please read through this helpful blog post. Conclusion The team at Deno is making great progress to bring more exciting features to the community to make the runtime engine easy to use for building or migrating existing libraries. If you have any questions about the State of Deno, be sure to ask here. What is it you find exciting about Deno? We will be happy to hear about it on Twitter! We look forward to seeing you at our next State of Deno!...

The Future of Node.js: Experimental Web APIs and Default Test Runner for Node 18 cover image

The Future of Node.js: Experimental Web APIs and Default Test Runner for Node 18

In the latest This Dot State of Node.js, Node.js core contributors shared their thoughts on the new APIs released with Node.js 18. Here is a complete list of the panelists that participated in the online event: Host: Tracy Lee, CEO, This Dot Labs, @ladyleet Panelists: Beth Griggs, Senior Software Engineer, Red Hat @bethgriggs_ Colin Ihrig, Engineer, Deno @cjihrig Danielle Adams, Software Engineer, Amazon Web Services @adamzdanielle Michael Dawson, Node.js Lead, Red Hat and IBM @mhdawson1 You can watch the full This Dot State of Node.js, event on This Dot Media's YouTube Channel. Here are detailed key points from the panel discussion: Node.js 18 Node.js 18 was released in April with updates to Google’s V8 JavaScript Engine, experimental browser APIs, and a Test Runner. Getting involved in Node.js The Node API Working Group is a growing community of developers who contribute to the Node.js Open Source project. To get involved with meetings, working groups, and projects happening, visit (the Node.js website](https://nodejs.org/en/get-involved/). The Future of Node.js Node.js had a very successful first 10 years, and the team is working towards the next 10 years of building and maintaining the ecosystem. To help achieve the vision a working group, The Node.js Next 10 Working Group has set up a meeting every two weeks to discuss documentations and projects. Node.js 18 new Features There are so many new features to be excited for with the Node.js 18 release, including some web APIs and a test runner. Native Fetch API (experimental) Node.js 18 has enabled an experimental fetch API on the global scope by default #41811, ` Node.js Test Runner (experimental) The Node.js Test Runner node:test facilitates creating JavaScript tests that report results in TAP format. #42658 Upgrade to Node.js 18 To upgrade from a version of Node.js to the latest version 18, it’s a good idea to read the change logs and follow the breaking changes. This Node.js release follows the experimental flag for major features to give the community the opportunity to test out the features before they are designated “stable”. To use Node.js without experimental features, use the --no-experimental-fetch command-line flag. If you face any issue with the experimental features or any features on the Node.js, Node.js has a help repository where you can ask questions. Other Topics Other discussions include Async Hook Module, which is still in experimental mode, and the future of ESModules and CommonJs for Node.js. With the community adoption of the new ESModule, and some existing libraries still using CommonJs, the support for both will be maintained by Node.js. There was also a round up discussion of the WinterCG, a community group that provides a space for JavaScript runtimes to collaborate on API interoperability. Conclusion Among discussions and updates for Node.js Test Runner, Browser APIs in Node.js, and The Future of Node (Node Next10), the panelists also discussed OpenJS Foundation hosting the OpenJS World 2022 event with Collab Summit. I recommend you watch the full video of the event on This Dot’s YouTube Channel. Want to learn more about Node.js? Check out the This Dot blog for more Node.js content!...

Deep Dive into Node.js with James Snell cover image

Deep Dive into Node.js with James Snell

Node.js is one of the most used engines globally, and it is unique in the approach it follows to make our code work. As developers, we usually ignore how the underlying tools we use in daily work. In this article, we will deep dive into the Node.js internals using James Snell's talk as our guide, but we will expand in some areas to clarify some of the concepts discussed. We will learn about the Event Loop and the asynchronous model of Node.js. We will understand Event Emitters and how they power almost everything inside Node, and then we will build on that knowledge to understand what Streams are, and how they work. Event Loop We will begin our journey inside Node.js by exploring the Event Loop. The Event Loop is one of the most critical aspects of Node.js to understand. The Event Loop is the big orchestrator- the mechanism in charge of scheduling Node.js' synchronous and asynchronous nature. This section teaches how everything related to the scheduling mechanism works. When the Node.js process begins, it starts multiple threads: the Main process thread and the Libuv pool threads with four worker threads by default. The Libuv thread concern is handling heavy load work like IO by reading files from the disc, running some encryption, or reading from a socket. We can see the Event Loop as a glorified for/while loop that lives inside the main thread. The loop process happens at the C++ level. The Event Loop must perform several steps to complete a full back-around iteration. These steps will involve performing different checks and listening to OS events, like checking if there are any timers expired that need to be fired. It will check if there is any pending IO to process or schedule. These tasks run at the C++ level, but the event associated with the steps usually involves a callback like the action to execute when a timer expires, or a new chunk of a file is ready to be processed. When this happens, the callback executes in Javascript. Because the event loop exists in the process main thread, every time one of the steps of the event loop is processed, the event loop and the main thread are blocked until that step completes. This means that even if the Libuv is still executing the IO operations while the main thread is completing one-step tasks, the result of the IO operations is not accessible until the main thread is unblocked and iterates to the step that handles that result. "There is no such thing as asynchronous Javascript" James Snell But if asynchronous doesn't exist in Node and Javascript. What are promises for? Promises are just a mechanism to defer processing to the future. The power of Promises is the ability to schedule code execution after a specific set of events happens. But we will learn more about Promises later in this article. Step Execution Now that we understand the event loop high-level picture, it is time to go one level deeper and understand what happens in each step of the Event Loop. We learned that each step of the event loop executes in C++, but what happens if that step needs to run some Javascript? The C++ step code will use the V8 APIs to execute the passed Javascript code in a synchronous call. That Javascript code could have calls to a Node API named process.nextTick(fn), which receives another function as an argument. > "nextTick is not a good name because no tick is involved." James Snell If present, the process.nextTick(fn) appends its argument function in queue data structure. Later, we will find that there is a second queue where the Javascript code can append functions. But for simplicity, let's assume for now that there is only one queue: the nextTick queue. Once the Javascript runs and completes filling the queue through the process.nextTick method, it will return control to the C++ layer. Now it is time for Node to drain the nextTick Queue synchronously, in the order they were added, by running each of the functions added to the nextTick queue. Only when the queue is empty can the event loop move to the next step and start again with the same process. Remember everything described before runs asynchronously. > "Any time you have Javascript running, anything else is happening" James Snell Therefore, the key to keeping your Javascript performant is to keep your functions small, and use a scheduling mechanism to defer work. But what is the scheduling mechanism? The scheduling mechanisms are the instruments through which Node.js can simulate asynchronicity by scheduling the execution of a given Javascript function to a given time in the future. The scheduling mechanisms are the nextTick queue and the Microtask queue. The difference between these two is the order in which they execute. NodeJS will only start draining the Microtasks queue after the nextTick queue is empty. And the nextTick queue after the call stack is empty. The call stack is a LIFO (Last In, First Out) stack data structure. The operations from the stack are completely synchronous and are scheduled to run ASAP. The v8 API that we saw before runs the Javascript code sent from the C++ layer by adding and removing the statements into the stack, and executing them as corresponding. We saw how the nextTick queue is filled by the V8 execution when processing one statement from the stack, and how it drains as soon as C++ processes the Javascript stack. The Microtask queue is the one that process Promises continuation like events, catches, and finallies in V8. It is drained immediately after the nextTick queue is empty. Let's paint a picture of how this works. The following represents a function body executing several tasks. ` But the final order in which Node will execute this code looks more like ` Notice how nextTick is processed after the stack of operations is emptied, and the Microstask operations (the promise continuations) are deferred to just after the nextTick queue is emptied. However, there are other scheduling mechanisms: - Timers like setTimeout, setInterval, etc - Inmediates, the execution order of which is combined with timers, but they are not timers. A setInmediate(fn) registers a callback that will execute at the end of the current event loop turn, or just before the next event loop turn starts. - Promises, which are not the same as promises event continuation, which is what the Microstask handles. - Callbacks on the native layers this is something that is not necessarily available to the javascript developer. - Workers threads are separate node instances with their own process main thread, their own Libuv threads, and their own event loop. It is basically a complete Node instance running in a separate thread. Because it is a separate Node instance, it can run Javascript Node in parallel to the main Node instance. > "NextTick and Immediate names should be inverted because NextTick operations happen immediately after the Javascript stack is empty and Immediate functions will happen just before the nextTick start." To find complementary resources, you can go to the Node.js documentation. Event Emitter Event emitters are one of the first Node APIs ever created. Almost everything in Node is an event emitter. It is the first thing that loads and is fundamental to how Node works. If you see an object with the following methods, you have an event emitter. ` But how do these work? Inside the event emitter object, we have a map/look-up table, which is just another object. Inside each map entry, the key is the event name, and the value is an array of callback functions. The on method on the event emitter object receives the event name as the first argument, and a callback function as its second. When called, this method adds the on callback function into the corresponding array of the look-up table. The once method behaves like the on method, but with a key difference; instead of storing the once callback function directly, it will wrap it inside another function, and then add the wrapped function to the array. When the wrapper function gets executed, it will execute the wrapped callback function, removing itself from the array. On the other hand, emit uses the event name to access the corresponding array. It will create a backup copy of the array, and iterate synchronously over each callback function of the array executing it. It is important to highlight that emit is synchronous. If any of the callback functions on the array take a long time to execute, the main thread will be blocked. Emit won't return until all of functions on the array have executed. The asynchronous behavior of the emit is an illusion. This effect is caused because internally, Node will invoke each callback function using a nextTick, and therefore deferring the function's execution into the future. You can find more info about Event Emitter at the Node.js documentation. Streams (Node | Web) Streams are one of the fundamental concepts that power Node.js applications. They are a way to handle reading/writing files, network communications, or any end-to-end information exchange efficiently. Streams are not a concept unique to Node.js. They were introduced in the Unix operating system decades ago, and programs can interact with each other by passing streams through the pipe operator (|). For example, traditionally, when you tell the program to read a file, it is read into memory, from start to finish, and then you process it. You read it piece by piece using streams, processing its content without keeping it all in memory. The Node.js stream module provides the foundation upon which all streaming APIs are built. All streams are instances of EventEmitter. Node Streams There are four Node stream types: Readable, Writable, Duplex, and Transform. All of them are Event Emitters. Readable: a stream you can pipe from but not pipe into (you can receive data, but not send data to it). When you push data into a readable stream, it is buffered until a consumer starts to read the data. Writable: a stream you can pipe into but not pipe from (you can send data but not receive from it). Duplex: a stream you can both pipe into and pipe from. Basically a combination of a Readable and Writable stream. Transform: a Transform stream is similar to a Duplex, but the output is a transform of its input. Readable Stream The Readable stream works through a queue and the highwatermark in an oversimplified way. The highwatermark will delimit how much data can be in the queue, but the Readable Stream will not enforce that limit. Every time data is pushed into the queue, the stream will give feedback to the client code, telling if the highwatermark was reached or not, and transferring the responsibility to the client. The client code must decide if it continues pushing data and overflowing the queue. This feedback is received through the push method, which is the mechanism for pushing data into the queue. If the push method returns true, the highwatermark has not been reached, and you can push more. If the push method returns false that means that the highwatermark has been reached, and you are not supposed to push more, but you can. Events When data has been pushed into the queue, the internal of the Readable Stream will emit a couple of events. The on:readable is part of the pull model; it alerts that this stream is readable and has data to be read. If you are listening to the on:readable event, a read method can be called. When the client code calls the read() method, it will get a chunk of data out of the queue, and it will dequeue it. Then, the client code can keep calling read until there is no more data in the queue. After emptying the queue, the client code can restart pulling data when the on:readable event triggers again. The on:read event is part of the push model. In this event, any new chunk of data that is pushed using the push method will be synchronously sent to its listeners. That means we don't need to call the read method. The data will arrive automatically. However, there is another crucial difference; the sent data will never be stored in the queue. The queue is only filled when there is no active on:read event listener. This is called the "flow" mode because the data just flows, and it doesn't get stored. Other events are the end event that would notice that there is no more data to be consumed from the stream. The 'close' event is emitted when the stream and any underlying resources (a file descriptor, for example) have been closed. The event indicates that no more events will be emitted, and no further computation will occur. The 'error' event may be emitted by a Readable implementation. Typically, this may occur if the underlying stream cannot generate data due to an underlying internal failure or when a stream implementation attempts to push an invalid chunk of data. The key to understanding the Readable Streams and their events is that the Readable Streams are just Event Emitters. Like Event Emitters, the Readable Streams don't have anything built into it that is asynchronous. It will not call any of the scheduling mechanisms. Therefore they operate purely synchronously, and to obtain an asynchronous behavior, the client code needs to defer when to call the different event APIs like the read() and push() methods. Let's understand how! When creating a Readable Stream, we need to provide a read function. The stream will repeatedly call this read function as long as its buffer is not full; however, after it calls it once, it will wait until we call push before calling it again. If we synchronously call push after the Readable Stream called read and the buffer is not full, the stream will synchronously call read again, up until we call push(null), marking the end of the stream. Otherwise, we can defer the push calls to some other point in time, effectively making the Stream read calls operate asynchronously; we might, for example, wait for some File IO callback to return. Examples Creating a Readable Stream ` ` Using a Readable Stream ` Writable Stream The Writable Streams works similarly. When creating a Writable Stream, we need to provide a _write(chunk, encoding, callback) function, and it will be an external write(chunck) function. When the external write function is called, it will just call the internal _write function. The internal _write function must call its callback argument function. Let's imagine that we write ten chunks; when the first chunk is written, if the internal _write function doesn't invoke the callback function, what will happen is that the chunks will accumulate in the Writable Stream internal buffer. When the callback is invoked, it will grab the next chunk and write it, draining the buffer. This means that if the _write is calling the callback function synchronously, then all those writes will happen synchronously; if it is deferring invoking that callback, then all those calls will happen asynchronously, but the data will accumulate in the internal buffer up until the highwatermark is hit. Even then, you can decide to keep incrementation the buffer queue. Events Like the Readable Stream, the Writable Stream is an Event Emitter that will provide its events listeners with useful alerts. The on:drain event notifies that the Writable buffer is longer, and more writes are allowed. If a call to the write function returns false, indicating backpressure, the 'drain' event will be emitted when it is appropriate to resume writing data to the stream. The 'close' event is emitted when the stream and any underlying resources (a file descriptor, for example) have been closed. The event indicates that no more events will be emitted, and no further computation will occur. The 'error' event is emitted if an error occurs while writing or piping data. The listener callback is passed a single Error argument when called. The 'finish' event is emitted after the stream.end() method has been called, and all data has been flushed to the underlying system. Examples Creating a Writable Stream ` Duplex Stream Duplex streams are streams that implement both the Readable and Writable interfaces. We can see a Duplex stream as an object containing both a Readable Stream and a Writable Stream. These two internal objects are not connected, and each has independent buffers. They share some events; for example, when the close event emits, both Streams are closed. They still emit the stream type-specific events like the readable and read for the Readable Stream, or the drain for the Writable Stream independently. However, this doesn't mean that data will be available on the Readable stream if we write into the Writable Stream. Those are independent. Duplex streams are especially useful for Sockets. Transform Stream Transform streams are Duplex streams where the output is related to the input. Contrary to the Duplex streams, when we write some chunk of data into the Writable stream, that data will be passed to the Readable stream by calling a transform function and invoking the push method on the readable side with the transformed data. As with Duplex, both the Readable and the Writable sides will have independent buffers with their own highwatermark. Examples Creating a Transform Stream ` Web Streams In Web streams, we still see some of the concepts we have seen for Node Streams. There exist the Readable, Writable, Duplex, and Transform streams. However, they also have significant differences. Unlike Node streams, web streams are not Event Emitter-based but Promise-based. Another big difference is that Node streams support multiple event listeners at the same time, and each of those listeners will get a copy of the data, while Web streams are restricted to a single listener at a time, and it has no events in it; it is purely Promise-based. While Node streams work entirely synchronously because Web streams are Promise-based, they work entirely asynchronously. This happens because the continuation is always deferred to the MicroTasks queue. It is essential to highlight that Node streams are significantly faster than Web streams, while Web streams are much more portable than Node streams. So keep that in mind when you are planning to select one or the other. Readable stream To understand the underlying implementation of how this works, let's take the Readable stream and dissect it. The developer can pass the underlayingSource object to the constructor when creating a new Readable stream. We can define the pull(controller) function inside this object, which receives the Controller object as its first and only parameter. This is the function where the client code can push the data into the stream by defining a custom data pulling logic. The Controller object, argument of the pull() function, contains the enqueue() function. This is the equivalent to the Node Readable stream push() function, and it is used to push the callback function return data into the Readable stream. The reader object, is the element through which the Web Readable stream enforces a single listener or reader at the time. It can be accessed through the getReader() method of the stream. Inside the reader object, we can find the read() function, which returns a promise providing access to the next chunk in the stream's internal queue. When the client code of the stream calls read() from the reader object, this will look into the Controller data queue and check if there is any data. If there is data in the queue, it will only dequeue the first chunk of data from the queue and resolve the read promise. If there is no data in the queue, the stream is going to call the pull(controller) function defined in the stream constructor argument object. Then the pull(controller) function will run and as part of its execution, it will call the Controller function enqueue() to push data into the stream. After pushing the data in, the pull function will resolve the initial read() Promise with the data pushed in. Because the pull(controller) function can be an async function, and return a Promise, if it calls once and that Promise is not resolved yet, and the client code continues calling the read() function, those read promises are going to accumulate in a read queue. Now we have our data queue and the read queue, and every time a new data is enqueued into the data queue, the stream logic will check if there is some pending read promise in the read queue. If the read queue is not empty, the first pending Promise will be dequeued and resolved with the data enqueued in the data queue. We can see this process as the read queue and the data queue trying to balance each other out. Consequently, the read queue will not accumulate read promises if the data queue has data. And the opposite is also true; we will not accumulate data in the data queue if the read queue has unresolved promises. Example Creating a readable stream ` Conclusions There is immense beauty in the depths for those curious developers who are not afraid of diving into the deep waters of the Node.js inner mechanics. In this article, we barely start exploring some of the solutions chosen by the Node.js core contributors, and we have found how elegant and straightforward many of those solutions are. There is still much to learn about the internals of Node and the Browser engines, but I think this article and its companion Javascript Marathon are a great starting point. If you get here, thank you, and I hope you got inspired to continue digging into the tools that allow us to unleash our passion and creativity....

Web Scraping with TypeScript and Node.js cover image

Web Scraping with TypeScript and Node.js

For those times when you can't get web data from an API or as a CSV download, learn to write a web scraper with TypeScript and Node....