Skip to content

D1 SQLite: Schema, migrations and seeds

D1 SQLite: Schema, migrations and seeds

This article was written over 18 months ago and may contain information that is out of date. Some content may be relevant but please refer to the relevant official documentation or available resources for the latest information.

I’ve written posts about some of the popular ORM’s in TypeScript and covered their pros and cons. Prisma is probably the most well known and Drizzle is a really popular up and comer. I like and use ORM’s in most of my projects but there’s also a camp of folks who believe they shouldn’t be used.

I started a small Cloudflare Workers project recently and decided to try using their D1 SQLite database without adding any ORM. This is the first post in a 2 part series where we’ll explore what this experience is like using only the driver and utilities made available in the Wrangler CLI.

Introduction

If you’re unfamiliar with Cloudflare D1 - it’s a distributed SQLite database for the Cloudflare Workers platform. Workers are lightweight serverless functions/compute distributed across a global network. The platform includes services and API’s like D1 that provide extended capabilities to your Workers.

At the time of this writing, there are only two ways to interact with a D1 database that I’m aware of.

In this 2 part series, we will create a simple Workers project and use the D1 Client API to build out our queries.

Getting Started

For this tutorial, we’ll create a simple Cloudflare Worker project and treat it like a simple node script/http server to do our experiments.

The first step is initializing a new Cloudflare Worker and D1 database:

npm create cloudflare@latest d1-tutorial

using create-cloudflare version 2.22.0

╭ Create an application with Cloudflare Step 1 of 3
│
├ In which directory do you want to create your application?
│ dir ./d1-tutorial
│
├ What type of application do you want to create?
│ type "Hello World" Worker
│
├ Do you want to use TypeScript?
│ yes typescript
│
├ Copying template files
│ files copied to project directory
│
├ Updating name in `package.json`
│ updated `package.json`
│
├ Installing dependencies
│ installed via `npm install`
│
╰ Application created
cd ./d1-tutorial
npx wrangler d1 create test-db

✅ Successfully created DB 'test-db' in region ENAM
Created your new D1 database.

[[d1_databases]]
binding = "DB" # i.e. available in your Worker on env.DB
database_name = "test-db"
database_id = "your-database-id"

We need to take our binding and add it to our project wrangler.toml configuration. Once our binding is added we can re-generate the types for our Worker project.

npm run cf-typegen

> d1-tutorial@0.0.0 cf-typegen
> wrangler types

 ⛅️ wrangler 3.65.0
-------------------

interface Env {
	DB: D1Database;
}

We now have our DB binding added to our project Env types. Let’s add a simple query to our worker script to make sure our database is setup and working:

return new Response(
  await env.DB.prepare("SELECT 1 + 1;")
    .first()
    .then((row) => JSON.stringify(row))
);

Start the development server with npm run dev and access the server at http://localhost:8787 . When the page loads we should see a successful result {"1 + 1":2} .

We now have a working SQLite database available.

Creating a schema

Since we’re not using an ORM with some kind of DSL to define a schema for our database, we’re going to do things the old fashioned way. “Schema” will just refer to the data model that we create for our database. We’ll define it using SQL and the D1 migrations utility.

Let’s create a migration to define our initial schema:

npx wrangler d1 migrations create DB initial_schema

 ⛅️ wrangler 3.65.0
-------------------

✔ No migrations folder found. Set `migrations_dir` in wrangler.toml to choose a different path.
Ok to create /Users/dane/Projects/d1-tutorial/migrations? … yes
✅ Successfully created Migration '0001_initial_schema.sql'!

The migration is available for editing here
/Users/dane/Projects/d1-tutorial/migrations/0001_initial_schema.sql

For our demo purposes we will build out a simple blog application database. It includes posts, authors, and tags to include some relational data. We need to write the SQL in our migration to create all the tables and columns that we need:

-- Migration number: 0001 	 2024-07-19T15:17:15.625Z

-- Create Authors table
CREATE TABLE authors (
    id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    name VARCHAR(100) NOT NULL,
    bio TEXT
);

-- Create Posts table
CREATE TABLE posts (
    id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    author_id INTEGER NOT NULL,
    title VARCHAR(255) NOT NULL,
    content TEXT NOT NULL,
    published_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (author_id) REFERENCES authors(id) ON DELETE CASCADE
);

-- Create Tags table
CREATE TABLE tags (
    id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    name VARCHAR(50) NOT NULL UNIQUE
);

-- Create Posts_Tags junction table to implement many-to-many relationship between posts and tags
CREATE TABLE posts_tags (
    post_id INTEGER NOT NULL,
    tag_id INTEGER NOT NULL,
    PRIMARY KEY (post_id, tag_id),
    FOREIGN KEY (post_id) REFERENCES posts(id) ON DELETE CASCADE,
    FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE
);

The SQL above defines our tables, columns, and their relations with foreign keys. It also includes a join table for posts and tags to represent a many-to-many relationship.

If you don’t have a lot of experience writing SQL queries it might look a little bit intimidating at first. Once you take some time to learn it it’s actually pretty nice. DataLemur is a pretty great resource for learning SQL.

If you need help with a specific query, Copilot and GPT are quite good at generating SQL queries. Just make sure you take some time to try to understand them and check for any potential issues.

After completing a migration script it needs to be applied:

npx wrangler d1 migrations apply DB --local

 ⛅️ wrangler 3.65.0 (update available 3.65.1)
-------------------------------------------------------

Migrations to be applied:
┌─────────────────────────┐
│ name                    │
├─────────────────────────┤
│ 0001_initial_schema.sql │
└─────────────────────────┘
✔ About to apply 1 migration(s)
Your database may not be available to serve requests during the migration, continue? … yes
🌀 Executing on local database DB (5440f1ab-dd2e-42d7-a7d4-16d855948517) from .wrangler/state/v3/d1:
🌀 To execute on your remote database, add a --remote flag to your wrangler command.
┌─────────────────────────┬────────┐
│ name                    │ status │
├─────────────────────────┼────────┤
│ 0001_initial_schema.sql │ ✅       │
└─────────────────────────┴────────┘

I added the --local flag so that we’re working against a local database for now.

Typing our Schema

One of the downsides of our ORMless approach is we don’t get TypeScript types out of the box. In a smaller project, I think the easiest approach is just to manage your own types. It’s not hard, you can even have GPT help if you want.

If managing type definitions for your schema is not acceptable for your project or use case you can look for a code generation tool or switch to an ORM / toolset that provides types.

For this example I created some basic types to map to our schema so that we can get help from the lsp when working with our queries.

export type Author = {
	id: number;
	name: string;
	bio: string | null;
};

export type Post = {
	id: number;
	author_id: number;
	title: string;
	content: string;
	published_at: string;
};

export type Tag = {
	id: number;
	name: string;
};

export type PostTag = {
	post_id: number;
	tag_id: number;
};

Seeding development data

Outside of our migrations, we can write SQL scripts and execute them against our D1 SQLite database using wrangler. To start we can create a simple seeds/dev.sql script to load some initial development seed data into our local database. Another example might be a reset.sql that drops all of our tables so we can easily reset our database during development as we rework the schema or run other experiments.

Since our database is using auto incrementing integer ID’s, we can know up front what the ID’s for the rows we are creating are since our database is initially empty. This can be a bit tricky if you’re using randomly generated ID’s. In that case you would probably want to write a script that can collect ID’s of newly created records and use them for creating related records. Here we are just passing the integer ID directly in our SQL script. As an example, we know up front that the author Alice Smith will have the id 1, Bob Johnson 2, and so on.

Post_tags looks a little bit crazy since it’s just a join table. Each row is just a post_id and a tag_id. (1, 1), (1 2), etc.

Here’s the code for a dev seed script:

-- Seed script for demo blog database

-- Insert authors
INSERT INTO authors (name, bio) VALUES
('Alice Smith', 'Alice is a freelance writer and photographer.'),
('Bob Johnson', 'Bob is a passionate travel blogger.'),
('Charlie Davis', 'Charlie is a tech enthusiast and gadget reviewer.');

-- Insert tags
INSERT INTO tags (name) VALUES
('Travel'),
('Technology'),
('Photography'),
('Lifestyle'),
('Food');

INSERT INTO posts (author_id, title, content) VALUES
(1, 'Exploring the Alps', 'Content about exploring the Alps...'),
(1, 'A Guide to Underwater Photography', 'Content about underwater photography...'),
(1, 'The Best Street Food Around the World', 'Content about street food...');

-- Insert posts by Bob Johnson
INSERT INTO posts (author_id, title, content) VALUES
(2, 'The Ultimate Guide to Backpacking in South America', 'Content about backpacking in South America...'),
(2, 'Discovering Hidden Gems in Europe', 'Content about hidden gems in Europe...'),
(2, 'How to Travel on a Budget', 'Content about budget travel...');

-- Insert posts by Charlie Davis
INSERT INTO posts (author_id, title, content) VALUES
(3, 'The Future of Smart Homes', 'Content about smart homes...'),
(3, 'Top 10 Gadgets for Tech Enthusiasts', 'Content about gadgets for tech enthusiasts...'),
(3, 'How to Build Your Own PC', 'Content about building your own PC...');

-- Associate posts with tags randomly
INSERT INTO posts_tags (post_id, tag_id) VALUES
(1, 1), (1, 3),
(2, 3), (2, 4),
(3, 4), (3, 5),
(4, 1), (4, 2),
(5, 1), (5, 4),
(6, 1), (6, 5),
(7, 2), (7, 3),
(8, 2), (8, 4),
(9, 2), (9, 5);

Here’s the code for a reset script - it’s important to remember to drop the migrations table in your reset so you can apply your migrations.

DROP TABLE IF EXISTS posts_tags;
DROP TABLE IF EXISTS posts;
DROP TABLE IF EXISTS tags;
DROP TABLE IF EXISTS authors;
DROP TABLE IF EXISTS d1_migrations;

Using the wrangler CLI we can execute our script files against our local development and remote d1 database instances. Since we have already applied our migrations to our local database, we can use our dev.sql seed script to load some data into our db.

npx wrangler d1 execute DB --local --file="./seeds/dev.sql"

 ⛅️ wrangler 3.65.0 (update available 3.67.1)
-------------------------------------------------------

🌀 Executing on local database DB (530df8ea-f4e0-4548-856d-be76f46e421c) from .wrangler/state/v3/d1:
🌀 To execute on your remote database, add a --remote flag to your wrangler command.

The Wrangler output is pretty helpful - it lets us know to add the --remote flag to run against our remote instance.

We can also use execute to run commands against our database. Lets run a select to look at the data added to our posts table.

npx wrangler d1 execute DB --local --command="select * from posts;"

This command should output a table showing the columns of our db and the 7 rows we added from the dev seed script.

Summary

Using wrangler and the Cloudflare D1 platform we’ve already gotten pretty far without an ORM or any other additional tooling. We have a simple but effective migrations system in place and some initial scripts for easily seeding and resetting our databases.

There are also some other really great things built-in to the D1 platform like time travel and backups. I definitely recommend at least skimming through the documentation at some point.

In the next post we will start interacting with our database and sample data using the D1 Client API.

This Dot is a consultancy dedicated to guiding companies through their modernization and digital transformation journeys. Specializing in replatforming, modernizing, and launching new initiatives, we stand out by taking true ownership of your engineering projects.

We love helping teams with projects that have missed their deadlines or helping keep your strategic digital initiatives on course. Check out our case studies and our clients that trust us with their engineering.

You might also like

Increasing development velocity with Cursor cover image

Increasing development velocity with Cursor

If you’re a developer, you’ve probably heard of Cursor by now and have either tried it out or are just curious to learn more about it. Cursor is a fork of VSCode with a ton of powerful AI/LLM-powered features added on. For around $20/month, I think it’s the best value in the AI coding space. Tech giants like Shopify and smaller companies like This Dot Labs have purchased Cursor subscriptions for their developers with the goal of increased productivity. I have been using Cursor heavily for a few months now and am excited to share how it’s impacted me personally. In this post, we will cover some of the basic features, use cases, and I’ll share some tips and tricks I’ve learned along the way. If you love coding and building like me, I hope this post will help you unleash some of the superpowers Cursor’s AI coding features make possible. Let’s jump right in! Cursor 101 The core tools of the Cursor tool belt are Autocomplete, Ask, and Agent. Feature: Autocomplete The first thing that got me hooked was Autocomplete. It just worked so much better than the tools I had used previously, like GitHub Copilot. It was quicker and smarter, and I could immediately notice the amount of keystrokes that it was saving me. This feature is great because it doesn’t really require any work or skilled prompting from the user. There are a couple of tricks for getting a little bit more out of it that I will share later, but for now, just enjoy the ride! Feature: Ask If you’ve interacted with AI/LLMs before, like ChatGPT - this is what the Ask feature is. It’s just a chat feature you can easily provide context to from your code base and choose which Model to chat with. This feature is best suited for just asking more general questions that you might have queried Google or Stack Overflow for in the past. It’s also good for planning how to implement a feature you’re working on. After chatting or planning, you can switch directly to Agent mode to pick up and take action on something you were cooking up in Ask mode. Here’s an example of planning a simple tic-tac-toe game implementation using the Ask feature: Feature: Agent Agent mode lets the AI model take the wheel and write code, make edits, or take other similar actions on your code base. The goal is that you can write prompts and give instructions, and the Agent can generate the code and build features or even entire applications for you. With great power comes great responsibility. Agents are a feature where the more you put into them, the more you get out. The more skilled you become in using them by providing better prompts and including the right context, you will continue to get better results. The AI doesn’t always get it right, but the fact that the models and the users are both getting better is exciting. Throughout this post, I will share the best use cases, tips, and tricks I have found using Cursor Agent. Here’s an example using the Agent to execute the implementation details of the tic-tac-toe game we planned using Ask: Core Concept: Context After understanding the features and the basics of prompting, context is the most important thing for getting the best results out of Cursor. In Cursor and in general, whenever you’re prompting a chat or an agent, you want to make sure that it has all the relevant information that it needs to provide an answer or result. Cursor, by default, always has some context of your code. It indexes your code base and usually keeps the open buffer in the context window at the very least. At the top left of the Ask or Agent panel, there is an @ button, and next to that are badges for all the current items that have been explicitly added to the context for the current session. The @ button has a dropdown that allows you to add files, folders, web links, past chats, git commits, and more to the context. Before you prompt, always make sure you add the relevant content it needs as context so that it has everything it needs to provide the best response. Settings and Rules Cursor has its own settings page, which you can access through Cursor → Settings → Cursor Settings. This is where you log in to your account, manage various features, and enable or disable models. In the General section, there is an option for Privacy Mode. This is one setting in particular I recommend enabling. Aside from that, just explore around and see what’s available. Models The model you use is just as important as your prompt and the context that you provide. Models are the underlying AI/LLM used to process your input. The most well-known is GPT-4o, the default model for ChatGPT. There are a lot of different models available, and Cursor provides access to most of them out of the box. Model pricing A lot of the most common models, like GPT-4o or Sonnet 3.5/3.7, are included in your Cursor subscription. Some models like o1 and Sonnet 3.7 MAX are considered premium models, and you will be billed for usage for these. Be sure to pay attention to which models you are using so you don’t get any surprise bills. Choosing a Model Some models are better suited for certain tasks than others. You can configure which models are enabled in the Cursor Settings. If you are planning out a big feature or trying to solve some complex logic issue, you may want to use one of the thinking models, like o1, o3-mini, or Deep Seek R1. For most coding tasks and as a good default, I recommend using Sonnet 3.5 or 3.7. The great thing about Cursor is that you have the options available right in your editor. The most important piece of advice that I can give in this post is to keep trying things out and experimenting. Try out different models for different tasks, get a feel for it, and find what works for you. Use cases Agents and LLM models are still far from perfect. That being said, there are already a lot of tasks they are very good at. The more effective you are with these tools, the more you will be able to get done in a shorter amount of time. Generating test cases Have some code that you would like unit tested? Cursor is very good at generating test cases and assertions for your code. The fewer barriers there are to testing a piece of code, the better the result you will get. So, try your best to write code that is easily testable! If testing the code requires some mocks or other pieces to work, do your best to provide it the context and instructions it needs before writing the tests. Always review the test cases! There could be errors or test cases that don’t make sense. Most of the time, it will get you pretty close to where you want to be. Here’s an example of using the Agent mode to install packages for testing and generate unit tests for the tic-tac-toe game logic: Generating documentation This is another thing we know AI models are good at - summarizing large chunks of information. Make sure it has the context of whatever you want to document. This one, in particular, is really great because historically, keeping documentation up to date is a rare and challenging practice. Here’s an example of using the Agent mode to generate documentation for the tic-tac-toe game: Code review There are a lot of up-and-coming tools outside of Cursor that can handle this. For example, GitHub now has Copilot integrated in pull requests for code reviews. It’s never a bad idea to have whatever change set you’re looking to commit reviewed and inspected before pushing it up to the remote, though. You can provide your unstaged changes or even specific commits as context to a Cursor Ask or Agent prompt. Getting up to speed in a new code base Being able to query a codebase with the power of LLM’s is truly fantastic. It can be a great help to get up to speed in a large new codebase quickly. Some example prompts: > Please provide an overview of this project and how to get started developing with it > I need to make some changes to the way that notifications are grouped in the UI, please provide a detailed analysis and pseudo code outlining how the grouping algorithm works If you have a question about the code base, ask Cursor! Refactoring Refactoring code in a code base is a much quicker process in Cursor. You can execute refactors depending on their scope in a couple of distinct ways. For refactors that don’t span a lot of files or are less complex, you can probably get away with just using the autocomplete. For example, if you make a change to something in a file and there are several instances of the same pattern following, the autocomplete will quickly pick up on this and help you tab through the changes. If you switch to another file, this information will still be in context and can be continued most of the time. For larger refactors spanning several files, using the Agent feature will most likely be the quickest way to get it done. Add all the files you plan to make changes to the Agent tab’s context window. Provide specific instructions and/or a basic example of how to execute the refactor. Let the Agent work, if it doesn’t get it exactly right initially, you can always give it corrections in a follow-up prompt. Generating new code/features This is the big promise of AI agents and the one with the most room for mixed results. My main recommendation here is to keep experimenting. Keep learning to prompt more effectively, compare results from different models, and pay attention to the results you get from each use case. I personally get the best results building new features in small, focused chunks of work. It can also be helpful to have a dialog with the Ask feature first to plan out the feature's details that the Agent can follow up on and implement. If there are existing patterns in your codebase for accomplishing certain things, provide this information in your prompts and make sure to add the relevant code to the context. For example, if you’re adding a new form to the web page and you have other similar forms that handle validation and making back-end calls in the same way, Cursor can base the code for the new feature on this. Example prompt: Generate a form for creating a new post, follow similar patterns from the create user profile form, and look to the post schema for the fields that should be included. Remember that you can always follow up with additional prompts if you aren’t quite happy with the results of the first.. If the results are close but need to be adjusted in some way, let the agent know in the next prompt. You may find that for some things, it just doesn’t do well yet. Mentally note these things and try to get to a place where you can intuit when to reach for the Agent feature or just write some of the code the old-fashioned way. Tips and tricks The more you use Cursor, the more you will find little ways to get more out of it. Here are some of the tips and patterns that I find particularly useful in my day-to-day work. Generating UI with screenshots You can attach images to your prompts that the models can understand using computer vision. To the left of the send button, there is a little button to attach an image from your computer. This functionality is incredibly useful for generating UI code, whether you are giving it an example UI as a reference for generating new UI in your application or providing a screenshot of existing UI in your application and prompting it to change details in reference to the image. Cursor Rules Cursor Rules allow you to add additional information that the LLM models might need to provide the best possible experience in your codebase. You can create global rules as well as project-specific ones. An example use case is if your project has some updated dependency with newer APIs than the one on which the LLM has been trained. I ran into this when adding Tailwind v4 to a project; the models are always generating code based on Tailwind v3 or earlier. Here’s how we can add a rules file to handle this use case: ` If you want to see some more examples, check out the awesome-cursorrules repository. Summary Learn to use Cursor and similar tools to enhance your development process. It may not give you actual superpowers, but it may feel like it. All the features and tools we’ve covered in this post come together to provide an amazing experience for developing all types of software and applications....

This Dot AI Field Notes - Anatomy of a Coding Harness cover image

This Dot AI Field Notes - Anatomy of a Coding Harness

A coding agent is not magic, it’s a loop. We call this a harness. The harness is a deterministic layer of code that wraps an LLM. Claude Code is a harness. Codex is a harness. Pi is a harness. The harness, on initialization, provides to the LLM a system prompt defining all tools the harness implements for the LLM. Without the harness, you cannot read or modify files on the user’s local filesystem without them having to copy-and-pasting by hand. The harness is the final place where engineers can customize how coding agents do work before the LLM takes over. Think of the LLM as a train and the harness as the rails the train rides on. Below… one full task executed by a harness, traced step by step....

Let's innovate together!

We're ready to be your trusted technical partners in your digital innovation journey.

Whether it's modernization or custom software solutions, our team of experts can guide you through best practices and how to build scalable, performant software that lasts.

Prefer email? hi@thisdot.co