How to build an AI assistant with OpenAI, Vercel AI SDK, and Ollama with Next.js

This article was written over 18 months ago and may contain information that is out of date. Some content may be relevant but please refer to the relevant official documentation or available resources for the latest information.

In today’s blog post, we’ll build an AI Assistant using three different AI models: Whisper and TTS from OpenAI and Llama 3.1 from Meta.

While exploring AI, I wanted to try different things and create an AI assistant that works by voice. This curiosity led me to combine OpenAI’s Whisper and TTS models with Meta’s Llama 3.1 to build a voice-activated assistant.

Here’s how these models will work together:

  • First, we’ll send our audio to the Whisper model, which will convert it from speech to text.
  • Next, we’ll pass that text to the Llama 3.1 model. Llama will understand the text and generate a response.
  • Finally, we’ll take Llama’s response and send it to the TTS model, turning the text back into speech. We’ll then stream that audio back to the client.
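
The three steps above can be sketched as one function that chains three stages. This is only an illustration of the data flow, not the code we will end up with: the `Stages` type and the `runAssistantPipeline` name are hypothetical, and each stage stands in for a real model call.

```typescript
// Hypothetical shape of the three stages; each one stands in for a real model call.
type Stages = {
  speechToText: (audio: Uint8Array) => Promise<string>; // e.g. Whisper
  generateReply: (prompt: string) => Promise<string>; // e.g. Llama 3.1
  textToSpeech: (text: string) => Promise<Uint8Array>; // e.g. TTS
};

// Chains the stages in order: audio in, audio out.
async function runAssistantPipeline(
  audio: Uint8Array,
  stages: Stages
): Promise<Uint8Array> {
  const transcript = await stages.speechToText(audio);
  const reply = await stages.generateReply(transcript);
  return stages.textToSpeech(reply);
}
```

Keeping the stages behind plain function types like this would also make it easy to swap a model later, for example replacing Llama 3.1 with Phi.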

Let’s dive in and start building this excellent AI Assistant!

Getting started

We will use different tools to build our assistant. To build our client side, we will use Next.js. However, you could choose whichever framework you prefer.

To use our OpenAI models, we will use their TypeScript / JavaScript SDK. To use this API, we need the following environment variable: OPENAI_API_KEY.

To get this key, we need to log in to the OpenAI dashboard and find the API keys section. Here, we can generate a new key.
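
Since a missing key only shows up as a failed request later on, it can help to check for it early. The helper below is a small sketch; `requireEnv` is a hypothetical name and not part of the OpenAI SDK or Next.js.

```typescript
// Hypothetical helper: returns the variable's value or throws a descriptive error.
function requireEnv(
  env: Record<string, string | undefined>,
  name: string
): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

In a Next.js project the key would typically live in a `.env.local` file, and on the server we could call `requireEnv(process.env, 'OPENAI_API_KEY')`.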

Open AI dashboard inside the API keys section

Awesome. Now, to use our Llama 3.1 model, we will use Ollama and the Vercel AI SDK, utilizing a provider called ollama-ai-provider.

Ollama will allow us to download our preferred model (we could even use a different one, like Phi) and run it locally. The Vercel AI SDK will facilitate its use in our Next.js project.

To use Ollama, we just need to download it and choose our preferred model. For this blog post, we are going to select Llama 3.1. After installing Ollama, we can verify if it is working by opening our terminal and writing the following command:

Terminal, with the command ‘ollama run llama3.1’

Notice that I wrote “llama3.1” because that’s my chosen model, but you should use the one you downloaded.

Kicking things off

It's time to kick things off by setting up our Next.js app. Let's start with this command:

npx create-next-app@latest

After running the command, you’ll see a few prompts to set the app's details. Let's go step by step:

  • Name your app.
  • Enable app router.

The other steps are optional and entirely up to you. In my case, I also chose to use TypeScript and Tailwind CSS.

Now that’s done, let’s go into our project and install the dependencies that we need to run our models:

npm i ai ollama-ai-provider openai

Building our client logic

Now, our goal is to record our voice, send it to the backend, and then receive a voice response from it.

To record our audio, we need to use client-side functions, which means we need to use client components. In our case, we don’t want to transform our whole page to use client capabilities and have the whole tree in the client bundle; instead, we would prefer to use Server components and import our client components to progressively enhance our application.

So, let’s create a separate component that will handle the client-side logic.

Inside our app folder, let's create a components folder, and here, we will be creating our component:

app
 ↳components
  ↳audio-recorder.tsx

Let’s go ahead and initialize our component. I went ahead and added a button with some styles in it:

// app/components/audio-recorder.tsx
'use client'
export default function AudioRecorder() {
    function handleClick(){
      console.log('click')
    }

    return (
        <section>
            <button onClick={handleClick}
                    className={`bg-blue-500 text-white px-4 py-2 rounded shadow-md hover:bg-blue-400 focus:outline-none focus:ring-2 focus:ring-blue-500 focus:ring-offset-2 focus:ring-offset-white transition duration-300 ease-in-out absolute top-1/2 left-1/2 -translate-x-1/2 -translate-y-1/2`}>
                Record voice
            </button>
        </section>
    )
}

And then import it into our Page Server component:

// app/page.tsx
import AudioRecorder from '@/app/components/audio-recorder';

export default function Home() {
  return (
      <AudioRecorder />
  );
}

Now, if we run our app, we should see the following:

First look of the app, showing a centered blue button

Awesome! Right now, our button doesn’t do anything. Our goal is to record our audio and send it to the backend, so let’s create a hook that will contain that logic:

app
 ↳hooks
  ↳useRecordVoice.ts

import { useEffect, useRef, useState } from 'react';

export function useRecordVoice() {
  return {}
}

We will use two browser APIs to record our voice: navigator.mediaDevices and MediaRecorder. navigator.mediaDevices gives us access to the user’s media input devices, such as the microphone, and MediaRecorder records the audio stream we get from it. This is how they’re going to play out together:

// app/hooks/useRecordVoice.ts
import { useEffect, useRef, useState } from 'react';

export function useRecordVoice() {
    const [isRecording, setIsRecording] = useState(false);
    const [mediaRecorder, setMediaRecorder] = useState<MediaRecorder | null>(null);

    const startRecording = async () => {
        if(!navigator?.mediaDevices){
            console.error('Media devices not supported');
            return;
        }

        const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
        const mediaRecorder = new MediaRecorder(stream);
        setIsRecording(true)
        setMediaRecorder(mediaRecorder);
        mediaRecorder.start(0)
    }

    const stopRecording = () =>{
        if(mediaRecorder) {
            setIsRecording(false)
            mediaRecorder.stop();
        }
    }

  return {
    isRecording,
    startRecording,
    stopRecording,
  }
}

Let’s explain this code step by step. First, we create two new states. The first one is for keeping track of when we are recording, and the second one stores the instance of our MediaRecorder.

const [isRecording, setIsRecording] = useState(false);
const [mediaRecorder, setMediaRecorder] = useState<MediaRecorder | null>(null);

Then, we’ll create our first method, startRecording. Here, we are going to have the logic to start recording our audio. We first check whether the browser exposes media devices through navigator.mediaDevices. If it doesn’t, we just return. If it does, we create a stream using the user’s audio input device:

// check if they have media devices
if(!navigator?.mediaDevices){
 console.error('Media devices not supported');
 return;
}
// create stream using the audio media device
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

Finally, we go ahead and create an instance of a MediaRecorder to record this audio:

// create an instance passing in the stream as parameter
const mediaRecorder = new MediaRecorder(stream);
// Flag that a recording is in progress
setIsRecording(true)
// Store the instance in the state
setMediaRecorder(mediaRecorder);
// Start recording immediately
mediaRecorder.start(0)

Then we need a method to stop our recording, which will be our stopRecording. Here, we will just stop our recording in case a media recorder exists.

if (mediaRecorder) {
  setIsRecording(false)
  mediaRecorder.stop();
}

We are recording our audio, but we are not storing it anywhere. Let’s add a useEffect and a ref to accomplish this. The ref is where our chunks of audio data will be stored.

const audioChunks = useRef<Blob[]>([]);

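In our useEffect we are going to do two main things: store those chunks in our ref and, when the recording stops, create a new Blob labeled as audio/mp3. Note that the type option only labels the Blob; it does not convert the data, which stays in whatever format the browser’s MediaRecorder produced (commonly WebM or Ogg, depending on the browser):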

export function useRecordVoice() {   
   const audioChunks = useRef<Blob[]>([]);

   ...
   useEffect(() => {
        if (mediaRecorder) {
            // listen to when data is available and store it as chunks in our ref
            mediaRecorder.ondataavailable = (e) => {
                audioChunks.current.push(e.data);
            }

            mediaRecorder.onstop = () => {
                // Listen to when we stop recording audio 
                // Then, convert our data to a Blob of type audio/mp3 and reset the ref
                const audioBlob = new Blob(audioChunks.current, { type: 'audio/mp3' });
                audioChunks.current = [];
            }
        }

    }, [mediaRecorder]);
    ...
}
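
Because browsers record in different formats, one option is to ask the browser which format it supports instead of hard-coding a label. The sketch below assumes a candidate list of our own choosing; `pickRecordingMimeType` and `CANDIDATE_MIME_TYPES` are hypothetical names, and the support check is passed in so the function stays easy to test.

```typescript
// Candidate formats in order of preference. This list is an assumption for illustration.
const CANDIDATE_MIME_TYPES = [
  'audio/webm;codecs=opus',
  'audio/webm',
  'audio/ogg;codecs=opus',
  'audio/mp4',
];

// Returns the first supported candidate, or undefined so the browser can use its default.
function pickRecordingMimeType(
  isSupported: (mimeType: string) => boolean,
  candidates: string[] = CANDIDATE_MIME_TYPES
): string | undefined {
  return candidates.find((mimeType) => isSupported(mimeType));
}
```

In the hook, this could be called as `pickRecordingMimeType((t) => MediaRecorder.isTypeSupported(t))`, with the result passed as `new MediaRecorder(stream, { mimeType })`. The recorder's own `mediaRecorder.mimeType` could then be used as the Blob type.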

It is time to wire this hook with our AudioRecorder component:

'use client'
import { useRecordVoice } from '@/app/hooks/useRecordVoice';

export default function AudioRecorder() {
    const { isRecording, stopRecording, startRecording } = useRecordVoice();

    async function handleClick() {
        if (isRecording) {
           stopRecording();
         } else {
         await startRecording();
       }
     }

    return (
        <div>

            <button onClick={handleClick}
                    className={`bg-blue-500 text-white px-4 py-2 rounded shadow-md hover:bg-blue-400 focus:outline-none focus:ring-2 focus:ring-blue-500 focus:ring-offset-2 focus:ring-offset-white transition duration-300 ease-in-out absolute top-1/2 left-1/2 -translate-x-1/2 -translate-y-1/2`}>
                {isRecording ? "Stop Recording" : "Start Recording"}
            </button>

        </div>
    )
}

Let’s go to the other side of the coin, the backend!

Setting up our Server side

We want to use our models on the server to keep things safe (our OpenAI API key never reaches the browser) and run faster. Let’s create a new route and add a handler for it using Route Handlers from Next.js. In our app folder, let’s make an “api” folder with the following route in it:

app
 ↳api
  ↳chat
    ↳route.ts

Our route is called ‘chat’. In the route.ts file, we’ll set up our handler. Let’s start by setting up our OpenAI SDK.

// app/api/chat/route.ts
import { getOpenai } from '@/app/utils/get-openai';

const openai = getOpenai();

export async function POST(req: Request) {
  // our logic will go here
}

// inside a utils folder: app/utils/get-openai.ts
import OpenAI from 'openai';

const openai = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY,
});

export function getOpenai() {
    return openai;
}

In this route, we’ll send the audio from the front end as a base64 string. Then, we’ll receive it and turn it into a Buffer object.

export async function POST(req: Request) {
    const { audio } = await req.json();
    const audioBuffer = Buffer.from(audio, 'base64');
 }
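
Before decoding, it may be worth checking that the request body really contains a base64 string. The following is a minimal sketch of such a check; `parseAudioPayload` is a hypothetical helper, and the regular expression only verifies the standard base64 alphabet with optional padding.

```typescript
// Hypothetical validator for the JSON body sent by the client.
// Returns the base64 string, or null when the body is not usable.
function parseAudioPayload(body: unknown): string | null {
  if (typeof body !== 'object' || body === null) return null;
  const audio = (body as { audio?: unknown }).audio;
  if (typeof audio !== 'string' || audio.length === 0) return null;
  // Standard base64 alphabet with up to two padding characters at the end.
  const base64Pattern = /^[A-Za-z0-9+/]+={0,2}$/;
  return base64Pattern.test(audio) ? audio : null;
}
```

In the handler, a `null` result could be answered with a 400 response before any model is called.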

It’s time to use our first model. We want to turn this audio into text using OpenAI’s Whisper speech-to-text model. Whisper needs an audio file to create the text. Since we have a Buffer instead of a file, we’ll use the SDK’s ‘toFile’ method to convert our audio Buffer into an audio file like this:

import { toFile } from 'openai';
import { NextResponse } from 'next/server';

export async function POST(req: Request) {
    const { audio } = await req.json();
    const audioBuffer = Buffer.from(audio, 'base64');

    try {
        // FileLike object
        const audioFile = await toFile(audioBuffer, 'audio.mp3');

    } catch (err) {
        console.error(err);
        return NextResponse.json(
            {
                err: err,
                error: 'Error converting audio',
            },
            {
                status: 500,
            }
        );
    }
}

Notice that we specified “mp3”. This is one of the many extensions that the Whisper model can use. You can see the full list of supported extensions here: https://platform.openai.com/docs/api-reference/audio/createTranscription#audio-createtranscription-file
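
If the client also sent the recording's MIME type along with the audio (an assumption; our endpoint only receives `audio`), the file name could be derived from it instead of being fixed. A small sketch, with `fileNameForMimeType` and the mapping table as hypothetical names:

```typescript
// Maps a base MIME type to a file extension that Whisper accepts.
const EXTENSION_BY_MIME_TYPE: Record<string, string> = {
  'audio/webm': 'webm',
  'audio/ogg': 'ogg',
  'audio/mp4': 'mp4',
  'audio/mpeg': 'mp3',
  'audio/wav': 'wav',
};

// Parameters such as ";codecs=opus" are ignored; unknown types fall back to a default name.
function fileNameForMimeType(mimeType: string, fallback = 'audio.mp3'): string {
  const baseType = (mimeType.split(';')[0] ?? '').trim().toLowerCase();
  const extension = EXTENSION_BY_MIME_TYPE[baseType];
  return extension ? `audio.${extension}` : fallback;
}
```

The result could then be passed as the second argument to `toFile`.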

Now that our file is ready, let’s pass it to Whisper! Using our OpenAI instance, this is how we will invoke our model:

import { toFile } from 'openai';
const openai = getOpenai();

export async function POST(req: Request) {
        ...
        const audioFile = await toFile(audioBuffer, 'audio.mp3');

        const transcription = await openai.audio.transcriptions.create({
            // here we specify the model
            model: 'whisper-1',
            // our audio file
            file: audioFile,
        });
        ...
}

That’s it! Now, we can move on to the next step: using Llama 3.1 to interpret this text and give us an answer. We’ll use two methods for this. First, we’ll use ‘ollama’ from the ‘ollama-ai-provider’ package, which lets us use this model with our locally running Ollama. Then, we’ll use ‘generateText’ from the Vercel AI SDK to generate the text.

Side note: To make our Ollama run locally, we need to write the following command in the terminal:

ollama serve

import { toFile } from 'openai';
// new imports
import { ollama } from 'ollama-ai-provider';
import { generateText } from 'ai';

const openai = getOpenai();

export async function POST(req: Request) {
        ...
        const audioFile = await toFile(audioBuffer, 'audio.mp3');

        const transcription = await openai.audio.transcriptions.create({
            model: 'whisper-1',
            file: audioFile,
        });

        const { text: response } = await generateText({
            // we specify our model running locally in the background
            model: ollama('llama3.1'),
            // we can set initial instructions to our model
            system: 'You know a lot about video games',
            // the text we want the model to interpret
            prompt: transcription.text,
        });
        ...
}

Finally, we have our last model: TTS from OpenAI. We want to reply to our user with audio, so this model will be really helpful. It will turn our text into speech:

import { toFile } from 'openai';
// new imports
import { ollama } from 'ollama-ai-provider';
import { generateText } from 'ai';

const openai = getOpenai();

export async function POST(req: Request) {
        ...
        const audioFile = await toFile(audioBuffer, 'audio.mp3');

        const transcription = await openai.audio.transcriptions.create({
            model: 'whisper-1',
            file: audioFile,
        });

        const { text: response } = await generateText({
            model: ollama('llama3.1'),
            system: 'You know a lot about video games',
            prompt: transcription.text,
        });

        const voiceResponse = await openai.audio.speech.create({
            // Specify here our tts model
            model: 'tts-1',
            // we pass in our response
            input: response,
            // We can choose a variety of different voices
            // I chose 'onyx' but you can pick from this list: <https://platform.openai.com/docs/guides/text-to-speech/quickstart>
            voice: 'onyx',
        });
        ...
}

The TTS model will turn our response into an audio file. We want to stream this audio back to the user like this:

import { toFile } from 'openai';
import { getOpenai } from '@/app/utils/get-openai';
import { ollama } from 'ollama-ai-provider';
import { NextResponse } from 'next/server';
import { generateText } from 'ai';

const openai = getOpenai();

export async function POST(req: Request) {
    const { audio } = await req.json();
    const audioBuffer = Buffer.from(audio, 'base64');

    try {
        const audioFile = await toFile(audioBuffer, 'audio.mp3');

        const transcription = await openai.audio.transcriptions.create({
            model: 'whisper-1',
            file: audioFile,
        });

        const { text: response } = await generateText({
            model: ollama('llama3.1'),
            system: 'You know a lot about video games',
            prompt: transcription.text,
        });

        const voiceResponse = await openai.audio.speech.create({
            model: 'tts-1',
            input: response,
            voice: 'onyx',
        });

        // stream back our audio
        return new Response(voiceResponse.body, {
            headers: {
                 // we specify the content type         
                'Content-Type': 'audio/mpeg',
                // we indicate that this is going to be streamed in chunks of data
                'Transfer-Encoding': 'chunked',
            },
        });
    } catch (err) {
        console.error(err);
        return NextResponse.json(
            {
                err: err,
                error: 'Error converting audio',
            },
            {
                status: 500,
            }
        );
    }
}

And that’s the whole backend code! Now, back to the frontend to finish wiring everything up.

Putting It All Together

In our useRecordVoice.ts hook, let's create a new method that will call our API endpoint. This method will also take the response and play the audio we stream from the backend back to the user.

// app/hooks/useRecordVoice.ts
import { transformBlobToBase64 } from '@/app/utils/transform-blob-to-base64';
...
export function useRecordVoice() {
    // new state to track when our server is loading the response for us
    const [loading, setLoading] = useState(false);

    async function getResponse(audioBlob: Blob) {
        // We transform our audio to base64 to send it to the endpoint
        const audioBase64 = await transformBlobToBase64(audioBlob);

        try {
            setLoading(true);
            // Calling our "chat" endpoint
            const res = await fetch('/api/chat', {
                method: 'POST',
                // Sending our base64 audio here
                body: JSON.stringify({ audio: audioBase64 }),
                headers: {
                    'Content-Type': 'application/json',
                },
            });

            if (!res.ok) {
                throw new Error('Error getting response');
            }
        } catch (err) {
            console.error(err);
        } finally {
            setLoading(false);
        }
    }

    useEffect(() => {
        if (mediaRecorder) {
            mediaRecorder.ondataavailable = (e) => {
                audioChunks.current.push(e.data);
            };

            mediaRecorder.onstop = () => {
                const audioBlob = new Blob(audioChunks.current, {
                    type: 'audio/mp3',
                });
                // we call our method here
                void getResponse(audioBlob);
                audioChunks.current = [];
            };
        }
    }, [mediaRecorder]);

...

// app/utils/transform-blob-to-base64.ts
export function transformBlobToBase64(blob: Blob): Promise<string> {
    return new Promise((resolve, reject) => {
        const reader = new FileReader();
        reader.onloadend = () => {
            resolve(reader?.result?.toString().split(',')[1] || '');
        }
        reader.onerror = reject;
        reader.readAsDataURL(blob);
    })
}
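
The `split(',')[1]` step above works because `readAsDataURL` produces a data URL of the form `data:<mime type>;base64,<payload>`. That step can be isolated into a small function, sketched here under the hypothetical name `base64FromDataUrl`:

```typescript
// Extracts the base64 payload from a data URL such as "data:audio/webm;base64,AAAA".
// Returns an empty string when the input has no comma-separated payload.
function base64FromDataUrl(dataUrl: string): string {
  const commaIndex = dataUrl.indexOf(',');
  return commaIndex === -1 ? '' : dataUrl.slice(commaIndex + 1);
}
```

Pulling the string handling out like this keeps the FileReader wrapper thin and lets the parsing be tested without a browser.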

Great! Now that we’re getting our streamed response, we need to handle it and play the audio back to the user. We’ll use the AudioContext API for this. This API allows us to store the audio, decode it and play it to the user once it’s ready:

...
async function getResponse(audioBlob: Blob) {
    const audioBase64 = await transformBlobToBase64(audioBlob);

    try {
        setLoading(true);
        const res = await fetch('/api/chat', {
            method: 'POST',
            body: JSON.stringify({ audio: audioBase64 }),
            headers: {
                'Content-Type': 'application/json',
            },
        });

        if (!res.ok) {
            throw new Error('Error getting response');
        }

        // Create an instance of AudioContext
        const audioContext = new AudioContext();

        // Create a reader to read the streaming response
        const reader = res.body?.getReader();
        if (!reader) {
            throw new Error('Error getting response');
        }

        // Create a buffer source to store the audio
        const source = audioContext.createBufferSource();

        // Array to hold the audio chunks received from the backend
        const audioChunks: Uint8Array[] = [];

        // Flag to check if the audio streaming has finished
        let isDataStreamed = false;

        while (!isDataStreamed) {
            // Start reading the data
            const { value, done } = await reader.read();

            // If true, the stream has finished
            if (done) {
                isDataStreamed = true;
                break;
            }

            // Add each data chunk to our list of audio chunks
            if (value) {
                audioChunks.push(value);
            }
        }

        // Merge all buffer chunks into a single Uint8Array
        const audioBuffer = new Uint8Array(
            audioChunks.reduce(
                (acc, val) => acc.concat(Array.from(val)),
                [] as number[]
            )
        );

        // Decode the audio data and store it in our source buffer
        source.buffer = await audioContext.decodeAudioData(
            audioBuffer.buffer
        );

        // Connect the source to the audio output (speakers or headphones)
        source.connect(audioContext.destination);

        // Start playing the audio
        source.start(0);
    } catch (err) {
        console.error(err);
    } finally {
        setLoading(false);
    }
}

...

return {
    startRecording,
    stopRecording,
    isRecording,
    // Return the loading state
    loading,
};
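
The merge step in getResponse builds a plain number array with reduce and concat, which copies the accumulated data on every chunk. For longer responses, a single pre-sized Uint8Array may be a better fit. A sketch of that alternative, with `concatChunks` as a hypothetical helper name:

```typescript
// Concatenates chunks into one Uint8Array using a single allocation.
function concatChunks(chunks: Uint8Array[]): Uint8Array {
  const totalLength = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const merged = new Uint8Array(totalLength);
  let offset = 0;
  for (const chunk of chunks) {
    // Copy each chunk at its position in the merged array.
    merged.set(chunk, offset);
    offset += chunk.length;
  }
  return merged;
}
```

The result could replace the reduce-based merge, with `concatChunks(audioChunks).buffer` passed to `decodeAudioData`.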

And that's it! Now the user should hear the audio response on their device. To wrap things up, let's make our app a bit nicer by adding a little loading indicator:

// app/components/audio-recorder.tsx

'use client';
import { useRecordVoice } from '@/app/hooks/useRecordVoice';

export default function AudioRecorder() {
    const { isRecording, stopRecording, startRecording, loading } =
        useRecordVoice();
    async function handleClick() {
        if (isRecording) {
            stopRecording();
        } else {
            await startRecording();
        }
    }

    // New condition
    if (loading) {
        return <div>Loading...</div>;
    }

    return (
        <div>
            <button
                onClick={handleClick}
                className={`bg-blue-500 text-white px-4 py-2 rounded shadow-md hover:bg-blue-400 focus:outline-none focus:ring-2 focus:ring-blue-500 focus:ring-offset-2 focus:ring-offset-white transition duration-300 ease-in-out absolute top-1/2 left-1/2 -translate-x-1/2 -translate-y-1/2`}
            >
                {isRecording ? 'Stop Recording' : 'Start Recording'}
            </button>
        </div>
    );
}

Conclusion

In this blog post, we saw how combining multiple AI models can help us achieve our goals. We learned to run AI models like Llama 3.1 locally and use them in our Next.js app. We also discovered how to send audio to these models and stream back a response, playing the audio back to the user.

This is just one of many ways you can use AI—the possibilities are endless. AI models are amazing tools that let us create things that were once hard to achieve with such quality. Thanks for reading; now, it’s your turn to build something amazing with AI!

You can find the complete demo on GitHub: AI Assistant with Whisper TTS and Ollama using Next.js

ISR allows us to cache the page and serve it statically for subsequent user requests: ` Using precompute is particularly beneficial when enabling ISR for pages that depend on flags whose values cannot be determined at build time. Headers, geo, etc., we can’t know their value at build, so we use precompute() so the Edge can evaluate it on the fly. In these cases, we rely on Middleware to dynamically determine the flag values, generate the HTML content once, and then cache it. At build time, we simply create an initial HTML shell. Generate Permutations If we prefer to generate static pages at build-time instead of runtime, we can use the generatePermutations function from the Flags SDK. This method enables us to pre-generate static pages with different combinations of flags at build time. It's especially useful when the flag values are known beforehand. For example, scenarios involving A/B testing and a marketing site with a single on/off banner flag are ideal use cases. ` ` Conclusion Vercel’s Flags SDK stands out as a powerful yet straightforward solution for managing feature flags efficiently. With its ease of use, remarkable flexibility, and effective patterns for reducing latency, this SDK streamlines the development process and enhances your app’s performance. Whether you're building a Next.js, React, or SvelteKit application, the Flags SDK provides intuitive tools that keep your application consistent, responsive, and maintainable. Give it a try, and see firsthand how it can simplify your feature management workflow!...
