Seeding Initial Data in Amplify

Challenges With Amplify

Amplify is a powerful framework and toolchain that enables developers to build frontend applications with a serverless backend using all major languages and platforms. It relieves developers of the effort of writing a backend from scratch - the entire backend stack is provided by AWS.

However, one of the things that does not come with Amplify out of the box, at least at the moment, is a way to perform initial database seeding right after setting up an environment. Database seeding is the process of populating an empty database with data. This can be fake placeholder data to use in development, or necessary data such as a set of initial configuration properties required for the application to function properly.

In this blog post, we'll walk you through our attempt to write our own database seeding script, guided by a couple of principles. First of all, the script should be generic - it should not be specific to a particular project, and anyone should be able to copy it into their project with no or minimal adjustments. Secondly, the database seeding process should be automatic and ideally not require any manual steps - the user should just execute the script, and the database should be populated once it finishes running. The script is responsible for reading the Amplify environment and putting the data into the proper tables.

Prerequisites

Before proceeding further, note that some of the concepts presented here assume at least basic Amplify experience. You will also need an existing Amplify project. For this purpose, we scaffolded a simple Angular project following the Amplify starter project instructions, and the entire project is available on GitHub. The tutorial also assumes that a GraphQL API is generated as part of the project, just like in the starter project.

Looking at the official AWS guide for writing items to a DynamoDB database in batch, we know that we need several pieces of information before making a connection to the database:

  • AWS credentials
  • The profile under which AWS credentials are stored
  • Region
  • Amplify environment name
  • Database ID (Tables created by Amplify have the format of tableName-databaseId-environmentName)

A recommended way of reading AWS credentials is through a shared credentials file. The credentials file is located in the .aws folder of your user folder and contains your access key ID and secret access key, which are used to sign programmatic requests to AWS. It should have been created when you executed amplify configure, before initializing the Amplify project. The file has the following format:

[your-profile]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY

There is one additional file that contains your region preferences. It is named config, and it is also located in the .aws folder:

[profile your-profile]
region = eu-central-1

Extracting Information From Amplify Environment

Now that we have our credentials, we need to find the remaining information. Going through the amplify folder (generated by the Amplify CLI), we see that all the information we need is already there, but scattered across multiple files.

For example:

  • The environment name is located in amplify/.config/local-env-info.json
  • The profile name is located in amplify/.config/local-aws-info.json, assuming that the Amplify project was initialized using a local AWS profile name and not IAM keys
  • The database table ID and region are located in amplify/backend/amplify-meta.json
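
For reference, here is roughly what the relevant parts of these files look like. Only the fields we rely on are shown, and the values are illustrative - the exact contents will differ per project and Amplify CLI version.

amplify/.config/local-env-info.json:

{
  "envName": "dev"
}

amplify/.config/local-aws-info.json:

{
  "dev": {
    "profileName": "your-profile"
  }
}

amplify/backend/amplify-meta.json (excerpt):

{
  "providers": {
    "awscloudformation": {
      "Region": "eu-central-1"
    }
  },
  "api": {
    "yourapi": {
      "providerPlugin": "awscloudformation",
      "output": {
        "GraphQLAPIIdOutput": "abcd1234efgh5678"
      }
    }
  }
}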

The risk in this approach is that this logic could change in the future if AWS decides to change the format of any of the above files, which aren't really meant for this kind of use. However, the benefit is ease and simplicity of use, and we feel that the benefits outweigh the risks. Should the files change in the future, we'll simply update the seeding script.

Writing the Script

Reading the environment and the profile name from local-env-info.json and local-aws-info.json is easy. Reading the database table ID and region is a bit trickier since we need to know the GraphQL API name first. Each GraphQL API generates its own set of DynamoDB tables, so knowing the GraphQL API name will easily get us the database table ID as well. Fortunately, amplify-meta.json has the list of APIs under the api field, and we'll iterate over this object to get the API names.

Within each API object, there are two fields that we are interested in. The first one is the name of the provider, found in the providerPlugin field. We'll use this to fetch the region from the providers root field in amplify-meta.json. The other one is output.GraphQLAPIIdOutput, which is the GraphQL API ID - the same ID is used as the database ID for the tables.

With everything said so far in mind, the first version of the script looks like this:

const AWS = require('aws-sdk');
const localEnvInfo = require('../../amplify/.config/local-env-info.json');
const localAwsInfo = require('../../amplify/.config/local-aws-info.json');
const amplifyMeta = require('../../amplify/backend/amplify-meta.json');

const environmentName = localEnvInfo.envName;
const profileName = localAwsInfo[environmentName]?.profileName;

if (!profileName) {
  throw Error('Please reinitialize your Amplify project using your AWS profile');
}

for (const apiName of Object.keys(amplifyMeta.api)) {
  const providerName = amplifyMeta.api[apiName].providerPlugin;
  const databaseId = amplifyMeta.api[apiName].output.GraphQLAPIIdOutput;
  const region = amplifyMeta.providers[providerName].Region;

  AWS.config.credentials = new AWS.SharedIniFileCredentials({ profile: profileName });
  AWS.config.update({ region: region });
  const documentClient = new AWS.DynamoDB.DocumentClient();

  // ToDo: read seed data and write to DynamoDB
}

The script reads the local Amplify environment from local-env-info.json, local-aws-info.json, and amplify-meta.json, performs some safety checks, and then iterates over each API found in amplify-meta.json. Finally, it creates an instance of the DocumentClient class, which is used to write to DynamoDB.

Organizing Seed Data

The missing step is reading and writing the seed data. Again, we don't want any manual configuration here - the script should just run, read the seed data, and use the Amplify information we extracted earlier to insert the data into the DynamoDB database. For this to work, we'll organize our directories and files in a way that maps easily to objects on AWS. The directory structure we're aiming for is this:

tools/
β”œβ”€ seeder/
β”‚  β”œβ”€ index.js (this is our seeding script)
β”‚  β”œβ”€ fixtures/
β”‚  β”‚  β”œβ”€ [API 1 name]/
β”‚  β”‚  β”‚  β”œβ”€ [Table 1 name].json
β”‚  β”‚  β”‚  β”œβ”€ [Table 2 name].json
β”‚  β”‚  β”‚  β”œβ”€ ...
β”‚  β”‚  β”œβ”€ [API 2 name]/
β”‚  β”‚  β”œβ”€ .../

For each API found in amplify-meta.json, the script will find the respective directory under fixtures and iterate over the JSON files found in that directory. Each file's name corresponds to the name of the table we're populating, and the content of the file is a JSON array of table items. For example, the JSON file used to populate the Restaurants table from our Angular starter project would be named Restaurants.json and would have the following contents:

[
  {
    "city": "Menton",
    "description": "",
    "name": "Mirazur"
  },
  {
    "city": "Copenhagen",
    "description": "",
    "name": "Noma"
  },
  {
    "city": "Axpe",
    "description": "",
    "name": "Asador Etxebarri"
  }
]

DynamoDB's DocumentClient

The next step is implementing the logic that reads the JSON files and writes their contents to DynamoDB. DocumentClient's batchWrite method allows providing multiple items for multiple tables, all in the same request. This means the method can be called only once, with all items pushed in a single batch, which keeps the number of requests to a minimum. (Note that DynamoDB limits a single batch write to 25 put or delete requests in total - plenty for a small seed set; we'll come back to this at the end.)

The structure of this write request looks as follows:

const writeParams = {
  RequestItems: {
    table1Name: [
      {
        PutRequest: {
          Item: {
            id: 1,
            field1: 'field 1 value',
            field2: 'field 2 value',
            //...
          },
        }
      },
      {
        PutRequest: {
          Item: {
            id: 2,
            field1: 'field 1 value',
            field2: 'field 2 value',
            //...
          },
        }
      },
      //...
    ]
  }
};

The table name here is the complete table name, which has the form baseTableName-databaseId-environmentName when using Amplify. baseTableName normally comes from the GraphQL schema type, databaseId is equal to the GraphQL API ID, while environmentName is the Amplify environment name (such as dev).

In the PutRequest, we should specify all fields that would normally be populated when using the GraphQL API. This includes some fields that Amplify uses internally, like:

  • id (primary key, which must be unique and can be generated using uuidv4() for example)
  • __typename (equal to type in the GraphQL schema, which is normally equal to baseTableName)
  • _lastChangedAt (can be the current date, written as Unix timestamp)
  • _version (can be 1)
  • createdAt (can be the current date, written as ISO-8601 string)
  • updatedAt (can be the current date, written as ISO-8601 string)

All other fields can be provided from the seed data.
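
Put together, a single item written to the Restaurants table from our example would end up looking roughly like this (the id and the timestamps are purely illustrative):

{
  "id": "4b4f2b3a-6c0e-4a3f-9c1d-2e5a7b8c9d0e",
  "__typename": "Restaurants",
  "_lastChangedAt": 1634567890123,
  "_version": 1,
  "createdAt": "2021-10-18T14:38:10.123Z",
  "updatedAt": "2021-10-18T14:38:10.123Z",
  "city": "Menton",
  "description": "",
  "name": "Mirazur"
}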

Written in code, this piece of logic looks like this:

// Needed at the top of the seeding script, next to the other requires:
// const { readdirSync } = require('fs');
// const { join, parse } = require('path');
// const { v4: uuidv4 } = require('uuid');

const writeParams = {
  RequestItems: {}
};
const baseTableNames = readdirSync(join(__dirname, 'fixtures', apiName)).map((filename) => parse(filename).name);
for (const baseTableName of baseTableNames) {
  const fullTableName = `${baseTableName}-${databaseId}-${environmentName}`;
  const tableItems = require(`./fixtures/${apiName}/${baseTableName}.json`);
  writeParams.RequestItems[fullTableName] = tableItems.map((tableItem) => ({
    PutRequest: {
      Item: {
        id: uuidv4(),
        __typename: baseTableName,
        _lastChangedAt: new Date().getTime(),
        _version: 1,
        createdAt: new Date().toISOString(),
        updatedAt: new Date().toISOString(),
        ...tableItem
      }
    }
  }));
}

documentClient.batchWrite(writeParams, (error, data) => {
  if (error) {
    console.error('Error in batch write', error);
  } else {
    console.log('Successfully executed batch write', data);
  }
});
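
As mentioned earlier, one caveat of batchWrite is that a single call accepts at most 25 put or delete requests in total, and writes that get throttled are returned in the UnprocessedItems part of the response rather than failing the call. The script above is fine for a handful of seed items, but if your seed data grows, the requests would need to be split into chunks. Here is a minimal sketch of how that could look, reusing the writeParams and documentClient from above:

// A sketch: split the accumulated put requests into batches of at most 25,
// since a single batch write accepts no more than 25 requests in total.
const BATCH_SIZE = 25;

// Flatten writeParams into individual (table name, put request) pairs.
const allRequests = [];
for (const [tableName, putRequests] of Object.entries(writeParams.RequestItems)) {
  for (const putRequest of putRequests) {
    allRequests.push({ tableName, putRequest });
  }
}

// Issue one batchWrite call per chunk of 25 requests.
for (let i = 0; i < allRequests.length; i += BATCH_SIZE) {
  const chunkParams = { RequestItems: {} };
  for (const { tableName, putRequest } of allRequests.slice(i, i + BATCH_SIZE)) {
    chunkParams.RequestItems[tableName] = chunkParams.RequestItems[tableName] || [];
    chunkParams.RequestItems[tableName].push(putRequest);
  }

  documentClient.batchWrite(chunkParams, (error, data) => {
    if (error) {
      console.error('Error in batch write', error);
    } else if (data.UnprocessedItems && Object.keys(data.UnprocessedItems).length > 0) {
      // Unprocessed items should be retried, ideally with exponential backoff.
      console.warn('Some items were not written', data.UnprocessedItems);
    } else {
      console.log('Successfully executed batch write', data);
    }
  });
}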

Conclusion

And that's it - the script is complete. In just over 50 lines of code, we now have an efficient way of populating multiple DynamoDB tables with initial data. The initial data itself is stored in JSON files, which are quite readable and easy for anyone to customize to their project's needs.

We hope you enjoyed this tutorial. Should you have any questions, do not hesitate to drop us a line.