dbt seed Command: Usage & Examples

Introduction

dbt is a command-line tool that enables data analysts and engineers to transform data in their warehouses more effectively. One of the powerful features of dbt is the seed command, which allows you to load CSV files (referred to as “seeds”) into your data warehouse. This tutorial will guide you through the process of using the dbt seed command.

Understanding dbt Seeds

In dbt, seeds are CSV files that you can load into your data warehouse. They are particularly useful for handling static data that changes infrequently. For instance, you might have a CSV file containing a list of country codes and their corresponding country names. This data can be loaded into your warehouse as a seed and referenced in your dbt models.

dbt seed Use Cases

Mapping Codes to Descriptive Names

As we’ve seen in the previous example, you can use a seed to map product codes to product names. This can be extended to any scenario where you have codes that need to be mapped to more descriptive names. For example, you might have a seed that maps error codes to error descriptions, or abbreviations to their full forms.

Excluding Certain Data from Analysis

Suppose you have a list of test email addresses or user IDs that you want to exclude from your analysis. You can create a seed with these email addresses or user IDs, and then use the dbt seed command to load this data into your warehouse. In your dbt models, you can then exclude these test email addresses or user IDs from your analysis.

Loading Small Reference Datasets

If you have small reference datasets that don’t change often, it can be more efficient to load them into your warehouse as seeds rather than storing them as tables in your source databases. For example, you might have a small dataset of exchange rates that you want to use in your dbt models.

Data Validation

You can use seeds to validate the data in your warehouse. For example, you might have a seed that contains the expected results of a certain calculation. You can then create a dbt test that compares the actual results in your warehouse to the expected results in your seed.

Machine Learning Model Testing

If you’re a data scientist or machine learning engineer, you can use seeds to load test data into your warehouse. You can then use this test data to evaluate the performance of your machine learning models.

Creating Your First dbt Seed

Let’s create a CSV file to use as a seed. For this tutorial, we’ll use a simple CSV file containing product codes and their corresponding product names. Save the following data in a file named **product_codes.csv**:

product_code,product_name
PRD01,Apple iPhone 13
PRD02,Samsung Galaxy S21
PRD03,Google Pixel 6

Place this file in the seeds directory of your dbt project.

Loading dbt Seeds into Your Data Warehouse

To load the seed into your data warehouse, run the following command in your terminal:

$ dbt seed

This command will load all CSV files in the seeds directory into your data warehouse. If you want to load a specific seed, you can use the --select flag followed by the name of the seed (without the .csv extension). For example:

$ dbt seed --select product_codes

Referencing dbt Seeds in Your Models

Once your seed is loaded into your data warehouse, you can reference it in your dbt models using the ref function. For example, if you have a model that needs to join with the **product_codes** seed, you could do something like this:

select    
  orders.*,    
  product_codes.product_name

from {{ ref('orders') }} as orders

  left join {{ ref('product_codes') }} as product_codes
    on orders.product_code = product_codes.product_code

Configuring and Testing dbt Seeds

You can configure your seeds in the dbt_project.yml file. For example, you can specify the schema where the seed data should be loaded. You can also document and test your seeds by declaring properties in YAML. For more information, check out the dbt documentation.

Conclusion

The dbt seed command is a powerful tool for loading static data into your data warehouse. By using seeds, you can keep your transformation logic within dbt, ensuring that your data transformations are consistent, version controlled, and code reviewable.

dbt run-operation

dbt snapshot

Unified workspace for your dbt workflow

Forget about the painful parts of dbt development, focus on what matters the most - data analysis

Join the beta