In this tutorial, we'll walk you through integrating dbt with Dagster using a smaller version of dbt's example jaffle shop project, the dagster-dbt library, and a data warehouse, such as DuckDB.
Dagster’s software-defined assets (SDAs) bear several similarities to dbt models. A software-defined asset contains an asset key, a set of upstream asset keys, and an operation that is responsible for computing the asset from its upstream dependencies. Models defined in a dbt project are similar to Dagster SDAs in that:
The asset key for a dbt model is (by default) the name of the model.
The upstream dependencies of a dbt model are defined with ref or source calls within the model's definition.
The computation that produces the asset from its upstream dependencies is the SQL within the model's definition.
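To make the three-part correspondence concrete, here is a minimal, standard-library-only sketch that reads a dbt model's file name and SQL and reports the asset key, the upstream keys, and the computation. The names ModelAsAsset and describe_model are hypothetical and exist only for illustration. This is not how the dagster-dbt library works internally, and treating a source's table name as its key is a simplifying assumption.

```python
import re
from dataclasses import dataclass
from pathlib import PurePath

# Matches {{ ref("model") }} and {{ source("src", "table") }}, with either
# quote style. A deliberately simple pattern for illustration only.
_DEP_CALL = re.compile(
    r"""\{\{\s*(ref|source)\(\s*(['"])(.+?)\2(?:\s*,\s*(['"])(.+?)\4)?\s*\)\s*\}\}"""
)


@dataclass(frozen=True)
class ModelAsAsset:
    """Hypothetical container for the three parts an asset is made of."""

    asset_key: str
    upstream_keys: tuple
    computation: str


def describe_model(path: str, sql: str) -> ModelAsAsset:
    """View a dbt model file as (asset key, upstream keys, computation)."""
    upstream = []
    for match in _DEP_CALL.finditer(sql):
        # With two arguments, the last one names the table or model;
        # with one argument, the first one does. (Assumed simplification.)
        name = match.group(5) or match.group(3)
        if name not in upstream:
            upstream.append(name)
    # dbt names a model after its file, so the file stem serves as the key.
    return ModelAsAsset(PurePath(path).stem, tuple(upstream), sql.strip())


if __name__ == "__main__":
    model = describe_model(
        "models/orders.sql", 'select * from {{ ref("raw_orders") }}'
    )
    print(model.asset_key, model.upstream_keys)
```

A regular expression keeps the sketch short, but it will miss dependencies built dynamically with Jinja, so a real integration would need the fully resolved dependency information from dbt itself.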
These similarities make it natural to interact with dbt models as SDAs. Let’s take a look at a dbt model and an SDA, in code:
Here's what's happening in this example:
The first code block is a dbt model:
  As dbt models are named using file names, this model is named orders.
  The data for this model comes from a dependency named raw_orders.
The second code block is a Dagster asset:
  The asset key corresponds to the name of the dbt model, orders.
  raw_orders is provided as an argument to the asset, defining it as a dependency.
To complete this tutorial, you'll need:
Git installed. If it isn't installed already (check by running git --version in your terminal), install it using the instructions on the Git website.
dbt, Dagster, and the Dagster webserver/UI installed. To install everything using pip, run: pip install dagster-dbt dagster-webserver dbt-duckdb
The dagster-dbt library installs both dbt-core and dagster as dependencies. dbt-duckdb is installed because you'll use DuckDB as the database during this tutorial. Refer to the dbt and Dagster installation docs for more info.
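After installing, one way to confirm that everything is present is a short standard-library check like the sketch below. The distribution names in REQUIRED are assumed from the packages named in this tutorial, and installed_versions is a hypothetical helper, not part of dbt or Dagster.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names assumed from the packages this tutorial mentions.
REQUIRED = ("dagster", "dagster-dbt", "dagster-webserver", "dbt-core", "dbt-duckdb")


def installed_versions(names=REQUIRED):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as not installed points to a pip install that didn't complete, or to a different Python environment than the one you installed into.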