Advanced dbt Project Configuration: A Comprehensive Step-by-Step Guide

Vivasoft Team

Published on

09.01.2025

Time to Read

4 min

Project Overview

Best practices and advanced setups for creating and managing a strong DBT project are covered in this project. Tests, documentation, seed, package, incremental models, macros, and model configurations are all covered in detail. You may design an effective and scalable DBT project that guarantees data integrity and quality by using this type of approach.

Prior to delving into this advanced guide, we advise anyone who are unfamiliar with DBT to begin with the beginner guide in order to have a basic grasp.

You can find the beginner guide here.

Technology Used

Python, dbt, SQL

Toplevel Workflow

dbt Environment

In this step, we’ll walk you through the process of setting up your DBT project, including how to connect to data sources and set up the environment. This setup ensures that you have a solid foundation on which to build and effectively maintain your data models.

				
					├──dbt_project
 |	└──analyses: save SQL for reporting purpose
 |      └──checks.yml
 |  └──dbt_packages: all dbt libraries
 |  └──logs
 |       └──dbt.log: dbt logs when each dbt runs models
 |  └──macros
 |       └──date_operation.sql(example): fn to use across dbt project
 |	└──models: Core logic to build the data warehouse
 |       └──source.yml: Top level database configuration to use
 |       └──schema.yml: Top level documentation
 |  └──seeds: To export file as table
 |  └──snapshots: For SCD(slowly changing dimensions)
 |  └──target: All complied SQL files dbt executes
 |  └──tests: Custom test SQL to ensure data quality
 |        └──source.yml: Top level schem
          └──generic
                └──equal_row_count.sql: custom test functionality

dbt_project.yml

This file serves as your DBT project’s primary configuration file. It outlines the settings that DBT employs to execute your project. Here is a basic explanation of the main elements and configurations. The model’s structure, variables, seeds, and other elements can be defined.

				
					vars:  # to use the variable across the dbt project 
    my_variable: 'value'

models:  # core components of dbt 
     dbt_poject_name:
         materialized: view  # top level configuration
         src:  # sub folder name in models dir
            +schema: src   # db schema name → _src
            +materialized: table  # table type
            + tags: [‘src’, ‘table’, ‘incremental’]  # optional → good for dbt doc 
         dim:  # sub folder name in models dir
            +schema: dim   # db schema name → dbt_dim
            +materialized: view  # view type
            + tags: [‘dim’, ‘view’]  # optional → good for dbt doc 
         fact:  # sub folder name in models dir
            +schema: fact   # db schema name → dbt_fact
            +materialized: table  # table type
            + tags: [‘fact’, ‘table’, ‘incremental’]  # optional → good for dbt doc

packages.yml

Dependencies on other DBT packages are managed via the file in a DBT project. With the use of this file, you can quickly install and manage external DBT packages that offer more features, models, macros, and more.
[doc]

Configuration

				
					packages:
  - package: dbt-labs/dbt_utils
    version: 1.3.0

Install

				
					//To install, run: 
dbt deps

macros

Reusable SQL snippets and functions that can streamline and improve your data transformation processes are kept in this folder within a DBT project. DBT macros can be used to encapsulate complex logic, eliminate repetition, and improve maintainability. They are created in Jinja, a Python templating language.

Key Components

Purpose of Macros: It makes your code more modular and maintainable by enabling you to write reusable SQL segments that can be called from several models.

Creating Macros

Macros are defined in .sql files within the macros directory.
A macro is defined using the Jinja macro keyword.

Example

Create a SQL file named str_ops.sql under macros dir.

				
					-- trim and lower a col val
{% macro lower_and_trim(col) %}  – name of macro
    LOWER(TRIM({{col}}))  – sql operation
{% endmacro %}

seeds

The purpose of this folder in a DBT project is to hold CSV files that may be loaded as tables into your data warehouse. For static data—like lookup tables, reference data, or configuration settings—that doesn’t change frequently, seeds are helpful. These datasets may be readily managed and version controlled alongside your other DBT models by putting CSV files in the seeds folder and specifying them in your DBT project.

Configuration

Upload a CSV file (e.g., country_codes.csv) under seeds directory
Create a properties.yml file

				
					version: 2

seeds:
  - name: country_codes
    config:
      quote_columns: true
      delimiter: ','
      column_types:
        FIFA: text
        Global Code: text

dbt_project.yml Configuration

				
					seeds:
  +schema: external_data
  +ful_refresh: true

Load csv data as table

Run the following command to load csv data as a table:

				
					dbt seed

tests

Custom test definitions that help guarantee the quality and accuracy of your data transformations are kept in this folder within a DBT project. DBT tests are intended to verify data in your models and identify problems at an early stage of the data pipeline. Data testing and schema tests are the two primary categories of tests in DBT.

Key Components

1. Purpose of Tests

Tests verify your data’s assumptions, ensuring data integrity and quality.
They assist in the early detection of mistakes and discrepancies in the data stream.

2. Types of Tests

Schema Tests: Verify your data’s constraints, including uniqueness, non-null values, and column connections.
Data Tests: custom SQL queries that verify the accuracy of the data and business logic.

test file configuration

Create a custom test file named equal_row_count.sql under tests/generic folder
Sample code:

				
					{% test equal_row_count(model, column_name, src) %}

SELECT
    COUNT(1)
FROM 
    {{ model }}
HAVING 
    COUNT(1) != (SELECT COUNT(1) FROM {{ src }})

{% endtest %}

3. Create schema.yml file under tests folder

				
					version: 2
data_tests:
  - name: equal_row_count
    description: >
      check actor model row count != dvdrental.public.actor

4. Update the schema.yml in models dir

				
					version: 2

models:
  - name: actor
    description: Incremental table with unique actor id
      - name: actor_id
        description: actor unique identification.
        data_tests:
          - unique  # dbt buitin test
          - not_null  # dbt buitin test
          - equal_row_count:  # dbt custom test
              	src: source('stg_data', 'actor')

Test dbt

To test, run:

				
					dbt test

Conclusion

Advanced configurations enable a more efficient and effective data engineering process by enabling you to fully utilize DBT. These options give you the resources you need to be successful, whether you’re working with a big team, maintaining intricate data models, or guaranteeing good data quality.

Remember to experiment, document the findings, and share your understanding with the community as you continue to investigate and use sophisticated DBT configurations. DBT’s strength is derived not only from its technical prowess but also from the combined knowledge and cooperation of its users.

Happy modeling!

50+ companies rely on our top 1% talent to scale their dev teams.

Case Study

Excellence Our minimum bar.

It has become a prerequisite for companies to develop custom software.

Top Software Development Company on GoodFirms

We've stopped counting. Over 50 brands count on us.

Our company specializes in software outsourcing and provides robust, scalable, and efficient solutions to clients around the world.

Heartfelt appreciation to Vivasoft Limited for believing in my vision. Their talented developers can take any challenges against all odds and helped to bring Klikit into life.appreciation to Vivasoft Limited for believing in my vision. Their talented developers can take any challenges.

Start with a dedicated squad in 7 days

NDA first, transparent rates, agile delivery from day one.

Blogs You May Love

Don’t let understaffing hold you back. Maximize your team’s performance and reach your business goals with the best IT Staff Augmentation

Advanced dbt Project Configuration: A Comprehensive Step-by-Step Guide

Project Overview

Technology Used

Toplevel Workflow

dbt Environment

dbt_project.yml

packages.yml

Configuration

Install

macros

Key Components

Creating Macros

Example

seeds

Configuration

dbt_project.yml Configuration

Load csv data as table

tests

Key Components

test file configuration

Test dbt

Conclusion

Download Company Profile

Get Free Consultancy