Advanced dbt Project Configuration: A Comprehensive Step-by-Step Guide

Last Update: January 9, 2025

Project Overview

This project covers best practices and advanced setups for creating and managing a robust dbt project. Tests, documentation, seeds, packages, incremental models, macros, and model configurations are all covered in detail. Following this approach, you can design an efficient, scalable dbt project that guarantees data integrity and quality.

Before delving into this advanced guide, we advise anyone who is unfamiliar with dbt to begin with the beginner guide to gain a basic grasp.

You can find the beginner guide here.

Technology Used

Python, dbt, SQL

Top-Level Workflow

(diagram: dbt_functionality)

dbt Environment

In this step, we’ll walk you through setting up your dbt project, including how to connect to data sources and configure the environment. This setup gives you a solid foundation on which to build and maintain your data models effectively.
				
dbt_project
├── analyses                 # SQL saved for reporting purposes
│   └── checks.yml
├── dbt_packages             # installed dbt packages (libraries)
├── logs
│   └── dbt.log              # logs written each time dbt runs models
├── macros
│   └── date_operation.sql   # example: functions reused across the project
├── models                   # core logic to build the data warehouse
│   ├── source.yml           # top-level database/source configuration
│   └── schema.yml           # top-level documentation
├── seeds                    # CSV files loaded as tables
├── snapshots                # for SCDs (slowly changing dimensions)
├── target                   # all compiled SQL files dbt executes
└── tests                    # custom test SQL to ensure data quality
    ├── source.yml           # top-level schema
    └── generic
        └── equal_row_count.sql   # custom test functionality

dbt_project.yml

This file serves as your dbt project’s primary configuration file. It defines the settings dbt uses to execute your project. Here is a basic explanation of the main elements: model structure, variables, seeds, and other components can all be configured here.
				
vars:  # available across the dbt project
  my_variable: 'value'

models:  # core components of dbt
  dbt_project_name:
    +materialized: view  # top-level default
    src:  # sub-folder name in the models dir
      +schema: src          # db schema name -> dbt_src
      +materialized: table  # table type
      +tags: ['src', 'table', 'incremental']  # optional; useful in dbt docs
    dim:  # sub-folder name in the models dir
      +schema: dim          # db schema name -> dbt_dim
      +materialized: view   # view type
      +tags: ['dim', 'view']  # optional; useful in dbt docs
    fact:  # sub-folder name in the models dir
      +schema: fact         # db schema name -> dbt_fact
      +materialized: table  # table type
      +tags: ['fact', 'table', 'incremental']  # optional; useful in dbt docs
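A variable declared under vars can then be read anywhere in the project with the var() function. A minimal sketch (the model file name is hypothetical; the second argument to var() is a fallback used when the variable is undefined):

```sql
-- models/src/vars_demo.sql (hypothetical file name)
-- var() returns my_variable from dbt_project.yml
SELECT '{{ var("my_variable", "fallback") }}' AS my_variable_value
```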

				
			

packages.yml

This file manages dependencies on other dbt packages. With it, you can quickly install and manage external dbt packages that provide additional models, macros, and functionality.

Configuration

				
					packages:
  - package: dbt-labs/dbt_utils
    version: 1.3.0

				
			

Install

				
# To install, run:
dbt deps
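Once the package is installed, its macros can be called from any model. A hedged sketch using generate_surrogate_key from dbt_utils 1.x, which hashes one or more columns into a deterministic key; the model file and column names below are assumptions:

```sql
-- models/dim/dim_actor.sql (hypothetical)
SELECT
    {{ dbt_utils.generate_surrogate_key(['actor_id']) }} AS actor_key,
    first_name,
    last_name
FROM {{ ref('actor') }}
```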

				
			

macros

This folder in a dbt project holds reusable SQL snippets and functions that can streamline and improve your data transformation processes. dbt macros can be used to encapsulate complex logic, eliminate repetition, and improve maintainability. They are written in Jinja, a templating language for Python.

Key Components

  1. Purpose of Macros: Macros make your code more modular and maintainable by enabling you to write reusable SQL segments that can be called from several models.

Creating Macros

  1. Macros are defined in .sql files within the macros directory.
  2. A macro is defined using the Jinja macro keyword.

Example

Create a SQL file named str_ops.sql under the macros directory.
				
-- trim and lower a column value
{% macro lower_and_trim(col) %}  {# name of the macro #}
    LOWER(TRIM({{ col }}))  -- SQL operation
{% endmacro %}
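The macro can then be called from any model; dbt expands it into plain SQL at compile time. A minimal sketch, assuming a hypothetical customers source table:

```sql
-- models/src/customers.sql (hypothetical)
SELECT
    {{ lower_and_trim('email') }} AS email_clean  -- compiles to LOWER(TRIM(email))
FROM {{ source('stg_data', 'customers') }}
```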

				
			

seeds

This folder in a dbt project holds CSV files that can be loaded as tables into your data warehouse. Seeds are helpful for static data that doesn’t change frequently, such as lookup tables, reference data, or configuration settings. By putting CSV files in the seeds folder and declaring them in your dbt project, these datasets can be managed and version-controlled alongside your other dbt models.

Configuration

  1. Upload a CSV file (e.g., country_codes.csv) under seeds directory
  2. Create a properties.yml file
				
					version: 2

seeds:
  - name: country_codes
    config:
      quote_columns: true
      delimiter: ','
      column_types:
        FIFA: text
        "Global Code": text

				
			

dbt_project.yml Configuration

				
					seeds:
  +schema: external_data
  +full_refresh: true

				
			

Load CSV Data as a Table

Run the following command to load the CSV data as a table:
				
					dbt seed
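Once loaded, the seed behaves like any other table and can be referenced with ref(). A minimal sketch (the model file name is hypothetical; the quoted column names follow the quote_columns: true setting above):

```sql
-- models/src/country_lookup.sql (hypothetical)
SELECT
    "FIFA",
    "Global Code"
FROM {{ ref('country_codes') }}
```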
				
			

tests

This folder in a dbt project holds custom test definitions that help guarantee the quality and accuracy of your data transformations. dbt tests verify the data in your models and catch problems at an early stage of the data pipeline. Schema tests and data tests are the two primary categories of tests in dbt.

Key Components

1. Purpose of Tests

  • Tests verify your data’s assumptions, ensuring data integrity and quality.
  • They assist in the early detection of mistakes and discrepancies in the data pipeline.

2. Types of Tests

  • Schema Tests: Verify constraints on your data, including uniqueness, non-null values, and relationships between columns.
  • Data Tests: Custom SQL queries that verify data accuracy and business logic.

Test File Configuration

  1. Create a custom test file named equal_row_count.sql under tests/generic folder
  2. Sample code:
				
					{% test equal_row_count(model, column_name, src) %}

SELECT
    COUNT(1)
FROM 
    {{ model }}
HAVING 
    COUNT(1) != (SELECT COUNT(1) FROM {{ src }})

{% endtest %}

				
			

3. Create schema.yml file under tests folder

				
					version: 2
data_tests:
  - name: equal_row_count
    description: >
      check actor model row count != dvdrental.public.actor

				
			
4. Update the schema.yml in models dir
				
version: 2

models:
  - name: actor
    description: Incremental table with unique actor id
    columns:
      - name: actor_id
        description: actor unique identification.
        data_tests:
          - unique  # dbt built-in test
          - not_null  # dbt built-in test
          - equal_row_count:  # dbt custom test
              src: source('stg_data', 'actor')
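The actor model above is described as incremental; a minimal sketch of what such a model might look like (the last_update column is an assumption based on the dvdrental sample schema):

```sql
-- models/src/actor.sql (hypothetical sketch)
{{ config(materialized='incremental', unique_key='actor_id') }}

SELECT
    actor_id,
    first_name,
    last_name,
    last_update
FROM {{ source('stg_data', 'actor') }}
{% if is_incremental() %}
  -- on incremental runs, only pull rows newer than what is already loaded
  WHERE last_update > (SELECT MAX(last_update) FROM {{ this }})
{% endif %}
```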
				
			

Test dbt

To test, run:
				
					dbt test
				
			

Conclusion

Advanced configurations let you fully utilize dbt, enabling a more efficient and effective data engineering process. These options give you the resources you need to be successful, whether you’re working with a big team, maintaining intricate data models, or guaranteeing high data quality.
As you continue to explore and apply advanced dbt configurations, remember to experiment, document your findings, and share your understanding with the community. dbt’s strength derives not only from its technical capabilities but also from the combined knowledge and cooperation of its users.
Happy modeling!