As your data team scales its analytics operations, you’ll need to strike that perfect balance between dbt and Snowflake.
This is easier said than done though.
Many organizations struggle with rising costs and performance bottlenecks when implementing dbt and Snowflake together. The key lies in following proven expert optimization strategies that maximize both performance and cost efficiency.
Ready to learn how to set up your Snowflake database environment with dbt-specific roles, table clustering, and the best materialization for your models?
Read on to learn exactly why you should use dbt with Snowflake, and the six best practices to maintain optimal performance and efficiency.
Why Use dbt With Snowflake?
Individually, dbt and Snowflake are both powerful tools. You can run quick data queries, automate your regularly used data models, and scale up or down as your workload requires. Combined though, you can bring out Snowflake’s strengths as a data warehouse and dbt as a data transformation tool.
In particular, dbt’s ability to do the following makes integration with Snowflake not only easy, but also appealing:
- Specify databases, users, schemas, and roles directly into your project
- Provide built-in configuration to specify clustering policies, data masking, and materialization
Though both Snowflake and dbt are easy to use tools, there are a few tripping points that might catch you by surprise. This guide breaks down how to:
- Set up your Snowflake database environment
- Create dbt-specific roles for specific users
- Use table clustering for specific tables
- Choose the best materialization for your models
- Use incremental models to save with Snowflake compute
- Invest in third-party Snowflake optimization tools
dbt Best Practices With Snowflake
These are the best practices we’ve seen customers get the most ROI from after implementation. Depending on the size of your company and industry, you may see more or less optimization from these best practices.
Best Practice #1: Define Development and Production Databases
This best practice is pretty straightforward: set up separate development and product environments. It aligns with your standard software engineering best practice of using different environments.
Your development environment is where you’ll write and test code for your dbt models, and your production environment is where those models will run whenever you need them (daily, hourly, etc.).
Why create these two separate environments? That’s because separate data environments:
- Prevent accidental data corruption in production
- Allow for safe new model testing
- Keep your production workflow stable
- Catch errors early
- Maintain data quality standards
How do you do this in Snowflake? Setup is relatively simple:
- Create two databases in your data warehouse with clearly defined names (e.g., database_dev and database_pro).
- Specify which databases should be used in your different testing environments with the profiles.yml file.
- Pick two targets – one is for your development (local environment) and one is for your production (scheduling tool).
- For each target, specify the name of the database you want to send the data to.
- Make sure to specify the specific schemas to send your data to in your dbt_project.yml file. Try to mimic the structure of your project.
- Specify the same folder structure you want to use in the models directory and subdirectory (if you’re using a different schema from the directory).
Doing this means when you run one of these models in development or production, you can find the schema and corresponding databases in Snowflake.
Best Practice #2: Set Up dbt Snowflake Role and Users
dbt stands out from Snowflake because of its ability to let you specify the user and role you want to use when running dbt in Snowflake with your profiles.yml.
Why is this so useful?
It lets you maintain solid data governance. You can make sure the dbt only accesses the exact resources it needs from Snowflake. Nothing more.
When you’re setting up these roles, try two different ones for dbt:
- Transformer: Role designed to work with dbt in development
- Orchestrator: Role designed to work with dbt in production
This keeps you from accidentally writing to your production database when you’re working locally in dbt.
You’ll also want to create specific users, usually for your analytics and data engineers. Best practices recommend:
- Creating a Snowflake user for every person who works directly with dbt and Snowflake.
- Creating a user for every tool used.
Here’s an example of how you can configure your profiles.yml file for separate development and production environments:
# Example profiles.yml configuration
your_project_name:
target: dev
outputs:
dev:
type: snowflake
account: your_account
user: your_user
database: analytics_dev
schema: dbt_dev
prod:
type: snowflake
account: your_account
user: your_user
database: analytics_prod
schema: dbt_prod
Doing this means when you run one of these models in development or production, you can find the schema and corresponding databases in Snowflake.
Best Practice #3: Use Table Clustering For High-Volume Tables
Table clustering is a Snowflake performance optimization technique perfect for tables with lots of data. dbt takes this strategy to the next level. You won’t have to manually specify table clustering for each query or data model, but can instead set things up to optimize query performance automatically through dbt’s configuration.
Do this by specifying a specific cluster called cluster_by. You can do this either within the model itself or its YAML documentation. You can either apply this to all models in a certain directory, or within the dbt_project.yml file.
What exactly does cluster_by do? It determines how your end table gets sorted. And, most importantly, it lets dbt take some of the load off Snowflake’s automatic clustering, helping optimize for both performance budgets.
Best Practice #4: Default to Views for Cost-Effective Materialization
Storage and computing costs can cause your monthly bill to spiral out of control if you’re not careful. Address those rising prices with a strong materialization strategy.
One of the best ways to do this is by always defaulting to views.
What does this mean?
Well, you’re already paying for storage of your raw data tables. Views are just transformations that sit on top of your raw data. Technically, they don’t store any data themselves, only the queries that you run to access your stored data. That’s why, in dbt, you should create all of your staging models as views.
The only thing to watch out for here is performance issues. Views don’t incur storage costs and should always be up to date with the latest data, but if a model has a complex transformation or even a lot of data, it’s best to materialize it as a table.
Best Practice #5: Use Incremental Models to Save on Snowflake Compute
A big Snowflake drawback? How expensive it can get without you even realizing. The best way to combat is to understand where those costs are coming from. Beyond warehouses, compute is one of the biggest culprits.
Beyond choosing the right default materialization, you can continue to optimize costs by implementing incremental models for your largest, most frequently updated tables.
How exactly does it work? Let’s say you have a huge table filled with complex transformations that need high performance. Running regular transformations on that data will lead to a massive bill… so you should take advantage of incremental models instead.
Incremental models let you run your transformations on new or updated data only, appending results only to already-transformed data. This keeps you from running complex and costly transformations on the same data again and again. In other words: you only need to pay for things once.
It’s important to note here that incremental models should only be used on a case-by-case basis. Try specializing the materialization for these models in your YAML documentation and add a similar config to your dbt_project.yml.
Best Practice #6: Invest in Third-Party Snowflake Optimization Tools
Lots of companies don’t have time to implement these tips, and even if they do, even fewer have the budget and manpower to keep up with the needs of ever-changing datasets. That means even the most perfectly optimized and automated Snowflake tech stack can get out of hand with performance and budget.
Example of Yuki’s Savings Dashboard showing credits saved, usage over time, and warehouse-level optimization results.
The best way to stay on top of things? Investing in a third-party Snowflake optimization tool.
Third-party Snowflake optimization tools like Yuki will automate everything you need. Right-sizing warehouses? Optimizing query performance? Sending automated alerts when your compute costs get out of hand and shutting things down before you start to break the bank? These are all things Yuki can help with – no additional dev lift needed. Previous clients have seen monthly savings up to 30% after they started using it.
Curious to see how much you can save? Reach out now for your free demo.


