Getting Started with dbt Core and Snowflake

In this guide, we will walk through how to setup dbt Core in the cloud with Snowflake. After you finish this guide, you will have the sample data provided uploaded to Snowflake and run your first dbt command in the cloud.

Although the steps in this guide will specifically use Snowflake, the steps can be modified slightly to work with any database that dbt supports. We also have guides made specifically for BigQuery, Databricks, and Redshift.

If you would rather watch a video version of this guide, feel free to head over to YouTube. Let's jump right in!

dbt Core Part 1 - Loading Sample Data into your Cloud Provider

Before getting into the steps of setting up the different cloud data warehouses, please download the sample files that we will use for this tutorial here.

  1. From the Snowflake homepage, click Worksheets on the top of the webpage.
  2. Input the follow query::
USE ROLE accountadmin;
-- dbt roles
CREATE OR REPLACE ROLE dbt_dev_role;
CREATE OR REPLACE USER dbt_user PASSWORD = "sailboats";
GRANT ROLE dbt_dev_role,dbt_prod_role TO USER dbt_user;
GRANT ROLE dbt_dev_role,dbt_prod_role TO ROLE accountadmin;
CREATE OR REPLACE WAREHOUSE dbt_dev_wh  WITH WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE MIN_CLUSTER_COUNT = 1 MAX_CLUSTER_COUNT = 1 INITIALLY_SUSPENDED = TRUE;
GRANT ALL ON WAREHOUSE dbt_dev_wh  TO ROLE dbt_dev_role;
CREATE OR REPLACE DATABASE dbt_hol_dev;
GRANT ALL ON DATABASE dbt_hol_dev  TO ROLE dbt_dev_role;
GRANT ALL ON ALL SCHEMAS IN DATABASE dbt_hol_dev   TO ROLE dbt_dev_role

This query creates an example user, warehouse, and database to use throughout the tutorial.
3. Run the queries.
4. Click the Databases button on the top left of your screen. You should see the DBT_HOL_DEV database that we created:

  1. Enter the following query into the worksheet to create our stg_football_rankings table:
CREATE TABLE DBT_HOL_DEV.PUBLIC.STG_FOOTBALL_RANKINGS(
  rank integer,
  prev_rank integer,
  name string,
  league string,
  off float,
  def float,
  spi float
)
  1. Click Run.
  2. Enter the following query into a worksheet to create our stg_football_matches table:
CREATE TABLE DBT_HOL_DEV.PUBLIC.STG_FOOTBALL_MATCHES(
  season integer,
  date date,
  league_id integer,
  league string,
  team1 string,
  team2 string,
  spi1 float,
  spi2 float,
  prob1 float,
  prob2 float,
  probtie float,
  proj_score1 float,
  proj_score2 float,
  importance1 float,
  importance2 float,
  score1 integer,
  score2 integer,
  xg1 float,
  xg2 float,
  nsxg1 float,
  nsxg2 float,
  adj_score1 float,
  adj_score2 float
 )

Load Data into Tables

  1. Click Databases on the top of your Snowflake page.
  2. Click DBT_HOL_DEV
  3. Click STG_FOOTBALL_RANKINGS.
  4. Click Load Data which will bring up the load data menu.
  5. Choose DBT_DEV_WH as the warehouse to load data. Click next.
  6. Click Select Files and select spi_global_rankings.csv. Click next.
  7. Click the plus sign next to the drop down to create our file format.
  8. Under name, enter dbt_tutorial_csv.
  9. Change header lines to skip to 1 from 0.
  10. Keep the other settings at their default.
  11. Click Finish.
  12. Click Load.
  13. After the data has loaded into Snowflake, you will receive a success message that looks like this:
  1. Click Databases on the top of your Snowflake page.
  2. Click DBT_HOL_DEV.
  3. Click STG_FOOTBALL_MATCHES.
  4. Click Load Data which will bring up the load data menu.
  5. Choose DBT_DEV_WH as the warehouse to load data. Click next.
  6. Click Select Files and select spi_matches_latest.csv. Click next.
  7. Use the dropdown menu to select the dbt_tutorial_csv as the file format.
  8. Click Finish.
  9. Click Load.

You should be able to see both tables listed under your DBT_HOL_DEV database now.

  1. Click on STG_FOOTBALL_RANKINGS and click Grant Privileges on the right of the screen. Grant all actions to the DBT_DEV_ROLE.
  2. Repeat this for STG_FOOTBALL_MATCHES.

You should now be able to query "DBT_HOL_DEV"."PUBLIC"."STG_FOOTBALL_RANKINGS" and "DBT_HOL_DEV"."PUBLIC"."STG_FOOTBALL_MATCHES". Feel free to run this query to verify that this process worked successfully:

select * from "DBT_HOL_DEV"."PUBLIC"."STG_FOOTBALL_MATCHES"

dbt Core Part 2 - Setting Up dbt on Github

Fork dbt Setup from GitHub

  1. Fork this repository. The repository contains the beginning state of a dbt project.
  2. Clone the repository locally on your computer.
  3. Open dbt_project.yml in your text editor.

dbt Project File Setup

  1. Change the project name to soccer_538.
  2. Change the profile to soccer_538.
  3. Change model name to soccer_538.
  4. Under the soccer_538 model, add a staging and marts folder that are both materialized as views.
  5. Save your changes.

Profile Setup

  1. Open profiles.yml.
  2. Update the file to this:
soccer_538:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: "{{ env_var('snowflake_trial_account') }}"
      user: dbt_user
      password: "{{ env_var('dbt_user_password') }}"
      role: dbt_dev_role
      database: dbt_hol_dev
      warehouse: dbt_dev_wh
      schema: soccer_538
      threads: 200
  1. Create a new file in your root directory of your dbt project called execute_dbt.py.
  2. Paste this code block for the content of execute_dbt.py:
import subprocess
import os
import json

dbt_command = os.environ.get('dbt_command', 'dbt run')

subprocess.run(['sh', '-c', dbt_command], check=True)
  1. Commit and push your changes to Github.

Now that we have our sample data and dbt processes setup, we need to write our example models for the dbt job to run.

dbt Models

  1. Navigate into the models folder in your text editor. There should be a subfolder under models called example. Delete that subfolder and create a new folder called 538_football.
  2. Create two subfolders inside 538_football called staging and marts.
  1. Inside the staging folder, create a file called stg_football_matches.sql.
  2. Paste the following code into that file:
    select * from "DBT_HOL_DEV"."PUBLIC"."STG_FOOTBALL_MATCHES"
  3. Inside the staging folder, create a file called stg_football_rankings.sql
  4. Paste the following code into that file:
    select * from "DBT_HOL_DEV"."PUBLIC"."stg_football_rankings"
  5. In the staging folder, add a file called schema.yml.
  6. In this file, paste the following information:
version: 2

models:
  - name: stg_football_matches
    description: Table from 538 that displays football matches and predictions about each match.

  - name: stg_football_rankings
    description: Table from 538 that displays a teams ranking worldwide
  1. In the marts folder, create a file called mart_football_information.sql.
  2. Paste the following code into that file:
with
  qryMatches as (
    SELECT * FROM {{ ref('stg_football_matches') }} where league = 'Barclays Premier League'
    ),
  qryRankings as (
    SELECT * FROM {{ ref('stg_football_rankings') }} where league = 'Barclays Premier League'
  ),

  qryFinal as (
    select
      qryMatches.season,
      qryMatches.date,
      qryMatches.league,
      qryMatches.team1,
      qryMatches.team2,
      team_one.rank as team1_rank,
      team_two.rank as team2_rank
    from
      qryMatches join
      qryRankings as team_one on
        (qryMatches.team1 = team_one.name) join
      qryRankings as team_two on
        (qryMatches.team2 = team_two.name)
  )

select * from qryFinal
  1. In the marts folder, add a file called schema.yml
  2. In this file, paste the following:
version: 2

models:
  - name: mart_football_information
    description: Table that displays football matches along with each team's world ranking.
  1. Save the changes.
  2. Push a commit to Github

We are ready to move into Shipyard to run our process. First, you will need to create a developer account.

dbt Core Part 3 - Setting Up dbt on Shipyard

Create Developer Shipyard Account

  1. Navigate to Shipyard's sign-up page here.
  1. Sign up with your email address and organization name.
  2. Connect to your Github account by following this guide. After connecting your Github account, you'll be ready to create your first Blueprint.

Creating dbt Core Blueprint

  1. On the sidebar of Shipyard's website, click Blueprints.
  2. Click Add Blueprint on the top right of your page.
  3. Select Python.
  4. Under Blueprint variables, click Add Variable.
  5. Under display name, enter dbt CLI Command.
  6. Under reference name, enter dbt_command.
  7. Under default value, enter dbt run.
  8. Click the check box for required
  9. Under placeholder, enter Enter the command for dbt.
  10. Click Next
  11. Click Git.
  1. Select the repository where your dbt files sit.
  2. Click the source that you want the files pulled from. Generally main or master.
  3. Under file to run, enter execute_dbt.py.
  4. Under Git Clone Location, select the option for Unpack into Current Working Directory.
  5. Click Next Step on the bottom right of the screen.
  6. Next to Environment Variable, click the plus sign to add an environment variable.

Add Environment Variables

The environment variables that need to be added will vary based on the cloud database that you use.

Variable Name Value
snowflake_trial_account snowflake account name
dbt_user_password password from snowflake
DBT_PROFILES_DIR .

Python Packages

  1. Click the plus sign next to Python Packages.
  2. In the Name field, enter dbt-snowflake. In the version field, enter ==1.0.0.
  3. Click Next.

Blueprint Settings

  1. Under Blueprint Name, enter dbt - Execute CLI Command.
  2. Under synopsis, enter This Blueprint runs a dbt core command.
  3. Click Save.
  4. In the top right of your screen, click Use this Blueprint. This will take you over to the Fleet Builder and prompt you to select a project.

Build dbt Core Fleet

  1. On the Select a Project prompt, click the drop down menu to expand it and select Create a New Project.
  2. Under project name, enter dbt Core Testing.
  3. Under timezone, enter your timezone.
  1. Click Create Project.
  2. Select dbt Core Testing and click Select Project. This will create a new Fleet in the project. The Fleet Builder will now visible with one Vessel located inside of the Fleet.
  3. Click on the Vessel in the Fleet Builder and you will see the settings for the Vessel pop up on the left of your screen.
  1. Under Vessel Name, enter dbt Core CLI Command.
  2. Under dbt CLI Command, enter dbt debug.
  3. Click the gear on the sidebar to open Fleet Settings.
  1. Under Fleet Name, enter dbt Core.
  2. Click Save & Finish on the bottom right of your screen.
  3. This should take you to a page showing that your Fleet was created successfully.
  1. Click Run Your Fleet. This will take you over to the Fleet Log.
  1. You can click on the bar to get the output from your run.

If you scroll to the top of the output, you will see that the environment variables that were put in during the Blueprint creation process are hidden from the user.

If dbt debug succeeds, we are ready to move into part three of the guide. If it fails, please go back to the steps above and make sure everything is setup correctly. Feel free to send an Intercom message to us at anytime using the widget on the bottom right of the Shipyard screen.

In the meantime, please consider subscribing to our weekly newsletter, "All Hands on Data." You'll get insights, POVs, and inside knowledge piped directly into your inbox. See you there!