Update - July 2022 - We've updated this guide to include a video tutorial and written instructions for all of the major databases.
In this tutorial, we'll walk you through the steps it takes to deploy and automate dbt models in the cloud using the models you create in dbt's own jaffle-shop tutorial. However, you can use this as a general guide to deploy ANY dbt models that you may have created for your organization.
Complete the dbt Tutorial
Work your way through the above dbt tutorial, following all of the steps related to the dbt CLI until you reach the step for "Deploying your Project".
Alternatively, you can skip this step by forking our dbt-tutorial repository and using the code found on the finished-dbt-tutorial branch. However, you'll still need to provide your own BigQuery credentials.
Make Adjustments for Running dbt in Production
1. Update your `profiles.yml` file to use environment variables in place of sensitive data or references to local files and directories. For the dbt tutorial, your `profiles.yml` will only need to exchange the `keyfile` location with `"{{ env_var('BIGQUERY_KEYFILE') }}"`. The final result will look something like this:
```yaml
jaffle_shop: # this needs to match the profile: in your dbt_project.yml file
  target: dev
  outputs:
    dev:
      type: bigquery
      method: service-account
      project: dbt-demos # Replace this with your project id
      dataset: dbt_shipyard # Replace this with dbt_your_name, e.g. dbt_bob
      threads: 4
      timeout_seconds: 300
      location: US
      priority: interactive
      keyfile: "{{ env_var('BIGQUERY_KEYFILE') }}"
```
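Note that `env_var` makes dbt read the value at runtime, so the variable has to be set in whatever environment dbt runs in. For a quick local test, you could, for example, run `export BIGQUERY_KEYFILE=/path/to/your/keyfile.json` before invoking dbt; Shipyard will set this variable for you later in this guide.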
2. Move your `profiles.yml` file to the root directory of your dbt project.
3. Remove the `target` and `logs` folders, along with their contents.
4. Add the following code snippet, named `execute_dbt.py`, to the root directory of your dbt project.
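In case you aren't pulling this file from our dbt-tutorial repository, here's a minimal sketch of what `execute_dbt.py` could look like, pieced together from the three behaviors described below (treat it as an illustration rather than the canonical version):

```python
import os
import subprocess

# 1. Switch the working directory to wherever this script lives, so dbt
#    commands always execute from the root of the dbt project.
os.chdir(os.path.dirname(os.path.abspath(__file__)))

# 2. Recreate the BigQuery credential file from the JSON string stored in
#    the BIGQUERY_CREDS environment variable (only needed for BigQuery).
bigquery_creds = os.environ.get('BIGQUERY_CREDS')
if bigquery_creds:
    with open('bigquery_creds.json', 'w') as f:
        f.write(bigquery_creds)

# 3. Run the dbt CLI command provided via DBT_COMMAND, defaulting to
#    `dbt run`. shell=True allows chained commands like
#    `dbt compile && dbt run`.
dbt_command = os.environ.get('DBT_COMMAND', 'dbt run')
subprocess.run(dbt_command, shell=True, check=True)
```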
This script accomplishes three main things:
- Switches the current working directory to the location where the `execute_dbt.py` script lives. This ensures that when you run `dbt <command>`, as long as this file lives in the root directory of your dbt project, it will always be able to execute properly.
- Creates a BigQuery credential file, named `bigquery_creds.json`, using any JSON passed through an environment variable named `BIGQUERY_CREDS`. This is only necessary for BigQuery connections (which the dbt tutorial uses). In Shipyard, you can't upload a credential file, so instead we have to build it with the provided JSON string. You cannot use `service-account-json` to connect to BigQuery, due to limitations in passing multi-line private keys as environment variables.
- Runs a dbt CLI command using an environment variable of `DBT_COMMAND`. By default, running this script without providing the environment variable will execute `dbt run`.
With these 4 updates in place, we're ready to deploy dbt on Shipyard.
Sign up for Shipyard
To get started, sign up with your work email and your company name.
Once you've created an account and logged in, you can start deploying and automating your dbt models at scale.
Connect your Github Account to Shipyard
In order to use the dbt Blueprint to its fullest potential, you'll need to set up an integration to sync your Github account to Shipyard. You can follow the first half of this guide.
While you can upload your dbt project directly to Shipyard, we recommend connecting to Github so you can always stay in sync with your dbt code updates.
Copy the dbt Blueprint from our Blueprint Library
1. On the sidebar, click the "Blueprint Library" button.
2. Search for "dbt" and click "Add to Org" on the "dbt - Execute CLI Command" Blueprint.
3. On the next pop-up, you have the chance to rename the Blueprint before it gets added to your organization. You can leave this as the default for this tutorial.
4. On the sidebar, click the Blueprints button and click on the name of the recently created "dbt - Execute CLI Command" Blueprint.
5. At this point, you should have landed on the "Inputs" tab of the dbt Blueprint. Switch to the "Code" tab.
Edit the Code of the dbt Blueprint
1. Select the option for "Git" on the Code tab.
2. Select the repo where your dbt project lives, select the branch or tag you want to sync with, and leave the Git Clone Location as "default".
3. Edit the "File to Run" field to contain `<your-repo-name>/execute_dbt.py`.
4. Click Save and switch to the requirements tab.
Edit the Requirements of the dbt Blueprint
On this tab, you'll want to add and edit any environment variables that may be used in your dbt project. For the dbt tutorial project, you'll need to make the following adjustments:
1. Copy/paste the contents of your BigQuery credential file into the environment variable named `BIGQUERY_CREDS`.
2. Don't touch the environment variables `BIGQUERY_KEYFILE` or `DBT_PROFILES_DIR`. They are set to `./bigquery_creds.json` and `.` respectively. (dbt uses `DBT_PROFILES_DIR` to locate your `profiles.yml`, which is why the file now lives in your project's root directory.)
3. Update the version of dbt to the one you would prefer to use. Alternatively, you can remove the `==0.18.1` version pin altogether and the latest version of dbt will always be installed.
Note: If you don't want to manage packages directly in Shipyard, you can remove them and instead include a `requirements.txt` file in your dbt repository that contains dbt.
Read our documentation if you're interested in learning more about how we treat Environment Variables and Packages.
4. Click "Save".
Create a Vessel to Execute dbt in the Cloud
1. In the top-right corner, click "Use this Blueprint".
2. Fill out the dbt Command you want to run. If left blank, `dbt run` will be used by default.
NOTE: We support running multiple commands successively, e.g. `dbt compile && dbt run`. However, we recommend splitting out commands into separate Vessels that are part of a larger Fleet to allow for better visibility into each function.
3. Click "Next Step" at the bottom.
4. Add any schedule triggers that you need for running dbt. We recommend choosing Daily for starters.
5. Click "Next Step" at the bottom.
6. Give your Vessel a name and select the Project where you want to create it.
7. Add any emails that you want to receive notifications if the Vessel errors.
8. Update your guardrails to let the Vessel automatically retry if it runs into errors. We recommend at least 1 retry and a 5-minute retry delay.
9. Click "Save & Finish" at the bottom. You've successfully made a Vessel that automates your dbt project in the cloud! Click "Run your Vessel" to test it out with an On Demand voyage.
10. If you followed this guide and the dbt tutorial to a T, you should see the following information in the output.
You can see that Shipyard prints out the git commit hash used when running your Vessel, followed by all of the same logging that you would get while running dbt locally. Logs can be accessed for troubleshooting at any time.
Running into Issues?
Check out the main branch of our dbt-tutorial repository for an example of what your final repository should look like. If you still can't figure it out, click the chat bubble in the bottom right or email us at support@shipyardapp.com.
Next Steps
Set up your organization's dbt project
Now that you've successfully deployed the dbt jaffle-shop tutorial to the cloud, you can set up your organization's dbt project! By selecting your dbt repository and making a few changes to your setup, you can ensure that your team is always running the latest dbt models.
NOTE: Each dbt Blueprint can only be connected to a single Github repository. If you need to manage multiple Blueprints for multiple Github repos, you can either duplicate the Blueprint made in this tutorial or you can go to the Blueprint Library and add the dbt Blueprint to your organization with a different name.
Create More Vessels with your dbt Blueprint
Since the dbt Blueprint allows you to execute any dbt command against your dbt repository, you can create multiple Vessels using the same Blueprint. Set up multiple Vessels to:
- Run `compile`, `test`, and `run` commands, or execute only a portion of your models.
- Run separately against QA, Staging, and Production environments.
Use a requirements.txt file for package installation
By default, the dbt Blueprint in Shipyard includes the installation of the dbt package. If you would prefer not to manage package installation through Shipyard, you can include a `requirements.txt` file in your dbt project's root directory. If you do this, make sure to remove any overlapping packages from the Shipyard UI.
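For example, a `requirements.txt` that pins the same dbt version used earlier in this tutorial would contain a single line:

```
dbt==0.18.1
```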
Connect with other databases
This tutorial walked through a very specific path to connect to Bigquery, but connecting to other databases is even easier!
First, update your `profiles.yml` to include a profile for any other supported database type. Next, update the specific credential fields to use the Jinja template for environment variables, e.g. `"{{ env_var('DATABASE_CREDENTIAL') }}"`.
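As an illustration, a Snowflake profile might look something like the sketch below; the profile name, role, database, warehouse, and schema values are placeholders you would replace with your own:

```yaml
my_project: # must match the profile: in your dbt_project.yml file
  target: prod
  outputs:
    prod:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: transformer
      database: analytics
      warehouse: transforming
      schema: dbt_prod
      threads: 4
```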
Then, add those specific environment variables and their associated values directly to Shipyard. These will be stored securely and passed to your script every time the Vessel runs.
Store Your dbt Logs and dbt Targets
By default, Shipyard wipes all of your data from the platform as soon as a voyage completes. That means that all of the targets, logging, and documentation created by dbt are immediately deleted. However, it doesn't have to be that way!
If you would like to store the targets or logs generated, you'll have to make a Fleet with Vessels that run after your dbt Vessel. The good news is that this process is extremely easy in Shipyard!
We recommend searching the Blueprint Library for any Blueprints with the name "Upload File". These Blueprints will be able to swiftly upload files in `<dbt-repo-name>/logs` or `<dbt-repo-name>/target` to your cloud storage location of choice.
Create Fleets to Connect dbt to your entire Data Stack
Fleets are a powerful way to share data between Vessels and create complex workflows for any use case. As long as your scripts are written in Python or Bash, they'll be able to run on Shipyard.
With dbt Blueprints under your belt, you can set up Fleets to:
- Kick off data loading jobs with Fivetran and immediately start running your dbt projects upon their completion.
- Tie your action-based scripts (reporting, ML models, API updates, etc.) to the status of individual dbt models being run.
- Send custom emails or Slack messages to vendors when data issues are found with dbt.
We hope this guide has helped you quickly deploy your dbt projects in the cloud with Shipyard. If you have any questions about automating dbt in the cloud, reach out to us at support@shipyardapp.com.
About Shipyard:
Shipyard is a modern data orchestration platform for data engineers to easily connect tools, automate workflows, and build a solid data infrastructure from day one.
Shipyard offers low-code templates that are configured using a visual interface, replacing the need to write code to build data workflows while enabling data engineers to get their work into production faster. If a solution can’t be built with existing templates, engineers can always automate scripts in the language of their choice to bring any internal or external process into their workflows.
The Shipyard team has built data products for some of the largest brands in business and deeply understands the problems that come with scale. Observability and alerting are built into the Shipyard platform, ensuring that breakages are identified before being discovered downstream by business teams.
With a high level of concurrency and end-to-end encryption, Shipyard enables data teams to accomplish more without relying on other teams or worrying about infrastructure challenges, while also ensuring that business teams trust the data made available to them.
For more information, visit www.shipyardapp.com or get started for free.