Data Build Tool, better known as dbt, is slowly sweeping the data world off its feet. As many organizations shift their data operations from ETL to ELT, dbt makes it easier for analysts who work with SQL every day to define and document data models for the organization. It's becoming a staple technology for transforming massive amounts of raw data into usable tables and views.
While dbt is easy to start using, we've found that for many teams, getting dbt set up and running effectively can be a cumbersome process. Running models tied to a team member's local laptop isn't sustainable. Configuration on custom servers often requires DevOps knowledge to ensure a sustainable, error-free setup. Executing with existing workflow tools requires you to learn new proprietary setups and workarounds to get dbt working.
These complexities make it clear why dbt themselves offer their dbt Cloud service just for running dbt. However, most data teams need dbt to be interconnected with the rest of their data operations processes. Each set of dbt models relies on specific data sources being loaded effectively, and each table or view powers reporting, ML models, and last-mile actions. We wanted to create an easier way to launch dbt Core in the cloud and connect it to your entire data stack.
Already using dbt Cloud? We have Blueprints to integrate with that setup too!
Introducing the dbt Core Blueprint Guide
Today, we're excited to tackle this problem head-on with the launch of our new guide for creating a dbt Core Blueprint. This Blueprint will streamline the steps required to get dbt up and running in the cloud, allowing data teams to deploy their latest dbt models to production rapidly with an incredible amount of control and visibility over their setup. Team members can execute dbt code by simply providing the command to run. There's no need to fiddle with infrastructure or touch the underlying code.
Deploy to Production 10x Faster
To create a dbt Core Blueprint, you'll first sync your dbt repository to Shipyard using our GitHub Code Sync integration, so you can start automating all of your data models in the cloud in minutes. Once synced, the dbt Core Blueprint will allow your team to run any dbt CLI command against up-to-date code living on GitHub. From there, it's just a matter of scheduling your dbt run and dbt test commands to run independently or as a part of your larger data workflows.
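As a rough illustration of what scheduling those two commands amounts to, the sketch below runs dbt run and then dbt test in order and stops if the first one fails. It is plain Python, not Shipyard's implementation: the run_steps helper and its runner parameter are hypothetical names, and the default runner assumes the dbt CLI is installed and called from inside the dbt project directory.

```python
import subprocess
from typing import Callable, List, Sequence

# The two commands mentioned above: build the models, then test them.
DBT_STEPS: List[List[str]] = [["dbt", "run"], ["dbt", "test"]]


def shell_runner(command: Sequence[str]) -> int:
    """Run one command and return its exit code (assumes dbt is on PATH)."""
    return subprocess.run(list(command)).returncode


def run_steps(
    steps: Sequence[Sequence[str]],
    runner: Callable[[Sequence[str]], int] = shell_runner,
) -> List[str]:
    """Run the steps in order and stop at the first non-zero exit code.

    Returns the commands that completed successfully, as strings.
    """
    completed: List[str] = []
    for step in steps:
        if runner(step) != 0:
            break  # a failed `dbt run` should not be followed by `dbt test`
        completed.append(" ".join(step))
    return completed
```

Calling run_steps(DBT_STEPS) would execute both commands with the real CLI. The runner parameter exists only so the ordering logic can be exercised without dbt installed.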
For in-depth instructions, follow our guide on deploying dbt with Shipyard.
Optimize your CI/CD Flow
Due to our unique integration with GitHub, the dbt Core Blueprint enables your team to continue building and updating data models with their existing git flow, all while letting Shipyard handle the constant execution of their work in the background.
This model means that your version control continues to live within GitHub while Shipyard helps you keep better track of how your dbt code is being used and how it's connected to the larger picture of your data operations. If you want to know what version of the code ran at any moment in time, Shipyard will show the commit hash alongside runtime metadata and dbt's logged output.
Connect All Of Your Data Tools
With the creation of a dbt Core Blueprint, you can quickly connect the execution of dbt to any other script that you write in Bash, Node.js, or Python. Additionally, you can connect it to other common processes that need to be run against external data tools (Snowflake, Redshift, BigQuery, etc.) using our Blueprint Library.
Shipyard is designed to automate and connect ANY code and packages that you might be using - not just dbt. With the Shipyard platform, your team has greater flexibility to create a pipeline where the steps share data and talk to each other, rather than relying on flimsy timing-based pipelines between siloed systems.
Here are a few examples of how you can connect dbt to other services to create a seamless solution for your data team.
- Run a Fleet that kicks off data loading jobs with Fivetran and immediately starts running your dbt projects upon their completion.
- Tie your action-based scripts (reporting, ML models, API updates, etc.) to the status of individual dbt models. Develop a Fleet with conditional paths where downstream Vessels won't execute unless the upstream dbt models are run successfully.
- Run a Fleet that sends custom emails or Slack messages to vendors when dbt returns data issues.
- Run a Fleet that executes all of your dbt models and stores the logs externally.
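The conditional-path idea in these examples can be sketched in a few lines of plain Python. This is not Shipyard's Fleet or Vessel API: Step, run_pipeline, build_example, and the step names are all hypothetical, and each step is simply a callable that returns True on success.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One unit of work plus the steps to trigger on success or failure."""
    name: str
    action: Callable[[], bool]  # returns True when the step succeeds
    on_success: List["Step"] = field(default_factory=list)
    on_failure: List["Step"] = field(default_factory=list)


def run_pipeline(step: Step, results: Optional[Dict[str, bool]] = None) -> Dict[str, bool]:
    """Run a step, then follow only the branch that matches its outcome."""
    if results is None:
        results = {}
    ok = step.action()
    results[step.name] = ok
    for next_step in (step.on_success if ok else step.on_failure):
        run_pipeline(next_step, results)
    return results


def build_example(dbt_succeeds: bool) -> Step:
    """Hypothetical flow: load data -> dbt run -> report, or alert on failure."""
    report = Step("refresh_report", lambda: True)
    alert = Step("notify_team", lambda: True)
    dbt = Step("dbt_run", lambda: dbt_succeeds, on_success=[report], on_failure=[alert])
    return Step("load_raw_data", lambda: True, on_success=[dbt])
```

In this sketch the report step is never reached when the dbt step fails, and the alert step runs instead, which is the behavior a timing-based schedule cannot guarantee.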
Scale your dbt Usage
Your team can use the same dbt Blueprint repeatedly, tracking each distinct usage across the organization. If you want to run different commands, create a new Vessel with the same Blueprint. If you want to use a slightly different version of the code, or different environment variables, duplicate the Blueprint and make adjustments.
This setup makes it easy to scale how effectively your team uses dbt.
- Split out your dbt commands to run subsets of models in your projects simultaneously, each with its own multi-threading. With Shipyard's dynamically scaling infrastructure, there's no longer a need to run your entire project as a single operation.
- Build out projects and workflows that are specific to subsets of your dbt models, empowering your team to better model end-to-end data touchpoints while eliminating errors downstream.
- Run different dbt versions (QA, development, production, etc.) on the same infrastructure without any setup changes. Quickly test and compare how code updates affect the overall output.
- Update all of your dbt Vessels to the latest tagged release with one click. Made a mistake? Roll back your changes in minutes.
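To make the first point in this list concrete, here is a minimal Python sketch of splitting one project into subsets that run side by side, each with its own thread count. It uses dbt's --select and --threads flags (older dbt releases used --models for selection). The helper names and the selector values are hypothetical, this is not how Shipyard schedules work internally, and it assumes each command can run in isolation, for example in its own checkout of the project.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def build_command(selector: str, threads: int = 4) -> List[str]:
    """Build a `dbt run` command limited to one subset of models."""
    return ["dbt", "run", "--select", selector, "--threads", str(threads)]


def shell_runner(command: Sequence[str]) -> int:
    """Run one command and return its exit code (assumes dbt is on PATH)."""
    return subprocess.run(list(command)).returncode


def run_subsets(
    selectors: Sequence[str],
    threads: int = 4,
    runner: Callable[[Sequence[str]], int] = shell_runner,
) -> Dict[str, int]:
    """Run one dbt command per selector concurrently; map selector -> exit code."""
    commands = {s: build_command(s, threads) for s in selectors}
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        futures = {s: pool.submit(runner, cmd) for s, cmd in commands.items()}
        return {s: future.result() for s, future in futures.items()}
```

For example, run_subsets(["staging", "marts"], threads=8) would launch two dbt processes at once, where staging and marts stand in for whatever folders or tags your project actually uses.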
Get Started Today
The dbt Core Blueprint is now available to all subscribers and can be tested with any account. Shipyard is making it easier than ever to automate your dbt repository in the cloud. Sign up for our free Developer Plan to get started automating your dbt projects and follow our guide for deploying dbt in the Cloud.
We're looking forward to seeing how users will take advantage of this new blueprint to implement dbt in production quickly and deploy data solutions across their modern data stack.
About Shipyard:
Shipyard is a modern data orchestration platform for data engineers to easily connect tools, automate workflows, and build a solid data infrastructure from day one.
Shipyard offers low-code templates that are configured using a visual interface, replacing the need to write code to build data workflows while enabling data engineers to get their work into production faster. If a solution can’t be built with existing templates, engineers can always automate scripts in the language of their choice to bring any internal or external process into their workflows.
The Shipyard team has built data products for some of the largest brands in business and deeply understands the problems that come with scale. Observability and alerting are built into the Shipyard platform, ensuring that breakages are identified before being discovered downstream by business teams.
With a high level of concurrency and end-to-end encryption, Shipyard enables data teams to accomplish more without relying on other teams or worrying about infrastructure challenges, while also ensuring that business teams trust the data made available to them.
For more information, visit www.shipyardapp.com or get started for free.