dbt Coalesce 2021 - Day 2 Takeaways

Did you miss out on the second day of the dbt Coalesce conference? Want a quick recap? We've got you covered! There are still 3 more days of content (only 1 left for US time zones) that you can register to attend for free.

Here are our recaps for Day 1 and Day 3.

After attending most of the sessions yesterday, here's what stood out to us the most.

To better explain timelines, break things into smaller Data Products

As data people, we get excited by the prospect of reaching a state of nirvana with all of our tools. We can build things cleaner and faster than ever before with the modern data stack! That results in us focusing a lot on how we accomplish the end goal of setting up clean data that drives action in the business. But when we talk to stakeholders, they don't understand the tools or how they piece together. They only care about the end result.

Stephen Bailey, Director of Data & Analytics at Immuta, shared a great perspective on getting stakeholders to better understand the data team's efforts by breaking down what your team is working on into various smaller products. Instead of talking about the dbt models you built, talk about the "Data as a Product" you were able to create. Instead of talking about the Slack API endpoints you automated, talk about the holistic "Data Alerts" you were able to create. You can easily educate the organization on what each of these data products is, why it matters, and where it lives.

Examples of Data Products

By training users internally on all of these different data products, you can also help them better understand how they all piece together. That way, when they inevitably ask you to build another dashboard and you scope it as 3 weeks of work, you're able to help them better understand that a request for a dashboard is more than just that. It's a request for a Data Replication product, Data as a Product, an Interactive App, and a Key Visualization.

Breaking down a request for a dashboard into multiple data products

We thought this was a super unique way to deconstruct the internal workings of an end product in a way that anyone internally can understand.

Words Matter. Titles Matter. Stop Using Data Scientist.

Emilie Schario, Data Strategist in Residence at Amplify Partners, gave an impassioned talk about how Data Scientist is a title that just doesn't make sense in the data ecosystem. Science is all about experimentation... but experimentation is only part of the job and hardly ever found in the job description itself.

Data Scientist is a bad job title and Data Science is a bad descriptor because it's highly unspecific and not reflective of the majority of the work being done.

By now, we think most people can agree that "Data Scientist" is a bad title, but it's what everyone has rallied behind as the mysterious, powerful data role. At some organizations, this could be an individual building Machine Learning algorithms from scratch. At other organizations, it could be an individual doing analysis in Excel. The problem is that the expectations are inconsistent, so you end up with unhappy, underutilized employees and disappointed companies. If you're in a data role right now, it's your responsibility to call out your own organization for creating bad job descriptions and titles - including your own!

From Emilie's perspective, there should only be four titles in data, with varying levels of seniority: Data Engineer, Analytics Engineer, Data Analyst, and Machine Learning Engineer. We should work towards consolidating titles into these segments based on the skills being employed. Additionally, we should be building out better career ladders and mastery tracks to help people grow in these very specific areas for years to come.

The core four data roles

Analyzing dbt Metadata Yields Strong Improvements

Kevin Chan and Jonathan Talmi from Snapcommerce had a great rapid-fire presentation all about increasing observability of your dbt models. As your efforts with dbt scale, you want to keep track of how effectively everything is running so you can answer questions like:

  • How can you speed up your models?
  • Which models are becoming less performant over time?
  • Which models need to be split out because they're causing bottlenecks?
  • How much is each model costing?

The basic process involves storing dbt artifacts after each run (regardless of whether the run was successful or not). You then dump the data to the warehouse of your choice, clean and join it with your warehouse performance data, and finally visualize the underlying data in a BI tool. This gives you easily digestible ways to explore the information and build alerts so relevant teams know when there are issues with the models they rely on. The Snapcommerce team had beautiful dashboards for sifting through this log-level data with ease.

Runtime Metrics per Model
An example of dbt "modelnecks"
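To make the artifact step a bit more concrete, here's a minimal sketch of flattening a run's metadata into rows you could load into your warehouse. It assumes dbt's standard run_results.json artifact (the field names below match the schema around dbt 0.21/1.0 - check your version); the load into the warehouse and the join against query history are left out.

```python
import json


def parse_run_results(path):
    """Flatten dbt's run_results.json into one row per model invocation."""
    with open(path) as f:
        artifact = json.load(f)

    invocation_id = artifact["metadata"]["invocation_id"]
    generated_at = artifact["metadata"]["generated_at"]

    rows = []
    for result in artifact["results"]:
        rows.append({
            "invocation_id": invocation_id,
            "generated_at": generated_at,
            "unique_id": result["unique_id"],          # e.g. model.my_project.orders
            "status": result["status"],                # success / error / skipped
            "execution_time_s": result["execution_time"],
        })
    return rows


if __name__ == "__main__":
    # After a `dbt run` (or `dbt test`), the artifact lives in the project's target/ directory.
    rows = parse_run_results("target/run_results.json")

    # From here you'd insert `rows` into a warehouse table, join them with your
    # warehouse's query history / cost views, and point your BI tool at the result.
    for row in sorted(rows, key=lambda r: r["execution_time_s"], reverse=True)[:10]:
        print(f"{row['unique_id']}: {row['execution_time_s']:.1f}s ({row['status']})")
```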

Some of our customers are already using Shipyard to facilitate this process for all of their dbt runs and tests - so it was great to see this use case pop up again in the larger dbt community!

Prototyping is more than just an MVP

When you think of prototyping, you typically think of building out an MVP (minimum viable product) that helps you test if an idea will actually be viable at solving a problem. In the software space, we're familiar with this being used as a way to build quick, scrappy solutions. Alex Viana, VP of Data at HealthJoy, shared a different perspective. To him, a prototype is simply a way to figure out the ways that someone intends to interact with something.

Let's say someone requests a specific data point that doesn't exist right now. When you first get this type of request, it's easy to jump straight to the end goal, thinking you'll need to automatically pull some external data, clean it, test it, store it somewhere safe, then connect it to a BI platform. However, a prototype doesn't need to look anything like the anticipated end product. A prototype can be as simple as stating a fake statistic out loud and asking "If I came back to you with this information, what would you do with it?" This line of questioning can open up how people intend to use the data product you're about to build, without you having to put in the weeks of effort to build it.

You can go one step further and create a more advanced prototype by placing some fake data in a dashboard and handing it to someone to see how they react and use it. Executives don't really care if the data is faked - they just want to be sure that the end product works the way they intend to use it.
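For illustration only, that kind of fake dataset can be a few lines of code. Everything below (the column names, the numbers, the CSV filename) is invented for the example; the point is just to have something plausible to drop into a dashboard.

```python
import numpy as np
import pandas as pd

# Fabricate a month of plausible-looking daily metrics to drive a prototype dashboard.
rng = np.random.default_rng(seed=42)
dates = pd.date_range("2021-10-01", periods=30, freq="D")

fake_metrics = pd.DataFrame({
    "date": dates,
    "signups": rng.poisson(lam=120, size=len(dates)),                               # made-up volume metric
    "activation_rate": rng.normal(loc=0.42, scale=0.03, size=len(dates)).round(3),  # made-up rate metric
})

# Hand this file to the BI tool (or the stakeholder) and watch how they try to use it.
fake_metrics.to_csv("prototype_metrics.csv", index=False)
```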

The goal with a prototype is to avoid failures that seem obvious in hindsight: the data being too messy, requirements being poorly scoped, or the end result not truly answering the question someone wants answered. By changing what you consider to be a prototype, you can save everyone's time and build better data products for the organization.


That's a wrap! We hope this gives you a taste of what you missed out on from day 2 of the conference. When you're ready for more, our recap of day 3 of Coalesce 2021 is linked above.

And be sure to check out some of the freebies you can get from attending this free conference!