Who Does Data Orchestration?
This is part five of a six-part series on ‘Simplifying Data Orchestration.’ Expertise is shown not by using complexity, but by the ability to take a complex topic and break it down for broader audiences.
Introduction
In the first article in this series, I mentioned the discovery questions. Some common questions while researching a new tool are: who does data orchestration, what is data orchestration, when does data orchestration happen, where does data orchestration happen, why does data orchestration exist, and how do you do data orchestration? Concrete answers to these questions are hard to find when evaluating a new product, tool, or way of doing things. In this article, I'll answer the "who" question.
I went back and forth on the title of this article. The alternate title was “who is data orchestration for?” That would have been a very different article. To summarize my answer to the alternate title question, data orchestration is for everyone. If you ever look at any internal data in your company, then data orchestration happens for you. It makes sure you have the right data, at the right time, in the right place.
Job Titles vs Skills
However, the “who” question I address instead is about finding the orchestrators (pun intended) of data orchestration. One way to answer “who” is to look at which job titles are responsible for the task. It’s no secret that job titles in data are confusing. There is a lot of overlap and organizational nuance that factors into different job descriptions and responsibilities. For an in-depth overview that includes a map of different data roles, I highly recommend checking out this article.
Explaining “who” can execute data orchestration using job titles involves a lot of exceptions and dependencies. Instead, let's look at the knowledge and skills needed for orchestrating. A job might not sound related to data in any way based on the title, but it could involve the skills and responsibilities that lend themselves to data orchestration.
Skills and Knowledge Found in Data Orchestration Users
These are some skills that you commonly find in people who do data orchestration.
- Coding Languages: Python, Bash, SQL, JavaScript (Node.js), cron expressions for scheduling, and others
- Data Storage: databases, data warehouses, and other data storage systems
- Data Modeling
- APIs and Web Services
- Containerization Tools & Frameworks: Docker, Kubernetes, and others
- Data Governance: data lineage, metadata management, and data security
- Version Control: Git, SVN, Mercurial, and others
This is not an exhaustive list. You don’t need all of these skills to do data orchestration in every platform, and some platforms require their own special skills that aren’t listed.
If you read through that list and thought, “wow, I’m not nearly technical enough to do data orchestration,” you’re not alone. But how can this be a series about simplifying data orchestration if you need technical experts to execute it in the first place? That’s a great question. You don’t need every single one of these skills to be involved in data orchestration. Which ones you need depends heavily on the specific data orchestration tool you’re using to get the job done. Let’s break down some of those list items so you can see what I mean.
Coding Languages
Most, if not all, of the other data orchestration platforms require coding prowess. Can’t write code? Then you can’t use their tool. That’s what makes Shipyard different. You can use some of our open source low-code and no-code blueprints within our interface to create and orchestrate a data pipeline. I know that some data experts read that sentence and groaned. One advantage of writing code is that you can customize workflows in ways that a low-code platform doesn’t allow. Well, surprise! You can also write everything in code in Shipyard if that’s what you want to do.
So do you need to know how to code in order to do orchestration? No. However, there are many customizations and specific needs that can be addressed if you know how to code. It’s important to look at your organization’s needs and skills when selecting what tool you want to use. We at Shipyard are continually adding to our hundreds of open source low-code blueprints to make orchestration more accessible to users who aren’t code-savvy.
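To make that trade-off concrete, here is a minimal sketch of what a code-defined workflow can look like in plain Python. It is not Shipyard's API or any other platform's. The task names (`extract`, `transform`, `load`) and the `run_pipeline` helper are hypothetical, chosen only to illustrate one thing code lets you control directly: the order in which steps run.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(tasks, dependencies):
    """Run each task once, only after the tasks it depends on have run.

    tasks: dict mapping a task name to a zero-argument callable
    dependencies: dict mapping a task name to the set of names it waits on
    Returns the task names in the order they ran.
    """
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name]()
    return order


# Hypothetical tasks: each one just records what a real step would do.
log = []
tasks = {
    "extract": lambda: log.append("pulled rows from the source"),
    "transform": lambda: log.append("cleaned and aggregated rows"),
    "load": lambda: log.append("wrote rows to the warehouse"),
}
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

order = run_pipeline(tasks, dependencies)
print(order)
```

A low-code blueprint hides this kind of wiring behind an interface. Writing it yourself means you decide what happens between steps, at the cost of having to maintain the code.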
The Necessary and Recommended Skills
In theory, you could use Shipyard’s low-code blueprints and build a pipeline with some credentials and a dream. That is the bare minimum level of skill required in Shipyard. All of the other skills on that list? Don’t need ‘em.
Whoa, now. Let’s not be hasty. Just because you can doesn’t mean that you should! Some of the items on the list that are more knowledge-based than skill-based are important if you want to avoid creating problems down the road. Out of the aforementioned skills, the two I’d recommend anyone have a base understanding of are data platforms and data modeling.
Knowledge of Data Platforms helps you:
- find the data you want
- see the granularity or aggregation level of data
- identify what metrics you need to create or calculate later on
Knowledge of Data Modeling helps you:
- ensure data integrity using clear rules and constraints for how data should be related
- avoid errors like overcounting or misattributing rows of data
- increase performance of your data platform, BI tool, and many other areas of the pipeline
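The overcounting point is easier to see with a small example. The sketch below uses Python's built-in `sqlite3` module and two made-up tables, `orders` and `shipments`, which are assumptions for illustration only. One order ships in two packages, so a join at the shipment level repeats that order's amount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE shipments (shipment_id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO orders VALUES (1, 100.0), (2, 50.0);
    -- Order 1 shipped in two packages, so it has two shipment rows.
    INSERT INTO shipments VALUES (10, 1), (11, 1), (12, 2);
""")

# Naive: the join produces one row per shipment, so order 1 is counted twice.
naive_total = conn.execute(
    "SELECT SUM(o.amount) FROM orders o "
    "JOIN shipments s ON s.order_id = o.order_id"
).fetchone()[0]

# Grain-aware: keep one row per order and only check that a shipment exists.
correct_total = conn.execute(
    "SELECT SUM(amount) FROM orders "
    "WHERE order_id IN (SELECT order_id FROM shipments)"
).fetchone()[0]

print(naive_total)    # 250.0
print(correct_total)  # 150.0
```

Neither query raises an error, which is why this kind of mistake is easy to miss. Knowing that `orders` has one row per order and `shipments` has one row per package is what tells you the first total is inflated.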
Realistically, you wouldn’t want someone with no prior technical knowledge or experience to be the only one in charge of your data orchestration pipelines. However, bridging the gap in technical skill between principal data engineers and business users increases business value. And it’s possible using Shipyard.
Conclusion
So, who does data orchestration? People who understand their data sources and platforms and have a working knowledge of data modeling. It’s also people who have every technical skill I listed above and then some. Just because you can’t do everything in data orchestration doesn’t mean you should do nothing. Transparency and collaboration are valuable assets to any data team that wants to prove its ROI to the organization.
If you've been following along with this series, we've tackled the who, what, when, why, and how of data orchestration. That leaves only one last question: where? Stay tuned for that article next week! In the interim, check out our Substack of articles that our internal team curates weekly from all across the data space.