When you first start building data pipelines, you want a solution that gets you up and running as quickly as possible. The goal is speed and simplicity in the name of driving business value. You need fast access to data and the ability to rapidly experiment.
As your data pipelines inevitably grow and your requirements become more complex, you need a system that lets you make large-scale changes with minimal impact on stability. You need to make bulk edits, stage pipeline edits, and track minor changes over time.
At Shipyard, we recognize that every organization is at a different stage in their data operations journey and every team member has a unique working style. We believe that the data orchestration platform you choose shouldn't force you to sacrifice one option for another. As the only data orchestration platform that's truly language agnostic, we decided to make the format you create workflows in agnostic as well.
Building Workflows with Code
Today, we're revealing our new YAML editor that allows you to build and edit your workflows with code. Now, with the flip of a switch, you can seamlessly navigate between our visual editor and our YAML editor. You're always in control of how you choose to build your workflows, selecting the approach that best fits the problem you need to solve.
Why YAML?
YAML is quickly becoming the configuration syntax of choice for data teams, spurred in large part by dbt Core's reliance on it for structuring automated data models. We chose YAML because it's easy to read, less prone to syntax entry errors, and allows the platform to continue to deliver on the promise of being language agnostic.
How Shipyard Defines Workflows as Code
When looking at the YAML file, Fleets (our version of workflows) can be broken down into a few components:
- Vessels - What are the individual tasks that need to be run?
- Inputs/Code - What defines how the Vessel should run? Are you running custom code in Python, Bash, or Node? Or are you providing inputs to a low-code Blueprint?
- Environment Variables/Arguments - What external information needs to be securely passed to the Vessel at runtime?
- Language Packages/System Packages - What needs to be installed in a virtual environment to ensure the Vessel will run successfully?
- Guardrails - How should the Vessel handle potential errors?
- Connections - What order should the Vessels run in and what status should be looked for?
- Triggers - When should the Fleet be scheduled to run?
- Notifications - Who should receive alerts if something goes wrong (or right) with the Fleet?
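To make the component list above concrete, here is a minimal Python sketch of a Fleet modeled as a nested dictionary, plus a helper that derives a valid run order from the Connections. Every key name (`vessels`, `connections`, `guardrails`, and so on), the file names, and the `run_order` helper are hypothetical and chosen only to mirror the list; they are not Shipyard's actual YAML schema, which is described in the documentation.

```python
from graphlib import TopologicalSorter

# Hypothetical Fleet definition: every key name here is an illustrative
# assumption that mirrors the component list, not Shipyard's real schema.
fleet = {
    "name": "daily_sales_report",
    "vessels": {
        "extract": {
            "source": {"type": "code", "language": "python", "file": "extract.py"},
            "environment_variables": {"API_KEY": "SHIPYARD_HIDDEN"},
            "language_packages": ["requests"],
            "guardrails": {"retry_count": 2},
        },
        "transform": {
            "source": {"type": "code", "language": "bash", "file": "transform.sh"},
            "arguments": ["--date", "today"],
            "system_packages": ["jq"],
            "guardrails": {"retry_count": 0},
        },
        "load": {
            "source": {"type": "blueprint", "inputs": {"table": "sales"}},
            "guardrails": {"retry_count": 1},
        },
    },
    "connections": [
        {"from": "extract", "to": "transform", "on": "success"},
        {"from": "transform", "to": "load", "on": "success"},
    ],
    "triggers": [{"schedule": "0 6 * * *"}],
    "notifications": {"emails": ["data-team@example.com"], "on": ["error"]},
}


def run_order(fleet_definition):
    """Return Vessel names in an order that respects every connection."""
    # Map each Vessel to the set of Vessels that must finish before it.
    upstream = {name: set() for name in fleet_definition["vessels"]}
    for connection in fleet_definition["connections"]:
        upstream[connection["to"]].add(connection["from"])
    return list(TopologicalSorter(upstream).static_order())


print(run_order(fleet))
```

For this linear chain, `run_order` yields extract, then transform, then load. The sketch keeps Vessels and Connections in separate sections so that changing the order of execution never requires touching a Vessel's own settings.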
Check out our documentation for an in-depth overview of how you can start working with our YAML editor.
Build with Ease
While you can always choose to write your YAML definition from the ground up, we want to make it as easy as possible to get it right. That's why our Fleet Builder gives you the option to select pre-defined code snippets so you can quickly add in a new Vessel without ever needing to look up a single page of documentation. Selecting this option provides you with pre-filled fields and comments to make the setup process a snap.
Built-in Security
Keeping true to our promise that your environment variables and passwords are securely stored and never revealed outside of the first entry-point, we show these fields as SHIPYARD_HIDDEN within the YAML. This gives you the confidence that anyone interacting with a Fleet using YAML will never expose credentials to outside parties, whether sharing their screen or sharing the YAML definition internally.
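The masking behavior described above can be illustrated with a short Python sketch. It assumes a workflow definition held as nested dictionaries and lists, with secrets stored under a key named `environment_variables`; that key name and the `mask_secrets` helper are hypothetical, and only the `SHIPYARD_HIDDEN` placeholder comes from the platform itself. This is not how Shipyard implements masking internally.

```python
import copy

HIDDEN = "SHIPYARD_HIDDEN"  # placeholder shown in place of secret values


def mask_secrets(definition, secret_keys=("environment_variables",)):
    """Return a copy of the definition with secret values replaced.

    `secret_keys` is a hypothetical list of sections that hold secrets.
    The original definition is left untouched.
    """
    masked = copy.deepcopy(definition)

    def walk(node):
        if isinstance(node, dict):
            for key, value in list(node.items()):
                if key in secret_keys and isinstance(value, dict):
                    # Keep the variable names visible, hide only the values.
                    node[key] = {name: HIDDEN for name in value}
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(masked)
    return masked


vessel = {
    "name": "extract",
    "environment_variables": {"API_KEY": "abc123", "DB_PASSWORD": "hunter2"},
    "arguments": ["--date", "today"],
}
print(mask_secrets(vessel))
```

Keeping variable names visible while hiding values means a shared definition still documents which credentials a Vessel expects without revealing any of them.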
Workflow as Code Benefits
The addition of our YAML editor opens up a new world of possibilities for teams that want to build and share powerful data pipelines with their organization. Here are a few use cases our customers have shared with us that we're excited to now support.
Copy and Paste Fleets with Ease
Have a Fleet that you want to move to a different project? Want to share a Fleet definition securely with someone else in the data community? Want to run the same workflow with only a few changed steps? The new YAML editor makes it easy to copy and paste your workflow elsewhere, make a few tweaks, and publish the results.
Edit Fleets in Bulk
We built our YAML editor on the same interface as VS Code, so most data team members should feel right at home with built-in syntax highlighting, collapsible elements, code minimaps, and more.
The most notable feature is the built-in find and replace module. Need to update your credentials across multiple Vessels? Or update Python package versions across the board? Our new YAML editor lets you find and replace with regex across the entire workflow.
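As an illustration of the kind of bulk edit a regex find and replace makes possible, the Python sketch below bumps a pinned package version everywhere it appears in a workflow definition. The YAML snippet, its key names, and the package versions are made up for the example and do not reflect Shipyard's actual schema; in the editor itself you would type the pattern and replacement into the find and replace panel rather than run a script.

```python
import re

# Hypothetical workflow text; the layout and key names are assumptions.
yaml_text = """\
vessels:
  extract:
    packages:
      - pandas==1.5.3
  transform:
    packages:
      - pandas==1.5.3
      - requests==2.31.0
"""

# Match "pandas==" followed by any dotted version number.
pattern = r"(pandas==)\d+(?:\.\d+)*"

# \g<1> re-inserts the captured "pandas==" prefix before the new version.
updated_text, replacements = re.subn(pattern, r"\g<1>2.2.2", yaml_text)

print(updated_text)
print(f"{replacements} occurrences updated")
```

Capturing the package name as a group and rewriting only the version keeps the edit narrow: the other pinned package in the same list is left alone.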
Two-Way Syncing
When building workflows with code, it can sometimes be difficult to visualize what the end result will look like. Shipyard makes it easy to make quick edits to your workflow with YAML, then flip the editor mode to see the visual representation of what you just built. If the visual, drag-and-drop editor is easier for a team member, they can continue building workflows that way and be confident that the underlying code will get generated for other team members to access.
Increased Transparency for Workflows
The addition of workflow as code within Shipyard makes the execution of your workflows even more transparent. You can now easily store workflow definitions in an external platform for safekeeping, see in writing how Vessels are connected together, and evaluate the inputs and code of every Vessel in a Fleet at once.
A Glimpse at Shipyard's Future
Opening up Shipyard to workflow as code is only the beginning of our journey to make Shipyard more developer-friendly and powerful for advanced use cases. Here are some of the many features we have planned over the next year that are now possible with the release of this update.
In-App Version Control
Providing workflows as code allows us to take the next step of showing you the visual diff between every update made to your Fleet. In the future, you'll be able to compare the exact updates made in the platform and restore your Fleets to a historical state.
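Because a Fleet is now plain text, two versions of it can already be compared with any text diff tool. The Python sketch below uses the standard library's `difflib` to show what such a comparison looks like; the YAML content, the `retry_count` key, and the `fleet_diff` helper are hypothetical, and this is not a preview of how the in-app feature will be built.

```python
import difflib

# Two hypothetical versions of the same Fleet definition.
previous_version = """\
vessels:
  extract:
    retry_count: 1
    file: extract.py
"""

current_version = """\
vessels:
  extract:
    retry_count: 3
    file: extract.py
"""


def fleet_diff(before, after):
    """Return the unified diff between two definitions as a list of lines."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="fleet.yaml (previous)",
            tofile="fleet.yaml (current)",
            lineterm="",
        )
    )


for line in fleet_diff(previous_version, current_version):
    print(line)
```

Lines starting with `-` exist only in the previous version and lines starting with `+` exist only in the current one, which is the same information a visual diff would present side by side.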
Pre-Built Solution Templates
As the solutions you can build with Shipyard continue to expand, we want to make it easier than ever for new data teams to get started building Fleets with the platform. In the future, we plan to provide "pre-built solution templates" either by allowing you to copy/paste YAML into the editor, or by selecting from a list of solutions directly from the application.
Webhook Variable Overrides
With enhanced workflow definitions and version control, Shipyard will eventually have the ability to trigger a Fleet via a webhook while overriding specific environment variables. This will allow users to run the same process in slightly different ways based on different clients, users running the process, or times of access.
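Since this feature is still planned, the following Python sketch only illustrates the general idea of per-run overrides: start from a Fleet's default environment variables and replace selected values for a single run. The `apply_overrides` helper, the variable names, and the rule that unknown variables are rejected are all assumptions made for the example, not a description of Shipyard's eventual behavior.

```python
def apply_overrides(defaults, overrides):
    """Return the environment for one run without modifying the defaults.

    Hypothetical rule: only variables the Fleet already defines may be
    overridden, so a typo in a webhook payload fails loudly.
    """
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"Unknown environment variables: {sorted(unknown)}")
    # Later keys win, so override values replace the defaults.
    return {**defaults, **overrides}


# Defaults stored with the Fleet (hypothetical names and values).
fleet_defaults = {"CLIENT_ID": "acme", "REPORT_RANGE": "7d"}

# Values that might arrive in a webhook payload for a single run.
webhook_payload = {"CLIENT_ID": "globex"}

print(apply_overrides(fleet_defaults, webhook_payload))
```

Building a new dictionary for each run, rather than editing the stored defaults, keeps one client's run from leaking its settings into the next.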
Git Controlled Workflows
To better integrate with data teams' engineering workflows, we're planning on adding the ability to sync YAML-defined workflows directly with GitHub. In the same way that we currently allow users to sync their code with a specific GitHub repository and branch/tag, we would enable teams to control their Fleets directly from GitHub, with the live definition being pulled in every time the Fleet is triggered.
Import from any DAG
If you're using a different tool for orchestration that also supports workflow as code (or code as workflow), we plan on making it as easy as possible to convert your actively running DAGs in other popular orchestration tools directly to something recognizable by Shipyard, increasing the interoperability of the platform.
The Workflow as Code YAML Editor is now available to all subscribers and can be tested with any account. Sign up for our free Developer Plan to start creating and editing data pipelines with complex logic.
We're looking forward to seeing how users will take advantage of this functionality to quickly launch, monitor, and share data pipelines!
About Shipyard:
Shipyard is a modern data orchestration platform for data engineers to easily connect tools, automate workflows, and build a solid data infrastructure from day one.
Shipyard offers low-code templates that are configured using a visual interface, replacing the need to write code to build data workflows while enabling data engineers to get their work into production faster. If a solution can’t be built with existing templates, engineers can always automate scripts in the language of their choice to bring any internal or external process into their workflows.
The Shipyard team has built data products for some of the largest brands in business and deeply understands the problems that come with scale. Observability and alerting are built into the Shipyard platform, ensuring that breakages are identified before being discovered downstream by business teams.
With a high level of concurrency and end-to-end encryption, Shipyard enables data teams to accomplish more without relying on other teams or worrying about infrastructure challenges, while also ensuring that business teams trust the data made available to them.
For more information, visit www.shipyardapp.com or get started with our free Developer Plan.