Version Control for Every Data Workflow
Change is a part of the natural lifecycle of data operations. Your data pipelines should be built to ebb and flow as business needs evolve.
There are many reasons why you might need to update your pipelines over time. Your company starts working with a new vendor that needs data to be formatted in a particular way. You need new tests added to the pipeline to address potential errors. You discover a simpler way to connect your data tools together. These are all common situations faced by data professionals today that require data orchestration platforms to be flexible.
But sometimes, change goes wrong. Someone could delete an integral task that cleaned data, causing errors across the entire pipeline. Whatever the case, you don't just want to know that something errored out. You want to know why it errored out so you can quickly resolve the issue.
At Shipyard, we recognize that every organization has different reasons for making changes. The important part is being able to recognize and track those changes to understand how they impact performance over time.
Version Control for Workflows
Today, we're revealing our new Version Control experience for all paid users of the platform. Every time you save a Fleet (workflow), a new version will be created, giving your team an effective audit log of changes over time. That means every change, whether you're updating code, changing a name, adjusting connections, or adding new schedules, will be tracked.
Everything in One Place
By selecting the Version Control tab, you'll be able to see every update made to a specific Fleet, who made the update, and when it was made. Every version of the Fleet is represented in YAML, letting you quickly see all of the underlying details of your Vessels, Triggers, and Connections at a glance.
Other orchestrators place the burden of version control squarely on your team's shoulders. They expect you to version control your workflows (often on GitHub, GitLab, Bitbucket, etc.) before uploading and registering them to their platform. Because those orchestration platforms do not know the content of previous versions, it is on you to switch contexts, determine the exact updates that were made, and fix issues if needed.
While version controlling your code is still a best practice that we encourage, we believe that the process of version controlling your workflows should be intimately tied to the build process. With this setup, you can more clearly see and restore changes, all from the same location. For more information about the specifics of this new feature, read our documentation.
Version Control Benefits
Comparing Versions
Version Control is at its most powerful when you're able to understand the exact changes made that can affect performance. That's why we have built-in functionality to compare any version of a Fleet to the latest version.
Comparing versions opens a pane that shows the diff between the selected versions. Quickly see which fields were added (in green) or removed (in red) to identify key changes that were made to the Fleet.
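To make the idea of a version diff concrete, here is a minimal Python sketch that compares two YAML texts line by line using the standard library's difflib. The YAML content and its field names are hypothetical and are not Shipyard's actual Fleet schema; the sketch only illustrates the added/removed view and is not how the platform computes its diff.

```python
import difflib

# Hypothetical YAML for two saved versions of a Fleet. The field names are
# illustrative assumptions, not Shipyard's actual schema.
OLD_VERSION = """\
name: Daily Sales Load
vessels:
  - name: Download Data
  - name: Clean Data
  - name: Upload Data
"""

NEW_VERSION = """\
name: Daily Sales Load
vessels:
  - name: Download Data
  - name: Upload Data
  - name: Send Alert
"""


def diff_versions(old, new):
    """Return (added, removed) lines between two YAML texts."""
    added, removed = [], []
    for line in difflib.ndiff(old.splitlines(), new.splitlines()):
        if line.startswith("+ "):
            added.append(line[2:])      # present only in the new version
        elif line.startswith("- "):
            removed.append(line[2:])    # present only in the old version
    return added, removed


added, removed = diff_versions(OLD_VERSION, NEW_VERSION)
for line in removed:
    print(f"- {line}")
for line in added:
    print(f"+ {line}")
```

In this made-up example, the removed line is the cleaning task, which is exactly the kind of change that would explain new downstream errors.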
Restoring Versions
Have an old version of the Fleet that seemed to be working better? With a single click, you can revert the entire Fleet back to that version.
You can even opt to create an entirely new Fleet from an old version, giving you the ability to test changes to the structure of your Fleet beforehand!
Tying Logs to Versions
In a few weeks, we'll also be updating all of our Logs to indicate the Fleet version number that was used at runtime. This information will help teams better identify if changes in status or duration were caused by updates in the Shipyard platform. If issues are seen more frequently without a change in Shipyard, it may indicate that code updates being cloned from GitHub could be the culprit.
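As a rough illustration of how a version number on each log could be used, the Python sketch below groups run records by Fleet version and compares error rate and average duration. The record structure (fleet_version, status, duration_seconds) and the sample values are assumptions made up for this example, not the format of Shipyard's Logs.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical run logs. The field names and values are assumptions for
# illustration and do not reflect Shipyard's actual log format.
RUN_LOGS = [
    {"fleet_version": 3, "status": "success", "duration_seconds": 120},
    {"fleet_version": 3, "status": "success", "duration_seconds": 130},
    {"fleet_version": 4, "status": "errored", "duration_seconds": 310},
    {"fleet_version": 4, "status": "success", "duration_seconds": 290},
]


def summarize_by_version(logs):
    """Return per-version run count, error rate, and average duration."""
    grouped = defaultdict(list)
    for run in logs:
        grouped[run["fleet_version"]].append(run)

    summary = {}
    for version, runs in sorted(grouped.items()):
        errors = sum(1 for run in runs if run["status"] != "success")
        summary[version] = {
            "runs": len(runs),
            "error_rate": errors / len(runs),
            "avg_duration": mean(run["duration_seconds"] for run in runs),
        }
    return summary


for version, stats in summarize_by_version(RUN_LOGS).items():
    print(version, stats)
```

If error rate or duration shifts at a version boundary, the Fleet change is a likely cause. If it shifts within a single version, something outside the Fleet definition, such as the code being cloned at runtime, deserves a closer look.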
Version Control is now available to all paid subscribers and can be tested with any account. Sign up for our free Developer Plan to start creating and editing data pipelines with complex logic.
We're looking forward to seeing how users will take advantage of this functionality to quickly launch, monitor, and share data pipelines!
About Shipyard:
Shipyard is a modern data orchestration platform for data engineers to easily connect tools, automate workflows, and build a solid data infrastructure from day one.
Shipyard offers low-code templates that are configured using a visual interface, replacing the need to write code to build data workflows while enabling data engineers to get their work into production faster. If a solution can’t be built with existing templates, engineers can always automate scripts in the language of their choice to bring any internal or external process into their workflows.
The Shipyard team has built data products for some of the largest brands in business and deeply understands the problems that come with scale. Observability and alerting are built into the Shipyard platform, ensuring that breakages are identified before being discovered downstream by business teams.
With a high level of concurrency and end-to-end encryption, Shipyard enables data teams to accomplish more without relying on other teams or worrying about infrastructure challenges, while also ensuring that business teams trust the data made available to them.
For more information, visit www.shipyardapp.com or get started with our free Developer Plan.