One of the most common data orchestration tasks involves connecting your tools together. Teams need to trigger existing tools like Fivetran, dbt Cloud, and Tableau one after the other. In this type of data workflow, you can’t trigger the next tool until the previous tool has finished completely.
When you’re dealing with large datasets, these tools can sometimes take multiple hours to finish the job. All the while, your orchestration tool has to just sit and wait.
Because Shipyard charges for runtime, we felt it was unfair to charge our customers for time our servers spend idling while they wait for someone else’s service to finish running.
The Initial Solution
At Shipyard, we have a dedicated team that builds and maintains our open-source, low-code Blueprints (templates). Users of our platform build Vessels (tasks) from these Blueprints and connect them together to make a Fleet (workflow).
At first, we tried to solve this problem with Blueprints designed solely to “Check Status.” These Blueprints quickly queried the service for the status of a given job ID and returned it. If the job had succeeded, the Vessel stopped. If the status was anything other than success, the Vessel failed and triggered a retry, which could be delayed anywhere from five minutes to an hour.
The end result was that a job could take multiple hours to complete but users would only be charged for mere minutes.
However, in trying to solve this problem, we ended up creating new ones:
- Users who wanted to trigger external jobs didn’t realize that the Blueprint they chose didn’t check for job status. This meant that workflows would be created where dbt models were running before Fivetran finished loading data, or Tableau dashboards were refreshing extracts before models had finished running.
- Any workflow that triggered an external job required a two-step setup, plus adjustments to guardrails, to get it working the “right way.” This made building on our platform more complex.
- Since Shipyard only retries on failures, we designed every Check Status Blueprint to fail by default. As a result, there was no good way to distinguish between a successful check that found the external job had errored and an actual error that occurred while checking the job.
- We mapped final job statuses to exit codes in the hope of preventing retries once we knew the job had completed with an error. This worked on paper, but in practice it was confusing: if retries were set to 10 and only five had been completed, users saw nothing but an error in the logs, and it was never obvious how to interpret a final retry count lower than expected.
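A minimal sketch of the exit-code approach helps show where it broke down. Everything in it is hypothetical: the status strings, the exit code values, and the `NO_RETRY_CODES` setting are illustrative assumptions, not Shipyard’s actual Blueprint code or retry configuration. `run_with_retries` simulates a platform that retries a failed check until a “do not retry” code appears or the retry budget runs out.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical exit codes for a one-shot "Check Status" script.
EXIT_SUCCEEDED = 0      # external job finished successfully
EXIT_STILL_RUNNING = 1  # non-zero on purpose so the platform schedules a retry
EXIT_JOB_ERRORED = 2    # external job reached a final error state
EXIT_CHECK_FAILED = 3   # the status request itself failed (auth, network, ...)

# Hypothetical platform setting: exit codes that should NOT trigger a retry.
NO_RETRY_CODES = {EXIT_JOB_ERRORED}


def exit_code_for(status: Optional[str]) -> int:
    """Map an external job status (None = the check itself failed) to an exit code."""
    if status is None:
        return EXIT_CHECK_FAILED
    if status == "succeeded":
        return EXIT_SUCCEEDED
    if status in {"failed", "cancelled"}:
        return EXIT_JOB_ERRORED
    return EXIT_STILL_RUNNING


def run_with_retries(statuses: Iterable[Optional[str]], max_retries: int) -> Tuple[int, str]:
    """Simulate a retry-on-failure platform running the check once per attempt.

    Returns (attempts used, final Vessel result).
    """
    attempts = 0
    for status in statuses:
        attempts += 1
        code = exit_code_for(status)
        if code == EXIT_SUCCEEDED:
            return attempts, "success"
        # Stop on a "do not retry" code, or once the first attempt plus all retries are used.
        if code in NO_RETRY_CODES or attempts > max_retries:
            return attempts, "error"
    return attempts, "error"


if __name__ == "__main__":
    # External job errors on the 6th check: stops after 5 of 10 retries.
    print(run_with_retries(["running"] * 5 + ["failed"], max_retries=10))
    # Status check fails every time: all 10 retries are used.
    print(run_with_retries([None] * 11, max_retries=10))
```

With retries set to 10, an external job that errors on the sixth check stops after five retries, while a status check that fails every time uses all ten. Both end as an `error`, which is the ambiguity users ran into when reading the logs.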
Over time, feedback taught us that we had built a suboptimal workaround for restrictions we ourselves had imposed. So we went back to the drawing board.
The Latest Solution
All-in-one Blueprints
As of today, all of our Blueprints that trigger external jobs have a built-in option to “Wait for Completion.” When this is enabled, Shipyard automatically “pokes” the external tool again and again until the executed job has completed, then mirrors the job’s final status.
This update has the added benefit of ensuring that Vessels finish within a minute of the external job completing, so Fleets execute faster and data is delivered sooner.
We’ve also updated the logging of our trigger Blueprints to indicate every time we check against the external service so you can better understand what’s going on under the hood. This means that if something takes 30 minutes, you’ll see multiple logged attempts and the associated status of each attempt.
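Conceptually, “Wait for Completion” is a polling loop. The sketch below assumes a caller-supplied `get_status(job_id)` function and the status names `succeeded`, `failed`, and `cancelled`; these, the 60-second default interval (picked to match the “within a minute” behavior described above), and the timeout safety valve are assumptions for illustration, not Shipyard’s implementation. The `sleep` parameter is injectable so the loop can be exercised without actually waiting.

```python
import time
from typing import Callable

# Hypothetical terminal statuses; each external service uses its own names.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_completion(
    get_status: Callable[[str], str],
    job_id: str,
    poll_interval_s: float = 60.0,
    timeout_s: float = 24 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll an external job until it reaches a terminal status, logging every check.

    Returns the final status so the caller can mirror it. Raises TimeoutError if
    the job has not finished after roughly timeout_s seconds of polling.
    """
    waited = 0.0
    attempt = 0
    while True:
        attempt += 1
        status = get_status(job_id)
        # One log line per check, so long waits show every attempt and its status.
        print(f"Check {attempt}: job {job_id} has status {status!r}")
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"job {job_id} still {status!r} after {waited:.0f}s")
        sleep(poll_interval_s)
        waited += poll_interval_s


if __name__ == "__main__":
    fake_statuses = iter(["queued", "running", "succeeded"])
    final = wait_for_completion(
        lambda _job_id: next(fake_statuses), "job-123", sleep=lambda _s: None
    )
    exit_code = 0 if final == "succeeded" else 1  # mirror the external job's result
    print(f"Final status {final!r}; Vessel exits with {exit_code}")
```

Returning the final status, rather than exiting inside the loop, lets the caller mirror it: exit 0 when the external job succeeded and non-zero otherwise, so a failed sync also fails the Vessel.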
This is the first step towards improving these Blueprints, but we have one more exciting update in store for our customers.
Free Runtime for all Trigger Blueprints
As of today, all Vessels built with Blueprints that trigger external jobs through third-party partners are 100% free to run. These Vessels no longer accrue billable runtime.
All of our free Blueprints include “Trigger” in their names. These include:
- Airbyte - Trigger Sync
- Census - Trigger Sync
- Coalesce - Trigger Job
- dbt Cloud - Trigger Job
- Domo - Trigger Dataset Refresh
- Fivetran - Trigger Sync
- Hex - Trigger Project
- Hightouch - Trigger Sync
- Mode - Trigger Report Refresh
- Rudderstack - Trigger Sync
- Tableau - Trigger Datasource Refresh
- Tableau - Trigger Workbook Refresh
Now that these Blueprints are free to use, the following changes are effective immediately:
- All Trigger Blueprints will wait for completion by default.
- The Billable Runtime value for Vessels built with Trigger Blueprints will display as 0 in the application and in exports from our API.
- Existing “Check Status” Blueprints are now deprecated and can no longer be selected when building a Vessel. They can still be used in existing Fleets, but we are no longer developing them. We encourage users to update their Fleets to use the new “Wait for Completion” functionality, which is more cost-effective and runs faster.
Note that if you create your own Blueprint or run your own script to trigger external jobs, those Vessels will still accumulate billable runtime as normal. The best way around this is to submit a Blueprint request to our team so we can build the same functionality natively.
Impact and Examples
For some of our customers, this change will result in no difference. For others, you may see close to 50% total savings on runtime. It depends entirely on your usage habits.
We’ve created two examples with exactly the same data pipeline goal to help you understand the impact of this update.
Example 1 - All-in-one:
- Trigger Fivetran to ingest data from Intercom (1hr 5m)
- Run dbt Core models using Python (30m)
- Trigger a Hightouch job to update Salesforce with Intercom data (17m)
In this example, Vessels 1 and 3 would both be free, and only the dbt Core Vessel would incur billable runtime.
Example 2 - With Check Status:
- Trigger Fivetran to ingest data from Intercom (1m)
- Check Fivetran job status for success (1hr 15m with 15m retries, each attempt takes 30s)
- Run dbt Core models using Python (30m)
- Trigger a Hightouch job to update Salesforce with Intercom data (1m)
- Check Hightouch job status for success (20m with 5m retries, each attempt takes 30s)
In this similar example, Vessels 1 and 4 would be free, while Vessels 2, 3, and 5 would still incur billable runtime.
In both examples the customer saves money, but the greatest savings come from using the new all-in-one Trigger Blueprints to their full potential. We therefore recommend removing any “Check Status” Blueprints from your Fleets to reduce both your overall duration and your billable runtime!
* Duration = Length of Time Passed. Relates to the speed of the job completing.
** Billable Runtime = Cumulative Sum of Active Server Time. Relates to the amount we charge for.
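The gap between the two examples can be worked out with a little arithmetic. This sketch assumes that Vessels in a Fleet run one after another (so duration is the sum of Vessel durations), that free Trigger Vessels bill nothing, that other Vessels bill their full duration, and that a Check Status Vessel bills only its 30-second attempts, with one check at the start and one after each retry delay. The Vessel labels and the `Vessel`/`totals` helpers are illustrative, and real billing granularity may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Vessel:
    name: str
    duration_min: float                      # wall-clock time the Vessel takes
    free: bool = False                       # Trigger Blueprints bill no runtime
    retry_every_min: Optional[float] = None  # set only for "Check Status" Vessels
    attempt_min: Optional[float] = None      # server time per status check

    def billable_min(self) -> float:
        if self.free:
            return 0.0
        if self.retry_every_min is not None and self.attempt_min is not None:
            # Assumption: one check at the start plus one after every retry delay.
            attempts = int(self.duration_min // self.retry_every_min) + 1
            return attempts * self.attempt_min
        return self.duration_min  # ordinary Vessels bill their full duration


def totals(fleet: List[Vessel]) -> Tuple[float, float]:
    """Return (Fleet duration, billable runtime) in minutes for sequential Vessels."""
    duration = sum(v.duration_min for v in fleet)
    billable = sum(v.billable_min() for v in fleet)
    return duration, billable


example_1 = [
    Vessel("Fivetran - Trigger Sync (wait for completion)", 65, free=True),
    Vessel("dbt Core via Python", 30),
    Vessel("Hightouch - Trigger Sync (wait for completion)", 17, free=True),
]

example_2 = [
    Vessel("Fivetran - Trigger Sync", 1, free=True),
    Vessel("Fivetran - Check Status", 75, retry_every_min=15, attempt_min=0.5),
    Vessel("dbt Core via Python", 30),
    Vessel("Hightouch - Trigger Sync", 1, free=True),
    Vessel("Hightouch - Check Status", 20, retry_every_min=5, attempt_min=0.5),
]

if __name__ == "__main__":
    for label, fleet in [("Example 1", example_1), ("Example 2", example_2)]:
        duration, billable = totals(fleet)
        print(f"{label}: duration {duration:.0f} min, billable {billable:.1f} min")
```

Under these assumptions, Example 1 takes 112 minutes with 30 billable minutes, and Example 2 takes 127 minutes with 35.5 billable minutes. Most of the extra duration comes from the retry delays: a Check Status Vessel can only notice that the external job finished at its next scheduled attempt.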
Going Forward
This change signals a dramatic shift in our Blueprint Library development philosophy. We are committed to keeping Trigger Blueprints free to use whenever their primary purpose is to “sit and wait” for an external job to finish. We’re prioritizing all-in-one Blueprints that are easier to use over modular components whose only purpose is to help you save on runtime.
We invite you to test out the new functionality, connect your data tools together, and pay absolutely nothing to do so.
If you have any questions about this update, please reach out to our support team at support@shipyardapp.com.