Turning Your Peloton Data into a Daily Stretching Reminder: A Data-Driven Approach
Greetings and welcome to a thorough exploration of data processing with me, your data enthusiast, Steven Johnson. I'm your guide on behalf of the vibrant Shipyard data community, and today we are setting sail on an immersive tutorial journey that transcends the standard quick tips and tricks.
In this exhaustive guide, we'll traverse the path from raw API data to a fully functional data project. Our mission? To tap into the rich data streams of the Peloton fitness ecosystem, transforming it into a useful daily prompt that ensures we maintain our well-being while navigating the work-from-home landscape.
Rising to the Challenge
My Peloton bike, discreetly occupying a corner of my home office, isn't merely a fitness tool. It's a gateway to a wide-ranging fitness community offering an array of classes—running, rowing, strength training, and more. Today, our lens focuses on a particular slice of this expansive platform: the stretching and yoga classes.
Like many other professionals riding the remote work wave, I find myself tethered to my desk for long stretches of time. It's not ideal, but often it's the reality. To counteract the pitfalls of this stationary lifestyle, I've aimed to weave several brief, 5-10 minute stretching or yoga sessions into my daily routine.
The challenge we're grappling with today is devising a system that pushes out a daily alert via email if I haven't clocked in a stretching class on that day.
Charting Our Course
We'll address this challenge through four main stages:
- Extracting Data from Peloton: We kick off by pulling data from Peloton, requiring a dash of Python know-how. I'll illuminate the coding process, empowering you to extract your data from Peloton.
- Purifying the Data: Once we've harvested the data, we'll need to weed out any inconsistencies to ensure the dataset's integrity. We'll achieve this with Python and Pandas.
- Transferring Data to Snowflake: After our data is polished and primed, we'll employ a handy Shipyard blueprint for seamless CSV uploads into Snowflake as a table.
- Constructing the Daily Alert: The final stride involves establishing our daily alert system using Shipyard and SQL. We'll draft a SQL query to fetch the data from Snowflake, tailor it to meet our requirements, and then put Shipyard to work to oversee the operation. To guarantee this system's daily diligence, we'll also configure a trigger in Shipyard.
Skills and Assumptions
This tutorial is tailored for data engineers and data analysts—you don't need to be an industry veteran to draw value from it. Nevertheless, a fundamental grasp of Python and SQL will come in handy. I'll unpack each step and clarify the process as we proceed.
Additionally, we'll be utilizing Shipyard as our central tool: a powerful orchestration platform we'll harness to build and automate this process, ensuring it runs smoothly each day to dispatch our alerts.
With the preamble out of the way, let's dive into the task of extracting data from Peloton.
Navigating Peloton's Interface
Our journey begins with a tour of Peloton's user interface. A quick look at my profile, named "Hybrid Theory" (any fellow Linkin Park fans out there?), shows a comprehensive log of all completed workouts. The data depth is impressive, ranging from an overview of workout counts to the minutiae of each session.
However, there's a hitch. Although Peloton provides a "Download Workouts" feature that exports a CSV of your workouts, it's unfortunately not programmatically accessible. This calls for an alternative strategy, and that's where our trusty ally Shipyard enters the picture.
Setting Sail with Shipyard
Switching gears to Shipyard, we start with a blank canvas, or as we like to call it, a fresh Fleet. We begin by writing some Python code. While Shipyard doesn't currently offer a dedicated Peloton Blueprint, our aim here is to make the Python code flexible and reusable.
It's worth mentioning that the Python code we'll deploy has drawn inspiration from Al Chin's work with the Peloton API. I've made necessary modifications to align it with our project, but I'm indebted to the foundation his code provided. For those curious, check out Al's analysis here.
Crafting the Code
Instead of leading you through the process of composing the code from scratch, I'll dissect it into manageable segments, elucidating each one and underlining any modifications required within Shipyard.
We'll begin by importing a few essential packages: requests, json, pandas, and os. In Shipyard, external packages must be installed individually, which you can do under Python Packages; json and os are part of Python's standard library, so only requests and pandas need to be added. For our current project, the exact versions of requests and pandas aren't critical.
Next, we'll establish our Peloton username and password. From a security perspective, it's imperative that these details remain confidential, so we'll use environment variables to conceal the values. Shipyard facilitates this in the Environment Variables section. Here, I've created variables named 'username' and 'password', which the code reads with os.environ.get to populate peloton_username and peloton_password.
Harvesting the Data
With these preliminary steps behind us, we can now refocus on harvesting our data from Peloton. This code segment, understandably, gets a tad complex. It uses requests to sign in to the Peloton API, gather workout data, and accumulate it into a dictionary dubbed workout_data.
From there, we process the workout_data dictionary. We first extract the keys from our data, converting them into a list. This allows us to cycle through the workouts.
Subsequently, we compile a list of workouts and convert that list into a pandas DataFrame. This DataFrame is exported as a CSV file named 'peloton_data.csv', which we will utilize in the second part of this project: cleaning up the CSV.
To verify the success of the data extraction process, we can use print(df.head(3)) to display the first three records of the DataFrame. If all goes according to plan, we'll have our workout data neatly formatted and prepped for the next step.
Stay tuned as we transition into the next phase of our project—cleaning the CSV data to ensure its readiness for the subsequent stages. Check out the code for the first Vessel below:
import requests
import json
import pandas as pd
import os

# Pull our credentials from the environment variables configured in Shipyard
peloton_username = os.environ.get('username')
peloton_password = os.environ.get('password')

# Log in to the Peloton API; the session keeps the auth cookie for later requests
s = requests.Session()
base_url = 'https://api.onepeloton.com'
payload = {'username_or_email': peloton_username, 'password': peloton_password}
s.post(base_url + '/auth/login', json=payload)

# Look up our user ID, then pull up to 1,000 workouts for that user.
# The replace() swaps JSON nulls for empty strings to simplify handling downstream.
user_id = s.get(base_url + '/api/me').json()['id']
workouts = json.loads(json.dumps(s.get(base_url + '/api/user/' + user_id + '/workouts?limit=1000').json()).replace('null', '""'))

# For each workout, fetch its summary and performance details and stash both
workout_data = {}
for workout in workouts['data']:
    workout_id = workout['id']
    workout_data[workout_id] = {}
    workout_summary = json.loads(json.dumps(s.get(base_url + '/api/workout/' + workout_id).json()).replace('null', '""'))
    workout_performance = json.loads(json.dumps(s.get(base_url + '/api/workout/' + workout_id + '/performance_graph?every_n=1000').json()).replace('null', '""'))
    workout_data[workout_id]['summary'] = workout_summary
    workout_data[workout_id]['performance'] = workout_performance

# Flatten the workout summaries into a DataFrame and export as CSV
workout_keys = list(workout_data.keys())
workouts = []
for key in workout_keys:
    workouts.append(workout_data[key]['summary'])
df = pd.DataFrame(workouts)
print(df.head(3))
df.to_csv('peloton_data.csv', index=False)
Python Transformations: Bridging Peloton and Snowflake Data
In the previous section, I walked you through the process of extracting data and prepping it for transformations. Now let's dive into how we can elevate that data with a few transformations.
I began by porting the raw data harvested from Peloton's API into a Google Sheet to visually demonstrate what we'll be dealing with. You'll notice columns like "created_at," "end_time," and "start_time," among others, all representing time in epoch format, which isn't exactly user-friendly.
Our objective in the transformation phase is to convert these epoch time values into more comprehensible date-time formats. This is a crucial step for setting up the alerts we plan to integrate later.
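To make the conversion concrete, here's a quick illustration using pandas with an arbitrary epoch value (the timestamp itself is just an example):

import pandas as pd

# 1688140800 seconds after the Unix epoch (1970-01-01 00:00:00 UTC)
print(pd.to_datetime(1688140800, unit='s'))  # 2023-06-30 16:00:00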
Leveraging Shipyard for Data Transformation
To accomplish the epoch time conversion, I used the Shipyard platform, where I already had an existing Vessel set up. A Vessel is a unit of work in Shipyard, essentially a script that executes a specific function. I added a second Vessel to handle our data transformation using Python and pandas, a widely used data manipulation library.
The primary steps in this process involved importing the necessary Python packages (pandas and datetime), reading in our previously extracted data, and then applying functions to convert epoch times to date-time in the specified columns.
An important point to note for beginners using Shipyard is the ease of referencing files. Files created in a previous Vessel, like our 'peloton_data.csv' file, can be accessed just as if you were working on your local file system.
Incorporating a 'Load Date' Column
One additional transformation we implemented was adding a 'load date' column to our data. This column will aid us in handling duplicate data entries that can surface as we continually pull the latest workouts. By having a clear indication of when the data was loaded, we can effectively manage these duplicates in our SQL queries.
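We'll do the actual deduplication in SQL later, but for reference, here's a minimal sketch of how the same idea could be expressed on the pandas side, assuming the workout id column that the API summaries include:

import pandas as pd

peloton_data = pd.read_csv('peloton_data_datetime.csv')

# Keep only the most recently loaded copy of each workout ID
deduped = peloton_data.sort_values('load_date').drop_duplicates(subset='id', keep='last')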
After all transformations, we saved the updated DataFrame into a new CSV file, 'peloton_data_datetime.csv'.
Take a look at the code for the second Vessel below:
import pandas as pd
from datetime import datetime

# Read the CSV produced by the extraction Vessel
peloton_data = pd.read_csv('peloton_data.csv')

# Convert every epoch-time column to a readable datetime
time_columns = ['created_at', 'end_time', 'start_time', 'created', 'device_time_created_at']
for column in time_columns:
    peloton_data[column] = pd.to_datetime(peloton_data[column], unit='s')

# Stamp each row with the load time so duplicates can be resolved later in SQL
peloton_data['load_date'] = datetime.now()
peloton_data.to_csv('peloton_data_datetime.csv', index=False)
Uploading Transformed Data into Snowflake
With our transformed data at the ready, the next step was to upload it to a Snowflake table. We appended our new CSV data to an existing Snowflake table using one of Shipyard's native low-code Blueprints.
It's important to note that this process will generate duplicates as we continually append new workout data. However, thanks to the 'load date' column we added earlier, we will address these duplicates in the subsequent steps.
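The Blueprint itself requires no code, but if you're curious what the equivalent step looks like in Python, here's a rough sketch using the snowflake-connector-python package. The connection details are placeholders, and the PELOTON table is assumed to already exist:

import pandas as pd
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# Placeholder credentials; in practice, store these in environment variables
conn = snowflake.connector.connect(
    account='your_account',
    user='your_user',
    password='your_password',
    warehouse='your_warehouse',
    database='DEMO_DB',
    schema='SNOWFLAKE_VIDEO'
)

df = pd.read_csv('peloton_data_datetime.csv')

# Append the DataFrame's rows to the existing PELOTON table
write_pandas(conn, df, 'PELOTON')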
Setting Up Daily Alerts
The final part of our pipeline was setting up daily alerts based on our Snowflake data. To accomplish this, I used another Snowflake Blueprint in Shipyard to store query results to a CSV. The SQL query involved is slightly more complicated: it uses a ROW_NUMBER() window function to keep only the most recently loaded copy of each workout (discarding the duplicates created by our daily appends) and filters the data to return only stretching and yoga classes taken on the current day.
The alert system is set up in such a way that if no yoga or stretching class has been done on a particular day, an email alert will be sent to remind me to stretch. This clever use of Shipyard's email sending functionality allows for personalized alerts based on our data.
The SQL query that I am using for this function is below:
WITH ranked_data AS (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY ID
            ORDER BY LOAD_DATE DESC
        ) AS rn
    FROM "DEMO_DB"."SNOWFLAKE_VIDEO"."PELOTON"
)
SELECT *
FROM ranked_data
WHERE rn = 1
  AND FITNESS_DISCIPLINE IN ('yoga', 'stretching')
  AND DATE(START_TIME) = CURRENT_DATE();
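In my Fleet, the downstream email step handles the case where this query returns no rows, but conceptually the decision boils down to something like the Python sketch below (the query_results.csv file name is illustrative):

import sys
import pandas as pd

# Rows returned by the SQL query above, stored to CSV by the Snowflake Blueprint
results = pd.read_csv('query_results.csv')

# If no stretching or yoga class was logged today, exit non-zero; the email
# step can then be wired to run only when this check fails
if len(results) == 0:
    sys.exit(1)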
Automating the Pipeline
Once the entire pipeline was established (extracting data from Peloton, transforming it, loading it into Snowflake, and setting up alerts), I created a trigger to make the Fleet run daily. This way, if I haven't done my daily stretching by 1 p.m., I'll get an email alert nudging me to do so.
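For reference, if you were expressing that same schedule in cron syntax, a daily 1 p.m. run looks like this (adjust for your timezone settings):

0 13 * * *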
Testing the Pipeline
Finally, I tested the pipeline to confirm it runs end to end. Sure enough, since I hadn't done my stretching for the day, I received an email alert reminding me to do so.
Conclusion
This exercise offered a practical application of integrating data from an API (in this case, Peloton), transforming that data with Python, and uploading it into Snowflake for further use. While the specifics of this example pertain to fitness and health, the same principles and procedures can be applied to any data project within your organization. Shipyard simplifies setting up these data pipelines, whether through its low-code Blueprints or with your own Python scripts.