Cyclistic Bike Share Analysis

Introduction

Cyclistic is a bike-share company based in Chicago. Its bike-share program features over 5,800 bicycles that are geotracked and locked into a network of 692 docking stations across Chicago. Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make this possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. These pricing plans helped the company categorize its customers into two types:

Casual Riders: Customers who purchase single-ride or full-day passes
Cyclistic Members (Member Riders): Customers who purchase annual memberships

The finance analysts concluded that member riders are much more profitable than casual riders, and the manager believed that maximizing the number of annual members would be key to future growth. She also believed that, rather than creating a marketing campaign that targets all-new customers, there is a very good chance to convert casual riders into members.

As the data analyst, my task was to identify how member riders and casual riders use Cyclistic bikes differently by analyzing historical bike trip data.

Business Task

The business task of this study was to perform descriptive analysis to spot trends and identify differences between the casual riders and member riders.

Preparing the Data

The data was made available as CSV files, zipped by month. To get the data I needed for my analysis, I retrieved the previous 12 months of Cyclistic historical bike trip data, covering April 2021 to March 2022, from here.

The data was gathered firsthand by the company on all its users. The data is reliable, organized, comprehensive, current and frequently updated, and cited under an open license.

Data-privacy issues prohibit the use of riders’ personally identifiable information; therefore, credit card information was not included in this analysis.

Tools Used

Python: Data cleaning, data transformation, and data analysis
Tableau: Data visualization

Processing the Data

Importing libraries needed for the analysis

import pandas as pd
import numpy as np
import datetime as dt

Loading the historical bike trip data retrieved into dataframes

jan = pd.read_csv("202201-divvy-tripdata.csv")
feb = pd.read_csv("202202-divvy-tripdata.csv")
mar = pd.read_csv("202203-divvy-tripdata.csv")
apr = pd.read_csv("202104-divvy-tripdata.csv")
may = pd.read_csv("202105-divvy-tripdata.csv")
jun = pd.read_csv("202106-divvy-tripdata.csv")
jul = pd.read_csv("202107-divvy-tripdata.csv")
aug = pd.read_csv("202108-divvy-tripdata.csv")
sep = pd.read_csv("202109-divvy-tripdata.csv")
octo = pd.read_csv("202110-divvy-tripdata.csv")
nov = pd.read_csv("202111-divvy-tripdata.csv")
dec = pd.read_csv("202112-divvy-tripdata.csv")

Merging all dataframes into one dataframe

data_list = [apr, may, jun, jul, aug, sep, octo, nov, dec, jan, feb, mar]
merged_df = pd.concat(data_list, ignore_index=True) #ignore_index gives the merged dataframe one continuous index
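
The loading and merging steps above can also be written more compactly. The sketch below is one possible way to do it, assuming every file follows the naming pattern shown above; the helper names monthly_filenames and load_trips are hypothetical and are not part of the original notebook.

```python
import pandas as pd


def monthly_filenames(start="2021-04", months=12):
    """Build the CSV file names for `months` consecutive months from `start`."""
    periods = pd.period_range(start=start, periods=months, freq="M")
    return [f"{p.strftime('%Y%m')}-divvy-tripdata.csv" for p in periods]


def load_trips(filenames):
    """Read each monthly file and stack them into one dataframe."""
    frames = [pd.read_csv(name) for name in filenames]
    return pd.concat(frames, ignore_index=True)


filenames = monthly_filenames()
print(filenames[0], filenames[-1])

# Reading the data needs the CSV files in the working directory:
# merged_df = load_trips(filenames)
```

Building the names from a date range keeps the files in chronological order and avoids having one variable per month.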

To find the length of each ride, I converted the started_at and ended_at columns to datetime objects so I could compute their difference

merged_df['started_at'] = pd.to_datetime(merged_df['started_at'])
merged_df['ended_at'] = pd.to_datetime(merged_df['ended_at'])
merged_df['ride_length'] = merged_df['ended_at'] - merged_df['started_at']
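
As a sanity check on the computed ride lengths, it can help to express them in minutes and look for rides that end before they start. The sketch below uses a few made-up rows for illustration only; they are not values from the Cyclistic data, and whether such rows exist in the real data is an assumption to verify.

```python
import pandas as pd

# Toy rows for illustration only (not taken from the Cyclistic data)
sample = pd.DataFrame({
    "started_at": ["2021-04-01 08:00:00", "2021-04-01 09:00:00", "2021-04-01 10:00:00"],
    "ended_at": ["2021-04-01 08:15:00", "2021-04-01 08:59:00", "2021-04-01 10:30:00"],
})
sample["started_at"] = pd.to_datetime(sample["started_at"])
sample["ended_at"] = pd.to_datetime(sample["ended_at"])

# Same subtraction as above, plus the duration expressed in minutes
sample["ride_length"] = sample["ended_at"] - sample["started_at"]
sample["ride_length_min"] = sample["ride_length"].dt.total_seconds() / 60

# Keep only rides with a positive duration
valid = sample[sample["ride_length"] > pd.Timedelta(0)]
print(valid)
```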

To get the day of the week that each ride started

merged_df["day_of_week"] = merged_df["started_at"].dt.dayofweek

The days are returned as numbers, e.g., Monday = 0, Tuesday = 1, and so on. To convert them to day names, I created a dictionary that maps each number to its day of the week and wrote a lambda function to apply it to each value in the column

#Map each day number returned by dt.dayofweek to its name
weekday = {
0: 'Monday',
1: 'Tuesday',
2: 'Wednesday',
3: 'Thursday',
4: 'Friday',
5: 'Saturday',
6: 'Sunday'
}
merged_df['day_of_week_name'] = merged_df['day_of_week'].apply(lambda x: weekday[x])
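
As an alternative to a lookup, pandas can return the day names directly through the dt.day_name() accessor. The short sketch below shows both the numeric and the named form on a few sample dates chosen for illustration.

```python
import pandas as pd

# Sample dates for illustration: a Monday, a Saturday, and a Sunday
dates = pd.Series(pd.to_datetime(["2021-04-05", "2021-04-10", "2021-04-11"]))

day_numbers = dates.dt.dayofweek  # Monday = 0 ... Sunday = 6
day_names = dates.dt.day_name()   # 'Monday', 'Saturday', 'Sunday'

print(pd.DataFrame({"number": day_numbers, "name": day_names}))
```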

I created columns for the month and year for each ride from the started_at column

merged_df['year'] = merged_df['started_at'].dt.year
merged_df['month'] = merged_df['started_at'].dt.month

Analyzing the Data

To find out the number of rides per user type

total_members = merged_df['member_casual'].value_counts()
total_members

A visualization of this output is shown below

To find out the average duration of a ride for each user type

casual_member = merged_df[merged_df['member_casual'] == 'casual']
avg_casual_member_ride_length = casual_member['ride_length'].mean()
avg_casual_member_ride_length

Average Ride Time for Casual Riders

paid_member = merged_df[merged_df['member_casual'] == 'member']
avg_paid_member_ride_length = paid_member['ride_length'].mean()
avg_paid_member_ride_length

Average Ride Time for Member Riders

The outputs show that the average ride duration is longer for casual riders than for member riders

To find out the maximum duration of a ride for each user type

max_casual_member_ride_length = casual_member['ride_length'].max()
max_casual_member_ride_length

Maximum Duration of Rides for Casual Riders

max_paid_member_ride_length = paid_member['ride_length'].max()
max_paid_member_ride_length

Maximum Duration of Rides for Member Riders

From the outputs, casual riders have a higher maximum ride length than the member riders
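
The average and maximum comparisons above can also be produced in a single step with groupby, instead of filtering one dataframe per user type. This is a minimal sketch on made-up rows, not the real trip data; the ride_length_min column is an assumption added here so the results read as plain minutes.

```python
import pandas as pd

# Toy rows for illustration only (not taken from the Cyclistic data)
toy = pd.DataFrame({
    "member_casual": ["casual", "casual", "member", "member"],
    "ride_length": pd.to_timedelta(["00:30:00", "00:10:00", "00:12:00", "00:08:00"]),
})

# Express the duration in minutes so the summary is easy to read
toy["ride_length_min"] = toy["ride_length"] / pd.Timedelta(minutes=1)

# One row per user type, with the mean and maximum ride length
summary = toy.groupby("member_casual")["ride_length_min"].agg(["mean", "max"])
print(summary)
```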

To find out the most frequent day of the week that riders ride for each user type

mode_ride_day_casual_member = casual_member['day_of_week_name'].mode()
mode_ride_day_casual_member

Most Frequent Ride Day for Casual Riders

mode_ride_day_paid_member = paid_member['day_of_week_name'].mode()
mode_ride_day_paid_member

Most Frequent Ride Day for Member Riders

Now let’s take a look at the number of rides per day for each user type

ride_per_day_casual = casual_member.groupby('day_of_week_name')['ride_id'].count().sort_values(ascending=False)
ride_per_day_casual

Number of Rides per Day for Casual Riders

ride_per_day_paid = paid_member.groupby('day_of_week_name')['ride_id'].count().sort_values(ascending=False)
ride_per_day_paid

Number of Rides per Day for Member Riders

Plotting these outputs against each other:

We can see that member riders are more active during weekdays than on weekends while casual riders use the program more often during the weekend than on weekdays.
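
To compare the two user types side by side before plotting, the daily counts can be placed in one table with the days in calendar order. The sketch below uses pd.crosstab on made-up rows; the DAY_ORDER list is a name introduced here for illustration.

```python
import pandas as pd

DAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Toy rows for illustration only (not taken from the Cyclistic data)
toy = pd.DataFrame({
    "ride_id": ["a", "b", "c", "d", "e"],
    "member_casual": ["casual", "casual", "member", "member", "member"],
    "day_of_week_name": ["Saturday", "Sunday", "Monday", "Monday", "Saturday"],
})

# Count rides per day and user type, then put the days in calendar order;
# days with no rides in the toy data are filled with 0
rides_per_day = (
    pd.crosstab(toy["day_of_week_name"], toy["member_casual"])
    .reindex(DAY_ORDER, fill_value=0)
)
print(rides_per_day)
```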

To find out the average ride length per day for each user type

avg_casual_length_per_day = casual_member[['day_of_week_name', 'ride_length']]
avg_casual_length_per_day = avg_casual_length_per_day.groupby('day_of_week_name')['ride_length'].mean()

#Convert above result to dataframe
avg_casual_length_per_day_df = pd.DataFrame({"day_of_week":avg_casual_length_per_day.index,
                                "casual_ride_length_avg": avg_casual_length_per_day.values})

avg_casual_length_per_day_df['casual_ride_length_avg'] = avg_casual_length_per_day_df['casual_ride_length_avg']/pd.Timedelta(minutes=1) #To get ride length in minutes
avg_casual_length_per_day_df

Average Ride Length per Day for Casual Riders

avg_paid_length_per_day = paid_member[['day_of_week_name', 'ride_length']]
avg_paid_length_per_day = avg_paid_length_per_day.groupby('day_of_week_name')['ride_length'].mean()

avg_paid_length_per_day_df = pd.DataFrame({"day_of_week":avg_paid_length_per_day.index,
                                "paid_ride_length_avg": avg_paid_length_per_day.values})

avg_paid_length_per_day_df['paid_ride_length_avg'] = avg_paid_length_per_day_df['paid_ride_length_avg']/pd.Timedelta(minutes=1) #To get ride length in minutes

avg_paid_length_per_day_df

Average Ride Length per Day for Member Riders

Plotting these two outputs together:

It can be seen that casual riders take longer rides on average than member riders on each day of the week, and the difference is large.
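
The two per-day averages can also be computed together by grouping on both the day and the user type. The following is a sketch on made-up rows, with ride_length_min as an assumed helper column holding the duration in minutes.

```python
import pandas as pd

# Toy rows for illustration only (not taken from the Cyclistic data)
toy = pd.DataFrame({
    "member_casual": ["casual", "casual", "member", "member"],
    "day_of_week_name": ["Saturday", "Saturday", "Saturday", "Monday"],
    "ride_length": pd.to_timedelta(["00:40:00", "00:20:00", "00:10:00", "00:12:00"]),
})
toy["ride_length_min"] = toy["ride_length"] / pd.Timedelta(minutes=1)

# Average minutes per (day, user type), with one column per user type
avg_length_per_day = (
    toy.groupby(["day_of_week_name", "member_casual"])["ride_length_min"]
    .mean()
    .unstack("member_casual")
)
print(avg_length_per_day)
```

Day and user type combinations with no rides appear as NaN in the result.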

The plot below shows the types of bikes used by each user type

The classic bike is the most preferred type of bike for both user types, followed by the electric bike. The docked bike is the least preferred bike type and is not used by member riders at all.
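
A table behind a bike-type comparison like this one could be built in pandas as sketched below. The column name rideable_type and the values classic_bike, electric_bike, and docked_bike are assumptions about how the trip files label bike types, and the rows are made up for illustration.

```python
import pandas as pd

# Toy rows for illustration only; column name and labels are assumptions
toy = pd.DataFrame({
    "member_casual": ["casual"] * 4 + ["member"] * 4,
    "rideable_type": [
        "classic_bike", "classic_bike", "electric_bike", "docked_bike",
        "classic_bike", "classic_bike", "classic_bike", "electric_bike",
    ],
})

# Share of each bike type within each user type (each column sums to 1)
bike_share = pd.crosstab(
    toy["rideable_type"], toy["member_casual"], normalize="columns"
)
print(bike_share)
```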

Lastly, we will check for any monthly trends in the data.

monthly_trend = merged_df.groupby("month")["member_casual"].value_counts()
monthly_trend_df = monthly_trend.to_frame()
monthly_trend_df
#Reshape so that each user type becomes its own column (the result can then be exported to a CSV file)
monthly_trend_df = monthly_trend.unstack(level=1)
monthly_trend_df
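
Because the data spans two calendar years, grouping on a combined year and month label keeps the months in chronological order. The sketch below shows the idea on made-up rows; the year_month column is a name introduced here for illustration.

```python
import pandas as pd

# Toy rows for illustration only (not taken from the Cyclistic data)
toy = pd.DataFrame({
    "started_at": pd.to_datetime(
        ["2021-04-03", "2021-12-24", "2022-01-05", "2022-01-20"]
    ),
    "member_casual": ["casual", "member", "member", "casual"],
})

# A 'YYYY-MM' label sorts chronologically across the year boundary
toy["year_month"] = toy["started_at"].dt.strftime("%Y-%m")

# Rides per month and user type, with one column per user type
monthly_counts = (
    toy.groupby(["year_month", "member_casual"])
    .size()
    .unstack(fill_value=0)
)
print(monthly_counts)
```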

The visualization of the result is shown below

Use of Cyclistic’s program rises from June to August for both user types. This rise may be a result of vacations or the warm weather that comes with the summer season. It is also worth noting that between September and May, there were more active member riders than casual riders. 30% of Cyclistic users use the program to commute to work, and these months are normally busy in most workplaces and schools, which means fewer vacations and less time to ride for leisure. This may explain the gap in the number of rides between the user types.

The notebook used for my analysis can be found here

The dashboard for these visualizations is shown below and can be found here

Recommendations

The company should consider reducing the price of weekend rides for member riders to create a reasonable price gap between the two user types, since more casual riders are active during the weekend. This price gap would attract casual riders and may in turn lead them to convert to members.
Discounts could be given to member riders based on ride duration. Since casual riders ride for longer, these discounts could encourage them to purchase membership subscriptions.
There should be heavy incentives and privileges for member riders during the summer season, when use of the program rises for both user types.
There could be a discount on the price charged for the docked bike for member riders. Since only casual riders use this bike, it could encourage them to convert to member riders.