The Glowing Python: Visualizing the trend of a time series with Pandas

Saturday, March 23, 2019

Visualizing the trend of a time series with Pandas

The trend of a time series is the general direction in which its values change. In this post we will focus on how to use rolling windows to isolate it. Let's download from Google Trends the interest over time for the search term Pancakes and see what we can do with it:

import pandas as pd
import matplotlib.pyplot as plt
# CSV exported from https://trends.google.com
url = './data/pancakes.csv'
# skip the first two lines of the file and use the Month column as a datetime index
data = pd.read_csv(url, skiprows=2, parse_dates=['Month'], index_col=['Month'])
plt.plot(data)

Looking at the data we notice some seasonality (Pancake Day! yay!) and an increasing trend. What if we want to visualize just the trend of this curve? We only need to slide a rolling window through the data and compute the average at each step. This can be done in just one line with the rolling method:

y_mean = data.rolling('365D').mean()
plt.plot(y_mean)

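The parameter passed to rolling, '365D', means that our rolling window spans 365 days: each point of the result is the average of the observations in the 365 days up to and including that date. Check out the documentation of the method to learn more.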
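To make the behaviour of the time-based window concrete, here is a minimal sketch on synthetic data. The series toy, its column name interest and the numbers used to build it are made up for illustration and are not the Google Trends data. It recomputes one value of the rolling mean by hand to show which observations fall in the window:

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (hypothetical stand-in for the Google Trends data):
# a linear trend plus a seasonal cycle that repeats every 12 months.
idx = pd.date_range("2014-01-01", periods=60, freq="MS")
months = np.arange(60)
values = 50 + 0.5 * months + 10 * np.sin(2 * np.pi * months / 12)
toy = pd.DataFrame({"interest": values}, index=idx)

# Time-based window: each row averages the observations in the 365 days
# ending at (and including) that row's timestamp.
toy_mean = toy.rolling("365D").mean()

# Reproduce one value by hand: select the rows in (t - 365 days, t].
t = idx[30]
in_window = (toy.index > t - pd.Timedelta("365D")) & (toy.index <= t)
window = toy.loc[in_window, "interest"]
manual = window.mean()

print(len(window))                      # number of observations in the window
print(manual, toy_mean.loc[t, "interest"])
```

With monthly data the window holds twelve observations once a full year of history is available, so the seasonal cycle averages out and mostly the trend remains. The first rows are averaged over fewer observations, because a time-based window does not wait for a full year of data before producing a value.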
We can also highlight the variation within each year by adding to the chart a shaded band whose width is the rolling standard deviation:

y_std = data.rolling('365D').std()
plt.plot(y_mean)
plt.fill_between(y_mean.index,
                 (y_mean - y_std).values.T[0],
                 (y_mean + y_std).values.T[0], alpha=.5)

Warning: the visualization above assumes that the values within each yearly window follow a normal distribution, which is not entirely true.
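One possible way around the normality assumption, which is not what the post does, is to shade the area between two rolling quantiles instead of the mean plus and minus one standard deviation. The sketch below uses a made-up series with skewed noise (the name toy and all the numbers are assumptions for illustration) and computes the rolling quartiles over the same 365-day window:

```python
import numpy as np
import pandas as pd

# Synthetic monthly series with skewed (exponential) noise, used only
# for illustration; it is not the Google Trends data.
rng = np.random.default_rng(0)
idx = pd.date_range("2014-01-01", periods=60, freq="MS")
months = np.arange(60)
values = 50 + 0.5 * months + rng.exponential(scale=5.0, size=60)
toy = pd.Series(values, index=idx, name="interest")

# Rolling quartiles over the same 365-day window used for the mean.
roll = toy.rolling("365D")
q_low = roll.quantile(0.25)   # lower edge of the band
q_mid = roll.quantile(0.50)   # rolling median, a robust trend line
q_high = roll.quantile(0.75)  # upper edge of the band

print(pd.concat({"q25": q_low, "median": q_mid, "q75": q_high}, axis=1).tail())
```

The band between q_low and q_high can be passed to plt.fill_between in place of the mean plus and minus the standard deviation, with q_mid drawn as the central line. It covers the middle half of the values in each window and makes no assumption about the shape of their distribution.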

2 comments:

Coastal0, April 23, 2019 at 6:27 AM
One of the trickier aspects I've encountered is where data is not regularly sampled, and messes up the rolling statistics a little. I'd love to see your approach for solving that issue!