import pandas as pd
import matplotlib.pyplot as plt

url = './data/pancakes.csv'  # downloaded from https://trends.google.com
data = pd.read_csv(url, skiprows=2, parse_dates=['Month'], index_col=['Month'])
plt.plot(data)
Looking at the data, we notice some seasonality (Pancake Day! yay!) and an increasing trend. What if we want to visualize just the trend of this curve? We only need to slide a rolling window through the data and compute the average at each step. This can be done in just one line with the rolling method:
y_mean = data.rolling('365D').mean()
plt.plot(y_mean)
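The argument '365D' passed to rolling sets the window size to 365 days. Because it is a time offset rather than a number of rows, each window covers the observations from the 365 days up to and including the current date. Check out the documentation of the method to learn more.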
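To see what a time-based window does, here is a minimal sketch on a made-up monthly series (the `toy` DataFrame and its `interest` column are assumptions for illustration, not the Google Trends data). It counts how many observations fall into each window next to the rolling mean:

```python
import numpy as np
import pandas as pd

# Hypothetical monthly data: a linear trend plus a bump every February
idx = pd.date_range("2015-01-01", periods=36, freq="MS")
values = np.arange(36, dtype=float) + 10 * (idx.month == 2)
toy = pd.DataFrame({"interest": values}, index=idx)
toy.index.name = "Month"

# Same call as above: each window spans the 365 days ending at the current row
trend = toy.rolling("365D").mean()

# Number of observations that actually fall inside each window
counts = toy.rolling("365D").count()

print(counts.head(14))
print(trend.tail())
```

With an offset window, the first rows are averaged over fewer points instead of being returned as NaN, so the start of the trend line is noisier than the rest.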
We can also highlight the variation within each year by adding a shaded band to the chart, with a width of one standard deviation on either side of the mean:
y_std = data.rolling('365D').std()
plt.plot(y_mean)
plt.fill_between(y_mean.index,
                 (y_mean - y_std).values.T[0],
                 (y_mean + y_std).values.T[0],
                 alpha=.5)
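Warning: the visualization above assumes that the values within each year follow a normal distribution, which is not entirely true. The yearly spike pulls the mean upwards and inflates the standard deviation, so the band should be read as a rough indication of spread rather than an exact interval.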
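If the normality assumption is a concern, one possible alternative is to draw the band from rolling quantiles instead of the standard deviation. The sketch below swaps in the median and the interquartile range, and runs on synthetic data (the `series` values are made up, and the 25%/75% levels are an arbitrary choice):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical skewed series: slow upward trend, a spike every February, small noise
rng = np.random.default_rng(0)
idx = pd.date_range("2015-01-01", periods=48, freq="MS")
series = pd.Series(
    np.linspace(20, 40, 48) + 30 * (idx.month == 2) + rng.normal(0, 1, 48),
    index=idx,
)

# Rolling median and quartiles over the same 365-day window
window = series.rolling("365D")
median = window.median()
lower = window.quantile(0.25)
upper = window.quantile(0.75)

plt.plot(median)
plt.fill_between(median.index, lower.values, upper.values, alpha=.5)
```

Quantiles make no assumption about the shape of the distribution, so the band can be asymmetric around the central line when the data are skewed.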