10 - Seasonal Analysis with the Seasonal Class¶
This notebook explains seasonality in environmental time series and demonstrates
the Seasonal mixin of the TimeSeries class. It is written for readers who do not
have a statistics background, so each concept is explained from first principles
before showing code.
What is seasonality?¶
A time series is any quantity measured repeatedly over time (temperature every day, the flow of a river every hour, the number of bikes rented every week ...).
When a time series repeats a similar pattern at a fixed interval, we call this pattern seasonality. Typical examples are:
- Annual cycles - air temperature peaks in summer and drops in winter; rainfall has wet and dry seasons.
- Weekly cycles - road traffic is heavier on weekdays than on weekends; electricity demand drops on Sundays.
- Daily cycles - solar radiation peaks at noon and is zero at night.
Why does isolating seasonality matter?¶
If we ignore seasonality, many analyses become misleading:
- A simple trend test can report "it is getting warmer" when it is really just comparing a winter stretch with a summer stretch, even if nothing is actually changing year to year.
- Anomaly detection becomes unreliable, because (in the Northern Hemisphere) a 30 C day in January is extreme while the same reading in July is completely normal.
- Forecasting models that don't know about seasonality will underestimate peaks and overestimate troughs.
The tools in this notebook all help us measure, visualize, and remove seasonal behaviour so that the underlying signal becomes visible.
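As a minimal sketch of what "removing seasonal behaviour" can mean, the snippet below subtracts a monthly climatology from a synthetic daily series. The data, the variable names and the choice of a monthly mean as the seasonal component are all assumptions made for illustration; this is not how the Seasonal methods are implemented.

```python
import numpy as np
import pandas as pd

# Synthetic daily series (an assumption, not temp.csv): annual sine wave plus noise
rng = np.random.default_rng(0)
idx = pd.date_range("2001-01-01", "2004-12-31", freq="D")
seasonal = 10.0 * np.sin(2 * np.pi * idx.dayofyear / 365.25)
series = pd.Series(15.0 + seasonal + rng.normal(0.0, 1.0, len(idx)), index=idx)

# Climatology: the mean of each calendar month across all years,
# broadcast back onto every day of that month
climatology = series.groupby(series.index.month).transform("mean")

# Anomaly: what is left after the typical seasonal value is subtracted
anomaly = series - climatology

print("std of raw series:", round(series.std(), 2))
print("std of anomalies :", round(anomaly.std(), 2))
```

The anomalies vary much less than the raw series because most of the raw variance comes from the annual cycle. A finer climatology (for example one value per day of year) would remove even more of it.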
1. Load the data¶
We use the temp.csv dataset from the examples/data/ folder. It contains
daily air-temperature measurements from 1981 to 1990 (a decade of data).
Because temperature has an obvious annual cycle, it is ideal for learning
about seasonality.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statista.time_series import TimeSeries
# Path relative to this notebook
DATA = '../../../examples/data/temp.csv'
df = pd.read_csv(DATA, parse_dates=['Date'], index_col='Date')
print(df.head())
print('\nShape:', df.shape)
print('From', df.index.min(), 'to', df.index.max())
            Temp
Date
1981-01-01  20.7
1981-01-02  17.9
1981-01-03  18.8
1981-01-04  14.6
1981-01-05  15.8

Shape: (3650, 1)
From 1981-01-01 00:00:00 to 1990-12-31 00:00:00
Wrap the DataFrame inside TimeSeries so we can call the seasonal methods.
ts = TimeSeries(df)
print(type(ts).__mro__[:3])
print('columns:', list(ts.columns))
(<class 'statista.time_series.TimeSeries'>, <class 'statista.time_series.descriptive.Descriptive'>, <class 'statista.time_series.visualization.Visualization'>)
columns: ['Temp']
Quick look at the raw series¶
Before any analysis, always plot the raw data. Our eyes are very good at spotting periodic patterns.
fig, ax = plt.subplots(figsize=(11, 3))
ax.plot(ts.index, ts['Temp'].values, linewidth=0.6, color='steelblue')
ax.set_title('Daily temperature, 1981-1990')
ax.set_ylabel('Temperature (C)')
ax.set_xlabel('Date')
plt.tight_layout()
plt.show()
You can clearly see the repeating annual wave: hot in summer, cold in winter,
with small year-to-year variation. That is exactly the kind of signal the
Seasonal methods are designed to analyse.
2. .monthly_stats() - mean, std, CV per month¶
The simplest way to describe seasonality is to group all January values, all February values, ... and compute summary statistics for each month.
- Mean - the typical value in that month.
- Standard deviation (std) - how much values vary around the mean.
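- Coefficient of variation (CV = std / mean) - dimensionless measure of relative variability. Useful when comparing months with very different absolute levels. Because the zero of the Celsius scale is arbitrary, CV values for temperature should be compared with care: a month with a mean near 0 C gets a very large CV even if its spread is ordinary.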
monthly = ts.monthly_stats(column='Temp')
print(monthly.round(2))
        mean   std    cv  min   max  median  skewness
month
1      15.03  2.87  0.19  8.5  25.2   14.80      0.85
2      15.37  2.74  0.18  9.2  26.3   15.05      0.70
3      14.57  3.19  0.22  7.4  22.4   14.30      0.31
4      12.09  3.08  0.25  5.7  21.8   11.95      0.33
5       9.87  2.79  0.28  2.1  16.5   10.00     -0.21
6       7.28  2.64  0.36  0.0  13.0    7.45     -0.42
7       6.69  2.63  0.39  0.0  13.0    7.00     -0.34
8       7.89  2.33  0.30  1.7  14.3    8.00     -0.08
9       8.98  2.80  0.31  3.0  19.2    8.90      0.52
10     10.31  2.69  0.26  4.7  18.4   10.05      0.29
11     12.48  2.96  0.24  5.7  24.3   12.20      0.85
12     13.85  2.48  0.18  8.2  23.9   13.40      0.70
How to read the table¶
- The mean column traces the annual temperature cycle. In this dataset it is highest in January / February (about 15 C) and lowest in June / July (about 7 C), which suggests a Southern Hemisphere site.
- The std column shows how much daily temperature can swing within a month. Here it stays between about 2.3 and 3.2 C, with the largest values in March and April, so the day-to-day spread changes far less over the year than the mean does.
- The skewness column tells us whether a month's distribution leans towards very hot or very cold outliers. Values near 0 mean a roughly symmetric distribution; positive values (as in January and November here) indicate a tail of unusually hot days.
# Simple visualisation of the monthly mean +/- std
fig, ax = plt.subplots(figsize=(9, 4))
ax.errorbar(monthly.index, monthly['mean'], yerr=monthly['std'],
            fmt='o-', color='darkblue', ecolor='lightgray', capsize=3)
ax.set_xticks(range(1, 13))
ax.set_xticklabels(['J','F','M','A','M','J','J','A','S','O','N','D'])
ax.set_title('Monthly mean temperature with +/- 1 std')
ax.set_ylabel('Temperature (C)')
plt.tight_layout()
plt.show()
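For readers who want to see what sits behind a per-month table like the one in this section, here is a rough plain-pandas equivalent on synthetic data. It is a sketch only: the synthetic series, the variable names and the exact definitions (sample standard deviation, pandas' skew estimator) are assumptions and may differ from what monthly_stats does internally.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the temperature column (not the real temp.csv data)
rng = np.random.default_rng(1)
idx = pd.date_range("1981-01-01", "1983-12-31", freq="D")
temp = pd.Series(
    11.0
    + 4.0 * np.cos(2 * np.pi * (idx.dayofyear - 30) / 365.25)
    + rng.normal(0.0, 2.5, len(idx)),
    index=idx,
    name="Temp",
)

# Group every observation by its calendar month (1 = January ... 12 = December)
by_month = temp.groupby(temp.index.month)

# One row per month, one column per summary statistic
stats = by_month.agg(["mean", "std", "min", "max", "median", "skew"])
stats["cv"] = stats["std"] / stats["mean"]  # coefficient of variation
stats.index.name = "month"

print(stats.round(2))
```

Grouping by the month number pools all Januaries together, all Februaries together, and so on, which is the idea described at the start of this section.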
3. .seasonal_subseries() - each month as its own mini time series¶
A seasonal subseries plot answers a different question:
Within each month, is there a trend from year to year?
All January values are plotted in one small panel in the order they appear (1981, 1982, ..., 1990), all February values in the next, and so on. A horizontal red dashed line shows the mean of that month. This makes slow changes - for example, Januaries getting steadily warmer - visible that would otherwise be hidden by the dominant annual cycle.
# Resample to monthly means first so each season has one point per year
monthly_ts = TimeSeries(ts['Temp'].resample('ME').mean().to_frame(name='Temp'))
fig, ax = monthly_ts.seasonal_subseries(period=12, column='Temp')
Each panel shows one calendar month across the decade. If any panel slopes up or down, that particular month is changing through time - a sign of a long-term trend that affects that season.
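The "slope of each panel" can also be put into numbers. The sketch below fits an ordinary least-squares line to each calendar month of a synthetic monthly series with a built-in drift of 0.1 C per year. The data and names are illustrative assumptions, and a least-squares slope is just one simple way to summarise a panel; it is not part of the seasonal_subseries method.

```python
import numpy as np
import pandas as pd

# Synthetic monthly means (an assumption): annual cycle plus a 0.1 C/year drift
idx = pd.date_range("1981-01-01", periods=120, freq="MS")
years_elapsed = (idx.year - idx.year[0]).to_numpy()
values = 11.0 + 4.0 * np.cos(2 * np.pi * (idx.month - 1) / 12) + 0.1 * years_elapsed
monthly = pd.Series(values, index=idx)

# One least-squares slope per calendar month: the "tilt" of each subseries panel
slopes = {}
for month, sub in monthly.groupby(monthly.index.month):
    years = (sub.index.year - sub.index.year.min()).to_numpy()
    slope, _intercept = np.polyfit(years, sub.to_numpy(), deg=1)
    slopes[month] = slope

slopes = pd.Series(slopes, name="slope_per_year")
print(slopes.round(3))
```

Because the annual cycle is identical within each month's subseries, it drops out of the fit and every month recovers the same drift.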
4. .annual_cycle() - all years overlaid¶
The annual cycle plot draws one thin gray curve per year on a common January-to-December axis, plus a thick blue line for the long-term mean.
- The thick line shows the typical climate.
- The spread among the gray lines shows inter-annual variability - how much one year can differ from another.
fig, ax = ts.annual_cycle(column='Temp')
Years that stay far above the bold line were unusually warm; years below it were unusually cold. This single picture summarises both the climate and its year-to-year noise.
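The data behind such an overlay is simply a table with one row per day of year and one column per calendar year. The following sketch builds that table for a synthetic series; the data and variable names are assumptions for illustration, and annual_cycle may organise its data differently.

```python
import numpy as np
import pandas as pd

# Synthetic daily series covering four calendar years (not the real dataset)
rng = np.random.default_rng(2)
idx = pd.date_range("1981-01-01", "1984-12-31", freq="D")
temp = pd.Series(
    11.0 + 4.0 * np.cos(2 * np.pi * idx.dayofyear / 365.25)
    + rng.normal(0.0, 2.0, len(idx)),
    index=idx,
)

# Rows = day of year, columns = calendar year: one column per thin curve
frame = pd.DataFrame(
    {"value": temp.to_numpy(), "year": idx.year, "doy": idx.dayofyear}
)
per_year = frame.pivot(index="doy", columns="year", values="value")

mean_cycle = per_year.mean(axis=1)  # the bold "typical year" line
spread = per_year.std(axis=1)       # inter-annual variability on each day

print(per_year.shape)
print(mean_cycle.head(3).round(2))
```

Day 366 exists only in leap years, so its row contains missing values for the other years; the mean skips them, while the standard deviation for that row is undefined when only one year contributes.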
5. Spectral analysis - from time to frequency¶
So far we have studied the series in the time domain (value vs. time). Another perspective is the frequency domain: instead of asking "what is the value on day t?", we ask "what periodic components make up the signal?".
A signal can always be decomposed into a sum of sine waves with different frequencies (how many oscillations per unit time) and amplitudes (how big those oscillations are). The periodogram shows how much power - roughly, how much variance - is contained at each frequency.
If there is a strong annual cycle in daily data, we expect a big peak at a frequency of 1/365 cycles per day (about 0.0027). If there is a weekly pattern, we expect a peak at 1/7 cycles per day (about 0.143).
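To make the idea concrete, the sketch below builds a synthetic signal from two sine waves (an annual and a weekly one) and recovers both periods with NumPy's FFT. The signal, its length of exactly seven 365-day years and the variable names are assumptions chosen so that both periods fall exactly on a frequency bin.

```python
import numpy as np

# Synthetic daily signal (an assumption): 7 years of 365 days = 2555 samples,
# so both the 365-day and the 7-day period fit a whole number of times
n_days = 2555
t = np.arange(n_days)
signal = 5.0 * np.sin(2 * np.pi * t / 365) + 1.0 * np.sin(2 * np.pi * t / 7)

# Real FFT of the mean-removed signal; frequencies are in cycles per day
spectrum = np.fft.rfft(signal - signal.mean())
freqs = np.fft.rfftfreq(n_days, d=1.0)
power = np.abs(spectrum) ** 2

# The two strongest bins, largest first
top_two = np.argsort(power)[-2:][::-1]
for i in top_two:
    print(f"frequency {freqs[i]:.5f} cycles/day -> period {1 / freqs[i]:.1f} days")
```

The larger peak belongs to the annual component because its amplitude is five times that of the weekly one. With real data the periods rarely line up with the bins this neatly, and the power spreads over neighbouring frequencies.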
6. .periodogram() - detecting periodicities¶
The .periodogram() method wraps scipy.signal.welch (by default) or the
raw periodogram. Welch's method divides the series into overlapping
segments, computes a periodogram for each, and averages them. The result is
a smoother spectrum that is less noisy than the raw version.
We expect a clear peak around period T = 365 days (one year).
freqs, power, _ = ts.periodogram(column='Temp', method='welch', plot=True)
# Find the dominant period (ignoring the zero-frequency DC component)
peak_idx = int(np.argmax(power[1:]) + 1)
peak_freq = float(freqs[peak_idx])
peak_period = 1.0 / peak_freq if peak_freq > 0 else np.inf
print(f'Dominant frequency : {peak_freq:.6f} cycles/day')
print(f'Dominant period : {peak_period:.1f} days')
Dominant frequency : 0.003906 cycles/day
Dominant period : 256.0 days
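Here the Welch estimate reports 256 days rather than 365. The reported
frequency, 0.003906 cycles/day, equals 1/256, which is the lowest non-zero
frequency bin when the segments are 256 samples long (the scipy.signal.welch
default). With bins that coarse the estimate cannot resolve a 365-day period,
so the annual peak lands in the nearest available bin: the smoothing of
Welch's method is paid for with frequency resolution. You can also try
method='periodogram' for the raw (unsmoothed) estimate, which has finer
frequency resolution but looks noisier; it should place the peak close to
365 days.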
freqs2, power2, _ = ts.periodogram(column='Temp', method='periodogram', plot=False)
print('Raw periodogram length:', len(freqs2))
peak_idx2 = int(np.argmax(power2[1:]) + 1)
print('Raw peak period (days):', round(1.0 / freqs2[peak_idx2], 1))
Raw periodogram length: 1826
Raw peak period (days): 365.0
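The gap between the 256-day Welch result and the 365-day raw result can be reproduced on synthetic data by calling scipy.signal.welch directly with two segment lengths. This is a sketch under stated assumptions: the synthetic series, the helper dominant_period and the segment length of 1460 are illustrative choices, and nothing here shows how .periodogram() passes its arguments to SciPy.

```python
import numpy as np
from scipy import signal

# Synthetic daily series (an assumption): 10 years with an annual cycle plus noise
rng = np.random.default_rng(3)
t = np.arange(3650)
x = 5.0 * np.sin(2 * np.pi * t / 365) + rng.normal(0.0, 1.0, t.size)


def dominant_period(freqs, power):
    """Period (in samples) of the strongest non-zero frequency bin."""
    k = int(np.argmax(power[1:]) + 1)
    return 1.0 / freqs[k]


# Short segments: bins are 1/256 apart, so 1/365 falls between bin 0 and bin 1
f_short, p_short = signal.welch(x, fs=1.0, nperseg=256)

# Longer segments: bins are 1/1460 apart, and 4/1460 equals 1/365 exactly
f_long, p_long = signal.welch(x, fs=1.0, nperseg=1460)

print("nperseg=256 :", round(dominant_period(f_short, p_short), 1), "days")
print("nperseg=1460:", round(dominant_period(f_long, p_long), 1), "days")
```

Longer segments give finer frequency resolution but leave fewer segments to average, so the spectrum becomes noisier. Choosing the segment length is a trade-off between the two.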
7. .seasonal_mann_kendall() - trend testing that respects seasons¶
The Mann-Kendall (MK) test is a classical non-parametric trend test: it answers the question "is there a monotonic increase or decrease in my series?" without assuming any distribution for the data.
The problem with applying MK directly to seasonal data is that the large annual cycle inflates the variance and can hide a small real trend. Hirsch, Slack & Smith (1982) proposed the Seasonal Mann-Kendall test: apply MK separately to each month (or season) and combine the per-season statistics at the end. This way the seasonal cycle cancels out.
Interpretation of the output:
- trend - 'increasing', 'decreasing' or 'no trend'.
- p_value - below 0.05 means the trend is statistically significant (at the 5% level).
- z - standardised test statistic; sign gives the direction.
# Use monthly means so each of the 12 seasons gets one observation per year
monthly_ts = TimeSeries(ts['Temp'].resample('ME').mean().to_frame(name='Temp'))
mk = monthly_ts.seasonal_mann_kendall(period=12, alpha=0.05)
print(mk.drop(columns=['per_season_s']).round(4))
             trend     h  p_value       z  combined_s  combined_var_s
column
Temp    increasing  True   0.0142  2.4529        96.0          1500.0
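In this decade the seasonal MK test detects a statistically significant increasing trend: the p-value (0.0142) is below 0.05 and z is positive (2.45). A trend of this size is hard to see in the raw plot, where the annual cycle dominates, which is exactly why a season-aware test is useful. Ten years is a short record, though, so the result is better read as evidence of warming within this decade than as a long-term climate trend.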
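The "test each season, then combine" idea can be sketched in a few lines. The code below is a simplified illustration, not statista's implementation: the function names are hypothetical, the variance formula assumes no tied values, and no correction for serial correlation is applied.

```python
import math
import numpy as np


def mk_s_and_var(x):
    """Mann-Kendall S statistic and its no-ties variance for one season."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # S counts later-minus-earlier pairs: +1 if increasing, -1 if decreasing
    s = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    return float(s), var_s


def seasonal_mk(values, period):
    """Sum per-season S and Var(S); return (S, Var(S), z, two-sided p)."""
    values = np.asarray(values, dtype=float)
    # values[k::period] picks every observation that belongs to season k
    parts = [mk_s_and_var(values[k::period]) for k in range(period)]
    s = sum(p[0] for p in parts)
    var_s = sum(p[1] for p in parts)
    # Continuity correction: shrink |S| by 1 before standardising
    z = 0.0 if s == 0 else (s - math.copysign(1.0, s)) / math.sqrt(var_s)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return s, var_s, z, p_value


# Ten years of synthetic monthly values: annual cycle plus a steady upward drift
months = np.arange(120)
series = 4.0 * np.cos(2 * np.pi * months / 12) + 0.05 * months
s, var_s, z, p = seasonal_mk(series, period=12)
print(f"S={s:.0f}  Var(S)={var_s:.0f}  z={z:.3f}  p={p:.2e}")
```

With 12 seasons of 10 values each, the no-ties variance is 12 x 125 = 1500, the same combined_var_s as in the output above. Applying the continuity-corrected formula to the reported combined_s of 96 gives z = 95 / sqrt(1500), about 2.453, which agrees with the reported z.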
Summary¶
| Method | Answers the question... |
|---|---|
| monthly_stats | What is the typical value and spread in each month? |
| seasonal_subseries | Is any given month drifting over the years? |
| annual_cycle | What does an average year look like and how variable is it? |
| periodogram | Which periodicities dominate the series? |
| seasonal_mann_kendall | Is there a real trend once seasonality is removed? |
With these five tools you can quantify, visualise and test almost every aspect of seasonal behaviour in an environmental time series.