05 — Stationarity
Before you fit ARIMA, do forecasting, or compute confidence intervals on a time series, you should ask: is this series stationary? Almost every standard tool assumes that it is.
This notebook demonstrates the Stationarity mixin: adf_test, kpss_test, and the combined stationarity_summary.
1. What is Stationarity?
A time series is stationary if its statistical properties do not change through time. In the usual weak-stationarity sense this means:
- Constant mean — no trend.
- Constant variance — the spread does not grow or shrink.
- Autocovariance depends only on lag, not on absolute time.
Strong stationarity additionally requires the full distribution to be shift-invariant. In practice weak stationarity is what we check.
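One rough way to make the three conditions concrete is to split a series into segments and compare the sample mean, variance, and lag-1 autocovariance of each segment. The sketch below does this with plain NumPy. The helper segment_stats is hypothetical (written for this illustration, not part of statista), the seed and series length are arbitrary, and an eyeball comparison like this is no substitute for a formal test.

```python
import numpy as np

def segment_stats(x, n_segments=2, lag=1):
    """Return (mean, variance, lag-`lag` autocovariance) per segment; lag must be >= 1."""
    x = np.asarray(x, dtype=float)
    stats = []
    for seg in np.array_split(x, n_segments):
        centered = seg - seg.mean()
        # Sample autocovariance at the given lag (biased estimator: divides by len(seg))
        autocov = np.dot(centered[lag:], centered[:-lag]) / len(seg)
        stats.append((seg.mean(), seg.var(), autocov))
    return stats

rng = np.random.default_rng(0)  # arbitrary seed for the illustration
noise = rng.standard_normal(4000)
walk = np.cumsum(rng.standard_normal(4000))

for name, series in [("white noise", noise), ("random walk", walk)]:
    print(name)
    for mean, var, acov in segment_stats(series):
        print(f"  mean={mean:8.3f}  var={var:8.3f}  lag-1 autocov={acov:8.3f}")
```

For the white-noise series the two segments should report nearly the same three numbers; for the random walk they typically do not.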
Why does it matter?
- The theoretical guarantees for ARIMA, OLS on time series, and most confidence intervals assume stationarity.
- If you fit a model on a non-stationary series you risk spurious regressions: a high R-squared and significant-looking coefficients with no real relationship behind them.
Common sources of non-stationarity: trends, seasonality, unit roots (random-walk behaviour), variance that changes with the mean.
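The spurious-regression problem is easy to reproduce. The sketch below regresses pairs of series that are independent by construction and compares the typical R-squared for white noise with that for random walks. The sample size, number of simulations, and seed are arbitrary choices for the illustration.

```python
import numpy as np

def r_squared(x, y):
    """R-squared of a simple linear regression of y on x (the squared correlation)."""
    return np.corrcoef(x, y)[0, 1] ** 2

rng = np.random.default_rng(1)  # arbitrary seed
n, n_sims = 200, 500

r2_noise, r2_walk = [], []
for _ in range(n_sims):
    a, b = rng.standard_normal((2, n))  # two independent white-noise series
    r2_noise.append(r_squared(a, b))
    # Cumulative sums turn the same shocks into two independent random walks
    r2_walk.append(r_squared(np.cumsum(a), np.cumsum(b)))

print("median R^2, independent white noise :", round(float(np.median(r2_noise)), 4))
print("median R^2, independent random walks:", round(float(np.median(r2_walk)), 4))
```

Neither pair has any real relationship, yet the random walks usually produce a far larger R-squared.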
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statista.time_series import TimeSeries
rng = np.random.default_rng(42)
2. Visual Examples — White Noise vs Random Walk
Two classic textbook series:
- White noise — independent draws from a fixed distribution. Stationary.
- Random walk — each step adds a white-noise shock to the last value:
x(t) = x(t-1) + e(t). Non-stationary because its variance grows with t.
They look very different even to the naked eye.
n = 400
white_noise = rng.standard_normal(n)
random_walk = np.cumsum(rng.standard_normal(n))
fig, axes = plt.subplots(1, 2, figsize=(11, 3))
axes[0].plot(white_noise, color='steelblue', linewidth=0.8)
axes[0].set_title('White noise (stationary)')
axes[0].axhline(white_noise.mean(), color='red', linestyle='--')
axes[1].plot(random_walk, color='darkorange', linewidth=0.8)
axes[1].set_title('Random walk (non-stationary)')
plt.tight_layout()
plt.show()
The white-noise series wobbles around a fixed mean with constant spread. The random walk drifts; its current level depends on where it has been. No fixed mean, no fixed variance.
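The claim that the random walk has no fixed variance can be checked by simulating many paths and measuring the variance across paths at each time step. This is a minimal sketch with an arbitrary seed and path count. For unit-variance shocks, the variance of x(t) should be close to t, while the white-noise variance stays near 1.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
n_paths, n_steps = 5000, 400

shocks = rng.standard_normal((n_paths, n_steps))  # each row: one white-noise series
walks = np.cumsum(shocks, axis=1)                 # each row: one random walk

# Variance across the ensemble of paths at each time step
var_noise = shocks.var(axis=0)
var_walk = walks.var(axis=0)

for t in (10, 100, 400):
    print(f"t={t:3d}  var(white noise)={var_noise[t - 1]:6.2f}  "
          f"var(random walk)={var_walk[t - 1]:7.2f}")
```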
3. Augmented Dickey-Fuller — .adf_test()
The ADF test looks for a unit root — the signature of a random walk.
- H0 (null): the series has a unit root (is non-stationary).
- H1 (alternative): the series is stationary.
Decision rule: if p-value < 0.05 reject H0 -> the series is stationary.
The test statistic is compared against critical values at the 1%, 5%, and 10% levels. The more negative the statistic, the stronger the evidence against the null.
ts_wn = TimeSeries(white_noise)
ts_rw = TimeSeries(random_walk)
print('ADF on white noise:')
print(ts_wn.adf_test())
print('\nADF on random walk:')
print(ts_rw.adf_test())
ADF on white noise:
statistic p_value used_lag n_obs crit_1% crit_5% crit_10% \
column
Series1 -5.072319 0.001 16 383 -3.433 -2.863 -2.568
conclusion
column
Series1 Stationary
ADF on random walk:
statistic p_value used_lag n_obs crit_1% crit_5% crit_10% \
column
Series1 -1.068165 0.354209 16 383 -3.433 -2.863 -2.568
conclusion
column
Series1 Non-stationary
What this means
- White noise: p-value far below 0.05 -> reject H0 -> stationary.
- Random walk: p-value well above 0.05 -> fail to reject H0 -> consistent with a unit root / non-stationary.
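To see where the test statistic comes from, the sketch below runs the plain Dickey-Fuller regression: regress the first difference on a constant and the lagged level, then take the t-statistic of the lagged-level coefficient. It uses no lagged differences, so it is a simplification and not the augmented test that adf_test reports. The helper name df_tstat is made up for this illustration, the seed is arbitrary, and the 5% critical value of -2.863 is copied from the output above.

```python
import numpy as np

def df_tstat(x):
    """t-statistic on x(t-1) in the regression  dx(t) = a + rho * x(t-1) + error."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)                      # first differences x(t) - x(t-1)
    lagged = x[:-1]                      # lagged level x(t-1)
    X = np.column_stack([np.ones_like(lagged), lagged])
    beta, *_ = np.linalg.lstsq(X, dx, rcond=None)
    resid = dx - X @ beta
    dof = len(dx) - X.shape[1]
    sigma2 = resid @ resid / dof         # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)  # arbitrary seed
white_noise = rng.standard_normal(400)
random_walk = np.cumsum(rng.standard_normal(400))

crit_5 = -2.863  # 5% critical value as printed in the ADF output above
for name, series in [("white noise", white_noise), ("random walk", random_walk)]:
    stat = df_tstat(series)
    verdict = "reject unit root" if stat < crit_5 else "fail to reject unit root"
    print(f"{name}: t = {stat:.3f} -> {verdict}")
```

A strongly negative statistic means the series is pulled back toward its mean, which is why more negative values count as stronger evidence against a unit root.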
4. KPSS Test — .kpss_test()
The KPSS test is the mirror image of ADF:
- H0 (null): the series IS stationary.
- H1 (alternative): the series is non-stationary.
Decision rule: if p-value < 0.05 reject H0 -> the series is non-stationary.
Why use both ADF and KPSS? The two tests start from opposite null hypotheses, and each has low statistical power in different situations. When they agree you can be more confident in the verdict. When they disagree you have still learned something: the series may be trend-stationary (stationary around a deterministic trend), or the evidence is simply inconclusive.
print('KPSS on white noise:')
print(ts_wn.kpss_test())
print('\nKPSS on random walk:')
print(ts_rw.kpss_test())
KPSS on white noise:
statistic p_value lags crit_10% crit_5% crit_2.5% crit_1% \
column
Series1 0.113163 0.1 6 0.347 0.463 0.574 0.739
conclusion
column
Series1 Stationary
KPSS on random walk:
statistic p_value lags crit_10% crit_5% crit_2.5% crit_1% \
column
Series1 5.129189 0.01 6 0.347 0.463 0.574 0.739
conclusion
column
Series1 Non-stationary
What this means
- White noise: the p-value sits at the upper limit of 0.10 -> fail to reject -> the KPSS test also calls it stationary. The KPSS p-value is read off tabulated critical values and is therefore truncated at 0.10, so a reported 0.10 means "no evidence against stationarity" rather than a precise estimate.
- Random walk: p-value at 0.01, the lower end of the same table -> reject -> non-stationary.
Both tests agree on both series — a clean diagnosis.
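The KPSS statistic is short enough to write out. The sketch below follows the textbook level-stationarity recipe: demean the series, form partial sums of the residuals, and scale by a Bartlett-weighted long-run variance. The helper kpss_stat is hypothetical, the seed is arbitrary, the lag count of 6 and the 5% critical value of 0.463 are copied from the output above, and the library's own implementation may differ in details.

```python
import numpy as np

def kpss_stat(x, lags):
    """Level-stationarity KPSS statistic with a Bartlett-kernel long-run variance."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    e = x - x.mean()                 # residuals from a constant-only fit
    s = np.cumsum(e)                 # partial sums of the residuals
    lrv = e @ e / n                  # lag-0 term of the long-run variance
    for k in range(1, lags + 1):
        weight = 1.0 - k / (lags + 1.0)          # Bartlett weights
        lrv += 2.0 * weight * (e[k:] @ e[:-k]) / n
    return (s @ s) / (n ** 2 * lrv)

rng = np.random.default_rng(0)  # arbitrary seed
white_noise = rng.standard_normal(400)
random_walk = np.cumsum(rng.standard_normal(400))

crit_5 = 0.463  # 5% critical value as printed in the KPSS output above
for name, series in [("white noise", white_noise), ("random walk", random_walk)]:
    stat = kpss_stat(series, lags=6)
    verdict = "reject stationarity" if stat > crit_5 else "fail to reject stationarity"
    print(f"{name}: statistic = {stat:.3f} -> {verdict}")
```

Large values arise when the partial sums wander far from zero, which is what happens when the series does not keep returning to a fixed level.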
5. Combined Diagnosis — .stationarity_summary()
Running ADF and KPSS jointly gives four possible outcomes:
| ADF rejects H0? | KPSS rejects H0? | Diagnosis |
|---|---|---|
| Yes | No | Stationary |
| No | Yes | Non-stationary (unit root) — difference the series |
| Yes | Yes | Trend-stationary — remove a deterministic trend |
| No | No | Inconclusive — need more data or a different test |
This single call is the recommended starting point.
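The table translates directly into a small function. The sketch below is a hypothetical helper: diagnose is not a statista function, and how stationarity_summary decides internally is not shown in this notebook. The default threshold of 0.05 is the one used in the decision rules above.

```python
def diagnose(adf_pvalue, kpss_pvalue, alpha=0.05):
    """Map a pair of p-values to the four-way diagnosis in the table above."""
    adf_rejects = adf_pvalue < alpha     # ADF rejects "has a unit root"
    kpss_rejects = kpss_pvalue < alpha   # KPSS rejects "is stationary"
    if adf_rejects and not kpss_rejects:
        return "Stationary"
    if not adf_rejects and kpss_rejects:
        return "Non-stationary (unit root)"
    if adf_rejects and kpss_rejects:
        return "Trend-stationary"
    return "Inconclusive"

# p-values taken from the ADF and KPSS outputs earlier in this notebook
print(diagnose(0.001, 0.1))      # white noise
print(diagnose(0.354209, 0.01))  # random walk
```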
print('Summary for white noise:')
print(ts_wn.stationarity_summary())
print('\nSummary for random walk:')
print(ts_rw.stationarity_summary())
Summary for white noise:
adf_stat adf_pvalue kpss_stat kpss_pvalue diagnosis
column
Series1 -5.072319 0.001 0.113163 0.1 Stationary
Summary for random walk:
adf_stat adf_pvalue kpss_stat kpss_pvalue \
column
Series1 -1.068165 0.354209 5.129189 0.01
diagnosis
column
Series1 Non-stationary (unit root)
6. Real-World Example — Rhine River Discharge
Let us apply the same tools to annual discharge from four stations along the river Rhine (rhine.csv). Each series holds 61 annual values, so a strong trend due to climate change or land-use change could show up, although with so few observations both tests have limited power.
rhine = pd.read_csv('../../../examples/data/rhine.csv', encoding='utf-8-sig')
print('Shape:', rhine.shape)
print('Columns:', list(rhine.columns))
rhine.head()
Shape: (61, 4)
Columns: ['rockenau', 'maxau', 'cologne', 'rees']
| | rockenau | maxau | cologne | rees |
|---|---|---|---|---|
| 0 | 864.070 | 4547.767 | 9590.666 | 9753.263 |
| 1 | 723.215 | 3769.567 | 8182.114 | 8651.166 |
| 2 | 885.580 | 3776.206 | 8972.081 | 9380.397 |
| 3 | 899.729 | 4130.709 | 8864.038 | 9071.717 |
| 4 | 810.590 | 3707.643 | 8071.079 | 8354.173 |
ts_rhine = TimeSeries(rhine)
fig, ax = plt.subplots(figsize=(10, 4))
for col in ts_rhine.columns:
    ax.plot(ts_rhine.index, ts_rhine[col], linewidth=0.8, label=col)
ax.set_title('Annual discharge — four Rhine stations')
ax.set_ylabel('discharge (m^3/s)')
ax.legend()
plt.show()
summary = ts_rhine.stationarity_summary()
print(summary)
adf_stat adf_pvalue kpss_stat kpss_pvalue diagnosis
column
rockenau -3.098443 0.033478 0.114360 0.1 Stationary
maxau -1.831716 0.224794 0.170937 0.1 Inconclusive
cologne -2.782115 0.063709 0.090464 0.1 Inconclusive
rees -2.628073 0.089818 0.103124 0.1 Inconclusive
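What this means for the Rhine data
Read the diagnosis column for each station. Here rockenau comes back as Stationary, while maxau, cologne, and rees are Inconclusive: their ADF p-values (0.22, 0.06, 0.09) are too large to reject a unit root at the 5% level, yet KPSS, with its p-value at the 0.10 ceiling, finds no evidence against stationarity. Annual discharge is usually close to stationary over decades, but the four stations can disagree. If a station comes back as Non-stationary (unit root) or Trend-stationary, common next steps are:
- Difference the series: y_t - y_{t-1} often removes a unit root.
- Detrend the series if the diagnosis is Trend-stationary.
- Re-run the stationarity tests and check that the residuals now behave.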
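The differencing step can be illustrated on a synthetic random walk, where the answer is known in advance. In this minimal sketch (arbitrary seed, NumPy only), differencing undoes the cumulative sum and returns the original white-noise shocks.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
shocks = rng.standard_normal(400)
random_walk = np.cumsum(shocks)      # x(t) = x(t-1) + e(t)

differenced = np.diff(random_walk)   # x(t) - x(t-1), one value shorter than the input

# Differencing undoes the cumulative sum, so we recover every shock except the first
print("recovers the shocks:", np.allclose(differenced, shocks[1:]))
print("std of random walk :", round(float(random_walk.std()), 3))
print("std of differences :", round(float(differenced.std()), 3))
```

With a pandas Series, .diff() does the same job but leaves a NaN in the first position, which needs dropping before the tests are re-run.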
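7. A Trend-Stationary Example
To probe the Trend-stationary diagnosis we build a straight line with white-noise wobble around it. Such a series has a non-constant mean (so it is technically non-stationary), but the detrended residuals are stationary. An ADF test that includes a trend term can reject the unit-root null, while KPSS rejects level-stationarity. In the output below, however, the ADF p-value is 0.567, so the unit-root null is not rejected and the reported diagnosis is Non-stationary (unit root), not Trend-stationary. This is consistent with an ADF regression that has no trend term, which loses power against a strong deterministic trend, so treat the label as a prompt to detrend and re-test rather than as proof of a unit root.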
rng = np.random.default_rng(7)
trend_stat = np.arange(300) * 0.05 + rng.standard_normal(300)
ts_trend = TimeSeries(trend_stat)
print(ts_trend.stationarity_summary())
adf_stat adf_pvalue kpss_stat kpss_pvalue \
column
Series1 0.187488 0.567032 4.339013 0.01
diagnosis
column
Series1 Non-stationary (unit root)
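Since this series was built as a line plus noise, removing a fitted line should leave something that looks like white noise. The sketch below rebuilds the series with the same recipe as the cell above, fits a straight line with np.polyfit, and subtracts it. Ordinary least squares is one reasonable choice of detrending method here, not the only one.

```python
import numpy as np

rng = np.random.default_rng(7)
t = np.arange(300)
trend_stat = t * 0.05 + rng.standard_normal(300)  # same recipe as the cell above

# Fit a straight line by least squares and subtract it
slope, intercept = np.polyfit(t, trend_stat, deg=1)
residuals = trend_stat - (slope * t + intercept)

print(f"fitted slope  : {slope:.4f}  (true value 0.05)")
print(f"residual mean : {residuals.mean():.2e}")
print(f"residual std  : {residuals.std():.3f}")
```

The residuals can then be wrapped in TimeSeries and passed to stationarity_summary() again to check whether the diagnosis changes.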
8. Practical Workflow
When you get a new time series:
- Plot it. A trend or a variance change is often obvious.
- Run stationarity_summary().
- Read the diagnosis:
    - Stationary — go on to modelling.
    - Non-stationary (unit root) — try first differencing and re-test.
    - Trend-stationary — fit and subtract the trend, then re-test.
    - Inconclusive — gather more data, try transformations (log), or use robust methods.
- Always sanity-check with a residual plot after any transformation.
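The log transformation mentioned above helps when the spread grows with the level of the series. The sketch below builds a synthetic series with multiplicative noise (the growth rate, noise scale, and seed are all assumptions chosen for the illustration) and compares the spread in the second half with the spread in the first half, before and after taking logs.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
t = np.arange(400)
log_level = 0.01 * t                                     # assumed smooth growth path
x = np.exp(log_level + 0.1 * rng.standard_normal(400))   # multiplicative noise

# Deviations from the known level, on the raw scale and on the log scale
raw_dev = x - np.exp(log_level)
log_dev = np.log(x) - log_level

def spread_ratio(dev):
    """Standard deviation of the second half divided by that of the first half."""
    first, second = np.array_split(dev, 2)
    return second.std() / first.std()

print("spread ratio, raw scale:", round(float(spread_ratio(raw_dev)), 2))
print("spread ratio, log scale:", round(float(spread_ratio(log_dev)), 2))
```

A ratio near 1 on the log scale indicates that the transformation has stabilised the variance; the trend itself still has to be removed separately.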
Summary
- Stationarity = statistics stay constant through time.
- ADF: H0 = non-stationary; small p -> stationary.
- KPSS: H0 = stationary; small p -> non-stationary.
- Use stationarity_summary() to combine the two and get a clean diagnosis.
- Non-stationary data almost always needs differencing or detrending before downstream analysis.
Next up: notebook 06 is about testing and quantifying trends themselves (Mann-Kendall, Sen's slope, Innovative Trend Analysis).