06 — Trend Detection¶
In environmental and climate science, trend detection answers questions like:
- Are summer temperatures rising at this station?
- Is annual streamflow decreasing over the last 50 years?
- How fast is the trend, and is it real or just noise?
This notebook walks through the Trend mixin: Mann-Kendall, Sen's slope, detrending, and Innovative Trend Analysis.
1. What is a Trend?¶
A trend is a long-term directional change in the mean of a series.
- Linear trend — constant rate of change (a straight line).
- Monotonic trend — the series keeps moving in one direction, but not necessarily at a constant rate.
Parametric vs non-parametric tests¶
- Parametric (e.g. linear regression t-test) assumes residuals are normal with constant variance. Powerful when the assumptions hold but fragile when they don't.
- Non-parametric (e.g. Mann-Kendall) makes no distributional assumption. Rank-based, so it is robust to outliers and skewed data.
Environmental data is almost always skewed (rainfall, streamflow) and often has heavy tails. Mann-Kendall is the field standard because of that.
A subtle but critical issue: autocorrelation¶
The classical Mann-Kendall assumes the observations are independent. But daily or even annual environmental data is almost always autocorrelated. Ignoring autocorrelation inflates the z-statistic and makes the test reject the null too often — you see significant trends that are not really there.
Variants like Hamed-Rao adjust the variance of the test statistic to account for autocorrelation.
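As a rough sketch of what "adjusting the variance" means: the Hamed and Rao (1998) correction multiplies the classical variance of the test statistic by a factor built from the autocorrelation of the ranks. The helper below is illustrative only. Its name and inputs are assumptions, not the statista API, and it assumes the rank autocorrelations at the lags judged significant have already been computed.

```python
import math

def hamed_rao_factor(n, rank_acf):
    """Variance inflation factor n / n* from Hamed & Rao (1998).

    rank_acf[k - 1] holds the lag-k autocorrelation of the ranks;
    lags that are not listed are treated as zero.
    """
    weighted = sum(
        (n - k) * (n - k - 1) * (n - k - 2) * rho
        for k, rho in enumerate(rank_acf, start=1)
    )
    return 1 + 2 / (n * (n - 1) * (n - 2)) * weighted

# Hypothetical input: 61 observations with a lag-1 rank autocorrelation of 0.5.
factor = hamed_rao_factor(61, [0.5])
z_classical = 2.2                              # made-up classical z-statistic
z_corrected = z_classical / math.sqrt(factor)  # ignores the small continuity correction
print(round(factor, 4), round(z_corrected, 4))
```

With no autocorrelation the factor is exactly 1 and the classical test is recovered; positive rank autocorrelation pushes the factor above 1 and shrinks the z-statistic.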
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statista.time_series import TimeSeries
2. Load the Data¶
We use two real datasets:
- temp.csv: daily temperature, ~10 years. Strong autocorrelation (each day close to the previous) and a seasonal cycle.
- rhine.csv: annual discharge at four Rhine stations, 61 years. Classic hydrology example for trend analysis.
temp = pd.read_csv('../../../examples/data/temp.csv', parse_dates=['Date'], index_col='Date')
rhine = pd.read_csv('../../../examples/data/rhine.csv', encoding='utf-8-sig')
ts_temp = TimeSeries(temp)
ts_rhine = TimeSeries(rhine)
print('Temperature:', ts_temp.shape)
print('Rhine:', ts_rhine.shape)
Temperature: (3650, 1)
Rhine: (61, 4)
fig, axes = plt.subplots(2, 1, figsize=(10, 6))
axes[0].plot(ts_temp.index, ts_temp['Temp'], linewidth=0.4)
axes[0].set_title('Daily temperature')
for col in ts_rhine.columns:
    axes[1].plot(ts_rhine[col], label=col, linewidth=1.0)
axes[1].set_title('Annual Rhine discharge')
axes[1].legend(fontsize=8)
plt.tight_layout()
plt.show()
3. Mann-Kendall (original) — .mann_kendall()¶
The Mann-Kendall (MK) test is the workhorse of non-parametric trend detection.
- H0: no monotonic trend.
- H1: a monotonic trend exists.
It works by counting, for every pair of observations (i, j) with i < j, whether x_j > x_i (score +1) or x_j < x_i (score -1). Summing these gives the S statistic.
Result columns to know:
- trend: increasing, decreasing, or no trend.
- z: standardised test statistic (the further from 0, the stronger).
- p_value: probability of seeing an S at least this extreme under H0.
- tau: Kendall's tau, a -1..+1 correlation-like measure of trend strength.
- slope: the Sen slope (returned as a bonus).
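The pair-counting definition and the result columns can be reproduced in a few lines of plain Python. This is a minimal sketch of the classical test, assuming independent observations with no tied values; the function name is hypothetical and this is not the library's implementation.

```python
import math
from itertools import combinations

def mann_kendall_stats(x):
    """Classical MK statistics, assuming independent data with no tied values."""
    n = len(x)
    # combinations() keeps input order, so `earlier` always precedes `later` in time.
    s = sum((later > earlier) - (later < earlier) for earlier, later in combinations(x, 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity correction: shrink |S| by 1 before standardising.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    tau = s / (n * (n - 1) / 2)                 # S divided by the number of pairs
    return {"s": s, "var_s": var_s, "z": z, "p_value": p_value, "tau": tau}

toy = [3.1, 2.9, 3.6, 3.4, 4.0, 4.2, 3.9, 4.8]  # made-up series
stats = mann_kendall_stats(toy)
print(stats["s"], round(stats["z"], 4), round(stats["p_value"], 4), round(stats["tau"], 4))

# Variance of S for n = 61 observations with no ties
print(round(61 * 60 * (2 * 61 + 5) / 18, 4))
```

For n = 61 the variance formula gives 25823.3333, which agrees with the var_s column in the Rhine output below. With S = -8 the continuity-corrected z is -7 / sqrt(25823.3333), about -0.0436, which matches the rockenau row.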
We run the test on annual Rhine discharge. 61 annual observations is modest but usable.
mk_rhine = ts_rhine.mann_kendall()
print(mk_rhine.round(4))
trend h p_value z tau s var_s slope \
column
rockenau no trend False 0.9653 -0.0436 -0.0044 -8.0 25823.3333 -0.0609
maxau no trend False 0.5628 -0.5787 -0.0514 -94.0 25823.3333 -5.8974
cologne no trend False 0.9653 0.0436 0.0044 8.0 25823.3333 0.7109
rees no trend False 0.9553 0.0560 0.0055 10.0 25823.3333 1.7348
intercept
column
rockenau 682.3449
maxau 4021.9684
cologne 7308.3625
rees 7487.5008
What this means for the Rhine data¶
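Read the trend column per station. A positive z with a small p_value means a statistically significant increasing trend; a negative z with a small p_value means a significant decreasing trend. If p_value > 0.05, there is not enough evidence to declare a monotonic trend. Here all four stations have p_value above 0.56 and |z| below 0.6, so none shows a significant monotonic trend over the 61 years.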
4. Why Autocorrelation Matters — MK Variants¶
If the data is positively autocorrelated, the classical MK test still treats neighbouring observations as if they were independent. They are not, so the effective sample size is smaller than n and the true variance of S is larger than the classical formula assumes. Dividing S by a standard deviation that is too small inflates the z-statistic and increases the rate of false positives.
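A small Monte Carlo experiment makes the false-positive inflation concrete. The sketch below simulates trend-free series, once as white noise and once as an AR(1) process with coefficient 0.7, and counts how often the classical test rejects at the 5% level. All names and settings (series length, number of trials, the AR coefficient) are illustrative assumptions.

```python
import math
import random
from itertools import combinations

def mk_p_value(x):
    """Two-sided p-value of the classical Mann-Kendall test (no ties assumed)."""
    n = len(x)
    s = sum((b > a) - (b < a) for a, b in combinations(x, 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18
    z = 0.0 if s == 0 else (s - math.copysign(1, s)) / math.sqrt(var_s)
    return math.erfc(abs(z) / math.sqrt(2))

def ar1(n, phi, rng, burn_in=50):
    """Trend-free AR(1) series: x_t = phi * x_{t-1} + noise."""
    x, out = 0.0, []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:          # discard the start-up transient
            out.append(x)
    return out

def false_positive_rate(phi, n=40, trials=300, alpha=0.05, seed=0):
    """Share of trend-free series in which the test still reports a trend."""
    rng = random.Random(seed)
    hits = sum(mk_p_value(ar1(n, phi, rng)) < alpha for _ in range(trials))
    return hits / trials

rate_independent = false_positive_rate(phi=0.0)
rate_autocorrelated = false_positive_rate(phi=0.7)
print(rate_independent, rate_autocorrelated)
```

The white-noise rate should sit near the nominal 5%, while the autocorrelated rate comes out several times higher even though neither set of series contains a trend.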
Method options:
- method='original': classical MK (assumes independence).
- method='hamed_rao': Hamed & Rao (1998) variance correction. Recommended for autocorrelated environmental data.
- method='yue_wang': Yue & Wang (2004) alternative correction.
- method='pre_whitening' / method='trend_free_pre_whitening': remove lag-1 autocorrelation before testing.
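To illustrate the pre-whitening idea, here is a minimal sketch of its simplest form: estimate the lag-1 autocorrelation r1 and test the series x_t - r1 * x_{t-1} instead of x_t. How the library implements these options is not shown in this notebook, so the helper names and details below are assumptions. The trend-free variant, as usually described, removes a Sen-slope trend before estimating r1 and adds it back afterwards.

```python
import random

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

def pre_whiten(x):
    """Remove the lag-1 dependence: x'_t = x_t - r1 * x_{t-1} (one value shorter)."""
    r1 = lag1_autocorr(x)
    return [x[t] - r1 * x[t - 1] for t in range(1, len(x))]

# Synthetic AR(1) series with coefficient 0.7 and no trend
rng = random.Random(1)
series = [0.0]
for _ in range(499):
    series.append(0.7 * series[-1] + rng.gauss(0.0, 1.0))

whitened = pre_whiten(series)
print(round(lag1_autocorr(series), 2), round(lag1_autocorr(whitened), 2))
```

The whitened series has one observation fewer and a lag-1 autocorrelation close to zero, so the independence assumption of the classical test is much more reasonable for it.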
Let us compare on the daily temperature series, which is highly autocorrelated.
mk_orig = ts_temp.mann_kendall(method='original')
mk_hr = ts_temp.mann_kendall(method='hamed_rao')
compare = pd.DataFrame({
'original_z': mk_orig['z'],
'original_p': mk_orig['p_value'],
'hamed_rao_z': mk_hr['z'],
'hamed_rao_p': mk_hr['p_value'],
})
print(compare.round(4))
        original_z  original_p  hamed_rao_z  hamed_rao_p
column
Temp        1.1722      0.2411       0.5447        0.586
What this means¶
The Hamed-Rao z-statistic is smaller in magnitude, and its p-value larger, than the original: z falls from 1.1722 to 0.5447 and p rises from 0.2411 to 0.586. That is the correction doing its job: accounting for the fact that yesterday and today are almost the same observation.
Rule of thumb: for any daily, monthly, or even annual environmental series with visible autocorrelation (check with ts.acf()), use method='hamed_rao'.
5. Sen's Slope — .sens_slope()¶
Mann-Kendall tells you if there is a trend. Sen's slope tells you how big it is, in the same units as your data (e.g. degrees per day, m^3/s per year).
Sen's slope is the median of all pairwise slopes (x_j - x_i) / (j - i). Because it uses the median, it is robust to outliers — a few extreme values do not pull it off course. Compare that to ordinary least squares, where a single outlier can change the slope drastically.
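The robustness claim is easy to check on a toy series. The sketch below computes the median of pairwise slopes and an ordinary least-squares slope by hand, then corrupts one observation. The data and function names are made up for illustration.

```python
from itertools import combinations
from statistics import median

def sens_slope(y):
    """Median of all pairwise slopes (y_j - y_i) / (j - i) for i < j."""
    return median((y[j] - y[i]) / (j - i) for i, j in combinations(range(len(y)), 2))

def ols_slope(y):
    """Least-squares slope of y against the index 0, 1, 2, ..."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den

clean = [2.0 * t for t in range(10)]   # exact slope of 2 per step
spiked = list(clean)
spiked[-1] = 100.0                     # one gross outlier at the end

print(sens_slope(clean), round(ols_slope(clean), 4))     # 2.0 2.0
print(sens_slope(spiked), round(ols_slope(spiked), 4))   # 2.0 6.4727
```

Only 9 of the 45 pairwise slopes involve the outlier, so the median stays at 2.0, while the least-squares slope more than triples.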
The confidence interval is computed from Kendall's tau distribution.
sens = ts_rhine.sens_slope()
print(sens.round(4))
           slope  intercept  slope_lower_ci  slope_upper_ci
column
rockenau -0.0609   681.5015         -3.2287          3.0694
maxau    -5.8974  3953.1269        -21.1382         13.2901
cologne   0.7109  7319.0257        -37.4042         40.7173
rees      1.7348  7513.5229        -43.1647         43.0171
What this means for the Rhine data¶
- slope: change per year (discharge is in m^3/s, so slope is m^3/s per year).
- slope_lower_ci / slope_upper_ci: 95% confidence bounds on that slope.
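If the interval contains 0, the trend is not significant at the 5% level. This agrees with the Mann-Kendall p-value, and a slope with its interval is easy to report in a paper, for example (with made-up numbers): "discharge increased by 2.3 m^3/s per year (95% CI: 0.4 to 4.1)". For the Rhine data, all four intervals contain 0, which matches the no trend results from Mann-Kendall.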
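The "does the interval contain 0?" check can be written out explicitly. The sketch below copies the confidence bounds from the printed output above into a plain dictionary (rather than reading them from the sens DataFrame) so that it runs on its own.

```python
# 95% confidence bounds copied from the sens_slope() output above
slope_ci = {
    "rockenau": (-3.2287, 3.0694),
    "maxau": (-21.1382, 13.2901),
    "cologne": (-37.4042, 40.7173),
    "rees": (-43.1647, 43.0171),
}

# A trend is significant at the 5% level only if the interval excludes 0.
significant = {
    station: not (lower <= 0.0 <= upper)
    for station, (lower, upper) in slope_ci.items()
}

for station, flag in significant.items():
    print(f"{station:10s} significant at 5%: {flag}")
```

Every station prints False, consistent with the no trend verdicts in the Mann-Kendall table.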
# Overlay Sen's slope on the data for one Rhine station
col = ts_rhine.columns[0]
y = ts_rhine[col].values
x = np.arange(len(y))
slope = float(sens.loc[col, 'slope'])
intercept = float(sens.loc[col, 'intercept'])
fig, ax = plt.subplots(figsize=(10, 3))
ax.plot(x, y, 'o-', color='steelblue', markersize=3, label=col)
ax.plot(x, intercept + slope * x, 'r--', label=f'Sen slope = {slope:.2f}')
ax.set_title(f'Sen trend line — {col}')
ax.legend()
plt.show()
6. Detrending — .detrend()¶
Once a trend is established, you often want to remove it so that you can study the remaining variability (seasonality, cycles, residual autocorrelation, extreme-value behaviour). Removing a trend also makes many series stationary — a prerequisite for ARIMA and other models.
Available methods:
- method='linear': subtract an OLS fit.
- method='constant': subtract the mean.
- method='polynomial' (+ order=): for curved trends.
- method='sens': subtract a Sen slope line. Robust to outliers and matches the MK framework above.
detrended = ts_rhine.detrend(method='sens')
fig, axes = plt.subplots(2, 1, figsize=(10, 5), sharex=True)
col = ts_rhine.columns[0]
axes[0].plot(ts_rhine[col].values, color='steelblue')
axes[0].set_title(f'Original — {col}')
axes[1].plot(detrended[col].values, color='darkorange')
axes[1].axhline(0, color='red', linestyle='--')
axes[1].set_title(f'Detrended (Sen) — {col}')
plt.tight_layout()
plt.show()
print('Detrended mean:', round(float(detrended[col].mean()), 4))
Detrended mean: 80.2894
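A detrended mean that is not zero, as printed above, is not necessarily a bug. An OLS line forces the residuals to average zero, but a median-based Sen line does not, and for right-skewed data the residual mean will typically be positive. The sketch below shows this on a made-up skewed series. The intercept convention used here (median of y minus slope times t) is an assumption; the library's own convention is not shown in this notebook.

```python
from itertools import combinations
from statistics import mean, median

def sens_line(y):
    """Sen slope plus a median-based intercept (one common convention, assumed here)."""
    slope = median((y[j] - y[i]) / (j - i) for i, j in combinations(range(len(y)), 2))
    intercept = median(v - slope * t for t, v in enumerate(y))
    return slope, intercept

# Made-up right-skewed series: a gentle trend plus an occasional large peak.
y = [10 + 0.5 * t + (40 if t % 7 == 0 else 0) for t in range(30)]

slope, intercept = sens_line(y)
residuals = [v - (intercept + slope * t) for t, v in enumerate(y)]

print(slope, intercept)                              # 0.5 10.0
print(median(residuals), round(mean(residuals), 4))  # 0.0 6.6667
```

The Sen line recovers the underlying slope and leaves residuals with a median of zero, yet the five peaks pull the residual mean up to about 6.67.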
When should you detrend?¶
- Before fitting an ARIMA where you want to model residuals only.
- Before studying anomalies ("how unusual was this year, beyond the trend?").
- When comparing sites with different long-term baselines.
Don't detrend if the trend itself is what you want to study, or if the stationarity tests point to a unit root (non-stationary in a random-walk sense). In that case, difference the series instead.
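To see why differencing is the right tool for a unit root, consider a simulated random walk. This is a sketch with made-up data: the walk itself is extremely persistent, while its first difference is just the independent steps it was built from.

```python
import random

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t + 1] - m) for t in range(n - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den

# Random walk: each value is the previous value plus independent noise.
rng = random.Random(42)
walk = [0.0]
for _ in range(299):
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))

# First difference: x_t - x_{t-1}, one observation shorter than the input.
differenced = [b - a for a, b in zip(walk, walk[1:])]

print(round(lag1_autocorr(walk), 2), round(lag1_autocorr(differenced), 2))
```

Subtracting a fitted line from such a series would leave the random-walk behaviour in the residuals, whereas the differenced series has lag-1 autocorrelation near zero.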
7. Innovative Trend Analysis (ITA) — .innovative_trend_analysis()¶
ITA, introduced by Sen (2012), is a visual method complementary to Mann-Kendall.
Procedure: split the series into two equal halves in time order, sort each half in ascending order, scatter the sorted first half (x-axis) against the sorted second half (y-axis), and compare against the 1:1 line.
Interpretation:
- Points above the 1:1 line -> the later half is larger than the earlier half -> an increasing trend.
- Points below the 1:1 line -> decreasing trend.
- Points on the line -> no trend.
A key strength of ITA is that you can see trends in parts of the distribution separately. It may be, for example, that only the extremes are trending — something a single MK p-value would hide.
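The construction behind the ITA scatter plot is short enough to sketch directly. The helper below builds the point pairs and two simple summaries, with the observation dropped for odd lengths. The function name and the mean_shift summary are illustrative assumptions and are not claimed to equal the library's trend_indicator.

```python
def ita_points(values):
    """Pair the sorted first half of a series with its sorted second half."""
    n = len(values) - len(values) % 2   # drop the last observation if the length is odd
    half = n // 2
    first = sorted(values[:half])       # earlier half, ascending
    second = sorted(values[half:n])     # later half, ascending
    return list(zip(first, second))

series = [5, 3, 4, 6, 8, 7, 9]          # made-up series with odd length
points = ita_points(series)

above = sum(late > early for early, late in points)   # points above the 1:1 line
below = sum(late < early for early, late in points)   # points below the 1:1 line
mean_shift = sum(late - early for early, late in points) / len(points)

print(points)                    # [(3, 6), (4, 7), (5, 8)]
print(above, below, mean_shift)  # 3 0 3.0
```

Because the pairs are ordered from low to high values, inspecting the low, middle, and high ends of the list separately shows whether a shift is confined to one part of the distribution.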
col = ts_rhine.columns[0]
result_df, (fig, ax) = ts_rhine.innovative_trend_analysis(column=col)
print(result_df)
C:\gdrive\algorithms\statistics\statista\src\statista\time_series\trend.py:321: UserWarning: Column 'rockenau' has odd length. Last observation dropped for analysis.
  warnings.warn(
          trend_indicator
column
rockenau         356.6936
What this means¶
A positive trend_indicator means the sorted second half tends to exceed the sorted first half -> overall increase. A negative value means the opposite. The magnitude has the same units as the data and can be compared across stations.
8. Putting It All Together — A Recommended Workflow¶
When someone hands you a new environmental series and asks "is there a trend?":
- Plot the data.
- Check autocorrelation (ts.acf(), notebook 04). If present, switch to method='hamed_rao' for Mann-Kendall.
- Run ts.mann_kendall(method='hamed_rao') -> is the trend real?
- Run ts.sens_slope() -> how big is it, with confidence bounds?
- Optionally run ts.innovative_trend_analysis() for a visual cross-check and to inspect different parts of the distribution.
- If you need to model the residuals: ts.detrend(method='sens'), then check stationarity (notebook 05) on the detrended series.
Summary¶
| Method | Tells you | Robust to |
|---|---|---|
| mann_kendall | Is there a monotonic trend? | Outliers, skewed data |
| mann_kendall(method='hamed_rao') | Same, corrected for autocorrelation | Autocorrelated data |
| sens_slope | Magnitude of the trend + 95% CI | Outliers |
| detrend | Remove the trend for residual analysis | - |
| innovative_trend_analysis | Visual trend check, works on parts of the distribution | - |
You now have a complete trend-analysis toolkit. Combine it with notebook 05 (stationarity) and notebook 04 (autocorrelation) for a full diagnostic workflow.