06 — Trend Detection¶

In environmental and climate science, trend detection answers questions like:

Are summer temperatures rising at this station?
Is annual streamflow decreasing over the last 50 years?
How fast is the trend, and is it real or just noise?

This notebook walks through the Trend mixin: Mann-Kendall, Sen's slope, detrending, and Innovative Trend Analysis.

1. What is a Trend?¶

A trend is a long-term directional change in the mean of a series.

Linear trend — constant rate of change (a straight line).
Monotonic trend — the series keeps moving in one direction, but not necessarily at a constant rate.
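The difference between the two definitions can be made concrete with a small sketch. It uses only the standard library and two hypothetical series (`linear` and `curved` are illustrative, not from the notebook's datasets): a rank-based measure such as Kendall's tau sees both as perfectly monotonic, while the Pearson correlation with time drops for the curved one.

```python
import math

def kendall_tau(values):
    """Kendall's tau (tau-a) against time: net concordant pairs / all pairs."""
    n = len(values)
    score = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            score += (diff > 0) - (diff < 0)  # +1, -1, or 0 for a tie
    return score / (n * (n - 1) / 2)

def pearson_r(values):
    """Pearson correlation between the series and its time index 0..n-1."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    var_t = sum((t - t_mean) ** 2 for t in range(n))
    var_y = sum((y - y_mean) ** 2 for y in values)
    return cov / math.sqrt(var_t * var_y)

linear = [2.0 * t for t in range(20)]            # constant rate of change
curved = [math.exp(0.5 * t) for t in range(20)]  # monotonic, accelerating

for name, series in [("linear", linear), ("curved", curved)]:
    print(name, round(kendall_tau(series), 3), round(pearson_r(series), 3))
```

Both series give a tau of 1, because every later value exceeds every earlier one; only the linear series also gives a Pearson correlation of 1.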

Parametric vs non-parametric tests¶

Parametric (e.g. linear regression t-test) assumes residuals are normal with constant variance. Powerful when the assumptions hold but fragile when they don't.
Non-parametric (e.g. Mann-Kendall) makes no distributional assumption. Rank-based, so it is robust to outliers and skewed data.

Environmental data are often skewed (rainfall, streamflow) and heavy-tailed, which is why Mann-Kendall is the standard choice in the field.

A subtle but critical issue: autocorrelation¶

The classical Mann-Kendall test assumes the observations are independent, but daily and even annual environmental data are usually autocorrelated. Ignoring positive autocorrelation underestimates the variance of the test statistic, which inflates the z-statistic and makes the test reject the null too often: you see significant trends that are not really there.

Variants like Hamed-Rao adjust the variance of the test statistic to account for autocorrelation.

In [1]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from statista.time_series import TimeSeries

2. Load the Data¶

We use two real datasets:

temp.csv — daily temperature, ~10 years. Strong autocorrelation (each day close to the previous) and a seasonal cycle.
rhine.csv — annual discharge at four Rhine stations, 61 years. Classic hydrology example for trend analysis.

In [2]:

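# Daily temperature, indexed by date
temp = pd.read_csv('../../../examples/data/temp.csv', parse_dates=['Date'], index_col='Date')
# Annual Rhine discharge; 'utf-8-sig' strips a byte-order mark if the file has one
rhine = pd.read_csv('../../../examples/data/rhine.csv', encoding='utf-8-sig')

# Wrap both DataFrames so the trend methods are available
ts_temp = TimeSeries(temp)
ts_rhine = TimeSeries(rhine)

print('Temperature:', ts_temp.shape)
print('Rhine:', ts_rhine.shape)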

Temperature: (3650, 1)
Rhine: (61, 4)

In [3]:

fig, axes = plt.subplots(2, 1, figsize=(10, 6))
axes[0].plot(ts_temp.index, ts_temp['Temp'], linewidth=0.4)
axes[0].set_title('Daily temperature')
for col in ts_rhine.columns:
    axes[1].plot(ts_rhine[col], label=col, linewidth=1.0)
axes[1].set_title('Annual Rhine discharge')
axes[1].legend(fontsize=8)
plt.tight_layout()
plt.show()

[Figure: daily temperature (top panel) and annual Rhine discharge at the four stations (bottom panel)]

3. Mann-Kendall (original) — `.mann_kendall()`¶

The Mann-Kendall (MK) test is the workhorse of non-parametric trend detection.

H0: no monotonic trend.
H1: a monotonic trend exists.

It works by comparing every pair of observations (i, j) with i < j: the pair scores +1 if x_j > x_i, -1 if x_j < x_i, and 0 if the two values are tied. Summing the scores gives the S statistic.
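The pair-counting rule can be sketched in a few lines. This is a minimal illustration using only the standard library, not the library's implementation; `mann_kendall_s` and the `series` values are hypothetical.

```python
from itertools import combinations

def mann_kendall_s(values):
    """Sum of sign(x_j - x_i) over all pairs with i < j; ties add nothing."""
    s = 0
    # combinations() yields (values[i], values[j]) with i < j, in time order
    for earlier, later in combinations(values, 2):
        if later > earlier:
            s += 1
        elif later < earlier:
            s -= 1
    return s

series = [3.1, 2.9, 3.4, 3.8, 3.6]  # hypothetical observations
print(mann_kendall_s(series))       # 8 increasing pairs, 2 decreasing pairs
```

With n observations there are n(n-1)/2 pairs, so S ranges from -n(n-1)/2 (strictly decreasing) to +n(n-1)/2 (strictly increasing).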

Result columns to know:

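trend: increasing, decreasing, or no trend.
h: boolean flag, True when the test rejects H0.
p_value: probability of seeing an S at least this extreme under H0.
z: standardised test statistic (the further from 0, the stronger the evidence of a trend).
tau: Kendall's tau, a correlation-like measure of trend strength between -1 and +1.
s: the S statistic described above.
var_s: variance of S under H0, used to standardise S into z.
slope: the Sen slope (returned as a bonus).
intercept: intercept of the corresponding Sen trend line.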

We run the test on annual Rhine discharge. 61 annual observations is modest but usable.

In [4]:

mk_rhine = ts_rhine.mann_kendall()
print(mk_rhine.round(4))

             trend      h  p_value       z     tau     s       var_s   slope  \
column                                                                         
rockenau  no trend  False   0.9653 -0.0436 -0.0044  -8.0  25823.3333 -0.0609   
maxau     no trend  False   0.5628 -0.5787 -0.0514 -94.0  25823.3333 -5.8974   
cologne   no trend  False   0.9653  0.0436  0.0044   8.0  25823.3333  0.7109   
rees      no trend  False   0.9553  0.0560  0.0055  10.0  25823.3333  1.7348   

          intercept  
column               
rockenau   682.3449  
maxau     4021.9684  
cologne   7308.3625  
rees      7487.5008

What this means for the Rhine data¶

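Read the trend column per station. A positive z with a small p_value indicates a statistically significant increasing trend; a negative z with a small p_value indicates a decreasing one. If p_value > 0.05, there is not enough evidence to declare a monotonic trend. That is the case at all four stations here: the p-values range from 0.56 (maxau) to 0.97 (rockenau, cologne), so none of the Rhine series shows a significant trend over the 61 years.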
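The z, p_value and tau columns in the table above can be reproduced by hand from s and the sample size. The sketch below assumes the textbook formulas for the classical test without ties (the printed var_s of 25823.3333 equals n(n-1)(2n+5)/18 for n = 61, which suggests the library uses the same expression, but that is an inference from the output rather than a documented fact). `mk_from_s` is a hypothetical helper name.

```python
import math

def mk_from_s(s, n):
    """Classical Mann-Kendall statistics from S and n, assuming no ties."""
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity correction: move S one unit towards zero before standardising
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value: 2 * (1 - Phi(|z|)) written with the complementary error function
    p_value = math.erfc(abs(z) / math.sqrt(2))
    tau = s / (n * (n - 1) / 2)
    return var_s, z, p_value, tau

# s values for rockenau and maxau taken from the printed table, n = 61
for station, s in [("rockenau", -8), ("maxau", -94)]:
    var_s, z, p_value, tau = mk_from_s(s, 61)
    print(station, round(var_s, 4), round(z, 4), round(p_value, 4), round(tau, 4))
```

Because S is small relative to its standard deviation (about 161 for n = 61), z stays close to zero and the p-values are large.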

4. Why Autocorrelation Matters — MK Variants¶

If the data are positively autocorrelated, the classical MK test still treats nearby-in-time observations as if they were independent. They are not, so the effective sample size is smaller than n and the true variance of S is larger than the classical formula assumes. Standardising with the too-small variance inflates the z-statistic and increases the rate of false positives.
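To get a feel for how much information autocorrelation removes, the sketch below uses the well-known AR(1) approximation for the effective sample size of a mean, n_eff = n(1 - r1)/(1 + r1), where r1 is the lag-1 autocorrelation. This is an illustration of the idea only: it is not the Hamed-Rao variance correction, and the r1 values are hypothetical rather than measured from temp.csv.

```python
def effective_sample_size(n, r1):
    """AR(1) approximation: number of independent observations worth of information."""
    return n * (1 - r1) / (1 + r1)

n_days = 3650  # length of the daily temperature series
for r1 in (0.0, 0.5, 0.9):  # hypothetical lag-1 autocorrelations
    print(r1, round(effective_sample_size(n_days, r1), 1))
```

Under this approximation a daily series with r1 = 0.9 carries roughly as much independent information as a couple of hundred uncorrelated observations, which is why an uncorrected test is overconfident.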

Method options:

method='original' — classical MK (assumes independence).
method='hamed_rao' — Hamed & Rao (1998) variance correction. Recommended for autocorrelated environmental data.
method='yue_wang' — Yue & Wang (2004) alternative correction.
method='pre_whitening' / method='trend_free_pre_whitening' — remove lag-1 autocorrelation before testing.
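As a rough sketch of what pre-whitening means, the snippet below estimates the lag-1 autocorrelation r1 and replaces each value by x_t - r1 * x_(t-1). It is a generic illustration on a simulated AR(1) series, not the library's code; `lag1_autocorr` and `pre_whiten` are hypothetical helper names, and the trend-free variant (which removes the Sen trend first) is not shown.

```python
import random

def lag1_autocorr(values):
    """Sample lag-1 autocorrelation coefficient."""
    n = len(values)
    mean = sum(values) / n
    num = sum((values[t] - mean) * (values[t + 1] - mean) for t in range(n - 1))
    den = sum((v - mean) ** 2 for v in values)
    return num / den

def pre_whiten(values):
    """Remove the lag-1 dependence: y_t = x_t - r1 * x_(t-1). Output is one shorter."""
    r1 = lag1_autocorr(values)
    return [values[t] - r1 * values[t - 1] for t in range(1, len(values))]

# Simulated AR(1) series with coefficient 0.7 and no trend (hypothetical data)
random.seed(42)
series = [0.0]
for _ in range(499):
    series.append(0.7 * series[-1] + random.gauss(0.0, 1.0))

whitened = pre_whiten(series)
r1_before = lag1_autocorr(series)
r1_after = lag1_autocorr(whitened)
print(round(r1_before, 3), round(r1_after, 3))
```

The whitened series has a lag-1 autocorrelation close to zero, so the independence assumption of the classical test is much closer to being met.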

Let us compare on the daily temperature series, which is highly autocorrelated.

In [5]:

mk_orig = ts_temp.mann_kendall(method='original')
mk_hr = ts_temp.mann_kendall(method='hamed_rao')

compare = pd.DataFrame({
    'original_z': mk_orig['z'],
    'original_p': mk_orig['p_value'],
    'hamed_rao_z': mk_hr['z'],
    'hamed_rao_p': mk_hr['p_value'],
})
print(compare.round(4))

        original_z  original_p  hamed_rao_z  hamed_rao_p
column                                                  
Temp        1.1722      0.2411       0.5447        0.586

What this means¶

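As expected, the Hamed-Rao z-statistic is smaller in magnitude than the original (0.54 versus 1.17) and the p-value is larger (0.586 versus 0.241). That is the correction doing its job: accounting for the fact that yesterday and today are almost the same observation. Neither version finds a significant trend in this series.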

Rule of thumb: for any daily, monthly, or even annual environmental series with visible autocorrelation (check with ts.acf()), use method='hamed_rao'.

5. Sen's Slope — `.sens_slope()`¶

Mann-Kendall tells you if there is a trend. Sen's slope tells you how big it is, in the same units as your data (e.g. degrees per day, m^3/s per year).

Sen's slope is the median of all pairwise slopes (x_j - x_i) / (j - i). Because it uses the median, it is robust to outliers — a few extreme values do not pull it off course. Compare that to ordinary least squares, where a single outlier can change the slope drastically.
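The robustness claim is easy to check on a toy series. The sketch below, written with the standard library only, computes the median of pairwise slopes and an ordinary least squares slope for a hypothetical straight line, then again after a single value is corrupted. `sens_slope` here is a standalone illustrative function, not the `TimeSeries.sens_slope` method.

```python
from itertools import combinations
from statistics import median

def sens_slope(values):
    """Median of (x_j - x_i) / (j - i) over all pairs with i < j."""
    slopes = [(values[j] - values[i]) / (j - i)
              for i, j in combinations(range(len(values)), 2)]
    return median(slopes)

def ols_slope(values):
    """Least squares slope of the series against its index 0..n-1."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

clean = [2.0 * t + 1.0 for t in range(11)]  # exact slope of 2 per step
spiked = clean.copy()
spiked[-1] += 100.0                         # one hypothetical outlier at the end

print(sens_slope(clean), round(ols_slope(clean), 3))
print(sens_slope(spiked), round(ols_slope(spiked), 3))
```

Only 10 of the 55 pairwise slopes involve the corrupted point, so the median stays at 2.0, while the least squares slope more than triples.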

The confidence interval is computed from Kendall's tau distribution.

In [6]:

sens = ts_rhine.sens_slope()
print(sens.round(4))

           slope  intercept  slope_lower_ci  slope_upper_ci
column                                                     
rockenau -0.0609   681.5015         -3.2287          3.0694
maxau    -5.8974  3953.1269        -21.1382         13.2901
cologne   0.7109  7319.0257        -37.4042         40.7173
rees      1.7348  7513.5229        -43.1647         43.0171

What this means for the Rhine data¶

slope — change per year (discharge is in m^3/s, so slope is m^3/s per year).
slope_lower_ci / slope_upper_ci — 95% confidence bounds on that slope.

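If the interval contains 0, the trend is not significant at the 5% level, which is consistent with the Mann-Kendall p-value. That is the case for all four Rhine stations here. The slope-plus-interval format is easy to report in a paper; for a hypothetical station with a significant trend it would read: "discharge increased by 2.3 m^3/s per year (95% CI: 0.4 to 4.1)".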

In [7]:

# Overlay Sen's slope on the data for one Rhine station
col = ts_rhine.columns[0]
y = ts_rhine[col].values
x = np.arange(len(y))
slope = float(sens.loc[col, 'slope'])
intercept = float(sens.loc[col, 'intercept'])

fig, ax = plt.subplots(figsize=(10, 3))
ax.plot(x, y, 'o-', color='steelblue', markersize=3, label=col)
ax.plot(x, intercept + slope * x, 'r--', label=f'Sen slope = {slope:.2f}')
ax.set_title(f'Sen trend line — {col}')
ax.legend()
plt.show()

6. Detrending — `.detrend()`¶

Once a trend is established, you often want to remove it so that you can study the remaining variability (seasonality, cycles, residual autocorrelation, extreme-value behaviour). Removing a deterministic trend can also make a series stationary, which is a prerequisite for ARIMA and many other models.

Available methods:

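method='linear': subtract an OLS fit.
method='constant': subtract the mean.
method='polynomial' (+ order=): for curved trends.
method='sens': subtract a Sen slope line. Robust to outliers and matches the MK framework above. Because the line is fitted through medians rather than by least squares, the residuals are not forced to have zero mean.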
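A Sen-based detrend can be sketched as "fit a median-based line, subtract it". The example below is an assumption-laden illustration, not the library's implementation: the intercept is taken as the median of y_t - slope * t, which is one of several conventions in use, and `sens_line` and `detrend_sens` are hypothetical names.

```python
from itertools import combinations
from statistics import mean, median

def sens_line(values):
    """Sen slope plus a median-based intercept (one common convention)."""
    slopes = [(values[j] - values[i]) / (j - i)
              for i, j in combinations(range(len(values)), 2)]
    slope = median(slopes)
    intercept = median(y - slope * t for t, y in enumerate(values))
    return slope, intercept

def detrend_sens(values):
    """Subtract the fitted Sen line from every observation."""
    slope, intercept = sens_line(values)
    return [y - (intercept + slope * t) for t, y in enumerate(values)]

series = [0.5 * t + 3.0 for t in range(9)]  # hypothetical linear trend
series[4] += 50.0                           # plus one outlier
residuals = detrend_sens(series)

print(median(residuals), round(mean(residuals), 3))
```

The residuals are centred on zero in the median sense, but the outlier keeps their mean away from zero. An OLS detrend would instead force a zero mean and let the outlier tilt the fitted line.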

In [8]:

detrended = ts_rhine.detrend(method='sens')

fig, axes = plt.subplots(2, 1, figsize=(10, 5), sharex=True)
col = ts_rhine.columns[0]
axes[0].plot(ts_rhine[col].values, color='steelblue')
axes[0].set_title(f'Original — {col}')
axes[1].plot(detrended[col].values, color='darkorange')
axes[1].axhline(0, color='red', linestyle='--')
axes[1].set_title(f'Detrended (Sen) — {col}')
plt.tight_layout()
plt.show()

print('Detrended mean:', round(float(detrended[col].mean()), 4))

Detrended mean: 80.2894

When should you detrend?¶

Before fitting an ARIMA where you want to model residuals only.
Before studying anomalies ("how unusual was this year, beyond the trend?").
When comparing sites with different long-term baselines.

Don't detrend if the trend itself is what you want to study, or if the stationarity tests point to a unit root (a stochastic, random-walk-like trend); in that case, difference the series instead.

7. Innovative Trend Analysis (ITA) — `.innovative_trend_analysis()`¶

ITA, introduced by Şen (2012), is a graphical method that complements Mann-Kendall.

Procedure: split the series into two halves in time order, sort each half in ascending order, scatter the sorted first half (x-axis) against the sorted second half (y-axis), and compare the points against the 1:1 line.

Interpretation:

Points above the 1:1 line -> the later half is larger than the earlier half -> an increasing trend.
Points below the 1:1 line -> decreasing trend.
Points on the line -> no trend.

A key strength of ITA is that you can see trends in parts of the distribution separately. It may be, for example, that only the extremes are trending — something a single MK p-value would hide.
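The ITA construction itself is short enough to sketch without plotting. The example below splits a hypothetical series in time order, sorts each half, and counts how many points would fall above the 1:1 line. Dropping the last observation of an odd-length series follows the warning the library prints; everything else, including the name `ita_halves`, is illustrative.

```python
def ita_halves(values):
    """Split in time order, then sort each half (last value dropped if n is odd)."""
    n = len(values) - len(values) % 2
    half = n // 2
    first = sorted(values[:half])
    second = sorted(values[half:n])
    return first, second

series = [5, 1, 3, 2, 6, 4, 9]  # hypothetical series with 7 observations
first, second = ita_halves(series)

# Each (first[k], second[k]) pair is one point on the ITA scatter plot
above = sum(s > f for f, s in zip(first, second))
below = sum(s < f for f, s in zip(first, second))
print(first, second, above, below)
```

Looking at the low end and the high end of the sorted pairs separately is what lets ITA show, for example, that only the largest values are increasing.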

In [9]:

col = ts_rhine.columns[0]
result_df, (fig, ax) = ts_rhine.innovative_trend_analysis(column=col)
print(result_df)

C:\gdrive\algorithms\statistics\statista\src\statista\time_series\trend.py:321: UserWarning: Column 'rockenau' has odd length. Last observation dropped for analysis.
  warnings.warn(

          trend_indicator
column                   
rockenau         356.6936

What this means¶

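A positive trend_indicator means the sorted second half tends to exceed the sorted first half, which points to an overall increase. A negative value means the opposite. The indicator is descriptive rather than a significance test: rockenau has a positive value here even though Mann-Kendall found no significant trend at that station. The magnitude has the same units as the data, so it is most meaningful when compared across stations with similar discharge levels.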

8. Putting It All Together — A Recommended Workflow¶

When someone hands you a new environmental series and asks "is there a trend?":

Plot the data.
Check autocorrelation (ts.acf(), notebook 04).
Run ts.mann_kendall(), with method='hamed_rao' if the series is autocorrelated -> is the trend statistically significant?
Run ts.sens_slope() -> how big is it, with confidence bounds?
Optionally run ts.innovative_trend_analysis() for a visual cross-check and to inspect different parts of the distribution.
If you need to model the residuals: ts.detrend(method='sens'), then check stationarity (notebook 05) on the detrended series.

Summary¶

Method	Tells you	Robust to
`mann_kendall`	Is there a monotonic trend?	Outliers, skewed data
`mann_kendall(method='hamed_rao')`	Same, corrected for autocorrelation	Autocorrelated data
`sens_slope`	Magnitude of the trend + 95% CI	Outliers
`detrend`	Remove the trend for residual analysis	—
`innovative_trend_analysis`	Visual trend check, works on parts of the distribution	—

You now have a complete trend-analysis toolkit. Combine it with notebook 05 (stationarity) and notebook 04 (autocorrelation) for a full diagnostic workflow.
