02 - Visualizing Time Series with TimeSeries
Before doing any formal statistics, always look at the data. This notebook demonstrates the Visualization mixin of statista.time_series.TimeSeries on 61 years of annual flow records for four gauges on the Rhine river (Rockenau, Maxau, Cologne, Rees).
You will learn how to read a box plot, violin plot, raincloud plot, histogram, kernel density estimate (KDE), and rolling-statistics plot, with an intuitive explanation of each.
1. Why visualize before analyzing?
Summary statistics alone can mislead. A famous example is Anscombe's quartet: four datasets with nearly identical means, variances, correlations, and regression lines that nevertheless look completely different when plotted.
Plots reveal things that summary numbers hide:
- Shape of the distribution (symmetric, skewed, bimodal).
- Outliers (single extreme values).
- Trends / change points over time.
- Comparisons between series that share a scale.
Always plot before you run tests.
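The Anscombe's quartet claim above can be checked numerically. The sketch below uses the published quartet values with plain NumPy (it is not part of statista) and prints the mean, sample variance, correlation, and least-squares line for each dataset.

```python
import numpy as np

# Anscombe's quartet: four small x/y datasets (datasets I-III share the same x).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

summary = {}
for name, (x, y) in quartet.items():
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Least-squares line y = slope * x + intercept
    slope, intercept = np.polyfit(x, y, deg=1)
    summary[name] = {
        "mean_y": y.mean(),
        "var_y": y.var(ddof=1),  # sample variance
        "corr": np.corrcoef(x, y)[0, 1],
        "slope": slope,
        "intercept": intercept,
    }
    print(name, {k: round(float(v), 2) for k, v in summary[name].items()})
```

The four printed rows agree to about two decimal places, even though a scatter plot of each dataset shows a line, a curve, a line with one outlier, and a vertical cluster with one leverage point.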
2. Load the Rhine multi-gauge dataset
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statista.time_series import TimeSeries

df = pd.read_csv('../../../examples/data/rhine.csv')
print('Shape:', df.shape)
print('Columns:', df.columns.tolist())

ts = TimeSeries(df)
ts.head()
```
```text
Shape: (61, 4)
Columns: ['rockenau', 'maxau', 'cologne', 'rees']
```
| | rockenau | maxau | cologne | rees |
|---|---|---|---|---|
| 0 | 864.070 | 4547.767 | 9590.666 | 9753.263 |
| 1 | 723.215 | 3769.567 | 8182.114 | 8651.166 |
| 2 | 885.580 | 3776.206 | 8972.081 | 9380.397 |
| 3 | 899.729 | 4130.709 | 8864.038 | 9071.717 |
| 4 | 810.590 | 3707.643 | 8071.079 | 8354.173 |
3. Box plot: the five-number summary at a glance
A box plot compresses the distribution into five numbers:
```text
    Q1-1.5*IQR      Q1  median  Q3      Q3+1.5*IQR
                     |-----|-----|
  o      |-----------|     |     |-----------|      o  o
                     |-----|-----|
outliers                  box                     outliers
                     <--- IQR --->
```
- Box = middle 50% of the data (from Q1 to Q3).
- Line in the box = median.
- Whiskers = usually extend to the most extreme observation within 1.5 × IQR of the box.
- Dots = outliers beyond the whiskers.
If the median line is far from the middle of the box, the data is skewed.
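The quantities in the list above can be computed by hand. This is a minimal sketch on a made-up sample, assuming the common 1.5 × IQR whisker rule and NumPy's default (linear) percentile interpolation; whether `box_plot` uses exactly these conventions internally is an assumption.

```python
import numpy as np

# Hypothetical toy sample: nine ordinary values plus one extreme value.
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 50], dtype=float)

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Fences: observations outside [lower_fence, upper_fence] are drawn as dots.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

inside = values[(values >= lower_fence) & (values <= upper_fence)]
outliers = values[(values < lower_fence) | (values > upper_fence)]

# Whiskers end at the most extreme observations still inside the fences.
whisker_low, whisker_high = inside.min(), inside.max()

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"fences=({lower_fence}, {upper_fence})")
print(f"whiskers=({whisker_low}, {whisker_high}), outliers={outliers.tolist()}")
```

Here the single value 50 lies above the upper fence, so it would appear as an outlier dot, while the upper whisker stops at 9.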
```python
fig, ax = ts.box_plot(mean=True, grid=True, title='Rhine gauges — box plot')
```
What this means for your data. All four gauges have a number of high-flow outliers — these are flood years that sit far above the typical annual flow. Cologne and Rees show a very similar spread and magnitude, consistent with them being close to each other on the lower Rhine.
4. Violin plot: shape on top of quartiles
A violin plot is a box plot wrapped inside a mirrored kernel density estimate (KDE). The width of the violin at any value tells you how many observations occur near that value.
Compared to a box plot, violins reveal:
- Multi-modality (two bumps = two typical regimes).
- Whether the bulk of the data sits high or low in the range.
- The overall shape (skewed, heavy-tailed, uniform).
```python
fig, ax = ts.violin(mean=True, median=True, title='Rhine gauges — violin plot')
```
What this means for your data. The violins are wider near the lower half of each series: most years have modest flow, with a long tail upward toward flood years. This is the classic right-skewed shape expected for river discharge.
5. Raincloud: violin + scatter + box
A raincloud plot stacks three complementary views on top of each other:
- A violin shows the smoothed density.
- A scatter of raw points ('the rain') shows every actual observation.
- A mini box plot shows the quartiles.
This makes the raincloud one of the most informative single plots for exploratory analysis: you see shape, raw points, and summary statistics simultaneously.
```python
fig, ax = ts.raincloud(title='Rhine gauges — raincloud plot')
```
6. Histogram: the classic distribution view
A histogram bins the values into equal-width intervals and counts how many observations fall in each bin. The shape tells you:
- Where the data is concentrated (the mode).
- Whether it is symmetric or skewed.
- Whether there are multiple peaks (mixture of regimes).
Warning: the shape of a histogram depends on the number of bins. Too few bins hide features; too many create noise.
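The bin-count warning above is easy to reproduce. The sketch below uses a made-up sample with two separated clusters and plain `np.histogram` (not the statista `histogram` method) to show how a coarse binning hides the gap between them.

```python
import numpy as np

# Hypothetical sample with two well-separated clusters (around 2 and around 8).
data = np.concatenate([np.linspace(1, 3, 20), np.linspace(7, 9, 20)])

counts_few, _ = np.histogram(data, bins=2, range=(0, 10))
counts_many, edges_many = np.histogram(data, bins=10, range=(0, 10))

print("2 bins :", counts_few)   # two equal bars, the gap is invisible
print("10 bins:", counts_many)  # empty bins in the middle expose the gap

# A data-driven alternative: let NumPy pick edges with the Freedman-Diaconis rule.
fd_edges = np.histogram_bin_edges(data, bins="fd")
print("Freedman-Diaconis bin count:", len(fd_edges) - 1)
```

A rule such as Freedman-Diaconis is only a starting point; trying a few bin counts and checking that the main features persist is usually worthwhile.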
```python
n_values, bin_edges, fig, ax = ts.histogram(
    bins=15,
    legend=['Rockenau', 'Maxau', 'Cologne', 'Rees'],
    title='Rhine annual flows — histogram',
    xlabel='Annual flow',
    ylabel='Frequency',
)
print('Bin edges:', np.round(bin_edges, 1))
```
```text
/home/runner/work/statista/statista/src/statista/time_series/visualization.py:538: UserWarning: Multiple columns detected. Please provide a list of colors for each column, Otherwise the givencolor will be ignored.
  warnings.warn(
Bin edges: [ 0. 1427.8 2855.6 4283.4 5711.2 7138.9 8566.7 9994.5 11422.3 12850.1 14277.9 15705.7 17133.5 18561.3 19989.1 21416.8]
```
7. Density (KDE): smooth, non-parametric shape
Kernel density estimation (KDE) places a small Gaussian 'bump' on each data point and averages them. The result is a smooth curve that estimates the underlying probability density function (PDF).
Advantages over a histogram:
- No arbitrary bin edges.
- Smoother, easier to compare multiple series.
Caveat: the smoothness depends on the bandwidth; too small gives noise, too large over-smooths real features.
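To make the bandwidth caveat concrete, here is a minimal hand-rolled Gaussian KDE in NumPy. The helper names `gaussian_kde_1d` and `count_peaks`, the sample, and the bandwidth values are all hypothetical choices for illustration; this is not how `density` is implemented in statista.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian bumps (std = bandwidth) centred on each sample."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)  # averaging keeps the total area equal to 1

def count_peaks(density):
    """Number of interior grid points strictly higher than both neighbours."""
    mid = density[1:-1]
    return int(np.sum((mid > density[:-2]) & (mid > density[2:])))

# Hypothetical sample: two clusters of three points each.
samples = [2.0, 2.5, 3.0, 7.0, 7.5, 8.0]
grid = np.linspace(-20, 30, 2001)
dx = grid[1] - grid[0]

for bandwidth in (0.05, 0.5, 5.0):
    density = gaussian_kde_1d(samples, grid, bandwidth)
    area = float(density.sum() * dx)  # Riemann-sum approximation of the integral
    print(f"bandwidth={bandwidth}: peaks={count_peaks(density)}, area={area:.3f}")
```

With the smallest bandwidth every sample becomes its own spike, the intermediate value recovers the two clusters, and the largest value merges them into a single hump, while the area under the curve stays close to 1 in all three cases.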
```python
fig, ax = ts.density(title='Rhine gauges — kernel density estimate', xlabel='Annual flow')
```
What this means for your data. The four gauges form two clear groups: the two upstream stations (Rockenau, Maxau) peak at lower flows, while Cologne and Rees (downstream, larger catchment) peak at much higher flows. All four are right-skewed.
8. Rolling statistics: are the mean and variance stable over time?
A rolling window computes a statistic (mean or std) over the most recent N observations, then slides forward one step and recomputes. The result is a time-varying mean/std.
This plot is the visual counterpart of a stationarity test:
- If the rolling mean drifts up or down, there is a trend.
- If the rolling standard deviation changes over time, the data is heteroscedastic.
- A flat rolling mean and flat rolling std suggest the process is stationary.
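The sliding-window idea can be sketched with plain pandas on a toy series. The series and the 3-step window are hypothetical, and whether `rolling_statistics` calls pandas in exactly this way is an assumption.

```python
import pandas as pd

# Hypothetical toy series: a steady upward trend with constant spread.
series = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

window = 3
rolling_mean = series.rolling(window=window).mean()
rolling_std = series.rolling(window=window).std()  # sample std (ddof=1)

# The first window - 1 entries are NaN because the window is not yet full.
print(pd.DataFrame({"value": series, "mean": rolling_mean, "std": rolling_std}))
```

In this toy case the rolling mean climbs steadily while the rolling std stays flat, which is the signature of a trend with constant variance.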
Here we use a 10-year window.
```python
fig, ax = ts.rolling_statistics(
    window=10,
    title='Rhine gauges — 10-year rolling mean & std',
    ylabel='Annual flow',
)
```
What this means for your data. The 10-year rolling means wobble somewhat but do not show a strong monotonic drift across the record. The rolling standard deviations are more erratic — big flood years temporarily inflate the std. Formal stationarity tests (ADF, KPSS) would confirm whether the apparent stability is real.
9. Summary
You have seen six complementary visualizations of the same data:
| Method | Best for |
|---|---|
| box_plot() | Quartiles, outliers, compact comparison |
| violin() | Shape and modality |
| raincloud() | Raw points + shape + quartiles in one plot |
| histogram() | Counts in equal-width bins |
| density() | Smooth non-parametric PDF |
| rolling_statistics() | Time-varying mean and std |
Recommendation. Start every analysis with a box plot and a rolling-statistics plot. If the data looks odd, add a raincloud. Only then reach for formal tests.