Commit 33f353f

Hurst exponent with OHLC (#18)
1 parent 5bb1148 commit 33f353f

14 files changed: 176 additions and 32 deletions

notebooks/api/ta/index.rst

Lines changed: 1 addition & 0 deletions
@@ -8,3 +8,4 @@ Timeseries Analysis
    :maxdepth: 1
 
    ohlc
+   paths

notebooks/api/utils/paths.rst renamed to notebooks/api/ta/paths.rst

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 Paths
 ===========
 
-.. module:: quantflow.utils.paths
+.. module:: quantflow.ta.paths
 
 .. autoclass:: Paths
    :members:

notebooks/api/utils/index.rst

Lines changed: 0 additions & 1 deletion
@@ -7,5 +7,4 @@ Utils
 .. toctree::
    :maxdepth: 1
 
-   paths
    marginal1d

notebooks/applications/hurst.md

Lines changed: 141 additions & 17 deletions
@@ -6,38 +6,55 @@ jupytext:
     format_version: 0.13
     jupytext_version: 1.16.6
 kernelspec:
-  display_name: Python 3 (ipykernel)
+  display_name: .venv
   language: python
   name: python3
 ---
 
 # Hurst Exponent
 
-The [Hurst exponent](https://en.wikipedia.org/wiki/Hurst_exponent) is used as a measure of long-term memory of time series. It relates to the autocorrelations of the time series, and the rate at which these decrease as the lag between pairs of values increases.
-
+The [Hurst exponent](https://en.wikipedia.org/wiki/Hurst_exponent) is used as a measure of long-term memory of time series. It relates to the auto-correlations of the time series, and the rate at which these decrease as the lag between pairs of values increases.
 It is a statistic that can be used to test whether a time series is mean reverting or trending.
+
+The idea behind the Hurst exponent is that if the time series $x_t$ follows a Brownian motion (also known as a Wiener process), then the variance of the difference between two time points increases linearly with the time difference, that is to say
+
+\begin{align}
+  \text{Var}(x_{t_2} - x_{t_1}) &\propto t_2 - t_1 \\
+  &\propto \Delta t^{2H}\\
+  H &= 0.5
+\end{align}
+
+where $H$ is the Hurst exponent.
+
 Trending time series have a Hurst exponent H > 0.5, while mean reverting time series have H < 0.5. Knowing which regime a time series is in can be useful for trading strategies.
 
+These are some references to understand the Hurst exponent and its applications:
+
 * [Hurst Exponent for Algorithmic Trading](https://robotwealth.com/demystifying-the-hurst-exponent-part-1/)
+* [Basics of Statistical Mean Reversion Testing](https://www.quantstart.com/articles/Basics-of-Statistical-Mean-Reversion-Testing/)
 
 ## Study with the Wiener Process
 
-We want to construct a mechanism to estimate the Hurst exponent via OHLC data because it is widely available from data provider and easily constructed as an online signal during trading.
+We want to construct a mechanism to estimate the Hurst exponent via OHLC data because it is widely available from data providers and easily constructed as an online signal during trading.
 
 In order to evaluate results against known solutions, we consider the Wiener process as the generator of the time series.
 
 The Wiener process is a continuous-time stochastic process named in honor of Norbert Wiener. It is often also called Brownian motion due to its historical connection with the physical model of Brownian motion of particles in water, named after the botanist Robert Brown.
 
+We use the **WeinerProcess** from the stochastic process library and sample one path over a time horizon of 1 (day) with one time step per second.
+
 ```{code-cell} ipython3
 from quantflow.sp.weiner import WeinerProcess
-from quantflow.utils.dates import start_of_day
-p = WeinerProcess(sigma=0.5)
-paths = p.sample(1, 1, 24*60*60)
-paths.plot()
+p = WeinerProcess(sigma=2.0)
+paths = p.sample(n=1, time_horizon=1, time_steps=24*60*60)
+paths.plot(title="A path of Weiner process with sigma=2.0")
 ```
 
+In order to down-sample the time series, we need to convert it into a dataframe with dates as indices.
+
 ```{code-cell} ipython3
-df = paths.as_datetime_df(start=start_of_day()).reset_index()
+from quantflow.utils.dates import start_of_day
+df = paths.as_datetime_df(start=start_of_day(), unit="d").reset_index()
 df
 ```
 
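As a reading aid for the scaling relation $\text{Var} \propto \Delta t^{2H}$ introduced in this hunk: taking logarithms turns it into a straight line, which is why the estimators in this commit fit a line on a log-log plot and halve the slope. The proportionality constant $c$ is a symbol introduced here for illustration only.

```latex
\begin{align}
  \text{Var}(x_{t+\Delta t} - x_t) &= c\,\Delta t^{2H} \\
  \log \text{Var}(x_{t+\Delta t} - x_t) &= \log c + 2H \log \Delta t \\
  H &= \frac{1}{2}\,\frac{d \log \text{Var}(x_{t+\Delta t} - x_t)}{d \log \Delta t}
\end{align}
```

For a Brownian motion the slope is 1, which recovers $H = 0.5$.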

@@ -50,9 +67,19 @@ The value should be close to the **sigma** of the WeinerProcess defined above.
 float(paths.paths_std(scaled=True)[0])
 ```
 
-### Range-base Variance estimators
+The Hurst exponent is evaluated by calculating the variance for several time windows and by fitting a line to the log-log plot of the variance vs the time window.
 
-We now turn our attention to range-based volatility estimators. These estimators depends on OHLC timeseries, which are widely available from data providers such as [FMP](https://site.financialmodelingprep.com/).
+```{code-cell} ipython3
+paths.hurst_exponent()
+```
+
+The Hurst exponent should be close to 0.5, as expected for an exponent calculated from the paths of a Wiener process.
+
++++
+
+### Range-based Variance Estimators
+
+We now turn our attention to range-based variance estimators. These estimators depend on OHLC time series, which are widely available from data providers such as [FMP](https://site.financialmodelingprep.com/).
 To analyze range-based variance estimators, we use the **quantflow.ta.OHLC** tool, which down-samples a time series to an OHLC series and estimates the variance with three different estimators
 
 * **Parkinson** (1980)
@@ -65,14 +92,111 @@ For this we build an OHLC estimator as template and use it to create OHLC estima
 
 ```{code-cell} ipython3
 import pandas as pd
+import polars as pl
+import math
 from quantflow.ta.ohlc import OHLC
-ohlc = OHLC(serie="0", period="10m", rogers_satchell_variance=True, parkinson_variance=True, garman_klass_variance=True)
+template = OHLC(serie="0", period="10m", rogers_satchell_variance=True, parkinson_variance=True, garman_klass_variance=True)
+seconds_in_day = 24*60*60
+
+def rstd(pdf: pl.Series, range_seconds: float) -> float:
+    """Calculate the standard deviation from a range-based variance estimator"""
+    variance = pdf.mean()
+    # scale the variance by the number of seconds in the period
+    variance = seconds_in_day * variance / range_seconds
+    return math.sqrt(variance)
 
 results = []
-for period in ("2m", "5m", "10m", "30m", "1h", "4h"):
-    operator = ohlc.model_copy(update=dict(period=period))
-    result = operator(df).sum()
-    results.append(dict(period=period, pk=result["0_pk"].item(), gk=result["0_gk"].item(), rs=result["0_rs"].item()))
-vdf = pd.DataFrame(results)
+for period in ("10s", "20s", "30s", "1m", "2m", "3m", "5m", "10m", "30m"):
+    ohlc = template.model_copy(update=dict(period=period))
+    rf = ohlc(df)
+    ts = pd.to_timedelta(period).to_pytimedelta().total_seconds()
+    data = dict(period=period)
+    for name in ("pk", "gk", "rs"):
+        estimator = rf[f"0_{name}"]
+        data[name] = rstd(estimator, ts)
+    results.append(data)
+vdf = pd.DataFrame(results).set_index("period")
 vdf
 ```
+
+These numbers differ from the realized variance because they are based on the range of the prices, not on the actual prices. The realized variance is a more direct measure of the volatility of the process, while the range-based estimators are more robust to market microstructure noise.
+
+The Parkinson estimator is always higher than both the Garman-Klass and Rogers-Satchell estimators. It uses only the high and low prices, which are always at least as far apart as the open and close prices. The GK and RS estimators are similar to each other and more accurate than the Parkinson estimator, especially for longer periods.
+
+```{code-cell} ipython3
+pd.options.plotting.backend = "plotly"
+fig = vdf.plot(markers=True, title="Weiner Standard Deviation from Range-based estimators - correct value is 2.0")
+fig.show()
+```
+
+To estimate the Hurst exponent with the range-based estimators, we calculate the mean range-based variance for different time windows and fit a line to the log-log plot of the variance vs the time window. The Hurst exponent is half the slope of that line.
+
+```{code-cell} ipython3
+from typing import Sequence
+import numpy as np
+from quantflow.ta.ohlc import OHLC
+from collections import defaultdict
+from quantflow.ta.base import DataFrame
+
+default_periods = ("10s", "20s", "30s", "1m", "2m", "3m", "5m", "10m", "30m")
+
+def ohlc_hurst_exponent(
+    df: DataFrame,
+    series: Sequence[str],
+    periods: Sequence[str] = default_periods,
+) -> DataFrame:
+    results = {}
+    estimator_names = ("pk", "gk", "rs")
+    for serie in series:
+        template = OHLC(
+            serie=serie,
+            period="10m",
+            rogers_satchell_variance=True,
+            parkinson_variance=True,
+            garman_klass_variance=True
+        )
+        time_range = []
+        estimators = defaultdict(list)
+        for period in periods:
+            ohlc = template.model_copy(update=dict(period=period))
+            rf = ohlc(df)
+            ts = pd.to_timedelta(period).to_pytimedelta().total_seconds()
+            time_range.append(ts)
+            for name in estimator_names:
+                estimators[name].append(rf[f"{serie}_{name}"].mean())
+        results[serie] = [float(np.polyfit(np.log(time_range), np.log(estimators[name]), 1)[0])/2.0 for name in estimator_names]
+    return pd.DataFrame(results, index=estimator_names)
+```
+
+```{code-cell} ipython3
+ohlc_hurst_exponent(df, series=["0"])
+```
+
+The Hurst exponent should be close to 0.5, since we have calculated the exponent from the paths of a Wiener process. It is not exactly 0.5 because the range-based estimators are not the same as the realized variance. Interestingly, the Parkinson estimator gives a Hurst exponent closer to 0.5 than the Garman-Klass and Rogers-Satchell estimators.
+
+## Mean Reverting Time Series
+
+We now turn our attention to mean reverting time series, where the Hurst exponent is less than 0.5.
+
+```{code-cell} ipython3
+from quantflow.sp.ou import Vasicek
+import pandas as pd
+pd.options.plotting.backend = "plotly"
+
+p = Vasicek(kappa=2)
+paths = {f"kappa={k}": Vasicek(kappa=k).sample(n=1, time_horizon=1, time_steps=24*60*6) for k in (1.0, 10.0, 50.0, 100.0, 500.0)}
+pdf = pd.DataFrame({k: p.path(0) for k, p in paths.items()}, index=paths["kappa=1.0"].dates(start=start_of_day()))
+pdf.plot()
+```
+
+We can now estimate the Hurst exponent from the realized variance. As we can see, the Hurst exponent decreases as we increase the mean reversion parameter.
+
+```{code-cell} ipython3
+pd.DataFrame({k: [p.hurst_exponent()] for k, p in paths.items()})
+```
+
+We can also estimate the Hurst exponent from the range-based estimators. As we can see, the Hurst exponent decreases as we increase the mean reversion parameter, along the same lines as the realized variance.
+
+```{code-cell} ipython3
+ohlc_hurst_exponent(pdf.reset_index(), list(paths), periods=("10m", "20m", "30m", "1h"))
+```
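
The mean-reversion claim in the notebook can be checked without quantflow. The following is a minimal sketch, assuming a zero-mean Ornstein-Uhlenbeck process simulated with its exact discretization; `simulate_ou`, `hurst_from_variance`, the seed, and the parameter values are illustrative names and choices, not part of this commit or of the library's `Vasicek` class.

```python
import numpy as np


def hurst_from_variance(x: np.ndarray, max_lag: int = 100) -> float:
    """Half the log-log slope of the lagged-difference variance against the lag."""
    lags = np.arange(2, max_lag)
    variances = [np.var(x[lag:] - x[:-lag]) for lag in lags]
    return float(np.polyfit(np.log(lags), np.log(variances), 1)[0]) / 2.0


def simulate_ou(
    kappa: float, sigma: float, steps: int, rng: np.random.Generator
) -> np.ndarray:
    """Exact discretization of dx = -kappa * x dt + sigma dW over a horizon of 1."""
    dt = 1.0 / steps
    decay = np.exp(-kappa * dt)
    noise_std = sigma * np.sqrt((1.0 - decay**2) / (2.0 * kappa))
    shocks = rng.normal(scale=noise_std, size=steps)
    x = np.zeros(steps + 1)
    for i in range(steps):
        x[i + 1] = decay * x[i] + shocks[i]
    return x


rng = np.random.default_rng(0)
steps = 24 * 60 * 6  # same number of time steps as the notebook cell
estimates = {
    kappa: hurst_from_variance(simulate_ou(kappa, 1.0, steps, rng))
    for kappa in (1.0, 500.0)
}
for kappa, h in estimates.items():
    print(f"kappa={kappa}: H = {h:.3f}")
```

With weak mean reversion the path is close to a Brownian motion over the lags considered, so the estimate stays near 0.5. With strong mean reversion the variance of the differences saturates and the fitted slope flattens.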

quantflow/sp/base.py

Lines changed: 1 addition & 1 deletion
@@ -7,9 +7,9 @@
 from pydantic import BaseModel, ConfigDict, Field
 from scipy.optimize import Bounds
 
+from quantflow.ta.paths import Paths
 from quantflow.utils.marginal import Marginal1D, default_bounds
 from quantflow.utils.numbers import sigfig
-from quantflow.utils.paths import Paths
 from quantflow.utils.transforms import lower_bound, upper_bound
 from quantflow.utils.types import FloatArray, FloatArrayLike, Vector
 

quantflow/sp/bns.py

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
 from pydantic import Field
 from scipy.special import xlogy
 
-from ..utils.paths import Paths
+from ..ta.paths import Paths
 from ..utils.types import FloatArrayLike, Vector
 from .base import Im, StochasticProcess1D
 from .ou import GammaOU

quantflow/sp/cir.py

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 
 from quantflow.utils.types import FloatArrayLike, Vector
 
-from ..utils.paths import Paths
+from ..ta.paths import Paths
 from .base import Im, IntensityProcess
 
 

quantflow/sp/heston.py

Lines changed: 1 addition & 1 deletion
@@ -3,8 +3,8 @@
 import numpy as np
 from pydantic import Field
 
+from ..ta.paths import Paths
 from ..utils.distributions import DoubleExponential, Exponential
-from ..utils.paths import Paths
 from ..utils.types import FloatArrayLike, Vector
 from .base import StochasticProcess1D
 from .cir import CIR

quantflow/sp/jump_diffusion.py

Lines changed: 1 addition & 1 deletion
@@ -5,8 +5,8 @@
 import numpy as np
 from pydantic import Field
 
+from ..ta.paths import Paths
 from ..utils.distributions import Normal
-from ..utils.paths import Paths
 from ..utils.types import FloatArrayLike, Vector
 from .base import StochasticProcess1D
 from .poisson import CompoundPoissonProcess, D

quantflow/sp/ou.py

Lines changed: 1 addition & 1 deletion
@@ -7,8 +7,8 @@
 from scipy.optimize import Bounds
 from scipy.stats import gamma, norm
 
+from ..ta.paths import Paths
 from ..utils.distributions import Exponential
-from ..utils.paths import Paths
 from ..utils.types import Float, FloatArrayLike, Vector
 from .base import Im, IntensityProcess
 from .poisson import CompoundPoissonProcess, D

quantflow/sp/poisson.py

Lines changed: 1 addition & 1 deletion
@@ -7,9 +7,9 @@
 from scipy.optimize import Bounds
 from scipy.stats import poisson
 
+from ..ta.paths import Paths
 from ..utils.distributions import Distribution1D
 from ..utils.functions import factorial
-from ..utils.paths import Paths
 from ..utils.transforms import TransformResult
 from ..utils.types import FloatArray, FloatArrayLike, Vector
 from .base import Im, StochasticProcess1D, StochasticProcess1DMarginal

quantflow/sp/weiner.py

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
 from pydantic import Field
 from scipy.stats import norm
 
-from ..utils.paths import Paths
+from ..ta.paths import Paths
 from ..utils.types import FloatArrayLike, Vector
 from .base import StochasticProcess1D
 

quantflow/utils/paths.py renamed to quantflow/ta/paths.py

Lines changed: 24 additions & 4 deletions
@@ -9,10 +9,10 @@
 from pydantic import BaseModel, Field
 from scipy.integrate import cumulative_trapezoid
 
-from . import plot
-from .bins import pdf as bins_pdf
-from .dates import utcnow
-from .types import FloatArray
+from quantflow.utils import plot
+from quantflow.utils.bins import pdf as bins_pdf
+from quantflow.utils.dates import utcnow
+from quantflow.utils.types import FloatArray
 
 
 class Paths(BaseModel, arbitrary_types_allowed=True):
@@ -58,6 +58,10 @@ def ys(self) -> list[list[float]]:
         """Paths as list of list (for visualization tools)"""
         return self.data.transpose().tolist()  # type: ignore
 
+    def path(self, i: int) -> FloatArray:
+        """Path i"""
+        return self.data[:, i]
+
     def dates(
         self, *, start: datetime | None = None, unit: str = "d"
     ) -> pd.DatetimeIndex:
@@ -116,6 +120,22 @@ def integrate(self) -> Paths:
             data=cumulative_trapezoid(self.data, dx=self.dt, axis=0, initial=0),
         )
 
+    def hurst_exponent(self, steps: int | None = None) -> float:
+        """Estimate the Hurst exponent from all paths
+
+        :param steps: number of lags to consider, if not provided it uses
+            half of the time steps capped at 100
+        """
+        ts = self.time_steps // 2
+        n = min(steps or ts, 100)
+        lags = []
+        tau = []
+        for lag in range(2, n):
+            variances = np.var(self.data[lag:, :] - self.data[:-lag, :], axis=0)
+            tau.extend(variances)
+            lags.extend([lag] * self.samples)
+        return float(np.polyfit(np.log(lags), np.log(tau), 1)[0]) / 2.0
+
     def cross_section(self, t: float | None = None) -> FloatArray:
         """Cross section of paths at time t"""
         index = self.time_steps
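
As a reading aid for the `hurst_exponent` method added in this file, the same variance-scaling idea can be sketched in plain NumPy on a single path. This is a minimal sketch, not the library's implementation: the function name `hurst_from_variance`, the seed, and the simulated random walk are assumptions made for illustration.

```python
import numpy as np


def hurst_from_variance(x: np.ndarray, max_lag: int = 100) -> float:
    """Estimate H as half the slope of log Var(x[t+lag] - x[t]) against log lag."""
    lags = np.arange(2, max_lag)
    # variance of the lagged differences, one value per lag
    variances = [np.var(x[lag:] - x[:-lag]) for lag in lags]
    slope = np.polyfit(np.log(lags), np.log(variances), 1)[0]
    return float(slope) / 2.0


rng = np.random.default_rng(42)
# a discrete random walk stands in for a sampled Brownian path
random_walk = np.cumsum(rng.normal(size=100_000))
h = hurst_from_variance(random_walk)
print(f"estimated H = {h:.3f}")
```

For a random walk the variance of the differences grows linearly with the lag, so the fitted slope is close to 1 and the estimate lands near 0.5.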

quantflow_tests/test_utils.py

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 import numpy as np
 
+from quantflow.ta.paths import Paths
 from quantflow.utils.numbers import round_to_step, to_decimal
-from quantflow.utils.paths import Paths
 
 
 def test_round_to_step():
