last changes before handover
fderyckel committed Jun 14, 2024
1 parent 9cb022e commit 1dee42f
Showing 56 changed files with 493 additions and 184 deletions.

4 changes: 2 additions & 2 deletions docs/blog.html
@@ -225,7 +225,7 @@ <h1>Posts</h1>
</div>
</div>
<div class="list quarto-listing-default">
<div class="quarto-post image-right" data-index="0" data-categories="Time-Series,ARIMA,Decomposition" data-listing-date-sort="1704733200000" data-listing-file-modified-sort="1717649683989" data-listing-date-modified-sort="1717952400000" data-listing-reading-time-sort="6">
<div class="quarto-post image-right" data-index="0" data-categories="Time-Series,ARIMA,Decomposition" data-listing-date-sort="1704733200000" data-listing-file-modified-sort="1717745512943" data-listing-date-modified-sort="1717952400000" data-listing-reading-time-sort="7">
<div class="body">
<a href="./posts/time-series/05-arima/index.html">
<h3 class="no-anchor listing-title">
@@ -258,7 +258,7 @@ <h3 class="no-anchor listing-title">
Jan 9, 2024
</div>
<div class="listing-reading-time">
6 min
7 min
</div>
</a>
</div>
274 changes: 192 additions & 82 deletions docs/blog.xml

Large diffs are not rendered by default.

228 changes: 142 additions & 86 deletions docs/posts/time-series/05-arima/index.html

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions docs/search.json
@@ -704,7 +704,7 @@
"href": "time-series.html",
"title": "Series: Time-series",
"section": "",
"text": "Order By\n Default\n \n Title\n \n \n Date - Oldest\n \n \n Date - Newest\n \n \n \n \n \n \n \n\n\n\n\n\n\n02 - Statistical Moments\n\n\n\n\n\nIntroducing the first 4 moments of statistical analysis: mean, standard deviation, skewness and kurtosis. Showing how to use R and Python on these concepts. We then provide 2 methods to transform data in order to bring it closer to a normal distribution.\n\n\n\n\n\n\nNov 2, 2022\n\n\n9 min\n\n\n\n\n\n\n\n\n03 - AutoCorrelation, Stationarity and Random-Walk - Part 1\n\n\n\n\n\nA dive into the concepts of autocorrelation and stationarity of time-series. We also get into how to plot correlogram using R and Python, random-walk, white-noise.\n\n\n\n\n\n\nSep 29, 2022\n\n\n7 min\n\n\n\n\n\n\n\n\n04 - Time-series decomposition\n\n\n\n\n\nIntroducing time-series decomposition. We first show how to compose time-series using linear trend, seasonality and then white nosie.\n\n\n\n\n\n\nOct 21, 2022\n\n\n3 min\n\n\n\n\n\n\n\n\n05 - AR, MA and ARIMA models\n\n\n\n\n\nIntroducing Arima - Autoregressive Integrated Moving Average.\n\n\n\n\n\n\nJan 9, 2024\n\n\n6 min\n\n\n\n\n\n\nNo matching items"
"text": "Order By\n Default\n \n Title\n \n \n Date - Oldest\n \n \n Date - Newest\n \n \n \n \n \n \n \n\n\n\n\n\n\n02 - Statistical Moments\n\n\n\n\n\nIntroducing the first 4 moments of statistical analysis: mean, standard deviation, skewness and kurtosis. Showing how to use R and Python on these concepts. We then provide 2 methods to transform data in order to bring it closer to a normal distribution.\n\n\n\n\n\n\nNov 2, 2022\n\n\n9 min\n\n\n\n\n\n\n\n\n03 - AutoCorrelation, Stationarity and Random-Walk - Part 1\n\n\n\n\n\nA dive into the concepts of autocorrelation and stationarity of time-series. We also get into how to plot correlogram using R and Python, random-walk, white-noise.\n\n\n\n\n\n\nSep 29, 2022\n\n\n7 min\n\n\n\n\n\n\n\n\n04 - Time-series decomposition\n\n\n\n\n\nIntroducing time-series decomposition. We first show how to compose time-series using linear trend, seasonality and then white nosie.\n\n\n\n\n\n\nOct 21, 2022\n\n\n3 min\n\n\n\n\n\n\n\n\n05 - AR, MA and ARIMA models\n\n\n\n\n\nIntroducing Arima - Autoregressive Integrated Moving Average.\n\n\n\n\n\n\nJan 9, 2024\n\n\n7 min\n\n\n\n\n\n\nNo matching items"
},
{
"objectID": "posts/proba-quant/jensen-inequality/index.html#simulation",
@@ -872,7 +872,7 @@
"href": "posts/time-series/05-arima/index.html",
"title": "05 - AR, MA and ARIMA models",
"section": "",
"text": "This post is about introducing ARIMA using the CPI data and various R framework for time series. Autoregressive because it is based on past value and moving average to smooth the time series data. Our previous post on autocorrelation and partial autocorelation could be considered as prior material.\n\n\nAutoregression is a class of linear model where the outcome variable is regressed on its previous values (lagged observations).\n\\[Y_t = \\delta + \\phi_1 Y_{t-1} + \\phi_2 Y_{t-2} + \\cdots + \\phi_p Y_{t-p} + \\epsilon_t\\] This AR model used \\(p\\) lags, hence we say it is of order \\(p\\) or \\(AR(p)\\).\n\n\\(\\delta\\) is an intercept like term\n\\(Y_{t-i}\\) are the regressors (time series own lagged observations) with parameters \\(\\phi_{t-i}\\)\n\\(\\epsilon\\) is the error term\n\nAR(1) is then define as \\[Y_t = \\delta + \\phi_1 Y_{t-1} + \\epsilon_t\\]\nFew characteristics of AR models\n\nfor stationary time-series, \\(-1 &lt; \\phi &lt; 1\\)\nnegative \\(\\phi\\) indicates mean-reversion series\npositive \\(\\phi\\) indicates momentum series\nthe auto-correlation ACF of the AR time-series decay at the rate of \\(\\phi\\). So small \\(\\phi\\) will lead to steeper decay in the auto-correlation. For instance, if \\(\\phi = -0.5\\); the first lag autocorrelation will be \\(-0.5\\), the second lag will be \\(0.25\\), the third lag will be \\(-0.125\\), etc.\n\nWe can simulate an AR(1) timeseries in both R and Python.\n\nPythonR\n\n\nIn python, we use the arima_process module from the statsmodels library.\nBecause these models have been developed with ARIMA in mind, we will set up the MA parameter to 1. 
Also, we set up the intercept to \\(1\\).\n\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom statsmodels.tsa.arima_process import ArmaProcess\n\nar1 = np.array([1, -0.9]) # the first term: the delta, the second is the phi_1\nma1 = np.array([1]) # as ma is mandatory for ArmaProcess, we set it up to 1\nar_obj1 = ArmaProcess(ar1, ma1)\nsim_data1 = ar_obj1.generate_sample(nsample = 500)\nplt.clf()\nplt.plot(sim_data1)\nplt.show()\n\n\n\n\nDoing it similar with a positive \\(\\phi\\)\n\nar2 = np.array([1, +0.9])\nma2 = np.array([1])\nar_obj2 = ArmaProcess(ar2, ma2)\nsim_data2 = ar_obj2.generate_sample(nsample = 500)\nplt.clf()\nplt.plot(sim_data2)\nplt.show()\n\n\n\n\n\n\nLooking at the auto-correlation decay.\n\nfrom statsmodels.graphics.tsaplots import plot_acf\n\nar3 = np.array([1, -0.5])\nma3 = np.array([1])\nar_obj3 = ArmaProcess(ar3, ma3)\nsim_data3 = ar_obj3.generate_sample(nsample = 500) # to show fast decay with negative phi\n\nplot_acf(sim_data1, alpha = 1, lags = 20)\nplt.show()\n\n\n\nplt.clf()\nplot_acf(sim_data2, alpha = 1, lags = 20)\nplt.show()\n\n\n\nplt.clf()\nplot_acf(sim_data3, alpha = 1, lags = 20)\nplt.show()\n\n\n\n\n\n\n\n\n\n\n\nfrom statsmodels.tsa.arima.model import ARIMA\n\nmodel_ar = ARIMA(sim_data1, order = (1, 0, 0)) # the order ensure we are dealing with just AR model\nres = model_ar.fit()\n\nprint(res.summary())\n\n SARIMAX Results \n==============================================================================\nDep. Variable: y No. 
Observations: 500\nModel: ARIMA(1, 0, 0) Log Likelihood -693.698\nDate: Fri, 07 Jun 2024 AIC 1393.396\nTime: 10:43:31 BIC 1406.040\nSample: 0 HQIC 1398.357\n - 500 \nCovariance Type: opg \n==============================================================================\n coef std err z P&gt;|z| [0.025 0.975]\n------------------------------------------------------------------------------\nconst -0.5670 0.463 -1.225 0.221 -1.474 0.340\nar.L1 0.9068 0.020 45.580 0.000 0.868 0.946\nsigma2 0.9356 0.059 15.741 0.000 0.819 1.052\n===================================================================================\nLjung-Box (L1) (Q): 0.05 Jarque-Bera (JB): 1.76\nProb(Q): 0.82 Prob(JB): 0.42\nHeteroskedasticity (H): 1.15 Skew: 0.15\nProb(H) (two-sided): 0.37 Kurtosis: 3.00\n===================================================================================\n\nWarnings:\n[1] Covariance matrix calculated using the outer product of gradients (complex-step).\n\n\n\nprint(res.param_names)\n\n['const', 'ar.L1', 'sigma2']\n\nprint(res.params)\n\n[-0.56698664 0.90676447 0.93561666]\n\n\nOur simulated data had \\(0.9\\) as the parameter of the autoregressive term. The estimated parameters is quite close indeed.\n\n\n\nWe can also use the AR to make prediction.\n\nfrom statsmodels.graphics.tsaplots import plot_predict\n\nres.predict(start = 490, end = 510)\n\narray([-4.71568373, -5.63195649, -5.33985692, -4.23172735, -4.32650865,\n -3.53875143, -3.50737 , -3.03884854, -1.95986768, -0.9861391 ,\n -0.68232354, -0.67157004, -0.66181915, -0.65297739, -0.64496 ,\n -0.63769011, -0.63109804, -0.62512058, -0.61970043, -0.61478563,\n -0.61032907])\n\n#res.plot_predict(start = 400, end = 510)\nres.plot_diagnostics()\nplt.show()\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nMoving Average (MA) is another class of linear model where the outcome variable is regressed using its own previous error terms. 
\\[Y_t = \\mu + \\theta_1 \\epsilon_{t-1} + \\theta_2 \\epsilon_{t-2} + \\cdots + \\theta_q \\epsilon_{t-q} + \\epsilon_t\\] This MA model used \\(q\\) lags, hence we say it is of order \\(q\\).\nPutting it all together, the outcome variable of an ARIMA model can be predicted: \\[Y_t = \\{\\delta + \\phi_1 Y_{t-1} + \\phi_2 Y_{t-2} + \\cdots + \\phi_p Y_{t-p} + \\epsilon_t \\} + \\{\\mu + \\theta_1 \\epsilon_{t-1} + \\theta_2 \\epsilon_{t-2} + \\cdots + \\theta_q \\epsilon_{t-q} + \\epsilon_t \\}\\] This can be simplify into: \\[Y_t = \\delta + \\sum_{i=1}^p \\phi_i Y_{t-i} + \\sum_{j=1}^q \\theta_j \\epsilon_{t-j} + \\epsilon_t\\]\nThe parameters of an ARIMA model are (p, d, q) :\n\n\\(p\\) - Autoregressive. The number of lagged observations in the model. Use the previous \\(n\\) observations as predictors.\n\\(d\\) - Integrated. The number of times the data is differenced to make the data stationary\n\n\\(q\\) - the size of the moving average. Use previous errors to predict \\(Y_t\\)\n\nTo apply an ARIMA model to a set of data, we will use the US CPI Energy component that we downloaded on the FED St-Louis website."
"text": "This post is about introducing ARIMA using the CPI data and various R framework for time series. Autoregressive because it is based on past value and moving average to smooth the time series data. Our previous post on autocorrelation and partial autocorelation could be considered as prior material. The assumption behind these models are that the time-series is stationary (or has been transformed to a stationary time series). Recall that a stationary time series has constant mean, variance and auto-correlation over time. In other words, the covariance between the i-th term and the i + m - th term is not a function of time."
},
{
"objectID": "posts/ts-forecast/hotel-booking/index.html",
6 changes: 3 additions & 3 deletions docs/sitemap.xml
@@ -2,7 +2,7 @@
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://fderyckel.github.io/blog.html/blog.html</loc>
<lastmod>2024-06-07T03:43:57.023Z</lastmod>
<lastmod>2024-06-14T01:51:45.588Z</lastmod>
</url>
<url>
<loc>https://fderyckel.github.io/blog.html/quant-part1.html</loc>
@@ -138,7 +138,7 @@
</url>
<url>
<loc>https://fderyckel.github.io/blog.html/time-series.html</loc>
<lastmod>2024-06-07T03:43:54.394Z</lastmod>
<lastmod>2024-06-14T01:51:43.076Z</lastmod>
</url>
<url>
<loc>https://fderyckel.github.io/blog.html/posts/proba-quant/discrete-simulations/index.html</loc>
@@ -166,7 +166,7 @@
</url>
<url>
<loc>https://fderyckel.github.io/blog.html/posts/time-series/05-arima/index.html</loc>
<lastmod>2024-06-07T03:43:53.969Z</lastmod>
<lastmod>2024-06-14T01:51:52.872Z</lastmod>
</url>
<url>
<loc>https://fderyckel.github.io/blog.html/posts/ts-forecast/hotel-booking/index.html</loc>
4 changes: 2 additions & 2 deletions docs/time-series.html
@@ -265,7 +265,7 @@ <h3 class="no-anchor listing-title">
</a>
</div>
</div>
<div class="quarto-post image-right" data-index="3" data-categories="Time-Series,ARIMA,Decomposition" data-listing-date-sort="1704733200000" data-listing-file-modified-sort="1717649683989" data-listing-date-modified-sort="1717952400000" data-listing-reading-time-sort="6">
<div class="quarto-post image-right" data-index="3" data-categories="Time-Series,ARIMA,Decomposition" data-listing-date-sort="1704733200000" data-listing-file-modified-sort="1717745512943" data-listing-date-modified-sort="1717952400000" data-listing-reading-time-sort="7">
<div class="body">
<a href="./posts/time-series/05-arima/index.html">
<h3 class="no-anchor listing-title">
@@ -285,7 +285,7 @@ <h3 class="no-anchor listing-title">
Jan 9, 2024
</div>
<div class="listing-reading-time">
6 min
7 min
</div>
</a>
</div>
52 changes: 47 additions & 5 deletions posts/time-series/05-arima/index.qmd
@@ -11,18 +11,19 @@ date-modified: "2024-06-10"
# Introduction

This post introduces ARIMA using the CPI data and various R frameworks for time series. The model is autoregressive because it is based on past values, and it uses a moving average to smooth the time series data.
Our [previous post](../03-autocorrelation/index.qmd) on autocorrelation and partial autocorelation could be considered as prior material.
Our [previous post](../03-autocorrelation/index.qmd) on autocorrelation and partial autocorrelation could be considered as prior material. The assumption behind these models is that the time series is stationary (or has been transformed into a stationary time series). Recall that **a stationary time series has constant mean, variance and auto-correlation over time**. In other words, **the covariance between the *i-th* term and the *(i + m)-th* term is not a function of time**: it depends only on the lag *m*.
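
Stated a little more formally (a sketch under the usual weak-stationarity definition; the symbols $\mu_Y$, $\sigma_Y^2$ and $\gamma_m$ are notation introduced here for illustration, not taken from the post), for every time $t$ and lag $m$:

$$E[Y_t] = \mu_Y, \qquad \mathrm{Var}(Y_t) = \sigma_Y^2, \qquad \mathrm{Cov}(Y_t, Y_{t+m}) = \gamma_m$$

None of the three right-hand sides depends on $t$; the covariance depends only on the lag $m$.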

## Autoregressive models
# Autoregressive models

Autoregression is a class of linear model where the outcome variable is regressed on its previous values (lagged observations).

$$Y_t = \delta + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + \epsilon_t$$
$$Y_t = \delta + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + \epsilon_t$$
$$Y_t = \delta + \sum_{i=1}^{p} \phi_i \cdot Y_{t-i} + \epsilon_t$$
This AR model uses $p$ lags, hence we say it is of order $p$, or $AR(p)$.

* $\delta$ is an intercept like term
* $Y_{t-i}$ are the regressors (the time series' own lagged observations) with parameters $\phi_i$
* $\epsilon$ is the error term
* $\epsilon_t$ is the error term, with $\epsilon_t \sim N \left( 0, \sigma^2 \right)$

AR(1) is then defined as $$Y_t = \delta + \phi_1 Y_{t-1} + \epsilon_t$$
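
As a quick sanity check of the AR(1) recursion, here is a minimal sketch that uses only the Python standard library. The function names `simulate_ar1` and `sample_autocorr`, and the values $\delta = 1$ and $\phi_1 = 0.5$, are illustrative assumptions rather than anything from the post. For $|\phi_1| < 1$ the simulated series should hover around $\delta / (1 - \phi_1)$, and its lag-1 sample autocorrelation should be close to $\phi_1$.

```python
import random


def simulate_ar1(delta, phi, n, sigma=1.0, seed=0):
    """Simulate Y_t = delta + phi * Y_{t-1} + eps_t with eps_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    y = delta / (1.0 - phi)  # start at the stationary mean (requires |phi| < 1)
    series = []
    for _ in range(n):
        y = delta + phi * y + rng.gauss(0.0, sigma)
        series.append(y)
    return series


def sample_autocorr(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var


delta, phi = 1.0, 0.5  # illustrative values, not from the post
sim = simulate_ar1(delta, phi, n=20000)
sim_mean = sum(sim) / len(sim)

print("theoretical mean:", delta / (1 - phi), "sample mean:", round(sim_mean, 3))
print("lag-1 autocorrelation:", round(sample_autocorr(sim, 1), 3))
print("lag-2 autocorrelation:", round(sample_autocorr(sim, 2), 3))
```

The lag-2 value should land near $\phi_1^2$, which is the geometric decay of the autocorrelation that characterises a stationary AR(1).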

@@ -139,7 +140,48 @@ plt.show()

:::

## Moving Average Models
We can also fit an autoregressive model to a real dataset.

Let's use the famous Nile data set (100 yearly observations, 1871 to 1970).

::: {.panel-tabset}

## Python

```{python}
#| label: py-load-Nile
import pandas as pd
import numpy as np
df_nile = pd.read_csv('../../../raw_data/Nile.csv')
```
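
The Python chunk above only loads the data. As a rough counterpart to the AR(1) fit in the R tab, the sketch below estimates $\delta$ and $\phi_1$ by ordinary least squares of $Y_t$ on $Y_{t-1}$. This is a deliberately simpler, swapped-in technique, not the estimation performed by R's `arima`, so the numbers may differ somewhat. The helper name `fit_ar1_ols` is hypothetical, and the values are inlined from `raw_data/Nile.csv` so that the block runs without the file.

```python
# Nile column of raw_data/Nile.csv (1871-1970), inlined so the sketch is self-contained
nile = [
    1120, 1160, 963, 1210, 1160, 1160, 813, 1230, 1370, 1140,
    995, 935, 1110, 994, 1020, 960, 1180, 799, 958, 1140,
    1100, 1210, 1150, 1250, 1260, 1220, 1030, 1100, 774, 840,
    874, 694, 940, 833, 701, 916, 692, 1020, 1050, 969,
    831, 726, 456, 824, 702, 1120, 1100, 832, 764, 821,
    768, 845, 864, 862, 698, 845, 744, 796, 1040, 759,
    781, 865, 845, 944, 984, 897, 822, 1010, 771, 676,
    649, 846, 812, 742, 801, 1040, 860, 874, 848, 890,
    744, 749, 838, 1050, 918, 986, 797, 923, 975, 815,
    1020, 906, 901, 1170, 912, 746, 919, 718, 714, 740,
]


def fit_ar1_ols(series):
    """Least-squares estimate of (delta, phi) in Y_t = delta + phi * Y_{t-1} + eps_t."""
    x = series[:-1]  # lagged values Y_{t-1}
    y = series[1:]   # current values Y_t
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    phi = sxy / sxx
    delta = y_mean - phi * x_mean
    return delta, phi


delta_hat, phi_hat = fit_ar1_ols(nile)

# One-step-ahead residuals; for a reasonable fit they should look like white noise
residuals = [b - (delta_hat + phi_hat * a) for a, b in zip(nile[:-1], nile[1:])]

print("delta:", round(delta_hat, 2), "phi:", round(phi_hat, 3))
print("implied long-run mean:", round(delta_hat / (1 - phi_hat), 1))
```

In practice one would more likely pass `df_nile["Nile"]` to the `ARIMA` class from `statsmodels` with `order = (1, 0, 0)`, as done for the simulated data earlier in the post; the hand-rolled version is only meant to show what the AR(1) coefficient represents.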

## R

```{r}
#| label: r-load-Nile
library(dplyr)
library(readr)
df_nile <- read_csv('../../../raw_data/Nile.csv')
# fit an AR(1) model
model_ar1 <- arima(x = df_nile$Nile, order = c(1, 0, 0))
# check the residuals of the model: their ACF should show no significant autocorrelation (white noise)
acf(residuals(model_ar1), main = 'Residuals of AR(1) on Nile River.')
```



:::


# Moving Average Models

Moving Average (MA) is another class of linear model where the outcome variable is regressed using its own previous error terms.
$$Y_t = \mu + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \cdots + \theta_q \epsilon_{t-q} + \epsilon_t$$
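
To make the role of the lagged error terms concrete, here is a minimal standard-library sketch of an MA(1) process. The names `simulate_ma1` and `sample_autocorr`, and the values $\mu = 0$ and $\theta_1 = 0.6$, are illustrative assumptions, not taken from the post. For an MA(1), the theoretical lag-1 autocorrelation is $\theta_1 / (1 + \theta_1^2)$ and the autocorrelations beyond lag 1 are zero.

```python
import random


def simulate_ma1(mu, theta, n, sigma=1.0, seed=0):
    """Simulate Y_t = mu + theta * eps_{t-1} + eps_t with eps_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    prev_eps = rng.gauss(0.0, sigma)  # eps_0, used only as the first lagged error
    series = []
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)
        series.append(mu + theta * prev_eps + eps)
        prev_eps = eps
    return series


def sample_autocorr(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var


mu, theta = 0.0, 0.6  # illustrative values, not from the post
ma_sim = simulate_ma1(mu, theta, n=20000)

print("theoretical lag-1:", round(theta / (1 + theta ** 2), 3))
print("sample lag-1:", round(sample_autocorr(ma_sim, 1), 3))
print("sample lag-2:", round(sample_autocorr(ma_sim, 2), 3))
```

The sharp cut-off after lag $q$ is what distinguishes the correlogram of an MA process from the gradual decay seen for an AR process.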
101 changes: 101 additions & 0 deletions raw_data/Nile.csv
@@ -0,0 +1,101 @@
"","time","Nile"
"1",1871,1120
"2",1872,1160
"3",1873,963
"4",1874,1210
"5",1875,1160
"6",1876,1160
"7",1877,813
"8",1878,1230
"9",1879,1370
"10",1880,1140
"11",1881,995
"12",1882,935
"13",1883,1110
"14",1884,994
"15",1885,1020
"16",1886,960
"17",1887,1180
"18",1888,799
"19",1889,958
"20",1890,1140
"21",1891,1100
"22",1892,1210
"23",1893,1150
"24",1894,1250
"25",1895,1260
"26",1896,1220
"27",1897,1030
"28",1898,1100
"29",1899,774
"30",1900,840
"31",1901,874
"32",1902,694
"33",1903,940
"34",1904,833
"35",1905,701
"36",1906,916
"37",1907,692
"38",1908,1020
"39",1909,1050
"40",1910,969
"41",1911,831
"42",1912,726
"43",1913,456
"44",1914,824
"45",1915,702
"46",1916,1120
"47",1917,1100
"48",1918,832
"49",1919,764
"50",1920,821
"51",1921,768
"52",1922,845
"53",1923,864
"54",1924,862
"55",1925,698
"56",1926,845
"57",1927,744
"58",1928,796
"59",1929,1040
"60",1930,759
"61",1931,781
"62",1932,865
"63",1933,845
"64",1934,944
"65",1935,984
"66",1936,897
"67",1937,822
"68",1938,1010
"69",1939,771
"70",1940,676
"71",1941,649
"72",1942,846
"73",1943,812
"74",1944,742
"75",1945,801
"76",1946,1040
"77",1947,860
"78",1948,874
"79",1949,848
"80",1950,890
"81",1951,744
"82",1952,749
"83",1953,838
"84",1954,1050
"85",1955,918
"86",1956,986
"87",1957,797
"88",1958,923
"89",1959,975
"90",1960,815
"91",1961,1020
"92",1962,906
"93",1963,901
"94",1964,1170
"95",1965,912
"96",1966,746
"97",1967,919
"98",1968,718
"99",1969,714
"100",1970,740
