Introduction to R for Quantitative Finance
Cointegration

The idea behind cointegration, a concept introduced by Granger (1981) and formalized by Engle and Granger (1987), is to find a linear combination of non-stationary time series that is itself stationary. This makes it possible to detect stable long-run relationships between non-stationary time series (for example, prices).
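
To make the definition concrete, here is a minimal simulated sketch; it does not use the data analysed later in this section. It assumes two hypothetical series, x and y, that share one random-walk trend. The coefficient 2 and the noise scale are arbitrary illustrative choices. Comparing sample variances is only a rough check, not a formal test of stationarity.

```r
# Illustrative simulation: all names and parameters are assumptions.
set.seed(42)
n <- 500
x <- cumsum(rnorm(n))   # random walk: non-stationary
y <- 2 * x + rnorm(n)   # shares the stochastic trend of x
spread <- y - 2 * x     # linear combination with weights (1, -2)

# The levels wander widely, while the combination stays close to zero.
var_levels <- c(x = var(x), y = var(y))
var_spread <- var(spread)
print(var_levels)
print(var_spread)
```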

Cross hedging jet fuel

Airlines are natural buyers of jet fuel. Since the price of jet fuel can be very volatile, most airlines hedge at least part of their exposure to jet fuel price changes. In the absence of liquid jet fuel OTC instruments, airlines use related exchange traded futures contracts (for example, heating oil) for hedging purposes. In the following section, we derive the optimal hedge ratio using first the classical approach of taking into account only the short-term fluctuations between the two prices; afterwards, we improve on the classical hedge ratio by taking into account the long-run stable relationship between the prices as well.

We first load the necessary packages. The urca package provides functions for unit root tests and for estimating cointegration relationships.

> library("zoo")
> install.packages("urca")
> library("urca")

We import the monthly price data for jet fuel and heating oil (in USD per gallon).

> prices <- read.zoo("JetFuelHedging.csv", sep = ",",
+   FUN = as.yearmon, format = "%Y-%m", header = TRUE)

Taking into account only the short-term behavior (monthly price changes) of the two commodities, one can derive the minimum variance hedge by fitting a linear model that explains changes in jet fuel prices by changes in heating oil prices. The beta coefficient of that regression is the optimal hedge ratio.

> simple_mod <- lm(diff(prices$JetFuel) ~ diff(prices$HeatingOil)+0)

The function lm (for linear model) estimates the coefficients for a best fit of changes in jet fuel prices versus changes in heating oil prices. The +0 term means that we set the intercept to zero; that is, no cash holdings.
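
For a regression through the origin, the least-squares slope has the closed form sum(x * y) / sum(x^2), which is the no-intercept counterpart of the textbook ratio of covariance to variance. The sketch below checks this on simulated price changes. The series d_spot and d_fut and the value 0.9 are made-up stand-ins, not the jet fuel data.

```r
# Hypothetical price changes: names and parameters are assumptions.
set.seed(1)
d_fut  <- rnorm(190, sd = 0.15)                # futures price changes
d_spot <- 0.9 * d_fut + rnorm(190, sd = 0.08)  # spot price changes

fit      <- lm(d_spot ~ d_fut + 0)             # regression through the origin
h_lm     <- unname(coef(fit)["d_fut"])         # hedge ratio estimated by lm
h_closed <- sum(d_spot * d_fut) / sum(d_fut^2) # closed-form no-intercept slope

print(c(lm = h_lm, closed_form = h_closed))
```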

> summary(simple_mod)
Call:
lm(formula = diff(prices$JetFuel) ~ diff(prices$HeatingOil) +
 0)
Residuals:
 Min 1Q Median 3Q Max
-0.52503 -0.02968 0.00131 0.03237 0.39602

Coefficients:
 Estimate Std. Error t value Pr(>|t|) 
diff(prices$HeatingOil) 0.89059 0.03983 22.36 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0846 on 189 degrees of freedom
Multiple R-squared: 0.7257, Adjusted R-squared: 0.7242
F-statistic: 499.9 on 1 and 189 DF, p-value: < 2.2e-16

We obtain a hedge ratio of 0.89059 and a residual standard error of 0.0846. The cross hedge is not perfect; the resulting hedged portfolio is still risky.
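
To put the residual standard error in perspective, the typical size of the unhedged monthly price changes can be backed out from the figures reported above. This is a rough sketch. It relies on the fact that, for a model without an intercept, R reports the uncentered R-squared, 1 - RSS / sum(y^2), and it works with the rounded values from the printout, so the result is approximate.

```r
rse <- 0.0846   # residual standard error from the summary
df  <- 189      # residual degrees of freedom
r2  <- 0.7257   # multiple R-squared (uncentered: no intercept)
n   <- df + 1   # observations: one coefficient was estimated

rss <- rse^2 * df              # residual sum of squares
tss <- rss / (1 - r2)          # sum of squared jet fuel price changes
rms_unhedged <- sqrt(tss / n)  # root mean square unhedged change
rms_hedged   <- sqrt(rss / n)  # root mean square hedged change

print(c(unhedged = rms_unhedged, hedged = rms_hedged))
```

On these figures the hedged position retains roughly half of the typical monthly price change of the unhedged one.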

We now try to improve on this hedge ratio by using an existing long-run relationship between the levels of jet fuel and heating oil futures prices. You can already guess the existence of such a relationship by plotting the two price series (heating oil prices will be in red) using the following command:

> plot(prices$JetFuel, main = "Jet Fuel and Heating Oil Prices",
+   xlab = "Date", ylab = "USD")
> lines(prices$HeatingOil, col = "red")

We use Engle and Granger's two-step estimation technique. First, both time series are tested for a unit root (non-stationarity) using the augmented Dickey-Fuller test.

> jf_adf <- ur.df(prices$JetFuel, type = "drift")
> summary(jf_adf)
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################

Test regression drift


Call:
lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)

Residuals:
 Min 1Q Median 3Q Max 
-1.06212 -0.05015 0.00566 0.07922 0.38086 

Coefficients:
 Estimate Std. Error t value Pr(>|t|) 
(Intercept) 0.03050 0.02177 1.401 0.16283 
z.lag.1 -0.01441 0.01271 -1.134 0.25845 
z.diff.lag 0.19471 0.07250 2.686 0.00789 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.159 on 186 degrees of freedom
Multiple R-squared: 0.04099, Adjusted R-squared: 0.03067
F-statistic: 3.975 on 2 and 186 DF, p-value: 0.0204


Value of test-statistic is: -1.1335 0.9865

Critical values for test statistics:
 1pct 5pct 10pct
tau2 -3.46 -2.88 -2.57
phi1 6.52 4.63 3.81

The null hypothesis of non-stationarity (the jet fuel time series contains a unit root) cannot be rejected even at the 10% significance level, since the test statistic of -1.1335 is less negative than the critical value of -2.57 (and therefore also than the 1% critical value of -3.46). The same holds true for heating oil prices (the test statistic is -1.041).

> ho_adf <- ur.df(prices$HeatingOil, type = "drift")
> summary(ho_adf)
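
The Call line in the jet fuel output shows that ur.df fits an ordinary regression of the differenced series on the lagged level, a constant, and one lagged difference; the statistic compared with the tau2 critical values is the t value of z.lag.1. The sketch below rebuilds that regression in base R on a simulated random walk (not the price data), assuming one lagged difference as in the output above. The variable names mirror the printout. The resulting t value has to be compared with Dickey-Fuller critical values, not with the usual t distribution.

```r
# Simulated random walk standing in for a price series (an assumption).
set.seed(1)
y  <- cumsum(rnorm(200))
dy <- diff(y)
n  <- length(dy)

z.diff     <- dy[2:n]        # change from t-1 to t
z.lag.1    <- y[2:n]         # level at t-1
z.diff.lag <- dy[1:(n - 1)]  # change from t-2 to t-1

fit  <- lm(z.diff ~ z.lag.1 + 1 + z.diff.lag)
tau2 <- coef(summary(fit))["z.lag.1", "t value"]
print(tau2)
```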

We can now proceed to estimate the static equilibrium model and test its residuals for stationarity using an augmented Dickey-Fuller test. Note that different critical values (for example, from Engle and Yoo (1987)) must now be used, since the series under investigation is an estimated one.

> mod_static <- summary(lm(prices$JetFuel ~ prices$HeatingOil))
> error <- residuals(mod_static)
> error_cadf <- ur.df(error, type = "none")
> summary(error_cadf)

The test statistic obtained is -8.912 and the critical value for a sample size of 200 at the 1% level is -4.00; hence we reject the null hypothesis of non-stationarity. We have thus discovered two cointegrated variables and can proceed with the second step; that is, the specification of an error-correction model (ECM). The ECM is a dynamic model of how (and how fast) the system moves back to the static equilibrium that was estimated earlier and stored in the mod_static variable.
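
Written out, the two steps take the following form. The symbols are introduced here for illustration only: JF_t and HO_t denote the jet fuel and heating oil prices in month t, \hat{e}_{t-1} is the lagged residual of the static regression, h is the hedge ratio, and \gamma is the speed of adjustment. The second equation is written without an intercept.

```latex
\begin{align}
JF_t &= \alpha + \beta \, HO_t + e_t
  && \text{(static equilibrium)} \\
\Delta JF_t &= h \, \Delta HO_t + \gamma \, \hat{e}_{t-1} + u_t
  && \text{(error-correction model)}
\end{align}
```

A negative \gamma means that a jet fuel price above its equilibrium level in one month tends to be followed by a downward correction in the next.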

> djf <- diff(prices$JetFuel)
> dho <- diff(prices$HeatingOil)
> error_lag <- lag(error, k = -1)
> mod_ecm <- lm(djf ~ dho + error_lag + 0)
> summary(mod_ecm)

Call:
lm(formula = djf ~ dho + error_lag + 0)

Residuals:
 Min 1Q Median 3Q Max 
-0.19158 -0.03246 0.00047 0.02288 0.45117 

Coefficients:
 Estimate Std. Error t value Pr(>|t|) 
dho 0.90020 0.03238 27.798 <2e-16 ***
error_lag -0.65540 0.06614 -9.909 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.06875 on 188 degrees of freedom
Multiple R-squared: 0.8198, Adjusted R-squared: 0.8179 
F-statistic: 427.6 on 2 and 188 DF, p-value: < 2.2e-16

By taking into account the existence of a long-run relationship between jet fuel and heating oil prices (cointegration), the hedge ratio is now slightly higher (0.90020) and the residual standard error is noticeably lower (0.06875, compared with 0.0846). The coefficient of the error-correction term is negative (-0.65540): large deviations between the two prices tend to be corrected, and prices move back towards their long-run stable relationship.
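
One way to read the adjustment coefficient is as a speed of mean reversion. Under the simplifying assumption that, absent new shocks, a deviation from equilibrium shrinks by the factor 1 + gamma each month, the sketch below converts the estimate into a half-life. This first-order reading and the names gamma, decay, and half_life are assumptions made for illustration.

```r
gamma <- -0.65540                    # coefficient of error_lag from the ECM summary
decay <- 1 + gamma                   # share of a deviation left after one month
half_life <- log(0.5) / log(decay)   # months until half of a deviation is gone

print(c(remaining_after_1m = decay, half_life_months = half_life))
```

Under this reading, roughly two thirds of a deviation is corrected within a single month.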