### Volatility and the square root of time

When I was a rookie, I asked one of the senior members of my team how to compute the volatility of an asset. His answer was as follows:

“That’s simply an annualized standard deviation. If you are using daily data:

1. Compute the daily returns of the asset,
2. Compute the standard deviation of these returns,
3. Multiply the standard deviation by the square root of 260 (because there are about 260 business days in a year).

“Of course,” he added, “if you are using weekly returns you have to multiply by the square root of 52 and if you are using monthly data you should multiply by the square root of 12. Simple as that.”

And so I did for years, not even trying to understand why I was multiplying the standard deviation by the square root of time.

Time passed and, one day, I finally took the time to solve that mystery. I then understood why the formula makes perfect sense but, more importantly, I realized that the calculations made by most people I know in the financial industry, including seasoned investment professionals, are dead wrong.

Let me explain.

### Discrete returns

Let $P_t$ be the price of an asset on day $t$ (for the sake of simplicity, an asset that doesn’t pay any interim income such as dividends or coupons). Then, the way most people would compute the return ($\delta_t$) on that asset from day $t-1$ to day $t$ is:

$$\delta_t = \frac{P_t}{P_{t-1}} - 1$$

Now suppose we have price data for days $t \in \{0, 1, 2, \dots, T\}$, that is $T+1$ prices and, therefore, $T$ daily returns $\delta_t$, and we want to compute the return over the whole period ($\delta_T$). We would use:

$$\delta_T = \prod_{t=1}^T(1+\delta_t)-1$$

For instance, using the data for the DAX from the EuStockMarkets dataset in R:

> P <- as.numeric(EuStockMarkets[, "DAX"])
> T <- length(P)
>
> dP <- P[-1]/P[-T]-1
> prod(1+dP)-1
[1] 2.360688
>

Indeed, it's equivalent to:

> P[T]/P[1]-1
[1] 2.360688
>

And the (geometric) mean daily return $\bar{\delta_t}$ over the same period is given by:

$$\bar{\delta_t} = (1+\delta_T)^{1/T}-1$$

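In R (here T is the number of prices, so there are only T-1 returns; strictly speaking the exponent should be 1/(T-1), but the difference is negligible over 1,860 observations):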

> rT <- prod(1+dP)-1
> (1+rT)^(1/T)-1
[1] 0.0006519036
>

Or:

$$\bar{\delta_t} = \left( \frac{P_T}{P_0} \right)^{1/T}-1$$

Note we use $P_0$ because $\delta_1$ is the return between $t=0$ and $t=1$. R vectors are indexed from 1, so $P_0$ is P[1] in R:

> (P[T]/P[1])^(1/T)-1
[1] 0.0006519036
>

Using all this, it's easy to compute what the annualized return of the DAX was over that period. We have a daily mean return ($\bar{\delta_t}$) and we assume a year is 260 business days long; it follows that:

$$\bar{\delta_{260}} = (1+\bar{\delta_t})^{260}-1$$

In R:

> mT <- (1+rT)^(1/T)-1
> (1+mT)^260-1
[1] 0.1846409
>

In words: over that period, the DAX has returned 18.5% per annum on average.

Now, back to the subject: what is the volatility — that is, the annualized standard deviation — of our daily returns ($\delta_t$)? We know how to compute the standard deviation of daily returns; in R:

> sd(dP)
[1] 0.01028088
>

And, according to my former colleague, we should multiply this by $\sqrt{260}$:

> sd(dP)*sqrt(260)
[1] 0.1657742
>

Well, guess what: this is wrong and, not only is it wrong, it doesn't mean anything.

### Continuous returns

Using discrete returns is absolutely correct for most uses but, when computing a volatility (and, therefore, a Sharpe ratio or an information ratio), you should use log returns or, as I like to call them, continuous returns:

$$\delta_t = \ln{\left(\frac{P_t}{P_{t-1}}\right)}$$

Or, equivalently:

$$\delta_t = \ln{(P_t)} - \ln{(P_{t-1})}$$

Let me explain why.

The critical property of continuous returns is that the total (continuous) return over the whole period (the $T$ days) is a sum:

$$\delta_T = \sum_{t=1}^T\delta_t$$

In R:

> dP <- diff(log(P))
> sum(dP)
[1] 1.212146
>

Which is, indeed, equivalent to:

> log(P[T]/P[1])
[1] 1.212146
>

As a result, the mean daily (continuous) return over the whole period ($\bar{\delta_t}$) is simply the arithmetic mean of the daily (continuous) returns:

> mean(dP)
[1] 0.0006520417
>

And, you guessed it, the annualized (continuous) return of the DAX over that period is given by:

> mean(dP)*260
[1] 0.1695309
>
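
The continuous figure (17.0%) and the discrete one found earlier (18.5%) describe the same growth: a continuous return is the log of one plus the discrete return, so exponentiating converts back. A minimal sketch of that link on the DAX data follows; the names `n`, `r_disc` and `r_log` are mine, not part of the original calculations.

```r
# Minimal sketch: converting between discrete and continuous (log) returns.
P <- as.numeric(EuStockMarkets[, "DAX"])
n <- length(P)                 # number of prices (n - 1 returns)

r_disc <- P[-1] / P[-n] - 1    # discrete daily returns
r_log  <- diff(log(P))         # continuous daily returns

# Each continuous return is the log of one plus the discrete return.
all.equal(r_log, log(1 + r_disc))

# Summing log returns and exponentiating recovers the total discrete return.
all.equal(exp(sum(r_log)) - 1, P[n] / P[1] - 1)

# Annualized continuous return, converted back to a discrete annual return.
exp(mean(r_log) * 260) - 1
```

The last line should land at about 18.5%, in line with the discrete annualized return computed earlier.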

Starting from this, let's go for a (random) walk.

### Random walk

Following Jules Regnault [1], consider a random variable $\delta_t$ that follows some distribution (we don’t care which one) over $T$ periods, and make just two basic assumptions: (i) the distribution is stable across time and (ii) the observations of $\delta_t$ are independent of one another.

Denoting by $\mu$ the mean of the distribution, which is supposed to be stable (assumption i), we know that the sum of $T$ observations has a mean $\mu_T$ given by:

$$\mu_T = \mu \times T$$

Now, that distribution also has a standard deviation ($\sigma$): what is the standard deviation of the sum of $T$ observations?

Thanks to the Bienaymé formula [2], we know that the variance of the sum of uncorrelated (assumption ii) random variables is the sum of their variances:

$$Var{\left(\sum_{t=1}^T\delta_t \right)} = \sum_{t=1}^T Var{(\delta_t)}$$

Since we have assumed the distribution is stable (assumption i), so is the variance ($\sigma^2$). Hence:

$$Var{\left(\sum_{t=1}^T\delta_t \right)} = \sigma^2 \times T$$

Therefore, the standard deviation of the sum of $T$ observations is:

$$\sqrt{Var{\left(\sum_{t=1}^T\delta_t \right)}} = \sigma \times \sqrt{T}$$

Here you go: here is the square root of time.
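
The result is easy to check by simulation. The sketch below is only an illustration under the two assumptions above: the values of `sigma`, `n_obs` and `n_sims` are hypothetical, and the uniform distribution is picked deliberately because it is not normal.

```r
# Minimal sketch: checking the square-root-of-time rule by simulation.
set.seed(42)                 # arbitrary seed, for reproducibility
sigma  <- 0.01               # hypothetical standard deviation of one observation
n_obs  <- 260                # number of observations summed (T in the text)
n_sims <- 10000              # number of simulated sums

# A uniform distribution on [-a, a] has standard deviation a / sqrt(3),
# so a = sigma * sqrt(3) gives draws with standard deviation sigma.
a <- sigma * sqrt(3)
x <- matrix(runif(n_obs * n_sims, min = -a, max = a), n_obs, n_sims)

sums <- colSums(x)           # one sum of n_obs independent draws per column

sd(sums)                     # empirical standard deviation of the sums
sigma * sqrt(n_obs)          # theoretical value: sigma * sqrt(T)
```

The two numbers should agree to within sampling noise, whatever distribution the draws come from, as long as it has a finite variance.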

What we are computing here is the standard deviation of a sum, and a sum is indeed how continuous returns accumulate; discrete returns accumulate as a product instead. In other words, the formula only makes sense if your daily (weekly, monthly... whatever) returns are computed as continuous returns.
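
To make the product-versus-sum point explicit, write $r_t$ for the discrete return and $\delta_t$ for the continuous one (the $r_t$ notation is introduced here only to keep the two apart). Taking logs turns the compounding product into exactly the kind of sum the Bienaymé formula applies to:

$$\ln{\left(\prod_{t=1}^T (1+r_t)\right)} = \sum_{t=1}^T \ln{(1+r_t)} = \sum_{t=1}^T \delta_t$$

The variance of the product $\prod_{t=1}^T (1+r_t)$ itself is not $\sigma^2 \times T$, so the square root rule has no such justification when applied to discrete returns.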

The volatility is simply that very same calculation over a standard period of one year (here, $T=260$ days). In R:

> P <- as.numeric(EuStockMarkets[, "DAX"])
> T <- 260
> dP <- diff(log(P))
> mean(dP)*T
[1] 0.1695309
> sd(dP)*sqrt(T)
[1] 0.166096
>

Even better, since we have assumed that our $\delta_t$ are independent and drawn from the same distribution, the Central Limit Theorem tells us that, provided their variance is finite and after a large enough number of observations ($T$), their sum should follow (or, at least, be close to), guess what, a normal distribution.

In other words, with our two assumptions and using continuous returns, we are able to compute the mean and the standard deviation of a normal distribution, which basically means that we know everything else.

### A demo

Let's make a step-by-step demo with the DAX data.

P <- as.numeric(EuStockMarkets[, "DAX"])
dP <- diff(log(P))
T <- 260
# After T days, we should have:
mT <- mean(dP)*T
sT <- sd(dP)*sqrt(T)

Now, we’re going to use ecdf to generate 1000 random series of length T that follow the empirical distribution of the DAX's daily continuous returns:

N <- 1000
dist <- ecdf(dP)
Rt <- matrix(quantile(dist, runif(T*N)), T, N)
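
Feeding uniform random numbers into a quantile function is inverse transform sampling. A perhaps more familiar way to get a similar result, sketched here as an alternative rather than as the method used in this post, is to resample the observed returns with replacement (a plain bootstrap). The name `Rt_boot` and the seed are mine.

```r
# Alternative sketch: bootstrap the observed log returns directly.
set.seed(1)                                  # arbitrary seed
P  <- as.numeric(EuStockMarkets[, "DAX"])
dP <- diff(log(P))
T  <- 260                                    # same name as in the post
N  <- 1000

# Each column is one simulated year of T daily returns, drawn with
# replacement from the empirical sample.
Rt_boot <- matrix(sample(dP, T * N, replace = TRUE), T, N)

dim(Rt_boot)                 # T rows, N columns
mean(Rt_boot) * T            # should be close to mean(dP) * T
```

Unlike the quantile approach, which can interpolate between observed values, the bootstrap only ever returns values that actually occurred in the sample.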

Let's plot them:

Ct <- apply(Rt, 2, cumsum)
cols <- heat.colors(N)
op <- par(mar = rep(5, 4))
plot(1:T, Ct[, 1], type = "n", ylim = c(-.5, .7), cex.lab = .7,
cex.axis = .7, cex.main = .8, main = "Figure 1")
for(i in 1:N) lines(1:T, Ct[, i], col = cols[i])
par(op)

You should get something like Figure 1: a fan of simulated paths that spread out as time passes.

Now let's see what the distribution looks like after T days:

# The mean:
mean(Ct[T, ])
# Compare to:
mT
# The standard deviation:
sd(Ct[T, ])
# Compare to:
sT

Lastly, compare the empirical distribution at time T with a normal distribution with mean mT and standard deviation sT:

# Density estimate:
d <- density(Ct[T, ], from = -.5, to = .8)

# Normal distribution with our estimated mean/sd (volatility):
y <- dnorm(d$x, mT, sT)

# Plot:
op <- par(mar = rep(5, 4))
plot(d$x, d$y, type = "l", xlim = c(-.5, .8), cex.lab = .7,
cex.axis = .7, cex.main = .8, main = "Figure 2")
lines(d$x, y, col = "red")
par(op)


In Figure 2, the density estimate (black) and the normal distribution (red) should nearly overlap.

Pretty close, right?

From this, one can compute the probability associated with any level of return and it is clear that the higher the volatility, the more likely you are to face losses [3].
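
As a rough illustration, the probability of ending the year with a negative continuous return can be read off the normal distribution with `pnorm`. This is a sketch under the post's two assumptions; the doubled volatility is a hypothetical scenario, not an estimate, and the `p_loss*` names are mine.

```r
# Minimal sketch: probability of a loss after one year (260 days).
P  <- as.numeric(EuStockMarkets[, "DAX"])
dP <- diff(log(P))
T  <- 260
mT <- mean(dP) * T           # annualized continuous return
sT <- sd(dP) * sqrt(T)       # volatility

# P(cumulative log return < 0) under a normal with mean mT and sd sT.
p_loss <- pnorm(0, mean = mT, sd = sT)

# Same mean, hypothetically twice the volatility: a loss becomes more likely.
p_loss_2x <- pnorm(0, mean = mT, sd = 2 * sT)

# In price terms the same event is "end price / start price below 1",
# which uses the corresponding log-normal distribution.
p_loss_price <- plnorm(1, meanlog = mT, sdlog = sT)

c(p_loss, p_loss_2x, p_loss_price)
```

With the DAX estimates, the first probability comes out at roughly 15% and doubling the volatility pushes it to about 30%; the log-normal version matches the first one, as it should.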

### Takeaways

First, and if you remember only one thing from this post: volatility should always be computed using continuous (a.k.a. log) returns. Any other calculation is false. Period.

Second, most financial models don’t assume anything about the distribution of (say) daily returns: saying that, after $T$ periods, the cumulative return will be approximately normally distributed is just a consequence of the CLT.

Lastly, the whole random walk thing is based on just two assumptions (i and ii); if you’re looking for weaknesses, this is where you should start.

---
[1] Jules Augustin Frédéric Regnault, a French stock broker who first suggested the concept of a random walk of prices in 1863.
[2] Named after Irénée-Jules Bienaymé (1796-1878), one of the last great French statisticians.
[3] For price probabilities, you'll need to use the corresponding log-normal distribution (see ?dlnorm).
