Quant Basics 2: Vectorised Backtest

Tom Starke, August 18, 2017

Why Vectorise?

There are several ways to backtest a strategy on historical data. In this section we demonstrate vectorisation. This and the previous section serve as preliminary exercises before we dive deeply into the quantitative section. However, one should not underestimate the pitfalls of backtesting. It is very easy to make mistakes here, so please be diligent, but do not be discouraged if things don’t work the first time. Out-of-the-box backtesters can be a good alternative. However, it’s often not clear what’s under the hood of these tools, and in many cases you pay a large penalty in speed for a one-size-fits-all tool. This hurts particularly when we need to run hundreds or thousands of different parameter sets.

The most obvious way to run a backtest over historical data is to create a loop and feed price information one-by-one to a decision engine that determines whether we buy, sell or do nothing based on, say, price action. While this is probably the most accurate way to backtest a strategy, loops in scripting languages such as Python are expensive and can be painfully slow if we need to run hundreds or thousands of backtests with different parameters.

This is where vectorisation comes into play. Here, we produce our trading signals prior to the backtest and just multiply these with the price changes. This is phenomenally fast but it also has a few drawbacks. First, we are limited in the complexity of our strategies. In theory, we could vectorise any strategy, but in practice even moderately complex strategies quickly produce multi-dimensional arrays which can be extremely hard to implement. Secondly, since we pre-calculate signals, the strategy cannot react to intrinsic states such as PnL or drawdowns. Again, in theory it would be possible to implement this, but in practice it is near-impossible due to its complexity.
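To make the contrast concrete, here is a minimal sketch that computes the same PnL once with a bar-by-bar loop and once as a single array expression. The function names, prices and positions are made up for illustration, and the positions are assumed to be known before the backtest starts.

```python
import numpy as np

def pnl_loop(prices, positions):
    """Loop version: walk through the bars one by one."""
    total = 0.0
    for t in range(1, len(prices)):
        # the position at bar t is multiplied by the price change into bar t
        total += positions[t] * (prices[t] - prices[t - 1])
    return total

def pnl_vectorised(prices, positions):
    """Same arithmetic written as one array expression."""
    return float(np.sum(positions[1:] * np.diff(prices)))

# Hypothetical data: five bars and a pre-computed position for each bar
prices = np.array([10.0, 11.0, 12.0, 11.0, 13.0])
positions = np.array([0.0, 1.0, 1.0, -1.0, -1.0])

print(pnl_loop(prices, positions), pnl_vectorised(prices, positions))
```

Both functions return the same number. The vectorised form simply hands the loop over to NumPy, which is where the speed-up comes from when many parameter sets are tested.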

The MA-Crossover Strategy

Despite those shortcomings, vectorised backtests are extremely useful to understand the dynamics of a strategy based on quantitative trading signals in great detail. Here we use a moving average crossover strategy as our underlying example. The trading logic of this is very simple:

- We have two moving averages with different lookback periods
- A trading signal is produced when the one with the short lookback crosses the long lookback moving average:
  - If short MA is above the long MA -> LONG
  - If short MA is below the long MA -> SHORT
- This way our strategy exploits momentum effects
- Reversing long and short means we are trading mean reversion

With the following function we will produce the trading signals for our vectorised backtest:

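import numpy as np

def calc_signals(tickers, p, a, b):
    # tickers is unused here; it is kept so existing calls still work
    sma = p.rolling(a).mean()   # moving average with lookback a
    smb = p.rolling(b).mean()   # moving average with lookback b
    # sign() gives the state (+1/-1), diff() marks where the state changes
    signal = np.sign(sma - smb).diff()
    actual_signals = signal.dropna(how='all', axis=0)
    for col in actual_signals.columns:
        # first row of this column that is not NaN
        idx = actual_signals[col].first_valid_index()
        if idx is not None:
            # .loc avoids chained assignment, which may not write through
            signal.loc[idx, col] = signal.loc[idx, col] / 2.
    return signal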

We only have a few lines of code here but there is a lot going on, so let’s look at the details. sma and smb denote the two moving averages. rolling() is a method of a Pandas DataFrame. Pandas is a very useful tool that makes it easy to handle time-stamped financial time series, and it has a wealth of operations that we need to manipulate and analyse market data. rolling() moves a window over the data and calculates a function on that window. In our case this is the simple average mean(), but others such as sums or standard deviations could be used. The argument to rolling() is the window length we want.

Since we look at the MA-crossover, the next line takes the difference between the two MAs, reduces it to its sign and then takes the first difference of that sign with diff(). In other words, sign() tells us whether one MA is above the other, a state that usually persists for some time. Differencing this state shows us when it changes, and this is our crossover signal.
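The sign-then-difference step is easier to see on a handful of numbers. The sketch below uses hypothetical moving-average values, typed in by hand rather than computed with rolling(), purely to show what sign() and diff() produce.

```python
import numpy as np
import pandas as pd

# Hypothetical short and long moving-average values, made up for illustration
short_ma = pd.Series([1.0, 2.0, 3.0, 2.0, 1.0, 2.0])
long_ma = pd.Series([2.0, 2.5, 2.5, 2.5, 2.5, 1.5])

state = np.sign(short_ma - long_ma)   # +1 while short is above long, -1 while below
crossover = state.diff()              # non-zero only where the state flips

print(pd.DataFrame({"state": state, "crossover": crossover}))
```

The state column holds its value between crossings, while the crossover column is zero everywhere except at the flips, where it shows +2 or -2. The first entry of crossover is NaN because diff() has nothing to compare the first row with.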

Unfortunately, our data are often full of errors and missing values and this is something we need to deal with effectively. In the line that calculates actual_signals we drop every row in which all signals are missing (NaN), such as the rows at the start where the moving averages are not yet defined.

One more thing needs to be done, as shown in the remaining lines, and this is slightly challenging. Our current signals give us the occasional +2 or -2 as we change from +1 to -1 or vice versa. Depending on whether we start with a positive or a negative signal, the cumulative position resulting from the signals would oscillate between +2 and zero or between -2 and zero respectively, and our trading strategy would be highly path-dependent. To avoid that, we need to centre the cumulative signals around the zero mark. This is done by finding the first non-zero value, which is either +2 or -2, and dividing it by 2. Now our cumulative signal oscillates around zero, and this will be very important later on.
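The effect of halving the first non-zero signal can be checked on a toy series. This is only a sketch with made-up signal values. Locating the entry with ne(0).idxmax() is a choice made for this illustration and is not the call used in calc_signals.

```python
import pandas as pd

# Hypothetical crossover signals: flat at first, then three flips
raw = pd.Series([0.0, 0.0, 2.0, 0.0, -2.0, 0.0, 2.0])
print(raw.cumsum().tolist())        # cumulative position moves between 0 and +2

centred = raw.copy()
first = centred.ne(0).idxmax()      # label of the first non-zero signal
centred.loc[first] = centred.loc[first] / 2.0
print(centred.cumsum().tolist())    # cumulative position moves between -1 and +1
```

Before the adjustment the strategy is either long two units or flat. After it, the same crossovers give a position that alternates between one unit long and one unit short.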

Calculate Performance

All we have to do now is to combine our signals with the prices and we’re (almost) done. Too bad it’s not quite so easy. As I mentioned before, there are quite a few gotchas that can bite us. In the first part of our series we calculated the prices for a number of stocks and the trading signals based on the moving average of the price action. Next, let’s have a look at calculating the PnL of our strategy:

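def calc_pnl(sig, p):
    # cumulative signal = position; keep the long (positive) part only
    sig_up = sig.cumsum().apply(lambda x: x * (x > 0))
    # same again, keeping the short (negative) part only
    sig_dwn = sig.cumsum().apply(lambda x: x * (x < 0))
    # price changes times positions, accumulated over time, summed over columns
    pnlx = (p.diff() * sig_up + p.diff() * sig_dwn).cumsum().sum(axis=1)
    return pnlx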

These few lines of code are actually quite complex. One way to calculate PnL would be to simply look at the trading signals, take the prices at the points where they are non-zero and add them up. However, this would only give us PnLs at the points where we reverse our positions, and for calculations of Sharpe ratios and drawdowns we need the intermediate PnLs as well. We can see that this function takes the signal that we have calculated previously as well as the prices. The lines that define sig_up and sig_dwn calculate the cumulative signal and keep either only the positive or only the negative values. With apply() we can run a custom function over our data. You can see that this function has a lambda in it. This is also called an “anonymous function”. In a nutshell, this is similar to def but we don’t give the function a name. It just exists for this particular purpose.

In the line that computes pnlx we multiply the price changes with the long and short positions separately, add the two and take the cumulative sum. That’s it! It might take a while to really grasp the concept, but once understood it has the advantage that it’s lightning-fast. Furthermore, it can be used for any trading signals with different position values. The final value of pnlx is the overall PnL of our strategy.
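The PnL arithmetic can be traced by hand on a few bars. The sketch below repeats the steps described above on a hypothetical single-instrument example. The prices, signals and variable names are invented for illustration.

```python
import pandas as pd

# Hypothetical single-instrument prices and signals, made up for illustration
prices = pd.DataFrame({"A": [10.0, 11.0, 12.0, 11.0, 9.0]})
signals = pd.DataFrame({"A": [0.0, 1.0, 0.0, -2.0, 0.0]})

position = signals.cumsum()                      # 0, 1, 1, -1, -1
longs = position.apply(lambda x: x * (x > 0))    # the lambda sees one column at a time
shorts = position.apply(lambda x: x * (x < 0))   # the boolean mask zeroes the other side

# price change times position, for the long and the short side
step_pnl = prices.diff() * longs + prices.diff() * shorts
pnl = step_pnl.cumsum().sum(axis=1)              # running PnL at every bar
print(pnl.tolist())
```

The long position earns one unit on each of the two up-moves, and the short position earns one and then two units on the two down-moves, so the running PnL ends at five. Because the result has a value for every bar, it can feed directly into calculations that need the full PnL path.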

Testing The Code

Finally, let’s test our code. In the test below we produce a simple step function as price by creating a vector of zero returns and setting one of the returns to one. The cumulative sum of this gives a price curve that is zero and then steps to one. We expect the moving average to cross over twice giving us a positive and negative signal with a final PnL of one.

import pandas as pd
import numpy as np
from pylab import plot,show,subplot,xlabel,ylabel

def test_pnl():
	rets = np.zeros(1000)
	rets[500] = 1
	pr = np.cumsum(rets)

	df = pd.DataFrame(pr)
	sig = calc_signals(1,df,10,20)
	pnl = calc_pnl(sig,df)
	subplot(3,1,1)
	plot(df)
	ylabel('price')
	subplot(3,1,2)
	plot(sig)
	ylabel('signal')
	subplot(3,1,3)
	plot(pnl)
	ylabel('pnl')
	xlabel('time')
	show()

test_pnl()

Let’s have a look at the plot: