Lesson 3: Backtest with grid search using only 3 lines of code
Use fastquant to easily backtest a grid of parameters in a given trading strategy on Jollibee Food Corp. (JFC)!
- Backtest SMAC
- Define the search space
- Run grid search
- Visualize the period grid
- Built-in grid search in fastquant with only 3 lines of code!
- Final notes
# uncomment to install in colab
# !pip3 install fastquant
fastquant offers a convenient way to backtest several trading strategies. To backtest using Simple Moving Average Crossover (SMAC), we do the following.
backtest('smac', dcv_data, fast_period=15, slow_period=40)
fast_period and slow_period are two SMAC parameters that can be changed depending on the user's preferences. A simple way to fine-tune these parameters is to run backtest on a grid of values and find which combination of fast_period and slow_period yields the highest net profit.
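To build some intuition for what fast_period and slow_period control, here is a minimal sketch of a moving-average crossover on a toy price list. It assumes the usual convention (buy when the fast average crosses above the slow one, sell on the reverse cross). The helpers sma and crossover_signals and the toy prices are made up for illustration and are not fastquant's implementation.

```python
def sma(prices, period):
    """Simple moving average; None until `period` prices are available."""
    out = []
    for i in range(len(prices)):
        if i + 1 < period:
            out.append(None)
        else:
            window = prices[i + 1 - period : i + 1]
            out.append(sum(window) / period)
    return out


def crossover_signals(prices, fast_period, slow_period):
    """Return (index, 'buy' | 'sell') events where the two averages cross."""
    fast = sma(prices, fast_period)
    slow = sma(prices, slow_period)
    signals = []
    for i in range(1, len(prices)):
        if None in (fast[i - 1], slow[i - 1], fast[i], slow[i]):
            continue  # not enough history for both averages yet
        prev_diff = fast[i - 1] - slow[i - 1]
        curr_diff = fast[i] - slow[i]
        if prev_diff <= 0 < curr_diff:
            signals.append((i, "buy"))   # fast average moved above slow
        elif prev_diff >= 0 > curr_diff:
            signals.append((i, "sell"))  # fast average moved below slow
    return signals


# Hypothetical prices, chosen only to produce one buy and one sell
toy_prices = [10, 9, 8, 7, 8, 9, 10, 11, 10, 9, 8, 7]
signals = crossover_signals(toy_prices, fast_period=2, slow_period=4)
print(signals)
```

A shorter fast_period reacts sooner to price changes, while a longer slow_period smooths out more noise, which is why the pair is worth tuning together.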
First, we fetch JFC's historical data, comprising date, close price, and volume.
from fastquant import get_stock_data, backtest
symbol='JFC'
dcv_data = get_stock_data(symbol,
start_date='2018-01-01',
end_date='2020-04-28',
format='cv',
)
dcv_data.head()
import matplotlib.pyplot as pl
pl.style.use("default")
from fastquant import backtest
results = backtest("smac",
dcv_data,
fast_period=15,
slow_period=40,
verbose=False,
plot=True
)
The plot above is optional. backtest returns a dataframe of parameters and corresponding metrics:
results.head()
Second, we specify the range of reasonable values to explore for fast_period and slow_period. Let's take between 1 and 19 trading days (roughly a month) in steps of 1 day for fast_period, and between 20 and 240 trading days (roughly a year) in steps of 5 days for slow_period.
import numpy as np
fast_periods = np.arange(1,20,1, dtype=int)
slow_periods = np.arange(20,241,5, dtype=int)
# make a grid of 0's (placeholder)
period_grid = np.zeros(shape=(len(fast_periods),len(slow_periods)))
period_grid.shape
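Since np.arange excludes its stop value, it is worth double-checking which endpoints actually end up in the grid. Python's built-in range follows the same half-open convention, so the quick check below uses it as a stand-in. The names fast_values and slow_values are only for this sketch.

```python
# range, like np.arange, includes start but excludes stop
fast_values = list(range(1, 20, 1))
slow_values = list(range(20, 241, 5))

# first value, last value, and number of values on each axis
print(fast_values[0], fast_values[-1], len(fast_values))
print(slow_values[0], slow_values[-1], len(slow_values))
```

The two lengths should match the two entries of period_grid.shape.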
Third, we run backtest for each pair of fast_period and slow_period, saving the net profit each time to the period_grid variable.
For now, we will perform this the long way, but in a later section, we will perform the backtest in only 3 lines of code using built-in functionality within fastquant's backtest function.
Note: Before running backtest over a large grid, try measuring how long it takes your machine to run one backtest instance.
%timeit backtest(...)
On my machine with 8 cores, backtest takes
101 ms ± 8.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
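With a per-run timing in hand, we can roughly estimate how long the whole grid will take before committing to it. This is a back-of-the-envelope sketch that assumes the runs execute one after another and each takes about the same time. The helper estimate_grid_seconds is made up for illustration.

```python
def estimate_grid_seconds(seconds_per_run, n_fast, n_slow):
    """Rough serial estimate: one backtest per (fast, slow) pair."""
    return seconds_per_run * n_fast * n_slow


# 0.101 s is the %timeit result quoted above; 19 and 45 are the grid sizes
estimated = estimate_grid_seconds(0.101, n_fast=19, n_slow=45)
print(f"Estimated grid search time: {estimated:.0f} sec")
```

If the estimate is uncomfortably long, coarsening the step size of either axis is the simplest way to cut it down.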
from time import time
init_cash=100000
start_time = time()
for i,fast_period in enumerate(fast_periods):
    for j,slow_period in enumerate(slow_periods):
        results = backtest('smac',
                           dcv_data,
                           fast_period=fast_period,
                           slow_period=slow_period,
                           init_cash=init_cash,
                           verbose=False,
                           plot=False
                          )
        # net profit = final portfolio value minus starting cash
        net_profit = results.final_value.values[0]-init_cash
        period_grid[i,j] = net_profit
end_time = time()
time_basic = end_time-start_time
print("Basic grid search took {:.1f} sec".format(time_basic))
Next, we visualize period_grid as a 2D matrix.
import matplotlib.colors as mcolors
import matplotlib.pyplot as pl
pl.style.use("default")
fig, ax = pl.subplots(1,1, figsize=(8,4))
xmin, xmax = slow_periods[0],slow_periods[-1]
ymin, ymax = fast_periods[0],fast_periods[-1]
#make a diverging color map such that profit<0 is red and blue otherwise
cmap = pl.get_cmap('RdBu')
# DivergingNorm was renamed to TwoSlopeNorm in matplotlib 3.2
norm = mcolors.TwoSlopeNorm(vmin=period_grid.min(),
vmax = period_grid.max(),
vcenter=0
)
#plot matrix
cbar = ax.imshow(period_grid,
origin='lower',
interpolation='none',
extent=[xmin, xmax, ymin, ymax],
cmap=cmap,
norm=norm
)
pl.colorbar(cbar, ax=ax, shrink=0.9,
label='net profit', orientation="horizontal")
# search position with highest net profit
y, x = np.unravel_index(np.argmax(period_grid), period_grid.shape)
best_slow_period = slow_periods[x]
best_fast_period = fast_periods[y]
# mark position
# ax.annotate(f"max profit={period_grid[y, x]:.0f}@({best_slow_period}, {best_fast_period}) days",
# (best_slow_period+5,best_fast_period+1)
# )
ax.axvline(best_slow_period, 0, 1, c='k', ls='--')
ax.axhline(best_fast_period+0.5, 0, 1, c='k', ls='--')
# add labels
ax.set_aspect(5)
pl.setp(ax,
xlim=(xmin,xmax),
ylim=(ymin,ymax),
xlabel='slow period (days)',
ylabel='fast period (days)',
title='JFC w/ SMAC',
);
print(f"max profit={period_grid[y, x]:.0f} @ ({best_slow_period},{best_fast_period}) days")
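The np.unravel_index call above turns the flat position returned by np.argmax into a (row, column) pair, which is what lets us look up the matching fast and slow periods. Here is a small pure-Python sketch of the same idea for a row-major 2D grid. The helper unravel_2d and the toy values are made up for illustration.

```python
def unravel_2d(flat_index, n_cols):
    """Row-major (row, col) of a flat index, like np.unravel_index in 2D."""
    return divmod(flat_index, n_cols)


# Hypothetical net profits: rows play the role of fast periods, columns of slow periods
toy_grid = [[-5, 0, 3],
            [2, 9, -1]]

flat = [value for row in toy_grid for value in row]
# index of the first maximum, which is also what np.argmax returns
flat_argmax = max(range(len(flat)), key=flat.__getitem__)
row, col = unravel_2d(flat_argmax, n_cols=len(toy_grid[0]))
print(row, col, toy_grid[row][col])
```

In the notebook, the row index selects from fast_periods and the column index from slow_periods, matching the shape of period_grid.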
From the plot above, only a few period combinations yield a non-negative net profit with the SMAC strategy. The best result is achieved with the (slow_period, fast_period) pair printed above.
In fact, the SMAC strategy performs so poorly here that a randomly chosen period combination from our grid has only a 9% chance of yielding a profit, which is smaller than the 12% chance of exactly breaking even.
percent_positive_profit=(period_grid>0).sum()/period_grid.size*100
percent_positive_profit
percent_breakeven=(period_grid==0).sum()/period_grid.size*100
percent_breakeven
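The two percentages above simply count grid cells that satisfy a condition and divide by the total number of cells. The sketch below repeats that bookkeeping on a tiny hand-made grid and adds the losing share, so the three numbers can be checked against each other. The toy values are made up and are not JFC results.

```python
# Hypothetical net profits for a 2 x 3 grid of period combinations
toy_profits = [[-5.0, 0.0, 3.0],
               [2.0, 9.0, -1.0]]

cells = [p for row in toy_profits for p in row]
pct_positive = 100 * sum(p > 0 for p in cells) / len(cells)
pct_breakeven = 100 * sum(p == 0 for p in cells) / len(cells)
pct_negative = 100 * sum(p < 0 for p in cells) / len(cells)

print(f"{pct_positive:.1f}% profit, {pct_breakeven:.1f}% break-even, "
      f"{pct_negative:.1f}% loss")
```

The three shares always add up to 100%, which is a handy check when the grid is large.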
Anyway, let's check the results of backtest using the best_fast_period and best_slow_period.
results = backtest('smac',
dcv_data,
fast_period=best_fast_period,
slow_period=best_slow_period,
verbose=True,
plot=True
)
net_profit = results.final_value.values[0]-init_cash
net_profit
There are only 6 crossover events, of which only the latest transaction yielded positive gains, resulting in a 7% net profit. Is a 7% profit over a roughly two-year baseline better than the market benchmark?
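To compare against a benchmark, it helps to put the total return on a per-year basis first. The sketch below annualizes the 7% figure over the data window used in this lesson, assuming compound growth and a 365.25-day year. The benchmark itself is not given here, so no comparison value is included.

```python
from datetime import date

start = date(2018, 1, 1)   # start_date used when fetching the data
end = date(2020, 4, 28)    # end_date used when fetching the data
total_return = 0.07        # the 7% net profit quoted above

days = (end - start).days
years = days / 365.25
# compound annual growth rate implied by the total return
annualized = (1 + total_return) ** (1 / years) - 1

print(f"{days} days = {years:.2f} years")
print(f"Annualized return: {annualized:.2%}")
```

The annualized figure is the number to set against a yearly benchmark return of your choice.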
The good news is that backtest provides a built-in grid search if strategy parameters are lists. Let's re-run backtest with the grid we used above.
Note that "3 lines" here refers to the fastquant import, the data pull (to generate dcv_data), and actually running the backtest function.
from fastquant import backtest
start_time = time()
results = backtest("smac",
dcv_data,
fast_period=fast_periods,
slow_period=slow_periods,
verbose=False,
plot=False
)
end_time = time()
time_optimized = end_time-start_time
print("Optimized grid search took {:.1f} sec".format(time_optimized))
results is automatically ranked based on rnorm, which is a proxy for performance. In this case, the best (fast_period, slow_period) = (8, 200) days.
The returned dataframe should have len(fast_periods) x len(slow_periods) rows (19 x 45 = 855 in this case).
results.shape
results.head()
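The expected row count comes from pairing every fast period with every slow period. As a minimal sketch, assuming the built-in grid search evaluates each combination once (which is what the row count above suggests), itertools.product enumerates the same pairs. The names fast_values, slow_values, and combos are only for this sketch.

```python
from itertools import product

# Same half-open ranges as the np.arange calls used for the grid
fast_values = range(1, 20, 1)
slow_values = range(20, 241, 5)

# every (fast_period, slow_period) pair, fast varying slowest
combos = list(product(fast_values, slow_values))
print(len(combos))
print(combos[0], combos[-1])
```

If results.shape reports a different number of rows, the parameter lists passed to backtest are probably not the ones you intended.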
Now, we recreate the 2D matrix from before, but this time using a scatter plot.
fig, ax = pl.subplots(1,1, figsize=(8,4))
#make a diverging color map such that profit<0 is red and blue otherwise
cmap = pl.get_cmap('RdBu')
# DivergingNorm was renamed to TwoSlopeNorm in matplotlib 3.2
norm = mcolors.TwoSlopeNorm(vmin=period_grid.min(),
vmax = period_grid.max(),
vcenter=0
)
#plot scatter
results['net_profit'] = results['final_value']-results['init_cash']
df = results[['slow_period','fast_period','net_profit']]
ax2 = df.plot.scatter(x='slow_period', y='fast_period', c='net_profit',
norm=norm, cmap=cmap, ax=ax
)
ymin,ymax = df.fast_period.min(), df.fast_period.max()
xmin,xmax = df.slow_period.min(), df.slow_period.max()
# best performance (instead of highest profit)
best_fast_period, best_slow_period, net_profit = df.loc[0,['fast_period','slow_period','net_profit']]
# mark position
# ax.annotate(f"max profit={net_profit:.0f}@({best_slow_period}, {best_fast_period}) days",
# (best_slow_period-100,best_fast_period+1), color='r'
# )
ax.axvline(best_slow_period, 0, 1, c='r', ls='--')
ax.axhline(best_fast_period+0.5, 0, 1, c='r', ls='--')
ax.set_aspect(5)
pl.setp(ax,
xlim=(xmin,xmax),
ylim=(ymin,ymax),
xlabel='slow period (days)',
ylabel='fast period (days)',
title='JFC w/ SMAC',
);
# fig.colorbar(ax2, orientation="horizontal", shrink=0.9, label='net profit')
print(f"max profit={net_profit:.0f} @ ({best_slow_period},{best_fast_period}) days")
Note also that the built-in grid search in backtest is optimized and slightly faster than the basic loop-based grid search.
#time
time_basic/time_optimized
Final notes
While it is tempting to do a grid search over a larger search space and at finer resolution, doing so is computationally expensive, inefficient, and prone to overfitting. There are better methods than brute-force grid search, which we will tackle in the next example.
As an exercise, try the following:
- Use different trading strategies and compare their results
- Use a longer data baseline