# uncomment to install in colab
# !pip3 install fastquant

Backtest SMAC

fastquant offers a convenient way to backtest several trading strategies. To backtest using Simple Moving Average Crossover (SMAC), we do the following.

backtest('smac', dcv_data, fast_period=15, slow_period=40)

fast_period and slow_period are two SMAC parameters that can be changed depending on the user's preferences. A simple way to fine tune these parameters is to run backtest on a grid of values and find which combination of fast_period and slow_period yields the highest net profit.

First, we fetch JFC's historical data comprised of date, close price, and volume.

from fastquant import get_stock_data, backtest

symbol='JFC'
dcv_data = get_stock_data(symbol, 
                    start_date='2018-01-01', 
                    end_date='2020-04-28',
                    format='cv',
                   )
dcv_data.head()
849it [04:08,  4.04it/s]
close volume
dt
2018-01-03 255.4 745780
2018-01-04 255.0 617010
2018-01-05 255.0 946040
2018-01-08 256.0 840630
2018-01-09 255.8 978180
import matplotlib.pyplot as pl
pl.style.use("default")
from fastquant import backtest

results = backtest("smac", 
                   dcv_data, 
                   fast_period=15, 
                   slow_period=40, 
                   verbose=False, 
                   plot=True
                  )
===Global level arguments===
init_cash : 100000
buy_prop : 1
sell_prop : 1
===Strategy level arguments===
fast_period : 15
slow_period : 40
Final PnL: -31257.65
Time used (seconds): 0.10944700241088867
Optimal parameters: {'init_cash': 100000, 'buy_prop': 1, 'sell_prop': 1, 'execution_type': 'close', 'fast_period': 15, 'slow_period': 40}
Optimal metrics: {'rtot': -0.37480465562458976, 'ravg': -0.0006645472617457265, 'rnorm': -0.15419454966091925, 'rnorm100': -15.419454966091925, 'sharperatio': -0.9821454406209409, 'pnl': -31257.65, 'final_value': 68742.35499999995}

The plot above is optional. backtest returns a dataframe of parameters and corresponding metrics:

results.head()
init_cash buy_prop sell_prop execution_type fast_period slow_period rtot ravg rnorm rnorm100 sharperatio pnl final_value
0 100000 1 1 close 15 40 -0.374805 -0.000665 -0.154195 -15.419455 -0.982145 -31257.65 68742.355

Define the search space

Second, we specify the range of reasonable values to explore for fast_period and slow_period. Let's take between 1 and 20 trading days (roughly a month) in steps of 1 day for fast_period, and between 21 and 240 trading days (roughly a year) in steps of 5 days for slow_period.

import numpy as np

fast_periods = np.arange(1,20,1, dtype=int)
slow_periods = np.arange(20,241,5, dtype=int)

# make a grid of 0's (placeholder)
period_grid = np.zeros(shape=(len(fast_periods),len(slow_periods)))
period_grid.shape
(19, 45)

Third, we run backtest for each iteration over each pair of fast_period and slow_period, saving each time the net profit to the period_grid variable.

For now, we will perform this the long way, but in a later section, we will perform the backtest in only 3 lines of code using a built in functionality within fastquant's backtest function.

Note: Before running backtest over a large grid, try measuring how long it takes your machine to run one backtest instance.

%timeit
backtest(...)

In my machine with 8 cores, backtest takes

101 ms ± 8.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
from time import time

init_cash=100000

start_time = time()
for i,fast_period in enumerate(fast_periods): 
    for j,slow_period in enumerate(slow_periods):
        results = backtest('smac', 
                           dcv_data, 
                           fast_period=fast_period,
                           slow_period=slow_period,
                           init_cash=100000,
                           verbose=False, 
                           plot=False
                          )
        net_profit = results.final_value.values[0]-init_cash
        period_grid[i,j] = net_profit
end_time = time()
time_basic = end_time-start_time
print("Basic grid search took {:.1f} sec".format(time_basic))
Basic grid search took 100.4 sec

Visualize the period grid

Next, we visualize period_grid as a 2D matrix.

import matplotlib.colors as mcolors
import matplotlib.pyplot as pl
pl.style.use("default")

fig, ax = pl.subplots(1,1, figsize=(8,4))
xmin, xmax = slow_periods[0],slow_periods[-1]
ymin, ymax = fast_periods[0],fast_periods[-1]

#make a diverging color map such that profit<0 is red and blue otherwise
cmap = pl.get_cmap('RdBu')
norm = mcolors.DivergingNorm(vmin=period_grid.min(), 
                             vmax = period_grid.max(), 
                             vcenter=0
                            )
#plot matrix
cbar = ax.imshow(period_grid, 
                 origin='lower', 
                 interpolation='none', 
                 extent=[xmin, xmax, ymin, ymax], 
                 cmap=cmap,
                 norm=norm
                )
pl.colorbar(cbar, ax=ax, shrink=0.9,
            label='net profit', orientation="horizontal")

# search position with highest net profit
y, x = np.unravel_index(np.argmax(period_grid), period_grid.shape)
best_slow_period = slow_periods[x]
best_fast_period = fast_periods[y]
# mark position
# ax.annotate(f"max profit={period_grid[y, x]:.0f}@({best_slow_period}, {best_fast_period}) days", 
#             (best_slow_period+5,best_fast_period+1)
#            )
ax.axvline(best_slow_period, 0, 1, c='k', ls='--')
ax.axhline(best_fast_period+0.5, 0, 1, c='k', ls='--')

# add labels
ax.set_aspect(5)
pl.setp(ax,
        xlim=(xmin,xmax),
        ylim=(ymin,ymax),
        xlabel='slow period (days)',
        ylabel='fast period (days)',
        title='JFC w/ SMAC',
       );
print(f"max profit={period_grid[y, x]:.0f} @ ({best_slow_period},{best_fast_period}) days")
max profit=7042 @ (105,3) days

From the plot above, there are only a few period combinations which we can guarantee non-negative net profit using SMAC strategy. The best result is achieved with (105,30) for period_slow and period_fast, respectively.

In fact SMAC strategy is so bad such that there is only 9% chance it will yield profit when using any random period combinations in our grid, which is smaller than the 12% chance it will yield break even at least.

percent_positive_profit=(period_grid>0).sum()/np.product(period_grid.shape)*100
percent_positive_profit
9.005847953216374
percent_breakeven=(period_grid==0).sum()/np.product(period_grid.shape)*100
percent_breakeven
12.397660818713451

Anyway, let's check the results of backtest using the best_fast_period and best_slow_period.

results = backtest('smac', 
                   dcv_data, 
                   fast_period=best_fast_period, 
                   slow_period=best_slow_period, 
                   verbose=True, 
                   plot=True
                  )
net_profit = results.final_value.values[0]-init_cash
net_profit
Starting Portfolio Value: 100000.00
===Global level arguments===
init_cash : 100000
buy_prop : 1
sell_prop : 1
===Strategy level arguments===
fast_period : 3
slow_period : 105
2018-08-22, BUY CREATE, 286.00
2018-08-22, Cash: 100000.0
2018-08-22, Price: 286.0
2018-08-22, Buy prop size: 346
2018-08-22, Afforded size: 346
2018-08-22, Final size: 346
2018-08-23, BUY EXECUTED, Price: 286.00, Cost: 98956.00, Comm 742.17
2018-09-12, SELL CREATE, 277.00
2018-09-13, SELL EXECUTED, Price: 277.00, Cost: 98956.00, Comm 718.81
2018-09-13, OPERATION PROFIT, GROSS -3114.00, NET -4574.98
2018-10-23, BUY CREATE, 268.00
2018-10-23, Cash: 95425.015
2018-10-23, Price: 268.0
2018-10-23, Buy prop size: 353
2018-10-23, Afforded size: 353
2018-10-23, Final size: 353
2018-10-24, BUY EXECUTED, Price: 268.00, Cost: 94604.00, Comm 709.53
2018-10-25, SELL CREATE, 270.00
2018-10-26, SELL EXECUTED, Price: 270.00, Cost: 94604.00, Comm 714.83
2018-10-26, OPERATION PROFIT, GROSS 706.00, NET -718.36
2018-10-30, BUY CREATE, 264.00
2018-10-30, Cash: 94706.66
2018-10-30, Price: 264.0
2018-10-30, Buy prop size: 355
2018-10-30, Afforded size: 355
2018-10-30, Final size: 355
2018-10-31, BUY EXECUTED, Price: 264.00, Cost: 93720.00, Comm 702.90
2019-04-17, SELL CREATE, 303.00
2019-04-22, SELL EXECUTED, Price: 303.00, Cost: 93720.00, Comm 806.74
2019-04-22, OPERATION PROFIT, GROSS 13845.00, NET 12335.36
Final PnL: 7042.02
==================================================
**************************************************
--------------------------------------------------
{'init_cash': 100000, 'buy_prop': 1, 'sell_prop': 1, 'execution_type': 'close', 'fast_period': 3, 'slow_period': 105}
OrderedDict([('rtot', 0.06805130501900258), ('ravg', 0.00012065834223227409), ('rnorm', 0.03087288265827186), ('rnorm100', 3.087288265827186)])
OrderedDict([('sharperatio', 0.7850452330792583)])
Time used (seconds): 0.11643362045288086
Optimal parameters: {'init_cash': 100000, 'buy_prop': 1, 'sell_prop': 1, 'execution_type': 'close', 'fast_period': 3, 'slow_period': 105}
Optimal metrics: {'rtot': 0.06805130501900258, 'ravg': 0.00012065834223227409, 'rnorm': 0.03087288265827186, 'rnorm100': 3.087288265827186, 'sharperatio': 0.7850452330792583, 'pnl': 7042.02, 'final_value': 107042.0225}
7042.022500000006

There are only 6 cross-over events of which only the latest transaction yielded positive gains resulting to a 7% net profit. Is 7% profit over a ~two-year baseline better than the market benchmark?

Built-in grid search in fastquant with only 3 lines of code!

The good news is backtest provides a built-in grid search if strategy parameters are lists. Let's re-run backtest with a grid we used above.

Note that "3 lines" here refers to the fastquant import, the data pull (to generate dcv_data), and actually running the backtest function.

from fastquant import backtest

start_time = time()
results = backtest("smac", 
                   dcv_data, 
                   fast_period=fast_periods, 
                   slow_period=slow_periods, 
                   verbose=False, 
                   plot=False
                  )
end_time = time()
time_optimized = end_time-start_time
print("Optimized grid search took {:.1f} sec".format(time_optimized))
Optimized grid search took 95.6 sec

results is automatically ranked based on rnorm which is a proxy for performance. In this case, the best fast_period,slow_period=(8,200) d.

The returned parameters are should have len(fast_periods)xlen(slow_periods) (19x45=855 in this case).

results.shape
(855, 13)
results.head()
init_cash buy_prop sell_prop execution_type fast_period slow_period rtot ravg rnorm rnorm100 sharperatio pnl final_value
0 100000 1 1 close 3 105 0.068051 0.000121 0.030873 3.087288 0.785045 7042.02 107042.0225
1 100000 1 1 close 8 205 0.037052 0.000066 0.016693 1.669276 0.085703 3774.66 103774.6555
2 100000 1 1 close 11 170 0.037052 0.000066 0.016693 1.669276 0.085703 3774.66 103774.6555
3 100000 1 1 close 8 200 0.037052 0.000066 0.016693 1.669276 0.085703 3774.66 103774.6555
4 100000 1 1 close 9 95 0.033320 0.000059 0.014999 1.499897 0.092115 3388.12 103388.1175

Now, we recreate the 2D matrix before, but this time using scatter plot.

fig, ax = pl.subplots(1,1, figsize=(8,4))

#make a diverging color map such that profit<0 is red and blue otherwise
cmap = pl.get_cmap('RdBu')
norm = mcolors.DivergingNorm(vmin=period_grid.min(), 
                             vmax = period_grid.max(), 
                             vcenter=0
                            )
#plot scatter
results['net_profit'] = results['final_value']-results['init_cash']
df = results[['slow_period','fast_period','net_profit']]
ax2 = df.plot.scatter(x='slow_period', y='fast_period', c='net_profit',
                     norm=norm, cmap=cmap, ax=ax
                    )
ymin,ymax = df.fast_period.min(), df.fast_period.max()
xmin,xmax = df.slow_period.min(), df.slow_period.max()

# best performance (instead of highest profit)
best_fast_period, best_slow_period, net_profit = df.loc[0,['fast_period','slow_period','net_profit']]
# mark position
# ax.annotate(f"max profit={net_profit:.0f}@({best_slow_period}, {best_fast_period}) days", 
#             (best_slow_period-100,best_fast_period+1), color='r'
#            )
ax.axvline(best_slow_period, 0, 1, c='r', ls='--')
ax.axhline(best_fast_period+0.5, 0, 1, c='r', ls='--')

ax.set_aspect(5)
pl.setp(ax,
        xlim=(xmin,xmax),
        ylim=(ymin,ymax),
        xlabel='slow period (days)',
        ylabel='fast period (days)',
        title='JFC w/ SMAC',
       );

# fig.colorbar(ax2, orientation="horizontal", shrink=0.9, label='net profit')
print(f"max profit={net_profit:.0f} @ ({best_slow_period},{best_fast_period}) days")
max profit=7042 @ (105.0,3.0) days

Note also that built-in grid search in backtest is optimized and slightly faster than the basic loop-based grid search.

#time
time_basic/time_optimized
1.0497938879462785

Final notes

While it is tempting to do a grid search over larger search space and finer resolutions, it is computationally expensive, inefficient, and prone to overfitting. There are better methods than brute force grid search which we will tackle in the next example.

As an exercise, it is good to try the following:

  • Use different trading strategies and compare their results
  • Use a longer data baseline