Testing a Strategy

The purpose of testing is to determine a strategy's profit expectancy in live trading. It is not possible to calculate this with some algorithm or formula; the only way to find out is to actually trade the strategy for a couple of years. A proxy for this is the backtest: simulating the strategy with a couple of years of historical price data. The problem is that a simple backtest will normally not produce a result that is representative of live trading. Aside from overfitting and other sorts of bias, backtest results differ from live trading in various ways (see backtest realism) that should be taken into account. Thus, testing a strategy is surprisingly complex.

Testing a script with Zorro

To quickly test a strategy, just select the script and click [Test]. Depending on the strategy, a test can finish in a second, or it can run for several minutes on complex portfolio strategies. If the test requires strategy parameters, capital allocation factors, or machine-learned trade rules, they must be generated in a [Train] run beforehand. Otherwise Zorro will complain with an error message that the parameters or factors file or the rules code was not found.

If the script needs to be recompiled, or if it contains any global or static variables that must be reset, click [Edit] before [Test]. This also resets the sliders to their default values. Otherwise the static and global variables and the sliders keep their settings from the previous test run.

If the test needs historical price data that is not available, a dialog will pop up and offer to download the missing data set from the broker. Downloaded price data is stored in the History folder in the way described under Price History. If the broker does not offer price data for all years in the test period, substitute the missing years manually with files of length 0 to prevent the download dialog. No trades then take place during those years.
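Creating the zero-length substitute files can be automated. The following is a minimal sketch in plain C, not part of Zorro; the function name is hypothetical, and the path and file name to pass in must follow the naming scheme described under Price History, which is not reproduced here. The function refuses to touch a file that already exists, so real price data is never truncated by accident.

```c
#include <stdio.h>

/* Create a zero-length placeholder file at 'path'.
   Returns 1 if a new empty file was created, 0 if the file already
   exists (it is left untouched) or could not be created. */
int create_empty_placeholder(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f) {                 /* never truncate existing price data */
        fclose(f);
        return 0;
    }
    f = fopen(path, "wb");   /* "wb" creates the file with length 0 */
    if (!f)
        return 0;
    fclose(f);
    return 1;
}
```

Call it once per missing year with the file name that the test would otherwise try to download.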

All test results - performance report, log file, trade list, chart - are stored in the Log folder. If the folder is cluttered with too many old log files and chart images, every 60 days a dialog will pop up and propose to automatically delete files that are older than 30 days. The minimum number of days can be set up in the Zorro.ini configuration file.

The test simulates a real broker account with a given leverage, spread, rollover, commission, and other asset parameters. By default, a microlot account with 100:1 leverage is simulated. If your account is very different, get your actual account data from the broker API as described under Download, or enter it manually. Simulated slippage, spread, rollover, commission, or pip cost can alternatively be set up in the script. If the NFA flag is set, either directly in the script or through the NFA parameter of the selected account, the simulation runs in NFA mode.

The test is not a real historical simulation. Rather, it simulates trades as if they were entered today, but with a historical price curve. For a real historical simulation, the spread, pip costs, rollovers and other asset and account parameters would have to be changed during the simulation according to their historical values. This can indeed be done by script, but is normally not recommended for strategy testing because it would add artifacts to the results.

The test runs through one or many sample cycles, either with the full historical price data set, or with out-of-sample subsets. It generates a number of equity curves that are then used for a Monte Carlo Analysis of the strategy. The optional Walk Forward Analysis applies different parameter sets so that the test always stays out-of-sample. Several switches (see Mode) affect test and data selection. During the test, the progress bar indicates the current position within the test period. The lengths of its green and red areas display the current gross win and gross loss of the strategy. The result window below shows some information about the current profit situation, in the following form:

 3:   +8192 +256  214/299  

3: Current oversampling cycle (if any).
+8192 Current balance.
+256 Current value of open trades.
214 Number of winning trades so far.
/299 Number of losing trades so far.
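If the progress line is copied out for further evaluation, its fields can be split with a single sscanf call. This is a sketch in plain C, assuming the line has exactly the layout shown above, including the cycle prefix; the struct and function names are made up for illustration. A line without the cycle prefix will not match this pattern.

```c
#include <stdio.h>

/* Fields of one result-window line, e.g. " 3:   +8192 +256  214/299". */
typedef struct {
    int  cycle;       /* current oversampling cycle */
    long balance;     /* current balance */
    long open_value;  /* current value of open trades */
    int  wins;        /* winning trades so far */
    int  losses;      /* losing trades so far */
} TestProgress;

/* Parse a progress line; returns 1 on success, 0 if the format does not match. */
int parse_progress(const char *line, TestProgress *p)
{
    return sscanf(line, " %d: %ld %ld %d/%d",
                  &p->cycle, &p->balance, &p->open_value,
                  &p->wins, &p->losses) == 5;
}

/* Share of winning trades in percent; 0 if no trade has been counted yet. */
double progress_win_percent(const TestProgress *p)
{
    int total = p->wins + p->losses;
    return total > 0 ? 100.0 * p->wins / total : 0.0;
}
```

For the sample line, 214 winners out of 513 trades give a win rate of roughly 41.7%.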

After the test, a performance report and - if the LOGFILE flag was set - a log file are generated in the Log folder. The result window displays the annual return (if any) in percent of the required capital, and the annual gain/loss in pips. Note that due to different pip costs of the assets, a portfolio strategy can end with a negative pip value even when the annual return is positive, or vice versa.
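The sign mismatch between pips and return is plain arithmetic. The sketch below, in plain C with hypothetical names, sums a portfolio once in pips and once in account currency. It assumes one constant pip cost per asset, which simplifies the real simulation.

```c
#include <stddef.h>

/* Net result of one asset in a portfolio test. */
typedef struct {
    double pips;      /* net gain/loss in pips */
    double pip_cost;  /* account-currency value of one pip (assumed constant) */
} AssetResult;

/* Sum of all pip results, ignoring what a pip is worth. */
double total_pips(const AssetResult *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i].pips;
    return sum;
}

/* Sum of all results converted to account currency. */
double total_profit(const AssetResult *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i].pips * a[i].pip_cost;
    return sum;
}
```

With one asset losing 100 pips at a pip cost of 0.01 and another winning 50 pips at a pip cost of 0.10, the portfolio ends at -50 pips but with a profit of about +4 in account currency.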

After the test, the info window displays the annual return in percent and in pips, and the message window contains a short version of the performance report. The content of the message window can be copied to the clipboard by double clicking on it. The following performance figures are displayed (for details see performance report):

Median AR   Annual return by Monte Carlo analysis at 50% confidence level.
Profit      Total profit of the system in units of the account currency.
MI          Average monthly income of the system in units of the account currency.
DD          Maximum balance-equity drawdown.
Capital     Required initial capital.
Trades      Number of trades in the backtest period.
Win         Percentage of winning trades.
Avg         Average profit/loss of a trade in pips.
Bars        Average number of bars of a trade. Fewer bars mean less exposure to risk.
PF          Profit factor, gross win divided by gross loss.
SR          Sharpe ratio. Should be > 1 for good strategies.
UI          Ulcer index, the average drawdown percentage. Should be < 10% for ulcer prevention.
R2          Determination coefficient, the equity curve linearity. Should be close to 1 for good strategies.
AR          Annual return of the simulation, for non-reinvesting systems.
CAGR        Compound annual growth rate, for reinvesting systems.
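Three of these figures (PF, Win, Avg) follow directly from the list of trade results. The sketch below is plain C written for illustration, not Zorro's own implementation, and the function names are hypothetical. Trades with a result of exactly zero are counted as losing trades here, and a profit factor without any losses is reported as -1. Both are choices of this sketch.

```c
#include <stddef.h>

/* Profit factor: gross win divided by gross loss.
   Returns -1 when there is no loss (ratio undefined). */
double profit_factor(const double *results, size_t n)
{
    double gross_win = 0.0, gross_loss = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (results[i] > 0.0)
            gross_win += results[i];
        else
            gross_loss -= results[i];   /* accumulate losses as a positive sum */
    }
    return gross_loss > 0.0 ? gross_win / gross_loss : -1.0;
}

/* Percentage of trades with a positive result; 0 for an empty list. */
double win_rate_percent(const double *results, size_t n)
{
    size_t wins = 0;
    for (size_t i = 0; i < n; i++)
        if (results[i] > 0.0)
            wins++;
    return n > 0 ? 100.0 * (double)wins / (double)n : 0.0;
}

/* Average profit/loss per trade, in the unit of the input; 0 for an empty list. */
double average_result(const double *results, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += results[i];
    return n > 0 ? sum / (double)n : 0.0;
}
```

For the five results +30, -10, +20, -15, +25 this gives a profit factor of 3, a win rate of 60%, and an average of +10 per trade.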

If [Result] is clicked after the test, the performance sheet and the trades & equity chart are displayed (see performance).

Single step mode

For debugging the trade behavior, a test can be run in single step mode by setting the STEPWISE flag. In this mode execution pauses after every bar, and the buttons change their behavior. [Step] moves one bar forward, [Skip] moves to the next bar at which a trade opens or closes. An HTML browser window will pop up and display the current chart and open trade status on every step. The window is refreshed every two seconds.

Single Step Debugging

The stepwise change of variables and indicators can be made visible either in the message window with a watch statement, in the browser window with print(TO_HTML, ...), or on the chart with plot. Setting PlotBars to a negative value displays only the last part of the chart; for instance, -300 displays the last 300 bars. For debugging single loops or function calls, watch ("!...", ...) statements can be used.

Stepwise debugging normally begins at the end of the LookBack period. To begin at a certain date or bar number instead, set the STEPWISE flag dependent on a condition, for instance if(date() >= 20150401) set(STEPWISE);.

Backtest result files

When the LOGFILE flag is set, a backtest generates the following files for examining the trade behavior or for further evaluation:

The file formats are described under Export. The optional asset name is appended when the script does not call asset, but uses the asset selected with the scrollbox. When LogNumber is set, the number is appended to the script name, which allows comparing logs from the current and previous backtests. When [Result] is clicked, the chart image and the plot.csv are generated again for the asset selected from the scrollbox. The chart is opened in the chart viewer and the log and performance report are opened in the editor.

Backtest accuracy - TICKS, M1, T1, HFT..

The required price history resolution, and whether to use TICKS mode, depend on the strategy to be tested. As a rule of thumb, the duration of the shortest trade should cover at least 20 prices in the historical data. For long-term systems, such as options trading or portfolio rebalancing, daily data as provided by Google™ or Quandl™ is normally sufficient. Day trading strategies need historical data with 1-minute resolution (M1 data, .t6 files) that is freely available from many brokers. If entry or exit limits are used and the trades have a duration of only a few bars, TICKS mode is mandatory. Scalping strategies that open or close trades in minutes require high resolution price history - normally T1 data in .t1 files. For backtesting high frequency trading systems that must react in microseconds, you'll need order book or BBO (Best Bid and Offer) price quotes with exchange time stamps, available from data vendors in CSV or NxCore tape format. An example of how to test an HFT system can be found on Financial Hacker. For testing with T1, BBO, or NxCore data, Zorro S is required.
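The 20-prices rule of thumb can be turned into a quick check before choosing a data set. This is a sketch in plain C with hypothetical names. It treats the data as evenly spaced prices, which holds for M1 or daily data but only roughly for T1 quotes.

```c
/* Rule of thumb from the text: the shortest trade should cover
   at least this many prices in the historical data. */
#define MIN_PRICES_PER_TRADE 20

/* Coarsest acceptable spacing between two prices, in seconds. */
double coarsest_price_period(double shortest_trade_seconds)
{
    return shortest_trade_seconds / MIN_PRICES_PER_TRADE;
}

/* Returns 1 if data with the given price spacing satisfies the rule, else 0. */
int resolution_sufficient(double shortest_trade_seconds, double price_period_seconds)
{
    return price_period_seconds <= coarsest_price_period(shortest_trade_seconds);
}
```

A strategy whose shortest trade lasts 30 minutes tolerates a price spacing of up to 90 seconds, so M1 data passes. With a shortest trade of 10 minutes the limit drops to 30 seconds, and M1 data no longer satisfies the rule.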

The .t6 and .t1 file formats are described in the Price History chapter. In T1 data, any tick represents a price quote. In M1 data, a tick represents one minute. Since the price tick resolution of .t1 and .t6 files is not fixed, .t1 files can theoretically contain M1 data and .t6 files can contain T1 data - but normally it's the other way around. Often many price quotes arrive in a single second, therefore T1 data contains a lot more price ticks than M1 data. Using T1 data can have a strong effect on the backtest realism, especially when the strategy uses short time frames or small price differences. Trade management functions (TMFs) are called more often, the tick function is executed more often, and trade entry and exit conditions are simulated with higher precision.

To use T1 historical price data, set the History string in the script to the ending of the T1 files (usually History = ".t1"). Zorro will then load its price history from .t1 files. Make sure that you have downloaded the required files beforehand, either with the Download script, or from the Zorro download page. Special streaming data formats such as NxCore can be read directly by script with the priceQuote function. An example can be found under Fill mode.

A backtest with T1 data in TICKS mode takes a long time and needs a lot of memory. Due to the high memory requirement, no more than one or two years can be backtested with T1 data. Even with non-scalping strategies, T1 backtests can produce different results than M1 backtests. Differences arise from the different composition of M1 ticks on the broker's price server, and from different trade entry and exit prices. For instance, a trade at 15:00:01 would be entered at the first price quote after 15:00:01 in T1 mode, but at the close of the 15:00:00 M1 tick (or the open of the 15:01:00 tick) in M1 mode. Therefore M1 entry/exit prices can be off by up to one minute compared to T1 prices. If this difference is relevant for your strategy, test it with T1 data. The TickFix variable can shift M1 ticks forward or backward in time, which allows testing the effect of such a shift on the result.

Backtest realism

Zorro backtests are as close to real trading as possible, especially when the TICKS flag is set, which causes trade functions and stop, profit, or entry limits to be evaluated at every price tick in the historical data. Theoretically a backtest should generate precisely the same trades and return the same profit or loss as live trading the script during the same time period. This is largely the case, but the following effects can cause differences:

The likelihood that the strategy exploits real inefficiencies depends on how it was developed and optimized. Many factors can cause bias in the test result:

Curve fitting bias affects all strategies that use the same price data set for test and training. It generates far too optimistic results and the illusion that a strategy is profitable when it isn't.
Peeking bias is caused by letting knowledge of the future affect the trade algorithm. An example is calculating trade volume with capital allocation factors (OptimalF) that are generated from the whole test data set (always test at first with fixed lot sizes!).
Data mining bias (or selection bias) is caused not only by data mining, but already by the mere act of developing a strategy with historical price data, since you will be selecting the most profitable algorithm or asset dependent on test results.
Trend bias affects all 'asymmetric' strategies that use different algorithms, parameters, or capital allocation factors for long and short trades. To prevent this, detrend the trade signals or the trade results.
Granularity bias is a consequence of different price data resolution in test and in real trading. To reduce it, use the TICKS flag, especially when a trade management function is used.
Sample size bias is the effect of the test period length on the results. Values derived from maxima and minima - such as drawdown - are usually proportional to the square root of the number of trades. This generates more pessimistic results on large test periods.
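The square-root relationship stated for sample size bias can be written as a scaling rule. This is an idealized illustration that assumes the trades of both tests behave statistically alike; the symbols $DD$ (maximum drawdown) and $N_1$, $N_2$ (numbers of trades in a shorter and a longer test) are introduced here only for the example.

```latex
DD(N_2) \approx DD(N_1)\,\sqrt{\frac{N_2}{N_1}}
\qquad\text{e.g.}\qquad
DD(400) \approx DD(100)\,\sqrt{\frac{400}{100}} = 2\,DD(100)
```

Under this assumption, a test covering four times as many trades is expected to show about twice the maximum drawdown of the shorter test, even though the strategy itself is unchanged.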

See also:

training, trading, mode, performance, troubleshooting

 
