June 2009

The “2 Trades a Day” trading system is a commercially available system for day-trading futures on the Dow Jones Industrial Average. The vendor provides some characterization of the system’s profits but little or no characterization of its risks. This analysis uses a statistical bootstrap to fill that gap, focusing mostly on maximum drawdown. Both a simple bootstrap and a time-series bootstrap are used. The principal finding is a 95% confidence interval of roughly $575 to $1,800 for the true maximum drawdown when trading one futures contract.

There are numerous vendors of securities trading systems which purport to be the gateway to riches. Supposedly, a customer can purchase the system, follow its buy and sell signals, and become fabulously wealthy without working. The vast majority of these vendors, however, offer little more than anecdotal evidence of the system’s efficacy.

One notable exception is Jason Hale who sells the “2 Trades a Day” system through his web site. The site contains an enumeration of every trade from July 2008 to the present, including the profit and loss (P&L) for each trade. This creates the possibility of analysis of his results.

The system is profitable, according to his records, and the web site proudly trumpets that claim. But there is essentially no characterization of the risk incurred, which is unfortunate since no wise customer would purchase the system without knowing the risk. The only information provided by Mr. Hale is that he recommends maintaining $12,000 in trading capital for every one contract traded.

A *trading system* is an algorithm for buying and
selling securities. It can be executed by humans or machines, but the point is
to be profitable in a mechanical way. Because the system is mechanical, it
should have predictable results (in the statistical sense), unlike
judgment-based systems which could produce erratic results.

There are two key metrics for any mechanical trading system: the expected profit, and the expected risk. Obviously, we prefer more profit and less risk. We cannot predict either metric with certainty, but we can use past results as a guide to expected future results, especially for a mechanical system.

There are several simple, effective ways to measure
profitability. Measuring risk is trickier. Theoreticians use *variance of
returns* as their metric, which is pleasant because it is familiar and leads
to closed-form solutions for some problems. Variance is not especially useful
for practitioners, however. They must be concerned with worst-case scenarios.
Variance is useful for worst-case analysis only if the exact distribution of
returns is known – which it is not.

One useful risk measure is *maximum drawdown*. Simple *drawdown*
is the peak-to-valley distance in the trader’s account balance. If an account
falls from $1,000,000 to $900,000, the trader is experiencing a drawdown of
$100,000 or 10%. *Maximum drawdown* is the largest drawdown experienced
by the trader over some period of time.
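To make the definition concrete, here is a minimal sketch (in Python, though the analysis itself uses R) that computes the drawdown series and the maximum drawdown from a sequence of account balances; the balance figures are illustrative only:

```python
def drawdowns(balances):
    """Drawdown at each point: distance below the running peak balance."""
    peak = float("-inf")
    result = []
    for b in balances:
        peak = max(peak, b)        # highest balance seen so far
        result.append(peak - b)    # current distance below that peak
    return result

# Illustrative balances: a $1,050,000 peak followed by a fall to $900,000
balances = [1_000_000, 1_050_000, 900_000, 950_000]
max_dd = max(drawdowns(balances))  # maximum drawdown: 150_000
```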

The key advantage of knowing one’s maximum drawdown is to
avoid *gambler’s ruin*, the exhaustion of one’s trading capital. If we
have a reasonable expectation for maximum drawdown, we can correctly anticipate
our capital requirements and avoid running out of money. For example, suppose
a futures trader knows these facts about his or her trading system and
brokerage account:

| Expected drawdown    | $10,000 |
| Margin requirement   | $6,000  |
| Minimum account size | $2,000  |
| Total                | $18,000 |

Mathematically speaking, the absolute minimum capital required is $18,000. Below that level of capital, the trader might be forced to stop trading during a severe drawdown because he or she could not meet either the margin requirement or the minimum account size.

(Here, I am speaking only mathematically. There is a critical
psychological consideration, too. An experienced trader would begin with more
than $18,000, knowing that reaching the edge of capital exhaustion can be
mentally crippling. The size of the needed cushion depends upon the trader’s
level of *risk aversion*.)
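The capital arithmetic can be checked in a few lines; a sketch in Python using the figures from the table above:

```python
expected_drawdown  = 10_000  # worst expected peak-to-valley loss
margin_requirement =  6_000  # exchange performance bond per contract
minimum_account    =  2_000  # broker's minimum account size

# Mathematical floor; any psychological cushion goes on top of this
minimum_capital = expected_drawdown + margin_requirement + minimum_account
```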

Knowing one’s expected maximum drawdown may be quite useful, but it is difficult to characterize because its distribution is unknown. This paper will construct a reasonable confidence interval for the maximum drawdown of the “2 Trades A Day” system, using a non-parametric bootstrap.

The data is available from
the vendor’s web site, http://2tradesaday.com,
by clicking on the *Results* tab.

The data is a series of observations on three variables, one observation for each trade recommended by the system.

- Date
- Delta: The trade’s profit or loss (P&L) – that is, the change in the account balance – expressed as the *number of futures points* generated by the trade, positive for a profitable trade and negative for an unprofitable trade.
- Hypothetical: A flag for those results which are hypothetical, not real.

The vast majority of the observations are for real trades. On some days, however, the vendor was unable to perform actual trading, so hypothetical results are reported instead.

The data spans July 1, 2008, through May 29, 2009 (approximately 230 trading days), giving a total of 338 observations. Of those, 29 (8.6%) are hypothetical results.

There is typically one trade per calendar day, but some days have zero, two, or even three trades (despite the system’s name).

I scaled the deltas (P&L numbers) by the *dollars per
futures point* ($5), so that the deltas are expressed in dollars, not points,
which I find more natural. All numbers reported below are in dollars.

I also adjusted the deltas for *frictional costs*: fees
and slippage associated with executing futures trades. They are

- Brokerage fees – I assumed a brokerage fee of $4.26 per round-trip execution, which is the fee charged by Interactive Brokers, a large discount brokerage.
- Bid/ask spread – Since the Dow futures are quite liquid, I assumed the difference between bid and ask prices was only 1 point, or $5.
- Execution slippage – I assumed one cannot always obtain favorable trade execution, so I deducted one bid/ask spread to account for likely slippage.

The net adjustment was $4.26 + $5 + $5 = $14.26, deducted from each observation.
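The point-to-dollar scaling and friction adjustment can be sketched as follows (Python, for illustration; the constants mirror the assumptions above):

```python
DOLLARS_PER_POINT = 5.00   # Dow futures dollars per point
ROUND_TRIP_FEE    = 4.26   # assumed round-trip brokerage fee
BID_ASK_SPREAD    = 5.00   # assumed 1-point bid/ask spread, in dollars
SLIPPAGE          = BID_ASK_SPREAD  # one full spread assumed lost to slippage

def adjusted_delta(points):
    """Convert a trade's P&L in futures points to friction-adjusted dollars."""
    gross = points * DOLLARS_PER_POINT
    return gross - (ROUND_TRIP_FEE + BID_ASK_SPREAD + SLIPPAGE)

# A 20-point winner grosses $100 and nets $85.74 after the $14.26 adjustment
```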

All results reported below are for the *adjusted*
deltas, not the original P&L numbers.

The deltas have a mean of 55.22, with a 95% confidence interval of (36.64, 73.80). Their standard deviation is 173.7, with this quartile summary:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-454.30   -9.26   65.74   55.22  135.70 1111.00

The following statistics are of interest to a trader.

Prob. of winner:   0.6331361
Mean winner:       147.0250
Median winner:     110.74
Mean loser:       -103.2116
Median loser:      -96.76

On paper, the system looks great. Each trade has a 63% probability of being profitable, and the average profit is comfortably larger than the average loss. This is good news but, again, we are interested in characterizing the risk rather than the potential profits.

The distribution of deltas is mound-shaped, but uneven.

The time-series plot of the deltas shows that the median is clearly above zero, but the large values bunch on the left, early in the sample.

A valuable graphic is the *equity graph*, which shows the
cumulative deltas, or the account balance over time.

An important observation is that the equity graph is
relatively steep on the left and relatively flat on the right. This is
confirmed by a *t* test of the first-half deltas versus the second-half
deltas.

        Welch Two Sample t-test

data:  deltas[1:mid] and deltas[(mid + 1):nDeltas]
t = 3.4248, df = 243.908, p-value = 0.0007217
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  27.06263 100.33382
sample estimates:
mean of x mean of y
 87.07136  23.37314

The confidence interval for their difference, (27.06, 100.33), excludes zero, so the two time periods may well have different distributions.

Checking the Dow Jones Industrial Average itself shows the index was substantially less volatile during the second-half time period than the first-half period. That difference alone could explain the shift in the trading system results, since trading systems such as this depend on market volatility to be profitable.

This graph illustrates the observed drawdown in the original data.

The observed *maximum* drawdown is $1,012.78, which
occurred in November 2008.

Because the deltas are an (irregular) time series, we need to check for a possible dependence structure. There is something odd in the ACF. The ACF for the entire series shows some mild autocorrelations at lags 1 and 4.

The ACF diagrams for the first- and second-half data, however, show no significant autocorrelation.
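For reference, the lag-k sample autocorrelations plotted in an ACF diagram can be computed directly; a minimal sketch in Python (the analysis itself uses R’s acf function):

```python
def acf(x, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n   # lag-0 autocovariance
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(1, max_lag + 1)]

# A rough 95% significance band for white noise is +/- 1.96 / sqrt(n)
```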

Given this ambiguous evidence, I performed two bootstraps: a simple bootstrap assuming no dependence structure; and a time-series bootstrap with fixed-block resampling.

I began with a basic bootstrap of maximum drawdown, using *R*
= 10,000 and α = 0.05. This yielded these bootstrap statistics

Bootstrap Statistics :
    original      bias    std. error
t1*  1012.78   -24.82571    292.8265

and these typical confidence interval results.

Intervals :
Level       Basic          Percentile
95%   ( 306, 1449 )   ( 576, 1791 )

The discrepancy between *basic* and *percentile*
confidence intervals led me to examine the histogram of the bootstrap
replicates.

The skew could explain the discrepancy, so I repeated the
bootstrap but on the *logarithm* of the maximum drawdown, which yielded
these bootstrap results.

Bootstrap Statistics :
    original         bias    std. error
t1*  6.920454  -0.06872079    0.2760459

Intervals :
Level       Normal             Basic            Percentile
95%   ( 6.448,  7.530 )  ( 6.412,  7.483 )  ( 6.358,  7.429 )

Notice that the intervals are in better agreement, even with the normal approximation. Translating the normal interval back to the original scale gives a confidence interval of (631.6, 1863.5), which resembles the original (non-logarithm) percentile interval but is slightly larger.

Based on this confirmation of the original percentile interval, and based on my sense that the percentile interval better reflects practical considerations, I focused on percentile intervals for the remainder of this analysis.
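For concreteness, the percentile bootstrap of maximum drawdown can be sketched as follows (Python; the paper’s actual computation uses R’s boot package, and any data fed to these functions here would be placeholders):

```python
import random

def max_drawdown(deltas):
    """Maximum peak-to-valley fall of the cumulative P&L."""
    balance = peak = worst = 0.0
    for d in deltas:
        balance += d
        peak = max(peak, balance)        # running peak of the equity curve
        worst = max(worst, peak - balance)
    return worst

def percentile_ci(data, stat, R=10_000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement R times, computing the statistic each time
    reps = sorted(stat([data[rng.randrange(n)] for _ in range(n)])
                  for _ in range(R))
    return reps[int(alpha / 2 * R)], reps[int((1 - alpha / 2) * R) - 1]
```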

I also performed a time-series bootstrap, due to the ambiguous evidence regarding autocorrelation within the time series of deltas. This was a fixed-length block resample bootstrap with a block size of 5 days (1 week). I chose this block size due to the presence of a small but significant autocorrelation at lag 4.

The time-series bootstrap yielded these results.

Bootstrap Statistics :
    original     bias    std. error
t1*  1012.78  42.91658    311.7965

Intervals :
Level     Percentile
95%   ( 587, 1799 )

The reported confidence interval is (587, 1799), which is nearly identical to the simple (non-time-series) result of (576, 1791).

A random-length block resampling bootstrap produced very similar results, using a geometric distribution with a mean of 5 days (1 week).
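Fixed-block resampling is the only new ingredient relative to the simple bootstrap: instead of drawing observations one at a time, we draw contiguous blocks, preserving short-range dependence. A sketch in Python (the analysis itself uses R’s tsboot):

```python
import random

def fixed_block_resample(series, block_len, rng):
    """Build a resampled series by concatenating random contiguous blocks."""
    n = len(series)
    out = []
    while len(out) < n:
        start = rng.randrange(n - block_len + 1)   # random block start
        out.extend(series[start:start + block_len])
    return out[:n]   # trim to the original length

# Each resampled series is then fed to the max-drawdown statistic, exactly
# as in the simple bootstrap, and percentile points of the replicates taken.
```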

My exploratory data analysis revealed other interesting facts which are not fully documented here. I mention them in passing in case the reader finds them useful.

- Based on the differences between the first-half and second-half data, I performed a weighted bootstrap using a simple linear weighting to give preference to recent observations. The results were essentially identical to the (unweighted) simple bootstrap and the time-series bootstrap.
- I performed a bootstrap of the *risk of ruin* statistic; that is, the probability that a trader will exhaust his or her capital trading this system. The calculated probability was always zero, however, due to the system’s good profitability, so the bootstrap did not produce a useful result.
- I fit the bootstrap replications of maximum drawdown to several distributions, hoping to characterize them parametrically. No distribution gave a satisfactory fit. I considered the log-normal, gamma, χ², and Poisson distributions, so the non-parametric bootstrap remains my only reasonable alternative.
- The fact that the vendor mixed hypothetical results with actual results raises the specter that the hypothetical values are artificially inflated. A *t* test of actual *versus* hypothetical results, however, shows the average hypothetical result is substantially *worse* than the average actual result.

A reasonable confidence interval for the maximum drawdown is ($575, $1800) over the 11 month period, regardless of whether we use a simple bootstrap or a time-series bootstrap.

This estimate is, of course, based only on historical data. A pretty good “rule of thumb” is to *double* the historical risk and trade as if that were one’s expectation. That rule yields a heuristic confidence interval of ($1,150, $3,600) for the maximum drawdown.

The margin (performance bond) dictated by the exchange is $6,500 per contract (as of June 2009). If we add our heuristic upper bound of $3,600 to the margin requirement of $6,500, we get a minimum trading capital of $10,100, so Mr. Hale’s estimated requirement of $12,000 capitalization per contract is reasonable.

The most troubling limitation of this study is the short time-span of the available data. 338 observations may seem sufficient for a statistical study, but experienced customers of trading systems prefer 5 or more years of historical data. Market conditions change. Seeing the system’s performance under multiple conditions increases our confidence in its robustness.

We ignored the peculiar difference between the first-half data versus the second-half data, but that could point to a serious non-stationarity in the data which, in turn, might invalidate our results.

The differences between first-half data and second-half data indicate a need to study the halves separately and characterize the difference. This could yield valuable insights into the evolution of the system’s behavior under changing market conditions.

The time-series bootstrap used fixed-block resampling. The ACF points to a possible ARMA model for the deltas, and the shifting variance points to a possible GARCH model. Constructing those models would let us perform model-based resampling, possibly with improved confidence intervals.

Recall that some days had multiple trades. It is very plausible to me that same-day trades are interdependent, not independent, but I did not study this possibility. If true, the bootstrap simulation should be revised to reflect this dependence structure.

library(boot)

alpha <- 0.05
R <- 10000
blklen <- 5                       # Block length for TS bootstrap

ONE_WAY_FEE <- 2.13               # IB one-way broker's fee
BID_ASK_SPRD <- 5                 # E-mini bid/ask spread
ONE_WAY_SLIP <- BID_ASK_SPRD/2    # Assumed one-way slippage

load.data <- function() {
    tbl <- read.csv("data.csv", header=TRUE)
    tbl$Date <- as.Date(tbl$Date, format="%m/%d/%Y")
    tbl$Hypothetical <- as.logical(tbl$Hypothetical)
    cat("Number of observations:", NROW(tbl), "\n")
    cat("Oldest observation:", format(head(tbl$Date, 1)), "\n")
    cat("Latest observation:", format(tail(tbl$Date, 1)), "\n")
    return(tbl)
}

hale <- load.data()
gross.deltas <- hale$Ticks * 5    # Scale by dollars per futures point
hypo <- hale$Hypothetical         # Boolean: TRUE for hypothetical results

# Adjust gross profit for frictional costs
deltas <- gross.deltas - 2*ONE_WAY_FEE - BID_ASK_SPRD - 2*ONE_WAY_SLIP
nDeltas <- length(deltas)

# Simple drawdown: distance below the running peak of the cumulative P&L
drawdown <- function(v) {
    s <- cumsum(v)
    return(cummax(s) - s)
}

# Maximum drawdown statistic for boot(), which supplies an index vector
max.dd.stat <- function(v, ivec) {
    samp <- v[ivec]
    max(drawdown(samp))
}

# Maximum drawdown statistic for tsboot(), which supplies the resampled series
max.dd <- function(v) max(drawdown(v))

max.dd.boot <- boot(deltas, max.dd.stat, R)
cat("\n***** Bootstrap results for maximum drawdown:\n")
print(max.dd.boot)
print(boot.ci(max.dd.boot, conf=1-alpha, type=c("basic", "perc")))

max.dd.tsboot <- tsboot(deltas, max.dd, R, l=blklen, sim="fixed")
cat("\n***** Time-series bootstrap for 1-week blocks:\n")
print(max.dd.tsboot)
print(boot.ci(max.dd.tsboot, conf=1-alpha, type="perc"))