The “2 Trades a Day” trading system is a commercially available system for day-trading futures on the Dow Jones Industrial Average. The vendor provides some characterization of the system’s profits, but little or no characterization of its risks. This analysis uses a statistical bootstrap to fill that gap, focusing mostly on maximum drawdown. Both a simple bootstrap and a time-series bootstrap are used. The principal finding is that a reasonable confidence interval for the true maximum drawdown is $575 to $1,800 when trading one futures contract.
There are numerous vendors of securities trading systems which purport to be the gateway to riches. Supposedly, a customer can purchase the system, follow its buy and sell signals, and become fabulously wealthy without working. The vast majority of these vendors, however, offer little more than anecdotal evidence of the system’s efficacy.
One notable exception is Jason Hale, who sells the “2 Trades a Day” system through his web site. The site contains an enumeration of every trade from July 2008 to the present, including the profit and loss (P&L) for each trade. This makes it possible to analyze his results.
The system is profitable, according to his records, and the web site proudly trumpets that claim. But there is essentially no characterization of the risk incurred, which is unfortunate since no wise customer would purchase the system without knowing the risk. The only information provided by Mr. Hale is that he recommends maintaining $12,000 in trading capital for every one contract traded.
A trading system is an algorithm for buying and selling securities. It can be executed by humans or machines, but the point is to be profitable in a mechanical way. Because the system is mechanical, it should have predictable results (in the statistical sense), unlike judgment-based systems which could produce erratic results.
There are two key metrics for any mechanical trading system: the expected profit, and the expected risk. Obviously, we prefer more profit and less risk. We cannot predict either metric with certainty, but we can use past results as a guide to expected future results, especially for a mechanical system.
There are several simple, effective ways to measure profitability. Measuring risk is trickier. Theoreticians use variance of returns as their metric, which is pleasant because it is familiar and leads to closed-form solutions for some problems. Variance is not especially useful for practitioners, however. They must be concerned with worst-case scenarios. Variance is useful for worst-case analysis only if the exact distribution of returns is known – which it is not.
One useful risk measure is maximum drawdown. Simple drawdown is the peak-to-valley distance in the trader’s account balance. If an account falls from $1,000,000 to $900,000, the trader is experiencing a drawdown of $100,000 or 10%. Maximum drawdown is the largest drawdown experienced by the trader over some period of time.
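The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis: the function name `max_drawdown`, the `start` parameter, and the sample P&L figures are all assumptions introduced here, and the starting balance is treated as the first peak.

```python
def max_drawdown(deltas, start=0.0):
    """Largest peak-to-valley drop in the running balance built from per-trade P&L."""
    balance = peak = start      # assumption: the starting balance counts as a peak
    worst = 0.0
    for d in deltas:
        balance += d
        peak = max(peak, balance)            # highest balance seen so far
        worst = max(worst, peak - balance)   # deepest drop below that peak
    return worst

# Hypothetical per-trade P&L in dollars (illustrative, not the vendor's data).
example = [100.0, -50.0, -70.0, 200.0, -30.0]
print(max_drawdown(example))                               # 120.0
print(max_drawdown([-100_000.0], start=1_000_000.0))       # 100000.0
```

The second call reproduces the $1,000,000 to $900,000 example from the text.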
The key advantage of knowing one’s maximum drawdown is to avoid gambler’s ruin, the exhaustion of one’s trading capital. If we have a reasonable expectation for maximum drawdown, we can correctly anticipate our capital requirements and avoid running out of money. For example, suppose a futures trader knows these facts about his or her trading system and brokerage account:
Minimum account size
Mathematically speaking, the absolute minimum capital required is $18,000. Below that level of capital, the trader might be forced to stop trading during a severe drawdown because he or she could either not meet the margin requirement or not meet the minimum account size.
(Here, I am speaking only mathematically. There is a critical psychological consideration, too. An experienced trader would begin with more than $18,000, knowing that reaching the edge of capital exhaustion can be mentally crippling. The size of the needed cushion depends upon the trader’s level of risk aversion.)
Knowing one’s expected maximum drawdown may be quite useful, but it is difficult to characterize because its distribution is unknown. This paper will construct a reasonable confidence interval for the maximum drawdown of the “2 Trades a Day” system, using a non-parametric bootstrap.
The data is available from the vendor’s web site, http://2tradesaday.com, by clicking on the Results tab.
The data is a series of observations on three variables, one observation for each trade recommended by the system.
The vast majority of the observations are for real trades. On some days, however, the vendor was unable to perform actual trading, so hypothetical results are reported instead.
The data spans July 1, 2008, through May 29, 2009 (approximately 230 trading days), giving a total of 338 observations. Of those, 29 (8.6%) are hypothetical results.
There is typically one trade per calendar day, but some days have zero, two, or even three trades (despite the system’s name).
I scaled the deltas (P&L numbers) by the dollars per futures point ($5), so that the deltas are expressed in dollars, not points, which I find more natural. All numbers reported below are in dollars.
I also adjusted the deltas for frictional costs: fees and slippage associated with executing futures trades. They are
The net adjustment to each trade was $4.26 + $5 + $5 = $14.26 deducted from each observation.
All results reported below are for the adjusted deltas, not the original P&L numbers.
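The two adjustments described above (scaling points to dollars, then deducting frictional costs) amount to one line of arithmetic per trade. The sketch below assumes the raw P&L is recorded in index points; the names `adjust` and `raw_points` and the sample values are hypothetical.

```python
DOLLARS_PER_POINT = 5.00               # dollars per futures point, from the text
COST_PER_TRADE = 4.26 + 5.00 + 5.00    # the three deductions quoted in the text

def adjust(points):
    """Convert raw P&L in points to dollars net of frictional costs."""
    return [round(p * DOLLARS_PER_POINT - COST_PER_TRADE, 2) for p in points]

raw_points = [16.0, -10.0, 30.0]       # hypothetical trades, in points
print(adjust(raw_points))              # [65.74, -64.26, 135.74]
```

Under this adjustment a 16-point winner nets $65.74, which happens to equal the median delta reported below.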
The deltas have a mean of 55.22, with a 95% confidence interval of (36.64, 73.80). Their standard deviation is 173.7, with this quartile summary:
   Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-454.30    -9.26    65.74    55.22   135.70  1111.00
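As a rough check on the confidence interval for the mean, the sketch below rebuilds it from the reported mean, standard deviation, and sample size. It uses a normal quantile, which is an assumption; the source does not state which quantile was used.

```python
from math import sqrt
from statistics import NormalDist

n, mean_delta, sd_delta = 338, 55.22, 173.7    # figures reported in the text

standard_error = sd_delta / sqrt(n)
z = NormalDist().inv_cdf(0.975)                # normal quantile, about 1.96 (assumption)
ci = (mean_delta - z * standard_error, mean_delta + z * standard_error)

print(f"standard error: {standard_error:.2f}")
print(f"approximate 95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The result is about (36.7, 73.7), marginally narrower than the reported (36.64, 73.80). A gap of that size would be consistent with a Student's t quantile, which is slightly larger than 1.96 at 337 degrees of freedom.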
The following statistics are of interest to a trader.
Prob. of winner: 0.6331361
Mean winner: 147.0250
Median winner: 110.74
Mean loser: -103.2116
Median loser: -96.76
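These statistics are simple to compute from a list of deltas. The sketch below is illustrative only: `trade_stats` and the sample data are hypothetical, and counting a zero-dollar trade as a loser is an assumption. The last lines check that the reported figures are mutually consistent, since the win probability times the mean winner plus the loss probability times the mean loser should reproduce the overall mean.

```python
from statistics import mean, median

def trade_stats(deltas):
    """Win rate and winner/loser summaries for a list of per-trade P&L."""
    winners = [d for d in deltas if d > 0]
    losers = [d for d in deltas if d <= 0]   # assumption: break-even counts as a loss
    return {
        "p_win": len(winners) / len(deltas),
        "mean_winner": mean(winners),
        "median_winner": median(winners),
        "mean_loser": mean(losers),
        "median_loser": median(losers),
    }

stats = trade_stats([100.0, -50.0, 200.0, -30.0])   # hypothetical trades
print(stats)

# Consistency check using the figures reported in the text.
p = 0.6331361
expectancy = p * 147.0250 + (1 - p) * (-103.2116)
print(round(expectancy, 2))   # 55.22, the reported mean delta
```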
On paper, the system looks great. Each trade has a 63% probability of being profitable, and the average profit is comfortably larger than the average loss. This is good news but, again, we are interested in characterizing the risk rather than the potential profits.
The distribution of deltas is mound-shaped, but uneven.
This time-series plot illustrates that the median is clearly above zero, but there is bunching of large values on the left.
A valuable graphic is the equity graph, which shows the cumulative deltas, or the account balance over time.
An important observation is that the equity graph is relatively steep on the left and relatively flat on the right. This is confirmed by a t test of the first-half deltas versus the second-half deltas.
Welch Two Sample t-test
data: deltas[1:mid] and deltas[(mid + 1):nDeltas]
t = 3.4248, df = 243.908, p-value = 0.0007217
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
mean of x mean of y
The 95% confidence interval for the difference in means is (27.06, 100.33), which excludes zero, so the two time periods may have different distributions.
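For readers who want to see the mechanics of the Welch test, here is a minimal sketch of the statistic and its Welch-Satterthwaite degrees of freedom. The function name `welch_t` and the two short samples are hypothetical. The final lines back out the implied standard error from the reported interval and t value.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Welch's t statistic and approximate degrees of freedom (unequal variances)."""
    vx = variance(x) / len(x)    # squared standard error of mean(x)
    vy = variance(y) / len(y)    # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

first_half = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical samples
second_half = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, df = welch_t(first_half, second_half)
print(round(t_stat, 4), round(df, 4))

# Implied by the reported output: interval midpoint = difference in means.
diff = (27.06 + 100.33) / 2    # about 63.70 dollars per trade
std_err = diff / 3.4248        # about 18.60
print(round(diff, 2), round(std_err, 2))
```

On the reported figures, the first half averaged roughly $64 more per trade than the second half.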
Checking the Dow Jones Industrial Average itself shows the index was substantially less volatile during the second-half time period than the first-half period. That difference alone could explain the shift in the trading system results, since trading systems such as this depend on market volatility to be profitable.
This graph illustrates the observed drawdown in the original data.
The observed maximum drawdown is $1,012.78, which occurred in November 2008.
Because the deltas are an (irregular) time series, we need to check for a possible dependence structure. The autocorrelation function (ACF) gives a mixed picture. The ACF for the entire series shows some mild autocorrelations at lags 1 and 4.
The ACF diagrams for the first- and second-half data, however, show no significant autocorrelation.
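A sample ACF and its usual approximate 95% significance band can be sketched as follows. The function `acf` and the alternating test series are hypothetical, and the band uses the common large-sample approximation of 1.96 divided by the square root of n.

```python
from math import sqrt
from statistics import mean

def acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag (biased estimator, as in most ACF plots)."""
    n = len(series)
    m = mean(series)
    c0 = sum((v - m) ** 2 for v in series)
    return [
        sum((series[t] - m) * (series[t + k] - m) for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]

def band(n):
    """Approximate 95% band for the autocorrelations of white noise."""
    return 1.96 / sqrt(n)

alternating = [1.0, -1.0] * 50            # hypothetical, strongly negative lag-1 dependence
r = acf(alternating, 2)
print([round(v, 2) for v in r])           # [1.0, -0.99, 0.98]
print(round(band(338), 3), round(band(169), 3))
```

The band is about 0.107 for all 338 observations but about 0.151 for a half of 169. An autocorrelation that is mildly significant in the full series can therefore fall inside the band in each half, which may be one contributor to the mixed evidence.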
Given this ambiguous evidence, I performed two bootstraps: a simple bootstrap assuming no dependence structure; and a time-series bootstrap with fixed-block resampling.
I began with a basic bootstrap of maximum drawdown, using R = 10,000 replicates and α = 0.05 (95% confidence). This yielded these bootstrap statistics:
Bootstrap Statistics :
original bias std. error
t1* 1012.78 -24.82571 292.8265
and these typical confidence interval results.
Level Basic Percentile
95% ( 306, 1449 ) ( 576, 1791 )
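The simple bootstrap can be sketched as follows: resample the deltas with replacement, recompute the maximum drawdown each time, and read the intervals off the sorted replicates. This is a minimal illustration under stated assumptions, not the code behind the table. The function names, the order-statistic quantile rule, and the synthetic demo data (normal draws loosely matched to the reported mean and standard deviation) are all introduced here.

```python
import random

def max_drawdown(deltas):
    """Largest peak-to-valley drop in the cumulative P&L."""
    balance = peak = worst = 0.0
    for d in deltas:
        balance += d
        peak = max(peak, balance)
        worst = max(worst, peak - balance)
    return worst

def bootstrap_max_drawdown(deltas, reps=10_000, alpha=0.05, seed=0):
    """Return (observed, percentile interval, basic interval) for maximum drawdown."""
    rng = random.Random(seed)
    n = len(deltas)
    # Each replicate: draw n trades with replacement, ignoring their order in time.
    replicates = sorted(max_drawdown(rng.choices(deltas, k=n)) for _ in range(reps))
    lo = replicates[int(reps * alpha / 2)]             # approximate lower quantile
    hi = replicates[int(reps * (1 - alpha / 2)) - 1]   # approximate upper quantile
    observed = max_drawdown(deltas)
    percentile = (lo, hi)
    basic = (2 * observed - hi, 2 * observed - lo)     # quantiles reflected about the observed value
    return observed, percentile, basic

# Synthetic stand-in for the trade deltas (hypothetical, not the vendor's data).
demo_rng = random.Random(42)
demo_deltas = [demo_rng.gauss(55, 174) for _ in range(60)]
observed, percentile, basic = bootstrap_max_drawdown(demo_deltas, reps=2000)
print(observed, percentile, basic)
```

The basic interval reflects the percentile endpoints about the observed value, so the two disagree whenever the replicate distribution is skewed.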
The discrepancy between basic and percentile confidence intervals led me to examine the histogram of the bootstrap replicates.
The skew could explain the discrepancy, so I repeated the bootstrap but on the logarithm of the maximum drawdown, which yielded these bootstrap results.
Bootstrap Statistics :
original bias std. error
t1* 6.920454 -0.06872079 0.2760459
Level Normal Basic Percentile
95% ( 6.448, 7.530 ) ( 6.412, 7.483 ) ( 6.358, 7.429 )
Notice that the intervals are in better agreement, even with the normal approximation. Translating the normal interval back to the original scale gives a confidence interval of (631.6, 1863.5), which resembles the original (non-logarithm) percentile interval but sits slightly higher. The percentile interval itself translates back to roughly (577, 1684).
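Because the exponential function is monotone, a log-scale interval converts back to dollars by exponentiating its endpoints. The short sketch below applies that to the three log-scale intervals in the table; the dictionary names are hypothetical.

```python
from math import exp

# Log-scale 95% intervals, copied from the table above.
log_intervals = {
    "normal": (6.448, 7.530),
    "basic": (6.412, 7.483),
    "percentile": (6.358, 7.429),
}

# Back-transform each endpoint to the dollar scale.
dollars = {name: (exp(lo), exp(hi)) for name, (lo, hi) in log_intervals.items()}

for name, (lo, hi) in dollars.items():
    print(f"{name:10s} ({lo:7.1f}, {hi:7.1f})")
```

With the endpoints rounded as printed in the table, the normal interval comes out near (631, 1863) and the percentile interval near (577, 1684).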
Based on this confirmation of the original percentile interval, and based on my sense that the percentile interval better reflects practical considerations, I focused on percentile intervals for the remainder of this analysis.
I also performed a time-series bootstrap, due to the ambiguous evidence regarding autocorrelation within the time series of deltas. This was a fixed-length block resample bootstrap with a block size of 5 days (1 week). I chose this block size due to the presence of a small but significant autocorrelation at lag 4.
The time-series bootstrap yielded these results.
Bootstrap Statistics :
original bias std. error
t1* 1012.78 42.91658 311.7965
95% ( 587, 1799 )
The reported confidence interval is (587, 1799), which is nearly identical to the simple (non-time-series) result of (576, 1791).
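A fixed-block resample keeps short runs of consecutive trades together, so any dependence within a block survives the resampling. The sketch below is one way to do this, under assumptions: blocks are overlapping, start at uniformly random positions, and contain 5 observations each. The function names and synthetic demo data are hypothetical.

```python
import random

def max_drawdown(deltas):
    """Largest peak-to-valley drop in the cumulative P&L."""
    balance = peak = worst = 0.0
    for d in deltas:
        balance += d
        peak = max(peak, balance)
        worst = max(worst, peak - balance)
    return worst

def block_resample(series, block_len, rng):
    """Concatenate randomly chosen contiguous blocks until the original length is reached."""
    n = len(series)
    starts = range(n - block_len + 1)     # every position where a full block fits
    out = []
    while len(out) < n:
        s = rng.choice(starts)
        out.extend(series[s:s + block_len])
    return out[:n]                        # trim the final block if it overshoots

def block_bootstrap_ci(series, block_len=5, reps=10_000, alpha=0.05, seed=0):
    """Percentile interval for maximum drawdown under fixed-block resampling."""
    rng = random.Random(seed)
    reps_sorted = sorted(
        max_drawdown(block_resample(series, block_len, rng)) for _ in range(reps)
    )
    return reps_sorted[int(reps * alpha / 2)], reps_sorted[int(reps * (1 - alpha / 2)) - 1]

# Synthetic stand-in for the trade deltas (hypothetical, not the vendor's data).
demo_rng = random.Random(7)
demo = [demo_rng.gauss(55, 174) for _ in range(60)]
print(block_bootstrap_ci(demo, reps=2000))
```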
A random-length block resampling bootstrap produced very similar results, using a geometric distribution with a mean of 5 days (1 week).
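A random-length variant can be sketched in the style of the stationary bootstrap: after each observation, either continue to the next one or jump to a random position, so block lengths follow a geometric distribution. Wrapping around the end of the series is an assumption made here, and `stationary_resample` is a hypothetical name.

```python
import random

def stationary_resample(series, mean_block, rng):
    """Resample with geometric block lengths whose mean is mean_block observations."""
    n = len(series)
    p = 1.0 / mean_block          # probability of starting a new block at each step
    out = []
    i = rng.randrange(n)          # random starting position
    while len(out) < n:
        out.append(series[i])
        if rng.random() < p:
            i = rng.randrange(n)  # start a new block somewhere else
        else:
            i = (i + 1) % n       # continue the block, wrapping at the end (assumption)
    return out

rng = random.Random(3)
sample = stationary_resample(list(range(20)), mean_block=5, rng=rng)
print(sample)
```

Feeding these resamples into the same maximum drawdown and percentile steps as the fixed-block version gives the corresponding interval.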
My exploratory data analysis revealed other interesting facts which are not fully documented here. I mention them in passing in case the reader finds them useful.
A reasonable confidence interval for the maximum drawdown is ($575, $1800) over the 11-month period, regardless of whether we use a simple bootstrap or a time-series bootstrap.
This estimate is based only on historical data, obviously. A pretty good “rule of thumb” is to double the historical risk, and trade as if that were one’s expectation. That rule yields a heuristic confidence interval of ($1150, $3600) for the maximum drawdown.
The margin (performance bond) dictated by the exchange is $6,500 per contract (as of June 2009). If we add our heuristic upper bound of $3,600 to the margin requirement of $6,500, we get a minimum trading capital of $10,100, so Mr. Hale’s estimated requirement of $12,000 capitalization per contract is reasonable.
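The capital arithmetic is easy to reproduce. All figures in this sketch come from the text; the variable names are illustrative.

```python
historical_interval = (575, 1800)     # bootstrap interval for maximum drawdown, in dollars
margin_per_contract = 6500            # exchange margin as of June 2009
vendor_recommendation = 12000         # Mr. Hale's suggested capital per contract

# Rule of thumb: double the historical risk.
heuristic_interval = tuple(2 * x for x in historical_interval)

minimum_capital = margin_per_contract + heuristic_interval[1]
cushion = vendor_recommendation - minimum_capital

print(heuristic_interval)   # (1150, 3600)
print(minimum_capital)      # 10100
print(cushion)              # 1900
```

The vendor's recommendation leaves about $1,900 per contract beyond the heuristic minimum.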
The most troubling limitation of this study is the short time span of the available data. Although 338 observations may seem sufficient for a statistical study, experienced customers of trading systems prefer 5 or more years of historical data. Market conditions change. Seeing the system’s performance under multiple conditions increases our confidence in its robustness.
I ignored the peculiar difference between the first-half data and the second-half data, but that could point to a serious non-stationarity in the data which, in turn, might invalidate these results.
The differences between first-half data and second-half data indicate a need to study the halves separately and characterize the difference. This could yield valuable insights into the evolution of the system’s behavior under changing market conditions.
The time-series bootstrap used fixed-block resampling. The ACF points to a possible ARMA model for the deltas, and the shifting variance points to a possible GARCH model. Constructing those models would let us perform model-based resampling, possibly with improved confidence intervals.
Recall that some days had multiple trades. It is very plausible to me that same-day trades are interdependent, not independent, but I did not study this possibility. If true, the bootstrap simulation should be revised to reflect this dependence structure.