How to Choose the Right Benchmark

Brian Ernest Metzger on April 3, 2026

Benchmark selection is not a reporting detail. It defines the control portfolio, the economic question the backtest is answering, and the tradeoff it actually measures.

At Backtested Strategies, a benchmark is chosen to make the strategy’s mechanism, tradeoff, and opportunity cost clear, not to make the result look strong or weak.

This article explains how benchmarks are chosen, what a benchmark should and shouldn’t do, and when a strategy-specific benchmark is more appropriate than a generic market index.

A useful benchmark should pass three fairness tests:

Does it preserve the same underlying opportunity set?
Does it remove the active overlay without adding a hidden second thesis?
Would the comparison still feel fair if the strategy underperformed?

Why benchmark choice matters

A benchmark defines the economic baseline a strategy is measured against. Change the benchmark, and the comparison may start answering a different question.

Any given trading strategy may look strong against one control and weak against another. A weak benchmark can make an ordinary strategy look impressive, and a misaligned benchmark can make a good strategy look worse than it actually is by answering the wrong question. In both cases, the problem is the same: the benchmark is not isolating the strategy’s actual contribution.

That’s why benchmark choice is treated as part of the strategy’s published specification. The benchmark isn’t a decorative opponent. It’s the control portfolio that shows what the active overlay changed in economic terms: what risk it reduced, what return it gave up, and in which market regimes those tradeoffs appeared.

What a benchmark is

A benchmark is the fairest investable control portfolio for the strategy being tested, not an opponent to beat. For investors asking how to choose a benchmark index, the first step is deciding whether an index is actually the right control portfolio for the strategy.

The core question is simple: if we keep the same opportunity set and the same broad economic assumptions, but remove the strategy’s active overlay, what remains?

That overlay may be timing, ranking, rotation, defensive routing, volatility targeting, leverage, security selection, a weighting scheme, or another rule that changes the passive baseline. The benchmark should remove that overlay without quietly changing the underlying investment problem.

In other words, the benchmark should preserve the passive objective while stripping away the active decision. That’s what makes the comparison useful: the remaining difference is easier to interpret as the strategy’s contribution, not as a mismatch between two different economic problems.
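To make the idea concrete, here is a minimal Python sketch. It assumes a single asset’s closing prices, a hypothetical trend-filter overlay (hold the asset only when its close is above a trailing moving average), cash that earns zero, and no costs or dividends. The control portfolio holds the same asset over the same evaluation window, so the only thing removed is the timing rule.

```python
def moving_average(values, window):
    """Trailing simple moving average; None until `window` observations exist."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def compound(returns):
    """Cumulative return from a sequence of simple period returns."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def timing_comparison(prices, window):
    """Return (strategy_return, benchmark_return) over the same window.

    Hypothetical overlay: the signal is read at the close of day i and
    applied to the return from day i to day i + 1. The benchmark holds
    the same asset every day; only the timing rule is removed.
    """
    ma = moving_average(prices, window)
    strategy, benchmark = [], []
    # Both series start on the first day the signal exists, so the
    # evaluation window is identical for strategy and control.
    for i in range(window - 1, len(prices) - 1):
        r = prices[i + 1] / prices[i] - 1.0
        in_market = prices[i] > ma[i]
        strategy.append(r if in_market else 0.0)  # cash earns 0% here
        benchmark.append(r)
    return compound(strategy), compound(benchmark)


# Toy price series, purely illustrative.
prices = [100, 101, 102, 103, 100, 97, 99, 102]
strat, bench = timing_comparison(prices, window=3)
timing_value = strat - bench  # strategy ≈ 0.0101, benchmark ≈ 0.0
```

Because both series own the same asset over the same dates, the remaining gap in this toy case can be read as timing value rather than as a difference in what was owned.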

What the benchmark is supposed to do

A good benchmark has a demanding job: it shouldn’t make the strategy look better or worse than it is. It should make the mechanism visible.

Isolate the value of the overlay. The benchmark should hold constant as much as possible so the remaining difference mainly reflects timing value, selection value, weighting value, defensive-routing value, or another clearly identifiable source.
Show the tradeoff honestly. The point isn’t to declare a winner. It’s to show what the strategy bought, what it cost, and when those effects showed up.
Provide a fair head-to-head comparison. A skeptical reader should be able to look at the benchmark and accept that it is a coherent and implementable control.
Expose opportunity cost. Defensive strategies should show the upside they gave up, and participation-seeking strategies should show the path risk they accepted.
Support the scorecard and the equity-curve narrative. The benchmark should help explain drawdown tradeoffs, volatility tradeoffs, windows of relative outperformance and underperformance, whipsaw regimes, defensive wins, and offensive penalties.
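A minimal Python sketch of the side-by-side rows this implies, assuming simple period returns for the strategy and its benchmark over the same dates. The metric choices (total return, maximum drawdown, annualized volatility) and the 252-period default are illustrative assumptions, not a prescribed scorecard.

```python
import math


def equity_curve(returns):
    """Compounded growth of $1 from simple period returns."""
    curve, value = [], 1.0
    for r in returns:
        value *= 1.0 + r
        curve.append(value)
    return curve


def max_drawdown(returns):
    """Worst peak-to-trough decline, as a negative fraction (0.0 if none)."""
    peak, worst = 1.0, 0.0
    for value in equity_curve(returns):
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


def annualized_volatility(returns, periods_per_year=252):
    """Sample standard deviation of period returns, scaled by sqrt(periods per year)."""
    n = len(returns)
    mean = sum(returns) / n
    variance = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return math.sqrt(variance * periods_per_year)


def tradeoff_scorecard(strategy, benchmark, periods_per_year=252):
    """Side-by-side rows showing what the overlay bought and what it cost."""
    rows = {}
    for label, rets in (("strategy", strategy), ("benchmark", benchmark)):
        rows[label] = {
            "total_return": equity_curve(rets)[-1] - 1.0,
            "max_drawdown": max_drawdown(rets),
            "volatility": annualized_volatility(rets, periods_per_year),
        }
    return rows
```

Reading the two rows together shows the tradeoff directly: a defensive overlay might shrink drawdown and volatility while also giving up total return, and the benchmark row is what makes that cost visible.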

When the benchmark is right, the comparison becomes harder to mischaracterize. The tradeoff is visible even when it’s inconvenient.

What the benchmark should not do

A bad benchmark can fail in several ways, and most of them make a strategy look cleaner than it really is.

It should not be a straw man. Do not choose a benchmark because it is easier to beat than the real passive alternative.
It should not answer a different question. A benchmark can be economically interesting and still be wrong if it isn’t tied to the strategy’s actual decision problem.
It should not smuggle in hidden overlays. Tactical cash rules, altered weighting schemes, or other embedded views can turn the benchmark into a second strategy instead of a control.
It should not blur accounting categories. Comparing total return to price return, or mixing benchmark and strategy accounting conventions without clearly stating what is being compared, creates confusion rather than insight.
It should not hide opportunity cost. If the strategy’s benefit is defense, the benchmark must still show the foregone upside. If the strategy’s benefit is participation, the benchmark must still show the extra path risk accepted to earn it.
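The accounting mismatch is easy to see in a small Python sketch. It assumes a single asset, one hypothetical dividend, and reinvestment at the close of the period in which the dividend is received. Holding the identical asset with no overlay at all still “beats” a price-return line by the reinvested dividend, which is exactly the kind of gap that should not be reported as strategy value.

```python
def price_return(prices):
    """Return from price changes alone; dividends are ignored."""
    return prices[-1] / prices[0] - 1.0


def total_return(prices, dividends):
    """Return with each cash dividend reinvested at that period's close.

    Simplifying assumption: dividends[i] is the per-share amount received at
    the close of period i (ex-date and pay-date treated as the same day).
    dividends[0] is ignored because it falls before the holding period starts.
    """
    shares = 1.0
    for i in range(1, len(prices)):
        shares += shares * dividends[i] / prices[i]
    return shares * prices[-1] / prices[0] - 1.0


# Hypothetical series: the same asset, no overlay at all.
prices = [100.0, 102.0, 104.0]
dividends = [0.0, 1.02, 0.0]
gap = total_return(prices, dividends) - price_return(prices)  # ≈ 0.0104, pure accounting
```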

The simplest test is this: would the comparison still feel fair if the strategy underperformed? If not, the benchmark is probably wrong.

How to choose the right benchmark

The framework is simple: preserve the same opportunity set and broad realism standard while removing the strategy’s active overlay. Put differently, the best benchmarking method is the one that keeps the investment problem comparable while isolating the active decision rule being tested.

Identify the active overlay. Start by naming exactly what the strategy is doing that a passive implementation would not do. That may be a trend filter, rotation rule, risk target, selection model, weighting rule, or defensive routing rule.
Identify the passive economic baseline. Remove only that overlay and ask what portfolio still represents the same investment problem. In many cases, that portfolio is the best primary benchmark.
Match the mechanics that materially matter. The benchmark should use the same realism standard where relevant: evaluation window, execution convention, cost realism, start rule, rebalancing rule, and construction assumptions. The goal is not to reproduce the strategy’s realized trade path. It is to match the same investment problem and realism standard while removing the active overlay.
Check whether the difference is now interpretable. If the remaining performance gap can be described mainly as timing value, selection value, weighting value, risk-management value, or defensive-routing value, the benchmark is likely doing its job. If the difference still reflects a different opportunity set or a hidden second thesis, it’s not.
Add secondary diagnostics only when they answer a separate question. Some strategies benefit from an additional unrebalanced companion, a mechanics-matched companion, or a broad market reference for context. For strategies with leverage or volatility targeting, an exposure-matched diagnostic can be useful, but it should not replace the primary control portfolio.
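For a volatility-targeted or leveraged strategy, one way to build the exposure-matched diagnostic, assumed here for illustration, is to hold the same passive asset at a constant weight equal to the strategy’s average exposure. The Python sketch below assumes per-period simple returns, a known exposure series, and a single per-period rate for both idle cash and borrowing. It is a diagnostic alongside the primary control, not a replacement for it.

```python
def overlay_returns(asset_returns, exposures, cash_rate=0.0):
    """Strategy return when exposures[i] of capital is in the asset for period i.

    Exposure above 1.0 means leverage financed at cash_rate; below 1.0 means
    the remainder sits in cash earning cash_rate (both per period).
    """
    return [w * r + (1.0 - w) * cash_rate for w, r in zip(exposures, asset_returns)]


def exposure_matched_returns(asset_returns, exposures, cash_rate=0.0):
    """Hypothetical diagnostic: hold the asset at the strategy's average
    exposure every period, so the level of exposure matches but its timing does not."""
    avg = sum(exposures) / len(exposures)
    return [avg * r + (1.0 - avg) * cash_rate for r in asset_returns]


asset = [0.10, -0.10]      # passive asset returns (primary control holds this at 1.0)
exposures = [1.0, 0.5]     # hypothetical strategy exposure path
strategy = overlay_returns(asset, exposures)
matched = exposure_matched_returns(asset, exposures)
```

The gap between the strategy and the primary benchmark mixes two effects: holding less exposure on average and choosing when to hold it. The gap to the exposure-matched line isolates the second, while the primary benchmark still answers the headline question.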

This framework is intentionally conservative. It doesn’t ask which benchmark is easiest to explain or market. It asks which benchmark makes the strategy’s mechanism clearest and hardest to mischaracterize.

The benchmark also has to meet the same feasibility discipline as the strategy. If a benchmark depends on index membership, that membership should be point-in-time. If it references an index or asset class that cannot be traded directly, the benchmark should use a clearly stated tradable proxy. If the benchmark cannot be modeled under the same realism standard as the strategy, it may still be useful as context, but it’s not a clean primary control.
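Point-in-time membership can be sketched in Python as a lookup over membership spells. The record layout, tickers, and dates below are hypothetical. The point is that a backtest built from today’s constituent list would never hold a name that was later removed, which quietly biases the benchmark toward survivors.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class MembershipSpell:
    """One continuous period during which a ticker belonged to the index."""
    ticker: str
    added: date
    removed: Optional[date] = None  # None: still a member


def members_as_of(spells: Iterable[MembershipSpell], as_of: date) -> List[str]:
    """Tickers that were index members on as_of (added inclusive, removed exclusive)."""
    return sorted(
        {
            s.ticker
            for s in spells
            if s.added <= as_of and (s.removed is None or as_of < s.removed)
        }
    )


# Hypothetical history: "BBB" was dropped in 2021 and "CCC" joined shortly after.
history = [
    MembershipSpell("AAA", date(2015, 1, 1)),
    MembershipSpell("BBB", date(2015, 1, 1), date(2021, 3, 19)),
    MembershipSpell("CCC", date(2021, 3, 22)),
]
```

Querying mid-2020 returns AAA and BBB, while a query in 2022 returns AAA and CCC; a current-list shortcut would have used AAA and CCC for both dates.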

Primary, diagnostic, and context benchmarks

Not every benchmark does the same job. This framework separates benchmarks into three categories:

Benchmark Type	Job	What It Should Not Do
Primary Benchmark	The main control portfolio for the strategy. It answers the core economic question and carries the main attribution burden.	It should not be swapped after results are known.
Diagnostic Benchmark	A secondary comparator used to isolate a narrower implementation question, such as rebalancing, dividend cash treatment, or cash deployment.	It should not replace the main control.
Context Benchmark	A broad market or category reference used for reader orientation.	It should not be treated as the benchmark that isolates the strategy’s overlay.

Only the primary benchmark should carry the headline comparison. Diagnostic and context benchmarks may appear on charts or in supporting discussion, but they should not replace the primary benchmark in the main scorecard unless they are explicitly relabeled. SPY can be useful as context even when it’s the wrong primary benchmark. Readers often want to know how a strategy behaved relative to a familiar market line. That’s reasonable, but it’s a different question from the one the control portfolio is supposed to answer.
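The labeling rule can be sketched in Python as a small guard, assuming each benchmark is tagged with one of the three roles from the table. The class and function names are hypothetical. Only a benchmark tagged primary can be returned for the headline comparison; diagnostic and context lines can still be charted, labeled by role.

```python
from dataclasses import dataclass
from enum import Enum


class BenchmarkRole(Enum):
    PRIMARY = "primary"
    DIAGNOSTIC = "diagnostic"
    CONTEXT = "context"


@dataclass(frozen=True)
class LabeledBenchmark:
    name: str
    role: BenchmarkRole


def headline_benchmark(benchmarks):
    """Return the single benchmark allowed to carry the headline comparison."""
    primaries = [b for b in benchmarks if b.role is BenchmarkRole.PRIMARY]
    if len(primaries) != 1:
        raise ValueError(f"expected exactly one primary benchmark, found {len(primaries)}")
    return primaries[0]


def chart_labels(benchmarks):
    """Any benchmark may appear on a chart, but always labeled by its role."""
    return [f"{b.name} ({b.role.value})" for b in benchmarks]


lineup = [
    LabeledBenchmark("Eligible-universe basket", BenchmarkRole.PRIMARY),
    LabeledBenchmark("SPY", BenchmarkRole.CONTEXT),
]
```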

The framework also distinguishes between a primary benchmark and a mechanics-matched companion. The primary benchmark, which carries the headline comparison, uses the standard benchmark convention described on the BTS Methodology page, including synthetic total-return reporting with ordinary dividends reinvested. A mechanics-matched companion may be shown alongside it to answer a narrower implementation question, such as whole-share trading or dividend cash retention. It’s diagnostic only and must not replace, redefine, or be blurred together with the primary benchmark.
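The difference between the two can be sketched in Python under simplified, hypothetical conventions: dividends received and reinvested at the same close, uninvested cash earning zero, and no trading costs. These are illustrative assumptions; the actual conventions are set on the BTS Methodology page and are not reproduced here.

```python
import math


def synthetic_total_return_value(capital, prices, dividends):
    """Primary-style line: fractional shares, dividends reinvested at the close received."""
    shares = capital / prices[0]
    for i in range(1, len(prices)):
        shares += shares * dividends[i] / prices[i]
    return shares * prices[-1]


def whole_share_cash_value(capital, prices, dividends):
    """Companion-style line: whole shares only; leftover capital and dividends held as cash at 0%."""
    shares = math.floor(capital / prices[0])
    cash = capital - shares * prices[0]
    for i in range(1, len(prices)):
        cash += shares * dividends[i]
    return shares * prices[-1] + cash


# Hypothetical inputs, chosen so the effect is visible on a small account.
capital, prices, dividends = 1000.0, [300.0, 300.0, 330.0], [0.0, 3.0, 0.0]
primary = synthetic_total_return_value(capital, prices, dividends)   # ≈ 1111.0
companion = whole_share_cash_value(capital, prices, dividends)      # 1099.0
```

The gap here comes entirely from implementation mechanics (idle cash and undeployed dividends), which is why it belongs in a diagnostic line rather than in the headline comparison.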

How this article relates to BTS Methodology

This article governs benchmark selection. The BTS Methodology page governs benchmark implementation.

The distinction matters.

This article answers benchmark-selection questions: which control portfolio is right, whether the benchmark should be generic or strategy-specific, and whether SPY is the primary benchmark, a context line, or the wrong comparator.

The Methodology page answers implementation questions: when the benchmark starts, how it is priced, how dividends are handled, whether rebalances incur costs, and which execution and accounting conventions apply.

Strategy entries may specify the benchmark instrument set, weighting method, rebalance schedule, start rule, and maintenance rules. Locked methodology mechanics still apply unless the Methodology page itself permits a different treatment.

The primary benchmark is reported as synthetic total return, with ordinary dividends reinvested under the Methodology page’s benchmark convention. A mechanics-matched companion may be reported separately as a diagnostic, but it doesn’t replace the primary benchmark.

In short, this article explains what makes a benchmark fair. The BTS Methodology page explains how the selected benchmark is modeled.

The BTS benchmark rule

The benchmark rule is simple:

A benchmark is not an opponent to beat. It is the fairest investable control portfolio for a strategy: the same opportunity set, the same broad economic assumptions, and the active overlay removed.

That rule does not require every strategy to use a custom benchmark, and it does not require every strategy to use a familiar market index. A strategy-specific benchmark is appropriate when a defined basket, sleeve structure, or eligible universe would be misrepresented by a generic market line. A standard benchmark is appropriate when it is the correct passive baseline for the strategy’s decision problem. The test is fit, not familiarity.
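For example, suppose a hypothetical rotation strategy ranks a fixed, eligible universe of funds and holds the top k each period. A strategy-specific control can hold the whole eligible universe instead, so the ranking rule is the only thing removed. The Python sketch below assumes per-period simple returns, prior-period return as the ranking signal, equal weights rebalanced every period, and no costs. Equal weighting is itself an assumption; the right passive weights depend on the strategy’s own neutral allocation.

```python
def top_k_rotation(returns_by_asset, k):
    """Hypothetical overlay: each period, hold equal weights in the k assets
    with the highest return in the previous period (ties broken by ticker)."""
    tickers = sorted(returns_by_asset)
    n_periods = len(returns_by_asset[tickers[0]])
    out = []
    for t in range(1, n_periods):  # period 0 only supplies the first signal
        ranked = sorted(tickers, key=lambda a: returns_by_asset[a][t - 1], reverse=True)
        chosen = ranked[:k]
        out.append(sum(returns_by_asset[a][t] for a in chosen) / k)
    return out


def eligible_universe_basket(returns_by_asset, start=1):
    """Control: every eligible asset at equal weight, rebalanced each period,
    over the same periods the rotation is evaluated on."""
    tickers = sorted(returns_by_asset)
    n_periods = len(returns_by_asset[tickers[0]])
    return [
        sum(returns_by_asset[a][t] for a in tickers) / len(tickers)
        for t in range(start, n_periods)
    ]


# Hypothetical three-fund universe, three periods of returns.
universe = {
    "A": [0.02, 0.01, 0.03],
    "B": [0.01, 0.05, -0.02],
    "C": [-0.01, 0.00, 0.04],
}
rotation = top_k_rotation(universe, k=1)
control = eligible_universe_basket(universe)
```

Since both series draw from the same three funds over the same periods, any difference reflects the selection rule rather than a different opportunity set.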

Once the primary benchmark is selected, it should be specified precisely enough to be repeatable: the instruments or basket, target weights, rebalance cadence, benchmark accounting convention, execution and cost convention where relevant, and the rule for when the benchmark starts. Primary benchmark dividend handling follows the Methodology page’s synthetic total-return convention.
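One way to make that specification repeatable is to record it as a validated data structure. The Python sketch below is hypothetical: the field names mirror the list above, while the allowed rebalance vocabulary, the long-only weight check, and the example values are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Mapping

ALLOWED_REBALANCE = {"none", "monthly", "quarterly", "annual"}  # hypothetical vocabulary


@dataclass(frozen=True)
class BenchmarkSpec:
    """Hypothetical record of the fields a primary benchmark specification needs."""
    instruments_and_weights: Mapping[str, float]  # ticker -> target weight
    rebalance: str                                # one of ALLOWED_REBALANCE
    accounting: str                               # e.g. "synthetic total return"
    execution_and_costs: str                      # e.g. "next close, 5 bps per trade"
    start_rule: str                               # when the benchmark begins

    def __post_init__(self):
        weights = self.instruments_and_weights
        if not weights:
            raise ValueError("benchmark needs at least one instrument")
        if any(w < 0 for w in weights.values()):
            raise ValueError("long-only control assumed: weights must be non-negative")
        if abs(sum(weights.values()) - 1.0) > 1e-9:
            raise ValueError("target weights must sum to 1.0")
        if self.rebalance not in ALLOWED_REBALANCE:
            raise ValueError(f"unknown rebalance cadence: {self.rebalance!r}")


spec = BenchmarkSpec(
    instruments_and_weights={"EQUITY_FUND": 0.6, "BOND_FUND": 0.4},  # hypothetical tickers
    rebalance="quarterly",
    accounting="synthetic total return",
    execution_and_costs="next close, 5 bps per trade",
    start_rule="same first trading day as the strategy",
)
```

Freezing the record is a small design choice that matches the point of the next paragraph: once results are known, the specification should not be edited in place.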

Benchmark choice is therefore part of the strategy’s published specification. Changing the primary benchmark is not an editorial tweak; it changes the meaning of the comparison. The benchmark should reveal the cost and benefit of the strategy’s objective, not simply declare a winner.

Return to the three tests from the beginning: the benchmark should preserve the same underlying opportunity set, remove the active overlay without adding a hidden second thesis, and still feel fair if the strategy underperformed. If it fails any of them, it’s probably answering the wrong question.