STOCKSCREENR

Checklist: How to Audit a Backtest for Bias

A practical checklist for detecting bias in investment backtests. Covers survivorship bias, look-ahead bias, overfitting, transaction costs, and more.

February 15, 2026


Backtesting is the process of testing an investment strategy against historical data to see how it would have performed. It is an essential tool for systematic investors — but it is also one of the easiest places to fool yourself. A backtest that looks incredible on paper can be completely worthless in practice if it is contaminated by bias.

This checklist gives you a structured framework for auditing any backtest — your own or someone else's — to determine whether the results are trustworthy or the product of hidden flaws.

1. Survivorship Bias

Survivorship bias occurs when a backtest only includes companies that still exist today, ignoring those that went bankrupt, were acquired, or delisted. This systematically inflates returns because you are only looking at winners.

  • How to detect it: Ask: does the dataset include delisted companies? If the stock universe was built from today's index membership applied backwards in time, the backtest has survivorship bias.
  • How to fix it: Use a survivorship-bias-free dataset that includes all companies that existed at each point in time, regardless of whether they still exist. Most professional-grade datasets (CRSP, Compustat) handle this.
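The point-in-time universe idea can be sketched in a few lines. This is a minimal illustration with a made-up membership table (the tickers and dates are hypothetical), not a real data pipeline:

```python
# Hypothetical listing history: ticker -> (listed_from, delisted_on or None).
# ISO-format date strings compare correctly as plain strings.
membership = {
    "AAA": ("2000-01-01", None),          # still listed today
    "BBB": ("2000-01-01", "2008-06-30"),  # went bankrupt and delisted
    "CCC": ("2005-03-01", None),
}

def universe_on(date, membership):
    """Tickers tradable on `date`, including names that later delisted."""
    return sorted(
        t for t, (start, end) in membership.items()
        if start <= date and (end is None or date < end)
    )

# A point-in-time universe still contains BBB before its delisting...
print(universe_on("2006-01-01", membership))  # ['AAA', 'BBB', 'CCC']

# ...whereas a universe built from today's survivors silently drops it,
# which is exactly the survivorship bias described above.
survivors_only = sorted(t for t, (_, end) in membership.items() if end is None)
print(survivors_only)  # ['AAA', 'CCC']
```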

2. Look-Ahead Bias

Look-ahead bias happens when a backtest uses information that would not have been available at the time the trading decision was made. This is surprisingly common and devastatingly misleading.

  • How to detect it: Check when financial data becomes available. Earnings for Q4 are not reported until weeks after the quarter ends. If the backtest uses Q4 earnings on January 1, it has look-ahead bias. Also check if signals use adjusted data (like restated earnings) that changed after the fact.
  • How to fix it: Use point-in-time data that reflects exactly what was known at each date. Add realistic reporting lags — assume at least a 45-day delay for quarterly financials.
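The reporting-lag fix reduces to one availability check before any signal may use a quarterly figure. A minimal sketch, assuming the 45-day lag suggested above:

```python
from datetime import date, timedelta

REPORTING_LAG = timedelta(days=45)  # assumed minimum lag for quarterly filings

def available_on(quarter_end, decision_date, lag=REPORTING_LAG):
    """True if financials for the quarter ending `quarter_end`
    would already have been public on `decision_date`."""
    return decision_date >= quarter_end + lag

# Q4 ends December 31; using its earnings on January 1 is look-ahead bias.
q4_end = date(2024, 12, 31)
print(available_on(q4_end, date(2025, 1, 1)))   # False -> look-ahead
print(available_on(q4_end, date(2025, 2, 20)))  # True (45+ days later)
```

In a real backtest this gate would sit in front of every fundamental field, ideally using each company's actual filing dates rather than a fixed lag.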

3. Data-Mining / Overfitting

Data-mining bias (or overfitting) occurs when you test so many variations of a strategy that you inevitably find one that looks great — purely by chance. If you try 100 parameter combinations, some will show strong returns by random luck alone.

  • How to detect it: Ask: how many parameter variations were tested? Is the final strategy suspiciously specific (e.g., "buy stocks with P/E between 12.3 and 14.7 on the third Tuesday of each month")? Does the strategy have a logical, economic rationale, or was it discovered purely through optimization?
  • How to fix it: Start with economic intuition and use backtesting to validate, not discover. Keep strategies simple with few parameters. Apply multiple testing corrections (like the Bonferroni adjustment) when many combinations are tested.
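The Bonferroni adjustment mentioned above is simple to apply: divide your significance level by the number of variants tested. A sketch with hypothetical p-values:

```python
def bonferroni(p_values, alpha=0.05):
    """Flag which p-values survive a Bonferroni-corrected threshold."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values], threshold

# 100 tested parameter combinations: a raw p = 0.01 looks impressive,
# but the corrected bar is 0.05 / 100 = 0.0005, which it fails.
p_values = [0.01] + [0.5] * 99
flags, threshold = bonferroni(p_values)
print(threshold)   # 0.0005
print(any(flags))  # False -- the "discovery" is consistent with luck
```

Bonferroni is conservative; the point is that a strategy cherry-picked from 100 trials must clear a much higher evidentiary bar than one tested once.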

4. Transaction Costs and Slippage

Many backtests assume zero trading costs, which is unrealistic. Every trade has a real cost: commissions, bid-ask spreads, and market impact (the price moving against you as you trade).

  • How to detect it: Check if the backtest includes trading costs. Look at the turnover rate — strategies that trade frequently are more sensitive to transaction costs. A high-frequency rebalancing strategy showing 20% annual returns might turn negative after realistic costs.
  • How to fix it: Include realistic round-trip costs of 0.1-0.5% per trade for liquid large caps, and 1-2% for small caps and illiquid stocks. Add slippage estimates, especially for larger position sizes. If the strategy still works after costs, it is more likely real.
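Even a crude linear cost model makes the turnover sensitivity visible. This sketch deliberately simplifies (real costs scale nonlinearly with trade size), using the round-trip cost range suggested above:

```python
def net_annual_return(gross_return, annual_turnover, round_trip_cost):
    """Subtract trading costs: turnover of 1.0 means the whole portfolio
    is replaced once per year, paying the round-trip cost on that notional."""
    return gross_return - annual_turnover * round_trip_cost

# A monthly full rebalance (turnover ~12x/year) at a 0.3% round-trip cost
# shaves a 20% gross return down to 16.4%...
print(round(net_annual_return(0.20, 12.0, 0.003), 4))  # 0.164

# ...and the same gross return in illiquid small caps at 1.5% goes negative.
print(round(net_annual_return(0.20, 12.0, 0.015), 4))  # 0.02
```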

5. Regime Dependency

A strategy might work beautifully during bull markets but collapse in bear markets, or vice versa. Regime dependency means the strategy's success is tied to specific market conditions that may not persist.

  • How to detect it: Break the backtest into sub-periods: bull markets, bear markets, high-inflation periods, rising-rate environments. Does the strategy perform consistently across all regimes, or does all the alpha come from one favorable period?
  • How to fix it: Test across multiple market cycles. A strategy that only works in bull markets is not a strategy — it is disguised market exposure. Favor strategies that show positive returns across different environments, even if the absolute returns are lower.
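The sub-period breakdown above amounts to grouping returns by regime label and comparing the averages. A minimal sketch with hypothetical monthly returns:

```python
from statistics import mean

# Hypothetical monthly strategy returns tagged with a market regime.
returns = [(0.03, "bull"), (0.04, "bull"), (-0.05, "bear"),
           (0.02, "bull"), (-0.06, "bear"), (0.05, "bull")]

def by_regime(tagged):
    """Average return per regime label."""
    groups = {}
    for r, regime in tagged:
        groups.setdefault(regime, []).append(r)
    return {regime: mean(rs) for regime, rs in groups.items()}

stats = by_regime(returns)
print(stats["bull"] > 0, stats["bear"] > 0)  # True False
# All the alpha sits in bull markets: this is disguised market exposure,
# not a regime-robust strategy.
```

In practice you would tag regimes with objective rules (e.g. index drawdown thresholds or rate levels) rather than by hand.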

6. Sample Size

A backtest over 3 years might capture one market cycle. A backtest over 20 years captures multiple cycles and is far more reliable. Short backtest periods rarely support statistically meaningful conclusions.

  • How to detect it: Check the backtest period. Fewer than 10 years is a yellow flag. Fewer than 5 years is a red flag. Also check the number of trades — a strategy with only 20 trades over 15 years does not have enough data points to be statistically significant.
  • How to fix it: Use the longest available data history. If limited to shorter periods, be honest about statistical significance. Calculate t-statistics and p-values for the excess returns — a t-stat below 2.0 means the results may be due to chance.
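The t-statistic mentioned above is the mean excess return divided by its standard error. A sketch with hypothetical returns showing how a small sample fails the rough t > 2.0 bar:

```python
from math import sqrt
from statistics import mean, stdev

def t_stat(excess_returns):
    """t-statistic of the mean excess return against zero."""
    n = len(excess_returns)
    return mean(excess_returns) / (stdev(excess_returns) / sqrt(n))

# Only five observations: the mean is positive, but far from significant.
few = [0.02, -0.01, 0.03, 0.01, -0.02]
t = t_stat(few)
print(round(t, 2), t > 2.0)  # well below the 2.0 threshold
```

Note this treats returns as independent and identically distributed, which real return series often violate; it is a first-pass filter, not a proof.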

7. Out-of-Sample Testing

The most powerful validation technique: reserve a portion of your data that you never use during strategy development, then test on it only once to validate.

  • How to detect it: Ask: was the entire dataset used to develop the strategy? If so, there is no out-of-sample validation. The in-sample results are inherently optimistic.
  • How to fix it: Split data into training (70%) and testing (30%) sets. Develop the strategy on the training set, then validate on the testing set. You can also use walk-forward analysis, where the strategy is repeatedly calibrated on past data and tested on the next unseen period.
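Both techniques above can be sketched in a few lines. The key discipline for time series is to split chronologically, never randomly:

```python
def split_train_test(series, train_frac=0.7):
    """Chronological 70/30 split -- never shuffle time-series data."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]

def walk_forward(series, train_len, test_len):
    """Yield (train, test) windows rolled forward through time."""
    i = 0
    while i + train_len + test_len <= len(series):
        yield series[i:i + train_len], series[i + train_len:i + train_len + test_len]
        i += test_len

years = list(range(2005, 2025))  # 20 years of hypothetical data
train, test = split_train_test(years)
print(train[-1], test[0])  # 2018 2019 -- last in-sample vs first held-out year

# Walk-forward: calibrate on 5 years, test on the next 1, then roll forward.
windows = list(walk_forward(years, train_len=5, test_len=1))
print(len(windows))  # 15 rolling calibrate/test windows
```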

8. Point-in-Time Data

Financial databases often update historical records when companies restate earnings, revise guidance, or reclassify items. If your backtest uses the revised data rather than what was originally reported, it introduces a subtle but significant bias.

  • How to detect it: Check if the data source provides point-in-time snapshots or only current (revised) data. If using a standard financial API, you are probably getting revised data.
  • How to fix it: Use data providers that offer point-in-time databases (CRSP, Compustat, FactSet). If unavailable, acknowledge this as a limitation and avoid strategies heavily dependent on exact earnings figures around reporting dates.
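A point-in-time database boils down to keeping every version of a figure with the date it became known, then always querying "as of" the decision date. A toy sketch with a hypothetical restated EPS figure:

```python
# Hypothetical versioned records for one metric: (known_from, value),
# ordered oldest to newest. Restatements append new versions.
history = [("2024-02-14", 1.10),   # originally reported EPS
           ("2024-08-01", 0.95)]   # restated months later

def as_of(history, date):
    """Latest value known on `date` -- what a point-in-time backtest must use."""
    known = [v for d, v in history if d <= date]
    return known[-1] if known else None

print(as_of(history, "2024-03-01"))  # 1.1  (the original figure)
print(as_of(history, "2024-09-01"))  # 0.95 (post-restatement)
# A backtest reading only the current record would apply 0.95 retroactively.
```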

9. Capacity and Liquidity Constraints

A strategy that works with $10,000 might not work with $10 million. Many backtests ignore the practical reality that executing large trades in illiquid stocks moves prices against you.

  • How to detect it: Check the average market cap and daily volume of stocks in the strategy. If it holds 10% positions in micro-cap stocks, the strategy is not realistic at meaningful scale.
  • How to fix it: Filter for stocks with adequate daily volume relative to your position size. A common rule: your daily trading should not exceed 10% of the stock's average daily volume. Apply market impact models for larger portfolios.
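The 10%-of-daily-volume rule above translates directly into a capacity check. A sketch with hypothetical dollar amounts:

```python
def fits_capacity(portfolio_value, weight, avg_daily_dollar_volume,
                  participation=0.10, days_to_build=1):
    """Can the target position be built while trading at most
    `participation` of average daily volume each day?"""
    target = portfolio_value * weight
    capacity = avg_daily_dollar_volume * participation * days_to_build
    return target <= capacity

# $10k portfolio: a 10% slot in a stock trading $50k/day is fine...
print(fits_capacity(10_000, 0.10, 50_000))       # True  ($1k vs $5k/day cap)

# ...but the identical strategy at $10M blows through the cap entirely.
print(fits_capacity(10_000_000, 0.10, 50_000))   # False ($1M vs $5k/day cap)
```

This is the concrete sense in which a backtest can be "real" at $10,000 and fictional at $10 million.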

The Quick Audit Checklist

Use this as a rapid reference when evaluating any backtest:

  1. Does the dataset include delisted/bankrupt companies? (Survivorship)
  2. Is all data used only after it was publicly available? (Look-ahead)
  3. How many parameter combinations were tested? (Overfitting)
  4. Are realistic transaction costs included? (Costs)
  5. Does it work across bull and bear markets? (Regime)
  6. Is the sample period long enough (10+ years)? (Sample size)
  7. Was out-of-sample data reserved and tested? (Validation)
  8. Does the data reflect what was known at each point in time? (Point-in-time)
  9. Can the strategy execute at realistic scale? (Capacity)

If a backtest fails more than two of these checks, treat its results with heavy skepticism.

From Theory to Practice

The best way to build testable investment hypotheses is to start with simple, fundamentally grounded screening criteria — metrics with academic support and economic logic. Our screener is designed to help you do exactly that:

  • Start with the Value Screener Preset to build a replicable screen based on well-established valuation criteria.

Use the screener's filter combinations to create clear, repeatable hypotheses. Document your criteria, track them over time, and always apply the audit checklist above before drawing conclusions from the results.
