Factor Zoo
A friend forwarded me a smallcase last week. Six-factor edge, the pitch said. Beats momentum. Look at the backtest.
The chart was beautiful. Smooth line, up and to the right, crushing the index. And that is exactly when I get nervous.
Because a backtest is not a result. It is a hypothesis dressed up as one.
A few years ago, three researchers (Hou, Xue and Zhang) took 452 published "market-beating" factors and did the boring thing nobody does. They rebuilt every one on clean data and asked how many actually survived a real statistical test. Sixty-five percent could not clear even the basic bar, a t-statistic of 1.96. Raise it to an honest bar of 2.78, the kind you should use when hundreds of people are mining the same history, and eighty-two percent were gone.
Most of the published factors were never there.
Here is why this happens, and it is not fraud. Test 100 useless signals against the past at the usual 5% significance level and about five will look "significant" by pure chance. Now picture an entire industry fishing in the same dataset. Someone always lands a winner. It just doesn't predict the next year, because it never described anything real in the first place.
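You can watch this happen on your own laptop. The sketch below is a toy simulation, not anyone's published method. It assumes a 5% two-sided test (a t-statistic hurdle of 1.96), 240 months of history, and signals that are pure Gaussian noise with a 5% monthly standard deviation. The function names and every parameter are mine, chosen for illustration.

```python
import math
import random

def t_stat(returns):
    """t-statistic for the hypothesis that the true mean return is zero."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean / math.sqrt(var / n)

def count_false_discoveries(n_signals=100, n_months=240, hurdle=1.96, seed=0):
    """Count zero-edge signals whose |t| still clears the hurdle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_signals):
        # Pure noise: the true mean is exactly zero, so any "edge" is luck.
        # (Assumed: Gaussian monthly returns with a 5% standard deviation.)
        returns = [rng.gauss(0.0, 0.05) for _ in range(n_months)]
        if abs(t_stat(returns)) >= hurdle:
            hits += 1
    return hits

# Average over several simulated "histories" to smooth out the luck of one draw.
runs = [count_false_discoveries(seed=s) for s in range(20)]
print(sum(runs) / len(runs))  # the exact figure depends on the seeds

# Chance that at least one of 100 independent useless signals passes at 5%.
p_at_least_one = 1 - 0.95 ** 100
print(round(p_at_least_one, 3))  # 0.994
```

The second number is the one to remember. With 100 independent tries at a 5% bar, finding at least one "winner" is close to guaranteed. Real signals are correlated, so treat the independence here as a simplifying assumption.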
The few that hold up (size, value, momentum, quality, low volatility) share two things the rest of the zoo lacks. There is an economic reason they should pay you: a risk you carry or a behaviour you exploit. And they keep working on data they were never built on.
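That second test, data the factor was never built on, is easy to sketch. The toy example below is an illustration under stated assumptions, not a production backtest. It invents 100 noise signals, picks the best one using only the first 120 months, then scores that same signal on the 120 months it never saw. The split point, the noise model and the function names are all hypothetical.

```python
import math
import random

def t_stat(returns):
    """t-statistic for the hypothesis that the true mean return is zero."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean / math.sqrt(var / n)

def pick_winner_then_hold_out(n_signals=100, n_months=240, split=120, seed=0):
    """Select the best noise signal in-sample, then test it out-of-sample."""
    rng = random.Random(seed)
    # Assumed: every signal is pure noise with zero true edge.
    signals = [[rng.gauss(0.0, 0.05) for _ in range(n_months)]
               for _ in range(n_signals)]
    # Choose the "factor" using only the first `split` months ...
    winner = max(signals, key=lambda r: t_stat(r[:split]))
    # ... then score it on the months it was never fitted to.
    return t_stat(winner[:split]), t_stat(winner[split:])

in_t, out_t = pick_winner_then_hold_out()
print(f"in-sample t = {in_t:.2f}, out-of-sample t = {out_t:.2f}")
```

Run it across a few seeds. The in-sample winner tends to look impressive because it was selected for looking impressive, while its held-out t-statistic scatters around zero. A factor with a real edge should not collapse like that.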
So the next time a backtest takes your breath away, don't ask how good the line looks. Ask what makes the factor real, and whether it survives a bar high enough to kill a coincidence.
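How high is high enough? One simple and deliberately conservative answer is a Bonferroni correction: divide the 5% significance level by the number of signals tried. The sketch below uses that correction as my own assumption. It is not necessarily the adjustment used in the study above, and it uses a normal approximation to the t-distribution, which is reasonable for long histories.

```python
from statistics import NormalDist

def bonferroni_hurdle(n_tests, alpha=0.05):
    """Two-sided |t| hurdle (normal approximation) after a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    # Each individual test only gets a 1/n_tests share of the overall alpha.
    per_test_alpha = alpha / n_tests
    return NormalDist().inv_cdf(1 - per_test_alpha / 2)

print(round(bonferroni_hurdle(1), 2))    # 1.96, the familiar single-test bar
print(round(bonferroni_hurdle(100), 2))  # 3.48 once 100 signals have been tried
```

The more signals that were tried before the pitch reached you, the higher the bar the survivor has to clear. The hard part in practice is that nobody tells you how many were tried.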
Anyone can find a pattern in yesterday. The only factor worth your money is the one that still shows up in a tomorrow it never got to see.
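How to tell a real factor from a data-mined one: look for an economic reason it should pay, then check that it holds up on data it was never built on.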
Educational content only. Figures are illustrative and computed on historical or representative data for teaching purposes. Not investment advice. Past performance does not guarantee future returns. Sourced from NSE, BSE, SEBI, AMFI, and RBI public data.