Academic factors: only academic after all
While more than 400 investment factors have been proposed in academic journals since the 1960s, new research calls into question their practicality – and their ability to earn anybody a performance fee.
The rise of quantitative and smart beta investing has seen an explosion in the number of factors – that is, attributes or traits that allegedly drive returns – put forward in academic journals. But research by Campbell Harvey, a partner at Research Affiliates and professor of finance at Duke University, suggests that many of these factors (at least, those “discovered” by academics) are actually hot air.
Harvey’s research, titled “The Pitfalls of Asset Management Research”, shows that more than 400 factors have been published in top journals since the 1960s and that almost all of them are deemed “significant” by their authors. Harvey believes that many of these results are produced through “p-hacking” – the misuse of data analysis to find patterns in data that can then be presented as statistically significant.
“Researchers frequently achieve statistical significance (or a low p-value) by making choices. For example, many variables might be considered and the best ones cherry picked for reporting,” Harvey wrote. “Different sample starting dates might be considered to generate the highest level of significance.
“Certain influential episodes in the data, such as those arising from the global financial crisis or the COVID-19 pandemic, might be censored because they diminish the strength of the research results.
P-hacking itself is driven by an incentive structure that encourages the publication of papers to earn a promotion or tenure, and it “leads to the unfortunate conclusion that roughly half of the empirical research findings in finance are likely false,” Harvey writes. These papers also tend not to account for implementation costs, including transaction and shorting costs.
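The cherry-picking Harvey describes is easy to reproduce. The sketch below is an illustration of the multiple-testing problem, not code from Harvey’s paper: it generates 400 candidate “factors” that are pure noise, yet a handful of them still clear the conventional |t| > 1.96 significance bar by chance, and the cherry-picked best looks impressively “significant”.

```python
import math
import random
import statistics

random.seed(42)

N_MONTHS = 120    # ten years of monthly "returns"
N_FACTORS = 400   # candidate factors, all pure noise

def t_stat(returns):
    """t-statistic of the mean return against zero."""
    mean = statistics.mean(returns)
    se = statistics.stdev(returns) / math.sqrt(len(returns))
    return mean / se

# Every "factor" here is random noise with a true mean of zero.
t_stats = []
for _ in range(N_FACTORS):
    noise = [random.gauss(0.0, 0.05) for _ in range(N_MONTHS)]
    t_stats.append(t_stat(noise))

significant = [t for t in t_stats if abs(t) > 1.96]
best = max(t_stats, key=abs)

# Roughly 5% of the pure-noise factors clear the bar by chance alone,
# and the best of them looks strongly "significant".
print(f"{len(significant)} of {N_FACTORS} noise factors are 'significant'")
print(f"best |t|-statistic: {abs(best):.2f}")
```

Reporting only the best factor, with no record of the other 399 trials, is exactly the selective reporting Harvey warns about: a t-statistic near 2 tells you very little once you know how many variables were tried.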
To test the theory, Harvey uses a sample of smart beta ETFs, many of which claim to be based on “peer-reviewed research published in the finest academic journals” – peer-reviewed research that might have been p-hacked or overfit “to such an extent that the results are unlikely to repeat out of sample.” Backtested returns for these ETFs were strong, but after the funds were registered with the SEC and launched, “the excess returns are zero” – an outcome Harvey believes is consistent with overfitting and/or p-hacking.
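That in-sample/out-of-sample gap can be illustrated with a toy simulation (again, an illustration rather than Harvey’s methodology): “launch” the noise strategy with the best ten-year backtest, then track it over a further five years of fresh noise. The backtest looks strong; the live excess return is statistically indistinguishable from zero.

```python
import math
import random
import statistics

random.seed(7)

BACKTEST_MONTHS = 120  # ten-year backtest window
LIVE_MONTHS = 60       # five years after "launch"
N_STRATEGIES = 400     # candidate strategies, all pure noise

def t_stat(returns):
    """t-statistic of the mean return against zero."""
    mean = statistics.mean(returns)
    return mean / (statistics.stdev(returns) / math.sqrt(len(returns)))

# Simulate full return histories of pure noise; the true excess return
# is zero both before and after launch.
histories = [
    [random.gauss(0.0, 0.05) for _ in range(BACKTEST_MONTHS + LIVE_MONTHS)]
    for _ in range(N_STRATEGIES)
]

# Pick the strategy with the best backtest, as an overfitting sponsor would.
best = max(histories, key=lambda h: statistics.mean(h[:BACKTEST_MONTHS]))

in_sample_t = t_stat(best[:BACKTEST_MONTHS])
live_t = t_stat(best[BACKTEST_MONTHS:])

print(f"backtest t-statistic: {in_sample_t:.2f}")   # looks "significant"
print(f"post-launch t-statistic: {live_t:.2f}")     # just noise around zero
```

Because the selection happened on the backtest window alone, the post-launch period is untouched by the cherry-picking, and the “winning” strategy reverts to its true expected return of zero.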
“As with academic research, investors need to be sceptical of asset management research conducted by practitioners,” Harvey writes. “Indeed, one company might comb through the academic research and do its own data mining in order to launch many ETFs, fully knowing some will fail. Nevertheless, the company receives a fixed fee. Given the large number of funds launched, most remember the winners more than the losers.”
The good news from Harvey’s test is that the market offers little reward for this behaviour. An asset manager who knows the best-performing backtest is the product of “overfitting and luck” also knows it won’t generate what they really want: performance fees. And if a manager’s products disappoint investors, those investors will flee the firm, so the “market mechanism naturally minimises the overfitting.”
Still, incentive structures in place within some asset management firms should give investors pause for thought. A dysfunctional research culture where one researcher is elevated over another when their research works out can result in other researchers engaging in data mining and p-hacking to effectively save their own skins by delivering a statistically “significant” result.
“Investors can take a number of steps to mitigate the problem,” Harvey wrote. “First, be sceptical of both academic and practitioner research. Often a predetermined agenda or incentives make the results seem stronger than they are. Second, take the research culture into account. For example, when presented with a new strategy, ask if the company keeps a record of all variables that were tried.”
“…strategically ask questions such as “Did you try X?” If the answer is “Yes, and it does not work” and X is not reported, interpret this as a red flag. On seeing one cockroach, you can safely assume a dozen are behind the wall.”