Are Machine Learning Investment Returns Overstated?


Story by Jack Croft
Photo by iStock/Fedora Chiosea

Sterling Yan says yes, and he knows why: look-ahead bias.

A review of recent asset-pricing studies almost invariably leaves readers with the impression that machine learning methods routinely and significantly boost investment performance. As Sterling Yan, a professor who holds the Joseph R. Perella and Amy M. Perella Chair in Finance at Lehigh College of Business, puts it: “You get the impression that if you’re not using machine learning to invest, you are leaving a lot of money on the table.”

Over the past 100 years, U.S. stock market returns have averaged about 1 percent a month. Recent studies found that machine learning strategies could double or even triple the returns delivered by traditional models. That led Yan and three international colleagues (Bin Li of Wuhan University in China, Alberto Rossi of Georgetown University, and Lingling Zheng of Renmin University of China) to wonder: “Are these numbers too good to be true?”

The short answer is “yes.” As the study by Yan and his fellow researchers concluded: “Our analyses paint a more conservative picture of the practical value of machine learning strategies for real-time investors.”

Yan says the problem with previous studies lies in the anomaly variables they examined: firm characteristics that academic research has shown to predict stock returns beyond what standard asset pricing models can explain.

“When researchers have evaluated the incremental benefit of using machine learning methodology, they go back to the 1960s and they assume that investors at that time knew about the variables that academics in the 1990s or 2000s were aware of,” he explains. “They incorrectly assumed that investors knew those variables and had learned from them.

“That assumption is not feasible because, in the 1960s, investors didn’t know that 30 or 40 years later, academics would publish papers explaining how variables could predict returns. They couldn’t foresee what was going to happen.”

“What we have here is a situation of ‘look-ahead bias,’ which skews results upward in those previous studies, meaning the machine learning strategies that were examined could not have been implemented in real time by investors,” Yan says.

“We’re not saying machine learning strategy doesn’t work,” he emphasizes. “That’s not our message at all. We actually find positive performance from machine learning strategies, but it’s not as impressive as shown by previous studies. It’s not even close.”

Across the Universe to Real Time

Having identified the shortcomings of previous studies, Yan and his colleagues set out to offer a “fix” that more accurately reflects the economic benefit of machine learning forecasts for real-time investors. They developed real-time machine learning investment strategies by examining a “universe” of more than 18,000 fundamental, return-based signals constructed from financial statement variables.

Previous studies relied on the subset of fundamental signals that had been published in academic journals, which raised concerns about data mining and introduced look-ahead bias. Using the larger universe of signals instead allowed Yan and his colleagues to develop machine learning strategies that are implementable in real time.

“We’re essentially trying to replicate the learning process of real-time investors,” Yan says, “by providing a pool of candidate variables that investors can learn from. They can learn whether any one of the variables or none of them is able to predict returns, which is still machine learning, but it’s a real-time, implementable machine learning strategy.”

Why It Matters

Most people in finance, whether practitioners, scholars or individuals who manage their own investments, rely on research done by others to guide their strategies. “If you are talking about investment performance, it better be realizable,” Yan says.