
“All of the frontier models we evaluated lost money during the season and many experienced ruin,” the paper’s authors concluded, adding that AI “systematically underperforms humans” in this scenario.
| AI model | Average return on investment | Best attempt | Worst attempt | Average final fund |
|---|---|---|---|---|
| Anthropic Claude Opus 4.6 | –11.0% | –0.2% | –18.8% | £89,035 |
| OpenAI GPT-5.4 | –13.6% | –4.1% | –31.6% | £86,365 |
| Google Gemini 3.1 Pro | –43.3% | +33.7% | –100.0% | £56,715 |
| Google Gemini Flash 3.1 LP | –58.4% | +24.7% | –100.0% | £41,605 |
| Z.AI GLM-5 | –58.8% | –14.3% | –100.0% | £41,221 |
| Moonshot AI Kimi K2.5 | –68.3% | –27.0% | –100.0% | £7,420 |
| xAI Grok 4.20 | –100.0% | –100.0% | –100.0% | £0 |
| Arcee Trinity | –100.0% | –100.0% | –100.0% | £0 |

Each model started with the same fund of £100,000. Return on investment and final funds are averaged over three attempts. Grok and Trinity did not complete all attempts.

The results offer some comfort to white-collar professionals who fear AI could take their jobs, and to companies in industries ranging from finance to marketing whose shares have been affected by those fears.
Ross Taylor, one of the study’s authors and chief executive of General Reasoning, said: “There’s a lot of hype about AI automation, but not much work putting AI to the test over long time horizons.”
He added that many of the benchmarks typically used to test AI are flawed because they are set in “very static environments” that bear little resemblance to the chaos and complexity of the real world.
The General Reasoning paper, which has not yet been peer-reviewed, provides a counterweight to growing enthusiasm in Silicon Valley over recent enormous advances in AI’s ability to complete computer programming tasks with little or no human intervention.
Taylor, a former Meta AI researcher, said: “If you… try AI on some real-world tasks, it performs really poorly… Yes, software engineering is very important and economically valuable, but there are many other activities with longer time horizons that are important to consider.”
© 2026 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied or modified in any way.





