The business risk that no one is modeling: AI is replacing the very experts you need to learn from



For artificial intelligence systems to continue improving at knowledge work, they need either a reliable mechanism for autonomous improvement or human evaluators capable of detecting errors and generating high-quality feedback. The industry has invested heavily in the former. It has barely thought about what happens to the latter.

I would argue that we must treat the problem of human evaluation with the same rigor and investment we put into building model capabilities. New-graduate hiring at major technology companies has fallen by half since 2019. Document review, first-pass research, data cleaning, code review: models handle this work now. Economists who track this call it displacement. The companies doing it call it efficiency. Neither framing addresses the problem it creates down the line.

Why self-improvement has limits in knowledge work

The obvious counterargument is reinforcement learning (RL). AlphaZero learned Go, chess, and shogi at superhuman levels without human data and generated novel strategies in the process. Move 37 in AlphaGo’s 2016 match against Lee Sedol, a move professionals said they would never have played, did not come from human scoring. It emerged from self-play.

What makes this possible is the stability of the environment. Move 37 is a novel move within the fixed state space of Go. The rules are complete, unambiguous, and permanent. More importantly, the reward signal is perfect: win or lose, with no room for interpretation. The system always knows whether it played well, because every game ends with a clear result.
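To make the contrast concrete, here is a minimal, self-contained sketch of a self-play loop in Python (a toy game, nothing like AlphaZero’s actual training). The game is single-pile Nim: players alternate taking one to three stones, and whoever takes the last stone wins. Every episode ends with a perfect, uninterpreted reward, which is exactly the property knowledge work lacks.

```python
import random
from collections import defaultdict

# Toy illustration, not AlphaZero: self-play works here because the rules
# are fixed and every game ends with an unambiguous win/lose signal.
Q = defaultdict(float)  # estimated value of taking `move` stones at `stones` left

def choose(stones: int, explore: float = 0.1) -> int:
    """Epsilon-greedy choice over the legal moves (take 1-3 stones)."""
    moves = list(range(1, min(3, stones) + 1))
    if random.random() < explore:
        return random.choice(moves)
    return max(moves, key=lambda m: Q[(stones, m)])

def self_play_episode(start: int = 15) -> None:
    stones, history = start, []          # history[i] is player i % 2's move
    while stones > 0:
        move = choose(stones)
        history.append((stones, move))
        stones -= move
    winner = (len(history) - 1) % 2      # whoever took the last stone wins
    # The key property: one perfect, uninterpreted terminal reward per game.
    for i, state_action in enumerate(history):
        reward = 1.0 if i % 2 == winner else -1.0
        Q[state_action] += 0.05 * (reward - Q[state_action])

for _ in range(50_000):
    self_play_episode()

# Optimal play from 15 stones takes 3, leaving the opponent a multiple of 4.
print("learned opening:", max([1, 2, 3], key=lambda m: Q[(15, m)]))
```

Run it and the loop typically rediscovers the classic strategy of leaving the opponent a multiple of four, with no human in sight. Remove the terminal win/lose signal and nothing in this loop closes.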

Knowledge work has none of those properties. The rules of any professional field are dynamic, continually rewritten by the humans who operate within them. New laws are passed. New financial instruments are invented. A legal strategy that worked in 2022 may fail in a jurisdiction that has since changed its interpretation. Years may pass before anyone knows whether a medical diagnosis was correct. Without a stable environment and an unambiguous reward signal, the loop cannot be closed. Humans are needed in the evaluation chain to keep teaching the model.

The pipeline problem

The AI systems being built today were trained on the output of experts who came up through exactly that kind of entry-level apprenticeship. The difference now is that the entry-level jobs that develop such expertise were the first to be automated. Which means the next generation of potential experts is not accumulating the kind of experience that makes a human evaluator worth having in the loop.

History has examples of knowledge dying. Roman concrete. Gothic construction techniques. Mathematical traditions that took centuries to recover. But in every historical case the cause was external: plague, conquest, the collapse of the institutions that housed the knowledge. The difference here is that no external force is required. Fields could atrophy not because of one catastrophe but because of a thousand individually rational economic decisions, each sensible in isolation. This is a new mechanism, and we don’t have much practice recognizing it as it happens.

When entire fields fall silent

At its logical limit, this is not just a pipeline problem. It is a collapse in the demand for expertise itself.

Consider advanced mathematics. It does not atrophy because we stop training mathematicians. It atrophies because organizations stop needing mathematicians for their daily work, the economic incentive to become one disappears, the population of people who can do frontier mathematical reasoning shrinks, and the field’s ability to generate novel knowledge quietly collapses. The same logic applies to coding. The question is not “will AI write code?” but “if AI writes all the production code, who develops the deep architectural intuition that produces genuinely novel system designs?”

There is a critical difference between a field that is automated and a field that is understood. We can automate much of structural engineering today, but the tacit knowledge of why certain approaches work lives in the heads of people who spent years doing it wrong first. Eliminate the practice and you do not just lose the practitioners. You lose the ability to know what you have lost.

Advanced mathematics, theoretical computer science, deep legal reasoning, complex systems architecture: when the last person who deeply understands a subfield of algebra retires and no one replaces them because the funding dried up and the career path disappeared, that knowledge is unlikely to be rediscovered anytime soon.

It’s gone. And no one notices because models trained on their work still perform well on benchmarks for another decade. I think of this as hollowing out: the surface ability remains (models can still produce expert-looking results) while the underlying human ability to validate, extend, or correct that expertise quietly disappears.

Why rubrics are not a complete replacement

The current approach is rubric-based evaluation. Constitutional AI, reinforcement learning from AI feedback (RLAIF), and structured criteria that models can score against are serious techniques that meaningfully reduce dependence on human evaluators. I don’t dismiss them.

Their limitation is this: a rubric can only capture what the person who wrote it knew how to measure. Optimize against it hard enough and you end up with a model that is very good at satisfying the rubric. That is not the same as a model that is actually correct.

Rubrics scale the explicit, articulable part of judgment. The deeper part, the instinct, the sense that something is off, does not fit into a rubric. You cannot write it down, because you have to experience it before you know what to write.
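To see the gap, here is a minimal sketch in Python; the rubric and its criteria are hypothetical, not drawn from Constitutional AI or any real RLAIF setup. Every criterion is something its author could articulate and check mechanically; nothing in it can encode the expert instinct that a confident, well-formatted answer is simply wrong.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical rubric for illustration only: each criterion is an explicit,
# mechanically checkable rule that its author knew how to write down.

@dataclass
class Criterion:
    name: str
    check: Callable[[str], bool]
    weight: float

rubric = [
    Criterion("cites_sources", lambda t: "[source:" in t, 0.5),
    Criterion("states_limitations", lambda t: "limitation" in t.lower(), 0.25),
    Criterion("within_length", lambda t: len(t.split()) < 500, 0.25),
]

def rubric_score(text: str) -> float:
    """Sum the weights of every criterion the text satisfies."""
    return sum(c.weight for c in rubric if c.check(text))

# An answer can satisfy every criterion and still be factually wrong.
# Nothing in the rubric encodes "this conclusion is incorrect", because
# that judgment lives in expert experience no one managed to write down.
confidently_wrong = (
    "The bridge design is safe [source: internal memo]. "
    "One limitation is the short observation window."
)
print(rubric_score(confidently_wrong))  # prints 1.0: a perfect score
```

A model optimized against a score like this learns to collect the points. Whether it also learns to be right is exactly the question the rubric cannot answer.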

What this means in practice

This is not an argument for halting development. The capability gains are real. And researchers may find ways to close the evaluation loop without human judgment. Maybe synthetic data pipelines will prove good enough. Perhaps models will develop reliable self-correction mechanisms we cannot yet imagine.

But we don’t have them today. And in the meantime, we are dismantling the human infrastructure that currently fills the gap, not as a deliberate decision but as a byproduct of thousands of individually rational ones. The responsible version of this transition does not assume the problem will solve itself. It treats the evaluation gap as an open research problem, with the same urgency we give to capability gains.

What AI needs most from humans is what we are least focused on preserving. Whether this is true permanently or temporarily, the cost of ignoring it is the same.

Ahmad Al-Dahle is chief technology officer at Airbnb.


