
Enterprise AI is entering a new phase, where the central question is no longer what can be built, but how to get the most out of our investment in AI.
In the latest session of VentureBeat’s AI Impact Tour, Brian Gracely, Red Hat’s director of portfolio strategy, described the operational reality within large organizations: AI expansion, rising inference costs, and limited visibility into what those investments are actually generating.
It is "Day 2": pilots give way to production, and cost, governance, and sustainability become harder problems than building the system in the first place.
"We’ve seen customers say, ‘I have 50,000 Copilot licenses.’ I really don’t know what people gain from that. But I do know that I am paying for the most expensive computing in the world, because they are GPUs," Gracely said. "‘How am I going to control that?’"
Why enterprise AI costs are now a board-level issue
For much of the past two years, cost was not the primary concern for organizations evaluating generative AI. The experimental phase gave teams cover to spend freely, and the promise of productivity gains justified aggressive investment, but that dynamic is changing as companies enter their second and third budget cycles with AI. The focus has moved from "Can we build something?" to "Are we getting what we pay for?"
Companies that made big early bets on managed AI services are conducting rigorous reviews to determine whether those investments are generating measurable value. The problem is not just that GPU computing is expensive. The problem is that many organizations lack the tools to connect spend to results, making it nearly impossible to justify renewals or scale responsibly.
The strategic shift from token consumer to token producer
The dominant AI procurement model of recent years has been simple: pay a vendor per token, per seat, or per API call, and let someone else manage the infrastructure. That model made sense as a starting point, but organizations that have been through a full AI cycle, and now have enough experience to compare alternatives, are starting to question it.
"Instead of being purely a token consumer, how can I start being a token generator?" Gracely said. "Are there use cases and workloads where it makes sense to have more? It may mean operating GPUs. It may mean renting GPUs. And then you ask, ‘Does that workload need the best-of-breed model?’ Are there more capable open models or smaller models that fit?"
The decision is not binary. The right answer depends on the workload, organization, and risk tolerance involved, but the math becomes more complicated as the number of capable open models grows, from DeepSeek to models now available through cloud marketplaces. Now companies actually have real alternatives to the handful of suppliers that dominated the landscape two years ago.
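The consumer-versus-producer decision ultimately comes down to back-of-envelope math: at what volume does generating tokens on your own (or rented) GPUs undercut per-token API pricing? The sketch below shows the shape of that comparison; every number in it (the workload size, the per-million-token price, the GPU hourly rate, and the throughput figure) is a hypothetical placeholder, not a real vendor quote.

```python
# Illustrative "token consumer" vs. "token producer" cost comparison.
# All figures below are hypothetical placeholders, not real vendor prices.

def api_cost(tokens: int, price_per_million: float) -> float:
    """Monthly cost of buying tokens from a managed, per-token API."""
    return tokens / 1_000_000 * price_per_million

def self_hosted_cost(tokens: int, gpu_hour_rate: float,
                     tokens_per_gpu_hour: int) -> float:
    """Monthly cost of generating the same tokens on rented GPUs."""
    gpu_hours = tokens / tokens_per_gpu_hour
    return gpu_hours * gpu_hour_rate

monthly_tokens = 2_000_000_000  # hypothetical 2B-token/month workload

api = api_cost(monthly_tokens, price_per_million=10.0)        # $10 per 1M tokens
hosted = self_hosted_cost(monthly_tokens,
                          gpu_hour_rate=4.0,                  # $4 per GPU-hour
                          tokens_per_gpu_hour=1_500_000)      # model throughput

print(f"API:         ${api:,.0f}/month")
print(f"Self-hosted: ${hosted:,.0f}/month")
```

With these placeholder inputs the self-hosted path comes out cheaper, but the crossover point shifts with every variable, which is exactly why the decision has to be made per workload rather than company-wide.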
Falling AI costs and rising usage create a paradox for enterprise budgets
Some business leaders argue that locking in infrastructure investments now could mean overpaying in the long run, pointing to Anthropic CEO Dario Amodei’s statement that AI inference costs are declining by about 60% per year.
The emergence of open source models like DeepSeek and others over the past three years has significantly expanded the strategic options available to companies willing to invest in the underlying infrastructure.
But while costs per token are falling, usage is accelerating at a pace that more than offsets the efficiency gains. It is a version of Jevons’ paradox, the economic principle that improvements in resource efficiency tend to increase total consumption rather than reduce it, since lower cost allows for broader adoption.
For business budget planners, this means that decreasing unit costs do not translate into decreasing total bills. An organization that triples its use of AI while halving costs ends up spending more than before. The consideration becomes which workloads actually require the more capable, more expensive models, and which can be handled perfectly by smaller, cheaper alternatives.
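The arithmetic behind that claim is worth making explicit. Using the article's own example (usage triples while unit cost halves), the total bill still rises; the figures below are purely illustrative.

```python
# Sketch of the budget arithmetic above: unit costs fall, usage rises faster,
# and the total bill grows anyway. All numbers are illustrative only.

unit_cost = 10.0   # hypothetical $ per 1M tokens today
usage = 100        # hypothetical monthly usage, in millions of tokens

# Per the example in the text: unit cost halves while adoption triples usage.
next_unit_cost = unit_cost * 0.5
next_usage = usage * 3

bill_now = unit_cost * usage              # $1,000
bill_next = next_unit_cost * next_usage   # $1,500: up despite cheaper tokens

print(f"now: ${bill_now:,.0f}  next: ${bill_next:,.0f}")
```

In general, total spend scales by (usage growth x unit-cost factor); the bill only shrinks when efficiency gains outpace adoption, which is exactly what Jevons' paradox says tends not to happen.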
The business case for investing in AI infrastructure flexibility
The answer is not to curb investment in AI, but to build with flexibility in mind. The organizations that will win are not necessarily the ones that move the fastest or spend the most; they are the ones building infrastructure and operating models capable of absorbing the next unexpected development.
"The more you can create some abstractions and have some flexibility, the more you can experiment without increasing costs, but also without jeopardizing your business. That is just as important as asking whether you are applying all the best practices right now," Gracely explained.
But as entrenched as AI discussions have become in business planning cycles, the hands-on experience most organizations have is still measured in years, not decades.
"It seems like we’ve been doing this forever. We’ve been doing this for three years," Gracely added. "It’s early and it’s moving very quickly. You don’t know what comes next. But the features of what comes next should give you an idea of what it will look like."
For business leaders still calibrating their AI investment strategies, that may be the most practical conclusion: The goal is not to optimize the current cost structure, but to develop the organizational and technical flexibility to adapt when, not if, it changes again.





