How Shopify built an AI stack that doesn't care which models survive

Shopify built an LLM proxy that gives every engineer access to multiple AI providers, with automatic failover when any of them stops working, changes, or disappears. When access to Claude Fable 5 closed, Shopify engineers didn’t panic. The proxy switched them to Claude Opus or GPT 5.5 automatically, without disrupting their workflows. “Fable looks amazing; we use it, of course,” Farhan Thawar, head of engineering at Shopify, says in a new episode of the VentureBeat podcast Beyond the Pilot. “When a model appears and then disappears, or can be as harmless as an update, the proxy allows us to distribute among the different providers.”
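The failover behavior described above can be illustrated with a minimal sketch. It is not Shopify's implementation: the `Provider` type, the `ProviderUnavailable` exception, and the two stand-in model functions are hypothetical names invented for this example, and a real proxy would call provider SDKs rather than local functions.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class ProviderUnavailable(Exception):
    """Raised by a provider callable when its model cannot serve a request."""


@dataclass
class Provider:
    name: str                       # label only, e.g. "primary"
    complete: Callable[[str], str]  # hypothetical interface: prompt in, text out


def complete_with_failover(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try providers in priority order and return (provider_name, completion)."""
    errors = []
    for provider in providers:
        try:
            return provider.name, provider.complete(prompt)
        except ProviderUnavailable as exc:
            # Remember why this provider failed, then fall through to the next.
            errors.append(f"{provider.name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-ins for real provider clients (assumptions for illustration only).
def retired_model(prompt: str) -> str:
    raise ProviderUnavailable("model no longer offered")


def backup_model(prompt: str) -> str:
    return f"echo: {prompt}"


chain = [Provider("primary", retired_model), Provider("backup", backup_model)]
served_by, answer = complete_with_failover("hello", chain)
print(served_by, answer)
```

Because callers only see `complete_with_failover`, the order of `chain` can change when a model is withdrawn or updated without any change to the code that sends prompts.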

Shopify buys tokens in bulk, and all users connect to the models through its proxy, Thawar says. This gives his team access to reporting and failover: when one provider has an availability issue, users can be “automatically and seamlessly” transferred to another.

Other companies can learn from this example and consider how a disruption could affect their businesses, Thawar says. At the very least, they should establish a solid backup plan. It is important to have a system that allows movement between models so that companies are not “super-tied” to a specific supplier.

Distillation is another important strategy. With distillation, a student model learns from a teacher model and typically specializes in a narrower task. The resulting small language models (SLMs) can be a better fit than the available general-purpose models in some circumstances. One example is Shopify’s flagship AI assistant, Sidekick, which performs numerous specialized subtasks so merchants can “take the hard work out” of their day-to-day lives.

Smaller distilled models can be faster and cheaper than more generalized models, Thawar says. In some cases they have proven to be twice as cheap and fast; in more extreme cases, 30 times cheaper and faster, he says. But “it’s not just about cost and latency, which are important; it’s about accuracy,” Thawar says.

Engineers feed the UDP, Shopify’s distillation pipeline, a teacher model, training data, evaluations, and a target model; say, Opus 4.8 distilled down to Qwen 3.5. The pipeline runs for about a day and then returns an evaluation showing what the tuned model actually achieved in speed, cost, and accuracy for that subtask. If the trade-off looks good, the engineer deploys it, with no approval process required. Shopify’s internal platform, Tangle, lets anyone watch the process as it runs.

Thawar says his “dream” is to not give the distillation pipeline any target model at all. Instead, users could provide the teacher model with data and evaluations and the directive: “Based on what we’ve learned over time, I want you to look at a different kind of model, different sizes, different types, and tell me what the correct distillation target is.”

“Maybe we’ll be surprised. Maybe it’s a model so small that it can work on a phone,” Thawar says. “Other times, maybe it comes back and says, ‘There’s no way to boil this down into anything better than what we have at the frontier.’”
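The speed, cost, and accuracy trade-off an engineer weighs after a distillation run can be sketched as a simple comparison. Everything here is an assumption for illustration: the `EvalResult` fields, the `compare` and `worth_shipping` helpers, the one-point accuracy tolerance, and the numbers, which are placeholders rather than Shopify results.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    accuracy: float           # fraction of evaluation cases passed, 0.0 to 1.0
    latency_ms: float         # mean latency per request
    cost_per_1k_calls: float  # any currency unit, as long as both models share it


def compare(teacher: EvalResult, student: EvalResult) -> dict[str, float]:
    """Summarize how the distilled student compares with its teacher."""
    return {
        "speedup": teacher.latency_ms / student.latency_ms,
        "cost_reduction": teacher.cost_per_1k_calls / student.cost_per_1k_calls,
        "accuracy_delta": student.accuracy - teacher.accuracy,
    }


def worth_shipping(report: dict[str, float], max_accuracy_drop: float = 0.01) -> bool:
    """Hypothetical rule: faster and cheaper, with accuracy within tolerance."""
    return (
        report["accuracy_delta"] >= -max_accuracy_drop
        and report["speedup"] > 1.0
        and report["cost_reduction"] > 1.0
    )


# Placeholder numbers chosen only to show the arithmetic.
teacher = EvalResult(accuracy=0.92, latency_ms=800.0, cost_per_1k_calls=40.0)
student = EvalResult(accuracy=0.92, latency_ms=400.0, cost_per_1k_calls=20.0)

report = compare(teacher, student)
print(report, worth_shipping(report))
```

Putting accuracy first in the decision rule mirrors Thawar's point that cost and latency matter, but only once accuracy holds up.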

Moving from “AI reflexivity” to “AI leverage”

Shopify users can apply any harness they want: Claude Code, Codex, Cursor, GitHub Copilot for VS Code. “We expose everyone to the different harnesses so they can get an idea of what may or may not work in their workflow.”

But the company also built a usage dashboard. This allows Thawar’s team to ask interesting questions not only about token spending, but also: Who uses the most expensive tokens? Who spends the most time reasoning? Which types of models are used, and in which disciplines and at which levels?

As for “tokenmaxxing,” Shopify has “circuit breakers” installed. If a user has a model running for a long time (say, 10 hours) and it consumes a lot of tokens, they will be asked: “Did you want to spend this?” As Thawar explains, sometimes the answer is “Oh, absolutely.” Other times it’s, “Wow, I didn’t know that was running in the background. I completely forgot. I’d rather stop it now.”

The ultimate goal, as Thawar describes it, is to move from “AI reflexivity” to “AI leverage” and get people to think deeply about where they can benefit most from AI in their workflows.

Listen to the full podcast to learn more about:

Shopify’s philosophy of building infrastructure before features. As Thawar says: “We have always built more infrastructure. We will always continue to build more infrastructure.”
How River, Shopify’s internal AI agent, creates an “information substrate” across the company.
How Thawar’s OpenClaw agent discovered from his schedule that he was traveling, and what that moment told him about where agents are really headed.
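The token “circuit breaker” described earlier, which asks a user to confirm a long-running, token-heavy session, could look roughly like the sketch below. The `Session` fields, the function names, and the 5,000,000-token threshold are assumptions made for illustration; only the 10-hour figure comes from Thawar's example.

```python
from dataclasses import dataclass


@dataclass
class Session:
    user: str
    hours_running: float
    tokens_used: int
    confirmed: bool = False  # set once the user says the spend is intended


def needs_confirmation(session: Session,
                       max_hours: float = 10.0,
                       max_tokens: int = 5_000_000) -> bool:
    """Trip the breaker for long-running, token-heavy sessions not yet confirmed."""
    if session.confirmed:
        return False
    return session.hours_running >= max_hours and session.tokens_used >= max_tokens


def apply_answer(session: Session, keep_running: bool) -> str:
    """Record the reply to 'Did you want to spend this?' and return the action."""
    if keep_running:
        session.confirmed = True  # do not ask again for this session
        return "continue"
    return "stop"


forgotten = Session(user="alice", hours_running=11.5, tokens_used=8_000_000)
quick = Session(user="bob", hours_running=0.5, tokens_used=8_000_000)

print(needs_confirmation(forgotten))
print(needs_confirmation(quick))
print(apply_answer(forgotten, keep_running=False))
```

The breaker asks rather than kills the job outright, which matches the two answers Thawar describes: some users confirm the spend, others had forgotten the job was running.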

You can also listen and subscribe to Beyond the Pilot on Spotify, Apple Podcasts or wherever you get your podcasts.
