Meta freezes AI data work after breach puts training secrets at risk



In brief: Meta suspended its collaboration with Mercor, a $10 billion AI data startup, after a supply chain attack exposed what could be the AI industry’s best-kept secrets: not just personal data, but also the training methodologies that power the world’s leading language models. The breach, carried out via a poisoned version of the LiteLLM open source library, prompted investigations at OpenAI and Anthropic and resulted in a class-action lawsuit covering more than 40,000 people.

When hackers poisoned a widely used open source library last month, they didn’t just steal personal data. According to a report from Wired, they may have come away with the blueprints for how some of the most powerful AI models in the world are built.

Meta has stopped its work with Mercor, a San Francisco-based AI data company that generates custom training data sets for the biggest names in AI, after a cyberattack exposed sensitive information about how Meta, and potentially several of Mercor’s other clients, actually train their models. The pause is indefinite, and the incident has caused a wave of anxiety in an industry that has spent billions developing proprietary methods it counted on keeping secret.

The startup behind the curtain

Mercor isn’t a household name, but it sits at a critical juncture in the AI economy. Founded in 2023 by Brendan Foody, Adarsh Hiremath, and Surya Midha, three Bay Area high school friends who competed together on Bellarmine College Prep’s speech and debate team, the company recruits networks of human contractors, engineers, lawyers, doctors, bankers, and journalists to produce high-quality, proprietary training data for AI labs. Its clients include Meta, OpenAI, Anthropic, and Google.

The startup’s rise has been extraordinary even by Silicon Valley standards. In October 2025, Mercor closed a $350 million Series C round that valued it at $10 billion, making the three founders the world’s youngest self-made billionaires at the age of 22. By September 2025, the company had reached $500 million in annualized revenue, up from $100 million just six months earlier. Its business model, which generates the tuning and reinforcement learning data that AI labs rely on but rarely discuss publicly, made it one of the most valuable private companies in the AI supply chain.

That same positioning is now the source of its vulnerability.

A poisoned package, a cascade of exposure

The attack that hit Mercor originated several steps upstream. According to an analysis by Wiz, Snyk, and Datadog Security Labs, a group of threat actors known as TeamPCP compromised the CI/CD process of LiteLLM, an open source Python library used by millions of developers to connect applications to AI services, with 97 million monthly downloads and an estimated presence in 36% of cloud environments.

TeamPCP had previously used a supply chain attack against Trivy, a widely used security scanner, to obtain credentials belonging to a LiteLLM maintainer. On March 27, 2026, the group used those credentials to publish two malicious versions of the LiteLLM package, 1.82.7 and 1.82.8, directly to PyPI, the Python package repository. The contaminated packages were available for approximately 40 minutes before being identified and removed.
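Teams worried they pulled one of the poisoned releases during that roughly 40-minute window could check the installed version against the two known-bad releases. The following is a minimal defensive sketch, assuming Python 3.8+ (`importlib.metadata`); the function and its name are illustrative, not part of any official advisory:

```python
# Defensive sketch: flag the two malicious LiteLLM releases named in the
# incident reports. The version numbers come from the reporting; everything
# else here is an illustrative assumption.
from importlib.metadata import version, PackageNotFoundError

MALICIOUS_LITELLM_VERSIONS = {"1.82.7", "1.82.8"}

def litellm_is_compromised(pkg: str = "litellm") -> bool:
    """Return True if the installed package is one of the poisoned releases."""
    try:
        return version(pkg) in MALICIOUS_LITELLM_VERSIONS
    except PackageNotFoundError:
        return False  # package not installed, so not exposed via this vector
```

Pinning dependencies to exact, hash-verified versions in a lockfile would have blocked both releases outright, since the poisoned packages were published under new version numbers.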

The payload was sophisticated. Version 1.82.7 embedded base64-encoded malware directly into the library’s proxy server code, which executed upon import. Version 1.82.8 used a malicious route configuration file that was automatically activated on each Python process startup. Both variants were designed to collect environment variables, API keys, SSH keys, cloud credentials for AWS, Google Cloud, and Azure, Kubernetes configurations, CI/CD secrets, and database credentials, exfiltrating everything to a server at models.litellm(.)cloud.
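The danger of import-time execution is easy to illustrate without reproducing the payload. The benign sketch below shows how much a module’s top-level code can read the moment it is imported; the credential prefixes listed are assumptions about commonly targeted variables, not the actual payload’s target list:

```python
# Benign illustration only: this is NOT the LiteLLM payload. It shows why
# top-level (import-time) code in a poisoned package is so effective at
# stealing credentials -- no function ever has to be called.
import os

# Assumed examples of prefixes that credential stealers commonly look for.
SENSITIVE_PREFIXES = ("AWS_", "GOOGLE_", "AZURE_", "OPENAI_", "ANTHROPIC_")

def harvestable_secrets(environ=os.environ):
    """Return the names of env vars a malicious import could trivially read."""
    return {k: "<redacted>" for k in environ if k.startswith(SENSITIVE_PREFIXES)}

# In a poisoned package this call would sit at module top level, so it runs
# as a side effect of `import litellm` in any process that loads the library.
```

Because Python executes a module’s top-level statements on first import, any application, CI job, or proxy server that merely loaded the poisoned version handed over its environment without invoking a single library function.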

Mercor, which confirmed it was “one of thousands of companies” affected by the attack, later discovered that the breach had exposed approximately four terabytes of data. According to court documents and claims by the hacking groups involved, the stolen cache includes 939 gigabytes of platform source code, a 211-gigabyte user database, and approximately three terabytes of video interview recordings and identity verification documents. The information exposed may include the full names and Social Security numbers of more than 40,000 current and former Mercor contractors and clients.

The secrets that matter most

The exposure of personal data would be worrying enough. But what alarmed Meta and caught the attention of other AI labs is a completely different category of information.

Because Mercor sits within the data pipelines of multiple AI companies simultaneously, the breach may have exposed details about the data selection criteria, labeling protocols, and training strategies that those companies have spent years and billions of dollars developing. Competitors can replicate a data set; replicating a training methodology is far harder, and it represents a real competitive moat. The Wired report notes that the scale of that potential exposure has led multiple AI labs to investigate what, precisely, may have left their control.

OpenAI, which also uses Mercor’s services, has said it is investigating the incident but has not stopped its current projects with the company. Anthropic, which raised $3 billion in early 2026 and has been aggressively expanding its research infrastructure, has not commented publicly on its exposure. Google, which operates similar relationships with competing data providers, is also understood to be assessing the scope of the breach.

The incident illustrates a structural risk that the AI industry has rarely had to confront: when multiple competitors rely on the same third-party data provider, a single breach can expose the competitive secrets of all of them at once.

Extortion and legal consequences

The Lapsus$ threat group, which had previously been linked to high-profile attacks against large corporations, subsequently claimed responsibility for the Mercor breach and began auctioning off the stolen data on dark web forums. Security researchers believe that Lapsus$ is acting in collaboration with TeamPCP, which has emerged as a systematic threat across the enterprise software and AI ecosystem. The same group is believed to be responsible for a wave of supply chain compromises that affected over 1,000 enterprise SaaS environments through the previous Trivy attack, including a European Commission breach attributed by CERT-EU to the same campaign.

On April 1, 2026, plaintiff Lisa Gill, a resident of Wahiawa, Hawaii, filed a class action lawsuit against Mercor.io Corp. in the United States District Court for the Northern District of California. The lawsuit alleges that Mercor failed to maintain adequate cybersecurity protections, leaving more than 40,000 people exposed to identity theft and fraud. The complaint states that the LiteLLM incident on March 27 was the entry point and that Mercor’s reliance on a compromised open source dependency without sufficient monitoring created the conditions for the breach.

Meta, meanwhile, has said nothing publicly, a silence that says it all. The company signed a $27 billion AI infrastructure deal with Nebius Group in March 2026 and has forecast capital expenditures of between $115 billion and $135 billion for the year, making its AI training portfolio one of its most strategically sensitive assets. Pausing a relationship with a data provider, even a major one, is the kind of decision made only when the risk to proprietary methodology outweighs the operational cost of stopping the work.

A warning for the AI supply chain

The Mercor breach is, in one sense, a conventional supply chain attack: a threat actor found a weak link in an open source dependency and exploited it to steal credentials and exfiltrate data. In another sense, it is something newer and more disturbing. The AI industry has built its most valuable intellectual property on an interconnected network of data providers, open source tools, and shared infrastructure, and that network now constitutes an attack surface that no company fully controls.

Security companies have been warning about precisely this dynamic. Aikido Security, which achieved unicorn status in January 2026, built its business on the premise that open source dependency risk had become existential for enterprise software. The Mercor incident suggests that the same logic applies, perhaps more acutely, to the AI training process.

For the three young founders who built one of the fastest-growing technology companies, the coming months will test whether Mercor’s extraordinary momentum can survive a breach that exposed not only its users’ data but also its customers’ best-kept secrets. The AI industry’s dizzying 2025 was built on the assumption that the infrastructure supporting it was secure enough to be trusted. That assumption is now under review.
