
From PyPI to 4TB: How Lapsus$ Breached Mercor Through a Python Package


Aaryan Bhujang, AI security researcher


11 min read


TL;DR

  • On March 31, 2026, extortion group Lapsus$ claimed a 4TB breach of Mercor, a $10 billion AI recruiting startup that works with OpenAI and Anthropic.

  • The entry point was not Mercor's own code. It was LiteLLM versions 1.82.7 and 1.82.8, where a threat actor called TeamPCP had compromised the library's PyPI publishing credentials and injected a three-stage malicious backdoor.

  • The backdoor harvested credentials and established persistent system access. Lapsus$ claims it used the resulting access to breach Mercor's Tailscale VPN, exfiltrating 939GB of source code, 211GB of database records, and nearly 3TB of files including candidate video interviews, KYC documents, and passports.

  • Mercor confirmed it was "one of thousands of companies" affected. This is what downstream blast radius looks like when a widely-used AI library is compromised at the package registry level.

Mercor is not a careless company. It is a well-funded, operationally mature AI startup valued at $10 billion that counts OpenAI and Anthropic among its clients and processes over $2 million in contractor payouts daily. It got breached anyway, through a Python package it imported, published by a maintainer whose PyPI credentials had been stolen by someone else entirely.

This is the Mercor breach. And the first thing to understand about it is that Mercor was not the target. It was the downstream consequence.

The attack started with LiteLLM, the open-source AI routing library used to standardize LLM API calls across models. LiteLLM is downloaded millions of times per day. It sits in the dependency tree of a significant fraction of the AI applications in production. When a threat actor called TeamPCP compromised LiteLLM's PyPI publishing credentials in late March 2026 and pushed malicious versions 1.82.7 and 1.82.8, they did not need to find Mercor. They waited for Mercor, and thousands of other organizations, to pull the package themselves.

The attack did not breach Mercor's perimeter. It walked through the front door with a credential Mercor handed it.

How the attack actually worked

The LiteLLM backdoor was not a simple credential stealer bolted onto a popular package. TeamPCP injected a three-stage malicious payload into versions 1.82.7 and 1.82.8 of the LiteLLM PyPI package. The three-stage architecture is worth understanding because each stage served a distinct purpose.

Stage one executed on installation: the package's setup routines ran attacker-controlled code the moment any system installed a vulnerable version with pip install litellm. This stage fingerprinted the host, collecting environment information and configuration details for the stages that followed.
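To make the install-time execution concrete, here is a benign, minimal sketch of the mechanism: any module-level code in a source distribution's setup.py runs during pip's build step, before the package is ever imported. The package context and the fingerprint fields below are illustrative assumptions, not the actual payload.

```python
# setup.py of a hypothetical source distribution -- illustrative only.
# pip executes this file while building the package, so any code at
# module level runs with the installing user's privileges.
import os
import platform
import socket

def collect_environment_fingerprint():
    """Benign stand-in for stage one: gather basic host details the way
    an install-time payload could, without sending anything anywhere."""
    return {
        "host": socket.gethostname(),
        "os": platform.system(),
        "python": platform.python_version(),
        "cwd": os.getcwd(),
    }

# A malicious setup.py would exfiltrate this dict; here it is only
# built locally to show what install-time code can see.
fingerprint = collect_environment_fingerprint()
```

The same mechanism exists through custom setuptools install commands; either way, installing a source distribution means running its code.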

Stage two harvested credentials: API keys, environment variables, cloud provider credentials, and service tokens accessible from the execution context. In the context of an AI application, this is an unusually rich harvest. AI applications are credential-dense environments: they hold LLM provider API keys, cloud storage credentials, database connection strings, vector store access tokens, and in enterprise environments, internal service authentication material.
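A sketch of the environment surface stage two reads: the variable names and pattern below are assumptions for illustration, but the point stands that a compromised library scans the same process environment the legitimate library runs in. Only names are collected here, never values.

```python
import os
import re

# Benign sketch of the kind of pattern match a stage-two harvester
# applies to the process environment. We only report matching *names*.
SECRET_PATTERN = re.compile(
    r"(API_KEY|SECRET|TOKEN|PASSWORD|CREDENTIAL|CONN(ECTION)?_STR)", re.I
)

def find_secretlike_env_vars(environ=os.environ):
    """Return names (not values) of environment variables that look like
    credentials -- the same surface a compromised library can read."""
    return sorted(name for name in environ if SECRET_PATTERN.search(name))

# Example: a typical AI service environment (hypothetical names).
demo_env = {
    "OPENAI_API_KEY": "sk-...",
    "AWS_SECRET_ACCESS_KEY": "...",
    "DATABASE_CONNECTION_STRING": "postgres://...",
    "PATH": "/usr/bin",
}
print(find_secretlike_env_vars(demo_env))
# -> ['AWS_SECRET_ACCESS_KEY', 'DATABASE_CONNECTION_STRING', 'OPENAI_API_KEY']
```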

Stage three established persistence: the backdoor created mechanisms for ongoing access that survived package updates and service restarts. This is the stage that converted a one-time credential theft into a durable foothold in affected environments.

The malicious code was identified and removed within hours. But for organizations that had already installed the compromised version, stage three had already run. The foothold was already established.
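For teams triaging whether a stage-three foothold was left behind, one starting heuristic is scanning user-level persistence locations for unexpected references to package paths. The file list and keyword below are illustrative assumptions, not indicators from this incident:

```python
from pathlib import Path

# Hypothetical shortlist of user-level persistence files; a real
# responder playbook also covers cron, systemd units, and launch agents.
PERSISTENCE_FILES = [".bashrc", ".profile", ".zshrc"]

def scan_for_persistence(home, keyword="site-packages"):
    """Return paths of persistence files that mention the keyword --
    e.g. a shell profile that now launches code from site-packages."""
    hits = []
    for rel in PERSISTENCE_FILES:
        path = Path(home) / rel
        if path.is_file() and keyword in path.read_text(errors="ignore"):
            hits.append(str(path))
    return hits
```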

What Lapsus$ claims they took from Mercor

Lapsus$ published their claimed haul on a dark web leak site, with data listed for live auction. The breakdown as claimed:

939GB of platform source code. Mercor's internal codebase, covering the platform that manages contractor matching, AI model training workflows, and payout processing.

211GB of database records. The candidate database: resumes, professional credentials, and personal data for the specialized domain experts (scientists, doctors, lawyers) that Mercor recruits globally across markets including India.

Nearly 3TB of stored files. This is the category with the most significant downstream implications. The stored files reportedly include video recordings of candidate interviews conducted through Mercor's platform, identity verification documents, and passports submitted as part of KYC processes.

Video interviews containing face and voice data from candidates who submitted biometric information in the course of a job application represent a specific and severe category of exposure. These are not password resets. They are not rotatable. The candidates whose interview recordings and identity documents are in that 3TB dataset cannot re-issue their face or revoke their passport.

Mercor confirmed the breach and stated it was conducting an investigation with third-party forensics experts. The company declined to confirm whether Lapsus$'s specific claims about the data volume or categories were accurate.

Why this breach pattern is accelerating

This is not an isolated incident. The Mercor breach occurred on the same day as the Claude Code source code leak. Both happened in the last week of March 2026. Both originated in the AI toolchain. Both affected organizations that had no indication anything was wrong until after the fact.

The pattern connecting them is the same one that has defined AI supply chain risk since large-scale AI adoption began: organizations are building on dependencies they have not evaluated, running libraries they cannot inspect at runtime, and integrating tools whose behavior they cannot monitor. The AI application stack is a trust chain that most organizations have not audited, and threat actors have noticed.

LiteLLM is downloaded millions of times per day according to Snyk's analysis of the incident. A credential compromise at the PyPI level is not a targeted attack on any single organization. It is a broadcast attack against every organization that pulls the package. The organizations with runtime visibility into their AI dependencies detected the anomaly. The ones without visibility did not.

Mercor's statement that it was "one of thousands of companies" affected is accurate and, in a different framing, is the most important sentence in the entire incident. Thousands of organizations had the same package in their stack. Most of them still do not know what happened inside their environment.

What your organization should be doing now

Know what AI libraries are in your stack before an attacker uses them

The Mercor breach entered through a dependency, not a direct attack. The first control that would have changed the outcome is continuous visibility into which AI packages are running in your environment and at which versions.

This is not a solved problem with standard SCA (software composition analysis) tooling. Standard SCA tools catch known CVEs in dependencies. The LiteLLM backdoor was not a CVE; it was a malicious package version that existed for hours before being pulled. What catches it is behavioral monitoring: detecting anomalous network calls, unexpected credential access patterns, or new persistent processes spawned by packages that have no legitimate reason to spawn them.

Repello's AI Asset Inventory provides continuous discovery of AI libraries, packages, and integrations across your environment, flagging version changes and new AI dependency additions as they occur. If your LiteLLM version had changed to 1.82.7 or 1.82.8, it would have appeared as a new inventory event for your security team to review.
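A minimal stand-in for this kind of inventory check can be written with the standard library alone. The blocklist structure and helper name are assumptions; the versions are the ones named in this incident:

```python
from importlib import metadata

# Hypothetical blocklist of known-compromised releases; the LiteLLM
# versions from this incident are the example entries.
KNOWN_BAD = {"litellm": {"1.82.7", "1.82.8"}}

def audit_installed_packages(known_bad=KNOWN_BAD):
    """Flag any installed distribution whose exact version appears in
    the known-bad set -- a minimal sketch of dependency inventory."""
    findings = []
    for dist in metadata.distributions():
        name = (dist.metadata["Name"] or "").lower()
        if dist.version in known_bad.get(name, set()):
            findings.append((name, dist.version))
    return findings

if __name__ == "__main__":
    for name, version in audit_installed_packages():
        print(f"ALERT: {name}=={version} is a known-compromised release")
```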

Test AI library integrations for supply chain vulnerabilities before they reach production

The LiteLLM backdoor executed on installation. A pre-production testing step that runs new or updated AI library versions in an isolated environment before deploying to production would have contained the stage one execution to a non-production context.
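One way to sketch that isolation step with the standard library: build a throwaway virtualenv, install the exact pinned release into it, and observe behavior there before the version is allowed anywhere near production. The helper name and two-command plan are assumptions, not a complete gate:

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

def build_sandbox_install_plan(package, version, sandbox_dir):
    """Return the commands that install one pinned release into a
    throwaway virtualenv -- the isolation step described above."""
    venv_dir = Path(sandbox_dir) / "venv"
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    python = venv_dir / bin_dir / "python"
    return [
        [sys.executable, "-m", "venv", str(venv_dir)],
        [str(python), "-m", "pip", "install", "--no-cache-dir",
         f"{package}=={version}"],
    ]

if __name__ == "__main__":
    # Observe the install's network and file activity inside the sandbox
    # before promoting the version (the monitoring itself is not shown).
    with tempfile.TemporaryDirectory() as tmp:
        for cmd in build_sandbox_install_plan("litellm", "1.82.6", tmp):
            subprocess.run(cmd, check=True)
```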

Repello's ARTEMIS tests AI integrations against known supply chain attack patterns, including credential harvesting payloads, persistent access mechanisms, and anomalous network behavior, before they reach production workloads. Running ARTEMIS against new AI library versions as part of your CI/CD pipeline converts supply chain testing from a reactive investigation into a proactive gate.

Apply runtime monitoring to detect credential harvesting in progress

The three-stage backdoor's stage two ran in the execution context of the LiteLLM library, with access to the same environment variables, API keys, and service credentials that legitimate LiteLLM usage accessed. No firewall rule would have flagged this; the network calls were coming from a process that had every right to be running.

What flags it is behavioral anomaly detection at the runtime layer: unexpected API calls to external endpoints, credential access patterns inconsistent with the library's documented function, or process spawning behavior outside the library's normal execution profile. Repello's ARGUS monitors AI runtime behavior continuously, establishing baselines for normal operation and alerting on deviations. A stage two credential harvest generates exactly the kind of anomalous credential access pattern that ARGUS is designed to catch.
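At the interpreter level, CPython's audit hooks (PEP 578) give a rough picture of what this kind of behavioral monitoring watches for, without any external tooling. This is a sketch of the idea, not a substitute for a real runtime monitor:

```python
import sys

# Events recorded for later review; a real monitor would compare them
# against a baseline of normal library behavior and alert on deviations.
observed_events = []

def security_audit_hook(event, args):
    # Flag activity an LLM routing library rarely needs: spawning
    # processes or opening raw network connections.
    if event in ("subprocess.Popen", "socket.connect"):
        observed_events.append(event)

sys.addaudithook(security_audit_hook)

# From here on, any subprocess spawn or socket connect in this process
# is recorded -- including ones made from inside imported libraries.
```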

The three incidents, one pattern

March 2026 produced three major AI security incidents in rapid succession: the LiteLLM backdoor, the Claude Code source code exposure, and the Mercor breach. Each is distinct in mechanism, but they share a common root: organizations building on AI tooling they cannot see into, cannot test systematically, and cannot monitor at runtime.

The LiteLLM supply chain attack was the injection point. The Mercor breach is what downstream impact looks like. The Claude Code leak is what happens when the tooling itself becomes a roadmap for attackers. In each case, the organizations with the most exposure were the ones who had the least visibility into their AI stack.

This is not a coincidence. It is the predictable consequence of AI adoption outpacing the security practices that govern it. AI supply chain security is not a future concern. It is the active threat category producing the largest confirmed incidents in AI security right now.

Frequently asked questions

What was the Mercor data breach?

Mercor, a $10 billion AI recruiting startup, confirmed a security incident on March 31, 2026, resulting from a supply chain attack on the LiteLLM open-source library. Extortion group Lapsus$ claimed responsibility and alleged exfiltration of 4TB of data including source code, candidate database records, and video interviews containing biometric data. The attack originated when threat actor TeamPCP compromised LiteLLM's PyPI publishing credentials and injected a three-stage malicious backdoor into versions 1.82.7 and 1.82.8.

What is the connection between the Mercor breach and LiteLLM?

LiteLLM is an open-source Python library for standardizing LLM API calls, downloaded millions of times per day. In late March 2026, a threat actor called TeamPCP compromised the library's PyPI publishing credentials and pushed two malicious versions containing a backdoor designed to harvest credentials and establish persistent access. Mercor, like thousands of other organizations, used LiteLLM. Importing the compromised version executed the backdoor in Mercor's environment, which Lapsus$ reportedly used to access Mercor's Tailscale VPN and exfiltrate data.

Who is Lapsus$?

Lapsus$ is an extortion-focused cybercrime group with a history of high-profile breaches of major technology companies. The group typically uses stolen credentials and social engineering to gain access, then publishes or auctions stolen data to pressure victims into payment. Their claimed involvement in the Mercor breach follows a pattern of targeting organizations with access to large volumes of sensitive personal and intellectual property data.

Was Mercor the only company affected by the LiteLLM backdoor?

No. Mercor stated it was "one of thousands of companies" affected. LiteLLM's download volume (millions of times per day) means the attack surface of the compromised versions was enormous. Most affected organizations have not publicly disclosed any impact, and in many cases may not yet know the full extent of what occurred in their environments during the window when the malicious versions were available.

What data is most at risk from AI supply chain attacks like this?

AI applications are credential-dense: they hold LLM API keys, cloud storage credentials, database connection strings, vector store tokens, and in enterprise environments, internal service authentication material. Beyond credentials, the data that AI applications process is often highly sensitive: training datasets, model fine-tuning data, user inputs to production AI systems, and in Mercor's case, candidate personal data including video interviews and identity documents. AI supply chain attacks that reach execution context have access to all of it.

How can organizations protect against AI supply chain attacks?

Three controls address the attack chain that produced the Mercor breach: continuous AI dependency inventory (detecting version changes before they reach production), pre-production behavioral testing of AI library updates (running new versions in isolated environments before deployment), and runtime monitoring for credential harvesting and anomalous behavior from AI library execution contexts. These controls need to be purpose-built for the AI context; standard SCA tools and network firewalls do not cover the specific mechanisms this attack used.

The Mercor breach is a case study in downstream supply chain blast radius. The direct target was LiteLLM's PyPI credentials. The downstream consequence was 4TB of data from a $10 billion company whose own security practices were not the point of failure.

Every organization running AI applications on open-source libraries has the same exposure surface. The ones who know what is in their stack, test it before it reaches production, and monitor it at runtime are the ones who will catch the next variant before it becomes a breach notification.

To see how Repello's AI Asset Inventory, ARTEMIS, and ARGUS work together to close the visibility gap that made the Mercor breach possible, book a demo or visit repello.ai/product.

© Repello Inc. All rights reserved.
