TL;DR: AI supply chain attacks compromise the dependencies, models, data pipelines, and tooling that AI applications consume rather than attacking the application itself. The attack surface is significantly larger than traditional software supply chain risk because it includes model weights, training data, vector databases, SaaS-embedded AI features, and the MCP plugin ecosystem, not just code packages. Three Q1 2026 incidents (LiteLLM, axios, Mercor) demonstrated a repeatable four-step methodology now being used against AI teams at scale. This guide covers the full attack surface, how each vector is exploited, what detection looks like for each, and the three-layer defense model that closes the structural gaps.
What is an AI supply chain attack?
A supply chain attack targets something your system trusts and consumes rather than attacking your system directly. In traditional software, this means compromising a library, package, or build tool. In AI applications, the supply chain is broader: it includes the packages your code depends on, the model weights you serve, the training data that shaped the model's behavior, the SaaS tools with embedded AI features, and the plugins or MCP servers your agents connect to.
The risk profile is different at each layer. A compromised code package can harvest credentials during a build. A compromised model weight can introduce a backdoor that only activates on specific inputs. Standard accuracy testing will not catch it. Poisoned training data can shift a model's behavior in ways that persist through fine-tuning cycles. Each attack surface requires a different detection and mitigation approach.
What all AI supply chain attacks share: they exploit implicit trust. Your pipeline trusts a package because a known maintainer published it. Your inference server trusts a model weight because it came from an authorized registry. Your agent trusts an MCP server's tool responses because the server is listed in your configuration. Attackers exploit that trust at the layer where it is hardest to verify.
The five attack surfaces
1. Package and library dependencies
This is the most active attack surface in 2026. AI applications depend heavily on Python packages (PyPI) and JavaScript packages (npm) that are maintained by small teams, often by individual contributors with limited account security. Compromise a maintainer account, publish a malicious version, and every organization that runs a build with that dependency is a potential victim.
Three incidents from Q1 2026 demonstrate how far this has progressed. Attackers compromised a PyPI contributor to LiteLLM and published versions 1.82.7 and 1.82.8 with a three-stage credential-harvesting backdoor. Separately, North Korean-attributed threat actors compromised the npm account of a core axios maintainer and published two malicious versions carrying a cross-platform RAT that infected the first endpoint within 89 seconds of publication. In both cases, the malicious payload used anti-forensic techniques (self-deletion, manifest restoration) to appear clean on post-mortem inspection. The detailed attack pattern across all three incidents is covered in Account Compromise, Phantom Dependency, No Forensic Trail.
Software composition analysis (SCA) tools check dependencies against databases of known-malicious signatures and published CVEs. A phantom dependency published on the day of the attack has neither: no CVE, no known-bad history. Detection requires version drift monitoring and behavioral sandbox testing, not signature matching.
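As a minimal sketch of what version drift monitoring means in practice, the check below compares installed packages against a pinned allowlist. The package names and version numbers are illustrative placeholders, not real findings:

```python
# Hypothetical sketch: flag any package whose installed version differs
# from a pinned, reviewed allowlist. Names/versions here are illustrative.
PINNED = {
    "litellm": "1.82.6",
    "some-internal-lib": "0.9.4",
}

def detect_drift(installed: dict[str, str]) -> list[str]:
    """Return alerts for packages that drifted from the pinned version
    or were never pinned at all."""
    alerts = []
    for name, version in installed.items():
        pinned = PINNED.get(name)
        if pinned is None:
            alerts.append(f"UNPINNED: {name}=={version} has no approved version")
        elif version != pinned:
            alerts.append(f"DRIFT: {name} pinned {pinned} but found {version}")
    return alerts

# A build that pulled a just-published malicious version trips the check:
print(detect_drift({"litellm": "1.82.7"}))
```

The point is that the alert fires on the change itself, during the publication window, with no signature database required.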
2. Model weights and registries
Model weights are large binary files that most organizations treat like trusted build artifacts: downloaded once, deployed, and not scrutinized further. Malicious actors can introduce backdoors into model weights in two ways. They can compromise the registry account of a legitimate model publisher (the same account-compromise vector used in package attacks), or they can publish convincingly named models designed to be mistaken for legitimate releases.
A model backdoor, sometimes called a Trojan or sleeper, is a modification to the weight file that causes the model to behave normally on standard inputs but produce attacker-controlled outputs when a specific trigger is present. Research published at NeurIPS demonstrated that backdoors can be injected into neural networks such that they are undetectable through standard accuracy evaluation. The backdoor only activates when the trigger token or pattern appears in the input.
Hugging Face hosts hundreds of thousands of models from a large contributor base. The same account security posture that makes npm maintainer accounts vulnerable applies here. A compromised Hugging Face publisher account can push a malicious version of a widely used model without triggering automated alerts.
To address this: verify model provenance before deployment, check cryptographic hashes against those published by the original research team, and run behavioral testing with adversarial inputs before serving a new or updated model weight in production.
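The hash-verification step can be as simple as the sketch below: stream the downloaded weight file and compare its SHA-256 digest to the one published by the model team. The expected digest would come from the publisher's release notes or a signed manifest:

```python
# Minimal sketch: verify a downloaded weight file against a published
# SHA-256 digest before it is allowed into the serving environment.
import hashlib

def verify_weights(path: str, expected_sha256: str) -> bool:
    """Stream the file in 1 MiB chunks and compare its SHA-256 digest
    to the published hash. Reject the deployment on mismatch."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256
```

Streaming matters because weight files routinely run to tens of gigabytes; reading them whole into memory is not practical.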
3. Training data and RAG pipelines
Training data attacks and RAG poisoning operate at an earlier point in the AI supply chain than package or model attacks, but the effects are more persistent. Data poisoned before or during training shapes the model's learned behavior in ways that survive fine-tuning cycles. A backdoor introduced at training time does not disappear when you update your application code or rotate your API keys.
Research from Carnegie Mellon University demonstrated that clean-label poisoning attacks, where training samples are poisoned without altering their labels, can achieve attack success rates above 90% while remaining undetectable through data quality audits. The poisoned samples appear correct to human reviewers and pass automated validation.
RAG poisoning is a more immediate risk for most production teams. Retrieval-augmented generation systems pull context from document stores, knowledge bases, or vector databases at inference time. An attacker who can write to those data sources, via a compromised integration, a document upload feature, or a misconfigured permission boundary, can inject adversarial content that manipulates model outputs for specific queries. Our RAG security guide covers the injection attack chain in detail.
To address this: treat training data and RAG document stores as integrity-critical systems, implement write controls and audit logs on your vector databases, and test your RAG pipeline against adversarial document injection before deployment.
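One piece of that testing can be an ingest-time screen that flags documents carrying common adversarial-instruction phrasing before they reach the vector store. The patterns below are a small illustrative sample; a production filter needs far broader and continuously updated coverage:

```python
# Hedged sketch: screen documents for adversarial-instruction patterns
# before writing them to a RAG store. The pattern list is illustrative
# only; real coverage must be much broader.
import re

INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"you are now",
    r"disregard (the|your) system prompt",
    r"do not (tell|reveal|mention)",
]

def screen_document(text: str) -> list[str]:
    """Return every pattern matched in the document, for audit logging
    and quarantine decisions."""
    return [p for p in INJECTION_PATTERNS if re.search(p, text, re.IGNORECASE)]
```

A non-empty result routes the document to quarantine and the audit log rather than the index; an empty result is necessary but not sufficient, which is why the write controls and audit trail above still matter.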
4. SaaS-embedded AI features
Many organizations have AI supply chain exposure they are not tracking. SaaS products across project management, CRM, HR, communication, and productivity categories have added AI features without any announcement or opt-in. These features may process sensitive data, access integrated systems, or take actions on the user's behalf.
This creates a shadow AI supply chain problem. Your approved software vendor updates their product to include an AI assistant that reads your Slack messages to generate summaries. Your project management tool adds an AI feature that can create tickets by reading your email. None of this goes through your procurement process because it is packaged as a feature update, not a new AI tool adoption.
The risk is not theoretical. A SaaS AI feature with access to sensitive data that is served by a compromised model, or that communicates with an attacker-controlled endpoint, represents a supply chain attack vector entirely outside your standard software inventory. Repello AI's AI Inventory discovers these integrations at the infrastructure level, surfacing AI features your organization consumes without explicit approval.
To address this: audit your SaaS vendor contracts and product changelogs for AI feature additions, review data access permissions granted to AI features, and include AI feature updates in your change management process.
5. MCP servers and agent plugins
The MCP (Model Context Protocol) ecosystem introduces a supply chain risk specific to agentic AI. When an AI agent connects to an MCP server, it trusts the tool descriptions and responses that server provides. A malicious or compromised MCP server can send tool responses containing prompt injection payloads that redirect the agent's behavior, exfiltrate data, or cause the agent to take unintended actions.
The plugin and skill ecosystem for agent frameworks (OpenClaw, LangChain tools, CrewAI integrations) carries the same risk. A malicious plugin published to a public marketplace can contain instructions designed to activate when the agent encounters specific conditions. Unlike a code package that executes in a build environment, a malicious MCP server or plugin influences agent behavior at inference time, reaching every user of that agent deployment. Our MCP security guide and MCP security checklist cover this attack surface in full. Repello AI's MCP Gateway provides real-time monitoring and control of MCP connections at the enterprise level.
To address this: treat MCP server and plugin additions with the same scrutiny as code dependencies, verify publisher identity, and monitor tool response content for prompt injection patterns at runtime.
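One concrete way to apply that scrutiny, sketched below under the assumption that tool descriptions are plain JSON-serializable dicts: fingerprint each tool's description at approval time and re-check the fingerprint on every session. This catches a server that silently rewrites its tool descriptions after review (a "rug pull"):

```python
# Illustrative sketch: pin a hash of each MCP tool description at review
# time and verify it on every connection. Detects post-approval changes
# to tool names, descriptions, or schemas.
import hashlib
import json

APPROVED: dict[str, str] = {}

def tool_fingerprint(tool: dict) -> str:
    """Stable hash over the tool's full description dict."""
    canonical = json.dumps(tool, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

def approve(tool: dict) -> None:
    """Record the fingerprint of a reviewed tool."""
    APPROVED[tool["name"]] = tool_fingerprint(tool)

def check(tool: dict) -> bool:
    """True only if the tool exactly matches its reviewed version."""
    return APPROVED.get(tool["name"]) == tool_fingerprint(tool)
```

A failed check should block the connection and page a human, since a changed description is exactly where injection payloads get smuggled in.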
How the best attacks avoid detection
Across all five attack surfaces, the most sophisticated AI supply chain attacks share three evasion characteristics.
The malicious component is signed or published by a trusted identity: a legitimate maintainer account, an authorized registry publisher, or a known MCP server. Standard security tooling checks what was installed, not whether the publishing identity was compromised.
The payload looks normal to static analysis. A phantom dependency with no CVE history passes signature scanning. A model weight with a backdoor passes accuracy evaluation. A RAG document that injects instructions passes content filters unless the filter specifically understands adversarial instruction formatting.
The attack cleans up after itself. The axios dropper deleted itself and restored the package manifest. LiteLLM's backdoor stages were designed to avoid persistence artifacts where not needed. A post-mortem inspection of the affected system returns a false negative.
These three characteristics define why traditional security controls (vulnerability scanners, SAST tools, antivirus) provide inadequate coverage for AI supply chain risk. They were designed to find known-bad artifacts. AI supply chain attacks are designed to look like known-good ones.
The three-layer defense model
Closing the AI supply chain attack surface requires controls at three points in the delivery chain: before a component is consumed, before it executes in production, and while it is running.
Layer 1: Inventory and version monitoring
You cannot defend what you cannot see. The first control is a complete, continuously updated map of every AI component your organization consumes: code packages, model weights, data integrations, SaaS AI features, and MCP connections.
For code dependencies, this means tracking exact version state across your AI applications and alerting on unexpected version changes. This is the control that would have caught the LiteLLM and axios attacks during their publication window rather than after execution. For models, it means maintaining a registry of approved weight files with cryptographic hashes. For SaaS AI features, it means discovering what your vendors have deployed into your data environment.
Repello AI's AI Inventory provides this visibility across all five attack surfaces, generating a continuously updated AI Bill of Materials with automated alerts on version drift, new integration discovery, and dependency graph changes.
Layer 2: Pre-production behavioral testing
Inventory tells you what changed. Behavioral testing tells you what the change does. Before a new or updated AI component reaches your build pipeline or production inference stack, it should be executed in an instrumented environment that observes its actual behavior: network connections, file operations, data access patterns, model outputs under adversarial inputs.
This is the control that static analysis and signature scanning cannot provide. A package that beacons to a C2 server will reveal itself under network observation. A model weight with a backdoor will reveal itself under targeted adversarial probing. A RAG document with adversarial instructions will reveal itself when the pipeline is tested against injection payloads.
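To make the network-observation idea concrete, here is a deliberately simplified sketch that wraps `socket.create_connection` while exercising a suspect component and records every endpoint it tries to reach. A real sandbox isolates at the OS and network layer; this only illustrates the observation principle:

```python
# Hedged sketch: record outbound connection attempts made by a suspect
# callable. A production sandbox enforces this at the OS/network layer;
# this in-process wrapper just demonstrates the idea.
import socket

def observe_connections(fn):
    """Run fn() with socket.create_connection wrapped. Return the list
    of (host, port) addresses the code attempted to reach; each attempt
    is blocked rather than allowed out."""
    attempted = []
    original = socket.create_connection

    def wrapped(address, *args, **kwargs):
        attempted.append(address)
        raise ConnectionRefusedError("blocked by sandbox observer")

    socket.create_connection = wrapped
    try:
        fn()
    except ConnectionRefusedError:
        pass
    finally:
        socket.create_connection = original
    return attempted
```

A clean component produces an empty list; a package that beacons to a C2 endpoint produces evidence on the first run, regardless of whether any signature exists for it.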
ARTEMIS provides automated adversarial testing across AI components, including supply chain testing for new dependencies and model weights, RAG pipeline injection testing, and continuous regression testing after updates. The coverage maps to OWASP LLM Top 10, NIST AI RMF, and MITRE ATLAS threat categories.
Layer 3: Runtime monitoring and blocking
Even with inventory and pre-production testing, the runtime layer is necessary because the attack surface changes continuously. New dependencies are added. Models are updated. MCP servers return responses that were not present during pre-production testing. An attacker who has compromised a trusted component after your last test cycle will only be caught at runtime.
ARGUS monitors AI application behavior in production, blocking malicious inputs in under 100ms, detecting anomalous outbound connections, and enforcing behavioral policies derived from pre-production red teaming results. For MCP specifically, Repello AI's MCP Gateway provides real-time monitoring and control of every MCP connection, with the ability to block malicious servers and enforce custom security policies across enterprise deployments.
Building your AI supply chain security program
Most organizations are at the beginning of this work. A practical starting sequence:
Start with inventory. You need to know what you are consuming before you can test or monitor it. Run an AI asset discovery exercise across your codebase, your SaaS vendor stack, and your agent configurations. Document every AI component, its source, its version, and its data access scope. AI Inventory automates this process and keeps it current.
Then close the pre-production gap. Identify which components in your AI stack are being deployed without behavioral verification. Prioritize based on data access scope and deployment reach. Packages with broad installation and access to production credentials should be tested first. ARTEMIS can be integrated directly into your CI/CD pipeline to run behavioral checks before deployment.
Then instrument production. Deploy runtime monitoring for your highest-risk AI application deployments. This gives you a detection backstop for the components that were already in production before you started this program and for any attacks that reach execution despite upstream controls. ARGUS deploys as a drop-in runtime layer with no code changes required.
If you want to understand your current exposure before starting that sequence, book a demo with Repello AI; the assessment maps your AI pipeline attack surface against the five vectors covered in this guide.
FAQ
What is an AI supply chain attack?
An AI supply chain attack compromises a dependency, model, data source, or tool that your AI application consumes rather than attacking your application directly. The attacker exploits the trust your system extends to components it installs, downloads, or integrates, inserting malicious behavior at a point where standard perimeter and application security controls do not apply.
How is AI supply chain risk different from traditional software supply chain risk?
Traditional software supply chain risk centers on code packages and build tools. AI supply chain risk adds model weights, training data, vector database content, SaaS-embedded AI features, and the MCP plugin ecosystem. Model backdoors and training data poisoning have no equivalent in traditional software; they allow attackers to shape system behavior at training time in ways that persist through application updates. The detection gap is also wider: model backdoors and data poisoning do not produce the artifact signatures that traditional security scanners look for.
What does an AI Bill of Materials include?
An AI BOM documents every AI component an organization consumes: code packages and their versions, model weights with cryptographic hashes and provenance, training data sources, SaaS AI feature integrations, and MCP server/plugin configurations. It is the foundation of AI supply chain security: you cannot monitor, test, or enforce controls on components you have not inventoried. See our AI Bill of Materials guide for the full framework.
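As an illustration of the record shape, one AI BOM entry might look like the sketch below. The field names are assumptions for this example, not a formal standard (formats such as CycloneDX define machine-readable BOM profiles):

```python
# Illustrative AI BOM entry. Field names are assumptions, not a standard.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AIBOMEntry:
    component: str                   # e.g. a package or model identifier
    kind: str                        # "package" | "model" | "dataset" | "saas_ai" | "mcp_server"
    version: str                     # exact pinned version or revision
    source: str                      # registry, vendor, or repo URL
    sha256: Optional[str] = None     # artifact digest, where applicable
    data_access: list[str] = field(default_factory=list)  # scopes the component can reach
```

The `data_access` field is what makes the inventory actionable for prioritization: components with production-credential or PII scope get tested and monitored first.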
How do attackers compromise open source AI packages?
The primary vector is maintainer account compromise rather than vulnerability exploitation in the package code itself. Attackers obtain the credentials of a legitimate package contributor through phishing, credential stuffing, or session token theft, then publish a malicious version under that trusted identity. Standard scanner tools check what was published, not whether the publisher account was compromised. Version drift monitoring and behavioral testing close this gap.
What frameworks and standards apply to AI supply chain security?
MITRE ATLAS documents AI-specific supply chain attack techniques under AML.T0010 (ML Supply Chain Compromise). OWASP LLM Top 10 covers supply chain vulnerabilities under LLM05. NIST AI RMF addresses supply chain risk in its Govern and Map functions. The EU AI Act imposes supply chain transparency requirements on high-risk AI systems, including traceability of training data and model provenance. See our MITRE ATLAS framework guide for the full technique mapping.
Can we use open source tools to cover AI supply chain security?
Open source tools like Agent Wiz can help with threat modeling AI agent systems and mapping attack paths across multi-agent architectures. For package-level behavioral testing, tools like Garak provide some coverage of model-level adversarial inputs. However, open source tooling does not provide continuous version drift monitoring, automated CI/CD integration for pre-production behavioral testing, or production runtime blocking: the three controls required to close the attack pattern demonstrated in LiteLLM, axios, and Mercor.