Back to all blogs

ML Model Security vs. LLM Security: What's the Difference and Why You Need Both

ML Model Security vs. LLM Security: What's the Difference and Why You Need Both

Aryaman Behera

Aryaman Behera

|

Co-Founder, CEO

Co-Founder, CEO

Feb 20, 2026

|

7 min read

Emoji Prompt Injection: Why Your LLM's Guardrails Are Blind to It
Emoji Prompt Injection: Why Your LLM's Guardrails Are Blind to It
Emoji Prompt Injection: Why Your LLM's Guardrails Are Blind to It
Repello tech background with grid pattern symbolizing AI security

-

Understand the difference between ML model security and LLM security, how their threat surfaces interact, and why a complete AI security program must cover both layers.

TL;DR

  • ML model security targets the model as an artifact: training data poisoning, supply chain risks, and inference-time attacks like adversarial examples and model inversion.

  • LLM security targets the application layer: prompt injection, jailbreaking, RAG exfiltration, and agentic tool abuse.

  • The two threat surfaces compound. Gaps in one make the other more exploitable.

  • Most organizations are focused on LLM security while classical ML models sit in a blind spot owned by neither the security team nor the data science team.

  • A complete program starts with an AI Bill of Materials (AI-BOM), red teams both layers, and deploys runtime monitoring that covers the full inference stack.

Your security team spent the last six months getting serious about LLM security. You've deployed runtime guardrails on your ChatGPT integration, you've run a red team exercise against your customer-facing chatbot, and your incident response plan now includes AI-specific scenarios. Good. But while that work was happening, your data science team quietly pushed three new ML models into production: a fraud detection model, a credit scoring system, and a document classification tool. None of them were in scope for your AI security program. And none of them have been tested for the specific class of threats that classical ML models face.

A quick clarification before we go further: ML model security, as used here, means protecting ML models from attack, not using ML as a tool to detect threats. That distinction matters because most of what's written on this topic conflates the two. This article is about the inverse: your models are the target.

This is the gap that's opening up in enterprise AI security right now. ML model security and LLM security are related disciplines, but they cover different attack surfaces, involve different tooling, and, critically, tend to fall into different organizational ownership silos. Understanding the distinction is not academic. It is the difference between a complete AI security posture and a program with a structural blind spot.


ML model security

LLM security

Primary target

The model artifact and training pipeline

The inference-time interaction and application layer

Behavior

Deterministic: same input, same output

Probabilistic: outputs vary, creating unpredictable attack paths

Key threats

Data poisoning, model inversion, model stealing, supply chain attacks, adversarial examples

Prompt injection, jailbreaking, RAG exfiltration, agentic tool abuse

Attack surface

Narrow: data pipeline, serialized model files, inference API

Wide: prompts, retrieved context, tool calls, multi-turn conversations

Who typically owns it

Data science teams (often without adversarial mindset)

Security teams (growing awareness since 2023)

Tooling

Adversarial robustness testing, model scanning, pipeline audits

LLM red teaming, runtime guardrails, browser-mode attack simulation

Regulatory relevance

NIST AI 100-2, MITRE ATLAS, supply chain frameworks

OWASP LLM Top 10 2025, NIST AI RMF, EU AI Act, ISO 42001

What ML model security actually covers

ML model security addresses the security of a model as an artifact and as a system. That scope runs from how it was built to how it behaves when an adversary is actively probing it. NIST's Adversarial Machine Learning taxonomy (AI 100-2) is the definitive reference for this threat landscape, categorizing attacks across the full ML lifecycle from training through deployment.

The threats fall into roughly three categories.

Training-time attacks

Training-time attacks target the model before it ever reaches production. Data poisoning involves introducing malicious samples into the training set to corrupt the model's behavior in specific ways — for example, training a spam filter to systematically allow certain senders. Backdoor insertion is a more surgical form of poisoning where the attacker embeds a hidden trigger: the model behaves normally on standard inputs but produces attacker-controlled outputs whenever a specific pattern appears. These attacks are particularly dangerous because they are invisible once the model is deployed. The model passes all your usual evaluations. It just has a trapdoor.

NIST's AI 100-2 taxonomy classifies these as "poisoning attacks" and distinguishes between attacks that degrade overall performance and "targeted" backdoor attacks designed to produce specific adversary-chosen outputs on trigger inputs, while maintaining normal accuracy on clean data, making them exceptionally hard to detect through standard model evaluation.

Supply chain attacks

Supply chain attacks exploit the fact that most ML models today are not built entirely from scratch. They are assembled from pre-trained components, public datasets, and open-source libraries. Serialization vulnerabilities in formats like Python's pickle are a well-documented example: a malicious model file hosted on a public repository can execute arbitrary code the moment someone loads it.

This is not a theoretical concern. JFrog security researchers identified at least 100 malicious model instances on Hugging Face, with PyTorch pickle files used to deliver silent backdoors. A 2024 arXiv study of Hugging Face model repositories found that 59% of serialized model files use unsafe formats, meaning the majority of publicly available models carry at least theoretical execution risk on load. ReversingLabs documented a novel evasion technique they named "nullifAI," where attackers embed malicious payloads in broken pickle files using non-standard compression, bypassing Hugging Face's own scanning tools. When your data science team downloads a base model to fine-tune, they may be importing more than weights. Repello's comprehensive guide to ML model scanning covers what operationalizing supply chain protection looks like in practice, including what to scan for and at which pipeline stages.

Inference-time attacks

Inference-time attacks against classical ML models are distinct from LLM attacks. Adversarial examples are carefully crafted inputs — often imperceptibly modified images or data points — that cause the model to misclassify with high confidence. NIST AI 100-2 defines these as "evasion attacks" and notes they are particularly effective against deep learning classifiers used in security-critical applications like malware detection and fraud scoring.

Model inversion attacks reconstruct sensitive training data from model outputs. Given enough API queries, an adversary can infer what individuals were in the training set with surprising accuracy. A 2024 survey of model inversion techniques documents cases where facial features, genomic data, and financial records have been recovered from production models using only query access. MITRE ATLAS catalogs model inversion as an established adversarial technique with documented real-world instances. Model stealing attacks go further, using query access to reproduce a functional copy of the model entirely, bypassing both licensing controls and any security testing done on the original.

What all of these have in common is that they require thinking about the model itself as the attack target, not just the application built on top of it.

What LLM security covers (and where it diverges)

LLM security is about what happens at inference time when the model is used as an interface. The threat is not primarily in the weights or the training pipeline. It lives in the conversation, the context window, and the tools the model can call. For a detailed breakdown of each risk category and what it means for enterprise deployments, Repello's OWASP LLM Top 10 series for CISOs is the most practical starting reference.

There is also a foundational behavioral difference worth naming. Classical ML models are deterministic: the same input always produces the same output, so security teams can write firm rules around expected behavior. LLMs are probabilistic: the same prompt can generate different responses, and those responses can include unexpected content, fabricated data, or outputs that bypass policy controls in ways that are genuinely difficult to predict in advance. That unpredictability is not a bug — it is what makes LLMs useful. But it creates an attack surface that traditional security tooling was never designed to cover.

Prompt injection

Prompt injection has held the top spot on the OWASP LLM Top 10 since the list's inception, and retained that position in the 2025 edition. An adversary embeds malicious instructions in a user input or a retrieved document, causing the model to override its original instructions. Direct prompt injection comes from the user themselves. Indirect prompt injection comes from the environment the model reads: a poisoned web page, a malicious email, a tampered document in a RAG pipeline. OWASP defines indirect prompt injection as one of the most dangerous variants because it can affect models with no user-facing input at all — the attack surface is anything the model reads. Google's Gemini and Microsoft Copilot have both been demonstrated susceptible to indirect prompt injection through external documents. These are not theoretical attacks.

For a full breakdown of prompt injection patterns and real-world examples, see Repello's prompt injection attack examples.

Jailbreaking

Jailbreaking attempts to bypass the model's safety and policy controls through carefully constructed inputs: role-playing scenarios, hypothetical framings, or more technical approaches that exploit model behavior at the token prediction level. A jailbroken enterprise LLM can produce outputs that violate compliance requirements, leak internal information, or generate content that creates regulatory liability.

Agentic attacks

Agentic attacks represent the newest and most serious category. OWASP LLM06:2025 (Excessive Agency) addresses this directly: as LLMs gain the ability to call tools, browse the web, execute code, and take actions in external systems, the attack surface expands dramatically. A prompt injection in an agentic pipeline does not just manipulate an output — it can trigger unauthorized transactions, exfiltrate data to external endpoints, or escalate permissions through a sequence of individually plausible tool calls. MITRE ATLAS documents agentic AI abuse as an emerging category in its adversarial threat taxonomy.

RAG exfiltration

RAG exfiltration exploits the retrieval layer. OWASP classifies this under LLM08:2025 (Vector and Embedding Weaknesses): if an attacker can influence what documents get retrieved, they can use those documents to inject instructions or pull sensitive information out of the context window through the model's responses. Repello's research into RAG poisoning against Llama3 demonstrates concretely how a poisoned retrieval corpus can cause a production model to produce attacker-controlled outputs.

The key characteristic of LLM security threats is that they live at the application layer. The model weights could be perfectly clean. The inference infrastructure could be properly hardened. And the system can still be completely compromised through the way it is used.

Why the two threat surfaces compound

It is tempting to treat ML model security and LLM security as two separate columns on a checklist. In practice, they interact in ways that make gaps in either more dangerous.

A model trained on poisoned data does not just produce wrong outputs in isolation. Certain forms of training-time manipulation can make a model more susceptible to particular prompt injection patterns at inference time, because the behavior being induced aligns with what the attacker planted during training. You have tested your LLM for prompt injection resistance, but your baseline was already compromised.

Conversely, strong inference-time protections do not retroactively fix a compromised training pipeline. If your fraud detection model was trained with backdoored data, blocking prompt injection attacks on your chatbot does not help you. Repello's research into indirect prompt injection via Gmail agentic workflows illustrates how an attacker can chain these layers: a compromised agentic pipeline becomes the delivery mechanism for attacks that neither ML security controls nor LLM guardrails, applied in isolation, would catch.

There is also the visibility problem. According to IBM's 2024 Cost of a Data Breach Report, the global average breach cost reached a record $4.88 million, with one in five organizations experiencing breaches linked to shadow AI — unsanctioned AI tools and models deployed without security oversight. Those shadow AI incidents added an average of $670,000 to breach costs and disproportionately exposed customer PII and intellectual property. Among organizations that suffered AI-related breaches, 97% lacked proper AI access controls. Repello platform data shows organizations typically discover 40% more AI assets than they expected when they first run an automated discovery scan.

The organizational gap that makes this dangerous

Ask most enterprise security teams who owns ML model security. You will get a pause.

Data science teams build and deploy classical ML models. They are thinking about accuracy, latency, and drift. They are generally not thinking adversarially about their training pipelines or testing their models for inversion attacks. Security teams, meanwhile, are increasingly focused on the LLM application layer — prompt injection, jailbreaking, agentic risks. These have dominated the AI security conversation since 2023, there is a growing tooling ecosystem, and they map naturally onto the application security workflows security teams already own.

The classical ML model estate often falls in between. It is technically infrastructure owned by data science, but the risk profile is clearly a security problem. The result is that nobody is systematically red-teaming the fraud models, the recommendation engines, or the document classifiers for the threats specific to that class of system.

As Repello's essential guide to AI red teaming notes, AI red teaming requires a fundamentally different approach from conventional penetration testing, and most security teams are still building that capability. IBM's 2024 report found that only 24% of generative AI initiatives are currently secured — but classical ML deployments, which predate the GenAI conversation entirely, are even further behind.

What a complete ML model security program looks like in practice

Closing the gap requires treating ML model security and LLM security as complementary parts of a unified AI security program rather than separate workstreams.

The starting point is inventory. You cannot protect what you cannot see. A comprehensive AI Bill of Materials (AI-BOM) should capture every model in the organization: classical ML models, fine-tuned LLMs, third-party AI APIs, agentic systems, and experimental deployments. NIST's AI Risk Management Framework frames governance and inventory as the foundation of responsible AI deployment. Repello's AI Asset Inventory automates this discovery process — Repello platform data shows organizations typically find 40% more AI assets than they expected when they first run a scan. Each asset should be tiered by risk based on the sensitivity of the data it accesses, the actions it can take, and the business criticality of the system it powers.

From there, red teaming needs to cover both layers. For classical ML models, that means testing for adversarial robustness as defined in NIST AI 100-2, checking model serialization for vulnerabilities, auditing the data pipeline for poisoning risk, and validating that inference APIs cannot be abused for model stealing or inversion. For LLMs, it means testing for all ten OWASP LLM Top 10 2025 risks across the full attack surface: at the API level, through the actual user interface, through multi-turn conversations, through the retrieval pipeline, and through any agentic tool integrations.

This is the approach Repello takes through ARTEMIS, its automated red teaming engine. Where most red-teaming tools operate only at the API level, ARTEMIS includes a Browser Agent mode that navigates the actual application UI exactly as a human attacker would, testing multi-turn workflows, file upload paths, and tool chains that API-only testing misses entirely. Findings from red teaming feed directly into runtime protection through ARGUS, which enforces adaptive guardrails in under 100 milliseconds without noticeable latency impact. The loop closes: testing informs protection, and protection data informs the next round of testing.

Compliance requirements reinforce the case for covering both layers. OWASP's LLM Top 10, NIST AI RMF, MITRE ATLAS, and ISO 42001 all span training-time and inference-time risks. An AI security program that addresses only the LLM application layer while ignoring the classical ML estate will have material gaps in any compliance assessment.

Frequently asked questions

What is the difference between ML model security and LLM security? ML model security covers threats to the model artifact itself, as taxonomized by NIST AI 100-2: training data poisoning, supply chain attacks through malicious pre-trained components, and inference-time attacks like adversarial examples and model inversion. LLM security covers threats at the application layer during inference, as cataloged by the OWASP LLM Top 10 2025: prompt injection, jailbreaking, RAG exfiltration, and agentic tool abuse. Both disciplines require different tooling and typically fall under different team ownership, but they interact in ways that make gaps in either more dangerous.

What are the biggest threats to ML models in production? The main threats are adversarial examples (crafted inputs that cause misclassification), model inversion attacks (reconstructing training data from model outputs), model stealing (reproducing model behavior via API queries), training data poisoning (corrupting model behavior at training time), and supply chain attacks through malicious pre-trained components. JFrog researchers found at least 100 malicious model instances on Hugging Face using pickle serialization to deliver backdoors, and a 2024 arXiv study found 59% of serialized Hugging Face model files use unsafe formats.

Can prompt injection attacks affect classical ML models? Prompt injection is specific to LLMs, not classical ML models. However, the threat surfaces interact. A model trained on poisoned data can be made more susceptible to certain LLM-layer attacks. A complete AI security program must address both ML model security and LLM security to avoid gaps that compound each other.

What is an AI-BOM and why does it matter for ML model security? An AI Bill of Materials (AI-BOM) is a comprehensive inventory of every AI model, agent, and application in an organization, including the data each system accesses, the tools it can call, and its business risk tier. It is the foundational step for any ML model security program. NIST's AI Risk Management Framework identifies governance and inventory as prerequisites for responsible AI deployment. Repello platform data shows organizations typically discover 40% more AI assets than they expected when they first run an automated discovery scan.

How do I start building an ML model security program? Start with inventory: build an AI-BOM mapping every classical ML model and LLM deployment across your organization. Apply risk tiering based on data sensitivity and business criticality. Then red team both layers: test classical ML models for adversarial robustness per NIST AI 100-2, and test LLM applications against the OWASP LLM Top 10 2025. Finally, deploy runtime monitoring that covers the full inference stack. IBM's 2024 Cost of a Data Breach Report found that 97% of organizations that suffered AI-related breaches lacked proper AI access controls — the most fundamental control being visibility into what you are running.

Secure your full AI stack — not just the chatbot

Most AI security programs start with LLM security because that's where the noise is. The fraud models, the recommendation engines, and the document classifiers running quietly in production are the gap. Book a demo with Repello to see how ARTEMIS and ARGUS cover both layers — from training pipeline through agentic runtime.

TL;DR

  • ML model security targets the model as an artifact: training data poisoning, supply chain risks, and inference-time attacks like adversarial examples and model inversion.

  • LLM security targets the application layer: prompt injection, jailbreaking, RAG exfiltration, and agentic tool abuse.

  • The two threat surfaces compound. Gaps in one make the other more exploitable.

  • Most organizations are focused on LLM security while classical ML models sit in a blind spot owned by neither the security team nor the data science team.

  • A complete program starts with an AI Bill of Materials (AI-BOM), red teams both layers, and deploys runtime monitoring that covers the full inference stack.

Your security team spent the last six months getting serious about LLM security. You've deployed runtime guardrails on your ChatGPT integration, you've run a red team exercise against your customer-facing chatbot, and your incident response plan now includes AI-specific scenarios. Good. But while that work was happening, your data science team quietly pushed three new ML models into production: a fraud detection model, a credit scoring system, and a document classification tool. None of them were in scope for your AI security program. And none of them have been tested for the specific class of threats that classical ML models face.

A quick clarification before we go further: ML model security, as used here, means protecting ML models from attack, not using ML as a tool to detect threats. That distinction matters because most of what's written on this topic conflates the two. This article is about the inverse: your models are the target.

This is the gap that's opening up in enterprise AI security right now. ML model security and LLM security are related disciplines, but they cover different attack surfaces, involve different tooling, and, critically, tend to fall into different organizational ownership silos. Understanding the distinction is not academic. It is the difference between a complete AI security posture and a program with a structural blind spot.


ML model security

LLM security

Primary target

The model artifact and training pipeline

The inference-time interaction and application layer

Behavior

Deterministic: same input, same output

Probabilistic: outputs vary, creating unpredictable attack paths

Key threats

Data poisoning, model inversion, model stealing, supply chain attacks, adversarial examples

Prompt injection, jailbreaking, RAG exfiltration, agentic tool abuse

Attack surface

Narrow: data pipeline, serialized model files, inference API

Wide: prompts, retrieved context, tool calls, multi-turn conversations

Who typically owns it

Data science teams (often without adversarial mindset)

Security teams (growing awareness since 2023)

Tooling

Adversarial robustness testing, model scanning, pipeline audits

LLM red teaming, runtime guardrails, browser-mode attack simulation

Regulatory relevance

NIST AI 100-2, MITRE ATLAS, supply chain frameworks

OWASP LLM Top 10 2025, NIST AI RMF, EU AI Act, ISO 42001

What ML model security actually covers

ML model security addresses the security of a model as an artifact and as a system. That scope runs from how it was built to how it behaves when an adversary is actively probing it. NIST's Adversarial Machine Learning taxonomy (AI 100-2) is the definitive reference for this threat landscape, categorizing attacks across the full ML lifecycle from training through deployment.

The threats fall into roughly three categories.

Training-time attacks

Training-time attacks target the model before it ever reaches production. Data poisoning involves introducing malicious samples into the training set to corrupt the model's behavior in specific ways — for example, training a spam filter to systematically allow certain senders. Backdoor insertion is a more surgical form of poisoning where the attacker embeds a hidden trigger: the model behaves normally on standard inputs but produces attacker-controlled outputs whenever a specific pattern appears. These attacks are particularly dangerous because they are invisible once the model is deployed. The model passes all your usual evaluations. It just has a trapdoor.

NIST's AI 100-2 taxonomy classifies these as "poisoning attacks" and distinguishes between attacks that degrade overall performance and "targeted" backdoor attacks designed to produce specific adversary-chosen outputs on trigger inputs, while maintaining normal accuracy on clean data, making them exceptionally hard to detect through standard model evaluation.

Supply chain attacks

Supply chain attacks exploit the fact that most ML models today are not built entirely from scratch. They are assembled from pre-trained components, public datasets, and open-source libraries. Serialization vulnerabilities in formats like Python's pickle are a well-documented example: a malicious model file hosted on a public repository can execute arbitrary code the moment someone loads it.

This is not a theoretical concern. JFrog security researchers identified at least 100 malicious model instances on Hugging Face, with PyTorch pickle files used to deliver silent backdoors. A 2024 arXiv study of Hugging Face model repositories found that 59% of serialized model files use unsafe formats, meaning the majority of publicly available models carry at least theoretical execution risk on load. ReversingLabs documented a novel evasion technique they named "nullifAI," where attackers embed malicious payloads in broken pickle files using non-standard compression, bypassing Hugging Face's own scanning tools. When your data science team downloads a base model to fine-tune, they may be importing more than weights. Repello's comprehensive guide to ML model scanning covers what operationalizing supply chain protection looks like in practice, including what to scan for and at which pipeline stages.

Inference-time attacks

Inference-time attacks against classical ML models are distinct from LLM attacks. Adversarial examples are carefully crafted inputs — often imperceptibly modified images or data points — that cause the model to misclassify with high confidence. NIST AI 100-2 defines these as "evasion attacks" and notes they are particularly effective against deep learning classifiers used in security-critical applications like malware detection and fraud scoring.

Model inversion attacks reconstruct sensitive training data from model outputs. Given enough API queries, an adversary can infer what individuals were in the training set with surprising accuracy. A 2024 survey of model inversion techniques documents cases where facial features, genomic data, and financial records have been recovered from production models using only query access. MITRE ATLAS catalogs model inversion as an established adversarial technique with documented real-world instances. Model stealing attacks go further, using query access to reproduce a functional copy of the model entirely, bypassing both licensing controls and any security testing done on the original.

What all of these have in common is that they require thinking about the model itself as the attack target, not just the application built on top of it.

What LLM security covers (and where it diverges)

LLM security is about what happens at inference time when the model is used as an interface. The threat is not primarily in the weights or the training pipeline. It lives in the conversation, the context window, and the tools the model can call. For a detailed breakdown of each risk category and what it means for enterprise deployments, Repello's OWASP LLM Top 10 series for CISOs is the most practical starting reference.

There is also a foundational behavioral difference worth naming. Classical ML models are deterministic: the same input always produces the same output, so security teams can write firm rules around expected behavior. LLMs are probabilistic: the same prompt can generate different responses, and those responses can include unexpected content, fabricated data, or outputs that bypass policy controls in ways that are genuinely difficult to predict in advance. That unpredictability is not a bug — it is what makes LLMs useful. But it creates an attack surface that traditional security tooling was never designed to cover.

Prompt injection

Prompt injection has held the top spot on the OWASP LLM Top 10 since the list's inception, and retained that position in the 2025 edition. An adversary embeds malicious instructions in a user input or a retrieved document, causing the model to override its original instructions. Direct prompt injection comes from the user themselves. Indirect prompt injection comes from the environment the model reads: a poisoned web page, a malicious email, a tampered document in a RAG pipeline. OWASP defines indirect prompt injection as one of the most dangerous variants because it can affect models with no user-facing input at all — the attack surface is anything the model reads. Google's Gemini and Microsoft Copilot have both been demonstrated susceptible to indirect prompt injection through external documents. These are not theoretical attacks.

For a full breakdown of prompt injection patterns and real-world examples, see Repello's prompt injection attack examples.

Jailbreaking

Jailbreaking attempts to bypass the model's safety and policy controls through carefully constructed inputs: role-playing scenarios, hypothetical framings, or more technical approaches that exploit model behavior at the token prediction level. A jailbroken enterprise LLM can produce outputs that violate compliance requirements, leak internal information, or generate content that creates regulatory liability.

Agentic attacks

Agentic attacks represent the newest and most serious category. OWASP LLM06:2025 (Excessive Agency) addresses this directly: as LLMs gain the ability to call tools, browse the web, execute code, and take actions in external systems, the attack surface expands dramatically. A prompt injection in an agentic pipeline does not just manipulate an output — it can trigger unauthorized transactions, exfiltrate data to external endpoints, or escalate permissions through a sequence of individually plausible tool calls. MITRE ATLAS documents agentic AI abuse as an emerging category in its adversarial threat taxonomy.

RAG exfiltration

RAG exfiltration exploits the retrieval layer. OWASP classifies this under LLM08:2025 (Vector and Embedding Weaknesses): if an attacker can influence what documents get retrieved, they can use those documents to inject instructions or pull sensitive information out of the context window through the model's responses. Repello's research into RAG poisoning against Llama3 demonstrates concretely how a poisoned retrieval corpus can cause a production model to produce attacker-controlled outputs.

The key characteristic of LLM security threats is that they live at the application layer. The model weights could be perfectly clean. The inference infrastructure could be properly hardened. And the system can still be completely compromised through the way it is used.

Why the two threat surfaces compound

It is tempting to treat ML model security and LLM security as two separate columns on a checklist. In practice, they interact in ways that make gaps in either more dangerous.

A model trained on poisoned data does not just produce wrong outputs in isolation. Certain forms of training-time manipulation can make a model more susceptible to particular prompt injection patterns at inference time, because the behavior being induced aligns with what the attacker planted during training. You have tested your LLM for prompt injection resistance, but your baseline was already compromised.

Conversely, strong inference-time protections do not retroactively fix a compromised training pipeline. If your fraud detection model was trained with backdoored data, blocking prompt injection attacks on your chatbot does not help you. Repello's research into indirect prompt injection via Gmail agentic workflows illustrates how an attacker can chain these layers: a compromised agentic pipeline becomes the delivery mechanism for attacks that neither ML security controls nor LLM guardrails, applied in isolation, would catch.

There is also the visibility problem. According to IBM's 2024 Cost of a Data Breach Report, the global average breach cost reached a record $4.88 million, with one in five organizations experiencing breaches linked to shadow AI — unsanctioned AI tools and models deployed without security oversight. Those shadow AI incidents added an average of $670,000 to breach costs and disproportionately exposed customer PII and intellectual property. Among organizations that suffered AI-related breaches, 97% lacked proper AI access controls. Repello platform data shows organizations typically discover 40% more AI assets than they expected when they first run an automated discovery scan.

The organizational gap that makes this dangerous

Ask most enterprise security teams who owns ML model security. You will get a pause.

Data science teams build and deploy classical ML models. They are thinking about accuracy, latency, and drift. They are generally not thinking adversarially about their training pipelines or testing their models for inversion attacks. Security teams, meanwhile, are increasingly focused on the LLM application layer — prompt injection, jailbreaking, agentic risks. These have dominated the AI security conversation since 2023, there is a growing tooling ecosystem, and they map naturally onto the application security workflows security teams already own.

The classical ML model estate often falls in between. It is technically infrastructure owned by data science, but the risk profile is clearly a security problem. The result is that nobody is systematically red-teaming the fraud models, the recommendation engines, or the document classifiers for the threats specific to that class of system.

As Repello's essential guide to AI red teaming notes, AI red teaming requires a fundamentally different approach from conventional penetration testing, and most security teams are still building that capability. IBM's 2024 report found that only 24% of generative AI initiatives are currently secured — but classical ML deployments, which predate the GenAI conversation entirely, are even further behind.

What a complete ML model security program looks like in practice

Closing the gap requires treating ML model security and LLM security as complementary parts of a unified AI security program rather than separate workstreams.

The starting point is inventory. You cannot protect what you cannot see. A comprehensive AI Bill of Materials (AI-BOM) should capture every model in the organization: classical ML models, fine-tuned LLMs, third-party AI APIs, agentic systems, and experimental deployments. NIST's AI Risk Management Framework frames governance and inventory as the foundation of responsible AI deployment. Repello's AI Asset Inventory automates this discovery process — Repello platform data shows organizations typically find 40% more AI assets than they expected when they first run a scan. Each asset should be tiered by risk based on the sensitivity of the data it accesses, the actions it can take, and the business criticality of the system it powers.

From there, red teaming needs to cover both layers. For classical ML models, that means testing for adversarial robustness as defined in NIST AI 100-2, checking model serialization for vulnerabilities, auditing the data pipeline for poisoning risk, and validating that inference APIs cannot be abused for model stealing or inversion. For LLMs, it means testing for all ten OWASP LLM Top 10 2025 risks across the full attack surface: at the API level, through the actual user interface, through multi-turn conversations, through the retrieval pipeline, and through any agentic tool integrations.

This is the approach Repello takes through ARTEMIS, its automated red teaming engine. Where most red-teaming tools operate only at the API level, ARTEMIS includes a Browser Agent mode that navigates the actual application UI exactly as a human attacker would, testing multi-turn workflows, file upload paths, and tool chains that API-only testing misses entirely. Findings from red teaming feed directly into runtime protection through ARGUS, which enforces adaptive guardrails in under 100 milliseconds without noticeable latency impact. The loop closes: testing informs protection, and protection data informs the next round of testing.

Compliance requirements reinforce the case for covering both layers. OWASP's LLM Top 10, NIST AI RMF, MITRE ATLAS, and ISO 42001 all span training-time and inference-time risks. An AI security program that addresses only the LLM application layer while ignoring the classical ML estate will have material gaps in any compliance assessment.

Frequently asked questions

What is the difference between ML model security and LLM security? ML model security covers threats to the model artifact itself, as taxonomized by NIST AI 100-2: training data poisoning, supply chain attacks through malicious pre-trained components, and inference-time attacks like adversarial examples and model inversion. LLM security covers threats at the application layer during inference, as cataloged by the OWASP LLM Top 10 2025: prompt injection, jailbreaking, RAG exfiltration, and agentic tool abuse. Both disciplines require different tooling and typically fall under different team ownership, but they interact in ways that make gaps in either more dangerous.

What are the biggest threats to ML models in production? The main threats are adversarial examples (crafted inputs that cause misclassification), model inversion attacks (reconstructing training data from model outputs), model stealing (reproducing model behavior via API queries), training data poisoning (corrupting model behavior at training time), and supply chain attacks through malicious pre-trained components. JFrog researchers found at least 100 malicious model instances on Hugging Face using pickle serialization to deliver backdoors, and a 2024 arXiv study found 59% of serialized Hugging Face model files use unsafe formats.

Can prompt injection attacks affect classical ML models? Prompt injection is specific to LLMs, not classical ML models. However, the threat surfaces interact. A model trained on poisoned data can be made more susceptible to certain LLM-layer attacks. A complete AI security program must address both ML model security and LLM security to avoid gaps that compound each other.

What is an AI-BOM and why does it matter for ML model security? An AI Bill of Materials (AI-BOM) is a comprehensive inventory of every AI model, agent, and application in an organization, including the data each system accesses, the tools it can call, and its business risk tier. It is the foundational step for any ML model security program. NIST's AI Risk Management Framework identifies governance and inventory as prerequisites for responsible AI deployment. Repello platform data shows organizations typically discover 40% more AI assets than they expected when they first run an automated discovery scan.

How do I start building an ML model security program? Start with inventory: build an AI-BOM mapping every classical ML model and LLM deployment across your organization. Apply risk tiering based on data sensitivity and business criticality. Then red team both layers: test classical ML models for adversarial robustness per NIST AI 100-2, and test LLM applications against the OWASP LLM Top 10 2025. Finally, deploy runtime monitoring that covers the full inference stack. IBM's 2024 Cost of a Data Breach Report found that 97% of organizations that suffered AI-related breaches lacked proper AI access controls — the most fundamental control being visibility into what you are running.

Secure your full AI stack — not just the chatbot

Most AI security programs start with LLM security because that's where the noise is. The fraud models, the recommendation engines, and the document classifiers running quietly in production are the gap. Book a demo with Repello to see how ARTEMIS and ARGUS cover both layers — from training pipeline through agentic runtime.

Share this blog

Subscribe to our newsletter

Repello tech background with grid pattern symbolizing AI security
Repello tech background with grid pattern symbolizing AI security
Repello AI logo - Footer

Sign up for Repello updates
Subscribe to our newsletter to receive the latest insights on AI security, red teaming research, and product updates in your inbox.

Subscribe to our newsletter

8 The Green, Ste A
Dover, DE 19901, United States of America

Follow us on:

LinkedIn icon
X icon, Twitter icon
Github icon
Youtube icon

© Repello Inc. All rights reserved.

Repello tech background with grid pattern symbolizing AI security
Repello AI logo - Footer

Sign up for Repello updates
Subscribe to our newsletter to receive the latest insights on AI security, red teaming research, and product updates in your inbox.

Subscribe to our newsletter

8 The Green, Ste A
Dover, DE 19901, United States of America

Follow us on:

LinkedIn icon
X icon, Twitter icon
Github icon
Youtube icon

© Repello Inc. All rights reserved.

Repello tech background with grid pattern symbolizing AI security
Repello AI logo - Footer

Sign up for Repello updates
Subscribe to our newsletter to receive the latest insights on AI security, red teaming research, and product updates in your inbox.

Subscribe to our newsletter

8 The Green, Ste A
Dover, DE 19901, United States of America

Follow us on:

LinkedIn icon
X icon, Twitter icon
Github icon
Youtube icon

© Repello Inc. All rights reserved.