AI Red Teaming Vendor Pricing: What You'll Actually Pay in 2026

TL;DR: Every AI red teaming vendor quotes "custom pricing" on the first call. Behind that phrase are five pricing models: Red Team as a Service per engagement ($16K to $100K typical), platform license per asset, per-test usage ($8K to $150K by system shape), hybrid platform-plus-services, and freemium with paid escalation ($20 per month entry points exist). This post explains each model, the scope multipliers that move the number, the eight hidden line items your CFO will surface at signing, and the six questions that decode any vendor's pricing in one 30-minute demo. Repello uses the platform-license-per-asset model and answers all six on the first call.

Why everyone says "custom pricing"

Type "ai red teaming vendor pricing" into search and the first organic result is the AI Governance Library, which states plainly that public discussion of cost benchmarking is limited in this category. That observation is correct. Almost no commercial AI red teaming vendor publishes a price list. Spend an hour on G2, Capterra, and vendor websites and you will collect the same two words on every comparison page: contact sales.

Three forces produce that opacity. The first is real scope variability. A single text-only chatbot pen test is not the same product as a 10-agent system with PII flowing through tool calls and a RAG pipeline reading a regulated corpus. Vendors who publish one number have to caveat it so heavily that the number stops being useful. The second is anchoring. If a procurement team sees $20K on a website, that is now the ceiling in their head, even when the actual work is $80K of value. Sales teams resist that math. The third is the absence of a category-standard pricing unit. SaaS converged on per-seat-per-month two decades ago. AI red teaming has not converged on engagement, asset, test, or seat. Every vendor invents their own denominator.

The honest version is that opacity benefits the vendor more than the buyer. The five models below cover the entire commercial surface in 2026, and once you know which one a vendor is selling, the rest of the pricing conversation becomes routine. Repello publishes its model and answers the six demo-day questions in writing — we walk through how at the end of the post. The pricing-model framework first.

Platform license per asset is the only model whose unit of cost matches the cadence of ongoing assurance; everything else trades predictability or coverage.

The five pricing models you'll encounter

Read this table once, then read the sections below for the detail. Every commercial vendor in the category fits into one of these five shapes, or sells a labeled combination of two of them.

Model	What you buy	Published anchor	Suits	Predictability
Red Team as a Service (per engagement)	A scoped, time-boxed assessment ending in a report	$16K to $100K (services-firm ranges in public)	One-off audits, compliance windows	Low across years
Platform license, per asset (Repello)	Annual access to a testing platform scoped to N AI applications	Broader market opaque; published tier pricing rare	Multi-model orgs needing ongoing assurance	High
Per-test, usage-based	A unit price per attack, scenario, or test run	$8K to $150K by system type (public platform ranges)	Irregular testers, automated pipelines	Variable with usage
Hybrid (platform plus services)	Annual platform plus scoped engagements bundled	Public ranges rare; enterprise standard	Programs that need both continuous and deep work	Moderate
Freemium with paid escalation	Free or low-cost entry, paid feature gates	~$20 per month entry tiers exist	Initial assessment, budget gating	High at the bottom, low at the top

1. Red Team as a Service (per engagement)

This is the closest analog to a traditional pen test contract. You agree on a scope, the vendor runs the engagement over a defined window, and you receive a report. Services firms in the space publish minimums around $16K, with full engagement ranges running up to $100K depending on system depth and reporting scope. The width of that range tells you the model's defining trait: it is responsive to scope and almost impossible to predict across multiple years.

Per-engagement pricing suits compliance windows, vendor risk reviews, and pre-launch audits where the testing is bounded by a specific question and a specific date. It is awkward for continuous assurance because every change to the AI system invites a separate scoping conversation, and the renewal pattern is renegotiation, not a clean uplift.

The CFO question to ask: what is the per-engagement cost ceiling, and what triggers the next engagement? If the vendor sells one report per year, you have an annual line item. If the vendor's average customer buys three engagements a year because the AI system changes that often, the headline price misrepresents the real run rate.

2. Platform license, per asset (the model Repello uses)

Most commercial AI red teaming platforms price annually against the number of AI assets in scope. An asset is usually defined as an application, an LLM-powered feature, or a deployed agent. The model resembles a SaaS contract: annual commitment, per-asset uplift as you bring more applications under test, optional add-ons for specialty modules (browser-mode testing, MCP coverage, agentic red teaming).

Public pricing for this category is opaque almost across the board. Most platform vendors — including the major commercial players — quote a per-asset price that scales by application complexity, with a tier above which custom enterprise pricing applies. Some publish tier ranges on their site; most do not.

Platform pricing suits organizations with multiple AI applications and an expectation of ongoing testing rather than annual snapshots. The unit of cost matches the unit of work — the application under test — and the renewal pattern is closer to standard B2B SaaS than to bespoke services. It is also the only model whose default output structure produces an auditor-ready evidence package without a separate professional-services bolt-on.

This is the model Repello sits in, deliberately. Asset definitions are explicit on the contract (one application, one agent, one MCP surface — not five aliases of the same thing). Browser-mode coverage and agentic-red-team scenarios are bundled, not gated. The audit evidence package — scoping doc, methodology mapped to NIST AI 600-1 and MITRE ATLAS, raw test artifacts, retest evidence, control mapping — ships as the default output, not a paid add-on. We answer the six demo-day questions in writing.

The CFO question to ask any vendor selling this model: what counts as one asset, and what is the per-asset uplift when we add the next ten applications? If a multi-agent system is one asset to your team and five assets to the vendor, the number changes by a factor of five at signing.
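The asset-definition gap can be made concrete with a small sketch. Everything in it is hypothetical: the `Application` record, the `count_assets` helper, the two counting rules, and the $30,000 per-asset price are illustrative assumptions, not any vendor's contract terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    """One AI application in the buyer's inventory (hypothetical record)."""
    name: str
    agents: int  # number of deployed agents inside the application


def count_assets(apps: list[Application], unit: str) -> int:
    """Count billable assets under one of two hypothetical contract definitions."""
    if unit == "application":
        return len(apps)
    if unit == "agent":
        return sum(app.agents for app in apps)
    raise ValueError(f"unknown asset unit: {unit!r}")


PER_ASSET_PRICE = 30_000  # placeholder annual price, not a quote

# One multi-agent system: a single asset to the buyer, five to the vendor.
inventory = [Application("support-assistant", agents=5)]

buyer_total = count_assets(inventory, "application") * PER_ASSET_PRICE
vendor_total = count_assets(inventory, "agent") * PER_ASSET_PRICE

print(buyer_total, vendor_total)  # 30000 150000
```

Running the same inventory through both definitions before the call shows how far apart the two readings of "one asset" can land.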

3. Per-test, usage-based

A handful of vendors price per test or per attack, where the unit is one full attack run against one target. Public ranges show $8K to $150K scoped by system type, with the top of the range covering complex agentic systems and the bottom covering single-purpose chatbots. Per-test models also show up in API-priced platforms that meter test runs as you go.

The model suits organizations that test irregularly: a quarterly safety review, a pre-release adversarial run, a one-off red team against a new feature. It also fits automated pipelines that fire tests as part of CI, where the unit cost lines up with how often the pipeline runs.

The risk is variance. Per-test pricing rewards small users and punishes growth. A team that starts running tests on every pull request can see a usage-based contract triple in a quarter without an obvious upgrade signal.

The CFO question to ask: what counts as one test? Is a 50-prompt jailbreak suite one test or 50 tests? Is a multi-turn conversation one test or one test per turn? The definition of the unit decides whether the model is cheap or expensive against your usage pattern.
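A rough way to see how much the unit definition matters is to count the same suite three ways. The `billable_tests` helper and its three unit names are assumptions made for illustration; the suite shape (50 conversations of 4 turns each) is likewise invented.

```python
def billable_tests(suite: list[list[str]], unit: str) -> int:
    """Count billable units for a suite of conversations.

    `suite` is a list of conversations; each conversation is a list of turns.
    The three unit definitions are hypothetical contract readings.
    """
    if unit == "suite":
        return 1
    if unit == "conversation":
        return len(suite)
    if unit == "turn":
        return sum(len(conversation) for conversation in suite)
    raise ValueError(f"unknown test unit: {unit!r}")


# Illustrative shape: 50 jailbreak conversations, 4 turns each.
suite = [[f"turn {t}" for t in range(4)] for _ in range(50)]

for unit in ("suite", "conversation", "turn"):
    print(unit, billable_tests(suite, unit))
# suite 1
# conversation 50
# turn 200
```

The same run is 1, 50, or 200 billable tests depending on which definition the contract uses, which is why the question belongs on the first call.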

4. Hybrid (platform plus services)

Most enterprise-tier vendors run a hybrid. The customer signs an annual platform contract for ongoing testing and a separate scoped services engagement for the deeper work: a quarterly deep dive, a pre-audit assessment, a specialized red team against a new agentic system. Public ranges for hybrid pricing are rare because the services bundle absorbs most of the variance, but enterprise contracts in the category routinely land between $75K and $400K per year for organizations with two or more AI applications in production.

Hybrid suits programs that need both continuous coverage and periodic deep testing. The platform handles the always-on layer; the services component handles the specialist work that needs human attackers. The trap is the bundle. Vendors love hybrid contracts because they make line items hard to unbundle at renewal, which is exactly why you should ask for an itemized version of the proposal up front.

The CFO question to ask: what is the platform, what is the services component, and what is the renewal escalator on each piece independently? If the vendor cannot separate them on paper, you cannot reduce the contract at renewal even if usage drops.

5. Freemium with paid escalation

A growing slice of the market sells a free or near-free entry tier and gates higher-impact features behind a paid plan. Entry tiers near $20 per month exist, and several open-source-derived platforms offer free community editions with paid upgrades for advanced attack libraries, integrations, or compliance mapping. The model resembles developer-tier SaaS: easy to start, friction-free for trial, monetized at the upgrade.

Freemium suits a buyer who wants to assess the vendor and the AI system simultaneously, or a team running an initial baseline before deciding what kind of long-term coverage to buy. It is a poor fit for procurement teams who need to commit to a full program in one purchase decision, because the upgrade path is rarely transparent until you hit the feature gate.

The CFO question to ask: what feature gates trigger the upgrade, and how does pricing scale once we cross them? A $20-per-month entry that becomes $4,000 per month at the first useful feature is not really $20-per-month pricing.
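The arithmetic behind that warning is short enough to sketch. The `first_year_cost` helper is hypothetical, and the assumption that the team crosses the feature gate after two months is invented for illustration; the $20 and $4,000 monthly figures are the ones used in the paragraph above.

```python
def first_year_cost(entry_monthly: float, gated_monthly: float,
                    months_on_entry_tier: int) -> float:
    """First-year spend when a team moves from the entry tier to the paid tier.

    `months_on_entry_tier` is how long the team stays below the feature gate.
    """
    if not 0 <= months_on_entry_tier <= 12:
        raise ValueError("months_on_entry_tier must be between 0 and 12")
    months_on_paid_tier = 12 - months_on_entry_tier
    return (entry_monthly * months_on_entry_tier
            + gated_monthly * months_on_paid_tier)


print(first_year_cost(20, 4_000, 12))  # 240   (never crosses the gate)
print(first_year_cost(20, 4_000, 2))   # 40040 (crosses the gate after month 2)
```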

What "scope" actually means

Three multipliers move every vendor's headline number. System complexity is the largest. A single text-only chatbot might come in at 1x. A RAG application with a retrieval index over sensitive content adds the indirect-injection surface and the document-corpus testing, pushing toward 2x to 3x. A multi-agent system with tool calls, MCP connections, and inter-agent message passing carries 5x to 10x of the cost of the chatbot because every tool boundary and every agent-to-agent hand-off is a separate test surface. The pillar guidance in the essential guide to AI red teaming breaks down where the testing time actually goes.

Data sensitivity is the second multiplier. A chatbot answering questions from a public knowledge base is not the same procurement object as a chatbot reading patient records or financial documents. Regulated data demands stricter handling for the test artifacts themselves, often including evidence destruction policies, supervised testing windows, and chain-of-custody documentation. Vendors price the difference.

Test depth is the third. A point-in-time audit produces a snapshot. Continuous testing produces a stream of results that need to be triaged, tracked, and retested. Continuous costs more, and it produces a different output. If your auditor wants evidence that an exploit closed two weeks after disclosure, point-in-time pricing will not buy you that proof.
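As a minimal sketch of the first multiplier only, the complexity factors quoted above (1x, 2x to 3x, 5x to 10x) can be applied to a baseline range. The `complexity_scaled_range` helper and the $20K to $40K baseline are illustrative assumptions. Data sensitivity and test depth are left out because the text gives no numeric factors for them.

```python
# Complexity multipliers quoted in the text, as (low, high) factors.
COMPLEXITY_MULTIPLIER = {
    "chatbot": (1, 1),
    "rag": (2, 3),
    "multi_agent": (5, 10),
}


def complexity_scaled_range(baseline_low: int, baseline_high: int,
                            system_shape: str) -> tuple[int, int]:
    """Scale a baseline price range by the complexity factor for a system shape."""
    low_factor, high_factor = COMPLEXITY_MULTIPLIER[system_shape]
    return baseline_low * low_factor, baseline_high * high_factor


# Illustrative baseline for a single text-only chatbot; not a quote.
BASELINE = (20_000, 40_000)

for shape in COMPLEXITY_MULTIPLIER:
    print(shape, complexity_scaled_range(*BASELINE, shape))
# chatbot (20000, 40000)
# rag (40000, 120000)
# multi_agent (100000, 400000)
```

Treat the output as shape, not as a forecast: regulated data and continuous testing push real programs above these ranges.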

Figure: how AI red teaming cost scales with system shape, shown as five progressively wider bars.
Row 1 (1x baseline): single text-only chatbot on public data. One application, one surface, one report.
Row 2 (2x): chatbot plus public RAG. Adds retrieval index and indirect-injection surface.
Row 3 (3 to 5x): internal RAG over sensitive data. Adds regulated-data handling, evidence destruction, supervised testing windows.
Row 4 (5 to 7x): multi-agent system with tool calls. Every tool boundary and agent-to-agent hand-off is a new test surface, including MCP coverage.
Row 5 (7 to 10x, focal): agentic system with tool calls plus regulated or PII data. Combines every preceding multiplier with ongoing retest cycles and chain-of-custody.
Caption: A baseline chatbot priced at $20K to $40K becomes a $200K to $500K-plus program once tool calls and regulated data enter scope.

A worked illustrative example. Real numbers vary by vendor and by year, and the table below is for shape, not for quotes.

System under test	Indicative first-year cost	What drives it
Single text-only chatbot, public data, annual audit	$20K to $40K	One application, one surface, one report
RAG application, internal documents, quarterly testing	$60K to $120K	Document corpus testing, four testing cycles per year
Multi-agent system with tools and MCP, regulated data, continuous	$200K to $500K+	Multiple agents and tools, sensitive data handling, ongoing assurance

Disclaimer: these are illustrative ranges based on public data points and our observation of typical commercial contracts. Treat them as anchoring, not as quotes. The next section is where most procurement teams get surprised.

Hidden cost categories your CFO will ask about

Eight line items sit outside almost every published headline. The honest move is to ask for each of them explicitly during the first scoping conversation, before you anchor on a number.

Setup and onboarding fees. One-time charge to configure the testing environment against your system, set credentials, define scope. Sometimes baked into the platform license, often surfaced as a separate professional services line.
Retest fees. After a vulnerability is closed, the auditor needs evidence that the fix worked. Some platforms include retests in the license; some charge per retest cycle. Retest cycle costs add up fast on programs with active remediation.
Scope-creep fees. When your AI system grows mid-contract (a new feature, a new agent, a new MCP integration), the testing surface grows with it. Vendors handle this with mid-year scope amendments that carry their own line items.
Framework-mapping add-ons. Mapping findings to NIST AI 600-1, MITRE ATLAS, ISO 42001, or OWASP LLM Top 10 is sometimes a paid add-on rather than a default output. If your compliance program requires one of these mappings, ask whether it is included or billable.
Report customization. The default report format the vendor produces is the cheapest one. A custom executive summary, a board-facing one-pager, or a SOC 2 evidence package usually carries an extra fee.
Framework re-mapping. Frameworks update. NIST AI 600-1 will not be the last revision of the generative AI profile, and ISO 42001 will not stay frozen. When a framework updates, the vendor may charge to re-map existing findings to the new control set.
Renewal escalator. Year-over-year price uplift baked into the contract. Standard SaaS is 3 to 7 percent; AI red teaming vendors commonly carry 10 to 15 percent escalators given the immaturity of the category.
Specialty module fees. Browser-mode testing, agentic red teaming, MCP coverage, and multimodal testing are sometimes platform-included and sometimes separately priced. The same vendor's pricing card may treat a chatbot test as base and an agent test as a paid module.

None of these line items are scandalous on their own. The scandal is when they are absent from the proposal you sign and then appear on the invoice. Itemized proposals make this category visible.
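One way to run that check is to treat the eight categories as a checklist and flag anything a proposal does not mention. The category keys, the `unaddressed_items` helper, and the sample proposal below are all hypothetical.

```python
# The eight line items listed above, as checklist keys (names are illustrative).
HIDDEN_LINE_ITEMS = [
    "setup_and_onboarding",
    "retests",
    "scope_amendments",
    "framework_mapping",
    "report_customization",
    "framework_remapping",
    "renewal_escalator",
    "specialty_modules",
]


def unaddressed_items(proposal: dict[str, str]) -> list[str]:
    """Return the checklist items the proposal says nothing about."""
    return [item for item in HIDDEN_LINE_ITEMS if item not in proposal]


# Hypothetical proposal: each addressed item is marked included or priced.
sample_proposal = {
    "setup_and_onboarding": "included",
    "retests": "included",
    "renewal_escalator": "7% per year",
}

for item in unaddressed_items(sample_proposal):
    print("ask about:", item)
```

An item that is priced or marked as included has been addressed; an item that is simply absent is one to raise before signing.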

For reference, Repello's pricing card includes the first six categories — setup, retests, scope amendments within the annual asset count, framework mapping to NIST AI 600-1 / MITRE ATLAS / ISO 42001 / OWASP LLM Top 10, custom executive summaries, and framework re-mapping when the spec updates — as defaults of the platform license. Renewal escalator and specialty modules are negotiated, not hidden. Comparing a Repello proposal to one with all eight items billed separately is rarely an apples-to-apples comparison on the headline number alone.

What your auditor will accept

Compliance auditors do not accept a vendor's marketing assessment as evidence. SOC 2, ISO 42001, and the EU AI Act each demand a specific evidence package, and the package looks structurally similar across the three.

The auditor wants the scoping document that defines what was tested and what was excluded. They want the methodology, ideally mapped to a public framework like NIST AI 600-1 or MITRE ATLAS, because a methodology mapped to a public framework is reviewable in a way a proprietary methodology is not. They want test artifacts, meaning the actual attack traces and tool outputs, not just summary statistics. They want retest evidence proving that remediated findings stayed remediated. And they want mapping to controls, so the test results can be tied to the framework clauses they back.

A pricing model that produces this evidence package as a default output costs more upfront than one that produces a summary PDF. It also costs less downstream, because the audit team is not paying for additional vendor work to assemble the package from raw findings two weeks before the audit window. The right pricing model is the one whose output your auditor can use without translation.

Repello's platform ships the full evidence package — scoping doc, NIST AI 600-1 + MITRE ATLAS methodology mapping, raw attack traces, retest evidence, control mapping — as the default output of every assessment, not as an executive-summary add-on. Compliance teams pull the package directly from the platform when the SOC 2 / ISO 42001 / EU AI Act window opens. No pre-audit scramble.

How to read the demo

The first 30-minute call with any AI red teaming vendor should answer six questions. Bring them on paper. The questions decode the vendor's pricing model in a way the pricing page cannot.

1. What is your unit of pricing? Engagement, asset, test, or seat. The answer tells you which of the five models the vendor uses and which CFO question applies.
2. What is a typical first-year total for a company of our shape? Give the vendor a specific shape (number of AI applications, data sensitivity, testing cadence) and ask for a range. A vendor who cannot offer a range without three additional discovery calls is selling opacity.
3. What triggers an escalation in cost? A new asset, a new test, a new module, a scope amendment. Knowing the trigger lets you forecast the second-year total before you sign the first-year contract.
4. Is your methodology mapped to a public framework? NIST AI 600-1, MITRE ATLAS, OWASP LLM Top 10. A vendor with a publicly mapped methodology is one your auditor can review. A vendor with a proprietary methodology is one your auditor will question.
5. What evidence package do you produce, and is it auditor-acceptable? Ask to see a sample report. Ask whether the sample is what an ISO 42001 auditor would accept as control evidence. The honest answer might be "with the executive-summary add-on, yes."
6. What is the renewal escalator? The number is the number. If it is 12 percent and you sign for three years, you are paying for the escalator as much as for the platform.
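The escalator's effect over a multi-year term can be sketched in a few lines. This assumes the uplift compounds once per year at renewal; the `contract_total` helper and the $100,000 year-one price are placeholders, not a quote.

```python
def contract_total(year_one_price: float, escalator: float, years: int) -> float:
    """Total paid over `years`, with the price rising by `escalator` each year."""
    if years < 1:
        raise ValueError("years must be at least 1")
    return sum(year_one_price * (1 + escalator) ** year for year in range(years))


YEAR_ONE = 100_000  # placeholder year-one price

flat = contract_total(YEAR_ONE, 0.00, 3)
with_escalator = contract_total(YEAR_ONE, 0.12, 3)

print(f"{with_escalator:,.0f}")         # 337,440
print(f"{with_escalator - flat:,.0f}")  # 37,440 paid for the escalator alone
```

Running the same function with the escalator each vendor quotes makes three-year totals comparable across proposals.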

A vendor who answers all six clearly on the first call is operating with the kind of pricing transparency the category usually lacks. That is the wedge: the call is the test.

For reference, Repello's answers on each: (1) platform license per asset; (2) typical first-year ranges land at $40K to $200K depending on the asset mix, before any agentic or regulated-data uplift; (3) cost escalates on a new asset added, not on a new test fired — usage inside the asset is unlimited within fair-use bounds; (4) methodology is publicly mapped to NIST AI 600-1, MITRE ATLAS, and OWASP LLM Top 10; (5) the evidence package is auditor-acceptable as a default of the license, not as an add-on; (6) the standard renewal escalator is in the 5 to 8 percent range, well below the category median. Bring the list and we will go through them in order on the call.

TL;DR for the buyer in a hurry

Every AI red teaming vendor says "custom pricing." Five models live behind that phrase: per engagement, per asset, per test, hybrid, freemium.
Published public anchors: $16K to $100K for per-engagement services work; $8K to $150K for per-test platform pricing by system type; ~$20/month entry tiers for freemium. Platform-license pricing remains largely opaque across the market.
System complexity, data sensitivity, and test depth are the three multipliers that move every vendor's number.
Eight hidden line items typically sit outside the headline price; ask for the itemized proposal before signing.
Your auditor wants a specific evidence package; the right pricing model produces it as a default, not as an add-on.
Six questions in the first 30-minute demo decode the entire pricing model. Bring them on paper. Repello will answer all six.
Two recent acquisitions, Splx into Zscaler (November 2025) and Promptfoo into OpenAI (March 2026), shifted the vendor-independence question for AI red teaming buyers. We cover the implications in Splx alternatives and Promptfoo alternatives.

FAQ

Why does every AI red teaming vendor quote custom pricing?

Three reasons compound. Scope is genuinely variable across system types: a single chatbot pen test, a RAG application audit, and a multi-agent system review have very different costs. Vendors avoid publishing a number that anchors enterprise buyers low. And the category lacks a standard pricing unit, so there is no shared denominator the way "per seat per month" works for SaaS. The result is opacity that hurts buyers more than it hurts vendors.

What is a realistic budget range for an AI red teaming engagement?

Published anchors span an order of magnitude. Services firms publish minimums around $16K with full-engagement ranges running up to $100K. Platform vendors publish per-test ranges of $8K to $150K based on system type. Freemium-tier entry points start near $20 per month for automated probes. A first-year program for a single LLM application typically lands somewhere between $25K and $75K, with multi-agent and regulated-data systems pushing higher. Treat any number you see as an anchor, not a quote.

Which pricing model is best for ongoing assurance versus a one-off audit?

A scoped engagement priced per project is usually fine for an annual point-in-time audit needed for a compliance window. Platform-license pricing per asset fits ongoing assurance, because the same testing runs continuously as the application changes. Per-test pricing fits irregular work but becomes unpredictable as usage grows. Hybrid pricing covers both shapes inside one contract. The right model is the one whose unit of cost matches your testing cadence.

What hidden costs should procurement teams ask about?

Eight categories: setup and onboarding fees, retest fees after a vulnerability is closed, scope-creep fees when the system grows mid-contract, framework-mapping add-ons for ISO 42001 or NIST AI 600-1, report customization fees such as executive summaries, framework re-mapping charges when a standard updates, renewal escalators that increase year over year, and specialty module fees for browser-mode, agentic, MCP, or multimodal testing. These line items often sit outside the headline number. Asking for the full itemized proposal before signing is the only way to surface them.

Will my auditor accept a vendor's red teaming report as evidence?

Auditors for SOC 2, ISO 42001, and the EU AI Act do not accept a marketing summary. They want a scoping document, the testing methodology, raw test artifacts, retest evidence after remediation, and mapping to specific controls. A vendor whose pricing model covers report customization, framework mapping, and retest cycles will produce an auditor-acceptable evidence package. A vendor whose pricing model covers only the initial scan will not.

What questions should I ask in the first demo with any AI red teaming vendor?

Six questions cover most of the surface. What is your unit of pricing? What is a typical first-year total for a company of our shape? What triggers cost escalation? Is your methodology mapped to NIST AI 600-1 or MITRE ATLAS? What evidence package do you produce, and is it auditor-acceptable? What is the renewal escalator? Any vendor that can answer all six clearly on the first call is operating with the kind of pricing transparency the category usually lacks.

Bring this list to a Repello demo

The six questions above are the wedge. Most vendors will not answer all of them on a first call. We will.

Book a Repello demo and bring the list. We will walk through the unit of pricing, a realistic first-year range for your shape, the cost-escalation triggers, our public framework mapping, the auditor-acceptable evidence package, and the renewal escalator. Thirty minutes, six answers.

Price is one axis; coverage is the other. "Volume isn't coverage" explains how to tell whether a vendor actually tests the risk specific to your application or just runs up a bug count on a leaderboard.