
The Zero-Day Collapse: The Case for Continuous AI Red Teaming

Naman Mishra | Co-Founder, CTO

Mar 9, 2026 | 10 min read

TL;DR

  • The median time from vulnerability disclosure to first observed exploit dropped from 771 days in 2018 to 4 hours in 2024. By 2025, the majority of exploited vulnerabilities were weaponized before public disclosure.

  • AI has industrialized exploit development: a working exploit for a known vulnerability now costs as little as $4, and AI agent swarms can find over 100 new kernel vulnerabilities in 30 days.

  • Organizations can remediate roughly 10% of new vulnerabilities per month. The average patch takes 20 days to test and deploy. Exploitation now happens well inside that window.

  • Periodic red teaming, run quarterly or annually, was calibrated for a threat landscape that no longer exists. Continuous AI red teaming is not a premium option — it is the minimum viable defense posture for 2026.

In 2018, the median time from a vulnerability being disclosed to the first observed exploit in the wild was 771 days. Security teams had over two years to identify, prioritize, test, and deploy a patch before real-world exploitation typically began. That window shaped an entire generation of security practice: vulnerability management programs, patch cadence policies, compliance frameworks, and red teaming schedules.

By 2024, that window had compressed to 4 hours. In 2025, 67.2% of exploited CVEs were weaponized before or on the day of public disclosure, up from 16.1% in 2018. Two-thirds of confirmed exploitation in the wild is now happening before defenders have any information to act on.

This is the finding at the center of Sergej Epp's Zero Day Clock, a live dashboard tracking the collapse of exploitation timelines across 3,515 CVE-exploit pairs from CISA KEV, VulnCheck KEV, and XDB. It is one of the most important data visualizations in security right now, not because it introduces a new threat, but because it makes the structural failure of reactive security impossible to ignore. This post draws out the implications for AI red teaming specifically.

How AI collapsed the exploitation timeline

The compression of time-to-exploit from 771 days to 4 hours did not happen through attacker skill improvement alone. It happened because AI eliminated the bottleneck in exploit development: the time and cost required to translate a disclosed vulnerability into a working weaponized payload.

Research from the University of Illinois at Urbana-Champaign demonstrated that GPT-4 could autonomously exploit one-day vulnerabilities with an 87% success rate at an average cost of $8.80 per exploit. That cost has since fallen further. AI agent swarms built by researchers Yaron Dinkin and Eyal Kraft found over 100 exploitable vulnerabilities across major hardware vendor kernel drivers in 30 days, at a total cost of $600, roughly $4 per confirmed bug. Sean Heelan built agents that generated over 40 working exploits for a single vulnerability for $50, defeating address space layout randomization, control-flow integrity, hardware security features, and sandboxes in the process.

Google's Project Zero and DeepMind collaboration, Big Sleep, independently discovered a critical exploitable memory-safety flaw in SQLite before any human researcher did — the first publicly documented case of AI finding a previously unknown real-world vulnerability that was ready to exploit. Anthropic published findings that Claude had identified over 500 high-severity vulnerabilities in widely used open-source software, bugs that had survived decades of expert human review.

The acceleration is structural. Epp's framework, which he calls Verifier's Law, explains why: AI capability scales with the cheapness of verification, and offense has the cheapest verifier in all of cybersecurity. Did the exploit succeed? The feedback is binary and instant. The AI learns at machine speed. On the defensive side, verification is ambiguous, slow, and expensive. Is this alert real? Is this system secure? The signal is noisy and the learning loop is broken. AI systems improve faster on offense not because defenders are less capable, but because the verification structure inherently favors the attacker.
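A toy simulation makes the asymmetry concrete. This model is ours, not Epp's: a learner with a noiseless binary verifier confirms a success in far fewer attempts than one whose verifier mislabels outcomes, even when the underlying task is identical.

```python
import random

def trials_to_success(p_success, verifier_noise, seed=0, max_trials=10_000):
    """Attempts until a learner gets a *correctly verified* success.

    p_success: chance any single attempt works.
    verifier_noise: probability the verifier mislabels an outcome
                    (0.0 models offense's instant binary check; higher
                    values model defense's ambiguous alert triage).
    """
    rng = random.Random(seed)
    for trial in range(1, max_trials + 1):
        worked = rng.random() < p_success
        # The verifier flips the true outcome with probability verifier_noise.
        observed = worked if rng.random() >= verifier_noise else not worked
        if worked and observed:  # only a success the learner can confirm counts
            return trial
    return max_trials

# Same underlying task; only verification quality differs.
offense = trials_to_success(p_success=0.05, verifier_noise=0.0)
defense = trials_to_success(p_success=0.05, verifier_noise=0.4)
print(offense, defense)
```

Averaged over many runs, the noiseless verifier confirms a success in roughly 1/p attempts, while a 40%-noisy verifier needs roughly 1/(0.6p): same task, slower learning loop. That gap is Verifier's Law in miniature.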

The CVE volume problem compounds the timeline problem

If the exploitation timeline were collapsing in isolation, defenders could theoretically adapt by improving patch velocity. The problem is that the timeline collapse is happening simultaneously with an exponential increase in vulnerability volume.

CVE.ICU, which tracks CVE publication records, shows 25,000 CVEs published in 2022, 29,000 in 2023, 40,000 in 2024, and over 48,000 in 2025, a 520% rise in annual CVE records since 2016. The 2025 total brings the all-time published CVE count past 308,000.
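The raw numbers translate directly into a monthly triage load. A back-of-envelope calculation using only the CVE.ICU figures above:

```python
# Annual CVE publication counts cited above (source: CVE.ICU).
cve_counts = {2022: 25_000, 2023: 29_000, 2024: 40_000, 2025: 48_000}

years = sorted(cve_counts)
for prev, curr in zip(years, years[1:]):
    growth = cve_counts[curr] / cve_counts[prev] - 1
    print(f"{prev} -> {curr}: {growth:+.0%} year over year")

# The triage load this implies for a team tracking everything published:
print(f"~{cve_counts[2025] / 12:,.0f} new CVEs per month in 2025")
```

Four thousand new CVEs a month is the intake side of the equation the next section measures against remediation capacity.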

The driver is not just more software. It is AI-assisted development. Developers using AI coding tools are reporting 40%+ productivity gains, with AI generating a significant share of committed code. More code, produced faster, with proportionally less human review means the vulnerability production rate is scaling with developer productivity. The attack surface is not just growing — it is being manufactured at industrial scale, and the manufacturing process has been automated.

Organizations facing this volume are not keeping up. Cyentia Institute research shows organizations can remediate approximately 10% of new vulnerabilities per month. The average time to test and deploy a security patch is 20 days. CISA KEV data shows it takes organizations around 55 days to remediate 50% of known-exploited vulnerabilities once a patch is available. When the median time-to-exploit is measured in hours, a 55-day remediation timeline is not merely inadequate. It is irrelevant.
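Those figures combine into a simple backlog model. The sketch below assumes a constant intake of ~4,000 CVEs per month (the 2025 publication rate divided by twelve) and applies Cyentia's ~10%-per-month remediation capacity to the open pile; the constants are illustrative, the shape is the point.

```python
def backlog_after(months, inflow=4_000, remediation_rate=0.10, start=0.0):
    """Open-vulnerability backlog after `months`: each month the new CVEs
    land, then a fixed fraction of the whole pile is remediated."""
    backlog = start
    for _ in range(months):
        backlog = (backlog + inflow) * (1 - remediation_rate)
    return backlog

# The backlog never trends toward zero. It converges to the steady state
# where monthly remediation exactly offsets monthly inflow:
#   b = (b + inflow) * (1 - r)  =>  b = inflow * (1 - r) / r = 36,000
for months in (12, 36, 120):
    print(months, round(backlog_after(months)))
```

A percentage-of-backlog remediation capacity against a constant inflow plateaus at tens of thousands of permanently open vulnerabilities, each of them exploitable within hours of disclosure.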

The 2024 Verizon DBIR documented that vulnerability exploitation as an initial access vector tripled, growing 180% over the prior year's report. The 2025 DBIR continued the trend, reporting a 34% year-over-year rise in breaches initiated through vulnerability exploitation. This is not a spike. It is the new baseline, driven by the same AI-assisted exploitation infrastructure that collapsed the timeline.

Why CVSS prioritization fails in this environment

The standard response to overwhelming vulnerability volume — prioritize by CVSS severity — is precisely wrong for the threat environment the Zero Day Clock describes.

Fewer than 10% of all published CVEs are ever actually exploited in the wild. Only 0.2% of published vulnerabilities have confirmed use by ransomware operators or advanced persistent threat groups. Yet organizations sorting by CVSS score are treating every critical and high finding as urgent, working through a queue that is 90% noise while the 0.2% that actually matters may be buried in the backlog.

Reachability analysis changes this calculus significantly. Research using this approach has shown that once context signals are applied (active exploitation status, EPSS score, reachability in the running application, and availability of working exploits or proof-of-concept code), roughly 92% of vulnerabilities turn out to be noise that can be safely deprioritized. The problem is that most organizations do not have these signals in place. They are still running CVSS-based triage queues in an environment where the effective exploited population is a fraction of a percent of published CVEs, and where the 0.2% that matters is actively being weaponized by AI systems before the triage queue even starts moving.
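In code, the difference between a CVSS queue and a context-signal queue is a few lines. The field names and thresholds below are illustrative choices, not a standard:

```python
from dataclasses import dataclass

@dataclass
class Vuln:
    cve_id: str
    cvss: float
    in_kev: bool          # listed in CISA/VulnCheck KEV (actively exploited)
    epss: float           # estimated probability of exploitation (EPSS)
    reachable: bool       # vulnerable code path reachable in the running app
    public_exploit: bool  # working exploit or PoC publicly available

def priority(v: Vuln) -> str:
    """Rank by exploitation evidence first; CVSS is the tiebreaker of last resort."""
    if v.in_kev and v.reachable:
        return "immediate"     # confirmed exploitation plus a reachable path
    if v.reachable and (v.public_exploit or v.epss >= 0.10):
        return "urgent"
    if not v.reachable:
        return "deprioritize"  # the ~92% that is effectively noise
    return "scheduled"         # reachable but no exploitation signal yet

findings = [  # hypothetical findings for illustration
    Vuln("CVE-A", cvss=9.8, in_kev=False, epss=0.01, reachable=False, public_exploit=False),
    Vuln("CVE-B", cvss=6.5, in_kev=True,  epss=0.70, reachable=True,  public_exploit=True),
]
for v in findings:
    print(v.cve_id, v.cvss, "->", priority(v))
```

Note the inversion: the 9.8 critical that is unreachable drops out of the queue, while the 6.5 with confirmed exploitation and a reachable path goes to the front. A pure CVSS sort orders them the other way around.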

This is not a criticism of the teams doing the triage. It is a structural mismatch between the tools built for a prior threat environment and the one that now exists.

Why periodic red teaming can no longer answer the right question

Traditional red teaming was designed to simulate the capabilities of human attackers operating on human timescales. A quarterly engagement produces a point-in-time snapshot of exploitable conditions, delivered in a report with a remediation backlog that may take months to work through.

That model was calibrated for a world where a motivated human attacker needed days to weeks to develop a working exploit for a known vulnerability, and months to discover new ones. The Zero Day Clock's data makes clear that this calibration is obsolete. When AI can find 100 new kernel vulnerabilities in 30 days at $4 each, and generate a working exploit within hours of a CVE being published, the question a quarterly engagement answers ("could an attacker with significant time and skill compromise this system?") is no longer the relevant one. The relevant question is: what can an attacker with $50 and an AI agent do to this system in the next 4 hours?

Continuous AI red teaming answers the right question. By running adversarial simulations continuously against production configurations, rather than against a snapshot taken once per quarter, it surfaces exploitable conditions as they emerge rather than weeks after they have been present. It also directly addresses the Verifier's Law asymmetry: a continuous AI red teaming system gives defenders fast, automated feedback on whether a given configuration is exploitable, compressing the defensive verification cycle rather than leaving it at the mercy of noisy, manual alert triage.
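Structurally, continuous red teaming is a loop, not an engagement. The sketch below is a generic illustration with hypothetical probe functions, not a description of any particular product's internals:

```python
import time

def run_probe_suite(target, probes):
    """Run each adversarial probe against the current configuration; return
    the names of probes that found an exploitable condition."""
    return {name for name, probe in probes.items() if probe(target)}

def continuous_red_team(target, probes, interval_s=3600, rounds=None):
    """Re-test on a schedule and alert on newly exploitable conditions,
    instead of reporting a once-per-quarter snapshot."""
    previously_exploitable = set()
    i = 0
    while rounds is None or i < rounds:
        exploitable = run_probe_suite(target, probes)
        for finding in exploitable - previously_exploitable:
            print(f"NEW exploitable condition: {finding}")  # page the on-call
        previously_exploitable = exploitable
        i += 1
        if rounds is None or i < rounds:
            time.sleep(interval_s)
    return previously_exploitable

# Hypothetical probes: each returns True if the attack currently works.
probes = {
    "prompt_injection_via_rag": lambda t: t.get("rag_sanitization") is False,
    "tool_call_escalation":     lambda t: "unbounded_tools" in t.get("agents", []),
}
config = {"rag_sanitization": False, "agents": []}
print(continuous_red_team(config, probes, interval_s=0, rounds=1))
```

The defensive verifier here is the binary one offense already enjoys: either the probe succeeded against the live configuration or it did not, and the delta between runs is the alert.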

The 2025 Verizon DBIR data on exploitation as the primary initial access vector makes the cost of periodic testing explicit: every day between engagements is a day during which newly disclosed and newly exploitable conditions are present in production without being detected by the security team's own testing.

What continuous AI red teaming looks like in practice

Continuous AI red teaming is not a rebranded vulnerability scanner. Scanners identify the presence of known-vulnerable software versions. They do not simulate what an attacker with AI tooling can actually do with those vulnerabilities in the specific configuration of your environment.

The distinction matters because exploitability depends on context: network segmentation, authentication controls, adjacent service exposure, and the specific combination of vulnerabilities present. A known-vulnerable library running behind three layers of authentication and without external network access presents a fundamentally different risk profile than the same library in an externally-facing service with permissive egress. Scanners report both at the same CVSS severity. A red team simulation surfaces the difference.
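A toy scoring function shows how far context can move the same finding. The weights are invented for illustration; in practice the adjustment comes from simulation results, not a formula:

```python
def contextual_risk(cvss, externally_exposed, auth_layers, permissive_egress):
    """Adjust a CVSS base score by deployment context (illustrative weights)."""
    risk = cvss
    if not externally_exposed:
        risk *= 0.3              # no external network path to the service
    risk *= 0.7 ** auth_layers   # each auth layer the attacker must defeat
    if permissive_egress:
        risk *= 1.2              # easier exfiltration once exploited
    return round(min(risk, 10.0), 1)

same_cve = 9.8  # identical CVSS severity in both deployments
internal = contextual_risk(same_cve, externally_exposed=False, auth_layers=3,
                           permissive_egress=False)
exposed = contextual_risk(same_cve, externally_exposed=True, auth_layers=0,
                          permissive_egress=True)
print(internal, exposed)  # a scanner reports 9.8 for both
```

Two deployments of the same vulnerable library, one effectively benign and one maximally urgent, and a severity score that cannot tell them apart: that is the gap a red team simulation exists to close.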

Repello's ARTEMIS is built for this specific requirement: continuous adversarial testing across AI application configurations, agentic pipelines, and LLM deployment architectures, running the same attack techniques that AI-assisted offensive tools now execute at commodity cost. For teams that also need runtime protection during the window between exploitation and patching, ARGUS operates at the inference layer, enforcing behavioral constraints and blocking anomalous agent actions before they complete.

Together they address both sides of the Zero Day Clock problem: ARTEMIS compresses the detection gap by continuously testing whether current configurations are exploitable, and ARGUS provides runtime enforcement for the period when the patch cycle has not yet caught up with the exploitation window.

Conclusion

The Zero Day Clock is not a warning about a future state. It is a measurement of the present one. The median exploitation window is now measured in hours, 67% of exploited CVEs in 2025 were weaponized before or on the day of disclosure, and AI has reduced the cost of a working exploit to single-digit dollars. The patch cycle that takes 20 days on average is not competing with that timeline. It is not in the same category.

The security practices built for a 771-day window — quarterly red team engagements, CVSS-based prioritization queues, voluntary secure-by-design commitments — are not failing because teams are executing them poorly. They are failing because the environment they were designed for no longer exists. Continuous AI red teaming is not a more expensive version of what came before. It is the only approach calibrated to the threat environment the data actually describes.

See how ARTEMIS continuous red teaming works against AI-assisted attack techniques. Talk to Repello's team.

Frequently asked questions

What is the Zero Day Clock and who built it?

The Zero Day Clock is a live dashboard created by Sergej Epp, CISO of Sysdig, that tracks the collapse of time-to-exploit (TTE) timelines using data from 3,515 CVE-exploit pairs sourced from CISA KEV, VulnCheck KEV, and XDB. It visualizes the compression of the exploitation window from a median of 771 days in 2018 to 4 hours in 2024, and tracks the trajectory toward sub-day exploitation across the broader CVE population. The dashboard is designed to make the structural failure of reactive security visible to security and engineering leadership.

Why has the time-to-exploit window collapsed so dramatically?

The primary driver is AI-assisted exploit development, which has eliminated the time and skill bottleneck that previously constrained exploitation velocity. Research has demonstrated that AI systems can autonomously exploit known vulnerabilities at an 87% success rate for under $9 per attempt, and find new exploitable vulnerabilities in production-scale codebases at a cost of roughly $4 per bug. Combined with a 520% rise in annual CVE publication volume since 2016, defenders are facing more vulnerabilities being exploited faster than at any point in the history of the CVE program.

What is Verifier's Law and why does it matter for AI security?

Verifier's Law, a concept introduced by Sergej Epp, holds that AI capability scales with the cheapness of verification. Offense has the cheapest verifier in cybersecurity: did the exploit succeed? The feedback is binary and instant. Defense has expensive, ambiguous verification: is this alert real, is this system secure? The practical consequence is that AI systems improve on offense faster than on defense not because offensive researchers are more capable, but because the verification structure is asymmetric. It explains why AI-assisted exploitation has industrialized faster than AI-assisted defense.

Why doesn't faster patching solve the zero-day collapse problem?

Faster patching is constrained by two factors that AI does not help with: organizational capacity and the patch paradox. Organizations can remediate approximately 10% of new vulnerabilities per month regardless of tooling improvements. More critically, the act of publishing a patch now accelerates exploitation: AI can reverse-engineer a patch, identify the vulnerability it fixes, and generate a working exploit faster than most organizations can test and deploy the fix. Average patch deployment takes 20 days. The exploitation window is 4 hours. Patching faster is necessary but insufficient.

What is the difference between a vulnerability scanner and continuous AI red teaming?

A vulnerability scanner identifies the presence of known-vulnerable software versions or configurations. It does not simulate what an attacker can actually accomplish against your specific environment. Exploitability depends on context: network segmentation, authentication controls, adjacent service exposure, and the combination of vulnerabilities present. Continuous AI red teaming simulates active exploitation attempts against your real configuration, surfaces exploitable combinations that scanners miss, and provides feedback calibrated to actual attacker capability rather than theoretical CVSS severity.

How should security teams prioritize in a 48,000-CVE-per-year environment?

CVSS-based prioritization is structurally flawed for the current environment: fewer than 10% of published CVEs are ever exploited, and only 0.2% have confirmed ransomware or APT use. Effective prioritization requires context signals: active exploitation status from threat intelligence feeds, EPSS score, reachability analysis confirming whether the vulnerable code path is actually reachable in your running application, and availability of working exploits or proof-of-concept code. Reachability analysis in particular has been shown to reduce the actionable vulnerability pool by approximately 92%, allowing teams to focus remediation effort on the fraction that presents genuine risk in their specific environment.


Sign up for Repello updates
Subscribe to our newsletter to receive the latest insights on AI security, red teaming research, and product updates in your inbox.

Subscribe to our newsletter

8 The Green, Ste A
Dover, DE 19901, United States of America

Follow us on:

LinkedIn | X (Twitter) | GitHub | YouTube

© Repello Inc. All rights reserved.
