Back to all blogs

What is dark AI? The security risks of uncensored and unvetted models

What is dark AI? The security risks of uncensored and unvetted models

Archisman Pal

Archisman Pal

|

Head of GTM

Head of GTM

Feb 23, 2026

|

5 min read

What is dark AI? The security risks of uncensored and unvetted models
Repello tech background with grid pattern symbolizing AI security

Summary

Dark AI refers to language models that have been stripped of safety guardrails, trained on harmful data, or purpose-built to assist cybercrime. The category spans three distinct tiers: purpose-built criminal tools like WormGPT and FraudGPT, uncensored open-source variants, and jailbroken commercial models. Together, these tools lower the skill floor for executing phishing campaigns, generating malware, and conducting social engineering attacks at scale. Security teams need to understand what dark AI enables, take inventory of what AI is running in their environment, and actively test their own systems against the same techniques these tools make accessible.

A security researcher at SlashNext first publicly documented WormGPT in July 2023: a GPT-based model with no content policies, no usage restrictions, and a clear commercial pitch to criminals looking to automate business email compromise attacks. It was not a theoretical threat. It was a subscription service, sold on dark web forums, with a product page and a pricing tier.

Since then the ecosystem has expanded. FraudGPT, DarkBERT, EvilGPT, and a rotating cast of successor tools have appeared on Telegram channels and criminal marketplaces, each promising to remove the friction that safety-aligned models impose on users with malicious intent. The term "dark AI" has emerged to describe this class of tools, but the boundaries of the category are worth examining carefully, because the risk extends well beyond purpose-built criminal software.

This post covers what dark AI actually means across its three distinct tiers, what it enables for attackers, and what security teams need to do about it.

What "dark AI" actually means

The term covers at least three categories, and conflating them leads to imprecise defenses.

Purpose-built malicious models are the clearest case. WormGPT, FraudGPT, and tools like them are explicitly marketed to criminals. They are typically fine-tuned on datasets of malware samples, phishing templates, exploit code, and cybercrime forum content. They have no content filtering by design. SlashNext's July 2023 research documented WormGPT generating compelling, contextually accurate business email compromise attacks with no safety restrictions, in seconds, with no technical expertise required from the operator.

Uncensored open-source models form a larger and more ambiguous tier. Models like Dolphin (fine-tuned on Mistral and LLaMA variants) and various community releases have had their safety alignment deliberately removed. They are not purpose-built for crime. Their maintainers typically argue they are for research or bypassing overly restrictive content policies. In practice they are accessible to anyone, require no subscription, and produce outputs that commercial models refuse to generate. These models are running locally on commodity hardware right now, and many of the people running them have no criminal intent, which does not reduce the risk they represent when that access is abused.

Jailbroken commercial models occupy a third tier. Standard models including GPT-4 and Claude variants can be prompted into producing harmful outputs through AI jailbreaking techniques: multi-step prompt manipulation, role-play framing, encoding tricks, and context injection. The outputs are more constrained compared to purpose-built tools, but the barrier to entry is near-zero. No dark web subscription required.

Understanding which tier an attack is using matters because it shapes the detection and response strategy.

What dark AI enables for attackers

The primary effect of removing safety alignment from a language model is lowering the skill floor for attacks that previously required domain expertise.

Phishing at scale and quality. Constructing a convincing spear-phishing email that impersonates a specific executive, references real organizational context, and adapts to multiple languages traditionally required either native-language proficiency or expensive labor. Dark AI removes that constraint entirely. Research from IBM's X-Force team has tracked the increasing use of AI-generated content in phishing campaigns, noting a measurable improvement in grammatical quality and contextual accuracy compared to campaigns from prior years.

Malware generation and modification. Purpose-built tools and uncensored models can generate functional exploit code, write ransomware variants, and modify existing malware to evade signature-based detection. This does not replace skilled malware authors for sophisticated campaigns, but it significantly reduces the cost of producing commodity malware and polymorphic variants that defeat static analysis tools.

Social engineering at scale. Voice cloning combined with dark AI text generation creates automated social engineering pipelines. An attacker can generate targeted scripts, adapt them to specific individuals based on scraped public information, and deploy them at a volume no human attacker can match. The FBI's 2023 Internet Crime Report recorded $2.9 billion in business email compromise losses, a category where AI-assisted generation is now a documented component of attack toolchains.

Vulnerability research assistance. Dark AI models trained on exploit databases and security research can assist in identifying attack vectors, generating proof-of-concept code, and explaining vulnerability classes in operational detail. This accelerates the time from vulnerability disclosure to working exploit for attackers operating well below the nation-state skill tier.

The consistent pattern is scale and accessibility. Attacks that once required specialized skills or significant labor are becoming automatable at low cost.

The open-source dimension: distilled and fine-tuned models

The dark AI risk extends into models that were not designed to be malicious. Open-source model releases, including distillations of frontier models, carry safety properties that are often poorly documented and easily modified.

Repello AI's research on safety in models derived from DeepSeek-R1 illustrates the problem directly: distillation from a capable base model can preserve performance while degrading or eliminating alignment properties. Organizations deploying open-source models in production often have no reliable way to assess what safety properties a given fine-tune or distillation actually retains, and model cards rarely tell the full story.

This matters as much for internal deployments as for external threats. An organization that allows employees to run locally-hosted open-source models without governance is creating an undocumented attack surface inside its own perimeter. The model generating outputs inside your network may have no content restrictions at all.

What defenders need to understand

Treat dark AI as a capability multiplier, not a separate threat category. The attack classes (phishing, malware, social engineering, vulnerability research) are not new. What dark AI changes is the cost, scale, and skill requirement to execute them. Defenses should focus on attack outcomes rather than trying to detect AI involvement in attacks, which is difficult to do reliably and prone to false negatives.

Inventory what AI is running in your environment. Shadow AI deployments (employees running uncensored models locally, using unvetted AI tools through personal accounts, or connecting AI agents to corporate systems without authorization) are a real and underappreciated risk. The same governance framework that applies to sanctioned AI deployments should extend to detecting and classifying unsanctioned ones. Repello's AI Asset Inventory is built around this problem: before you can secure AI in your environment, you need visibility into what is actually there.

Test your own AI systems against dark-AI-powered attacks. If your organization deploys customer-facing LLMs, internal AI assistants, or agentic systems, those systems will be targeted by attackers using unconstrained models to generate and iterate attack payloads at machine speed. Point-in-time assessments are insufficient when the attacker's tooling is continuously improving. The methodology for doing this rigorously is covered in Repello's guide to AI red teaming.

How Repello approaches dark AI threats

The relevant question for security teams is not whether dark AI tools exist. It is whether your AI deployments can withstand attacks generated by them.

ARTEMIS, Repello's automated red teaming engine, runs continuous attack batteries against AI applications, including prompt injection, jailbreak attempts, data exfiltration probes, and adversarial inputs that mirror the techniques unconstrained models are used to generate. The goal is not compliance box-checking. It is identifying exploitable weaknesses before attackers using dark AI tooling find them first.

ARGUS, Repello's runtime security layer, monitors production AI systems for attack patterns regardless of whether they originated from a human attacker, a safety-aligned model, or a dark AI tool. At the point of runtime detection, the source of an attack payload is irrelevant. What matters is whether it reaches the model and whether the model's response creates risk.

The combination of proactive red teaming and runtime monitoring addresses both sides of the dark AI problem: the attack surface your AI exposes, and the ongoing attempts to exploit it.

Frequently asked questions

What is dark AI?

Dark AI refers to language models that have been stripped of safety alignment, trained on harmful or criminal datasets, or purpose-built to assist with cyberattacks and fraud. The category includes purpose-built criminal tools like WormGPT and FraudGPT, uncensored open-source model variants with alignment deliberately removed, and jailbroken commercial models manipulated into bypassing their content restrictions. The defining characteristic is the absence of the safety properties that responsible AI development requires.

Is dark AI the same as shadow AI?

No. Shadow AI refers to AI tools deployed within an organization without authorization or governance: employees using consumer AI tools, connecting personal AI assistants to corporate systems, or running local models without IT visibility. Dark AI refers to models with no safety alignment, typically used by external threat actors. Both are security risks, but they require different responses. Shadow AI is a governance and visibility problem. Dark AI is an external threat capability problem.

How are threat actors using dark AI right now?

The most documented use cases are business email compromise attacks using models like WormGPT to generate highly convincing phishing content without language barriers, commodity malware generation and modification to evade signature detection, and social engineering script generation at scale. Research from SlashNext confirmed operational criminal use of these tools from mid-2023 onward. The FBI's Internet Crime Report has tracked BEC losses exceeding $2.9 billion annually as AI-assisted generation has become a documented component of these campaigns.

Can safety-aligned commercial models be turned into dark AI?

Partially. Commercial models can be manipulated through jailbreaking techniques to produce outputs they would normally refuse, though the outputs are typically more constrained than purpose-built tools. Open-source models can have their safety fine-tuning removed or overwritten entirely: the base model capabilities remain intact while alignment properties are stripped, producing a model that is functionally uncensored. Distilled variants of frontier models present additional risk because their safety properties are often undocumented and may not match the base model.

How should security teams respond to dark AI threats?

Focus defenses on attack outcomes rather than trying to detect AI involvement: strengthen email security against AI-generated phishing, maintain malware detection that does not rely solely on signatures, and implement governance over what AI tools are running in your environment. For organizations deploying AI systems, continuous red teaming against adversarial inputs and runtime monitoring of production AI behavior are the most effective responses to an attack surface that dark AI tooling is actively probing.

Conclusion

Dark AI is not a single thing. It is a spectrum from purpose-built criminal tools to uncensored open-source models to jailbroken commercial ones. What they share is the removal of the friction that safety alignment imposes: lower costs, higher scale, and reduced skill requirements for phishing, malware generation, social engineering, and vulnerability research.

For security teams, the practical response remains consistent: understand the attack surface your AI systems expose, test it rigorously against the techniques dark AI enables, and monitor it continuously in production. The tooling attackers use to probe that surface has changed significantly since 2023. The fundamentals of defending it have not.

If you want to understand how your AI systems hold up against the attack patterns dark AI enables, get a demo from Repello AI.

Share this blog

Subscribe to our newsletter

Repello tech background with grid pattern symbolizing AI security
Repello tech background with grid pattern symbolizing AI security
Repello AI logo - Footer

Sign up for Repello updates
Subscribe to our newsletter to receive the latest insights on AI security, red teaming research, and product updates in your inbox.

Subscribe to our newsletter

8 The Green, Ste A
Dover, DE 19901, United States of America

Follow us on:

LinkedIn icon
X icon, Twitter icon
Github icon
Youtube icon

© Repello Inc. All rights reserved.

Repello tech background with grid pattern symbolizing AI security
Repello AI logo - Footer

Sign up for Repello updates
Subscribe to our newsletter to receive the latest insights on AI security, red teaming research, and product updates in your inbox.

Subscribe to our newsletter

8 The Green, Ste A
Dover, DE 19901, United States of America

Follow us on:

LinkedIn icon
X icon, Twitter icon
Github icon
Youtube icon

© Repello Inc. All rights reserved.