What is dark AI? The security risks of uncensored and unvetted models

TL;DR

Dark AI refers to language models stripped of safety guardrails, trained on harmful data, or purpose-built to assist cybercrime

The category spans three tiers: purpose-built criminal tools (WormGPT, FraudGPT), uncensored open-source variants, and jailbroken commercial models

These tools lower the skill floor for phishing, malware generation, and social engineering attacks at scale

Security teams need to understand what dark AI enables, inventory what AI runs in their environment, and test their own systems against it

A security researcher at SlashNext first publicly documented WormGPT in July 2023: a GPT-based model with no content policies, no usage restrictions, and a clear commercial pitch to criminals looking to automate business email compromise attacks. It was not a theoretical threat. It was a subscription service, sold on dark web forums, with a product page and a pricing tier.

Since then the ecosystem has expanded. FraudGPT, DarkBERT, EvilGPT, and a rotating cast of successor tools have appeared on Telegram channels and criminal marketplaces, each promising to remove the friction that safety-aligned models impose on users with malicious intent. The term "dark AI" has emerged to describe this class of tools, but the boundaries of the category are worth examining carefully, because the risk extends well beyond purpose-built criminal software.

This post covers what dark AI actually means across its three distinct tiers, what it enables for attackers, and what security teams need to do about it.

What "dark AI" actually means#

The term covers at least three categories, and conflating them leads to imprecise defenses.

Purpose-built malicious models are the clearest case. WormGPT, FraudGPT, and tools like them are explicitly marketed to criminals. They are typically fine-tuned on datasets of malware samples, phishing templates, exploit code, and cybercrime forum content. They have no content filtering by design. SlashNext's July 2023 research documented WormGPT generating compelling, contextually accurate business email compromise attacks with no safety restrictions, in seconds, with no technical expertise required from the operator.

Uncensored open-source models form a larger and more ambiguous tier. Models like Dolphin (fine-tuned on Mistral and LLaMA variants) and various community releases have had their safety alignment deliberately removed. They are not purpose-built for crime. Their maintainers typically argue they are for research or bypassing overly restrictive content policies. In practice they are accessible to anyone, require no subscription, and produce outputs that commercial models refuse to generate. These models are running locally on commodity hardware right now, and many of the people running them have no criminal intent, which does not reduce the risk they represent when that access is abused.

Jailbroken commercial models occupy a third tier. Standard models including GPT-4 and Claude variants can be prompted into producing harmful outputs through AI jailbreaking techniques: multi-step prompt manipulation, role-play framing, encoding tricks, and context injection. The outputs are more constrained compared to purpose-built tools, but the barrier to entry is near-zero. No dark web subscription required.

Understanding which tier an attack is using matters because it shapes the detection and response strategy.

What dark AI enables for attackers#

The primary effect of removing safety alignment from a language model is lowering the skill floor for attacks that previously required domain expertise.

Phishing at scale and quality. Constructing a convincing spear-phishing email that impersonates a specific executive, references real organizational context, and adapts to multiple languages traditionally required either native-language proficiency or expensive labor. Dark AI removes that constraint entirely. Research from IBM's X-Force team has tracked the increasing use of AI-generated content in phishing campaigns, noting a measurable improvement in grammatical quality and contextual accuracy compared to campaigns from prior years.

Malware generation and modification. Purpose-built tools and uncensored models can generate functional exploit code, write ransomware variants, and modify existing malware to evade signature-based detection. This does not replace skilled malware authors for sophisticated campaigns, but it significantly reduces the cost of producing commodity malware and polymorphic variants that defeat static analysis tools.

Social engineering at scale. Voice cloning combined with dark AI text generation creates automated social engineering pipelines. An attacker can generate targeted scripts, adapt them to specific individuals based on scraped public information, and deploy them at a volume no human attacker can match. The FBI's 2023 Internet Crime Report recorded $2.9 billion in business email compromise losses, a category where AI-assisted generation is now a documented component of attack toolchains.

Vulnerability research assistance. Dark AI models trained on exploit databases and security research can assist in identifying attack vectors, generating proof-of-concept code, and explaining vulnerability classes in operational detail. This accelerates the time from vulnerability disclosure to working exploit for attackers operating well below the nation-state skill tier.

The consistent pattern is scale and accessibility. Attacks that once required specialized skills or significant labor are becoming automatable at low cost.

The open-source dimension: distilled and fine-tuned models#

The dark AI risk extends into models that were not designed to be malicious. Open-source model releases, including distillations of frontier models, carry safety properties that are often poorly documented and easily modified.

Repello AI's research on safety in models derived from DeepSeek-R1 illustrates the problem directly: distillation from a capable base model can preserve performance while degrading or eliminating alignment properties. Organizations deploying open-source models in production often have no reliable way to assess what safety properties a given fine-tune or distillation actually retains, and model cards rarely tell the full story.

This matters as much for internal deployments as for external threats. An organization that allows employees to run locally-hosted open-source models without governance is creating an undocumented attack surface inside its own perimeter. The model generating outputs inside your network may have no content restrictions at all.

What defenders need to understand#

Treat dark AI as a capability multiplier, not a separate threat category. The attack classes (phishing, malware, social engineering, vulnerability research) are not new. What dark AI changes is the cost, scale, and skill requirement to execute them. Defenses should focus on attack outcomes rather than trying to detect AI involvement in attacks, which is difficult to do reliably and prone to false negatives.

Inventory what AI is running in your environment. Shadow AI deployments (employees running uncensored models locally, using unvetted AI tools through personal accounts, or connecting AI agents to corporate systems without authorization) are a real and underappreciated risk. The same governance framework that applies to sanctioned AI deployments should extend to detecting and classifying unsanctioned ones. Repello's AI Asset Inventory is built around this problem: before you can secure AI in your environment, you need visibility into what is actually there.

Test your own AI systems against dark-AI-powered attacks. If your organization deploys customer-facing LLMs, internal AI assistants, or agentic systems, those systems will be targeted by attackers using unconstrained models to generate and iterate attack payloads at machine speed. Point-in-time assessments are insufficient when the attacker's tooling is continuously improving. The methodology for doing this rigorously is covered in Repello's guide to AI red teaming.

How Repello approaches dark AI threats#

The relevant question for security teams is not whether dark AI tools exist. It is whether your AI deployments can withstand attacks generated by them.

ARTEMIS, Repello's automated red teaming engine, runs continuous attack batteries against AI applications, including prompt injection, jailbreak attempts, data exfiltration probes, and adversarial inputs that mirror the techniques unconstrained models are used to generate. The goal is not compliance box-checking. It is identifying exploitable weaknesses before attackers using dark AI tooling find them first.

ARGUS, Repello's runtime security layer, monitors production AI systems for attack patterns regardless of whether they originated from a human attacker, a safety-aligned model, or a dark AI tool. At the point of runtime detection, the source of an attack payload is irrelevant. What matters is whether it reaches the model and whether the model's response creates risk.

The combination of proactive red teaming and runtime monitoring addresses both sides of the dark AI problem: the attack surface your AI exposes, and the ongoing attempts to exploit it.

Frequently asked questions#

What is dark AI?#

Dark AI refers to language models that have been stripped of safety alignment, trained on harmful or criminal datasets, or purpose-built to assist with cyberattacks and fraud. The category includes purpose-built criminal tools like WormGPT and FraudGPT, uncensored open-source model variants with alignment deliberately removed, and jailbroken commercial models manipulated into bypassing their content restrictions. The defining characteristic is the absence of the safety properties that responsible AI development requires.

Is dark AI the same as shadow AI?#

No. Shadow AI refers to AI tools deployed within an organization without authorization or governance: employees using consumer AI tools, connecting personal AI assistants to corporate systems, or running local models without IT visibility. Dark AI refers to models with no safety alignment, typically used by external threat actors. Both are security risks, but they require different responses. Shadow AI is a governance and visibility problem. Dark AI is an external threat capability problem.

How are threat actors using dark AI right now?#

The most documented use cases are business email compromise attacks using models like WormGPT to generate highly convincing phishing content without language barriers, commodity malware generation and modification to evade signature detection, and social engineering script generation at scale. Research from SlashNext confirmed operational criminal use of these tools from mid-2023 onward. The FBI's Internet Crime Report has tracked BEC losses exceeding $2.9 billion annually as AI-assisted generation has become a documented component of these campaigns.

Can safety-aligned commercial models be turned into dark AI?#

Partially. Commercial models can be manipulated through jailbreaking techniques to produce outputs they would normally refuse, though the outputs are typically more constrained than purpose-built tools. Open-source models can have their safety fine-tuning removed or overwritten entirely: the base model capabilities remain intact while alignment properties are stripped, producing a model that is functionally uncensored. Distilled variants of frontier models present additional risk because their safety properties are often undocumented and may not match the base model.

How should security teams respond to dark AI threats?#

Focus defenses on attack outcomes rather than trying to detect AI involvement: strengthen email security against AI-generated phishing, maintain malware detection that does not rely solely on signatures, and implement governance over what AI tools are running in your environment. For organizations deploying AI systems, continuous red teaming against adversarial inputs and runtime monitoring of production AI behavior are the most effective responses to an attack surface that dark AI tooling is actively probing.

Conclusion#

Dark AI is not a single thing. It is a spectrum from purpose-built criminal tools to uncensored open-source models to jailbroken commercial ones. What they share is the removal of the friction that safety alignment imposes: lower costs, higher scale, and reduced skill requirements for phishing, malware generation, social engineering, and vulnerability research.

For security teams, the practical response remains consistent: understand the attack surface your AI systems expose, test it rigorously against the techniques dark AI enables, and monitor it continuously in production. The tooling attackers use to probe that surface has changed significantly since 2023. The fundamentals of defending it have not.

If you want to understand how your AI systems hold up against the attack patterns dark AI enables, get a demo from Repello AI.