
Shadow AI: The Unauthorized AI Tools Already Running Inside Your Enterprise

Shadow AI: The Unauthorized AI Tools Already Running Inside Your Enterprise

Naman Mishra | Co-Founder, CTO | 10 min read


TL;DR

  • Shadow AI refers to AI tools, models, and integrations in use within an enterprise without the knowledge or approval of the security or IT team. The category is growing faster than any previous shadow IT wave.

  • The risks are categorically different from traditional shadow IT: data submitted to external LLMs may be used for model training, PII leaves the organization without consent, and compromised AI tools can introduce adversarial content into internal workflows.

  • Shadow AI hides in browser extensions, Slack and Teams bots, CI/CD pipeline scripts, MCP server connections, and developer local environments, none of which traditional SaaS discovery tools are designed to detect.

  • Discovery requires a combination of network traffic analysis, endpoint scanning, API key auditing, and automated AI asset discovery. Manual checklists catch the obvious cases; automated scanning is the only approach that scales.

  • Repello's AI Asset Inventory finds AI integrations that manual audits consistently miss, typically uncovering between three and ten times more AI touchpoints than an organization's self-reported inventory contains.

Shadow AI is any AI tool, model, API, or integration operating inside an enterprise without the knowledge or approval of the security or IT team. The category spans consumer LLMs used for work tasks, unauthorized browser extensions, Slack bots added without IT review, developer coding assistants, and MCP server connections running on developer machines. It is growing faster than any previous wave of shadow technology, and research demonstrates that most organizations have three to ten times more AI integrations than their official inventories reflect.

A sales representative at a mid-market SaaS company is preparing for a renewal call. She has a list of 200 customer accounts with contact names, contract values, and internal notes on account health. She wants a summary of each account's risk factors before the call. Her company's internal tools do not have this capability, so she pastes the full spreadsheet into a consumer LLM she found through a browser extension.

The tool works well. She gets her summaries. What she does not know: the LLM provider's terms of service, which she did not read, permit the use of submitted content to improve the model. The 200 customer records (names, contract values, internal account health assessments) have just left the organization's control, without consent, without a data processing agreement, and without any record that it happened.

This is not a hypothetical. Variants of this scenario are occurring in organizations daily. A 2024 report from Cyberhaven found that 4.2% of workers had pasted company data into ChatGPT, with sensitive data categories including source code, financial information, and personally identifiable information appearing consistently. As the Repello AI Research Team notes, the data leaves in a query and may never come back. Shadow AI is the infrastructure enabling these incidents at scale.

What is shadow AI, and why it is not just shadow IT with a new name

Shadow IT has existed as long as enterprise software procurement has. When IT approval processes are slow and business needs are immediate, employees find tools that work and use them. The pattern is consistent: SaaS adoption outpaced procurement controls in the 2010s, cloud storage outpaced DLP controls before that. Security teams have been playing catch-up with shadow technology adoption for decades.

Shadow AI fits this pattern structurally, but the risk profile is different in three ways that matter.

Shadow AI is active, not passive. A shadow SaaS tool stores data. A shadow AI tool acts on data: it generates summaries, writes code, answers questions, and in agentic configurations, executes tasks. An employee using an unauthorized project management tool introduces a data residency risk. An employee using an unauthorized agentic AI tool with email access introduces an execution risk. The tool can send emails, create calendar entries, and interact with other systems on behalf of the user, with the user's credentials, based on instructions from an external model.

Shadow AI learns from what it is given. Conventional software does not change based on the data it processes. AI models can. Consumer LLM providers vary significantly in whether submitted content is used for training, and their terms of service are frequently unclear, frequently changed, and rarely read by employees making in-the-moment decisions about which tool to use. Data submitted to an external model during normal work activity may become part of that model's training corpus, accessible to other users in different organizations through the model's outputs.

Shadow AI introduces supply chain risk, not just data risk. A browser extension that wraps an LLM is a software supply chain dependency the enterprise did not evaluate. If that extension is compromised (through a malicious update, a hijacked extension account, or a prompt injection attack embedded in content the extension processes) it becomes an attack vector sitting inside the browser of every employee using it, with access to whatever those employees can access.

Traditional shadow IT discovery tools scan for SaaS applications by monitoring network traffic, checking OAuth app authorizations, or reviewing SSO logs. These approaches catch Dropbox. They do not catch a locally installed AI coding assistant, a Slack bot added to a workspace by a team member, or an MCP server running on a developer's machine. The discovery problem is harder, and the risk is higher.

The real risks shadow AI introduces

Data exfiltration and training data exposure

The most immediate risk is data leaving the organization through AI query interfaces. Unlike traditional data exfiltration, where data moves through recognizable channels (email attachments, USB drives, cloud uploads), AI-based data exfiltration looks like normal productivity activity: an employee using a tool. The data leaves in the request payload, structured as a prompt rather than a file, and arrives at an external model provider whose data handling practices the organization has not reviewed.

The exposure categories are consistent across documented incidents: source code (submitted for debugging, code review, or documentation), customer data (submitted for analysis or communication drafting), financial information (submitted for modeling or reporting), and internal strategy documents (submitted for summarization). Under GDPR Article 28, personal data cannot be submitted to any third-party processor without a data processing agreement. Most shadow AI usage involves no such agreement, creating both security and compliance exposure.

Model misuse and policy violation

Shadow AI tools frequently lack the system-prompt controls and output filtering that enterprise-grade AI deployments implement. A consumer LLM used without enterprise safeguards may produce outputs that violate the organization's content policies, generate legally problematic content in the organization's name, or provide employees with incorrect information they act on. When the tool is unauthorized, the organization also has no audit trail of what was submitted, what was returned, or what decisions were made based on the output.

Compliance violations

Beyond GDPR, shadow AI creates exposure under multiple regulatory frameworks. The EU AI Act requires documentation and monitoring for high-risk AI systems; those requirements cannot be met for systems the security team does not know exist. HIPAA applies to healthcare organizations whose employees submit patient-adjacent information. Financial services regulators require ongoing AI model risk management. SEC cybersecurity disclosure rules require disclosure of material AI-related risks and incidents. You cannot disclose incidents involving systems you did not know were in use, creating both operational and legal risk.

Supply chain exposure

Shadow AI tools are third-party software dependencies that have not gone through security review. Browser extensions with AI capabilities have access to everything a user's browser can see: page content, form fields, authentication tokens, and session cookies. A compromised AI browser extension is a keylogger and credential harvester with a productivity tool interface. The OWASP Agentic Top 10 explicitly addresses supply chain risk in AI tooling; shadow AI is the category where supply chain risk is highest because the tools have not been evaluated at all.

Where shadow AI hides in your enterprise

Browser extensions

AI browser extensions are the most widespread and least visible category of shadow AI. Extensions like AI writing assistants, grammar checkers with AI backends, and meeting summarizers operate with broad browser permissions: they can read page content, intercept form submissions, and in many cases access clipboard content. An employee installing an AI browser extension from the Chrome Web Store is introducing an unreviewed AI dependency with access to everything they do in their browser. Standard SaaS discovery does not catch extensions; they do not appear in SSO logs, OAuth authorization lists, or network traffic in a way that distinguishes them from other HTTPS traffic.
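
Endpoint management tools can export each user's installed extensions; as a rough illustration, the flagging logic can be sketched as a scan over Chrome's per-profile `manifest.json` files. The keyword list, the profile path glob, and the function names below are illustrative assumptions, not a vetted detection ruleset:

```python
import json
from pathlib import Path

# Keywords and permissions that suggest an AI extension with broad access.
# Substring matching is crude ("ai" also matches "email"); treat hits as
# leads for manual review, not verdicts.
AI_KEYWORDS = {"gpt", "llm", "copilot", "assistant", "summarize", "ai "}
BROAD_HOSTS = {"<all_urls>", "*://*/*", "http://*/*", "https://*/*"}

def flag_manifest(manifest: dict) -> bool:
    """Return True if an extension manifest looks AI-related AND requests
    broad host permissions."""
    text = (manifest.get("name", "") + " " + manifest.get("description", "")).lower()
    is_ai = any(kw in text for kw in AI_KEYWORDS)
    hosts = set(manifest.get("host_permissions", [])) | set(manifest.get("permissions", []))
    return is_ai and bool(hosts & BROAD_HOSTS)

def scan_chrome_profile(profile_dir: Path) -> list[str]:
    """Walk a Chrome profile's Extensions directory and report flagged names.
    The Extensions/<id>/<version>/manifest.json layout is Chrome's on-disk
    convention; adjust for other browsers."""
    flagged = []
    for manifest_path in profile_dir.glob("Extensions/*/*/manifest.json"):
        manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
        if flag_manifest(manifest):
            flagged.append(manifest.get("name", str(manifest_path)))
    return flagged
```

In a real deployment the keyword heuristic would be replaced with a maintained vendor allowlist, but the permission check (broad host access plus AI backend) is the core signal either way.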

Slack and Teams bots

Collaboration platforms are a common shadow AI entry point because adding a bot to a Slack workspace or Teams channel requires only workspace-level permissions, not IT approval. AI-powered bots that summarize channels, generate responses, or integrate with external services can be added by any workspace member with the relevant permission level. Once added, these bots have access to channel history, direct messages in some configurations, and file attachments. The bot's backend model and its data handling practices are invisible to the security team.

CI/CD pipeline scripts and coding assistants

Developer environments are the highest-density shadow AI environments in most organizations. AI coding assistants (locally installed or IDE-integrated), scripts that call LLM APIs for code generation or review, and automated pipeline steps that use AI for testing or documentation all represent AI integrations that typically bypass procurement entirely. Developers introduce these tools because they improve productivity, the tools work, and the approval process for developer tooling is often informal. The risk is that source code, internal architecture documentation, and API credentials present in the development environment are submitted to external model APIs with no security review of the receiving endpoint.
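
One way to surface these integrations is to grep repository checkouts for known provider endpoints and documented API key prefixes (OpenAI keys, for example, are publicly documented to start with `sk-`). A minimal sketch, with an illustrative and deliberately incomplete pattern list:

```python
import re
from pathlib import Path

# Provider endpoint hostnames and key-prefix patterns. Matches are leads,
# not proof: a hit may be dead code, a comment, or a sanctioned integration.
PATTERNS = {
    "openai_endpoint": re.compile(r"api\.openai\.com"),
    "anthropic_endpoint": re.compile(r"api\.anthropic\.com"),
    "google_genai_endpoint": re.compile(r"generativelanguage\.googleapis\.com"),
    "openai_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
}

def scan_text(text: str) -> list[str]:
    """Return the names of every pattern that matches anywhere in the text."""
    return [name for name, pat in PATTERNS.items() if pat.search(text)]

def scan_repo(root: Path,
              suffixes=(".py", ".js", ".ts", ".sh", ".yaml", ".yml")) -> dict:
    """Map file path -> matched pattern names across a repository checkout."""
    hits = {}
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            found = scan_text(path.read_text(encoding="utf-8", errors="ignore"))
            if found:
                hits[str(path)] = found
    return hits
```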

MCP server connections

Model Context Protocol servers are a newer and rapidly growing shadow AI category. An MCP server extends what an AI agent can do by connecting it to file systems, databases, APIs, and external services. A developer running a local MCP server connected to an AI assistant has, by definition, created an agentic AI integration that the security team has no visibility into. That MCP server may have access to internal codebases, database credentials, or cloud infrastructure APIs. Repello's research on MCP tool poisoning to RCE demonstrates that MCP connections are not just a visibility gap; they are an active attack surface. A malicious or compromised MCP server can redirect agent execution to attacker-controlled infrastructure.
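
Where MCP clients store their configuration in JSON (Claude Desktop's `claude_desktop_config.json`, for instance, uses an `mcpServers` map), an inventory pass can parse those files to list configured servers. The candidate paths below are assumptions that vary by client and OS:

```python
import json
from pathlib import Path

# Common MCP client config locations (illustrative; exact paths vary by
# client and operating system).
CANDIDATE_CONFIGS = [
    Path.home() / "Library/Application Support/Claude/claude_desktop_config.json",
    Path.home() / ".config/Claude/claude_desktop_config.json",
]

def list_mcp_servers(config_text: str) -> dict[str, str]:
    """Extract server name -> launch command from an MCP client config.
    Assumes the widely used {"mcpServers": {name: {"command": ...,
    "args": [...]}}} layout."""
    config = json.loads(config_text)
    servers = {}
    for name, spec in config.get("mcpServers", {}).items():
        cmd = " ".join([spec.get("command", "")] + spec.get("args", []))
        servers[name] = cmd.strip()
    return servers

def inventory_mcp() -> dict[str, str]:
    """Aggregate configured MCP servers across known config locations."""
    found = {}
    for path in CANDIDATE_CONFIGS:
        if path.exists():
            found.update(list_mcp_servers(path.read_text(encoding="utf-8")))
    return found
```

Each entry returned is a process the security team did not provision, launched with whatever credentials the developer's environment exposes.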

Developer local environments

Beyond MCP servers, developer local environments host a range of AI integrations that are entirely invisible to network-based discovery: locally running models (via Ollama or similar tooling), locally installed AI agents, fine-tuning jobs running on local hardware, and direct API integrations built into personal scripts and tools. Some of these use external model APIs (and therefore send data externally); others run entirely locally (and therefore represent an undocumented AI capability operating on internal data with no oversight). Each is a node on the AI attack surface with a blast radius that is unknown until it is discovered.
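
Locally running model servers often listen on well-known default ports (Ollama's HTTP API defaults to 11434, for example). A hedged sketch of an endpoint-side check, with the port-to-tool map as an illustrative assumption and the probe function injectable so the logic can be tested without live services:

```python
import socket

# Default ports of common local AI runtimes. Illustrative; verify each
# value against the tool's own documentation, and remember users can
# rebind ports.
LOCAL_AI_PORTS = {
    11434: "ollama",
    8000: "local inference server (e.g. vLLM default)",
    1234: "LM Studio server (default)",
}

def probe_port(port: int, host: str = "127.0.0.1", timeout: float = 0.3) -> bool:
    """Return True if something is listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0

def detect_local_ai(ports: dict[int, str], probe=probe_port) -> dict[int, str]:
    """Report which known local-AI ports have a listener."""
    return {port: name for port, name in ports.items() if probe(port)}
```

A port listener is only a hint: confirming what is actually bound (and what data it can reach) still requires a process-level inventory on the endpoint.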

How to discover shadow AI

Manual discovery checklist

A manual discovery exercise cannot find everything, but it surfaces the highest-risk categories quickly:

  • OAuth and API authorizations: Review authorized OAuth applications across Google Workspace, Microsoft 365, Slack, and GitHub. Flag any application with AI in its name or description, and any application requesting scopes that include read access to emails, documents, or calendar data.

  • Browser extension audit: Deploy a browser management tool (or use existing endpoint management) to enumerate installed extensions across the organization. Flag extensions from providers not on an approved vendor list, particularly those with broad host permissions.

  • Slack and Teams app directory: Review all installed apps in each workspace and team. Flag any bot with message read permissions or external integration capabilities.

  • Developer tooling survey: Send a structured survey to engineering teams listing common AI tools (GitHub Copilot, Cursor, Continue, Ollama, local LLM tools) and asking which are in active use. Treat the results as a lower bound, not a complete inventory.

  • Expense and procurement review: Search expense reports and SaaS subscription lists for AI tool names and known AI providers (OpenAI, Anthropic, Google, Mistral, Cohere). API key charges from these providers appearing in personal or team budget codes indicate unauthorized API usage.

  • Network traffic sampling: Sample DNS query logs and HTTPS traffic destinations for known AI provider endpoints (api.openai.com, api.anthropic.com, generativelanguage.googleapis.com, and equivalents). Unexpected traffic volumes to these endpoints from non-approved systems indicate shadow usage.
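
The DNS sampling step in the checklist above can be approximated with a small aggregation script. The log format assumed here ("source_ip queried_domain" per line) is illustrative; real resolver logs need their own parsing:

```python
from collections import Counter

# Known AI provider API hostnames to watch for (the same endpoints listed
# in the sampling step above; extend as providers are added).
AI_ENDPOINTS = {
    "api.openai.com",
    "api.anthropic.com",
    "generativelanguage.googleapis.com",
}

def tally_ai_queries(log_lines: list[str]) -> Counter:
    """Count DNS queries to known AI endpoints per source IP.
    Assumes a simplified 'source_ip queried_domain' line format; adapt
    the split to your resolver's actual log schema."""
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) >= 2 and parts[1] in AI_ENDPOINTS:
            counts[parts[0]] += 1
    return counts
```

Sorting the resulting counter by volume gives a triage list: high query counts from hosts outside the approved AI deployment are the first systems to investigate.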

Automated AI asset discovery

Manual checklists catch what employees are willing to disclose and what is visible in administrative dashboards. They consistently miss locally installed tools, API integrations embedded in code, and AI usage patterns that do not generate recognizable traffic signatures.

Automated AI asset discovery scans the environment continuously for AI integrations at the infrastructure and application layer: identifying model API calls in code repositories, detecting AI-related dependencies in package manifests, flagging new OAuth authorizations as they occur, and monitoring MCP server registrations across developer environments. Repello's AI Asset Inventory is purpose-built for this discovery function, operating continuously rather than as a point-in-time audit.
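
Dependency-manifest scanning, one of the signals described above, can be sketched as a lookup against a watchlist of AI SDK package names. The watchlists below are illustrative, not exhaustive; production tooling would maintain a much larger, versioned list:

```python
import json

# SDK package names commonly associated with AI integrations (illustrative).
PY_AI_PACKAGES = {"openai", "anthropic", "google-generativeai", "langchain", "llama-index"}
NPM_AI_PACKAGES = {"openai", "@anthropic-ai/sdk", "langchain", "ai"}

def ai_deps_from_requirements(text: str) -> set[str]:
    """Flag AI SDKs in a requirements.txt-style file (one spec per line).
    Strips version pins and extras markers before the lookup."""
    found = set()
    for line in text.splitlines():
        name = line.split("==")[0].split(">=")[0].split("[")[0].strip().lower()
        if name in PY_AI_PACKAGES:
            found.add(name)
    return found

def ai_deps_from_package_json(text: str) -> set[str]:
    """Flag AI SDKs in a package.json's dependencies and devDependencies."""
    pkg = json.loads(text)
    deps = {**pkg.get("dependencies", {}), **pkg.get("devDependencies", {})}
    return set(deps) & NPM_AI_PACKAGES
```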

The gap between manual and automated discovery is consistently large. Automated assessments typically find three to ten times more AI integrations than an organization's self-reported inventory contains. The difference is not primarily dishonesty on employees' part. It is that employees frequently do not recognize the tools they use as "AI tools" in the sense that a security team would categorize them, and locally running tools and code-level integrations are invisible to self-reporting entirely.

Discovery feeds directly into the broader AI security posture management program. The AI Bill of Materials that a complete inventory generates is the prerequisite for risk classification, adversarial testing, and the compliance documentation that NIST AI RMF and the EU AI Act require. Without discovery, every downstream security function operates against an incomplete picture.

Frequently asked questions

What is shadow AI?

Shadow AI refers to AI tools, models, APIs, and integrations used within an enterprise without the knowledge or approval of the security or IT organization. It includes consumer LLMs used for work tasks, unauthorized AI browser extensions, Slack and Teams bots added without IT review, AI coding assistants installed by developers, MCP server connections in developer environments, and API integrations calling external model providers. The category has grown rapidly since consumer AI tools became widely available in 2022 and 2023.

How is shadow AI different from shadow IT?

Shadow IT typically involves SaaS applications that store or process data without IT approval. Shadow AI introduces additional risk categories: data submitted to external LLMs may be used for model training, making exfiltration effectively permanent; AI tools are active rather than passive and can execute tasks with user credentials in agentic configurations; and shadow AI tools are software dependencies that have not been evaluated for supply chain risk. The discovery problem is also harder: standard SaaS discovery tools are not designed to detect AI-specific integration patterns.

What data is most at risk from shadow AI?

The data categories most consistently present in shadow AI incidents are source code (submitted for debugging and code review), customer and prospect data (submitted for analysis and communication drafting), financial information (submitted for modeling), internal strategy documents (submitted for summarization), and authentication credentials present in development environments where AI coding tools are used. All of these have clear compliance implications under GDPR, the EU AI Act, and industry-specific regulations.

How do I find shadow AI in my organization?

Start with the manual checklist: OAuth authorization reviews across major platforms, browser extension audits, Slack and Teams app directory reviews, developer tooling surveys, expense report analysis, and DNS traffic sampling for known AI provider endpoints. Follow up with automated AI asset discovery tooling that continuously monitors the environment for new AI integrations. Treat manual discovery results as a lower bound and automated results as directionally complete.

Is using an AI tool at work without approval a policy violation?

In most organizations with an acceptable use policy that covers data handling, yes. Submitting customer data, source code, or proprietary information to an external AI provider almost always violates data classification and data handling policies, regardless of whether those policies explicitly mention AI tools. For organizations subject to GDPR, submitting personal data to an unreviewed external processor without a data processing agreement is a compliance violation independent of internal policy.

Conclusion

Shadow AI is not a future risk. It is a present condition in virtually every enterprise with knowledge workers. The tools are accessible, effective, and adopted faster than any previous wave of shadow technology because the productivity gains are immediate and visible while the risks are invisible until an incident occurs.

The response is the same one that worked for shadow IT: visibility first, then policy, then controls. You cannot govern AI tools you do not know exist. A complete AI asset inventory (covering browser extensions, collaboration bots, developer tools, MCP connections, and API integrations alongside the official AI deployments) is the starting point for every control that follows.

To see how Repello discovers shadow AI across enterprise environments and builds the asset inventory that security programs require, visit repello.ai/inventory or request a demo.


Sign up for Repello updates
Subscribe to our newsletter to receive the latest insights on AI security, red teaming research, and product updates in your inbox.


8 The Green, Ste A
Dover, DE 19901, United States of America


© Repello Inc. All rights reserved.
