Back to all blogs

|
|
7 min read


TL;DR: NVIDIA Agent Toolkit, announced at GTC 2026, is the most enterprise-backed agentic AI platform yet, with 15+ partners including CrowdStrike, Cisco, Palantir, and SAP building on top of it. Its OpenShell runtime enforces policy-based sandboxing, network egress controls, and least-privilege access. But sandboxes enforce known rules. They do not catch indirect prompt injection, multi-turn goal hijacking, or malicious skills delivered through the supply chain. Security teams being asked to greenlight Agent Toolkit deployments need to know exactly where OpenShell's coverage ends and independent testing begins.
The scope of what NVIDIA just shipped
On March 16, 2026, Jensen Huang announced NVIDIA Agent Toolkit at GTC: an open-source platform for building and running enterprise AI agents. The stack includes OpenShell (a runtime for self-evolving agents), NVIDIA AI-Q (an agentic search blueprint that topped the DeepResearch Bench II leaderboard), Nemotron models for local inference, and a library of open skills.
Adobe, Atlassian, Box, Cadence, Cisco, CrowdStrike, Palantir, Red Hat, Salesforce, SAP, ServiceNow, and Siemens all announced integrations at launch. These are not roadmap commitments. They are production-grade integrations from companies whose software runs critical enterprise infrastructure.
What this means practically: security teams at organizations using any of these platforms will, within months, be managing AI agents running on OpenShell. What OpenShell enforces, and what it leaves uncovered, is now a live operational question.
What OpenShell's security model actually covers
OpenShell is the security and privacy enforcement layer at the core of Agent Toolkit. It acts as an intermediary between the agent and the underlying infrastructure, controlling what the agent can access, execute, and where inference tasks run.
The controls it ships with:
Policy-based network egress. Operators define which external endpoints an agent is permitted to call. Any outbound connection to an unapproved endpoint requires explicit operator approval before it completes. This directly addresses one of the most reliable data exfiltration paths in agentic attacks: an agent manipulated into calling an attacker-controlled endpoint.
Sandboxed execution. Agents run in isolated environments. A compromised agent cannot directly access the host file system, adjacent processes, or network sockets outside the sandbox boundary. Lateral movement from a compromised agent is constrained to what the sandbox policy permits.
Least-privilege access control. OpenShell enforces per-agent permissions. A customer service agent running on the same infrastructure as a code execution agent cannot inherit the latter's access rights. This limits the blast radius when an agent is successfully manipulated.
Privacy routing. For cloud inference calls, OpenShell's Privacy Router strips or obfuscates PII before the query leaves the operator's environment. For teams with hardware that supports it, local Nemotron inference eliminates cloud calls entirely.
Gartner projects that by 2028, 33% of enterprise software applications will include agentic AI, up from under 1% in 2024. Every one of those deployments creates an execution environment that can be manipulated. Platform-level sandboxing and egress controls are the right response to that scale.
Where OpenShell's coverage ends
Every policy enforcement layer has the same structural limitation: it enforces the rules operators write in advance. Attacks that operate within those rules, or exploit gaps the operator did not anticipate, pass through.
Indirect prompt injection via tool responses. OpenShell validates what an agent proposes to do before execution. It does not, based on current documentation, deeply inspect the content returned by external data sources before that content enters the agent's context window. An attacker who controls a document, web page, calendar entry, or API response that the agent retrieves can inject instructions directly into the agent's reasoning chain without triggering any outbound network block. This is the core mechanism Repello documented in its analysis of MCP prompt injection attacks, and it applies to any agentic framework that retrieves external content. OpenShell's network controls do not address it because the malicious instruction arrives through an approved channel.
Multi-turn goal erosion. OpenShell's intent verification operates per action. An agent whose behavior gradually drifts across a long conversation, each individual action appearing compliant, can reach a state that violates the operator's intent without any single action triggering a policy block. Multi-turn manipulation is one of the most difficult attack classes to catch with per-action enforcement because the violation is cumulative, not instantaneous.
Supply chain attacks on Agent Toolkit skills. Agent Toolkit ships with a library of open skills that agents can acquire and run. Repello's teardown of malicious OpenClaw skills demonstrated how skills can be weaponized to exfiltrate data, establish persistence, or pivot within an environment, all while appearing functionally legitimate. A sandboxed runtime enforces boundaries at execution time; it does not validate whether the skill being installed was tampered with before it arrived. Supply chain risk enters before OpenShell's controls become relevant.
Novel attack classes post-deployment. OpenShell's policy templates reflect known attack patterns at launch time. "A policy configuration is a snapshot of your threat model at deployment time," said the Repello AI Research Team. "Every week that passes without adversarial testing is a week where the gap between your policy and the actual threat landscape grows wider." Research from the University of Illinois Urbana-Champaign found that LLM agents autonomously exploited real-world vulnerabilities with an 87% success rate when given sufficient tool access. As new agent capabilities ship and new attack techniques emerge, the gap between deployed policy and current threat landscape widens. That gap does not close without active testing.
What your team still needs to test
The OWASP LLM Top 10 identifies prompt injection, excessive agency, and supply chain vulnerabilities as three of the most critical risks in LLM deployments. OpenShell addresses excessive agency directly. Prompt injection and supply chain risk require dedicated testing regardless of what runtime the agent runs on.
A practical red teaming scope for Agent Toolkit deployments:
Test indirect injection paths. For every external data source your agent queries (documents, search results, API responses, email threads, calendar entries), test whether an attacker-controlled response can redirect the agent's behavior. The approved network endpoint is not the threat model; the content returned by that endpoint is.
Test intent verification bypass. NVIDIA's intent verification layer validates proposed agent actions against operator policy. Test whether an attacker can craft instructions that appear policy-compliant at the intent layer while achieving a malicious outcome at the action layer. This requires adversarial creativity, not just known payload libraries.
Audit every skill in your deployment. Before any Agent Toolkit skill is installed in a production environment, verify its provenance, review its permission requests, and test its behavior under adversarial inputs. Repello's security best practices for deploying OpenClaw covers this process in detail.
Run multi-turn erosion sequences. Test whether a patient attacker who spreads manipulation across many conversation turns can cumulatively redirect the agent's behavior in ways that individual action-level checks would not catch. This is an underrepresented test class in most security evaluations.
Re-test after every model or skill update. Agent Toolkit will ship updates to Nemotron models and the skill library over time. Each update changes the agent's behavior surface. Security posture validated against the previous version does not carry forward automatically.
How Repello approaches Agent Toolkit deployments
ARGUS, Repello's runtime security layer, is the direct complement to OpenShell. Where OpenShell enforces per-action policy checks, ARGUS monitors session-level behavioral drift. It tracks whether an agent's cumulative trajectory across a conversation has deviated from expected operating parameters, surfacing multi-turn manipulation before it reaches a harmful endpoint: the coverage gap that per-action intent verification structurally cannot close.
ARTEMIS, Repello's automated red teaming engine, handles the pre-deployment side. It runs structured adversarial probe sequences against Agent Toolkit deployments: indirect injection attacks through tool response channels, intent verification bypass techniques, multi-turn erosion sequences, and skill-level supply chain tests. The output is a prioritized gap analysis against the operator's current OpenShell policy configuration, not a binary pass/fail.
The combination gives security teams runtime behavioral coverage in production (ARGUS) and continuous adversarial validation of the OpenShell policy (ARTEMIS). OpenShell alone provides neither.
Learn more at repello.ai/product.
Conclusion
OpenShell's sandboxing, network egress controls, and least-privilege enforcement close specific, well-understood attack paths. The 15+ enterprise partners who shipped integrations at GTC are not doing so carelessly. But the same structural limitation that applies to every guardrail system applies here: it enforces what operators anticipated. Indirect prompt injection, multi-turn erosion, and supply chain tampering all operate in the space between what the policy covers and what attackers try. That space only gets mapped through adversarial testing, not configuration review.
Frequently asked questions
What is NVIDIA Agent Toolkit? NVIDIA Agent Toolkit is an open-source platform announced at GTC 2026 for building and running enterprise AI agents. It includes OpenShell (a runtime with policy-based security and privacy controls), NVIDIA AI-Q (an agentic search blueprint), Nemotron open models for local inference, and a library of open skills. Major enterprise software platforms including Adobe, Cisco, CrowdStrike, Palantir, SAP, and ServiceNow announced integrations at launch.
What does NVIDIA OpenShell actually enforce? OpenShell enforces policy-based network egress controls (blocking unapproved outbound connections), sandboxed agent execution (isolating agents from the host environment), least-privilege access controls (per-agent permission scoping), and privacy routing (stripping PII before cloud inference calls). These are runtime enforcement controls; they enforce rules operators define before deployment.
Does NVIDIA Agent Toolkit protect against prompt injection? Partially. OpenShell's intent verification can block out-of-policy actions that result from a prompt injection attack. However, it does not inspect content returned by external data sources before it enters the agent's context window. Indirect prompt injection, where a malicious instruction is embedded in a document or API response the agent retrieves, can still reach the agent's reasoning chain through approved channels.
What security testing does an Agent Toolkit deployment still require? Teams should test indirect injection paths through every external data source the agent queries, test intent verification bypass techniques, audit every skill for supply chain integrity before installation, run multi-turn erosion sequences to test cumulative goal hijacking, and re-test after every model or skill update. OpenShell enforces known-good policy; red teaming identifies what the policy misses.
Which enterprise companies are building on NVIDIA Agent Toolkit? The GTC 2026 announcement included integrations from Adobe, Amdocs, Atlassian, Box, Cadence, Cisco (AI Defense), Cohesity, CrowdStrike, Dassault Systèmes, IQVIA, Palantir, Red Hat, Salesforce, SAP, Siemens, and ServiceNow. Hardware partners include Dell, HP, Lenovo, ASUS, and Supermicro.
How do Repello's ARGUS and ARTEMIS work with NVIDIA Agent Toolkit? ARGUS is the runtime complement to OpenShell. It monitors session-level behavioral drift across conversations, catching multi-turn goal erosion that per-action intent verification does not track. ARTEMIS is the pre-deployment validation layer: it runs automated adversarial probe sequences against Agent Toolkit deployments, testing indirect injection paths through tool responses, intent verification bypass techniques, and skill supply chain integrity. Together they provide the production monitoring and adversarial testing that OpenShell's policy enforcement does not cover on its own.
Share this blog
Subscribe to our newsletter











