Introduction: AI in the Wild

AI is no longer confined to R&D teams and academic benchmarks. It’s powering help desks, generating product recommendations, writing code, and increasingly connecting directly to critical systems via toolchains and APIs. These new capabilities come with new risks, and security teams are being asked to manage systems that behave unpredictably and evolve constantly.

Traditional security tools weren’t designed for a world that operates at the pace of AI. They rely on static configs, periodic scans, and assumptions that don’t hold when AI systems are reasoning, inferring, and executing in real time.

At Upwind, we think security should reflect the nature of what it’s protecting. That means focusing on runtime, observing real behavior, and building controls that adapt as fast as the systems themselves.

Executive Summary

AI workloads introduce new attack surfaces at every layer of the stack. For example:

  • What looks like a standard container could be running model-serving logic that interprets live user prompts. 
  • What used to be a benign API call might now be triggered by an LLM agent executing an unverified instruction. 
  • What appears to be an authorized cloud function could be the result of an attacker hijacking an orchestration loop.

Upwind secures AI systems at runtime, which is where activity is visible, and where attacks unfold.

We provide:

  • GPU & container telemetry to catch lateral movement through AI compute nodes.
  • Model-layer ADR to detect abuse or anomalies in inference behavior.
  • Dataflow inspection to surface prompt injections and exfiltration.
  • MCP-layer tracing to govern tool use and block unsafe actions.
  • Agent observability to make reasoning and decisions traceable.
  • Governance overlays to unify trust and compliance across the stack.

You can’t secure AI just by scanning its code or auditing its settings. You need to see what it’s doing, while it’s live and making decisions.

Framing the AI Stack

To secure AI systems effectively, you have to understand how they’re structured in practice. These systems are composed of multiple distinct layers, from the underlying compute to the orchestration logic that turns model output into real-world action.

Each of these layers introduces its own category of risks, from GPU-level lateral movement to agentic loops triggering unintended API calls. Below, we break down each layer of the modern AI stack, explain how attackers are already exploiting them, and show how Upwind delivers protection at runtime.

Framing-the-AI-Stack-1-1

Each of the following sections zooms into a specific layer of the stack, starting from the infrastructure that runs it all, up through models, data, orchestration logic, and finally to agents acting on their own. Think of this as peeling back the layers of a live system to see where risk actually lives.

GPU Infrastructure & Orchestration

Start where most AI systems do: on GPU-backed infrastructure. Whether running on-prem, in the cloud, or via managed platforms like SageMaker or Vertex AI, these workloads depend on accelerated hardware and complex container orchestration.

Compromising a GPU node gives attackers a foothold to:

  • Access model weights or prompts in memory
  • Evade traditional monitoring tools
  • Move laterally through poorly segmented workloads
GPU-Infrastructure-Orchestration-1-1

Upwind collects telemetry directly from running nodes, monitoring CUDA/ROCm activity, runtime kernel access, and workload behavior inside Kubernetes or ECS environments. When something breaks isolation, or behaves differently than it should, we surface it in real time.

Model Hosting & AI Platforms

In this layer, the risks are less about break-ins and more about blind spots.

Model-Hosting-AI-Platforms-1-1

Many organizations now use hosted models via services like AWS Bedrock or Azure OpenAI, or deploy their own LLMs in containers. These models often run with broad access and minimal oversight.

Questions that often go unanswered:

  • Is this container doing what it was designed to do?
  • Are there unmonitored model endpoints accessible from the internet?
  • Are runtime behaviors (e.g. data access patterns, API calls, or memory usage) consistent with what was tested?

Upwind’s Application Detection & Response engine uses function-level instrumentation (via Nyx) to analyze real-time model execution, detect anomalies in behavior, and respond automatically. If a model suddenly spikes in outbound data usage or starts returning atypical tokens – security will know.

Inference Pipelines & Data Stores

This level of the stack is where attackers often land.

Inference-Pipelines-Data-Stores-2-1

Inference is the point of contact between models and sensitive data. Prompts hit APIs, get routed through pipelines, and pull context from vector databases and object stores. If you can poison the prompt, reroute the query, or manipulate the embedding, you can leak data or corrupt results.

One known tactic: prompt injection aimed at stealing results from Pinecone or S3-backed retrieval. Another: chaining together docstore inputs to induce unintended tool use.

Upwind monitors this flow in production:

  • We flag suspicious access patterns in vector DBs.
  • We intercept risky inference calls in real time.
  • We trace data lineage across stores and pipelines.

Instead of relying on configuration audits, we watch how inference actually behaves.

MCP & External APIs

This layer didn’t exist a year ago. Now, it’s where real-world AI decisions turn into real-world consequences. The Model Context Protocol (MCP), championed by Anthropic and increasingly adopted across the ecosystem, provides a secure, structured way for models to call tools. Frameworks like LangChain and AutoGen build on top of MCP to orchestrate multi-step reasoning and tool execution.

MCP-External-APIs-1-1-1024x470

The benefit: your AI can do more. The risk: it might do the wrong thing.

An attacker doesn’t have to break into your system. They just have to manipulate the model into calling the wrong tool, with the wrong inputs.

Upwind integrates directly at the MCP server layer to:

  • Trace each tool call and its originating prompt chain.
  • Block unauthorized or suspicious action requests.
  • Enforce runtime policy before a tool is touched.

This deep integration is critical for AI security. A manipulated prompt could cause them to initiate a financial transaction, write to a production database, or spin up new cloud infrastructure. These are actions with real operational impact. Quite often they’re changing state, submitting forms, and triggering systems. As seen in the above examples, models aren’t just answering questions anymore, and compromised AI actions can cause catastrophic real-world consequences. 

AI Agents & APIs

Once a model has access to tools and memory, you’re no longer dealing with a stateless service. You’re dealing with a planner, a reasoning engine, sometimes even a recursive self-correcting loop. That’s what agentic AI really is: persistent, semi-autonomous systems that decide what to do next.

AI-Agents-APIs-2

And when something goes wrong, you’ll want to answer questions like:

  • Who initiated the toolchain?
  • What memory did the agent reference?
  • Was this the expected action, or a deviation?

Upwind gives you the visibility to:

  • Reconstruct full agent workflows
  • Detect deviations from expected sequences
  • Block dangerous combinations of tools, prompts, or memory access

If your AI stack is evolving in real time, your security has to evolve with it.

Governance & Trust Across the Stack

Whether you’re complying with SOC 2 or the EU AI Act, or just trying to pass a vendor risk assessment, your ability to explain how your AI systems behave matters.

Upwind provides:

  • Full audit trails from input to action
  • Cross-layer identity correlation
  • Policy definitions that apply at runtime, not just deploy time

We tie together telemetry from infrastructure, applications, identity providers, and model behavior into a single graph that highlights potential attack paths, policy violations, and risky user-initiated actions. That lets you say with confidence: this action was taken by this agent, using this tool, based on this input.

And that’s what auditors and customers are going to start expecting.

Governance-Trust-Across-the-Stack-1-1

Closing Perspective: Security Where AI Lives

The risks facing AI systems today aren’t hypothetical anymore. From prompt injection campaigns to vector database exploits and autonomous agents triggering unintended API calls, the attack surface has already expanded.

Most traditional security tools weren’t built for this environment. They weren’t designed to operate at runtime, track evolving model behavior, or understand how orchestration frameworks like LangChain and MCP influence downstream system state. These gaps leave organizations exposed.

Upwind takes a different approach. We give security teams real-time telemetry from GPU and containerized infrastructure. We monitor model behavior and flag anomalies as they happen. We trace inference and data flows, detect prompt-driven misuse, and enforce policy at the point of tool invocation. Whether the risk stems from the model itself, the orchestration layer, or an autonomous agent’s decisions, Upwind provides the visibility and control to step in before something goes wrong.

AI systems evolve quickly, and if your security program isn’t keeping pace, it’s going to fall behind.

See what securing live AI workloads looks like. Book a customized demo with us.