AI Runtime Security

Pre-deployment testing is a snapshot. Runtime security watches what AI actually does — continuously. Here’s what it means, why it matters, and how it works.

What Is AI Runtime Security?

AI runtime security is the practice of continuously monitoring AI systems after they’ve been deployed to production. It observes model behavior in real time — tracking outputs, detecting anomalies, and mapping findings to governance frameworks — so that drift, vulnerabilities, and compliance gaps are caught as they emerge, not weeks or months later.

Think of it as the difference between inspecting a bridge once before opening day and instrumenting it with sensors that monitor stress, vibration, and load every second it’s in service. The inspection matters. But the sensors are what keep people safe.

Why Runtime Security Matters

AI systems don’t stand still. Even when the model weights don’t change, the environment around them does:

  • Prompts evolve. Teams update system prompts, add context, change instructions. Each change can subtly alter behavior.
  • User inputs shift. Real-world users find interaction patterns that testing never anticipated. Adversarial users probe for weaknesses.
  • Upstream models change. If you’re calling a hosted model via API, the provider may update the model behind the same endpoint. Your system prompt stays the same; the behavior underneath it changes.
  • Regulatory expectations tighten. The EU AI Act, Colorado AI Act, and NIST AI RMF all expect ongoing monitoring — not one-time assessments.

A model that passes every evaluation on Tuesday can behave differently by Friday. Runtime security fills that gap.

How It Differs from Pre-Deployment Testing

Pre-deployment testing answers the question: “Does this model behave correctly right now, on these inputs?” That’s a valuable question. But it’s not the only one.

                     Pre-Deployment Testing    Runtime Security
Timing               Before deployment         Continuous, post-deployment
Coverage             Known test cases          All production interactions
Drift detection      No — single snapshot      Yes — CUSUM, statistical baselines
Adversarial inputs   Simulated attacks         Real-world attack detection
Compliance           Point-in-time evidence    Continuous attestation

Both are necessary. Pre-deployment testing establishes the baseline; runtime security ensures that baseline holds. Frameworks and regulations alike expect this: the NIST AI RMF (Govern 1.5, Measure 2.6) and the EU AI Act (Article 9) both call for continuous post-deployment monitoring.

What Continuous Monitoring Looks Like

Effective AI runtime security combines several capabilities:

1. Automated Red Teaming on a Schedule

Rather than running penetration tests once before launch, continuous red teaming runs adversarial probes on a recurring cadence — daily, weekly, or triggered by events like prompt changes or model updates. Tools like autoredteam test across prompt injection, PII extraction, role confusion, jailbreaking, and more.
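The cadence-plus-trigger logic above can be sketched as a small scheduling check. This is an illustrative sketch, not any real tool's interface: scan_due, CADENCE, and the event timestamps are assumed names.

```python
from datetime import datetime, timedelta

# Illustrative scan cadence; daily and weekly are both common choices.
CADENCE = timedelta(days=1)

def scan_due(last_scan, now, events=()):
    """Decide whether to launch a red-team scan.

    A scan is due when the cadence has elapsed, OR when a trigger event
    (prompt change, model update) landed since the last scan.
    """
    overdue = now - last_scan >= CADENCE
    triggered = any(last_scan < t <= now for t in events)
    return overdue or triggered
```

An orchestrator would call this on a timer and on deployment webhooks, then hand the actual probing off to the scanner.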

2. Behavioral Drift Detection

CUSUM (cumulative sum) control charts track model behavior over time. When the cumulative deviation from a baseline exceeds a threshold, the system flags a drift event. This catches subtle changes — like a clinical AI becoming gradually more agreeable over extended conversations — that no single test would reveal.
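A minimal one-sided CUSUM check looks like this, assuming you have per-interaction behavior scores (e.g. an agreeableness metric) and a baseline mean and standard deviation from pre-deployment evaluation. The function name and default parameters are illustrative; k = 0.5 and h = 4 to 5 are common textbook defaults.

```python
def cusum_drift(scores, baseline_mean, baseline_std, slack=0.5, threshold=5.0):
    """Return the index where cumulative upward deviation exceeds the
    threshold, or None if no drift is detected.

    slack (k) and threshold (h) are in units of baseline standard
    deviations. Because the statistic only grows on sustained deviation,
    a single outlier won't trip it, but a gradual shift will.
    """
    s = 0.0
    for i, x in enumerate(scores):
        z = (x - baseline_mean) / baseline_std  # standardize against baseline
        s = max(0.0, s + z - slack)             # accumulate sustained drift only
        if s > threshold:
            return i                            # drift event flagged here
    return None
```

With a baseline of 0.2 ± 0.05, a run of scores at 0.3 trips the detector within a few observations, while scores at the baseline never do.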

3. Governance Framework Mapping

Every finding is automatically mapped to the relevant controls in OVERT, NIST AI RMF, and MITRE ATLAS. This turns raw security data into audit-ready evidence. A prompt injection vulnerability isn’t just a technical finding — it’s mapped to ATLAS AML.T0051 and OVERT RT-3.
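Mechanically, this kind of mapping can be as simple as a lookup table keyed by finding category. The prompt-injection IDs (ATLAS AML.T0051, OVERT RT-3) come from the example above; the table structure and the enrich helper are illustrative assumptions, not a real product schema.

```python
# Category-to-control lookup. Only the prompt_injection entry is sourced
# from the text; a real table would cover every probe category.
CONTROL_MAP = {
    "prompt_injection": {"mitre_atlas": "AML.T0051", "overt": "RT-3"},
}

def enrich(finding):
    """Attach governance controls to a raw security finding."""
    controls = CONTROL_MAP.get(finding["category"], {})
    return {**finding, "controls": controls}

finding = enrich({"category": "prompt_injection", "severity": "high"})
```

The payoff is that every raw finding carries its compliance context with it, so audit evidence falls out of the security pipeline for free.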

4. Cryptographic Attestation

Runtime findings are captured in append-only attestation logs (following the OVERT standard), creating tamper-evident records of what was tested, when, and what the result was. This is the “receipts” layer — proof that monitoring actually happened, not just a policy claiming it does.
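One common way to make an append-only log tamper-evident is hash chaining: each entry commits to the hash of the previous one, so editing any record invalidates every hash after it. A minimal sketch follows; the field names are assumptions for illustration, not the OVERT record format.

```python
import hashlib
import json

class AttestationLog:
    """Append-only log where each entry hashes over its predecessor."""

    def __init__(self):
        self.entries = []

    def append(self, record):
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        payload = json.dumps(record, sort_keys=True)  # canonical serialization
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        self.entries.append({"record": record, "prev": prev, "hash": digest})

    def verify(self):
        """Recompute the chain; any edited entry breaks all later hashes."""
        prev = "0" * 64
        for e in self.entries:
            payload = json.dumps(e["record"], sort_keys=True)
            expected = hashlib.sha256((prev + payload).encode()).hexdigest()
            if e["prev"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True
```

An auditor holding only the latest hash can confirm that no earlier finding was silently altered or deleted.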

Who Needs AI Runtime Security?

Any organization deploying AI systems into production — particularly in regulated industries. Healthcare systems using ambient scribes or clinical decision support. Financial services firms running credit models or fraud detection. AI labs shipping products to millions of users. If the model touches real decisions, runtime security isn’t optional.

The regulatory landscape is making this explicit. The Colorado AI Act requires “ongoing monitoring” of high-risk AI systems. The EU AI Act mandates post-market surveillance. And the NIST AI RMF treats continuous monitoring as a core function, not an afterthought.

Getting Started

The fastest path to AI runtime security is to start with a scan. autoredteam is open source and runs a behavioral assessment in five minutes. Point it at any model endpoint. It probes across seven attack categories, detects behavioral drift with CUSUM, and maps every finding to OVERT controls.

From there, you can set up recurring scans, integrate results into your compliance workflows, and build toward continuous attestation. The point isn’t to boil the ocean. It’s to stop relying on snapshots and start watching what your AI actually does.

See It In Action

Run a free behavioral scan against your AI system. Five minutes, seven attack categories, mapped to OVERT controls.