Carapace
Technical Whitepaper

Hardware-Rooted Security for Autonomous AI Agents

A deep dive into Carapace's architecture — hardware enclaves, credential lifecycle, policy engine, and cryptographic audit trails.

Version 2.0 — February 2026



Abstract

The proliferation of autonomous AI agents — systems that operate with persistent access to user credentials, communication channels, and external APIs — has introduced a new class of security risk. Existing deployment models force users into an unacceptable tradeoff: grant powerful agents broad access to sensitive accounts, or forgo the productivity gains they offer. An internal security assessment in January 2026 found over 950 publicly exposed agent instances with zero authentication, root shell access, and credential material in world-readable files — a structural consequence of asking individual users to solve infrastructure security problems.

Carapace eliminates this tradeoff through three architectural primitives: hardware-backed credential isolation using AMD SEV-SNP Confidential Virtual Machines, brokered action execution with cryptographic policy enforcement, and tamper-evident Merkle-chained audit trails. Credentials are decrypted only within hardware-attested enclaves — provably inaccessible to the cloud operator, the platform operator, and the agent itself. Every agent action is mediated by a credential broker that enforces user-defined policies, and every outcome is recorded in an Ed25519-signed receipt chain. This paper describes the architecture, threat model, cryptographic protocols, and security properties of the platform, covering both its current implementation and its design trajectory.


Table of Contents

  1. Introduction
  2. Problem Statement
  3. Architecture Overview
  4. Threat Model
  5. Trusted Execution Environment
  6. Credential Lifecycle
  7. The Credential Broker
  8. Policy Engine
  9. Cryptographic Audit Trail
  10. Perception Attack Defenses
  11. Process Isolation and Sandboxing
  12. Agent Adapter Framework
  13. Infrastructure and Network Security
  14. Future Directions
  15. Conclusion

1. Introduction

Large language models have evolved from conversational assistants into autonomous agents capable of reading email, managing calendars, executing code, posting to social media, and interacting with arbitrary APIs on behalf of their users. Projects such as Claude Computer Use, open-source frameworks like MCP (Model Context Protocol) servers, and a growing ecosystem of community-built agent tools demonstrate both the power and the fragility of this paradigm.

The security model of these systems is, almost universally, "trust the operator." Agents run on user-managed infrastructure with direct access to plaintext credentials stored in environment variables or configuration files. There is no isolation between the agent's reasoning process and the credential material it uses. There is no enforced boundary between what the agent can do and what the user wants it to do. And there is no cryptographic guarantee that the agent's self-reported activity log accurately reflects what actually happened.

Carapace addresses these gaps by introducing three architectural primitives:

  1. Hardware-isolated credential storage — Credentials are encrypted at rest and decrypted only inside AMD SEV-SNP Confidential Virtual Machines, where even the cloud hypervisor cannot inspect memory.

  2. Brokered action execution — Agents never receive raw credential material. Instead, a credential broker mediates all interactions with external services, enforcing user-defined policies before each action.

  3. Signed audit receipts — Every action produces a cryptographically signed receipt chained into a Merkle tree, providing tamper-evident proof of what the agent did, when, and under what policy.

The remainder of this paper details how these primitives are implemented, composed, and defended.


2. Problem Statement

2.1 The Agent Credential Paradox

An AI agent that manages your email needs your email credentials. An agent that posts to social media needs your API tokens. An agent that writes code needs access to your repositories. This creates a fundamental tension: the more capable the agent, the more credential material it must hold, and the greater the blast radius if the agent — or the system running it — is compromised.

Current deployment models resolve this tension in the user's disfavor:

  • Environment variable storage. Credentials are stored as plaintext environment variables on the host machine, accessible to any process with the same UID and persisted in shell history, process listings, and crash dumps.

  • File-based secrets. Tokens are written to configuration files on disk (often in /tmp or the user's home directory), where they are readable by other processes, backed up to cloud storage, and indexed by search tools.

  • No privilege separation. The agent process that performs reasoning over untrusted input (user messages, web content, API responses) is the same process that holds credential material. A single prompt injection can exfiltrate every token the agent possesses.

  • Self-reported audit logs. The agent tells the user what it did. If the agent is compromised or manipulated, it can lie. There is no independent verification mechanism.

2.2 Scale of Exposure

An internal Carapace security assessment conducted in January 2026, using Shodan and Censys queries against publicly deployed ClawdBot instances, found 954+ exposed endpoints with:

  • Unauthenticated WebSocket access providing root shell capability
  • Complete conversation histories (including credential material) accessible without authentication
  • Signal device-linking URIs stored in world-readable temporary files
  • No rate limiting, no policy enforcement, no audit trails

This is not an indictment of any particular project — it is a structural consequence of asking individual users to solve infrastructure security problems that challenge dedicated platform teams.

2.3 Design Requirements

From these observations, we derive the following requirements for a secure agent hosting platform:

  • R1 (Credential isolation): Plaintext credentials must never be accessible outside a hardware-attested execution environment.
  • R2 (Least privilege): Agents must operate under explicit, user-defined capability grants with enforced rate limits.
  • R3 (Verifiable audit): All agent actions must produce independently verifiable cryptographic evidence.
  • R4 (Perception integrity): The system must defend against agents that misrepresent their actions to users.
  • R5 (Zero-trust operator): The platform operator (Carapace) must be unable to access user credentials, even with full infrastructure access.
  • R6 (Usability): Security properties must be achievable without requiring users to manage infrastructure, write configuration files, or understand cryptographic protocols.

3. Architecture Overview

Carapace is structured as a three-tier system with strict trust boundaries between each tier.

Tier 1: Dashboard (Next.js Web Application)
  • User authentication (passkeys / WebAuthn)
  • OAuth connection management
  • Policy configuration (natural language)
  • Real-time log streaming
  • Attestation verification UI
  Trust level: User's browser. Handles no secrets.

        │ HTTPS (TLS 1.3)
        ▼

Tier 2: Orchestrator (Python/FastAPI on GCP Cloud Run)
  • Agent lifecycle management
  • OAuth flow handling (seals tokens on receipt)
  • Credential sealing via GCP Cloud KMS (HSM-backed)
  • Enclave provisioning and health monitoring
  • Async task processing (Celery + Redis)
  Trust level: Sees sealed blobs. Never sees plaintext.

        │ TLS (Private VPC)
        ▼

Tier 3: Enclave (GCP Confidential Space, AMD SEV-SNP)
  • Credential broker (policy enforcement, rate limiting)
  • Credential unsealing via Workload Identity Federation
  • Agent adapter runtime (CarapaceBot, MCP, custom)
  • Ed25519 receipt signing with Merkle chain
  • Process sandboxing (cgroups, namespaces, seccomp)
  Trust level: Hardware-attested. Only component that sees plaintext credentials.

Fig. 1 — Three-tier architecture with trust boundaries

3.1 Trust Boundaries

The system maintains three explicit trust boundaries:

Boundary 1: Browser ↔ Orchestrator. Standard HTTPS with TLS 1.3. The dashboard authenticates users via WebAuthn passkeys and communicates with the orchestrator's REST API. No credential material crosses this boundary — only sealed blobs and metadata.

Boundary 2: Orchestrator ↔ Enclave. TLS over a private GCP VPC with no public IP addresses on enclave instances. The orchestrator sends sealed credential blobs and policy configurations. The enclave returns signed receipts and log streams. The orchestrator cannot decrypt credential blobs — it lacks the key, which is released only to attested enclaves via Workload Identity Federation.

Boundary 3: Enclave ↔ External Services. The enclave's credential broker makes authenticated API calls to external services (Gmail, Telegram, GitHub, etc.) on behalf of the agent. Credentials are held in secure memory, used for the duration of the API call, and zeroed immediately after.

3.2 Shared Packages

Cross-cutting concerns are isolated in four shared packages:

  • carapace-crypto: Ed25519 signing and verification, AES-256-GCM symmetric encryption, SHA-256 hashing, Merkle tree construction and proof verification, and encoding utilities. This package has no network dependencies and undergoes independent security review.

  • carapace-policy: Policy data models (capabilities, rate limits, security modes), a natural language parser that converts plain-English policy statements into structured rules, a validation engine, and pre-built policy templates for common use cases.

  • carapace-common: Shared TypeScript type definitions for the dashboard and API boundaries, including agent status enumerations, API request/response schemas, and billing types.

  • carapace-adapters: Agent adapter implementations that translate between agent-specific tool call formats and the broker's standardized Action interface. Includes the CarapaceBot adapter (Carapace's own Claude-powered agent), an MCP adapter for third-party tool servers, and a base class for custom agents.


4. Threat Model

4.1 Adversary Classes

Carapace's threat model considers five adversary classes, ordered by increasing capability:

  • A1: Network observer. Can observe and manipulate network traffic between components. Examples: ISP, nation-state, compromised router.
  • A2: Compromised agent. The AI agent itself is adversarial, via prompt injection, jailbreak, or malicious fine-tuning. Examples: poisoned training data, adversarial user input.
  • A3: Platform insider. A Carapace employee or contractor with access to orchestrator infrastructure. Examples: rogue engineer, compromised CI/CD.
  • A4: Cloud insider. A GCP employee or contractor with hypervisor-level access to the host machine running the enclave. Examples: compromised cloud operator, government subpoena to cloud provider.
  • A5: Physical attacker. Physical access to the server hardware running the enclave. Examples: data center breach, seized hardware.

4.2 Security Properties by Adversary

Property                     A1    A2    A3    A4    A5
Credential confidentiality   ✓     ✓     ✓     ✓     ✓*
Policy enforcement           ✓     ✓     ✓     ✓     ✓*
Audit integrity              ✓     ✓     ✓     ✓     ✓*
Action authenticity          ✓     ✓     ✓     ✓     ✓*

* A5 protection depends on the absence of practical side-channel attacks against AMD SEV-SNP. As of February 2026, no practical side-channel extraction of guest memory from SEV-SNP has been demonstrated, though academic research continues.

4.3 Explicit Non-Goals

The following are explicitly outside the current threat model:

  • User device compromise. If the user's browser or device is compromised, the attacker can impersonate the user. This is a pre-authentication concern.
  • Social engineering. If a user is tricked into approving a malicious action, the system will execute it. Carapace can delay and surface suspicious actions but cannot override explicit user approval.
  • Denial of service at the cloud layer. GCP can shut down the VM. This affects availability, not confidentiality or integrity.
  • Bugs in AMD silicon. We treat AMD SEV-SNP as a correct implementation of its specification. Known errata are tracked and mitigated through TCB version enforcement.

5. Trusted Execution Environment

5.1 AMD SEV-SNP Overview

AMD Secure Encrypted Virtualization with Secure Nested Paging (SEV-SNP) provides hardware-enforced isolation for virtual machine workloads. Key properties:

  • Memory encryption. Each VM's memory is encrypted with a unique AES key managed by the AMD Secure Processor (SP). The hypervisor, other VMs, and DMA-capable devices cannot read or tamper with encrypted memory.

  • Integrity protection. SNP extends SEV-ES with Reverse Map Table (RMP) entries that prevent the hypervisor from remapping, replaying, or aliasing guest memory pages.

  • Attestation. The AMD SP can generate a signed attestation report containing the VM's launch measurement (a SHA-384 hash of the initial memory contents), platform configuration, and user-provided data. The report is signed with a key rooted in AMD's hardware certificate chain, verifiable by any party with access to AMD's public root certificates.

5.2 Why GCP Confidential Space with AMD SEV-SNP

Carapace evaluated three confidential computing platforms:

Property             AWS Nitro Enclaves            GCP Confidential Space (AMD SEV-SNP)            Intel TDX
Trust root           AWS hypervisor (software)     AMD Secure Processor (silicon)                  Intel TDX module (microcode)
Memory encryption    Parent instance has access    Hardware-enforced, hypervisor excluded          Hardware-enforced, hypervisor excluded
Attestation signer   AWS (cloud operator)          AMD (silicon vendor), verified via GCP OIDC     Intel (silicon vendor)
Key release          AWS KMS policy                Workload Identity Federation                    Similar to SEV-SNP
                     (cloud-controlled)            (attestation-gated IAM)
Container-native     No (custom EIF format)        Yes (standard OCI containers)                   No
Ecosystem maturity   Production (since 2020)       Production (since 2023)                         Early availability

GCP Confidential Space was selected for two reasons. First, the underlying AMD SEV-SNP hardware provides the strongest isolation guarantee available in production cloud environments: the trust root is in silicon, not in software or a cloud operator's signing key. A compromised GCP hypervisor cannot read enclave memory — this is enforced by the CPU, not by a software policy. Second, Confidential Space's container-native model integrates cleanly with standard OCI container workflows, enabling the enclave image to be built, measured, and deployed using familiar tooling. The attestation flow leverages GCP's OIDC infrastructure and Workload Identity Federation, providing a mature credential exchange mechanism without requiring custom attestation service clients.

5.3 Attestation Flow

GCP Confidential Space provides a container-native attestation model. Rather than requiring the enclave to directly interface with /dev/sev-guest, the Confidential Space runtime abstracts the hardware attestation into an OIDC token flow:

Participants: Enclave (container), Confidential Space runtime, GCP STS (WIF), Cloud KMS (HSM)

  1. Enclave → runtime: request OIDC token
  2. Runtime: generate SNP report, embed in JWT
  3. Runtime → enclave: OIDC token (signed JWT)
  4. Enclave → STS: exchange token for GCP credentials
  5. STS: verify OIDC signature, hwmodel, image digest
  6. STS → enclave: impersonated credentials
  7. Enclave → Cloud KMS: decrypt request (attested service account)
  8. Cloud KMS: IAM attestation conditions check
  9. Cloud KMS → enclave: unwrapped key material

Fig. 2 — Attestation-gated key release via Workload Identity Federation

The Confidential Space TEE server (accessible via a Unix domain socket at /run/container_launcher/teeserver.sock) generates OIDC tokens signed by Google's Confidential Computing identity provider. These tokens embed hardware attestation claims:

  • hwmodel: Hardware model (GCP_AMD_SEV for SEV-SNP instances).
  • swname: Software environment (CONFIDENTIAL_SPACE).
  • container_image_digest: SHA-256 digest of the exact OCI container image running in the enclave.
  • eat_nonce: Nonce for freshness binding. While the Confidential Space API treats this field as optional, Carapace always provides a nonce (a random 32-byte value generated per attestation request) to prevent replay of stale attestation tokens. Omitting the nonce would allow a captured token to be replayed indefinitely within its TTL — an unacceptable TOCTOU gap for credential release operations.

The underlying AMD SEV-SNP hardware still provides the root of trust — the OIDC token's claims are derived from the hardware attestation report. The Confidential Space runtime acts as a bridge, translating hardware attestation into the OIDC/WIF credential exchange that GCP services natively understand.

5.4 Attestation-Gated Key Release via Workload Identity Federation

GCP Cloud KMS with Workload Identity Federation (WIF) conditions the release of cryptographic key material on successful attestation verification:

  1. The enclave requests an OIDC attestation token from the Confidential Space TEE server, including a freshness nonce (always provided; see Section 5.3).
  2. The enclave exchanges this token with GCP's Security Token Service (STS) via a Workload Identity Federation pool configured to accept tokens from the confidentialcomputing.googleapis.com issuer.
  3. STS validates the OIDC token signature and returns federated credentials.
  4. The enclave uses these federated credentials to impersonate the CVM service account — a GCP service account with an IAM condition that restricts access to requests bearing specific attestation claims.
  5. The IAM condition enforces: hwmodel == "GCP_AMD_SEV" AND swname == "CONFIDENTIAL_SPACE" AND container_image_digest == "<expected-digest>".
  6. Only if the attestation claims satisfy the IAM condition does Cloud KMS permit the decrypt operation, releasing the AES-256 wrapping key for sealed credential blobs.

This flow ensures that credential decryption keys are released only to enclaves running the exact expected container image on genuine AMD SEV-SNP hardware within GCP Confidential Space. The IAM condition acts as a declarative policy — changing the expected container digest requires updating the IAM binding, providing an auditable configuration change.
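The condition in step 5 can be expressed in CEL (Common Expression Language), which GCP uses for both Workload Identity Federation attribute conditions and IAM condition bindings. The following is an illustrative sketch using the claim names from Section 5.3; the exact attribute paths depend on how the WIF pool maps token claims, and the digest placeholder is left unfilled as in the text above:

```
// Illustrative CEL condition gating service-account impersonation.
// Claim names follow Section 5.3; attribute mapping is deployment-specific.
assertion.hwmodel == "GCP_AMD_SEV"
  && assertion.swname == "CONFIDENTIAL_SPACE"
  && assertion.container_image_digest == "<expected-digest>"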


6. Credential Lifecycle

6.1 Credential Ingestion

When a user connects an external service (e.g., Gmail, Telegram, GitHub), the OAuth flow proceeds through the orchestrator:

  1. OAuth initiation. The dashboard redirects the user to the service's authorization endpoint with a state parameter stored in Redis (TTL-bounded for CSRF protection).
  2. Token exchange. The service redirects back to the orchestrator with an authorization code. The orchestrator exchanges this code for access and refresh tokens.
  3. Immediate sealing. The orchestrator performs envelope encryption: it generates a random AES-256 key, encrypts the token payload with AES-GCM, then wraps the AES key with GCP Cloud KMS (HSM-backed). The plaintext tokens exist in orchestrator memory only for the duration of this operation.
  4. Storage. The sealed blob — consisting of the KMS-wrapped AES key (length-prefixed), the AES-GCM nonce (12 bytes), and the ciphertext with authentication tag (16 bytes) — is stored in PostgreSQL. The orchestrator retains no plaintext copy, and the ephemeral AES key is zeroed from memory.
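The sealed-blob layout from step 4 can be sketched as follows. The 4-byte big-endian length prefix, field order, and helper names are illustrative assumptions; the text specifies only that the wrapped key is length-prefixed, the nonce is 12 bytes, and the 16-byte authentication tag travels with the ciphertext:

```python
import struct

def pack_sealed_blob(wrapped_key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    """Serialize: length-prefixed KMS-wrapped AES key || 12-byte AES-GCM
    nonce || ciphertext with its 16-byte authentication tag appended."""
    if len(nonce) != 12:
        raise ValueError("AES-GCM nonce must be 12 bytes")
    return struct.pack(">I", len(wrapped_key)) + wrapped_key + nonce + ciphertext

def unpack_sealed_blob(blob: bytes) -> tuple[bytes, bytes, bytes]:
    """Inverse of pack_sealed_blob: recover (wrapped_key, nonce, ciphertext)."""
    (key_len,) = struct.unpack_from(">I", blob, 0)
    wrapped_key = blob[4:4 + key_len]
    nonce = blob[4 + key_len:4 + key_len + 12]
    ciphertext = blob[4 + key_len + 12:]
    return wrapped_key, nonce, ciphertext
```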

6.2 Credential Unsealing (Inside Enclave)

When an agent needs to use credentials:

  1. The orchestrator sends the sealed credential blob to the enclave over TLS via the private VPC.
  2. The enclave performs the attestation flow (Section 5.3) to obtain GCP credentials via Workload Identity Federation.
  3. The enclave calls Cloud KMS decrypt with its attested identity to unwrap the AES-256 key. The KMS decrypt operation succeeds only if the IAM condition (Section 5.4) is satisfied — i.e., the enclave is running the expected container image on SEV-SNP hardware.
  4. The enclave decrypts the credential blob using AES-GCM with the unwrapped key, verifying the authentication tag for integrity.
  5. The plaintext credential is stored in a SecureDict — a memory structure that uses ctypes.memset for explicit zeroing and prevents accidental serialization (pickling is disabled, __str__ returns [REDACTED]). The AES key is immediately zeroed from memory.

6.3 Credential Zeroing

After use, credentials undergo explicit memory cleanup:

import ctypes

# SecureString uses ctypes.memset for byte-level zeroing
def secure_zero(data: bytearray) -> None:
    # Obtain a ctypes view over the bytearray's buffer and overwrite it in place
    buf = (ctypes.c_char * len(data)).from_buffer(data)
    ctypes.memset(ctypes.addressof(buf), 0, len(data))

The SecureString class provides context manager semantics (with SecureString(value) as s:) that guarantee zeroing on scope exit, destructor-based zeroing as a safety net, runtime errors on access after clearing, and prevention of pickling to block accidental serialization.
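A minimal sketch of these semantics (the API here is assumed; the production class adds the destructor safety net and further guards):

```python
import ctypes

class SecureString:
    """Sketch of the SecureString semantics described above."""

    def __init__(self, value: str) -> None:
        # Canonical storage is a mutable bytearray so it can be zeroed in place
        self._buf: bytearray | None = bytearray(value, "utf-8")

    def get(self) -> bytes:
        if self._buf is None:
            raise RuntimeError("SecureString accessed after clearing")
        return bytes(self._buf)  # caller-held copy; see "Acknowledged limitations"

    def clear(self) -> None:
        if self._buf is not None:
            raw = (ctypes.c_char * len(self._buf)).from_buffer(self._buf)
            ctypes.memset(ctypes.addressof(raw), 0, len(self._buf))
            del raw  # release the buffer export before dropping the bytearray
            self._buf = None

    def __enter__(self) -> "SecureString":
        return self

    def __exit__(self, *exc: object) -> None:
        self.clear()  # guaranteed zeroing on scope exit

    def __str__(self) -> str:
        return "[REDACTED]"

    def __reduce__(self):
        raise TypeError("SecureString cannot be pickled")
```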

Acknowledged limitations. Python's memory model introduces meaningful gaps in credential zeroing that we treat as a known risk, not a solved problem:

  • Immutable copies. Python str and bytes objects are immutable — any accessor that returns a string representation creates a copy that cannot be reliably zeroed. The SecureString class mitigates this by storing the canonical value as a mutable bytearray and raising errors on implicit string conversion, but intermediate copies may exist in the runtime's internal buffers.
  • GC compaction. CPython's garbage collector can relocate bytearray contents during memory compaction, leaving copies of credential bytes in freed (but not zeroed) heap pages. The ctypes.memset zeroing targets the current buffer address, not any prior locations.
  • String interning. If credential material is inadvertently converted to a str (e.g., via logging or f-string formatting), Python may intern the value, causing it to persist for the lifetime of the process.

The explicit zeroing of bytearray storage, combined with forced garbage collection and strict API discipline (no __str__, no pickling, context-manager scoping), provides defense-in-depth against casual memory inspection but does not constitute a cryptographic guarantee. This is an inherent limitation of managed-language credential handling. A planned mitigation is a Rust extension module for the credential hot path — SecureString operations (store, access, zero) would execute in Rust-managed memory outside the Python heap, eliminating GC-related copy risks while preserving the Python API surface.

6.4 Credential Refresh

OAuth refresh tokens follow the same sealed storage pattern. When an access token expires:

  1. The broker detects the expired token (HTTP 401 response).
  2. The broker uses the sealed refresh token (already in enclave memory) to obtain a new access token from the OAuth provider.
  3. The new access token is used immediately. The new sealed blob is transmitted back to the orchestrator for persistent storage.
  4. At no point does the refresh flow transit through the orchestrator in plaintext.

7. The Credential Broker

The credential broker is the security core of the enclave — a mandatory intermediary between agents and external services that enforces the complete security pipeline for every action.

7.1 Security Pipeline

Every action request passes through a seven-stage pipeline:

Stage 0: Quota Check → Stage 1: Policy Check → Stage 2: Rate Limit → Stage 3: Circuit Breaker → Stage 4: Anomaly Detection → Stage 5: Confirmation → Stage 6: Execution + Receipt

Every action request passes through all seven stages. Denial at any stage generates a signed receipt.

Fig. 3 — Seven-stage credential broker security pipeline

Stage 0: Quota enforcement. The broker checks whether the agent's owner has exhausted their monthly action quota. If so, the action is immediately rejected with a DENIED receipt (detailed in Section 7.5).

Stage 1: Policy check. The broker verifies the requested action type against the user's policy. If the policy does not explicitly grant the capability, the action is denied with a PolicyDeniedError and a denial receipt is generated.

Stage 2: Rate limit check. The broker enforces per-service, per-action-type rate limits (e.g., 50 emails per hour, 200 per day). Rate limits are configured in the policy and enforced with a sliding window counter. Exceeded limits produce a RateLimitExceeded error and a rate-limited receipt.

Stage 3: Circuit breaker. If a service has experienced repeated failures, the circuit breaker trips to prevent cascading errors. The breaker tracks success/failure rates per service and enters an open state after a configurable failure threshold, rejecting requests for a cooldown period before allowing a probe request.

Stage 4: Anomaly detection. The anomaly detector evaluates the action against learned behavioral baselines, producing a composite anomaly score from multiple factors (detailed in Section 7.2). Actions scoring above the confirmation threshold (default 0.7) are held for user approval.

Stage 5: Confirmation. If the anomaly score triggers confirmation, the action is queued with a PENDING_CONFIRMATION receipt. The user receives a notification (via the dashboard or out-of-band channel) describing the action and its anomaly factors. The user may approve or reject the action. A bounded confirmation queue (1,000 maximum) prevents memory exhaustion from anomaly storms.

Stage 6: Execution and receipt. The broker invokes the registered executor for the target service, passing the action parameters and the secure credential dictionary. The agent never receives the credential dictionary directly — the executor uses credentials to make the API call and returns only the result data. A signed receipt is generated regardless of the outcome (success, failure, or error).

7.2 Anomaly Detection

The anomaly detector maintains rolling behavioral baselines and evaluates five anomaly factors:

Factor            Weight   Trigger
New target        0.3      First interaction with an email address, chat ID, or endpoint never seen before.
Frequency spike   0.4      Action rate exceeds 3x the established baseline rate for this service and action type.
New action type   0.2      First use of an action type (e.g., email.delete) not previously exercised for this service.
Unusual time      0.1      Activity during an hour with less than 10% of the average hourly action count.
Unusual payload   0.2      Payload size exceeds 10x the average for this service and action type.

Factors are additive (capped at 1.0). The detector is thread-safe (mutex-protected) and maintains bounded data structures to prevent memory exhaustion (10,000 known targets per service, 100 recent payload size samples).

The confirmation threshold is configurable but defaults to 0.7 — meaning any combination of factors with a total weight of 0.7 or more triggers a confirmation request. For example, a new target (0.3) combined with a frequency spike (0.4) triggers confirmation. A new target alone (0.3) does not.
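The additive, capped scoring can be sketched directly from the factor table; the rounding step is an implementation assumption made here to keep the threshold comparison free of floating-point artifacts:

```python
# Factor weights from the table in Section 7.2
WEIGHTS = {
    "new_target": 0.3,
    "frequency_spike": 0.4,
    "new_action_type": 0.2,
    "unusual_time": 0.1,
    "unusual_payload": 0.2,
}

def anomaly_score(factors: dict) -> float:
    """Additive factor weights, capped at 1.0."""
    total = sum(WEIGHTS[name] for name, hit in factors.items() if hit)
    return min(round(total, 3), 1.0)

def requires_confirmation(factors: dict, threshold: float = 0.7) -> bool:
    # Default threshold 0.7: e.g. new target (0.3) + frequency spike (0.4)
    # triggers confirmation; a new target alone (0.3) does not.
    return anomaly_score(factors) >= threshold
```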

Limitations. The current weights are heuristic, not learned from data. They were chosen to minimize false positives for common usage patterns while catching the most obvious anomaly signals. A sophisticated attacker who knows the weighting scheme could craft actions that stay below the threshold — for example, slowly introducing new targets (0.3) at a steady rate during business hours to avoid stacking with frequency-spike (0.4) or unusual-time (0.1) factors. The progressive trust model (Section 14.3) partially addresses this by enabling per-target trust decay and adaptive thresholds. Future work includes learned weights from anonymized behavioral data across the platform, which would make the scheme harder to predict and game.

7.3 Receipt Push Callbacks

After every receipt is generated and appended to the hash chain, the broker optionally invokes an asynchronous receipt callback — a function provided by the orchestrator at broker initialization. This enables real-time receipt streaming to the orchestrator for dashboard display, alerting, and persistent storage, without requiring the orchestrator to poll the enclave. The callback is best-effort: delivery failures are logged but do not block the broker pipeline or affect the action outcome. This design ensures that receipt generation (a security-critical operation) is never gated on network availability to the orchestrator.

7.4 Error Sanitization

The broker sanitizes all error messages before including them in receipts or returning them to callers. Internal error details (stack traces, connection strings, credential fragments) are replaced with generic messages. Only errors from known-safe types (policy denial, rate limiting, validation errors) retain their messages.

7.5 Quota Enforcement as Denial-of-Resource Defense

Unbounded agent execution creates a denial-of-resource vector: a compromised or runaway agent could exhaust API rate limits, cloud compute budgets, or external service quotas on behalf of a user. Carapace enforces per-account action quotas as a security boundary, not merely a billing mechanism.

Quota enforcement is split across two trust domains to prevent bypass:

  1. Orchestrator (durable tracking). The orchestrator maintains per-billing-period action counts in PostgreSQL and sets a quota_exhausted flag on the broker when the ceiling is reached.
  2. Broker (hard gate). The broker's Stage 0 quota check (Section 7.1) rejects all actions when the flag is set, generating a cryptographically signed DENIED receipt with a quota_exhausted reason. Because this check runs inside the enclave before any policy evaluation, it cannot be bypassed by agent logic, policy manipulation, or direct enclave communication.

This two-layer design ensures that quota enforcement is both durable (orchestrator-tracked, survives enclave restarts) and immediate (broker-enforced, blocks the next action without polling delay). The receipt trail provides auditable proof of every quota decision — including the exact action that triggered the limit.


8. Policy Engine

8.1 Policy Model

Policies are structured documents composed of four layers:

Security mode. A top-level preset that configures defaults:

  • Paranoid: Strict allowlists. All novel actions require explicit confirmation. Tight rate limits.
  • Balanced: Reasonable defaults. First-contact delays for new recipients. Moderate rate limits.
  • Permissive: Generous limits and minimal friction, for users who accept higher risk in exchange for fewer confirmation prompts.

Capabilities. Per-service, per-action grants. Each capability specifies:

  • An ActionType from a standardized enumeration of 125 action types (see Appendix B) spanning communication (email, messaging, Slack), productivity (calendar, files, Notion, Obsidian, ClickUp), social platforms (Twitter/X, Reddit, GitHub, Farcaster), trading (Polymarket, Hyperliquid), code execution, memory, agent lifecycle (heartbeat, sub-agents), and platform operations (auth, MCP, credentials).
  • An allowed flag (grant or deny).
  • Optional rate limits (max_per_hour, max_per_day).
  • A requires_confirmation flag for actions that should always prompt the user.

Trusted contacts. Per-service allowlists of email domains and specific contacts that bypass first-contact confirmation delays. For example, a policy might trust @company.com for email sending but require confirmation for all other domains.

Global rate limit. An aggregate ceiling across all services and action types. When both service-specific and global limits exist, the more restrictive limit applies.
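The capability layer and the more-restrictive-limit rule can be sketched as follows (field names are illustrative; Section 8.1 specifies the concepts, not this exact schema):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Capability:
    """One per-service, per-action grant."""
    action_type: str                    # e.g. "email.send"
    allowed: bool
    max_per_hour: Optional[int] = None
    max_per_day: Optional[int] = None
    requires_confirmation: bool = False

def effective_hourly_limit(cap: Capability,
                           global_max_per_hour: Optional[int]) -> Optional[int]:
    """When both a capability-level and a global rate limit exist,
    the more restrictive (smaller) one applies."""
    limits = [lim for lim in (cap.max_per_hour, global_max_per_hour)
              if lim is not None]
    return min(limits) if limits else None
```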

8.2 Natural Language Parsing

Users define policies in plain English. The policy parser converts these statements into structured Policy objects:

User input                                 Parsed result
"Can read and send email, no delete"       Gmail: EMAIL_READ ✓, EMAIL_SEND ✓, EMAIL_DELETE ✗
"Up to 50 emails per hour, 200 per day"    Gmail: EMAIL_SEND rate limit: 50/hr, 200/day
"Trust emails from @acme.com"              Gmail: trusted domain: acme.com
"Balanced security"                        Security mode: BALANCED

The parser handles compound statements, negations, numeric rate limits, and domain-specific terminology. Ambiguous inputs are resolved conservatively (deny-by-default).
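As a toy illustration of one parser rule, numeric rate-limit statements like those in the table above can be matched with a regular expression; the real carapace-policy parser covers far more grammar (compound statements, negations, domain-specific terms):

```python
import re

def parse_rate_limits(text: str) -> dict:
    """Extract numeric rate limits from statements such as
    "Up to 50 emails per hour, 200 per day" (illustrative sketch)."""
    limits = {}
    # Match "<count> [<noun>] per hour|day", e.g. "50 emails per hour"
    for count, unit in re.findall(r"(\d+)\s*(?:\w+\s+)?per\s+(hour|day)",
                                  text, re.IGNORECASE):
        limits[f"max_per_{unit.lower()}"] = int(count)
    return limits
```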

8.3 Policy Templates

Pre-built templates serve common use cases:

  • Email Assistant: Read/send email with moderate rate limits. No delete capability.
  • Social Media Manager: Read timelines and mentions, post tweets/updates with daily limits. Supports Twitter/X, Reddit, and Farcaster with per-platform rate controls.
  • Code Assistant: Read repositories, create issues and PRs, run sandboxed code.
  • Customer Support Bot: Read and reply to messages on configured channels (Telegram, Slack, Discord, WhatsApp).
  • Triage Agent: Multi-channel message reading with response routing. Heartbeat-enabled for periodic inbox monitoring.
  • Workspace Manager: Full Notion and ClickUp access for project management automation — page/database CRUD, task tracking, and comment threads with workspace-scoped rate limits.
  • Trading Bot: Polymarket and Hyperliquid DEX operations with per-market position limits and confirmation requirements on all order placement.

Templates provide starting points that users can customize through the natural language interface.


9. Cryptographic Audit Trail

9.1 Receipt Structure

Every action processed by the credential broker produces a Receipt — an immutable, cryptographically signed record:

Receipt {
    id:              UUID v4 (unique identifier)
    timestamp:       Unix epoch (action processing time)
    action_type:     Standardized action type (e.g., "email.send")
    service:         Service identifier (e.g., "gmail")
    target:          HMAC-SHA256 pseudonym (not raw PII)
    status:          success | failed | denied | rate_limited | pending_confirmation
    prev_hash:       SHA-256 hash of the previous receipt (chain link)
    policy_version:  Version of the policy that was applied
    metadata:        Structured context (no sensitive data)
    signature:       Ed25519 signature over the receipt hash
}

9.2 Chain Integrity

Receipts form a hash chain analogous to a blockchain's block linkage:

Receipt 0             Receipt 1             Receipt 2             Receipt 3
prev: "genesis"   ◀── prev: H(R0)       ◀── prev: H(R1)       ◀── prev: H(R2)
sig: σ                sig: σ                sig: σ                sig: σ

Fig. 4 — Hash-chained receipt structure with Ed25519 signatures

Each receipt includes the SHA-256 hash of its predecessor. To tamper with receipt n, an attacker would need to recompute hashes for all subsequent receipts and forge Ed25519 signatures for each — computationally infeasible without the signing key, which exists only in enclave memory.

Chain integrity can be verified by any party with access to the receipt chain and the enclave's public key:

  1. Verify that Receipt[0].prev_hash == "genesis".
  2. For each subsequent receipt, verify that Receipt[i].prev_hash == SHA256(Receipt[i-1]).
  3. For each receipt, verify the Ed25519 signature against the enclave's public key.
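Steps 1 and 2 of the verification procedure can be sketched with only the standard library. The canonical receipt serialization used here (sorted-key JSON) is an assumption rather than the platform's actual encoding, and step 3 (per-receipt Ed25519 verification) is omitted:

```python
import hashlib
import json

def receipt_hash(receipt: dict) -> str:
    # Assumed canonical serialization: sorted-key JSON. The real
    # platform may hash a different encoding.
    blob = json.dumps(receipt, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

def verify_chain(receipts: list[dict]) -> bool:
    """Steps 1-2: genesis marker, then prev_hash linkage."""
    if not receipts or receipts[0]["prev_hash"] != "genesis":
        return False
    for prev, cur in zip(receipts, receipts[1:]):
        if cur["prev_hash"] != receipt_hash(prev):
            return False
    return True  # step 3 (Ed25519 signature per receipt) omitted here

# Build a two-receipt chain, then tamper with the first receipt.
r0 = {"action_type": "email.send", "prev_hash": "genesis"}
r1 = {"action_type": "email.read", "prev_hash": receipt_hash(r0)}
assert verify_chain([r0, r1])
r0["action_type"] = "email.delete"   # tampering breaks r1's back-link
assert not verify_chain([r0, r1])
```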

9.3 Merkle Tree Aggregation

For efficient verification of large receipt histories, the system periodically computes Merkle roots over batches of receipts. The Merkle tree implementation supports:

  • Root computation: Binary tree construction with leaf duplication for odd-count levels.
  • Inclusion proofs: Given a receipt and a Merkle root, generate and verify a logarithmic-size proof that the receipt is included in the batch.
  • Chain integrity verification: Verify that a sequence of receipt hashes maintains proper prev_hash linkage back to the genesis hash.

Merkle roots are checkpointed when the in-memory receipt buffer reaches capacity (default 10,000 receipts). Checkpoints are persisted to GCP Cloud Storage with a retention policy and uniform bucket-level access. The enclave's service account is granted only the objectCreator role on the audit bucket — it can append new objects but cannot modify or delete existing ones, providing append-only semantics at the IAM layer.
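The three operations above can be illustrated with a minimal sketch (function names are ours, not the platform's API); note the last-node duplication on odd-count levels described in the text:

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Binary Merkle root with last-node duplication on odd-count levels."""
    level = [_h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])          # duplicate for odd counts
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def inclusion_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Logarithmic-size proof: (sibling hash, sibling-is-left) per level."""
    proof, level, i = [], [_h(leaf) for leaf in leaves], index
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = i ^ 1
        proof.append((level[sibling], sibling < i))
        level = [_h(level[j] + level[j + 1]) for j in range(0, len(level), 2)]
        i //= 2
    return proof

def verify_inclusion(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    node = _h(leaf)
    for sibling, is_left in proof:
        node = _h(sibling + node) if is_left else _h(node + sibling)
    return node == root
```

A verifier holding only the checkpointed root can confirm that a given receipt hash belongs to the batch with a proof of O(log n) hashes.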

9.4 Target Pseudonymization

Receipt targets (email addresses, phone numbers, chat IDs) constitute personally identifiable information. To protect PII in audit logs while preserving audit utility, targets are pseudonymized using HMAC-SHA256:

pseudonym = "target:" + HMAC-SHA256(secret, target)[:16]

The HMAC secret is loaded from KMS during enclave startup, ensuring consistent pseudonymization across enclave restarts. A party with knowledge of both the HMAC secret and the original target can verify a match, but the pseudonym cannot be reversed to recover the target without the secret.

This design allows audit queries of the form "show me all actions involving user@example.com" (by computing the pseudonym and matching) while preventing the audit log itself from being a source of PII exposure.
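The pseudonym computation and the matching query can be sketched as follows; whether the "[:16]" truncation applies to the hex digest or the raw bytes is our assumption, and the secret shown is a stand-in for the KMS-loaded value:

```python
import hashlib
import hmac

def pseudonymize(secret: bytes, target: str) -> str:
    # Assumption: truncate the hex digest to 16 characters; the real
    # scheme may truncate raw bytes instead.
    digest = hmac.new(secret, target.encode(), hashlib.sha256).hexdigest()
    return "target:" + digest[:16]

# Audit query: match a known address against stored pseudonyms.
secret = b"kms-loaded-secret"   # hypothetical; the real secret comes from KMS
stored = pseudonymize(secret, "user@example.com")
assert pseudonymize(secret, "user@example.com") == stored
assert pseudonymize(secret, "other@example.com") != stored
```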


10. Perception Attack Defenses

10.1 The Perception Problem

Autonomous agents mediate between users and external services. This creates a "perception gap" — the user's understanding of what the agent did is based on the agent's self-report, not on direct observation. A compromised or manipulated agent can exploit this gap:

  • Omission: The agent performs an action but does not report it.
  • Fabrication: The agent reports performing an action that it did not.
  • Misrepresentation: The agent performs action A but reports performing action B.

10.2 Defense Layers

Carapace implements four layers of perception defense:

Layer 1: Signed receipts (implemented). Every action produces an Ed25519-signed receipt regardless of success or failure. The agent cannot suppress receipt generation because receipts are generated by the broker, not the agent. The agent cannot forge receipts because it does not have access to the signing key.

Layer 2: Out-of-band digest (planned). A periodic email digest sent directly to the user's personal email (not through the agent) summarizing all actions performed since the last digest. The digest is generated by the orchestrator from the receipt chain, bypassing the agent entirely.

Layer 3: Merkle-chained audit log (implemented). The hash chain makes it evident if any receipt is modified, deleted, or reordered. WORM storage ensures the persistence layer cannot be tampered with, even by Carapace operators.

Layer 4: Verification challenges (planned). Random verification codes sent simultaneously to the user and the agent. The user can ask the agent what code it received, and any discrepancy indicates interception or manipulation of the communication channel.

10.3 Why Broker-Level Enforcement Matters

A critical design decision is that policy enforcement happens before execution, not after. The broker does not observe agent behavior and then report violations — it prevents unauthorized actions from being attempted in the first place. The agent never holds credentials directly, so it cannot bypass the broker to make unauthorized API calls.

This is in contrast to "guardrail" approaches that observe agent output and flag violations after the fact — by which point the damage (sent email, posted tweet, leaked data) may already be done.

10.4 End-to-End Encrypted User-Agent Communication

The dashboard's in-app chat interface presents a separate perception channel — if an attacker compromises the server relay, they could observe or modify messages between user and agent. Carapace mitigates this by encrypting all chat messages end-to-end using AES-256-GCM via the Web Crypto API, ensuring the server never sees plaintext message content.

Key management. A 256-bit AES-GCM key is generated in the user's browser and stored in IndexedDB as a JWK (JSON Web Key). The key never leaves the browser — the server stores and relays only encrypted blobs. If no key exists when the user opens the chat, one is automatically generated.

Encryption format. Each message is encrypted with a fresh 12-byte random IV. The encrypted payload is base64-encoded as IV (12 bytes) || ciphertext (includes GCM auth tag).

Key portability. Users can export their chat encryption key as a passphrase-protected backup using PBKDF2 (SHA-256, 600,000 iterations) for key derivation. The export format is salt (16 bytes) || IV (12 bytes) || wrapped key bytes. This enables key recovery on a new device without requiring server-side key escrow.

Transport. Real-time message delivery uses Redis pub/sub as a relay layer. The server acts as an encrypted message broker — it routes ciphertext between participants but cannot decrypt it. This ensures that even an adversary with full server access (A3: Platform insider) cannot read or tamper with user-agent communications without detection.


11. Process Isolation and Sandboxing

11.1 Multi-Tenant CVM Pool Architecture

A single Confidential VM hosts multiple agent sandboxes managed by a supervisor process. When a CVM boots without a specific AGENT_ID environment variable, it enters pool mode — the supervisor starts a heartbeat loop (5-second interval) that polls the orchestrator for agent assignments. The orchestrator responds with load_agents and unload_agents directives, enabling pull-based lifecycle management where the CVM pulls its workload rather than being told to start specific processes.

The SandboxManager orchestrates up to 28 concurrent sandboxes per CVM. This limit is memory-derived: the production CVM instance type (n2d-standard-4) provides 16 GB of RAM; reserving 2 GB for the supervisor, broker, and OS leaves 14 GB for sandboxes at 512 MB each (14,336 / 512 = 28). CPU is not the binding constraint — the 25% per-sandbox fair-share limit allows concurrent execution of 4 sandboxes with the remainder hibernated, which matches typical agent duty cycles. Each sandbox tracks state through a lifecycle: CREATING → RUNNING → HIBERNATED → TERMINATED (with FAILED as a terminal error state). Key properties:

  • Per-agent auth tokens. Each sandbox receives an HMAC-SHA256-derived authentication token, computed from a master secret and the agent ID. This token scopes all API calls from that sandbox to the specific agent, preventing cross-agent impersonation within a shared CVM.
  • Auto-hibernation. Sandboxes idle for longer than 30 minutes (configurable) are automatically hibernated — the process is suspended, freeing CPU while retaining memory state. Hibernated sandboxes that exceed a maximum lifetime (default 7 days) are terminated and cleaned up.
  • Health monitoring. A background monitor loop (60-second interval) checks all running sandboxes for resource violations, zombie processes, and unexpected terminations. Ghost sandboxes — processes that outlive their expected lifecycle — are detected and recovered.
  • Port isolation. Each sandbox is assigned a unique port from a managed range, preventing inter-sandbox communication on shared network interfaces.

11.2 Defense in Depth

Even within the hardware-isolated enclave, agent processes are further sandboxed using Linux kernel isolation mechanisms. This provides defense in depth: if an agent finds a vulnerability in the Python runtime or a dependency, the sandbox limits the blast radius.

11.3 Namespace Isolation

Each agent process runs in its own set of Linux namespaces, created via direct os.unshare() syscalls (bypassing subprocess overhead):

  • PID namespace: The agent can only see its own process tree.
  • Network namespace: Network access is mediated through the broker (no direct socket access to credential-holding services). In pool mode, sandboxes may share the host network namespace with port-level isolation.
  • Mount namespace: The agent sees a minimal filesystem with no access to the host's /proc, /sys, or credential storage directories.
  • User namespace: The agent runs as an unprivileged user with no capability to escalate.

11.4 Resource Limits (cgroups)

Each agent sandbox has cgroup-enforced resource limits:

  • Memory: Default 512 MB cap to prevent exhaustion of enclave memory (which holds credentials for all agents on the VM).
  • CPU: Fair-share scheduling at 25% per sandbox to prevent a single agent from monopolizing compute.
  • PIDs: Maximum process count to prevent fork bombs.
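These limits map onto cgroup v2 interface files. The following sketch shows how a supervisor might populate them for one sandbox; the hierarchy path and the PID ceiling value are hypothetical (the text gives no specific process count):

```python
def cgroup_limits(agent_id: str, mem_mb: int = 512, cpu_pct: int = 25,
                  max_pids: int = 256) -> dict[str, str]:
    # cgroup v2 interface files and the values a supervisor would write.
    base = f"/sys/fs/cgroup/carapace/{agent_id}"   # hypothetical hierarchy
    return {
        f"{base}/memory.max": str(mem_mb * 1024 * 1024),
        # cpu.max = "<quota> <period>" in microseconds: 25% of one CPU.
        f"{base}/cpu.max": f"{cpu_pct * 1000} 100000",
        f"{base}/pids.max": str(max_pids),   # fork-bomb ceiling (value assumed)
    }
```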

11.5 Syscall Filtering (seccomp)

A strict seccomp-BPF profile restricts the system calls available to agent processes. The default-deny profile allows only the syscalls necessary for Python runtime operation and broker-mediated network access:

Allowed categories:

  • Basic I/O (read, write, close, openat)
  • Memory management (mmap, mprotect, munmap, brk)
  • Thread operations (clone, futex, exit_group)
  • Time (clock_gettime, nanosleep)
  • Network (broker-mediated: socket, connect, sendto, recvfrom)
  • Event notification (epoll_create1, epoll_ctl, epoll_wait)

Explicitly blocked (KILL action):

  • Process debugging (ptrace, process_vm_readv, process_vm_writev)
  • Filesystem manipulation (mount, umount, pivot_root, chroot)
  • Namespace manipulation (setns, unshare)
  • Kernel module operations (init_module, finit_module, delete_module)
  • System control (reboot, kexec_load)
  • Direct I/O (iopl, ioperm)
  • BPF manipulation (bpf)

Any syscall not in the allowlist causes immediate process termination (SCMP_ACT_KILL). A permissive profile that logs instead of killing is available for development and debugging.


12. Agent Adapter Framework

12.1 Design Philosophy

Carapace separates the concerns of agent intelligence (the LLM, prompting, reasoning) from agent security (credential management, policy enforcement, auditing). The platform ships CarapaceBot — its own Claude-powered agent built from the ground up for the brokered security model — while providing a framework-agnostic adapter interface that can host any agent conforming to the AgentProtocol. Future releases will support wrapping third-party agents (including community tools like ClawdBot) behind the same security pipeline.

12.2 Adapter Architecture

Agent Process
  LLM Engine → Adapter → Broker Client
        │
       IPC
        ▼
Credential Broker
  Quota → Policy → Rate Limit → Anomaly → Execute
  Credentials in SecureDict — never exposed to agent process

Fig. 5 — Agent adapter architecture with credential isolation

The adapter translates agent-specific tool calls into the broker's standardized Action format:

from dataclasses import dataclass

@dataclass
class Action:
    type: ActionType      # Standardized enum (email.send, message.send, etc.)
    service: str          # Target service (gmail, telegram, etc.)
    target: str           # Recipient/endpoint (plaintext at this stage)
    payload: dict         # Action-specific parameters

Note that the target field carries the plaintext recipient through the broker pipeline for execution purposes. Pseudonymization (Section 9.4) occurs at receipt generation time — the broker applies HMAC-SHA256 to the target before writing it into the signed receipt. The plaintext target never appears in the audit trail.

12.3 Supported Agent Types

CarapaceBot (primary agent). Carapace's own agent implementation, designed from the ground up for the brokered security model. CarapaceBot implements the AgentProtocol interface and routes all tool calls through the credential broker without ever holding credential material. It supports interchangeable LLM backends (Anthropic Claude, OpenAI GPT) with identical security guarantees — the choice of model affects capability and cost, not the security posture. The agent operates across 8 communication channels (Telegram, Slack, Discord, WhatsApp, Signal, iMessage, WebSocket, REST) with 125+ tool definitions spanning 18 integration categories.

From a security architecture perspective, the relevant properties are:

  • Dynamic tool resolution with security invariants. A three-tier SkillRegistry (Bundled → Managed → Workspace) resolves tool names to ActionType/service pairs at runtime. When a higher-tier override changes a skill's action type or target service, the system emits a security warning — this prevents a workspace-level skill from silently rerouting email.send to an unmonitored service, which would bypass the user's policy expectations.
  • Encrypted session persistence. Conversation history is encrypted at rest using AES-256-GCM with per-session keys derived via HMAC-SHA256 from a master key and a scoped identifier (user, cron task, webhook, or sub-agent). Sessions are stored as encrypted JSONL on the CVM filesystem with TTL-based expiration. This prevents conversation history — which may contain sensitive user instructions — from being readable in the event of a CVM disk snapshot or forensic image.
  • Bounded sub-agent spawning. CarapaceBot can spawn up to 4 concurrent sub-agents as isolated asyncio tasks with timeout enforcement. Sub-agents inherit the parent's broker context but operate with an explicit tool exclusion list — critically, spawn_subagent, check_subagent, and cancel_subagent are always excluded, preventing recursive nesting. This bounds the resource consumption and blast radius of any single agent invocation.
  • Webhook authentication. External systems trigger agent execution via HMAC-SHA256-signed payloads. The broker verifies the signature before processing, preventing unauthenticated external parties from driving agent actions.
  • Scheduling with temporal constraints. Autonomous execution (interval, cron, one-shot) enforces active-hours restrictions and day-of-week filtering. This limits the temporal attack surface — an agent compromised at 3 AM cannot act until business hours, giving monitoring systems time to detect anomalies.
  • Runtime security patches. At initialization, before agent code executes, the adapter applies Python-level patches that block environment variable access to sensitive keys, filesystem writes to restricted paths, and subprocess/shell execution.
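The webhook authentication check described above amounts to recomputing the HMAC over the payload and comparing in constant time. A minimal sketch (the signature encoding as a hex string is an assumption):

```python
import hashlib
import hmac

def verify_webhook(secret: bytes, payload: bytes, signature_hex: str) -> bool:
    # Constant-time comparison avoids timing side channels on the signature.
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)
```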

Any agent framework implementing the AgentProtocol interface — a single async process() method that yields responses given a message and a tool executor callback — can be substituted for CarapaceBot's LLM engine. This enables integration with LangChain, CrewAI, AutoGen, or custom agent implementations while retaining the full security pipeline. A complete reference of CarapaceBot's feature set (channel details, scheduling modes, LLM backends, cost optimization) is available in the CarapaceBot Technical Reference.
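The single-method interface might look like the following Protocol sketch. The method name `process` comes from the text; the parameter names, the `ToolExecutor` alias, and the demo agent are illustrative assumptions:

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Protocol

# A tool executor takes an action name and payload, returns a result dict.
ToolExecutor = Callable[[str, dict], Awaitable[dict]]

class AgentProtocol(Protocol):
    def process(self, message: str, execute_tool: ToolExecutor) -> AsyncIterator[str]:
        ...

class EchoAgent:
    """Minimal conforming agent: one tool call, one yielded response."""
    async def process(self, message: str, execute_tool: ToolExecutor):
        result = await execute_tool("email.send", {"to": "user@example.com"})  # hypothetical
        yield f"sent: {result['status']}"

async def demo() -> list[str]:
    async def fake_executor(action: str, payload: dict) -> dict:
        return {"status": "ok"}   # stand-in for the broker client
    return [chunk async for chunk in EchoAgent().process("hi", fake_executor)]

assert asyncio.run(demo()) == ["sent: ok"]
```

Because the agent only ever sees the executor callback, swapping the LLM engine behind `process()` changes capability, not the security pipeline.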

MCP (Model Context Protocol) adapter. Implements the MCP protocol for agents that use standardized tool definitions. The adapter registers available tools (filtered by the user's policy), handles tool call requests by routing them through the broker, and returns results in MCP format.

Channel adapters. For agents that operate across communication channels (Telegram, Slack, Discord, WhatsApp, Signal, iMessage, web), channel adapters handle message routing, format translation, and platform-specific API interactions — all mediated through the broker. Each channel enforces its own security constraints: WhatsApp and iMessage use JID/address allowlists, Signal leverages end-to-end encryption natively, and Slack uses Socket Mode for firewall-friendly bidirectional communication.

Third-party agent wrapping (planned). A future capability to wrap existing agent frameworks (including community tools like ClawdBot) behind the Carapace security pipeline, applying the same credential isolation, policy enforcement, and audit trail to agents that were not originally designed for brokered execution.

12.4 Security Patches

Agent adapters apply security patches at the Python runtime level:

  • Environment blocking: os.environ access is intercepted for sensitive key patterns (API keys, tokens, secrets). Attempts to read these values return empty strings and are logged.
  • Filesystem restrictions: Write operations to credential directories, /tmp, and other sensitive paths are intercepted and denied.
  • Shell blocking: subprocess.run, os.system, os.popen, and related functions are disabled. Code execution, where permitted by policy, goes through the broker's sandboxed execution environment.

13. Infrastructure and Network Security

13.1 Network Topology

The infrastructure is deployed on Google Cloud Platform as a custom VPC with strict subnet segmentation and firewall rules:

VPC (10.0.0.0/16)

Orchestrator Subnet (10.0.10.0/24)
  • Cloud Run v2 (FastAPI)
  • Private access to DB + enclave
  • KMS encrypt/decrypt SA
  • HTTPS ingress (JWT-auth)

Database Subnet (10.0.20.0/24)
  • Cloud SQL PostgreSQL (private IP)
  • Memorystore Redis (private access)

Confidential Subnet (10.0.30.0/24)
  • Confidential Space MIG (AMD SEV-SNP)
  • NO external IP addresses
  • Cloud NAT for outbound HTTPS only
  • Firewall: deny all except orchestrator

VPC Connector (10.0.100.0/28) — Serverless VPC Access for Cloud Run

Fig. 6 — VPC network topology with subnet segmentation

13.2 Key Security Properties

  • No public IPs on enclaves. Confidential Space VMs have no external IP addresses. Inbound traffic arrives only from the orchestrator subnet on port 8443 (TLS). Cloud NAT provides outbound connectivity restricted to HTTPS (443) and Cloudflare Tunnel (7844) for external API access.

  • Private endpoints for data stores. Cloud SQL PostgreSQL uses private IP via VPC peering; Memorystore Redis uses private service access. Neither service has a public endpoint. SSL/TLS is enforced on all database connections.

  • GCP service accounts with least privilege. The orchestrator service account has KMS encrypt/decrypt and Compute Instance Admin roles. The CVM service account has KMS decrypt only (attestation-gated via IAM conditions), Cloud Storage object creator (append-only for audit logs), and logging writer. No service account has broader permissions than its function requires.

  • Egress restriction via private DNS. The confidential subnet overrides DNS for run.app to route through GCP's restricted VIPs (199.36.153.4/30), preventing CVMs from reaching arbitrary internet endpoints. Only Google APIs and explicitly allowlisted external services are reachable.

  • Append-only audit storage. Audit logs (receipt chains, Merkle roots) are stored in GCP Cloud Storage with a retention policy and versioning enabled. The CVM service account has only the objectCreator role — it can write new objects but cannot modify or delete existing ones.

  • Secrets management. Operational secrets (database passwords, JWT signing keys, OAuth client secrets) are stored in GCP Secret Manager with automatic replication. These secrets are accessible only to the orchestrator service account — the CVM never accesses Secret Manager, receiving all credential material exclusively through the KMS-sealed envelope encryption flow.

13.3 Infrastructure as Code

All infrastructure is defined in Pulumi (Python), providing:

  • Reproducible deployments. The exact same infrastructure can be deployed across development, staging, and production environments.
  • Drift detection. Any manual modifications to infrastructure are detected and can be reverted.
  • Review-gated changes. Infrastructure changes go through the same pull request review process as application code.

Multi-stack configuration supports different scale points:

Property        Development            Production
CVM instances   1 (n2d-standard-4)     3+ (n2d-standard-4, auto-scaling MIG)
PostgreSQL      db-f1-micro, 10 GB     db-custom-4-16384, 128 GB, HA (regional)
Redis           Basic, 1 GB            Standard HA, 4 GB, transit encryption
Cloud Run       1–3 instances          2–10 instances
Monitoring      Cloud Logging          Cloud Monitoring + alert policies

Container images for both the orchestrator and enclave are stored in GCP Artifact Registry, with the enclave image digest pinned in the Pulumi configuration and referenced in IAM conditions to ensure attestation-gated key release matches the exact deployed image.


14. Future Directions

14.1 Multi-Cloud Confidential Computing

While the current implementation targets GCP Confidential Space (AMD SEV-SNP), the architecture is designed for portability. The enclave application code supports multiple attestation backends, and the credential broker is cloud-agnostic. Future work includes:

  • Azure Confidential VM support. Azure's Secure Key Release (SKR) and Azure Attestation Service (MAA) provide an alternative attestation path for customers with Azure-centric infrastructure. The enclave already includes an AMD SEV-SNP report parser that interfaces directly with /dev/sev-guest for non-GCP environments.
  • Intel TDX support. As Intel Trust Domain Extensions mature in cloud environments, Carapace will add TDX as an alternative TEE backend.
  • On-premises deployment. Enterprise customers with their own AMD EPYC hardware can run Carapace enclaves on-premises, with attestation verification performed by a Carapace-hosted or self-hosted attestation service.
  • Multi-cloud redundancy. Agents can fail over between cloud providers while maintaining the same credential isolation guarantees, since the trust root is in silicon (AMD/Intel), not in any cloud provider's software.

14.2 Attestation Registry

A public, append-only registry of enclave measurements and attestation reports:

  • Users can verify that the enclave running their agent matches a published, audited code measurement.
  • Third-party auditors can independently verify the integrity of Carapace deployments.
  • Historical measurements are preserved, enabling retrospective verification of what code was running at any point in time.

The registry could leverage a transparency log (similar to Certificate Transparency) or a blockchain-based anchoring mechanism for maximum decentralization of trust.

14.3 Progressive Trust Model

As agents build behavioral history, the system can progressively relax confirmation requirements:

  • Cold start: All novel actions require confirmation. First-contact delays for new recipients.
  • Warm: After N successful actions to a target without incident, reduce confirmation frequency.
  • Hot: Established behavioral patterns within policy bounds proceed without confirmation.
  • Anomaly reset: Any anomaly detection event resets the trust level for the affected service/target pair.

This reduces friction for well-behaved agents operating within established patterns while maintaining vigilance for novel or suspicious behavior.
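Since the progressive trust model is a planned direction rather than an implemented feature, the following state-transition sketch is purely illustrative — the level names follow the text, while the thresholds are invented:

```python
def next_trust_level(level: str, successes: int, anomaly: bool,
                     warm_after: int = 20, hot_after: int = 100) -> str:
    # Anomaly reset dominates every other transition.
    if anomaly:
        return "cold"
    if level == "cold" and successes >= warm_after:
        return "warm"
    if level == "warm" and successes >= hot_after:
        return "hot"
    return level
```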

14.4 Agent Marketplace

A curated registry of pre-configured agent templates with:

  • Published policy templates reviewed for security.
  • Pre-measured enclave images with known attestation measurements.
  • Community ratings and security audit results.
  • One-click deployment from template to running agent.

14.5 Compliance and Certification

  • SOC 2 Type II: Audit controls for the credential lifecycle, access logging, and change management.
  • HIPAA: For healthcare agents handling protected health information, with additional controls on data residency and access logging.
  • ISO 27001: Information security management system certification for the platform.

14.6 Advanced Perception Defenses

  • Verification challenges: Random codes sent to user and agent simultaneously. Discrepancy detection indicates communication channel compromise.
  • Cross-agent correlation: Detecting coordinated anomalous behavior across multiple agents that might indicate a systemic attack.
  • LLM-based action summarization: Using a separate, isolated LLM to generate human-readable summaries of agent activity from the receipt chain, providing an independent "second opinion" on what the agent did.

15. Conclusion

The security challenges of autonomous AI agents are not fundamentally different from those of any privileged software system — they are, however, more acute because of the breadth of access these agents require and the difficulty of predicting the behavior of LLM-driven systems. Traditional approaches to agent security (environment variables, file-based secrets, trust-the-operator models) are structurally inadequate for systems that operate autonomously with access to sensitive user credentials.

Carapace demonstrates that hardware-rooted security, brokered credential access, and cryptographic auditing can be composed into a practical platform that provides meaningful security guarantees without requiring users to become security engineers. By isolating credentials in AMD SEV-SNP enclaves, mediating all actions through a policy-enforcing broker, and generating Ed25519-signed Merkle-chained audit receipts, the platform achieves properties that are impossible in software-only architectures:

  • Credentials are hardware-isolated from the platform operator, the cloud operator, and the agent itself — with managed-language caveats documented in Section 6.3.
  • Policy enforcement is pre-execution, not post-hoc — unauthorized actions are prevented, not merely detected.
  • The audit trail is tamper-evident at the cryptographic level, not merely access-controlled at the storage level.
  • Behavioral anomaly detection provides adaptive defense against novel agent behaviors, with acknowledged limitations in the current heuristic model (Section 7.2).

The platform's current implementation demonstrates the viability of this approach at production scale: multi-tenant CVM pools with per-agent namespace isolation, a seven-stage broker pipeline with quota enforcement and anomaly detection, encrypted session persistence that protects conversation history from disk-level extraction, and end-to-end encrypted in-app communication where the server sees only ciphertext. Known limitations — particularly around Python's memory model for credential zeroing (Section 6.3) and the heuristic nature of the anomaly detection weights (Section 7.2) — are documented candidly and have concrete mitigation paths. The remaining engineering work is execution against a proven architecture. The security primitives are in place and tested. The infrastructure is defined and deployable.

Autonomous AI agents are becoming a permanent feature of how people interact with digital services. The question is not whether to deploy them, but how to deploy them safely. Carapace provides an answer rooted not in promises, but in silicon.


Appendix A: Cryptographic Primitives

Primitive                     Algorithm                        Key Size   Purpose
Receipt signing               Ed25519                          256-bit    Signing action receipts
Credential encryption         AES-256-GCM                      256-bit    Encrypting credential blobs at rest
Receipt hashing               SHA-256                          —          Hash chain and Merkle tree construction
Launch measurement            SHA-384                          —          Enclave code measurement (AMD specification)
Attestation signature         ECDSA P-384                      384-bit    AMD Secure Processor report signing
Target pseudonymization       HMAC-SHA256                      256-bit    PII protection in audit logs
Per-agent token derivation    HMAC-SHA256                      256-bit    Deriving per-agent auth tokens from master secret
Session key derivation        HMAC-SHA256                      256-bit    Deriving per-session encryption keys from master key
Session encryption            AES-256-GCM                      256-bit    Encrypting conversation history on CVM disk
Config-update signing         HMAC-SHA256                      256-bit    Authenticating configuration changes from orchestrator
Webhook verification          HMAC-SHA256                      256-bit    Verifying webhook payload authenticity
Email hashing                 HMAC-SHA256                      256-bit    Pseudonymizing user email addresses in logs
Chat encryption (dashboard)   AES-256-GCM                      256-bit    End-to-end encrypted in-app chat (Web Crypto API)
Chat key wrapping             PBKDF2 (SHA-256)                 —          Passphrase-based key export (600,000 iterations)
Key wrapping                  GCP Cloud KMS (HSM)              AES-256    Wrapping key release conditioned on attestation via WIF
Attestation token             GCP OIDC (Confidential Space)    —          Hardware claims embedded in signed JWT for WIF exchange

Appendix B: Action Type Taxonomy (125 Types, 22 Subcategories)

| Category | Action Types |
| --- | --- |
| Email (4) | email.read, email.send, email.delete, email.draft |
| Messaging (3) | message.read, message.send, message.delete |
| Slack (4) | slack.add_reaction, slack.get_thread, slack.create_list_item, slack.update_list_item |
| Calendar (4) | calendar.read, calendar.create, calendar.update, calendar.delete |
| Files (5) | file.read, file.write, file.delete, file.edit, file.list |
| Web (2) | web.fetch, web.search |
| HTTP (1) | http.request |
| Twitter/X (17) | twitter.tweet, twitter.read_timeline, twitter.send_dm, twitter.read_dms, twitter.like, twitter.retweet, twitter.follow, twitter.unfollow, twitter.search, twitter.read_mentions, twitter.read_profile, twitter.read_followers, twitter.read_following, twitter.read_lists, twitter.bookmark, twitter.quote_tweet, twitter.delete_tweet |
| Reddit (2) | reddit.post, reddit.comment |
| GitHub (2) | github.create_issue, github.create_pr |
| Farcaster (11) | farcaster.post_cast, farcaster.read_feed, farcaster.read_notifications, farcaster.like_cast, farcaster.recast, farcaster.reply, farcaster.follow, farcaster.unfollow, farcaster.search_casts, farcaster.read_profile, farcaster.read_channel |
| Notion (22) | notion.read_page, notion.create_page, notion.update_page, notion.archive_page, notion.read_database, notion.create_database, notion.query_database, notion.update_database, notion.create_database_item, notion.read_block, notion.append_block, notion.update_block, notion.delete_block, notion.read_block_children, notion.search, notion.read_user, notion.list_users, notion.create_comment, notion.read_comments, notion.read_page_property, notion.update_database_item, notion.list_databases |
| Obsidian (7) | obsidian.read_note, obsidian.write_note, obsidian.list_notes, obsidian.search_notes, obsidian.create_folder, obsidian.delete_note, obsidian.read_structure |
| ClickUp (7) | clickup.get_spaces, clickup.get_lists, clickup.get_tasks, clickup.create_task, clickup.update_task, clickup.create_comment, clickup.get_members |
| Trading (9) | polymarket.get_markets, polymarket.get_positions, polymarket.place_order, polymarket.cancel_order, polymarket.get_balances, hyperliquid.get_markets, hyperliquid.get_positions, hyperliquid.place_order, hyperliquid.cancel_order |
| Code (4) | code.exec, code.process, code.eval, code.task |
| Memory (3) | memory.read, memory.write, memory.search |
| Heartbeat (3) | heartbeat.save, heartbeat.delete, heartbeat.list |
| Sub-agents (3) | subagent.spawn, subagent.check, subagent.cancel |
| Auth (3) | auth.header, auth.oauth.initiate, auth.oauth.callback |
| MCP (1) | mcp.tool |
| Credentials (1) | credential.store |
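The dotted `category.action` naming above lends itself to prefix-wildcard policy rules (e.g. allow all of `twitter.*` but require approval for `twitter.send_dm`). The matcher below is a minimal sketch of that idea; the function name and pattern grammar are illustrative assumptions, not the whitepaper's specified policy syntax.

```python
def action_matches(pattern: str, action_type: str) -> bool:
    """Match a dotted action type against a policy pattern.

    Supported patterns, from broadest to narrowest:
      "*"          -> matches every action type
      "twitter.*"  -> matches any action in the twitter category
      "email.send" -> matches exactly that action type
    """
    if pattern == "*":
        return True
    if pattern.endswith(".*"):
        # Keep the trailing dot so "twitter.*" cannot match "twitterx.tweet".
        return action_type.startswith(pattern[:-1])
    return pattern == action_type
```

Evaluating specific rules before wildcard rules lets a policy express "allow the category, but carve out one risky action" without any rule-ordering ambiguity inside the matcher itself.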

Appendix C: Seccomp Profile Summary

| Syscall Category | Count | Policy |
| --- | --- | --- |
| File I/O | 16 | ALLOW |
| Memory management | 6 | ALLOW |
| Signals | 4 | ALLOW |
| Process basics | 8 | ALLOW |
| Thread operations | 5 | ALLOW |
| Time | 5 | ALLOW |
| Network (broker-mediated) | 14 | ALLOW |
| Event notification | 7 | ALLOW |
| Process debugging | 3 | KILL |
| Filesystem manipulation | 5 | KILL |
| Namespace manipulation | 2 | KILL |
| Kernel modules | 3 | KILL |
| System control | 3 | KILL |
| All unlisted syscalls | — | KILL (default) |
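A default-deny profile of this shape can be expressed in the OCI runtime's seccomp JSON format: a `defaultAction` of kill, plus a single allow-list of named syscalls. The sketch below builds such a profile as a Python dict; the syscall subset shown is illustrative, not Carapace's full 65-syscall allow-list.

```python
# Illustrative subsets of the allow-listed categories from the table above.
ALLOWED_SYSCALLS = {
    # File I/O (subset)
    "read", "write", "openat", "close", "fstat", "lseek",
    # Memory management (subset)
    "mmap", "munmap", "mprotect", "brk",
    # Network, broker-mediated (subset)
    "socket", "connect", "sendto", "recvfrom",
}


def build_seccomp_profile(allowed: set) -> dict:
    """Build an OCI-style seccomp profile: kill by default, allow-list only.

    Dangerous syscalls (ptrace, mount, init_module, ...) need no explicit
    deny rule -- anything absent from the allow-list hits defaultAction.
    """
    return {
        "defaultAction": "SCMP_ACT_KILL_PROCESS",
        "syscalls": [
            {"names": sorted(allowed), "action": "SCMP_ACT_ALLOW"},
        ],
    }
```

Structuring the profile as allow-list-plus-default-kill means newly added kernel syscalls are denied automatically, which is the fail-closed posture the table above describes.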

Copyright 2026 Carapace. All rights reserved.