
Audit Logging for AI Applications: SOC 2, GDPR, and EU AI Act Compliance

AuditKit Team · 9 min read

Why AI Applications Need Different Audit Logs

If your SaaS app uses an LLM — Claude, GPT-4, Gemini, an open-source model on your own infrastructure — your audit logging requirements just got more complicated. The "who did what to which resource" model that satisfies SOC 2 for traditional CRUD apps does not capture what auditors, regulators, and enterprise procurement teams now want to see for AI features.

Three regulatory and contractual pressures are converging in 2026:

  • EU AI Act (effective August 2026) — Article 12 mandates "automatic recording of events" for high-risk AI systems, with specific retention requirements. The definition of "high-risk" includes a broader range of SaaS use cases than founders typically assume.
  • SOC 2 auditor expectations shifted in 2025 — auditors now expect to see prompt/response logging, model version tracking, and prompt injection detection as part of CC7.2 (system monitoring). Not yet codified in the TSC, but consistently flagged in audit reports.
  • Enterprise procurement security questionnaires — the 2026 vendor security questionnaires from large enterprises (Fortune 500, financial services, healthcare) now include 15-30 specific questions about AI logging that did not exist in 2024 questionnaires.

The good news: the underlying infrastructure (immutable, tenant-scoped, queryable audit logs) is the same one you would build for traditional SOC 2 compliance. What changes is the schema and the events you capture.

What to Log for Every AI Inference

For every call to an LLM (or any AI model that affects user-facing behavior), capture these fields:

  • actor — the user or system that triggered the inference (same as traditional audit logs)
  • tenant_id — for multi-tenant SaaS, scope the log to the customer org
  • model_id — exact model identifier (e.g., claude-opus-4-1-20250805, gpt-4o-2024-08-06) — not just "Claude" or "GPT-4"
  • model_provider — Anthropic, OpenAI, AWS Bedrock, self-hosted, etc.
  • system_prompt_hash — SHA-256 of the system prompt used. Do NOT store the full system prompt in every log entry (storage explosion); store the hash and link to a versioned system_prompt table.
  • user_input_hash — SHA-256 of the user's input. For high-risk applications, also store the full input (with PII redaction).
  • output_hash — SHA-256 of the model's output
  • tokens_in, tokens_out — usage metrics for cost attribution and abuse detection
  • safety_filter_triggered — boolean for whether the provider's safety filters fired
  • prompt_injection_score — your classifier's score for suspected injection (or null if not run)
  • output_filter_action — what your post-processing layer did to the output (passed, redacted, refused)
  • latency_ms — performance metric
  • occurred_at — timestamp

The hashes are the operational compromise that makes this scale. You can verify what was said without storing every prompt verbatim. For the small subset of inferences flagged as suspicious (high prompt-injection score, safety filter fired, or sampled for QA), store the full content with PII redaction.
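
As a concrete reference, here is a minimal TypeScript sketch of the schema above. It is illustrative: field names follow the list, while the exact types and the Node.js crypto helper are assumptions rather than a prescribed implementation.

import { createHash } from 'node:crypto';

// Per-inference audit event, mirroring the field list above.
interface AiInferenceEvent {
  actor: string;                         // user or system that triggered the call
  tenant_id: string;                     // customer org scope for multi-tenant SaaS
  model_id: string;                      // exact version, e.g. 'claude-opus-4-1-20250805'
  model_provider: string;                // 'anthropic', 'openai', 'bedrock', 'self-hosted'
  system_prompt_hash: string;            // SHA-256 pointer into a versioned prompt table
  user_input_hash: string;
  output_hash: string;
  tokens_in: number;
  tokens_out: number;
  safety_filter_triggered: boolean;
  prompt_injection_score: number | null; // null when the classifier was not run
  output_filter_action: 'passed' | 'redacted' | 'refused';
  latency_ms: number;
  occurred_at: string;                   // ISO-8601 timestamp
}

// Hash content once at log time; store the digest, not the raw text.
function sha256(content: string): string {
  return 'sha256:' + createHash('sha256').update(content, 'utf8').digest('hex');
}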

EU AI Act Article 12 Requirements (the Specifics)

The EU AI Act takes effect in stages, with most high-risk system requirements active by August 2026. Article 12 specifically requires "automatic recording of events" for high-risk AI systems. The text is general enough to leave room for interpretation, but the European Commission's draft guidance from Q4 2025 clarifies the expectation:

  • Recording during the entire lifecycle — from training (or fine-tuning) through every inference
  • Tamper-evident records — hash-chained or equivalent integrity protection (a minimal hash-chain sketch follows this list)
  • Retention period — 6 months minimum for inference logs; longer for training event logs
  • Identifiable to natural persons — for inferences that affect a specific person, the log must support tracing back to that person (with appropriate access controls under GDPR)
  • Accessible to authorities — when requested under Article 12(3), the operator must be able to provide logs within a defined timeframe (likely 30 days for non-urgent requests)
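
Hash chaining itself is simple to illustrate: each record's hash covers the previous record's hash, so editing any historical entry breaks every subsequent link. A minimal TypeScript sketch of the technique (not AuditKit's internals):

import { createHash } from 'node:crypto';

interface ChainedRecord {
  payload: string;   // serialized audit event
  prev_hash: string; // hash of the preceding record ('genesis' for the first)
  hash: string;      // SHA-256 over prev_hash + payload
}

function appendRecord(chain: ChainedRecord[], payload: string): void {
  const prev_hash = chain.length > 0 ? chain[chain.length - 1].hash : 'genesis';
  const hash = createHash('sha256').update(prev_hash + payload).digest('hex');
  chain.push({ payload, prev_hash, hash });
}

// Recompute every link; a single tampered record invalidates the chain
// from that point forward.
function verifyChain(chain: ChainedRecord[]): boolean {
  return chain.every((rec, i) => {
    const prev = i === 0 ? 'genesis' : chain[i - 1].hash;
    const expected = createHash('sha256').update(prev + rec.payload).digest('hex');
    return rec.prev_hash === prev && rec.hash === expected;
  });
}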

"High-risk" under Annex III covers a broader range of SaaS use cases than founders typically realize. Examples that fall into high-risk territory include AI features that affect employment decisions (resume screening, performance evaluation), creditworthiness, educational outcomes, access to essential services, or law enforcement. If your product touches any of these areas, Article 12 applies.

SOC 2 + AI: The Three Controls That Change

SOC 2 itself has not added an "AI" control yet. But three existing Trust Services Criteria are now interpreted more strictly when your system uses AI:

  • CC6.1 (Logical Access) — must extend to AI APIs. Document who can call your LLM provider APIs and through what intermediaries. Service-account credentials for OpenAI/Anthropic need the same access control as your database credentials.
  • CC7.2 (System Monitoring) — auditors now expect monitoring of model performance, prompt injection attempts, and output anomalies as part of "monitoring system performance and security events." Generic application-level logs are no longer sufficient.
  • CC7.3 (Anomaly Detection) — your audit log queries need to surface AI-specific anomalies: a 10x spike in safety_filter_triggered events, unusual prompt-injection scores from a single tenant, large outputs that bypass post-filters. The log infrastructure has to support these queries (see the sketch after this list).
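
These checks reduce to aggregations over the event stream. A TypeScript sketch of the spike check, assuming events are already in memory; in production this would typically run as a warehouse or log-store query, and the 10x threshold is the illustrative figure from the list above.

// Flag tenants whose safety-filter trigger rate today is at least
// spikeFactor times their trailing baseline rate.
function safetyFilterSpikes(
  events: { tenant_id: string; safety_filter_triggered: boolean; occurred_at: string }[],
  baselineRate: Map<string, number>, // e.g. trailing 30-day trigger rate per tenant
  spikeFactor = 10
): string[] {
  const today = new Date().toISOString().slice(0, 10);
  const counts = new Map<string, { fired: number; total: number }>();
  for (const e of events) {
    if (!e.occurred_at.startsWith(today)) continue;
    const c = counts.get(e.tenant_id) ?? { fired: 0, total: 0 };
    c.total += 1;
    if (e.safety_filter_triggered) c.fired += 1;
    counts.set(e.tenant_id, c);
  }
  const flagged: string[] = [];
  for (const [tenant, { fired, total }] of counts) {
    const rate = total > 0 ? fired / total : 0;
    const base = baselineRate.get(tenant) ?? 0;
    if (base > 0 && rate >= base * spikeFactor) flagged.push(tenant);
  }
  return flagged;
}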

GDPR Considerations for AI Audit Logs

AI audit logs intersect with GDPR in two places:

  1. The log itself contains personal data — actor identifiers are usually personal data. If you store user inputs (or output content that references the user), that's personal data under Article 4. Standard GDPR principles apply: lawful basis (legitimate interest or contract), purpose limitation, retention limits, and subject access rights.
  2. The log enables the right to explanation — Article 22 of GDPR grants users the right not to be subject to "solely automated" decisions. When users request information about an automated decision, your audit log is what surfaces the relevant inference: the model, the prompt, the output, the time. Without this log, you cannot respond to Article 22 requests (a sketch of that lookup follows this list).
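
In generic terms, the lookup is a filter on tenant, actor, and action, ordered chronologically. This sketch shows the shape of the query, not AuditKit's actual API:

interface LoggedEvent {
  actor: string;
  tenant_id: string;
  action: string;
  occurred_at: string;
  metadata: Record<string, unknown>;
}

// Assemble one data subject's inference history within one tenant,
// for an Article 22 response.
function inferenceHistory(
  events: LoggedEvent[],
  tenantId: string,
  subjectActor: string
): LoggedEvent[] {
  return events
    .filter(e =>
      e.tenant_id === tenantId &&
      e.actor === subjectActor &&
      e.action === 'ai.inference')
    .sort((a, b) => a.occurred_at.localeCompare(b.occurred_at));
}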

Retention recommendation: 12 months for AI inference logs (sufficient for most GDPR access requests and SOC 2 evidence), with PII fields hashed by month 3 unless flagged for active investigation.
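
The month-3 hashing pass can run as a scheduled job. A sketch, assuming a flagged_for_investigation field set by your triage workflow (that field is an assumption, not part of the schema above):

import { createHash } from 'node:crypto';

const THREE_MONTHS_MS = 90 * 24 * 60 * 60 * 1000;

interface StoredEvent {
  occurred_at: string;
  flagged_for_investigation: boolean; // assumed to be set by your triage workflow
  metadata: Record<string, string>;
}

// Replace raw PII fields with their SHA-256 digests once an event is older
// than three months, unless it is held for an active investigation.
function redactAgedEvents(events: StoredEvent[], piiFields: string[]): void {
  const cutoff = Date.now() - THREE_MONTHS_MS;
  for (const e of events) {
    if (e.flagged_for_investigation) continue;
    if (new Date(e.occurred_at).getTime() > cutoff) continue;
    for (const field of piiFields) {
      if (field in e.metadata && !e.metadata[field].startsWith('sha256:')) {
        e.metadata[field] =
          'sha256:' + createHash('sha256').update(e.metadata[field], 'utf8').digest('hex');
      }
    }
  }
}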

How AuditKit Handles AI Logging

AuditKit's schema supports AI-specific fields out of the box. A typical AI inference log call looks like:

auditkit.log({
  actor: 'user_abc123',
  action: 'ai.inference',
  resource: 'support_response',
  tenant_id: 'tenant_xyz',
  metadata: {
    model_id: 'claude-opus-4-1-20250805',
    model_provider: 'anthropic',
    system_prompt_hash: 'sha256:abc...',
    user_input_hash: 'sha256:def...',
    output_hash: 'sha256:ghi...',
    tokens_in: 1240,
    tokens_out: 380,
    safety_filter_triggered: false,
    prompt_injection_score: 0.02,
    output_filter_action: 'passed',
    latency_ms: 1840
  }
});

Every event is SHA-256 hash-chained on write, so any tampering with historical logs breaks the chain and is detectable. Tenant-scoped queries let you respond to Article 22 GDPR requests (per-user inference history) without exposing other tenants' logs.

For high-risk systems under the EU AI Act, AuditKit's retention policies can be set per-event-type — 6 months minimum on ai.inference events, 24 months on ai.model_change events, indefinite on ai.training_run events.

The Common Mistakes

  • Logging the full prompt in every inference event — storage explosion. Use hashes and a versioned prompt table (sketched after this list).
  • No model version tracking — "we use Claude" is not sufficient. Auditors need to know which exact model version made which inference; behavior changes between versions, so your logs must capture the version that actually ran.
  • Treating user input and model output as opaque blobs — they're personal data when they reference identifiable individuals. GDPR Article 5 (data minimization) applies.
  • Not logging safety filter triggers — when the LLM provider's safety filter fires, that's a meaningful security event. Auditors will ask for the rate of these events; you should know.
  • Forgetting training-time events — if you fine-tune a model on customer data, that training event is loggable and may be subject to longer retention than inference events.
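
The versioned prompt table from the first item takes only a few lines to sketch: each distinct system prompt is stored once, keyed by its digest, and inference events carry only the hash. An in-memory illustration; a real implementation would back this with a database table.

import { createHash } from 'node:crypto';

// Write each distinct system prompt once; inference events store only the hash.
const promptStore = new Map<string, { text: string; version: number; created_at: string }>();
let promptVersion = 0;

function registerSystemPrompt(text: string): string {
  const hash = 'sha256:' + createHash('sha256').update(text, 'utf8').digest('hex');
  if (!promptStore.has(hash)) {
    promptStore.set(hash, {
      text,
      version: ++promptVersion,
      created_at: new Date().toISOString(),
    });
  }
  return hash; // goes into system_prompt_hash on each inference event
}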

The Bottom Line

Your existing audit log infrastructure (the one you built for SOC 2) is the foundation. AI applications add a schema layer on top — model identifiers, prompt/output hashes, safety filter flags, prompt injection scores — and a retention layer that matches the regulatory environment (EU AI Act for high-risk systems, GDPR for personal data, SOC 2 for security operations).

The teams that get this right in 2026 will close enterprise deals 30-60 days faster than teams that treat AI logging as a future problem. Vendor security questionnaires are already asking about it. SOC 2 auditors are already flagging it. The EU AI Act enforces it by August 2026.

AuditKit is open-source (AGPLv3) and self-hostable, or use the cloud tier starting at $99/mo. Either way, the AI logging schema is built in.

Ready to ship audit logging?

AuditKit gives you tamper-evident audit trails and SOC 2 evidence collection in one platform. Start free, or skip the trial and go straight to a paid plan.
