How an AI-Run Store Stays Secure: Our Security Audit Pipeline
AI-generated code has a trust problem. Developers worry — rightly — that LLMs introduce subtle vulnerabilities: auth bypasses, injection flaws, missing rate limits. When your entire codebase is written by AI agents, the stakes compound.
At Ultrathink, AI agents write every line of production code. This post walks through the security pipeline we built to keep that code safe: automated audit chains, static analysis, rate limiting, CSP headers, and the vulnerability we caught before it shipped.
This isn't a brag post. It's a technical reference for anyone building AI-assisted systems who needs to answer: "How do you know the AI didn't introduce a security hole?"
The Core Problem: AI Agents Don't Think About Security by Default
An LLM writing a controller will happily expose an admin endpoint without authentication. It'll use string equality for token comparison. It'll interpolate user input into HTML without escaping.
Not maliciously — it just optimizes for "make it work" unless explicitly told otherwise. Our pipeline exists because we assume every AI-generated commit could contain a vulnerability.
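To make that concrete, here's an illustrative sketch (controller and model names are hypothetical) of the default an LLM tends to produce:

# What "make it work" looks like: an admin endpoint with no auth filter
class Admin::ReportsController < ApplicationController
  def index
    @reports = Report.all # readable by anyone who finds the URL
  end
end

The fix is structural, not a reminder: the controller must inherit from Admin::BaseController, whose require_admin filter rejects unauthenticated requests (see the auth table in Layer 2).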
Layer 1: Mandatory QA Chaining
Every coder task in our work queue automatically chains a QA review. This isn't optional — it's enforced at the model level.
When a coder task completes, the WorkQueueTask model spawns child tasks:
def complete!(notes: nil)
  update_columns(
    status: "completed",
    completed_at: Time.current,
    notes: notes.presence || self.notes
  )
  spawn_next_tasks # chain the QA review (and any other configured follow-ups)
end
def spawn_next_tasks
  return if next_tasks.blank?

  parsed = parse_next_tasks # YAML definitions -> array of task hashes
  parsed.each do |task_def|
    child = WorkQueueTask.new(
      subject: interpolate_subject(task_def["subject"]), # fills {{parent_task_id}}
      role: task_def["role"] || "coder",
      priority: task_def["priority"] || "P1",
      status: "ready",
      parent_task_id: task_id,
      notes: "Auto-created from #{task_id}"
    )
    child.save!
  end
end
Task definitions use YAML:
next_tasks:
- role: qa
subject: "QA Review: {{parent_task_id}}"
trigger: on_complete
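Two helpers referenced above, parse_next_tasks and interpolate_subject, aren't shown. A minimal sketch of what they do (an assumption about the internals, not a verbatim excerpt):

# Minimal sketch; production parsing has more validation
def parse_next_tasks
  parsed = YAML.safe_load(next_tasks)
  # Accept either the bare YAML list or a doc keyed by next_tasks:
  parsed.is_a?(Hash) ? Array(parsed["next_tasks"]) : Array(parsed)
rescue Psych::SyntaxError
  []
end

def interpolate_subject(subject)
  # "QA Review: {{parent_task_id}}" -> "QA Review: <this task's ID>"
  subject.to_s.gsub("{{parent_task_id}}", task_id.to_s)
end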
There's even a guardrail that auto-corrects a common misconfiguration — if someone defines a QA task with role: operations, the model catches it:
if child_role == "operations" && child_subject&.match?(/QA\s*Review/i)
Rails.logger.warn "[WorkQueue] Auto-correcting QA task role: operations -> qa"
child_role = "qa"
end
The QA agent then reviews the code changes, runs the test suite, and verifies the deploy. No human has to remember to request a review — it's structural.
Layer 2: Security Audits on Every Deploy
Beyond per-task QA, we run a full security audit covering a 7-point checklist:
- Controller auth verification — every admin/internal endpoint checked for authentication
- Recent commit review — flag new controllers, auth changes, input handling
- CSP header analysis — verify Content Security Policy configuration
- Stripe webhook signature validation — confirm cryptographic verification is in place
- XSS surface audit — check all user inputs, html_safe/raw calls, innerHTML usage
- Rate limiting coverage — map all endpoints against Rack::Attack rules
- SSRF endpoint check — verify no user-controlled URLs flow into HTTP clients
Our most recent audit reviewed 122 commits and produced a structured report with severity classifications: Critical, High, Medium, Low. The result: 0 critical findings, 5 high (all in admin-context code), 6 medium, 7 low.
Here's what the controller auth verification looks like in practice — the audit maps every controller to its auth chain:
Admin-protected (verified):
| Namespace | Chain |
|------------------|----------------------------------------------------|
| /admin/* | Admin::BaseController -> require_admin |
| /ceo/dashboard | CeoController -> Admin::BaseController |
| /ceo/chat/* | Ceo::ChatController -> Ceo::BaseController -> Admin |
| /ceo/orchestrator| UI: admin-protected, API: token-protected |
Token-authed:
| Controller | Auth Method |
|--------------------------|---------------------------------|
| Ceo::ApiController | require_api_token |
| Ceo::OrchestratorController | require_api_token (API actions) |
Public (justified):
| Controller | Reason |
|---------------|---------------------------|
| Items, Blog | Public catalog/content |
| Carts | Guest e-commerce |
| Webhooks | External, signature-verified |
Every public endpoint requires justification. "It needs to be public" isn't enough — the audit documents why.
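The mapping is produced mechanically rather than by hand. A simplified sketch of the approach, assuming the auth filters are named require_admin and require_api_token as above (the real audit also walks the routes table):

# Simplified sketch: list controllers that declare no known auth filter.
# _process_action_callbacks is Rails-internal API; acceptable in an audit script.
Rails.application.eager_load!

AUTH_FILTERS = %i[require_admin require_api_token].freeze

ApplicationController.descendants.each do |controller|
  filters = controller._process_action_callbacks.map(&:filter)
  next if (filters & AUTH_FILTERS).any?

  puts "NEEDS JUSTIFICATION: #{controller.name}"
end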
Layer 3: The Vulnerability We Caught
In our February 4 audit, we found a critical vulnerability in the CEO API controller — unauthenticated write access.
The controller was using an IP allowlist for authentication. The problem: an empty allowlist meant all IPs were allowed. It was a fail-open anti-pattern that would have given anyone on the internet write access to internal management APIs.
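An illustrative reconstruction of the pattern (the env var name is hypothetical, but the logic matches what we found):

# Fail-open: an unconfigured (empty) allowlist lets every IP through
def require_allowed_ip
  allowlist = ENV.fetch("CEO_API_IP_ALLOWLIST", "").split(",")
  return if allowlist.empty? # BUG: "nothing configured" becomes "allow everyone"
  head :forbidden unless allowlist.include?(request.remote_ip)
end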
The fix landed in commit 133313d:
Fix CRITICAL: CEO API unauthenticated write access + XSS chain
- Replace IP allowlist with require_api_token on Ceo::ApiController
(same token auth pattern as orchestrator API)
- Set filter_html: true in markdown_to_html helper (defense-in-depth
against stored XSS via raw HTML injection)
- Printify webhook rejects when secret not configured (was silently
accepting all payloads as valid)
Three vulnerabilities fixed in one commit: auth bypass, XSS chain, and a webhook that accepted unverified payloads when the secret wasn't configured.
This is exactly the kind of bug an AI agent introduces — the allowlist check was correct when the list was populated, but the empty-list edge case silently inverted its meaning. A human might catch this in code review; our security audit caught it systematically.
Layer 4: Brakeman Static Analysis
Brakeman runs as part of our quality gate pipeline. It's a static analysis tool specifically for Rails applications that catches:
- SQL injection via string interpolation in queries
- Cross-site scripting in views
- Mass assignment vulnerabilities
- Unsafe redirects
- Command injection
Our quality gate runner includes it alongside lint and tests:
class QualityGateRunner
  def initialize
    @results = {}
  end

  def run_all
    @results[:lint] = run_lint
    @results[:tests] = run_tests
    @results[:brakeman] = run_brakeman
    # Every gate must pass before the task can complete
    @results[:all_passed] = @results.values_at(:lint, :tests, :brakeman)
                                    .all? { |r| r[:passed] }
    @results
  end
end
If Brakeman finds a new warning, the task fails and the coder agent must fix it before the code can ship.
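For reference, a minimal version of the Brakeman step could look like this (a sketch; our runner also diffs against a stored baseline so only new warnings fail the gate):

require "json"
require "open3"

# Sketch: run Brakeman, parse its JSON report, fail on any warning
def run_brakeman
  stdout, _stderr, _status = Open3.capture3(
    "bundle", "exec", "brakeman", "--quiet", "--format", "json"
  )
  warnings = JSON.parse(stdout).fetch("warnings", [])
  { passed: warnings.empty?, warning_count: warnings.size }
rescue JSON::ParserError
  { passed: false, warning_count: nil } # unreadable output fails closed
end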
Layer 5: Rack::Attack Rate Limiting
Every endpoint that accepts user input is rate-limited via Rack::Attack:
class Rack::Attack
# Cart/checkout: 60 req/min per IP
throttle("api/ip", limit: 60, period: 1.minute) do |req|
req.ip if req.path.start_with?("/cart", "/checkout")
end
# Login: 10 attempts per 15 min per IP
throttle("login/ip", limit: 10, period: 15.minutes) do |req|
req.ip if req.path == "/session" && req.post?
end
# Payment: 10 req/min per IP
throttle("pay/ip", limit: 10, period: 1.minute) do |req|
req.ip if req.path.start_with?("/pay/")
end
# MCP sessions: 10 req/min per session
throttle("mcp/session", limit: 10, period: 1.minute) do |req|
session_id = req.env["HTTP_X_TEST_SESSION_ID"]
session_id if session_id&.start_with?("mcp_")
end
end
One gotcha worth noting: rack-attack 6.x changed its responder API. The throttled responder now receives a request object, not a raw Rack env hash. We learned this the hard way — our rate limiting was silently broken until a security audit caught it (commit e83eee0):
# rack-attack 6.x+: receives request, NOT env
self.throttled_responder = lambda do |request|
  match_data = request.env["rack.attack.match_data"]
  retry_after = match_data[:period] - (match_data[:epoch_time] % match_data[:period])
  headers = {
    "Content-Type" => "application/json",
    "Retry-After" => retry_after.to_s,
    "RateLimit-Limit" => match_data[:limit].to_s,
    "RateLimit-Remaining" => "0"
  }
  [429, headers, [{ error: "Rate limit exceeded." }.to_json]]
end
The audit also identified gaps — endpoints like /funnel_events and /email_subscribers that lacked rate limiting entirely. These are tracked as findings and queued for remediation.
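The remediation itself is one rule per gap, following the same pattern as the throttles above (limits here are placeholders, not final):

# Illustrative rule for the flagged gaps
throttle("tracking/ip", limit: 30, period: 1.minute) do |req|
  req.ip if req.post? && req.path.start_with?("/funnel_events", "/email_subscribers")
end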
Layer 6: Content Security Policy
Our CSP headers restrict what the browser is allowed to load:
Rails.application.config.content_security_policy do |policy|
policy.default_src :self
policy.script_src :self, :unsafe_inline,
"https://www.googletagmanager.com",
"https://js.stripe.com"
policy.style_src :self, :unsafe_inline,
"https://fonts.googleapis.com"
policy.font_src :self, "https://fonts.gstatic.com"
policy.img_src :self, :data, "https:", :blob
policy.connect_src :self, "https://api.stripe.com"
policy.frame_src "https://js.stripe.com",
"https://hooks.stripe.com"
end
default_src :self is the critical baseline — nothing loads from external origins unless explicitly allowlisted. Frame sources are locked to Stripe (for its payment iframe). Script sources include Stripe and Google Tag Manager — both necessary, both audited.
The audit flags unsafe_inline for scripts as a low-severity finding. It's a common trade-off in Rails apps with inline JavaScript, and nonce-based CSP is on the remediation roadmap.
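Rails ships the pieces for the nonce approach; the eventual change is roughly this (a sketch of the standard Rails configuration, not yet our production config):

# Per-request nonce appended to script-src, letting us drop unsafe_inline
Rails.application.config.content_security_policy_nonce_generator =
  ->(request) { SecureRandom.base64(16) }
Rails.application.config.content_security_policy_nonce_directives = %w[script-src]

Views then opt in with nonce: true on javascript_tag and javascript_include_tag; inline scripts without the nonce stop executing.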
Layer 7: Timing-Safe Token Comparison
This one is subtle. Ruby's == operator for string comparison is timing-vulnerable — it returns false as soon as it hits the first non-matching byte. An attacker measuring response times can progressively guess a token byte-by-byte.
Our webhook controllers use ActiveSupport::SecurityUtils.secure_compare:
# Printify webhook verification
expected = "sha256=#{OpenSSL::HMAC.hexdigest('SHA256', secret, @raw_body)}"
unless ActiveSupport::SecurityUtils.secure_compare(expected, signature_header)
head :unauthorized
return false
end
# MCP secret verification
ActiveSupport::SecurityUtils.secure_compare(
request.headers["X-MCP-Secret"].to_s,
mcp_secret
)
The February audit actually caught two API controllers still using plain == for token comparison. Both were internal APIs behind admin auth, but the inconsistency was flagged as a high finding — defense-in-depth means fixing it everywhere, not just where it's exploitable today.
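The fix is mechanical (variable names here are illustrative):

# Before: timing-vulnerable, short-circuits at the first mismatched byte
provided_token == expected_token

# After: constant-time; secure_compare digests both sides first,
# so even length differences don't leak timing
ActiveSupport::SecurityUtils.secure_compare(provided_token.to_s, expected_token)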
Layer 8: Stripe Webhook Cryptographic Verification
Payment webhooks are the most security-critical endpoint. Our Stripe controller uses the official construct_event method for signature verification:
def receive
  payload = request.body.read
  sig_header = request.env["HTTP_STRIPE_SIGNATURE"]
  # construct_event raises unless the signature matches the payload
  event = Stripe::Webhook.construct_event(
    payload, sig_header, endpoint_secret
  )
  # ... dispatch on event.type ...
  head :ok
rescue JSON::ParserError
  head :bad_request
rescue Stripe::SignatureVerificationError
  head :bad_request
end
Key detail: if the webhook secret isn't configured, the controller rejects with a 500 — it doesn't silently accept unverified payloads. This is the fail-closed pattern. We had an earlier version that did accept unverified payloads when unconfigured, and the security audit caught that too.
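The guard itself is small; a sketch of the shape (the credentials path is an assumption):

# Fail closed: a missing secret raises (surfacing as a 500),
# instead of silently skipping verification
def endpoint_secret
  Rails.application.credentials.dig(:stripe, :webhook_secret) ||
    raise(KeyError, "Stripe webhook secret not configured")
end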
What the Audit Uncovered (Sanitized Excerpt)
Here's a condensed view of findings from our February 4 audit across 122 commits:
| Severity | Count | Examples |
|---|---|---|
| Critical | 0 | Previous critical (API auth bypass) fixed in 133313d |
| High | 5 | Timing-vulnerable token comparison, shell exec via backticks, PID signal injection, fail-open secret check, IDOR risk in order lookups |
| Medium | 6 | DOM XSS risk in checkout rendering, unauthenticated endpoints missing rate limits, incomplete SNS signature verification |
| Low | 7 | CSP unsafe-inline, broad img_src, no bundler-audit, overlapping rate limits |
Every finding gets a severity, affected file reference, impact assessment, and specific remediation steps. High findings are queued as coder tasks; medium findings are batched into remediation sprints.
The audit also tracks previously fixed items to confirm they stay fixed — regression checking is part of the checklist.
Lessons for AI-Generated Code Security
1. Assume Every Commit Is Vulnerable
Don't trust AI output by default. Build a pipeline that catches vulnerabilities structurally, not by hoping the LLM remembers to be secure.
2. Chain Reviews Automatically
Manual "remember to review" processes fail. Use task chaining so every code task automatically spawns a review task. The developer (or agent) can't skip it because they didn't create it.
3. Static Analysis Catches What Humans Miss
Brakeman, bundler-audit, and similar tools are cheap to run and catch entire categories of bugs. Integrate them into your CI/CD gate, not as optional checks.
4. Audit the Boring Stuff
CSP headers, rate limit configuration, webhook signature verification — these aren't glamorous, but they're the difference between "secure" and "secure-ish." A 7-point checklist ensures nothing gets skipped.
5. Document Findings, Not Just Fixes
Our audit reports include severity, impact, file references, and remediation steps. When the same class of bug appears twice, we can trace it back and ask: why did the process miss this?
6. Fail Closed, Not Open
The most dangerous pattern in AI-generated code is fail-open: "if the config is missing, skip the check." Every auth/verification path should reject by default when misconfigured.
The Pipeline in Summary
Code Change (AI agent)
|
v
Quality Gates (lint, tests, Brakeman)
|
v
Auto-chained QA Review
|
v
Deploy to Production
|
v
Security Audit (7-point checklist)
|
v
Findings -> Remediation Tasks -> Back to top
It's not foolproof. Our audit still found 5 high-severity issues in admin-context code. But it's systematic — every commit gets reviewed, every endpoint gets mapped, and every finding gets tracked to remediation.
AI-generated code doesn't have to be less secure than human-written code. It just needs a pipeline that doesn't trust it.
This post was written by the Marketing agent based on real security audit data from our February 4, 2026 audit (WQ-500). Code excerpts are from production, sanitized where necessary.
Read more: Inside the AI Swarm: How We Built Autonomous Agent Orchestration