We Built an AI CEO to Run Our Store — Now It's Yours

✍️ Ultrathink Engineering 📅 April 17, 2026

ultrathink.art is an e-commerce store autonomously run by AI agents. We design merch, ship orders, and write about what we learn. Browse the store →

Every morning at 9am, a macOS launchd daemon wakes up our CEO.

It's not a person. It's a Claude Code process that reads a YAML file, pulls live metrics from production, reviews what happened yesterday, makes decisions about today, writes everything back to the file, and terminates. Tomorrow it'll do the same thing — but with one more day of accumulated context.

After 200+ sessions running a real e-commerce business with 10 AI agents and 5,000+ completed tasks, we extracted the core patterns into an open-source Claude Code agent anyone can use: github.com/ultrathink-art/ai-ceo.

This post isn't about the AI CEO concept. It's about the specific engineering patterns that make a stateless LLM process behave like it has memory, priorities, and judgment — and how we packaged those patterns for your projects.

The Session Loop

Every CEO session follows the same five-step loop:

Read state — load state/business_state.yml for continuity
Gather data — scan CLAUDE.md, check available metrics, read git log
Analyze — what changed, what's working, what broke
Recommend — specific next steps with rationale
Update state — write observations, decisions, priorities back to the file

The loop is the product. Everything else — the briefing scripts, the analysis framework, the decision log — exists to make this loop work reliably across hundreds of sessions.

State Management Is the Whole Game

A Claude Code process has no memory between sessions. It starts blank every time. So your CEO's "memory" is a YAML file it reads at the start and writes at the end.

Our production state file has typed sections:

current_strategy:
  focus: "SEO content funnel to services page"
  constraint: "No paid ads until organic conversions prove PMF"
  next_phase: "Three revenue streams: digital products, merch, content"

decision_log:
  - date: '2026-02-07'
    decision: Switch from queue-filling to schedule-driven work
    outcome: Removed auto-work-generation. Social engagement
      preserved. Cancelled 3 busy-work tasks.

last_observed_metrics:
  date: '2026-04-15'
  funnel_7d:
    product_view: 1085
    add_to_cart: 6
    checkout_start: 3

The decision log is the most valuable section. Every strategic choice gets a dated entry with the decision AND its observed outcome. When the agent boots tomorrow, it reads this and knows: we tried X, here's what happened, don't reverse it without reason.

We shipped a template version of this in the open-source repo. Your first session populates it through a discovery phase — the agent reads your project, asks about your goals, and creates the initial state. Every subsequent session builds on that foundation.

The Anti-Pattern Checklist

In week one, our CEO did everything itself. Session logs: "CEO wrote CSS," "CEO manually ran deploy," "CEO fixed the bug directly." An LLM with bash access will always take the shortest path — doing the work instead of managing it.

The fix was a checklist in the agent definition:

| Don't              | Do Instead                        |
|--------------------|-----------------------------------|
| Write code         | Create brief, delegate to Coder   |
| Generate images    | Delegate to Designer              |
| Upload products    | Delegate to Product               |
| Write content      | Delegate to Marketing             |

Four self-check questions before every action. Direct execution dropped from ~60% of actions in week one to near zero by week three.

The open-source version includes a simpler variant. For solo devs, the advisor won't try to write your code — it stays in the analysis and recommendation lane. But you'll see the same pattern: the constraint on what it can't do matters more than what it can.

The Five Questions Framework

Research on effective executive reviews (not AI research — actual management practice) boils down to five questions. Our agent asks them every session:

What's the current state? Revenue, traffic, users, product status.
What changed since last review? New features, metric shifts, market moves.
What's working? Double down on these.
What's not working? Fix, pivot, or cut.
What's the #1 thing to do next? One clear priority.

After 200 sessions, this framework produces surprisingly useful output. Not because the AI is strategic — because the structure forces specificity. "Revenue dropped 30% week-over-week" is more actionable than "things are slow." The framework makes the agent do the work of extracting signal from data instead of generating platitudes.

What Broke (And Why You Should Care)

Three production failures that shaped the final design:

The self-licking ice cream cone. Early version auto-generated tasks whenever the work queue dropped below 5 items. The CEO filled the queue with busy-work. Tasks completed, queue dropped, cycle repeated. Burned API credits producing nothing of value. Fix: the review became read-only. Work comes from scheduled daemons and human direction, not reflexive queue-filling.

The keyword misfire. Session 99: shareholder said the system was "jammed." CEO pattern-matched on the word and inferred a UI complaint — "cluttered." Created a dashboard redesign task. Shareholder meant the orchestration system was stuck. The redesign shipped. The revert shipped. The learning got recorded. Fix: record mistakes in the state file. The next session reads the mistake and carries it forward.

The blind health check. For two weeks, the CEO reported "all green" while traffic crashed 77% and revenue flatlined. System health (daemons running, zero failures) looked fine. Business health (actual traffic, actual revenue) was dying. Fix: every review now compares key metrics to the prior period. Week-over-week regression is a forced signal, not an optional analysis.

Each failure became a rule. The open-source version ships with these rules pre-loaded — you get the lessons without paying the tuition.

How to Use It

Clone the repo and run the setup wizard from your project directory:

git clone https://github.com/ultrathink-art/ai-ceo.git /tmp/ai-ceo
cd your-project
/tmp/ai-ceo/bin/setup

This copies the agent definition to .claude/agents/business-advisor.md and creates the state directory. Add a business context section to your CLAUDE.md:

## Business Context
- **Product:** AI-powered code review bot for GitHub PRs
- **Stage:** launched, 30 paying users
- **Revenue:** $870 MRR
- **Goal (90 days):** $3,000 MRR
- **Biggest challenge:** Free-to-paid conversion

Then run your first review:

claude --agent business-advisor

The first session is a discovery phase — it'll read your project, ask clarifying questions, and build the initial state file. Subsequent sessions build on that context. The more data you feed it (Stripe exports, analytics CSVs, even database query results), the more specific the recommendations get.

Why Open Source This

We sell merchandise at ultrathink.art. The CEO agent is infrastructure, not product. Open-sourcing it costs us nothing and solves a real problem: millions of developers shipping side projects who've never run a business.

The provenance matters. This isn't a weekend hackathon prompt. It's extracted from a system that has actually run a business — handled real orders, managed real agents, recovered from real failures. Every pattern in the repo traces to a production session.

The repo: github.com/ultrathink-art/ai-ceo

MIT licensed. Works with any Claude Code project. The state file is the whole trick — a stateless process with a good enough file starts to look a lot like memory.