HN Told Us Our SQLite Backups Were Wrong (So We Fixed It)
We published a blog post about running SQLite in production. Four days later, a stranger posted it to Hacker News. The community read it, picked it apart, and found a real bug in our approach.
This is the story of what they caught, how we fixed it, and the backup script we should have had all along.
What We Said
Near the end of the original post, tossed off as a SQLite benefit over Postgres:
Backups are cp production.sqlite3 backup.sqlite3.
One line. No pg_dump, no connection strings. Simple. Elegant.
Wrong.
What HN Told Us
User infamia flagged it:
SQLite has a '.backup' command that you should always use to backup a SQLite DB. You're risking data loss/corruption using 'cp' to backup your database as prescribed in the article.
They're absolutely right. Here's why.
SQLite in WAL mode maintains three files: the main database (production.sqlite3), the write-ahead log (production.sqlite3-wal), and a shared-memory file (production.sqlite3-shm). The WAL file is the critical piece — it holds all recent writes that haven't been checkpointed back into the main database yet.
When you cp just the database file, you get a snapshot of the main database without any of the committed writes still sitting in the WAL. If the last checkpoint was an hour ago, you've just lost an hour of data. Worse: if a write is in progress during the copy, you can capture a half-written page. The file opens without complaint. sqlite3 won't even warn you. But data is silently missing, or corrupted, or both.
You might think "just copy all three files" solves it. It doesn't. The three files are only consistent relative to each other at a specific point in time. Copying them sequentially with cp means the WAL might have new entries by the time you finish copying the main file. You'd need to freeze all writes, copy everything atomically, then release — which is exactly what the .backup API does for you.
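You can watch the failure mode in miniature. This is a sketch against a throwaway database in a temp directory (the `.system` dot-command, which shells out while the connection stays open, needs a reasonably recent sqlite3 CLI):

```shell
# While a WAL-mode connection is open, recent writes live in the -wal file,
# so copying only the main database file misses them. Throwaway paths only.
dir=$(mktemp -d) && cd "$dir"
sqlite3 demo.sqlite3 <<'EOF'
PRAGMA journal_mode=WAL;
CREATE TABLE t(id INTEGER);
INSERT INTO t VALUES (1), (2), (3);
-- Copy the main file from a subshell while this connection is still open:
.system cp demo.sqlite3 naive-copy.sqlite3
EOF
sqlite3 demo.sqlite3 "SELECT count(*) FROM t;"        # the real database has 3 rows
sqlite3 naive-copy.sqlite3 "SELECT count(*) FROM t;"  # the naive copy does not
```

On our machine the naive copy doesn't even contain the table: every page was still in the WAL when cp ran.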
The irony: we wrote an entire section about WAL mode in the same post. We described how writers append to the WAL file instead of modifying the database directly. And then, three sections later, we recommended copying just the main file as if WAL didn't exist.
The community noticed.
The Right Way: sqlite3 .backup
SQLite's .backup command (and its C API equivalent, sqlite3_backup_init) creates a consistent snapshot even while the database is being written to. It acquires the necessary locks, copies all pages including pending WAL entries, and produces a complete, self-contained backup file.
The whole fix is one line:
# BEFORE: naive copy (risks corruption under WAL)
cp storage/production.sqlite3 backups/production-$(date +%Y%m%d).sqlite3
# AFTER: safe backup via sqlite3 .backup API
sqlite3 storage/production.sqlite3 ".backup backups/production-$(date +%Y%m%d).sqlite3"
Barely longer. Same result on a quiescent database. Dramatically different behavior under load.
Under the hood, .backup starts a read transaction on the source database, which gives it a consistent snapshot of all pages. It then copies those pages — including any WAL content — into the destination file. If the source database is written to during the backup, .backup detects the change and restarts the copy from the beginning. The result is always a consistent, self-contained database file.
cp knows nothing about any of this. It copies bytes. If those bytes are in the middle of being rearranged by a WAL checkpoint, you get a file that looks right and opens fine but has corrupted or missing pages that won't surface until you try to read the affected rows.
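It's worth convincing yourself end to end. A minimal check, using throwaway paths rather than our production layout: back up a WAL-mode database, then confirm the copy passes integrity_check and contains the rows.

```shell
# Sketch: .backup produces a complete, self-contained, verifiable copy.
dir=$(mktemp -d)
sqlite3 "$dir/demo.sqlite3" \
  "PRAGMA journal_mode=WAL; CREATE TABLE users(id INTEGER, name TEXT);
   INSERT INTO users VALUES (1, 'ada'), (2, 'lin');"
sqlite3 "$dir/demo.sqlite3" ".backup '$dir/demo-backup.sqlite3'"
sqlite3 "$dir/demo-backup.sqlite3" "PRAGMA integrity_check;"      # ok
sqlite3 "$dir/demo-backup.sqlite3" "SELECT count(*) FROM users;"  # 2
```

Run the same two verification queries against a cp taken under write load and you'll eventually catch a copy that fails them.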
Building a Real Backup Script
Once you accept that .backup is the right primitive, you want a script that handles all four of our SQLite databases, manages timestamps, and can clean up old backups. Here's the core of our bin/backup:
#!/usr/bin/env bash
set -euo pipefail

STORAGE_DIR="storage/production"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)

DATABASES=(
  "production.sqlite3"
  "production_queue.sqlite3"
  "production_cache.sqlite3"
  "production_cable.sqlite3"
)

backup_database() {
  local db_file="$1"
  local db_path="${STORAGE_DIR}/${db_file}"
  local name="${db_file%.sqlite3}"
  local backup_path="${STORAGE_DIR}/${name}_backup_${TIMESTAMP}.sqlite3"

  if [ ! -f "$db_path" ]; then
    echo "  SKIP ${db_file} (not found)"
    return 0
  fi

  # .backup takes a consistent snapshot even if the app is writing
  sqlite3 "$db_path" ".backup '${backup_path}'"

  local size
  size=$(ls -lh "$backup_path" | awk '{print $5}')
  echo "  OK   ${db_file} → $(basename "$backup_path") (${size})"
}

for db in "${DATABASES[@]}"; do
  backup_database "$db"
done
The full script also includes --list (show existing backups) and --clean N (remove backups older than N days, default 7). Four databases — primary, cache, queue, cable. Cache and cable regenerate on restart, so you could skip them. But having all four means you can restore to an exact point-in-time state when you're debugging a deploy that went sideways at 2am.
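The retention pass behind --clean is nothing exotic. A sketch of the idea, where RETENTION_DAYS and the filename glob are illustrative rather than our exact flags:

```shell
# Retention sketch: delete backup files older than N days.
# The glob matches names like production_backup_20240101_020000.sqlite3.
STORAGE_DIR="storage/production"
RETENTION_DAYS=7
find "$STORAGE_DIR" -name '*_backup_*.sqlite3' -type f \
  -mtime +"$RETENTION_DAYS" -print -delete
```

The -print before -delete gives you a log line per removed file, which is handy when cron runs this unattended.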
On our t3.small with 16GB disk, each backup set is ~30MB. Seven days of rotation is ~200MB — trivial. The real disk pressure comes from Docker images (~1.5GB each), not from backup files.
The Other Feedback
The backup issue was the most critical correction, but it wasn't the only useful feedback from the thread.
sgbeal clarified SQLite's JSON operator nuances. In the post we'd mentioned that json_extract returns native types and that you sometimes need to CAST to text. sgbeal went deeper: the -> operator returns the value as JSON text, while ->> extracts it as a SQL-native type. Knowing which operator gives you which type prevents an entire class of subtle comparison bugs where integer 1 doesn't match string "1":
-- -> returns JSON text (always a string)
SELECT typeof('{"id":1}' -> '$.id'); -- 'text'
-- ->> returns the native SQL type
SELECT typeof('{"id":1}' ->> '$.id'); -- 'integer'
This distinction matters every time you write a WHERE clause against JSON-extracted values. We've been using CAST(json_extract(...) AS TEXT) as a safety net, but understanding the operator semantics is a better long-term approach.
leosanchez suggested gobackup — a self-hosted backup tool that runs as a container in your Docker Compose stack. It can ship backups to S3, GCS, or other remote storage on a schedule. For teams that want automated offsite rotation without writing shell scripts, it's a more complete solution than what we built above. We haven't adopted it yet — our single-server setup doesn't justify the container overhead — but it's on the list for when we outgrow cron + shell script.
Why This Matters Beyond Backups
A stranger found our post interesting enough to share. A community of engineers read it, and one of them caught a bug we'd missed. We didn't pay for that code review. We didn't schedule it. Someone just told us we were wrong, and they were right.
This is the part of technical blogging that doesn't show up in content strategy documents. You write honestly about how your system works — including the shortcuts you took, the assumptions you didn't question. People who've been burned by those same assumptions read it and tell you what's wrong. You fix it. Then you write about that, and the cycle continues.
The blog becomes a feedback loop for the engineering itself, not just a channel for announcing features.
From one HN thread, we got backup advice that prevents potential data corruption. We got deeper knowledge about JSON operators we use daily. We got a tool recommendation we hadn't encountered. All because we published something honest about our stack — including the part where we were confidently wrong.
The cp backup line had been in our mental model for months. We'd never tested restoring from a backup taken during active writes. We'd never had to. The feedback cost us nothing, and we got a fix deployed within days of learning about it.
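We've since put a restore drill on the checklist. It's nothing more than restoring the newest backup to a scratch path and querying it; a sketch with illustrative filenames, not our exact script:

```shell
# Restore drill sketch: prove the latest backup actually opens and verifies,
# before the 2am night you need it. Paths are illustrative.
latest=$(ls -t backups/*.sqlite3 | head -1)
cp "$latest" /tmp/restore-test.sqlite3       # backups are quiescent, so cp is safe here
sqlite3 /tmp/restore-test.sqlite3 "PRAGMA integrity_check;"
sqlite3 /tmp/restore-test.sqlite3 "SELECT count(*) FROM sqlite_master;"
```

Note the asymmetry: cp is fine for moving a finished backup around; it's only taking the backup from a live database where it bites.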
If you're running SQLite in production, switch your backups from cp to sqlite3 .backup today. It's a one-line change. If you're not running SQLite, the takeaway may be even more broadly useful: publishing your technical decisions in public, including the ones you're not 100% sure about, is the cheapest code review you'll ever get.
Next time: Stripe Webhooks in Rails — the gotchas nobody warns you about, from dual-path payment races to the three API calls it takes to extract a single fee amount.