
How We Scaled Our Transaction Processing to 10,000 TPS

The engineering challenges of building a high-throughput payment system that handles both traditional banking and blockchain settlement.


Alex Kim

Senior Engineer · Jan 14, 2026


When we hit 1,000 transactions per second, our original architecture started showing cracks. Here's how we rebuilt our transaction processing pipeline to handle 10,000 TPS without breaking a sweat.

The Original Architecture

Our v1 system was straightforward: a Node.js API server, a MySQL database, and synchronous processing. When a user initiated a transfer, the API handler would validate the request, check balances, create the transaction record, update both account balances, and return the result — all in a single database transaction.
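The rough shape of that v1 handler, with in-memory maps standing in for MySQL (names here are illustrative, not our actual code):

```typescript
// v1: one synchronous step per transfer. Maps stand in for the MySQL
// accounts table; in production these reads and writes ran inside a
// single database transaction.
const accounts = new Map<string, number>([["alice", 100], ["bob", 20]]);

function transferV1(from: string, to: string, amount: number): boolean {
  const fromBal = accounts.get(from);
  const toBal = accounts.get(to);
  if (fromBal === undefined || toBal === undefined) return false;
  if (fromBal < amount) return false; // balance check
  // In the real system, these two writes held row-level locks for the
  // duration of the transaction -- the source of the contention below.
  accounts.set(from, fromBal - amount);
  accounts.set(to, toBal + amount);
  return true;
}
```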

This worked well up to about 500 TPS. Beyond that, we started seeing lock contention on the accounts table. Two users sending money simultaneously would compete for row-level locks, and under high load, the average transaction time climbed from 50ms to over 2 seconds.

The Redesign

We broke the pipeline into three stages: validation, execution, and settlement. Each stage runs independently and communicates through an event queue.

**Validation** checks the request format, verifies the user's identity token, confirms the account exists, and performs a soft balance check (reading from a cached balance that's updated every second). If validation passes, the request enters the execution queue. Because every check is read-only, this stage sustains 15,000 requests per second.

**Execution** is where the actual balance changes happen. We partition accounts across 16 execution workers based on a hash of the source account ID. This means two transactions from the same account always go to the same worker (preserving ordering), but transactions from different accounts execute in parallel. Each worker processes its queue sequentially, eliminating lock contention entirely.
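The routing logic is simple: hash the source account ID and take it modulo the worker count. A minimal sketch (function and constant names are illustrative):

```typescript
import { createHash } from "node:crypto";

// Route a transaction to one of N execution workers by hashing the
// source account ID. The same account always maps to the same worker,
// preserving per-account ordering, while distinct accounts spread
// across workers and execute in parallel.
const WORKER_COUNT = 16;

function workerFor(sourceAccountId: string, workers = WORKER_COUNT): number {
  const digest = createHash("sha256").update(sourceAccountId).digest();
  // Interpret the first 4 bytes of the digest as an unsigned integer.
  return digest.readUInt32BE(0) % workers;
}
```

One consequence of this design: changing the worker count remaps most accounts, so resizing the pool requires draining the queues first.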

**Settlement** handles the downstream effects: updating the transaction history, triggering notifications, syncing with our banking partners, and writing audit log entries. This stage is eventually consistent — a user might see their balance update before the transaction appears in their history, but the delay is typically under 500 milliseconds.

The Balance Cache

The soft balance check in the validation stage deserves special attention. We maintain a Redis cache of every account balance, updated by the execution workers after each transaction. The validation stage reads from this cache instead of hitting the database, which eliminates 80% of our database read load.

The cache is eventually consistent with the database, but we handle this carefully. If the cached balance shows $100 and the user tries to send $95, we allow it through to the execution stage, where the worker performs a hard balance check against the database. If the balance has changed (perhaps another transaction was processed between the cache read and the execution), the worker rejects the transaction and the user sees an insufficient funds error.
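The two-phase check can be sketched like this, with maps standing in for Redis and the database (all names are illustrative):

```typescript
// Stand-ins for Redis (possibly stale) and the database (authoritative).
const cachedBalances = new Map<string, number>();
const dbBalances = new Map<string, number>();

// Validation stage: cheap soft check against the cache.
function softCheck(account: string, amount: number): boolean {
  return (cachedBalances.get(account) ?? 0) >= amount;
}

// Execution stage: hard check against the database. If the balance
// moved since the cache read, the debit is rejected and the user sees
// an insufficient-funds error.
function hardCheckAndDebit(account: string, amount: number): boolean {
  const actual = dbBalances.get(account) ?? 0;
  if (actual < amount) return false;
  dbBalances.set(account, actual - amount);
  cachedBalances.set(account, actual - amount); // worker refreshes cache
  return true;
}
```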

Idempotency at Scale

At 10,000 TPS, duplicate requests are inevitable — network retries, client-side double-clicks, and load balancer replays all contribute. Every transaction request includes a client-generated UUID. The execution worker checks this UUID against a Bloom filter before processing. If the UUID might exist (Bloom filters have false positives but no false negatives), it falls back to a database lookup. This two-tier approach handles 99.7% of duplicate checks without touching the database.
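A condensed version of the two-tier check, with a toy Bloom filter and a Set standing in for the database lookup (a production system would use a properly sized, tuned filter; these names are illustrative):

```typescript
import { createHash } from "node:crypto";

// Minimal Bloom filter: k hash positions per key over a fixed bit array.
// False positives are possible; false negatives are not.
class BloomFilter {
  private bits: Uint8Array;
  constructor(private size = 1 << 16, private hashes = 3) {
    this.bits = new Uint8Array(size);
  }
  private positions(key: string): number[] {
    const out: number[] = [];
    for (let i = 0; i < this.hashes; i++) {
      const d = createHash("sha256").update(`${i}:${key}`).digest();
      out.push(d.readUInt32BE(0) % this.size);
    }
    return out;
  }
  add(key: string): void {
    for (const p of this.positions(key)) this.bits[p] = 1;
  }
  mightContain(key: string): boolean {
    return this.positions(key).every((p) => this.bits[p] === 1);
  }
}

const filter = new BloomFilter();
const seenUuids = new Set<string>(); // database stand-in

// "Definitely new" is answered by the filter alone; only a "maybe seen"
// falls through to the (stand-in) database lookup.
function isDuplicate(uuid: string): boolean {
  if (filter.mightContain(uuid) && seenUuids.has(uuid)) return true;
  filter.add(uuid);
  seenUuids.add(uuid);
  return false;
}
```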

Monitoring and Alerting

We track p50, p95, and p99 latency for each pipeline stage. Our alert thresholds are: validation p99 > 100ms, execution p99 > 500ms, settlement p99 > 2 seconds. We also monitor queue depth — if any execution worker's queue exceeds 1,000 pending items, we trigger an auto-scaling event that spins up additional workers.
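The queue-depth trigger reduces to a simple scan over per-worker stats. The 1,000-item threshold matches the number above; everything else (names, shape of the metrics) is illustrative:

```typescript
// Flag execution workers whose backlog exceeds the scale-out threshold.
const QUEUE_DEPTH_LIMIT = 1000;

interface WorkerStats {
  id: number;
  pendingItems: number;
}

function workersNeedingScaleOut(stats: WorkerStats[]): number[] {
  return stats
    .filter((w) => w.pendingItems > QUEUE_DEPTH_LIMIT)
    .map((w) => w.id);
}
```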

Results

After the redesign, our production metrics show: p50 latency of 45ms, p95 of 180ms, and p99 of 450ms at sustained 8,000 TPS. The system has handled bursts of 12,000 TPS during peak hours without degradation. Database CPU utilization dropped from 85% to 30%, and we eliminated the lock contention that was causing timeout errors for users.