How WarmySender handles load

What this page covers

WarmySender is a 4-pillar outreach platform — Cold Emailing, Email Warmup, LinkedIn Outreach, and Multichannel sequences. Across these pillars we send tens of thousands of emails and LinkedIn actions per day on behalf of customers, and we ramp gracefully past traffic spikes without missing sends, over-sending against caps, or putting customer accounts at risk. This page documents the architecture, the safety guarantees, and what you should expect during peak times.

The short version:

Queue architecture

Every send (email or LinkedIn) is materialized as a row in the campaign_send_jobs table with status pending when it's first scheduled. A scheduler tick reads pending jobs whose run_at has arrived and pushes them onto a BullMQ queue backed by Upstash Redis. Worker processes pull jobs from BullMQ, attempt the send via the appropriate provider (SMTP for email; Unipile for LinkedIn), and write the result back to Postgres.
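The flow above can be sketched in a few lines. This is a minimal illustration, not the production code: SQLite stands in for Postgres, a `deque` stands in for the BullMQ queue, and the `queued`, `sent`, and `failed` status names and the function names are assumptions made for the example.

```python
import sqlite3
from collections import deque

def scheduler_tick(db, queue, now):
    """Push every pending job whose run_at has arrived onto the dispatch queue."""
    rows = db.execute(
        "SELECT id FROM campaign_send_jobs "
        "WHERE status = 'pending' AND run_at <= ? ORDER BY run_at",
        (now,),
    ).fetchall()
    due = [row[0] for row in rows]
    for job_id in due:
        # 'queued' is an assumed marker so the next tick skips this row.
        db.execute(
            "UPDATE campaign_send_jobs SET status = 'queued' WHERE id = ?",
            (job_id,),
        )
        queue.append(job_id)
    db.commit()
    return due

def worker_step(db, queue, send):
    """Pull one job, attempt the send, and write the result back to the table."""
    job_id = queue.popleft()
    status = "sent" if send(job_id) else "failed"
    db.execute(
        "UPDATE campaign_send_jobs SET status = ? WHERE id = ?", (status, job_id)
    )
    db.commit()
    return job_id, status

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE campaign_send_jobs "
    "(id INTEGER PRIMARY KEY, status TEXT, run_at INTEGER)"
)
db.executemany(
    "INSERT INTO campaign_send_jobs VALUES (?, 'pending', ?)",
    [(1, 100), (2, 200), (3, 900)],
)
queue = deque()
queued = scheduler_tick(db, queue, now=500)  # jobs 1 and 2 are due, job 3 is not
first = worker_step(db, queue, send=lambda job_id: True)
```

Because the table remains the source of truth, a job that is lost from the in-memory queue can still be recovered from its row.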

The two-layer design (Postgres for source-of-truth + Redis for fast dispatching) means we get the best of both worlds:

For a deeper dive on the Phase 1 (in-process) vs Phase 2 (Redis/BullMQ) auto-switch architecture, see the main documentation.

Cap enforcement

LinkedIn campaigns enforce per-action daily caps via an atomic Redis Lua script. The full design is documented in "How invite, message, and InMail caps work together"; here's the load-handling summary:

For email campaigns, daily limits are enforced at the mailbox-pacer layer with a similar atomic INCR pattern keyed on mailbox + UTC date. The fail-CLOSED behavior is identical: if the pacer can't talk to Redis, we defer the send rather than burst.
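As a rough sketch of the increment-then-check pattern described above, the example below uses an in-memory stand-in for Redis whose `incr` returns the post-increment value, as Redis INCR does. The key format, the function name, and the returned labels are assumptions for illustration only.

```python
from datetime import datetime, timezone

class InMemoryCounter:
    """Stand-in for Redis: incr/decr return the value after the update."""
    def __init__(self):
        self.values = {}
        self.available = True  # flip to False to simulate an outage

    def _check(self):
        if not self.available:
            raise ConnectionError("counter store unreachable")

    def incr(self, key):
        self._check()
        self.values[key] = self.values.get(key, 0) + 1
        return self.values[key]

    def decr(self, key):
        self._check()
        self.values[key] = self.values.get(key, 0) - 1
        return self.values[key]

def try_reserve_send(store, mailbox_id, daily_cap, now):
    """Reserve one send slot for the mailbox on the current UTC date."""
    utc_day = now.astimezone(timezone.utc).strftime("%Y-%m-%d")
    key = f"pacer:{mailbox_id}:{utc_day}"  # hypothetical key layout
    try:
        count = store.incr(key)
    except ConnectionError:
        return "deferred"  # fail closed: never send without a confirmed slot
    if count > daily_cap:
        store.decr(key)  # hand the over-cap slot back
        return "blocked"
    return "allowed"

store = InMemoryCounter()
noon = datetime(2026, 5, 6, 12, 0, tzinfo=timezone.utc)
decisions = [try_reserve_send(store, "mb-1", 2, noon) for _ in range(3)]
```

Incrementing first and checking the returned value means two concurrent callers can never both claim the last slot, which is the property a separate read followed by a write would not give.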

Recurring heals (the evergreen sweepers)

Some race conditions cannot be fully eliminated at write time — Unipile webhook delivery skew, post-accept pipeline transients, timing-window edge cases at the boundary between wait_accept timeout and accept arrival. Rather than hand-waving "it's eventually consistent," we run two recurring heal sweeps every 6 hours that find any prospect stuck in one of these edge cases and self-heal:

Both sweeps:

Why 6 hours and not faster? Because the cohorts these sweeps target are small (typically 0–20 rows platform-wide at any given time after a deploy) and the underlying live engine is already self-healing for the common case. Six hours is a sweet spot between "fast enough that customers don't notice the gap" and "infrequent enough that we never thrash the database."

Upstash Redis best practices we follow

We follow the published Upstash Redis best practices across every Redis-touching code path:

Load capacity limits

The platform's current sustained throughput envelope (single deployment region):

If sends ever seem slower than expected, check the campaign-not-sending diagnosis page first. Most "slow" reports trace back to the campaign's configured cap, sending window, or per-account ramp ceiling, not to platform-wide load. We monitor queue depth, processing latency, and cap-block rate per workspace, so a systemic issue would show up in our monitoring before you noticed it.

What "deferred" status means

If you see a campaign or prospect in deferred status, it means the engine tried to send but a safety gate said "not now." Common reasons:

Deferred sends are never lost. They sit in campaign_send_jobs with status='pending' and run_at set to the next valid slot. The next scheduler tick picks them up and dispatches when the gate clears. If you want to see what's blocking, the campaign detail page has a "Why deferred?" section that lists the active gates per pending send.
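To make the "next valid slot" idea concrete, here is a small sketch of how a deferred job could be rescheduled. The 09:00 to 17:00 UTC sending window, the two gate names, and the function names are hypothetical; the real engine may compute slots differently.

```python
from datetime import datetime, timedelta, timezone

def next_run_at(now, reason, window_start_hour=9):
    """Pick the next slot for a deferred send (assumed daily window in UTC)."""
    start_today = now.replace(
        hour=window_start_hour, minute=0, second=0, microsecond=0
    )
    if reason == "daily_cap":
        # Cap exhausted for today: wait for tomorrow's window.
        return start_today + timedelta(days=1)
    if reason == "outside_window":
        # Before today's window opens, wait for it; otherwise wait for tomorrow's.
        return start_today if now < start_today else start_today + timedelta(days=1)
    raise ValueError(f"unknown gate: {reason}")

def defer(job, now, reason):
    """Keep the job pending and move run_at forward; nothing is dropped."""
    job["status"] = "pending"
    job["run_at"] = next_run_at(now, reason)
    return job

job = {"id": 42, "status": "pending", "run_at": None}
late = datetime(2026, 5, 6, 20, 0, tzinfo=timezone.utc)
defer(job, late, "outside_window")
```

The job never leaves `pending`, so the ordinary scheduler tick picks it up once `run_at` arrives.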

Incident response posture

If something goes wrong (Upstash outage, Unipile downtime, Postgres latency spike), our incident response is built around three principles:

Post-incident, we publish what happened, what the customer impact was, and what we changed to prevent recurrence — see the missed-accept, InMail count correction, and other troubleshooting pages for examples of past incidents and how they shaped the platform.

Common questions

Will my campaign keep sending if Upstash Redis goes down?

Email campaigns: yes, with degraded-mode pacing. The mailbox-pacer falls back to a Postgres-only gate that still enforces the daily cap but is no longer strictly atomic under high concurrency. With single-worker deployments (typical), this is functionally equivalent to the Redis fast path.

LinkedIn campaigns: cap enforcement returns circuit-open, and the caller falls back to the Postgres SELECT-COUNT gate, which still blocks at or above the cap. Sends continue, with slightly higher per-send latency.

When Redis recovers (typically within minutes for Upstash incidents), the cap counters cold-start-backfill from Postgres on the next miss, with no manual intervention.
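The LinkedIn fallback path can be pictured roughly as follows. This is a sketch under stated assumptions: SQLite stands in for Postgres, and the `sent_actions` table, the `CircuitOpen` exception, and the function names are hypothetical, not the platform's real identifiers.

```python
import sqlite3

class CircuitOpen(Exception):
    """Raised by the fast path when the counter store is unreachable."""

def redis_count_during_outage(account_id, action, day):
    # Simulates the fast path while Redis is down.
    raise CircuitOpen()

def cap_allows(db, account_id, action, day, cap, fast_count):
    """Return (allowed, source); fall back to a row count if the fast path fails."""
    try:
        used = fast_count(account_id, action, day)
        source = "redis"
    except CircuitOpen:
        (used,) = db.execute(
            "SELECT COUNT(*) FROM sent_actions "
            "WHERE account_id = ? AND action = ? AND day = ?",
            (account_id, action, day),
        ).fetchone()
        source = "postgres"
    # Block at or above the cap.
    return used < cap, source

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sent_actions (account_id TEXT, action TEXT, day TEXT)")
db.executemany(
    "INSERT INTO sent_actions VALUES (?, ?, ?)",
    [("acct-1", "invite", "2026-05-06")] * 2,
)
decision = cap_allows(
    db, "acct-1", "invite", "2026-05-06", cap=3,
    fast_count=redis_count_during_outage,
)
```

The row count is slower than a counter lookup, which is consistent with the higher per-send latency mentioned above.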

How does the platform recover from a worker crash mid-send?

Every job in processing status has a processing_started_at timestamp. A watchdog scans for processing rows older than 15 minutes for email (a longer threshold applies to LinkedIn) and flips them back to pending for re-dispatch. The send is idempotent at the provider boundary (we use Unipile's idempotency_key for LinkedIn; SMTP duplicates are rare and detected by message-ID dedup). Per LL#324 (May 6, 2026), Upstash post-addBulk silent drops also auto-heal at the queue layer.
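A watchdog of this kind could look roughly like the sketch below. SQLite stands in for Postgres, timestamps are plain epoch seconds, and the function and constant names are assumptions; only the table name, the column name, and the 15-minute email threshold come from the text above.

```python
import sqlite3

EMAIL_STALE_AFTER_SECONDS = 15 * 60

def requeue_stale_jobs(db, now, stale_after=EMAIL_STALE_AFTER_SECONDS):
    """Flip processing rows older than the threshold back to pending."""
    cursor = db.execute(
        "UPDATE campaign_send_jobs "
        "SET status = 'pending', processing_started_at = NULL "
        "WHERE status = 'processing' AND processing_started_at < ?",
        (now - stale_after,),
    )
    db.commit()
    return cursor.rowcount  # number of jobs handed back for re-dispatch

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE campaign_send_jobs "
    "(id INTEGER PRIMARY KEY, status TEXT, processing_started_at INTEGER)"
)
db.executemany(
    "INSERT INTO campaign_send_jobs VALUES (?, 'processing', ?)",
    [(1, 0), (2, 950)],  # job 1 started long ago, job 2 started recently
)
requeued = requeue_stale_jobs(db, now=1000)
```

Re-dispatching a job that may already have been sent is only safe because the send itself is idempotent at the provider boundary.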

What's the typical end-to-end latency from "scheduled" to "sent"?

For a job scheduled with run_at = NOW(): the scheduler tick runs every 60 seconds, picks up the job, dispatches to BullMQ, the worker pulls it within milliseconds, and the actual send happens within a few seconds (Unipile API latency for LinkedIn; SMTP handshake for email). End-to-end: 60–90 seconds typical, ~3 minutes p99. Recurring heal latency is bounded by the 6-hour tick cadence — see recurring heals above for why we picked that cadence.

How do I know if my campaign is hitting the cap vs. sending normally?

The campaign detail page shows daily counts vs. cap for each action type (invites, messages, InMails). If the cap is hit, the badge color shifts and the "Why deferred?" section lists "daily cap" as the active gate. The platform also logs [Cap-Redis] decision=blocked on every cap-block event; admin users can grep these from the deployment console.

Does the recurring heal scheduler cost extra Unipile API quota?

No. The heal sweeps make zero Unipile API calls; they only stamp database sentinels and enqueue follow-ups via the existing send pipeline. The follow-ups themselves count against the campaign's daily message cap (one slot per follow-up), not the invite cap. If the cap is exhausted, the follow-up defers to the next day.

What happens if both ticks (late-accepts + stuck-post-accepts) try to fire at the same time?

They don't, by design — the second tick is staggered 30 minutes after the first. If somehow both did fire, each holds its own Redis SETNX lock with a 30-minute TTL, so a second invocation while one is mid-flight returns "lock held" and skips. Multi-pod deployments use the same Redis lock, so we never double-heal across pods. If Redis is unavailable, each tick falls back to a process-local lock (we may double-heal across pods during a Redis outage, but the underlying helper is idempotent at the row level, so the worst case is a few duplicate audit rows in linkedin_events — no double-sends).
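The lock behaviour can be illustrated with a small in-memory model of set-if-absent with a TTL. This is a sketch, not the production scheduler: the store class, the key format, and releasing the lock when the tick finishes are all assumptions made for the example.

```python
import time

class InMemoryLockStore:
    """Stand-in for Redis SETNX with an expiry; the clock is injectable."""
    def __init__(self, clock=time.monotonic):
        self._expiry = {}
        self._clock = clock

    def set_if_absent(self, key, ttl_seconds):
        now = self._clock()
        expires = self._expiry.get(key)
        if expires is not None and expires > now:
            return False  # someone else holds an unexpired lock
        self._expiry[key] = now + ttl_seconds
        return True

    def delete(self, key):
        self._expiry.pop(key, None)

def run_tick(store, name, heal, ttl_seconds=30 * 60):
    """Run one heal sweep unless another invocation already holds the lock."""
    key = f"heal-lock:{name}"  # hypothetical key layout
    if not store.set_if_absent(key, ttl_seconds):
        return "lock held"
    try:
        heal()
        return "ran"
    finally:
        store.delete(key)  # assumed: release early instead of waiting for the TTL

store = InMemoryLockStore()
overlap = []
# A second invocation that arrives while the first is mid-flight is skipped.
outcome = run_tick(
    store, "late-accepts",
    heal=lambda: overlap.append(run_tick(store, "late-accepts", lambda: None)),
)
```

The TTL matters for the crash case: if a pod dies mid-sweep and never releases the lock, the key expires on its own and the next tick can proceed.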

Is the platform multi-region today?

No, currently single-region (us-east-1 on Neon + Upstash + Replit). Multi-region adds significant complexity (cap-counter consensus, webhook routing, DB-replication latency) and we haven't seen the throughput need to justify it. If your campaign is highly latency-sensitive (e.g., regulatory windows in Europe), let us know and we'll prioritize accordingly.

How do I reach the team if I see a load issue?

Email hello@warmysender.com with the campaign name, the time window, and the behavior you're seeing (deferred status, slow sends, missed webhook). We'll dig into the queue depth and per-account ramp state for that workspace. For real-time issues, the Support page has the on-call rotation contact.

Load handling principles apply across all four pillars; specifics differ per provider but the architecture invariants (Postgres source-of-truth, Redis hot path, fail-CLOSED safety gates, idempotent recovery) are universal.