How WarmySender handles load

What this page covers

WarmySender is a 4-pillar outreach platform covering Cold Emailing, Email Warmup, LinkedIn Outreach, and Multichannel sequences. Across these pillars we send tens of thousands of emails and LinkedIn actions per day on behalf of customers, and we absorb traffic spikes without missing sends, exceeding caps, or putting customer accounts at risk. This page documents the architecture, the safety guarantees, and what to expect during peak times.

The short version:

Queue architecture

Every send (email or LinkedIn) is materialized as a tracked job with status pending when it's first scheduled. A scheduler tick reads pending jobs whose run_at has arrived and pushes them onto a high-throughput dispatch queue. Worker processes pull jobs from the queue, attempt the send via the appropriate provider (SMTP for email; our LinkedIn integration for LinkedIn), and write the result back to durable storage.
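The flow described above can be sketched in a few lines. This is a minimal in-memory illustration, not the production implementation: the Job fields, the "queued" and "failed" statuses, and the function names scheduler_tick and worker_drain are assumptions made for the example, and a plain dict and deque stand in for the durable store and the dispatch queue.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Job:
    id: int
    channel: str            # "email" or "linkedin"
    run_at: datetime
    status: str = "pending"


def scheduler_tick(store, queue, now):
    """Push every pending job whose run_at has arrived onto the dispatch queue."""
    moved = 0
    for job in store.values():
        if job.status == "pending" and job.run_at <= now:
            job.status = "queued"   # hypothetical intermediate status
            queue.append(job.id)
            moved += 1
    return moved


def worker_drain(store, queue, send):
    """Pull job ids off the queue, attempt the send, and write the result back."""
    while queue:
        job = store[queue.popleft()]
        job.status = "processing"
        try:
            send(job)
            job.status = "sent"
        except Exception:
            job.status = "failed"   # hypothetical; real retry policy not shown


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
store = {
    1: Job(1, "email", now - timedelta(minutes=1)),    # due
    2: Job(2, "linkedin", now + timedelta(hours=1)),   # not due yet
}
queue = deque()
moved = scheduler_tick(store, queue, now)
worker_drain(store, queue, send=lambda job: None)      # stub provider call
```

Only the due job is dispatched and sent; the future job stays pending in the store until a later tick.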

The two-layer design (durable store for source-of-truth + cache layer for fast dispatching) means we get the best of both worlds:

For a deeper dive on the Phase 1 (in-process) vs Phase 2 (durable-queue) auto-switch architecture, see the main documentation.

Cap enforcement

LinkedIn campaigns enforce per-action daily caps via an atomic counter script. The full design is documented in How invite, message, and InMail caps work together; here's the load-handling summary:

For email campaigns, daily limits are enforced at the mailbox-pacer layer with a similar atomic-counter pattern keyed on mailbox + UTC date. The fail-CLOSED behavior is identical: if the pacer can't talk to the cache layer, we defer the send rather than burst.
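As a rough sketch of the pattern described above, the example below reserves a send slot with an atomic check-and-increment keyed on mailbox plus UTC date, and defers when the counter is unreachable. The class InMemoryCounterCache, the exception CacheUnavailable, the function try_reserve_send, and the key format are all hypothetical; a thread lock stands in for whatever atomic primitive the real cache layer provides.

```python
import threading
from datetime import datetime, timezone


class CacheUnavailable(Exception):
    """Raised by the stand-in cache when it cannot be reached."""


class InMemoryCounterCache:
    """Stand-in for the cache layer; the lock makes check-and-increment atomic."""

    def __init__(self):
        self._counts = {}
        self._lock = threading.Lock()
        self.available = True

    def incr_if_below(self, key, cap):
        if not self.available:
            raise CacheUnavailable(key)
        with self._lock:
            current = self._counts.get(key, 0)
            if current >= cap:
                return False
            self._counts[key] = current + 1
            return True


def try_reserve_send(cache, mailbox, cap, now):
    """Return "send" if a slot was reserved for today, otherwise "defer"."""
    utc_day = now.astimezone(timezone.utc).date().isoformat()
    key = f"{mailbox}:{utc_day}"          # assumed key layout
    try:
        return "send" if cache.incr_if_below(key, cap) else "defer"
    except CacheUnavailable:
        return "defer"                    # fail closed: defer rather than burst


cache = InMemoryCounterCache()
now = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
results = [try_reserve_send(cache, "a@example.com", 2, now) for _ in range(3)]
cache.available = False
outage_result = try_reserve_send(cache, "b@example.com", 2, now)
```

With a cap of 2, the third attempt defers, and any attempt made while the stand-in cache is unavailable also defers.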

Recurring heals (the evergreen sweepers)

Some race conditions cannot be fully eliminated at write time: LinkedIn webhook delivery skew, post-accept pipeline transients, and timing-window edge cases at the boundary between a wait_accept timeout and the accept arriving. Rather than hand-waving "it's eventually consistent," we run two recurring heal sweeps every 6 hours that find any prospect stuck in one of these edge cases and repair its state:

Both sweeps:

Why 6 hours and not faster? Because the cohorts these sweeps target are small (typically 0–20 rows platform-wide at any given time after a deploy) and the underlying live engine is already self-healing for the common case. Six hours is a sweet spot between "fast enough that customers don't notice the gap" and "infrequent enough that we never thrash the database."

Cache-layer best practices we follow

We follow industry-standard best practices for high-throughput cache use across every cache-touching code path:

Load capacity limits

The platform's current sustained throughput envelope (single deployment region):

If sends seem slower than expected, check the campaign-not-sending diagnosis page first. Most "slow" reports trace back to the campaign's configured cap, sending window, or per-account ramp ceiling, not to platform-wide load. We monitor queue depth, processing latency, and cap-block rate per workspace, so we would see a systemic issue before you do.

What "deferred" status means

If you see a campaign or prospect in deferred status, it means the engine tried to send but a safety gate said "not now." Common reasons:

Deferred sends are never lost. They remain in the durable job store with status='pending' and run_at set to the next valid slot. The next scheduler tick picks them up and dispatches when the gate clears. If you want to see what's blocking, the campaign detail page has a "Why deferred?" section that lists the active gates per pending send.
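To make the deferral step concrete, here is a minimal sketch of how a blocked send could stay pending while its run_at moves forward. The gate names, the slot rules (caps resetting at UTC midnight, a 09:00 UTC window start, a 5-minute generic retry), and the blocked_by field are assumptions for illustration only; the real gates and their timing are not specified on this page.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone
from typing import Optional


@dataclass
class PendingSend:
    id: int
    run_at: datetime
    status: str = "pending"
    blocked_by: Optional[str] = None   # hypothetical field behind "Why deferred?"


def next_slot(gate, now, window_start=time(9, 0)):
    """Pick the next time a gate could clear (all rules here are assumptions)."""
    now = now.astimezone(timezone.utc)
    if gate == "daily cap":
        # Assume daily caps reset at the next UTC midnight.
        tomorrow = (now + timedelta(days=1)).date()
        return datetime.combine(tomorrow, time(0, 0), tzinfo=timezone.utc)
    if gate == "sending window":
        candidate = datetime.combine(now.date(), window_start, tzinfo=timezone.utc)
        return candidate if candidate > now else candidate + timedelta(days=1)
    return now + timedelta(minutes=5)  # generic retry delay for any other gate


def defer(job, gate, now):
    """Keep the job pending, record the gate, and move run_at to the next slot."""
    job.status = "pending"
    job.blocked_by = gate
    job.run_at = next_slot(gate, now)


now = datetime(2024, 1, 1, 22, 30, tzinfo=timezone.utc)
job = PendingSend(id=1, run_at=now)
defer(job, "daily cap", now)
window_slot = next_slot("sending window", now)
```

Because the job never leaves the pending state, the ordinary scheduler tick picks it up again once run_at arrives; no separate recovery path is needed.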

Incident response posture

If something goes wrong (cache-layer outage, LinkedIn integration downtime, durable-store latency spike), our incident response is built around three principles:

Post-incident, we publish what happened, what the customer impact was, and what we changed to prevent recurrence — see the missed-accept, InMail count correction, and other troubleshooting pages for examples of past incidents and how they shaped the platform.

Common questions

Will my campaign keep sending if the cache layer goes down?

Email campaigns: yes, with degraded-mode pacing. The mailbox-pacer falls back to a durable-store-only gate that still enforces the daily cap but loses strict atomicity under high concurrency. With single-worker deployments (typical), this is functionally equivalent to the cache fast path.

LinkedIn campaigns: cap enforcement returns circuit-open, and the caller falls back to the durable-store count gate, which still blocks at or above the cap. Sends continue, but with slightly higher per-send latency.

When the cache recovers (typically within minutes), the cap counters cold-start-backfill from the durable store on the next miss. No manual intervention is needed.
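The fallback and backfill behavior can be sketched as follows. This is an illustrative model only: the CapGate class, its method names, and the key format are hypothetical, plain dicts stand in for the cache and the durable store, and the sketch does not attempt to reproduce the atomicity of the real cache fast path.

```python
class CacheDown(Exception):
    """Raised when the stand-in cache is unreachable (circuit open)."""


class CapGate:
    def __init__(self, durable_counts, cap):
        self.durable = durable_counts   # source of truth: key -> sends today
        self.cache = {}                 # stand-in for the cache layer
        self.cache_up = True
        self.cap = cap

    def _cache_count(self, key):
        if not self.cache_up:
            raise CacheDown(key)
        if key not in self.cache:
            # Cache miss: cold-start backfill from the durable store.
            self.cache[key] = self.durable.get(key, 0)
        return self.cache[key]

    def allow(self, key):
        """Return (allowed, path), where path names the gate that answered."""
        try:
            count, path = self._cache_count(key), "cache"
        except CacheDown:
            count, path = self.durable.get(key, 0), "durable-fallback"
        return count < self.cap, path

    def record_send(self, key):
        self.durable[key] = self.durable.get(key, 0) + 1
        if self.cache_up and key in self.cache:
            self.cache[key] += 1

    def outage(self):
        self.cache_up = False
        self.cache.clear()              # assume cached counters are lost

    def recover(self):
        self.cache_up = True


key = "acct-1:invite"
gate = CapGate({key: 1}, cap=3)
before = gate.allow(key)        # served from cache after backfill
gate.record_send(key)
gate.outage()
during = gate.allow(key)        # durable fallback, still under the cap
gate.record_send(key)
at_cap = gate.allow(key)        # durable fallback blocks at the cap
gate.recover()
after = gate.allow(key)         # cache path again, backfilled to 3
```

The point of the sketch is that the durable count keeps advancing during the outage, so the first cache miss after recovery restores an accurate counter.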

How does the platform recover from a worker crash mid-send?

Every job in processing status has a processing_started_at timestamp. A watchdog scans for processing rows older than 15 minutes for email (the LinkedIn threshold is longer) and flips them back to pending for re-dispatch. Re-dispatch is safe because the send is idempotent at the provider boundary: we use an idempotency key for LinkedIn, and SMTP duplicates are rare and detected by message-ID dedup. Post-batch silent drops also auto-heal at the queue layer.
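A minimal sketch of the watchdog scan, under stated assumptions: the Row shape and the function name requeue_stale are hypothetical, and the 30-minute LinkedIn threshold is a placeholder because the text above only says it is longer than the 15-minute email threshold.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# 15 minutes for email comes from the text; the LinkedIn value is a placeholder.
STALE_AFTER = {
    "email": timedelta(minutes=15),
    "linkedin": timedelta(minutes=30),
}


@dataclass
class Row:
    id: int
    channel: str
    status: str
    processing_started_at: Optional[datetime] = None


def requeue_stale(rows, now):
    """Flip processing rows past their threshold back to pending; return their ids."""
    requeued = []
    for row in rows:
        if row.status != "processing" or row.processing_started_at is None:
            continue
        if now - row.processing_started_at > STALE_AFTER[row.channel]:
            row.status = "pending"
            row.processing_started_at = None
            requeued.append(row.id)
    return requeued


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    Row(1, "email", "processing", now - timedelta(minutes=20)),     # stale
    Row(2, "email", "processing", now - timedelta(minutes=5)),      # still fresh
    Row(3, "linkedin", "processing", now - timedelta(minutes=20)),  # under placeholder
]
requeued = requeue_stale(rows, now)
```

Only the email row that has been processing for 20 minutes is returned to pending; the other two are left for the worker that still owns them.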

What's the typical end-to-end latency from "scheduled" to "sent"?

For a job scheduled with run_at = NOW(), the scheduler tick runs every 60 seconds and picks up the job on its next pass. It dispatches the job to the queue, a worker pulls it within milliseconds, and the actual send completes within a few seconds (provider API latency for LinkedIn; SMTP handshake for email). End-to-end: 60–90 seconds typical, ~3 minutes p99. Recurring heal latency is bounded by the 6-hour tick cadence; see recurring heals above for why we picked that cadence.

How do I know if my campaign is hitting the cap vs. sending normally?

The campaign detail page shows daily counts vs. cap for each action type (invites, messages, InMails). If the cap is hit, the badge color shifts and the "Why deferred?" section lists "daily cap" as the active gate. The platform also logs cap-block events that admin users can grep from the deployment console.

Does the recurring heal scheduler cost extra LinkedIn API quota?

No. The heal sweeps make zero new LinkedIn API calls — they only stamp internal sentinels and enqueue follow-ups via the existing send pipeline. The follow-ups themselves count against the campaign's daily message cap (one slot per follow-up), not invite cap. If the cap is exhausted, the follow-up defers to the next day.

What happens if both ticks (late-accepts + stuck-post-accepts) try to fire at the same time?

They don't, by design: the second tick is staggered 30 minutes after the first. If both somehow did fire, each holds its own distributed lock with a 30-minute TTL, so a second invocation while one is mid-flight returns "lock held" and skips. Multi-pod deployments use the same shared lock, so we never double-heal across pods. If the cache layer is unavailable, each tick falls back to a process-local lock. During a cache outage we may double-heal across pods, but the underlying helper is idempotent at the row level, so the worst case is a few duplicate audit rows, not double-sends.
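The lock-and-skip behavior can be illustrated with a small TTL lock. This is a sketch under assumptions, not the platform's lock: TTLLock and run_heal_tick are hypothetical names, the lock lives in process memory, and a real shared lock would need an atomic set-if-absent with expiry in the cache layer. The injected clock only exists so the expiry can be demonstrated without waiting.

```python
import time


class TTLLock:
    """In-memory stand-in for a shared lock whose holder expires after a TTL."""

    def __init__(self, clock=time.monotonic):
        self._expires = {}
        self._clock = clock

    def acquire(self, name, ttl_seconds):
        now = self._clock()
        expires = self._expires.get(name)
        if expires is not None and expires > now:
            return False                      # someone still holds the lock
        self._expires[name] = now + ttl_seconds
        return True

    def release(self, name):
        self._expires.pop(name, None)


def run_heal_tick(lock, name, heal, ttl_seconds=30 * 60):
    """Run heal() only if the named lock is free; otherwise skip."""
    if not lock.acquire(name, ttl_seconds):
        return "lock held"
    try:
        heal()
        return "ran"
    finally:
        lock.release(name)


fake_now = [0.0]                              # controllable clock for the demo
lock = TTLLock(clock=lambda: fake_now[0])
healed = []

first = lock.acquire("late-accepts", 30 * 60)
blocked = run_heal_tick(lock, "late-accepts", lambda: healed.append("x"))
fake_now[0] = 30 * 60 + 1                     # the TTL has now elapsed
after_expiry = run_heal_tick(lock, "late-accepts", lambda: healed.append("row-1"))
other_tick = run_heal_tick(lock, "stuck-post-accepts", lambda: healed.append("row-2"))
```

The TTL matters for crash safety: if a holder dies without releasing, the lock frees itself after 30 minutes instead of blocking every later tick.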

Is the platform multi-region today?

No, currently single-region. Multi-region adds significant complexity (cap-counter consensus, webhook routing, replication latency) and we haven't seen the throughput need to justify it. If your campaign is highly latency-sensitive (e.g., regulatory windows in Europe), let us know and we'll prioritize accordingly.

How do I reach the team if I see a load issue?

Email hello@warmysender.com with the campaign name, the time window, and the behavior you're seeing (deferred status, slow sends, missed webhook). We'll dig into the queue depth and per-account ramp state for that workspace. For real-time issues, the Support page has the on-call rotation contact.

Load handling principles apply across all four pillars; specifics differ per provider but the architecture invariants (durable source-of-truth, fast cache hot path, fail-CLOSED safety gates, idempotent recovery) are universal.