LinkedIn bridge-miss diagnostic alerts — operations runbook

What this page covers

This page is for WarmySender's operations and customer-support team. It explains how the bridge_miss_diagnostic alert pipeline works, when alerts fire, when they auto-acknowledge, and how to triage them. If you're a customer looking for the plain-language explainer of bridge misses themselves, see What is a "bridge miss" and when does it happen?.

WarmySender is a 4-pillar outreach platform — Cold Emailing, Email Warmup, LinkedIn Outreach, and Multichannel sequences. The bridge-miss alert pipeline is part of the LinkedIn Outreach pillar's observability surface.

The problem the pipeline solves

WarmySender's LinkedIn webhook handler emits a structured diagnostic log line on every bridge-miss trace branch — that's been live since early May 2026. However, until a follow-up shortly after, no corresponding row was being written to the operations alerts store. An audit found dozens of bridge-miss trace rows in a 24-hour window alongside zero alert rows: the alert pipeline was dormant despite the trace observability being healthy.

This was a silent regression. A trace branch with a non-success path needs a corresponding alert with idempotency and acknowledgement semantics — otherwise dashboards and ops paging never see the signal. The fix wires the alert helper at every bridge-miss / bridge-partial-late-accept-only trace site and auto-acknowledges any open alerts when the same prospect later bridges successfully.

When alerts fire

The pipeline fires an alert from inside the accept-handler trace on these two branches:

The alert is NOT fired on:

Alert row shape

A bridge-miss diagnostic alert record carries:

Idempotency: 1-hour window per (account, member-id) pair

Inside the 1-hour idempotency window, repeated misses for the same (account, member-id) pair do NOT create new alert rows. This prevents alert flooding when a single misbehaving prospect retriggers the same trace branch across multiple webhook deliveries (upstream retries on transient errors, manual replays, deploy races).

The window is 60 minutes. After 60 minutes, if the same pair is still misbridging, a fresh row is appropriate — that's a structural problem that has persisted past the recurring 6h heal sweep, and ops should see a fresh alert.

Severity escalation: warning to critical at >5 misses/account/hour

Default severity is "warning". The pipeline escalates to "critical" when the receiving account already has more than 5 open bridge-miss diagnostic alerts in the last hour. This points at a structural matcher problem on that account — likely:

The 5-per-hour threshold is calibrated: enough headroom for normal background noise (1-3 misses/hour on a busy account is expected) but tight enough that a sustained matcher regression escalates within an hour.

Auto-acknowledge on successful bridge

When a webhook later arrives for the same member id AND the matcher successfully bridges, the same trace call fires an acknowledge step for the (account, member-id) pair. This stamps an acknowledged-at timestamp on every open alert for that pair, removing them from the open-alert backlog.

The result: the open-alerts dashboard reflects only currently-misbridging accounts. If a prospect previously hit the bridge-miss path but later bridged (e.g. because the public-handle backfill caught up, or the user reconnected an account), their alert auto-clears — no manual cleanup required.

How to triage an open bridge-miss diagnostic alert

  1. Check severity. Critical rows are higher priority — they indicate more than 5 misses on a single account in the last hour. Warning rows are likely individual misses.
  2. Check the cumulative counts. If the strict-candidate count is above zero, the matcher saw rows but didn't bridge — that's a real bug. If both strict and relaxed counts are zero, the miss is likely a cold accept (no candidate row exists) — possibly a Sales Navigator URL prospect that pre-dates the public-handle backfill.
  3. Check the late-accept recorded count. If above zero, the matcher found rows but recorded them as late accepts only — the recurring 6h heal will likely catch the next state transition.
  4. Check the account. Look up the account in the LinkedIn accounts list. If it isn't in a connected state, or it's currently in a send cooldown, the account itself is in a constrained state — bridge misses are downstream of that.
  5. Run dashboard queries. Internal dashboards show the top accounts with bridge misses in the last 7 days and surface Sales Navigator prospects missing a public handle — backfill candidates.
  6. If unclear: contact engineering with the alert ID + the account + the truncated member id.

Hourly health monitor

A complementary cron runs every hour. It runs a subset of the dashboard queries and emits one structured log line per tick summarizing fleet-wide LinkedIn health: bridge match rate (24h), open and critical bridge-miss diagnostic alert counts, InMail credit-breaker stamps (1h), circuit-breaker defer count (1h), cap-decision distribution (24h), stuck post-accepts, and Sales Navigator prospects missing a public handle.

It has an off-switch (default on). Pure read-only — zero LinkedIn API calls, zero database mutations.

The structured log line is greppable from our health monitor and dashboard-ingestible. Use it for trend dashboards (24h delta, 7d trend) without re-running the dashboard query pack manually.

Account safety

The bridge-miss alert pipeline is read-only at the LinkedIn boundary. It performs:

And on the successful-bridge path:

All four operations are caught internally — an alert pipeline failure can NEVER break the webhook handler's load-bearing behavior (trace row + workflow advance/stop). The handler proceeds normally even if the alert insert/update crashes.

Account safety always wins: the pipeline does not introduce ANY new automatic LinkedIn API calls. It does not stamp any forward-mover-readable column on prospects, accounts, or campaigns. It is purely an observability layer.

Common questions

Why is the alert type labeled as a webhook-outage type instead of a bridge-miss-diagnostic type?

Because the alert type column uses an internal enum that doesn't have a bridge-miss-diagnostic value, and extending it requires a migration. Following the mass-disconnect rate-limiter pattern, the canonical discriminator lives in a "kind" tag on the alert's data payload instead. Filtering and dashboards key on rows whose kind tag is "bridge-miss-diagnostic" — the type column is just a thematic bucket. The webhook-outage type is the closest fit because a bridge miss is a webhook bridging failure.

How do I see only open (un-acknowledged) alerts?

Run the open-alerts dashboard query. It filters out rows that already have an acknowledged-at timestamp and orders by severity (critical first) then most recent. Critical rows surface first.

Why do critical rows fire at >5 instead of, say, >10?

Calibrated middle. At 1-2 misses on a busy account in an hour, this is normal background noise (Sales Navigator URL prospects, prospect-side URL changes, cross-account-disconnect timing). At 5+, the pattern is structural and deserves to escalate within an hour. Above 10 would mean ops only sees the symptom after the customer has already noticed.

If a customer's alert auto-acknowledges, do they see anything in their dashboard?

The customer-facing dashboard shows the bridge-miss count (which auto-decrements when the heal sweep catches up), not the admin-alert row itself. Admin alerts live in the operations dashboard (internal). Customers see the matcher state via what-is-bridge-miss-and-when-does-it-happen, the Late accepts tile, and the Webhook events tile.

Can I temporarily silence an alert without acknowledging?

Yes — manually stamp the alert as acknowledged with your operator account recorded. The auto-ack path is conditional on a later successful bridge firing, but a manual ack is also valid and supported. Don't delete the row — keep the audit trail.

Does this affect customer billing or account safety?

No. Alerts are operations-side only. The pipeline does not modify customer-facing state — no campaign pauses, no account flips, no message sends, no LinkedIn API calls. The only side effect is rows in the internal alerts store.

What happens if the alert helper itself crashes?

The helper is wrapped at three levels (the helper's own guard around the data write; the trace function's outer guard around the helper invocation; the alert helper's defensive parameter coercion). A crash is logged as a non-blocking warning but does not propagate. The webhook handler continues normally.

When was this pipeline shipped?

The bridge-miss alert pipeline shipped in early May 2026 as a follow-up to the same release that introduced the trace observability. See also What is a "bridge miss", Sales Navigator URL imports, and Why some accept webhooks are not actioned.

For ops triage of a specific alert: look up the alert by ID, identify the associated account, and check the bridge-miss and Sales Navigator backfill queries in the dashboard pack. For deeper engineering escalation, contact the on-call engineer via the standard rotation channel.