LinkedIn bridge-miss diagnostic alerts — operations runbook
What this page covers
This page is for WarmySender's operations and customer-support team. It explains how the bridge_miss_diagnostic alert pipeline works, when alerts fire, when they auto-acknowledge, and how to triage them. If you're a customer looking for the plain-language explainer of bridge misses themselves, see What is a "bridge miss" and when does it happen?.
WarmySender is a 4-pillar outreach platform — Cold Emailing, Email Warmup, LinkedIn Outreach, and Multichannel sequences. The bridge-miss alert pipeline is part of the LinkedIn Outreach pillar's observability surface.
The problem the pipeline solves
WarmySender's LinkedIn webhook handler emits a structured diagnostic log line on every bridge-miss trace branch, and has done so since early May 2026. Until a follow-up shipped shortly afterwards, however, no corresponding row was written to the operations alerts store. An audit found dozens of bridge-miss trace rows in a 24-hour window alongside zero alert rows: the alert pipeline was dormant even though the trace observability was healthy.
This was a silent regression. A trace branch with a non-success path needs a corresponding alert with idempotency and acknowledgement semantics — otherwise dashboards and ops paging never see the signal. The fix wires the alert helper at every bridge-miss / bridge-partial-late-accept-only trace site and auto-acknowledges any open alerts when the same prospect later bridges successfully.
When alerts fire
The pipeline fires an alert from inside the accept-handler trace on these two branches:
- Bridge miss across all strategies — the matcher tried the legacy enrollment lookup, the campaign-prospect match (including the Sales Navigator public-handle condition), and the relaxed name+workspace+recency match, and none of them bridged. The workspace HAS running campaigns (the orphan-webhook guard already ran).
- Bridge partial — late-accept only — the matcher found candidates via the relaxed strategy, recorded one or more late accepts, but did NOT bridge any active workflow. This is a softer signal than the all-strategies miss, but it still indicates the matcher couldn't tie the webhook to an active campaign step.
The alert is NOT fired on:
- Bridge matched — the success path. Instead, this branch acknowledges any open alerts for the same (account, member-id) pair.
- No running campaign (orphan webhook). This is by design: the workspace has zero running LinkedIn campaigns, so it is not a bug.
- Account not found — the account on the alert payload would be null, so the helper no-ops (the trace row alone is sufficient at this branch).
- No match conditions — webhook had no profile URL or member id; no matcher ran. Already covered by other observability.
- Cross-account ambiguous abort, cross-account match, late-accept match, legacy match without unified bridge, bridge threw exception — these have their own trace and alerting paths.
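The fire / acknowledge / no-op rules above can be summarised as a small dispatch table. This is a minimal sketch, not the handler's actual code: the `Branch` enum, its member names, and `alert_action` are hypothetical names chosen for illustration, and branches with their own alerting paths are simply treated as "none" here.

```python
from enum import Enum


class Branch(Enum):
    # Hypothetical identifiers for the trace branches described above.
    BRIDGE_MATCHED = "bridge_matched"
    BRIDGE_MISS_ALL_STRATEGIES = "bridge_miss_all_strategies"
    BRIDGE_PARTIAL_LATE_ACCEPT_ONLY = "bridge_partial_late_accept_only"
    NO_RUNNING_CAMPAIGN = "no_running_campaign"
    ACCOUNT_NOT_FOUND = "account_not_found"
    NO_MATCH_CONDITIONS = "no_match_conditions"


# The only two branches that create a bridge-miss diagnostic alert.
ALERTING_BRANCHES = {
    Branch.BRIDGE_MISS_ALL_STRATEGIES,
    Branch.BRIDGE_PARTIAL_LATE_ACCEPT_ONLY,
}


def alert_action(branch: Branch) -> str:
    """Return 'fire', 'acknowledge', or 'none' for a trace branch."""
    if branch in ALERTING_BRANCHES:
        return "fire"
    if branch is Branch.BRIDGE_MATCHED:
        # Success path: clear open alerts for the same (account, member-id) pair.
        return "acknowledge"
    # Orphan webhook, account not found, no match conditions, and so on.
    return "none"
```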
Alert row shape
A bridge-miss diagnostic alert record carries:
- Alert type — the webhook-outage bucket (the existing category is re-used because a bridge miss IS a webhook bridging failure; the canonical discriminator lives in a kind tag on the payload).
- Severity — warning by default; escalates to critical if the same account already has more than 5 open bridge-miss alerts in the last hour.
- Kind tag — set to "bridge-miss-diagnostic" — the load-bearing discriminator that dashboards filter on.
- Branch — either "bridge miss across all strategies" or "bridge partial late-accept only".
- Account reference — the receiving LinkedIn account id.
- Truncated member id — first 16 chars of the inbound webhook member id (PII-safe).
- Profile URL presence — boolean (was the webhook's profile URL field populated?).
- Candidate counts — strict-candidate count, relaxed-candidate count, and late-accept-recorded count, cumulative across the strict and relaxed matcher strategies.
- Open count in last hour — the count that drove the severity decision.
- Created at — timestamp at insert time.
- Acknowledged at — empty until either an ops admin acks the row OR a subsequent successful bridge on the same (account, member-id) pair auto-acks.
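For orientation, the record described above could be modelled roughly as follows. This is an illustrative sketch only: the class name, field names, and the literal `"webhook_outage"` value are assumptions, not the real schema. Only the kind tag value, the 16-character truncation, and the default severity come from this page.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


def truncate_member_id(member_id: str) -> str:
    """Keep only the first 16 characters of the inbound member id (PII-safe)."""
    return member_id[:16]


@dataclass
class BridgeMissAlert:
    # Field names are hypothetical; they mirror the list above one-to-one.
    account_id: str
    member_id_truncated: str
    branch: str
    has_profile_url: bool
    strict_candidates: int
    relaxed_candidates: int
    late_accepts_recorded: int
    open_count_last_hour: int
    severity: str = "warning"
    alert_type: str = "webhook_outage"  # assumed label for the re-used bucket
    kind: str = "bridge-miss-diagnostic"  # the discriminator dashboards filter on
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    acknowledged_at: Optional[datetime] = None  # empty until manual or auto ack
```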
Idempotency: 1-hour window per (account, member-id) pair
Inside the 1-hour idempotency window, repeated misses for the same (account, member-id) pair do NOT create new alert rows. This prevents alert flooding when a single misbehaving prospect retriggers the same trace branch across multiple webhook deliveries (upstream retries on transient errors, manual replays, deploy races).
The window is 60 minutes. After 60 minutes, if the same pair is still misbridging, a fresh row is appropriate — that's a structural problem that has persisted past the recurring 6h heal sweep, and ops should see a fresh alert.
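The window check can be sketched as below. This is a simplified illustration under stated assumptions: the function name is hypothetical, the input is the list of creation timestamps of earlier alert rows for the same (account, member-id) pair, and a row created exactly 60 minutes ago is treated as outside the window (the page does not specify the boundary).

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable

IDEMPOTENCY_WINDOW = timedelta(minutes=60)


def should_create_alert(existing_created_at: Iterable[datetime], now: datetime) -> bool:
    """Return True when no alert for this pair was created inside the window.

    existing_created_at holds the creation times of prior alert rows for the
    same (account, member-id) pair; how they are fetched is out of scope here.
    """
    cutoff = now - IDEMPOTENCY_WINDOW
    # Any row newer than the cutoff suppresses a new insert.
    return not any(created > cutoff for created in existing_created_at)
```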
Severity escalation: warning to critical at >5 misses/account/hour
Default severity is "warning". The pipeline escalates to "critical" when the receiving account already has more than 5 open bridge-miss diagnostic alerts in the last hour. This points at a structural matcher problem on that account — likely:
- a Sales Navigator prospect cohort that needs the public-handle backfill re-run for the affected account; or
- a deploy race that put the account's prospect rows in an inconsistent state; or
- account-data quality issues (renamed prospects, URL changes mid-flight).
The 5-per-hour threshold is calibrated: enough headroom for normal background noise (1-3 misses/hour on a busy account is expected) but tight enough that a sustained matcher regression escalates within an hour.
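The escalation rule reduces to a strict greater-than comparison. A minimal sketch, with a hypothetical function name, where the input is the account's count of open bridge-miss diagnostic alerts in the last hour:

```python
ESCALATION_THRESHOLD = 5  # escalate only when the open count is strictly above this


def decide_severity(open_count_last_hour: int) -> str:
    """Return 'critical' above the threshold, otherwise 'warning'."""
    if open_count_last_hour > ESCALATION_THRESHOLD:
        return "critical"
    return "warning"
```

Note the boundary: exactly 5 open alerts stays at "warning"; the sixth is the first to escalate.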
Auto-acknowledge on successful bridge
When a webhook later arrives for the same member id AND the matcher successfully bridges, the same trace call fires an acknowledge step for the (account, member-id) pair. This stamps an acknowledged-at timestamp on every open alert for that pair, removing them from the open-alert backlog.
The result: the open-alerts dashboard reflects only currently-misbridging accounts. If a prospect previously hit the bridge-miss path but later bridged (e.g. because the public-handle backfill caught up, or the user reconnected an account), their alert auto-clears — no manual cleanup required.
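The acknowledge step can be pictured as a single bulk update over the open rows for the pair. The sketch below works on an in-memory list of dicts instead of the real alerts store; the function name and dict keys are illustrative assumptions.

```python
from datetime import datetime, timezone
from typing import Dict, List


def acknowledge_open_alerts(
    alerts: List[Dict], account_id: str, member_id: str, now: datetime
) -> int:
    """Stamp acknowledged_at on every open bridge-miss alert for the pair.

    Returns the number of rows acknowledged. Alerts store the truncated
    member id, so the inbound id is truncated the same way before matching.
    """
    key = member_id[:16]
    acked = 0
    for alert in alerts:
        if (
            alert["kind"] == "bridge-miss-diagnostic"
            and alert["account_id"] == account_id
            and alert["member_id_truncated"] == key
            and alert["acknowledged_at"] is None
        ):
            alert["acknowledged_at"] = now
            acked += 1
    return acked
```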
How to triage an open bridge-miss diagnostic alert
- Check severity. Critical rows are higher priority — they indicate more than 5 misses on a single account in the last hour. Warning rows are likely individual misses.
- Check the cumulative counts. If the strict-candidate count is above zero, the matcher saw rows but didn't bridge — that's a real bug. If both strict and relaxed counts are zero, the miss is likely a cold accept (no candidate row exists) — possibly a Sales Navigator URL prospect that pre-dates the public-handle backfill.
- Check the late-accept recorded count. If above zero, the matcher found rows but recorded them as late accepts only — the recurring 6h heal will likely catch the next state transition.
- Check the account. Look up the account in the LinkedIn accounts list. If it isn't in a connected state, or it's currently in a send cooldown, the account itself is in a constrained state — bridge misses are downstream of that.
- Run dashboard queries. Internal dashboards show the top accounts with bridge misses in the last 7 days and surface Sales Navigator prospects missing a public handle — backfill candidates.
- If unclear: contact engineering with the alert ID + the account + the truncated member id.
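The first three triage checks can be expressed as a small helper that turns an alert's fields into reading hints. This is only a sketch of the decision logic above, with hypothetical function and parameter names; it does not replace checking the account state or the dashboard queries.

```python
from typing import List


def triage_hints(
    severity: str,
    strict_candidates: int,
    relaxed_candidates: int,
    late_accepts_recorded: int,
) -> List[str]:
    """Translate alert fields into the triage hints described above."""
    hints = []
    if severity == "critical":
        hints.append("high priority: more than 5 misses on this account in the last hour")
    if strict_candidates > 0:
        hints.append("strict candidates seen but not bridged: treat as a real bug")
    elif relaxed_candidates == 0:
        hints.append("no candidates: likely a cold accept, check the public-handle backfill")
    if late_accepts_recorded > 0:
        hints.append("late accepts recorded: the recurring 6h heal will likely catch up")
    return hints
```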
Hourly health monitor
A complementary cron runs every hour. It runs a subset of the dashboard queries and emits one structured log line per tick summarizing fleet-wide LinkedIn health: bridge match rate (24h), open and critical bridge-miss diagnostic alert counts, InMail credit-breaker stamps (1h), circuit-breaker defer count (1h), cap-decision distribution (24h), stuck post-accepts, and Sales Navigator prospects missing a public handle.
It has an off-switch (default on) and is purely read-only: zero LinkedIn API calls, zero database mutations.
The structured log line is greppable from our health monitor and dashboard-ingestible. Use it for trend dashboards (24h delta, 7d trend) without re-running the dashboard query pack manually.
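As an illustration of how a greppable, ingestible line supports trend dashboards, the sketch below formats a summary as a prefixed JSON line, parses it back, and computes a delta between two ticks. The prefix, the metric key names, and all function names are assumptions made for this example; the real log format is not documented on this page.

```python
import json
from typing import Dict, Optional

LOG_PREFIX = "linkedin_health_tick "  # hypothetical greppable marker


def format_health_line(metrics: Dict[str, float]) -> str:
    """Serialise one tick's metrics as a single prefixed JSON line."""
    return LOG_PREFIX + json.dumps(metrics, sort_keys=True)


def parse_health_line(line: str) -> Optional[Dict[str, float]]:
    """Return the metrics dict, or None if the line is not a health tick."""
    if not line.startswith(LOG_PREFIX):
        return None
    return json.loads(line[len(LOG_PREFIX):])


def metric_delta(older: Dict[str, float], newer: Dict[str, float], key: str) -> float:
    """Change in one metric between two ticks (for example, 24h apart)."""
    return newer[key] - older[key]
```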
Account safety
The bridge-miss alert pipeline is read-only at the LinkedIn boundary. It performs:
- One indexed read against the alerts store (idempotency lookup, keyed on the kind tag and account).
- One indexed read against the alerts store (severity-decision count, same index).
- One write into the alerts store (the alert row itself).
And on the successful-bridge path:
- One update on the alerts store (acknowledge open rows for the pair).
All four operations are caught internally — an alert pipeline failure can NEVER break the webhook handler's load-bearing behavior (trace row + workflow advance/stop). The handler proceeds normally even if the alert insert/update crashes.
Account safety always wins: the pipeline does not introduce ANY new automatic LinkedIn API calls. It does not stamp any forward-mover-readable column on prospects, accounts, or campaigns. It is purely an observability layer.
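The "caught internally" guarantee amounts to a guard that logs and swallows any failure from an alert step. A minimal sketch of that pattern, assuming a hypothetical wrapper name and logger name:

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("bridge_miss_alerts")  # hypothetical logger name


def run_non_blocking(step: Callable[..., Any], *args: Any, **kwargs: Any) -> bool:
    """Run an alert step; log a warning and return False instead of raising.

    The caller (the webhook handler) continues normally either way.
    """
    try:
        step(*args, **kwargs)
        return True
    except Exception:
        # Non-blocking: record the failure but never propagate it.
        logger.warning("bridge-miss alert step failed (non-blocking)", exc_info=True)
        return False
```

Catching `Exception` rather than using a bare `except` leaves interpreter-level signals such as `KeyboardInterrupt` untouched, which is usually the safer default for this kind of guard.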
Common questions
Why is the alert type labeled as a webhook-outage type instead of a bridge-miss-diagnostic type?
Because the alert type column uses an internal enum that doesn't have a bridge-miss-diagnostic value, and extending it requires a migration. Following the mass-disconnect rate-limiter pattern, the canonical discriminator lives in a "kind" tag on the alert's data payload instead. Filtering and dashboards key on rows whose kind tag is "bridge-miss-diagnostic" — the type column is just a thematic bucket. The webhook-outage type is the closest fit because a bridge miss is a webhook bridging failure.
How do I see only open (un-acknowledged) alerts?
Run the open-alerts dashboard query. It filters out rows that already have an acknowledged-at timestamp and orders the rest by severity (critical first), then by most recent.
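The filter and ordering can be sketched over an in-memory list as follows. This is not the dashboard query itself: the function name and dict keys are assumptions, and only the kind tag value and the ordering rule come from this page.

```python
from datetime import datetime
from typing import Dict, List

SEVERITY_RANK = {"critical": 0, "warning": 1}


def open_alerts(alerts: List[Dict]) -> List[Dict]:
    """Un-acknowledged bridge-miss alerts, critical first, then newest first."""
    rows = [
        a
        for a in alerts
        if a["kind"] == "bridge-miss-diagnostic" and a["acknowledged_at"] is None
    ]

    def sort_key(alert: Dict):
        created: datetime = alert["created_at"]
        # Negate the timestamp so newer rows sort ahead within a severity.
        return (SEVERITY_RANK.get(alert["severity"], 2), -created.timestamp())

    return sorted(rows, key=sort_key)
```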
Why do critical rows fire at >5 instead of, say, >10?
It is a calibrated middle ground. At 1-3 misses an hour on a busy account, the signal is normal background noise (Sales Navigator URL prospects, prospect-side URL changes, cross-account-disconnect timing). Above 5, the pattern is structural and deserves to escalate within an hour. A threshold above 10 would mean ops only sees the symptom after the customer has already noticed.
If a customer's alert auto-acknowledges, do they see anything in their dashboard?
The customer-facing dashboard shows the bridge-miss count (which auto-decrements when the heal sweep catches up), not the admin-alert row itself. Admin alerts live in the operations dashboard (internal). Customers see the matcher state via What is a "bridge miss" and when does it happen?, the Late accepts tile, and the Webhook events tile.
Can I temporarily silence an alert without acknowledging?
The supported way to quiet an alert is a manual acknowledgement: stamp the alert as acknowledged with your operator account recorded. The auto-ack path only fires when a later webhook bridges successfully, so a manual ack is valid and supported in the meantime. Don't delete the row; keep the audit trail.
Does this affect customer billing or account safety?
No. Alerts are operations-side only. The pipeline does not modify customer-facing state — no campaign pauses, no account flips, no message sends, no LinkedIn API calls. The only side effect is rows in the internal alerts store.
What happens if the alert helper itself crashes?
The helper is wrapped at three levels (the helper's own guard around the data write; the trace function's outer guard around the helper invocation; the alert helper's defensive parameter coercion). A crash is logged as a non-blocking warning but does not propagate. The webhook handler continues normally.
When was this pipeline shipped?
The bridge-miss alert pipeline shipped in early May 2026 as a follow-up to the same release that introduced the trace observability. See also What is a "bridge miss", Sales Navigator URL imports, and Why some accept webhooks are not actioned.
Related guides
- What is a "bridge miss" and when does it happen? — Customer-facing matcher overview
- Sales Navigator URL imports and acceptance tracking — Public-handle backfill
- Why some accept webhooks are not actioned — Orphan-webhook by-design case
- What is a "late accept"? — Recurring heal sweep that catches edge cases
- Why late-accept takes time to message — Webhook-time auto-fire
- LinkedIn campaign documentation — Schedule, sending windows, ramp, acceptance lag, disconnect flow
- Full documentation — All 90+ guides
- Support — How to get in touch
For ops triage of a specific alert: look up the alert by ID, identify the associated account, and check the bridge-miss and Sales Navigator backfill queries in the dashboard pack. For deeper engineering escalation, contact the on-call engineer via the standard rotation channel.