Engineering

Observability for Email: What You Need to Measure and Why

Most teams are flying blind when it comes to email. Here's the full observability stack — metrics, events, and alerting — that high-volume senders use to keep deliverability healthy.

Kash Sajadi

#observability #metrics #monitoring #deliverability

Your application has dashboards for API latency, error rates, and database query times. But what about email?

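For most engineering teams, email is a black box. You call SendEmail, get a 200 OK from SES, and assume the message arrived. But a 200 OK only means SES accepted the request. The message can still bounce, land in spam, or draw a complaint, and you usually find out when a user reports it, not when it happens.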

The Email Lifecycle

Every email passes through multiple stages, each of which can fail silently: Submit → Accept → Queue → Send → Deliver → Open → Click, with possible paths to Bounce, Complaint, or Spam Folder. Without instrumentation at each stage, a failure at any one of them goes unnoticed.
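Only part of this lifecycle is visible through SES. The event types a configuration set can publish (Send, Reject, Delivery, Bounce, Complaint, DeliveryDelay, Open, Click, among others) cover Send through Click, while spam-folder placement produces no SES event at all. As a minimal sketch, assuming you have already grouped events by message ID, the hypothetical `message_status` function below reports how far a single message got:

```python
# SES configuration-set event types (the "eventType" field), ordered from the
# most to the least significant outcome for a single message.
OUTCOME_PRIORITY = [
    ("Reject", "rejected by SES"),
    ("Bounce", "bounced"),
    ("Complaint", "delivered, then reported as spam"),
    ("Click", "clicked"),
    ("Open", "opened"),
    ("Delivery", "delivered"),
    ("DeliveryDelay", "delayed, still retrying"),
    ("Send", "sent, outcome unknown"),
]


def message_status(event_types: list[str]) -> str:
    """Summarize how far one message got, given every SES event type seen for it."""
    seen = set(event_types)
    for event_type, status in OUTCOME_PRIORITY:
        if event_type in seen:
            return status
    return "no SES events recorded"
```

A message that stays at "sent, outcome unknown" long after sending is worth investigating, since SES normally follows a Send event with a Delivery, Bounce, or DeliveryDelay event.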

Tier 1: Delivery Metrics (Mandatory)

These metrics come from the events SES publishes through SNS, plus the bounce and complaint reputation metrics it reports in CloudWatch. If you're not collecting these, start today:

- DeliveryRate: % of sends accepted by the recipient's mail server (MX). Target > 98%.
- HardBounceRate: permanent delivery failures. Target < 0.5%.
- SoftBounceRate: temporary delivery failures. Target < 2%.
- ComplaintRate: recipients reporting the message as spam. Target < 0.05%.
- RejectRate: messages SES accepted but refused to send (for example, because they contained a virus). Target 0%.

Setting these up requires an SES configuration set with an SNS event destination. A Lambda function subscribed to the SNS topic processes the events and writes to your metrics store.
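A minimal sketch of the Lambda subscriber, assuming the SNS event destination publishes the standard SES event JSON (an `eventType` field, with `bounce.bounceType` set to `Permanent` or `Transient` for bounces). The function names, the metric keys, and the commented-out `write_metrics` call are hypothetical. The rate calculation uses Send events as the denominator, which is one reasonable choice, not necessarily how SES computes its own reputation metrics.

```python
import json
import operator
from collections import Counter


def tally_ses_events(lambda_event: dict) -> Counter:
    """Count SES events in an SNS-triggered Lambda invocation.

    Each SNS record carries one SES event as a JSON string in Sns.Message.
    Bounces are split into hard (Permanent) and soft (Transient) bounces.
    """
    counts: Counter = Counter()
    for record in lambda_event.get("Records", []):
        ses_event = json.loads(record["Sns"]["Message"])
        event_type = ses_event.get("eventType")
        if event_type == "Bounce":
            bounce_type = ses_event.get("bounce", {}).get("bounceType")
            if bounce_type == "Permanent":
                counts["HardBounce"] += 1
            elif bounce_type == "Transient":
                counts["SoftBounce"] += 1
            else:
                counts["UndeterminedBounce"] += 1
        elif event_type:
            counts[event_type] += 1
    return counts


def handler(event, context):
    """Lambda entry point: count the events, then persist the counts."""
    counts = tally_ses_events(event)
    # write_metrics(counts)  # hypothetical: increment counters in your metrics store
    return dict(counts)


# Tier 1 targets from the list above: metric -> (comparison that must hold, limit).
TIER1_TARGETS = {
    "DeliveryRate": (operator.gt, 0.98),
    "HardBounceRate": (operator.lt, 0.005),
    "SoftBounceRate": (operator.lt, 0.02),
    "ComplaintRate": (operator.lt, 0.0005),
    "RejectRate": (operator.le, 0.0),
}


def tier1_rates(counts: Counter) -> dict:
    """Compute Tier 1 rates for one time window, relative to Send events."""
    sends = counts["Send"]
    if sends == 0:
        return {}
    return {
        "DeliveryRate": counts["Delivery"] / sends,
        "HardBounceRate": counts["HardBounce"] / sends,
        "SoftBounceRate": counts["SoftBounce"] / sends,
        "ComplaintRate": counts["Complaint"] / sends,
        "RejectRate": counts["Reject"] / sends,
    }


def tier1_breaches(rates: dict) -> list:
    """Return the metrics that miss their Tier 1 target."""
    return [name for name, (holds, limit) in TIER1_TARGETS.items()
            if name in rates and not holds(rates[name], limit)]
```

Because SNS usually delivers one record per invocation, the handler only counts events; the rates belong in whatever job reads the counters back for a time window.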

Tier 2: Engagement Metrics

Delivery is necessary but not sufficient: a delivered message can still land in the spam folder, and even inbox placement doesn't mean the email was seen.

- Open rate: unique opens / delivered. Track trends, not absolute numbers, because Apple Mail Privacy Protection (MPP) preloads tracking pixels and inflates opens.
- Click-through rate: unique clicks / delivered. The most reliable engagement signal.
- Unsubscribe rate: ideally < 0.2% per send. Higher means you're sending to the wrong people or too often.
- Conversion rate: the downstream business metric, such as form fills, purchases, or activations.
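The ratios above are simple, but the word "unique" matters: one recipient opening a message five times should count once. A minimal sketch, assuming your events are already reduced to (message ID, event type) pairs and that unsubscribes are recorded by your own application under a hypothetical `Unsubscribe` label; `EngagementEvent` and `engagement_rates` are illustrative names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngagementEvent:
    message_id: str
    event_type: str  # "Delivery", "Open", "Click", or the hypothetical "Unsubscribe"


def engagement_rates(events: list[EngagementEvent]) -> dict[str, float]:
    """Unique opens, clicks, and unsubscribes divided by delivered messages."""
    delivered = {e.message_id for e in events if e.event_type == "Delivery"}
    if not delivered:
        return {}

    def unique(kind: str) -> int:
        # Count each delivered message at most once, however many events it has.
        return len({e.message_id for e in events
                    if e.event_type == kind and e.message_id in delivered})

    return {
        "OpenRate": unique("Open") / len(delivered),
        "ClickThroughRate": unique("Click") / len(delivered),
        "UnsubscribeRate": unique("Unsubscribe") / len(delivered),
    }
```

Deduplication does not remove MPP's machine-generated opens, which still count once per message here; that is why the open rate is better read as a trend.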

Tier 3: Infrastructure Metrics

These require more setup but catch problems before they affect deliverability.

- DNS health: continuously verify that your SPF, DKIM, and DMARC records are correct and haven't drifted. A missing DKIM selector after a key rotation makes DKIM authentication fail silently.
- IP and domain reputation: check your sending IPs against major IP blocklists (Spamhaus, Barracuda) and your sending domains against domain blocklists such as SURBL. A single blocklist listing can drop your inbox rate by 30% or more.
- Inbox placement testing: use seed list testing (Litmus, GlockApps, or 250ok) to verify that your email lands in the inbox, not spam, across major ISPs before sending to your full list.
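IP blocklists are typically queried over DNS: reverse the IPv4 octets, append the list's zone, and look up an A record. An answer in 127.0.0.x means the IP is listed; no record means it is not. The sketch below uses `zen.spamhaus.org` as an example zone, and the function names are illustrative. Check each provider's usage terms: at the time of writing, Spamhaus refuses queries sent through large public resolvers and Barracuda's list requires registration. Domain lists such as SURBL are queried with the domain name instead of a reversed IP.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Reverse the IPv4 octets and append the blocklist zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError if ip is not IPv4
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the DNSBL answers with a 127.0.0.x listing code.

    A missing record (NXDOMAIN) means "not listed". socket.gaierror is also
    raised for resolver failures, so a production check should tell them apart.
    """
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    if answer.startswith("127.255.255."):
        # Spamhaus uses 127.255.255.x for query errors (such as a blocked
        # public resolver), not for listings.
        raise RuntimeError(f"DNSBL query error from {zone}: {answer}")
    return answer.startswith("127.0.0.")
```

For example, checking 192.0.2.10 against Spamhaus resolves the name `10.2.0.192.zen.spamhaus.org`.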

Setting Up Alerting

Good alerting has two properties: it fires before the problem is critical, and it provides enough context to act quickly. Example thresholds:

- Bounce Rate Warning: 3% over 1h, well before the 5% bounce rate at which SES places an account under review.
- Bounce Rate Critical: 7% over 1h, approaching the level at which SES may pause sending.
- Complaint Spike: 0.08% over 15m. The short window catches campaign-level problems fast.
- Delivery Rate Drop: below 95% over 30m.
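A sketch of how these thresholds could be evaluated over sliding windows, assuming events arrive as (timestamp, event type) pairs. `Rule`, `evaluate`, and the `min_sends` guard (which keeps a handful of sends from producing a 100% bounce rate) are hypothetical. Rates use Send events in the same window as the denominator, and the bounce rules count every Bounce event; count only permanent bounces if you want them to track hard bounces. Because deliveries and bounces lag their sends, a production system might also offset the window slightly.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Rule:
    name: str
    event_type: str            # numerator: the event type being rated
    threshold: float           # rate at which the rule fires
    window: timedelta          # how far back to look
    fires_below: bool = False  # True for "drop" rules such as delivery rate


# Thresholds from the list above.
RULES = [
    Rule("Bounce Rate Warning", "Bounce", 0.03, timedelta(hours=1)),
    Rule("Bounce Rate Critical", "Bounce", 0.07, timedelta(hours=1)),
    Rule("Complaint Spike", "Complaint", 0.0008, timedelta(minutes=15)),
    Rule("Delivery Rate Drop", "Delivery", 0.95, timedelta(minutes=30), fires_below=True),
]


def evaluate(events: list[tuple[datetime, str]], now: datetime,
             rules: list[Rule] = RULES, min_sends: int = 100) -> list[str]:
    """Return the names of the rules that fire at `now`."""
    fired = []
    for rule in rules:
        start = now - rule.window
        in_window = [kind for ts, kind in events if start < ts <= now]
        sends = in_window.count("Send")
        if sends < min_sends:
            continue  # too few sends for the rate to mean anything
        rate = in_window.count(rule.event_type) / sends
        breached = rate < rule.threshold if rule.fires_below else rate >= rule.threshold
        if breached:
            fired.append(rule.name)
    return fired
```

With 200 sends, 10 bounces, and 190 deliveries in the last five minutes, only the warning fires: the bounce rate is 5% and the delivery rate sits exactly at 95%.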

The SendOps Approach

SendOps wires all of this up automatically when you connect your AWS account:

- SES events flow through EventBridge into the SendOps event processor.
- Metrics are computed in real time and stored at 1-minute granularity.
- Dashboards break down every metric by channel, template, and tag.
- Alerts are configurable per channel, with in-app, email, Slack, and webhook delivery.

The goal is to make email feel like the rest of your observability stack, not a blind spot you check once a quarter.