Email Alerting for AWS SES: What to Monitor, When to Alert, and How to Stop Finding Out From Customers
Amazon SES is excellent at sending email, but nearly silent about what happens after. There are no meaningful built-in alerts for rising bounce rates, complaint spikes, delivery degradation, or template failures. As a result, many teams either spend weeks stitching together SNS, Lambda, EventBridge, and CloudWatch pipelines, or they discover problems only after customers stop receiving emails. This post breaks down the operational side of SES: the four metrics that actually matter, the thresholds that should trigger action, and how to design an alerting system that surfaces issues before they become incidents. It also explores why observability and team-wide visibility are often the missing layer in otherwise reliable SES setups.

Kash Sajadi
If you're sending email at any meaningful volume through Amazon SES, you've probably already discovered the problem: SES is excellent at sending email, and nearly silent about what happens after. There's no built-in alerting. No dashboard that flags when your bounce rate is trending toward the suspension threshold. No notification when a complaint rate ticks up overnight.
So teams either build something themselves—a collection of SNS topics, Lambda functions, CloudWatch alarms, and EventBridge rules stitched together over a few sprints—or they find out something went wrong when a customer asks why they haven't received an email in three days.
This post is a practical guide to email alerting for SES: what signals actually matter, what thresholds should trigger action, and how to build a system that surfaces problems before they become incidents.
Why SES Alerting Is Harder Than It Should Be
SES publishes detailed sending events—deliveries, bounces, complaints, opens, clicks, rendering failures—but it doesn't surface them in a way that's operationally useful out of the box. To get real-time visibility, you need to configure a notification destination (SNS, Kinesis Firehose, or CloudWatch), then build a pipeline to process and store those events, and then layer alerting on top of that.
The infrastructure diagram for a complete SES observability stack looks something like this:
- SES Configuration Set → EventBridge or SNS
- SNS → Lambda (for processing and normalization)
- Lambda → S3 or DynamoDB (for storage)
- CloudWatch metrics (custom) → CloudWatch Alarms → SNS → email or Slack
That's a lot of moving parts to answer a simple question like "is my bounce rate okay right now?" And every piece of that stack needs correctly configured IAM permissions, error handling, and ongoing maintenance.
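For a sense of scale, here's just the first hop of that pipeline in boto3. This is a minimal sketch; the configuration set name and topic ARN are placeholders for your own resources:

```python
import boto3

ses = boto3.client("ses")

# Route sending events from a configuration set to an SNS topic for
# downstream processing. "prod-sending" and the topic ARN are placeholders.
ses.create_configuration_set_event_destination(
    ConfigurationSetName="prod-sending",
    EventDestination={
        "Name": "sending-events-to-sns",
        "Enabled": True,
        "MatchingEventTypes": ["bounce", "complaint", "delivery", "renderingFailure"],
        "SNSDestination": {
            "TopicARN": "arn:aws:sns:us-east-1:123456789012:ses-events",
        },
    },
)
```

Every remaining hop in the list above needs similar wiring, plus the Lambda code, the storage schema, and the alarm definitions.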
This is the core problem SendOps was built to solve—not to replace SES, but to add the operational layer it doesn't include. But regardless of whether you're building your own stack or using a tool, the alerting logic itself is what matters. So let's get into that.
The Four Metrics That Actually Matter
SES generates a lot of event data, but for alerting purposes, there are four metrics that deserve your attention first.
1. Bounce Rate
SES tracks your overall bounce rate across your account (or per sending identity, depending on your configuration). AWS's published guidance is to keep it below 5%: at that point your account can be placed under review, and sustained higher rates can get sending paused outright. In practice, you want alerts firing well before that.
A reasonable alerting ladder:
- Warning at 2% — worth investigating, likely a list quality issue
- Critical at 3.5% — action required, review sending lists and suppression settings
- Emergency at 4.5% — stop or pause sending until root cause is identified
It's also worth distinguishing between hard bounces (permanent delivery failures—invalid addresses) and soft bounces (temporary failures—mailbox full, server unavailable). Hard bounces should trigger faster action because they directly affect your sender reputation and compound quickly if you keep sending to the same addresses.
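To put the ladder's warning tier into CloudWatch terms: SES publishes account-level reputation metrics automatically, including Reputation.BounceRate, reported as a fraction between 0 and 1. A minimal sketch, assuming an existing SNS topic (the ARN is a placeholder):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Warning tier of the ladder: fire when the account-level bounce rate
# crosses 2%. SES reports Reputation.BounceRate as a fraction (0.02 = 2%).
cloudwatch.put_metric_alarm(
    AlarmName="ses-bounce-rate-warning",
    Namespace="AWS/SES",
    MetricName="Reputation.BounceRate",
    Statistic="Average",
    Period=3600,              # one-hour aggregation window
    EvaluationPeriods=1,
    Threshold=0.02,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ses-alerts"],
)
```

The critical and emergency tiers are the same call with thresholds of 0.035 and 0.045, ideally routed to progressively louder channels.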
2. Complaint Rate
Complaints—when recipients mark your email as spam—are weighted heavily by mailbox providers and by SES itself. AWS's threshold is 0.1%: cross it and your account can be placed under review, with a sending pause possible if the rate keeps climbing. That feels impossibly tight until you realize how quickly it can creep up from a single poorly targeted campaign.
Recommended thresholds:
- Warning at 0.05% — review recent sends for relevance and list hygiene
- Critical at 0.08% — pause marketing sends, audit unsubscribe flows
- Emergency at 0.09% — halt non-transactional sending immediately
Because the complaint threshold is so low, complaint rate alerts need to be sensitive. A complaint rate that hits 0.08% on a Monday morning can hit 0.1% by noon if you're actively sending.
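One way to get that sensitivity in CloudWatch is a short period combined with M-of-N evaluation, so a sustained climb alarms without a single noisy datapoint paging anyone. A sketch of the critical tier (note that Reputation.ComplaintRate is refreshed on SES's own schedule, so adjacent datapoints may repeat; the topic ARN is a placeholder):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Critical tier: complaint rate at or above 0.08% (0.0008 as a fraction),
# checked every 5 minutes, alarming when 3 of the last 4 datapoints breach.
cloudwatch.put_metric_alarm(
    AlarmName="ses-complaint-rate-critical",
    Namespace="AWS/SES",
    MetricName="Reputation.ComplaintRate",
    Statistic="Average",
    Period=300,
    EvaluationPeriods=4,
    DatapointsToAlarm=3,      # 3 of 4: a sustained climb, not a blip
    Threshold=0.0008,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ses-alerts-critical"],
)
```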
3. Delivery Rate
This one is often overlooked because teams focus on bounces and complaints, but a sudden drop in your delivery rate is frequently the first indicator that something is wrong—before bounce rates have had time to climb.
If you normally deliver 98% of emails successfully and that number drops to 91% over a two-hour window, that's worth knowing about immediately. The cause might be a temporary issue with a major ISP, a configuration problem, or something more systemic.
Set a rolling window alert (1-hour or 6-hour) rather than a point-in-time check, since delivery rates fluctuate naturally with send volume.
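CloudWatch metric math can express this directly as a ratio over a rolling window. A sketch, assuming SES's account-level Send and Delivery metrics in the AWS/SES namespace (verify the names against your setup; the topic ARN is a placeholder):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Delivery rate as metric math: 100 * deliveries / sends over 1-hour
# windows, alarming when two consecutive windows fall below 95%.
cloudwatch.put_metric_alarm(
    AlarmName="ses-delivery-rate-drop",
    Metrics=[
        {
            "Id": "sends",
            "MetricStat": {
                "Metric": {"Namespace": "AWS/SES", "MetricName": "Send"},
                "Period": 3600,
                "Stat": "Sum",
            },
            "ReturnData": False,
        },
        {
            "Id": "deliveries",
            "MetricStat": {
                "Metric": {"Namespace": "AWS/SES", "MetricName": "Delivery"},
                "Period": 3600,
                "Stat": "Sum",
            },
            "ReturnData": False,
        },
        {
            "Id": "rate",
            "Expression": "100 * deliveries / sends",
            "Label": "Delivery rate (%)",
            "ReturnData": True,
        },
    ],
    EvaluationPeriods=2,
    Threshold=95,
    ComparisonOperator="LessThanThreshold",
    TreatMissingData="notBreaching",  # also covers windows with zero sends
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ses-alerts"],
)
```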
4. Rendering Failures
If you're using SES templates, rendering failures are easy to miss—the event is generated, but there's no retry and no obvious error in your application logs. A template change that introduces a bad variable reference can silently fail for every recipient in a send.
If your templates are mature, alert on any rendering failure at all. During active development, a threshold of 1–2% is more practical, but in production this should always be a zero-tolerance metric.
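In CloudWatch terms, zero tolerance means alarming on any nonzero sum. This sketch assumes you've enabled event publishing for rendering failures through a configuration set and that the resulting metric name mirrors the event type; treat both the metric details and the topic ARN as assumptions to verify against your own event destination:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Zero-tolerance alarm: any rendering failure in a 5-minute window pages.
# Assumes rendering-failure events are published to CloudWatch via a
# configuration set event destination; confirm the metric name and
# dimensions match how your destination was configured.
cloudwatch.put_metric_alarm(
    AlarmName="ses-rendering-failures",
    Namespace="AWS/SES",
    MetricName="RenderingFailure",
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ses-alerts-critical"],
)
```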
Alerting Architecture: What Good Looks Like
Beyond which metrics to watch, the architecture of your alerting system determines whether it actually works when you need it.
Latency matters. An alert that fires 45 minutes after a bounce rate spike is less useful than one that fires in 5 minutes. This sounds obvious, but CloudWatch metric aggregation has inherent delays, and if your Lambda processing pipeline backs up, your alerting latency compounds. Design for sub-10-minute alert delivery on critical thresholds.
Route alerts to the right people. Not every alert needs to wake up an engineer. Bounce rate warnings might route to the team managing deliverability. Rendering failures should go to whoever owns template deployments. Complaint rate spikes should include marketing leadership. Build routing logic into your alerting setup rather than sending everything to a single Slack channel that eventually gets muted.
Include context in the alert. "Bounce rate exceeded threshold" is less useful than "Bounce rate is 3.2% (up from 0.8% 2 hours ago). 847 hard bounces recorded. Most affected domain: @example.com." If the person receiving the alert has to open three dashboards to understand the situation, you've added friction at exactly the wrong moment.
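In practice that means your pipeline assembles the message rather than forwarding the raw alarm. A hypothetical formatter, where every input comes from your own event store and all names are illustrative:

```python
def format_bounce_alert(current_rate: float, previous_rate: float,
                        hard_bounces: int, top_domain: str) -> str:
    """Build an alert message that answers the first questions a responder
    would otherwise open three dashboards for. Inputs are illustrative and
    would come from your own event store."""
    return (
        f"Bounce rate is {current_rate:.1%} "
        f"(up from {previous_rate:.1%} 2 hours ago). "
        f"{hard_bounces} hard bounces recorded. "
        f"Most affected domain: {top_domain}."
    )

# e.g. format_bounce_alert(0.032, 0.008, 847, "@example.com")
# -> "Bounce rate is 3.2% (up from 0.8% 2 hours ago). 847 hard bounces
#    recorded. Most affected domain: @example.com."
```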
Suppress noise, don't suppress signal. Alert fatigue is real. If your alerting fires on every minor fluctuation, people start ignoring it. Use rolling averages rather than point-in-time checks, and build in sensible cooldown periods so you're not getting paged repeatedly for the same ongoing issue.
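One simple way to implement a cooldown in a Lambda-based pipeline is a conditional write against a small DynamoDB table, so only the first alert per key fires within the window. A sketch, where the table name and window length are assumptions:

```python
import time
import boto3

# Hypothetical DynamoDB table keyed on alert_key; enable TTL on expires_at
# so stale cooldown records clean themselves up.
table = boto3.resource("dynamodb").Table("alert-cooldowns")
COOLDOWN_SECONDS = 3600

def should_fire(alert_key: str) -> bool:
    """Return True only for the first alert with this key per cooldown window."""
    now = int(time.time())
    try:
        table.put_item(
            Item={"alert_key": alert_key, "expires_at": now + COOLDOWN_SECONDS},
            # Conditional write: succeeds only if no unexpired record exists
            ConditionExpression="attribute_not_exists(alert_key) OR expires_at < :now",
            ExpressionAttributeValues={":now": now},
        )
        return True
    except table.meta.client.exceptions.ConditionalCheckFailedException:
        return False
```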
What to Do When an Alert Fires
An alerting system is only as useful as the response playbook behind it. Here's a basic runbook for the most common SES alert scenarios.
High bounce rate:
- Identify which sending identities or configuration sets are affected
- Check whether the spike is concentrated in a specific campaign or send batch
- Review recently added addresses for list hygiene issues
- Cross-reference against your suppression list—are you sending to addresses that previously bounced? (See the sketch after this list.)
- If above 4%, pause all non-critical sends until resolved
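For the suppression-list check, the SESv2 API can tell you whether a bounced address was already on your account-level suppression list. A sketch (the addresses are illustrative):

```python
import boto3

sesv2 = boto3.client("sesv2")

def already_suppressed(address: str) -> bool:
    """Check whether an address is on the account-level suppression list.
    If a freshly bounced address returns True, you're re-sending to
    addresses SES already told you about."""
    try:
        sesv2.get_suppressed_destination(EmailAddress=address)
        return True
    except sesv2.exceptions.NotFoundException:
        return False

# e.g. run it over the addresses from the current bounce spike:
# repeats = [a for a in bounced_addresses if already_suppressed(a)]
```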
Complaint spike:
- Identify the campaign or template associated with the complaints
- Review subject line, sender name, and content for anything that might read as unsolicited
- Verify your unsubscribe flow is functioning and prominently placed
- Check whether the send went to a recently acquired or less-engaged segment
- At 0.08%+, pause marketing sends and review your list segmentation strategy
Rendering failures:
- Identify which template is failing
- Pull a sample of the failed events and check the error details
- Review recent template changes—compare against the last known-good version
- If failures are widespread, roll back the template change and redeploy the last working version
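The rollback itself is a single API call once you have the last known-good template body, here assumed to live in version control. The file path, its field names, and the template name are placeholders:

```python
import json
import boto3

ses = boto3.client("ses")

# Last known-good template body, assumed to be tracked in version control.
with open("templates/order-confirmation.known-good.json") as f:
    good = json.load(f)

# Overwrite the broken template with the known-good version.
ses.update_template(
    Template={
        "TemplateName": "order-confirmation",
        "SubjectPart": good["subject"],
        "HtmlPart": good["html"],
        "TextPart": good["text"],
    }
)
```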
The Team Access Problem
There's a dimension of email alerting that's easy to overlook when you're thinking purely about technical implementation: who can actually see and respond to alerts?
In most SES setups, meaningful visibility requires AWS console access or CloudWatch query knowledge. That means when a complaint rate spikes on a Saturday, the person who can actually investigate is whoever has AWS credentials and knows which CloudWatch log group to query. Your deliverability manager, your email marketing lead, your support team—none of them can help, even if the problem is obviously in their domain.
Good alerting systems route the right information to the right people, and that means making alert data accessible without requiring everyone to be an AWS operator. This is one of the gaps SendOps addresses directly—alerts and dashboards are accessible to the whole team, not just whoever has console access.
Putting It Together
Email alerting for SES isn't a single alarm—it's a system with the right metrics, the right thresholds, meaningful context in the alert payload, and routing logic that gets information to the people who can act on it.
The teams that handle SES incidents well aren't the ones with the fastest engineers. They're the ones who built (or adopted) a monitoring layer that surfaces problems early, communicates clearly, and doesn't require tribal knowledge to operate.
If you're running SES at scale and still relying on customer reports as your primary incident detection mechanism, the first step is picking your four core metrics—bounce rate, complaint rate, delivery rate, rendering failures—and getting alerting in place for each one. Everything else is refinement from there.
If you'd rather not spend two sprints building the pipeline before you can start on the alerting logic, SendOps connects to your existing SES configuration and has these alerts configurable in minutes—no CloudWatch setup required.