Why Your AWS SES Setup Is One Bounce Spike Away From Silence

Amazon SES is genuinely good infrastructure. It's reliable, it's cheap, and if you're already running workloads on AWS, it fits naturally into your stack. For most teams, the decision to use SES isn't wrong—it's actually the right call.

The problem shows up later, usually at the worst possible time.

Bounce rate creeps from 1% past 5% over two weeks. Nobody notices because there's no alert. On day fifteen, SES suspends your sending. Customers stop getting password reset emails. Support tickets flood in. Someone pulls up CloudWatch and tries to piece together what happened, squinting at raw event logs that were never wired up to anything useful in the first place.

This isn't a hypothetical. It's a pattern that plays out regularly, and the teams it hits hardest are the ones who did everything right on the infrastructure side—they picked SES, they integrated it properly, they moved on. What they didn't do is build an operational layer around it. That part is easy to defer, until it isn't.

The Thresholds That Can End Your Sending

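SES tracks two account-level reputation metrics, and breaching either one puts your sending at risk. Per AWS's published guidance, the levels below are where an account is placed under review; if the rates keep climbing (to roughly 10% for bounces or 0.5% for complaints), AWS may pause your sending altogether: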

Complaint rate above 0.10% (that's one complaint per thousand sends)
Bounce rate above 5%

These are not generous thresholds. A complaint rate of 0.10% means if you send 50,000 emails in a day and 51 recipients mark them as spam, you're in the danger zone. Bounce rates can spike quickly if you're sending to a list that hasn't been cleaned recently, or if a data import introduced bad addresses.
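To make the arithmetic concrete, here is a minimal sketch that treats the two percentages quoted above as plain ratios. That is a simplifying assumption: SES computes its own rates over a recent window of sends rather than a single day's totals, and the function names here are illustrative only.

```python
from fractions import Fraction
import math

# Levels quoted in the text above, kept as exact fractions so the
# comparisons are not affected by floating-point rounding.
COMPLAINT_LIMIT = Fraction(1, 1000)  # 0.10%
BOUNCE_LIMIT = Fraction(5, 100)      # 5%

def rate(events: int, sends: int) -> Fraction:
    """Events divided by sends, or zero when nothing was sent."""
    return Fraction(events, sends) if sends else Fraction(0)

def events_to_exceed(limit: Fraction, sends: int) -> int:
    """Smallest event count whose rate is strictly above the limit."""
    return math.floor(limit * sends) + 1

def is_over(events: int, sends: int, limit: Fraction) -> bool:
    """True when the simple ratio is strictly above the limit."""
    return rate(events, sends) > limit
```

With 50,000 sends, `events_to_exceed(COMPLAINT_LIMIT, 50_000)` gives the complaint count at which the simple ratio first rises above 0.10%, which matches the 51-complaint example in the paragraph above.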

What makes this particularly treacherous is the feedback loop latency. Bounce and complaint notifications from SES arrive asynchronously via SNS. If you haven't wired up SNS notifications to something that actually alerts your team, you're flying blind. You can check the SES console manually, but that only helps if someone remembers to check it, knows where to look, and has AWS console access.

By the time a customer reports they're not getting emails, you've already been suspended, and the damage is done.

The Patchwork Monitoring Stack (And Why It Doesn't Hold)

The conventional solution is to build your own observability pipeline. If you go down this path, here's roughly what it looks like:

Configure SES to publish bounce, complaint, and delivery events to SNS topics
Subscribe an SQS queue or Lambda function to those topics to process events
Persist the event data somewhere—usually S3 or DynamoDB
Build CloudWatch dashboards or write custom metrics queries to surface the data
Set up CloudWatch alarms to alert when thresholds are approached
Figure out IAM permissions so the right people can actually see the dashboards
Maintain all of this indefinitely as your team, volume, and AWS account structure change
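As a rough illustration of the processing step in that list, the sketch below shows a Lambda handler that unwraps SES notifications delivered through SNS and reduces each one to a small summary record. It assumes the standard SNS-to-Lambda envelope and the documented SES notification fields; the `summarize` helper and the shape of the summary dict are our own invention, and persistence is left out entirely.

```python
import json

def summarize(message: dict) -> dict:
    """Reduce one SES notification to the fields most teams alert on."""
    # SNS notifications use "notificationType"; event publishing uses "eventType".
    kind = message.get("notificationType") or message.get("eventType")
    mail = message.get("mail", {})
    if kind == "Bounce":
        detail = message.get("bounce", {})
        recipients = [r["emailAddress"] for r in detail.get("bouncedRecipients", [])]
        subtype = detail.get("bounceType")
    elif kind == "Complaint":
        detail = message.get("complaint", {})
        recipients = [r["emailAddress"] for r in detail.get("complainedRecipients", [])]
        subtype = detail.get("complaintFeedbackType")
    else:
        # Deliveries and other events: fall back to the original destination list.
        recipients = list(mail.get("destination", []))
        subtype = None
    return {
        "kind": kind,
        "message_id": mail.get("messageId"),
        "subtype": subtype,
        "recipients": recipients,
    }

def handler(event, context):
    """Lambda entry point: each SNS record carries one SES message as a JSON string."""
    return [
        summarize(json.loads(record["Sns"]["Message"]))
        for record in event.get("Records", [])
    ]
```

Even this small piece hints at the maintenance surface: the handler has to tolerate two slightly different payload shapes, and everything downstream depends on it not silently dropping records.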

This is weeks of engineering work, minimum. And that estimate assumes everything goes smoothly, the SES event format doesn't change, and nobody accidentally misconfigures the Lambda retry behavior so that you start dropping events.

The deeper issue is that this stack is fragile by nature. It's glue code across five AWS services, maintained as a side project by engineers who have other priorities. When something breaks—and it will—diagnosing the failure means tracing events through SNS, SQS, Lambda, and CloudWatch to find where things went wrong. That debugging session happens at 11pm when a customer is yelling.

There's also a visibility problem that the pipeline doesn't solve. Even after you've built all of this, the dashboards live in CloudWatch, which means they're accessible only to people with AWS console access. Your marketing team still has to file a ticket to find out whether a campaign sent successfully. Support still can't look up whether a specific customer received a transactional email. The engineering burden was real, and you still haven't given the rest of your organization the visibility they need.

What Real-Time Monitoring Actually Requires

If you're going to monitor SES effectively—whether you're building it yourself or using a tool—there are a few things that need to be true:

Sub-minute alerting on threshold approach. By the time your bounce rate hits 5%, it's too late. You want an alert when it crosses 2%. You want another at 3.5%. The alert needs to reach someone who can act—ideally in Slack or PagerDuty, not buried in a CloudWatch email that goes to a shared inbox.
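One way to express the two tiers is as a pair of CloudWatch alarm definitions on the account-level `Reputation.BounceRate` metric that SES publishes in the `AWS/SES` namespace. This is a sketch under assumptions: the alarm names, the five-minute period, and the SNS topic ARN are placeholders, and because the reputation metric updates on AWS's schedule, this on its own will not give you sub-minute latency. It is a floor, not the whole answer.

```python
# Tiers from the paragraph above, as fractions (the metric reports 0.05 for 5%).
TIERS = [("warning", 0.02), ("critical", 0.035)]

def bounce_alarm_definitions(sns_topic_arn: str) -> list:
    """Build keyword arguments for CloudWatch put_metric_alarm, one dict per tier."""
    return [
        {
            "AlarmName": f"ses-bounce-rate-{name}",  # hypothetical naming scheme
            "Namespace": "AWS/SES",
            "MetricName": "Reputation.BounceRate",
            "Statistic": "Maximum",
            "Period": 300,                           # assumption: 5-minute evaluation
            "EvaluationPeriods": 1,
            "Threshold": threshold,
            "ComparisonOperator": "GreaterThanOrEqualToThreshold",
            "TreatMissingData": "notBreaching",
            "AlarmActions": [sns_topic_arn],
        }
        for name, threshold in TIERS
    ]

# With boto3 installed and credentials configured, each definition could be applied as:
#   cloudwatch = boto3.client("cloudwatch")
#   for definition in bounce_alarm_definitions(topic_arn):
#       cloudwatch.put_metric_alarm(**definition)
```

Pointing `AlarmActions` at an SNS topic that feeds Slack or PagerDuty is what turns the alarm into something a person actually sees.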

Historical trend visibility, not just snapshots. A bounce rate of 2.8% looks different if it's been stable for three months versus if it jumped from 0.4% yesterday. Point-in-time metrics without trend context lead to bad decisions: either panicking about normal variance or missing a real escalation because the current number looks acceptable.
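A small sketch of what "trend context" can mean in practice: compare the current rate against a trailing baseline instead of against a fixed number. The doubling rule and the labels below are arbitrary assumptions chosen for illustration, not a recommendation from AWS.

```python
from statistics import mean

def classify_trend(history, current, jump_factor=2.0):
    """Label the current rate relative to the average of past observations.

    history: past rates as fractions (e.g. daily bounce rates)
    current: the latest rate
    jump_factor: assumed multiple of the baseline that counts as an escalation
    """
    if not history:
        return "no-baseline"
    baseline = mean(history)
    if baseline == 0:
        # Any bounces at all are a change when the baseline was clean.
        return "escalating" if current > 0 else "stable"
    if current >= baseline * jump_factor:
        return "escalating"
    return "stable"
```

Under this rule, 2.8% against three months of 2.8% reads as stable, while 2.8% against a week of 0.4% reads as escalating, which is the distinction a snapshot cannot make.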

Per-domain and per-configuration-set granularity. If you're sending transactional email and marketing email from the same SES account but different configuration sets, you need to see their metrics separately. A marketing campaign with a bad list shouldn't contaminate your view of transactional delivery health.
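For a sense of what that separation involves, the sketch below groups SES event-publishing records by configuration set and computes a bounce rate for each. It assumes each record carries the configuration set name under `mail.tags["ses:configuration-set"]` and that Send events are being published alongside Bounce events; the function names and output shape are illustrative.

```python
from collections import Counter, defaultdict

def config_set_of(event: dict) -> str:
    """Read the configuration set name from the event's auto-generated tags."""
    tags = event.get("mail", {}).get("tags", {})
    values = tags.get("ses:configuration-set", [])
    return values[0] if values else "(none)"

def rates_by_config_set(events) -> dict:
    """Count event types per configuration set and derive bounce/complaint rates."""
    counts = defaultdict(Counter)
    for event in events:
        counts[config_set_of(event)][event.get("eventType")] += 1
    result = {}
    for name, counter in counts.items():
        sends = counter["Send"]
        result[name] = {
            "sends": sends,
            "bounce_rate": counter["Bounce"] / sends if sends else 0.0,
            "complaint_rate": counter["Complaint"] / sends if sends else 0.0,
        }
    return result
```

Keeping the grouping key explicit makes it easy to see that a bad marketing list is confined to its own configuration set rather than blurring the transactional numbers.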

Delivery event tracking at the message level. "Did this email get delivered?" is a question support teams ask constantly. Answering it requires correlating your message ID with the delivery event SES eventually sends back. That correlation needs to be stored and queryable without requiring someone to write a CloudWatch Logs Insights query from scratch every time.
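The correlation itself is not complicated; the work is in keeping it stored and queryable. As a minimal sketch, the example below uses SQLite from the standard library as a stand-in for whatever store you would actually run (DynamoDB, Postgres, and so on). The table name, columns, and the "latest event wins" rule are all assumptions for illustration.

```python
import sqlite3

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the event table (and an index on message ID) if missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS delivery_events (
               message_id  TEXT NOT NULL,
               event_type  TEXT NOT NULL,
               recipient   TEXT,
               occurred_at TEXT NOT NULL   -- ISO 8601 UTC, sorts lexicographically
           )"""
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_events_message ON delivery_events(message_id)"
    )
    return conn

def record_event(conn, message_id, event_type, recipient, occurred_at):
    """Append one SES event for a message."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO delivery_events VALUES (?, ?, ?, ?)",
            (message_id, event_type, recipient, occurred_at),
        )

def delivery_status(conn, message_id) -> str:
    """Return the most recent event type for a message, or 'unknown'."""
    row = conn.execute(
        "SELECT event_type FROM delivery_events WHERE message_id = ? "
        "ORDER BY occurred_at DESC, rowid DESC LIMIT 1",
        (message_id,),
    ).fetchone()
    return row[0] if row else "unknown"
```

A support tool only needs `delivery_status` behind a search box; nobody has to write a Logs Insights query to answer "did this email arrive?"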

Access that doesn't require AWS credentials. This is underrated. The value of email visibility drops sharply if accessing it requires IAM permissions. Non-engineers need to be able to check delivery status, review complaint trends, and understand what's happening without a developer in the loop.

Template Management: The Other Foot

Bounce monitoring gets most of the attention because the consequences are immediate and severe. But template management is a slower, quieter source of operational pain that teams hit around the same time.

SES has a native template system. It works. The problem is that managing templates through the SES console means editing production content in a text box, with no version history, no review process, and no way to preview how the template renders across email clients before you push it live.

In practice, teams respond to this in one of a few ways:

They stop using SES templates and do all rendering application-side, which means template changes require code deployments
They build custom tooling to manage SES templates via the API, which becomes another maintenance burden
They just edit templates directly in the console and accept the risk

None of these are good. Template changes are high-risk operations—a broken variable reference or a malformed HTML tag can make thousands of emails render incorrectly before anyone notices. That's the kind of change that belongs in a pull request, reviewed by someone who can catch the mistake before it ships.

Git-native template management—where templates live in a repository, changes go through review, and deployment is auditable—is the operational pattern that actually fits how software teams work. It's just not something SES provides out of the box.
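Once templates live in a repository, the "broken variable reference" class of mistake can be caught by a check that runs in CI before anything is deployed. The sketch below is one hypothetical version of that check: it scans a template for simple `{{name}}` placeholders and reports any that are not in an allowed list. It deliberately ignores block helpers such as `{{#if ...}}`, and it would flag a bare `{{else}}`, so treat it as a starting point rather than a full template parser.

```python
import re

# Matches simple placeholders like {{first_name}} or {{ order.total }}.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")

def undefined_variables(template_text: str, allowed) -> list:
    """Return placeholder names used in the template but not in the allowed set."""
    used = set(PLACEHOLDER.findall(template_text))
    return sorted(used - set(allowed))

def check_template(template_text: str, allowed) -> None:
    """Raise ValueError listing unknown placeholders, suitable for a CI step."""
    unknown = undefined_variables(template_text, allowed)
    if unknown:
        raise ValueError("unknown template variables: " + ", ".join(unknown))
```

A reviewer still has to read the copy, but a typo like `{{frist_name}}` fails the build instead of reaching thousands of inboxes.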

The Real Cost of Deferring This

Let's be direct about the math here.

Building a basic SES observability stack yourself—SNS configuration, Lambda processing, CloudWatch dashboards, alerting—takes roughly two to four weeks of engineering time, depending on how thorough you want to be. Maintaining it over the following year, accounting for debugging, keeping up with AWS API changes, and responding to operational incidents, adds up to meaningful ongoing cost.

Against that, the cost of one SES suspension event—the engineering time to diagnose and recover, the customer impact, the support load, the potential reputation damage if high-value customers were affected—can easily exceed the annual cost of a dedicated tool.

The teams that defer monitoring usually do it for a sensible reason: they're small, they're moving fast, and building an observability stack feels like yak shaving when there are product features to ship. That calculus is understandable. It just doesn't hold once you hit any kind of scale, and "any kind of scale" tends to arrive faster than expected.

What to Actually Do

If you're running SES in production and you don't have bounce and complaint rate alerting configured today, that's the first thing to fix. At minimum:

Enable SES event publishing for bounces and complaints on every configuration set you're using
Route those events somewhere actionable—an SNS topic that triggers a Lambda to post to Slack works, even if it's rough
Set a calendar reminder to manually check your SES account-level metrics once a week until you have automated alerting
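The "rough but actionable" version of the second step can be very small. The sketch below formats a bounce or complaint into a Slack message and posts it to an incoming webhook using only the standard library. The webhook URL is a placeholder you would supply yourself, and the message wording is an assumption; in a Lambda you would call these from the handler that receives the SNS event.

```python
import json
import urllib.request

def slack_payload(kind: str, message_id: str, recipients) -> dict:
    """Build the JSON body for a Slack incoming webhook."""
    who = ", ".join(recipients) or "unknown recipient"
    return {"text": f"SES {kind} for message {message_id}: {who}"}

def post_to_slack(webhook_url: str, payload: dict) -> int:
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,  # placeholder: your own incoming-webhook URL
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

One message per event gets noisy at volume, so this is best treated as a stopgap until rate-based alerting is in place.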

This won't give you trend visibility or message-level delivery tracking, but it will prevent the scenario where you find out about a suspension from a customer.

For teams that are past the "duct tape and willpower" stage—where email is business-critical and the engineering cost of DIY observability is real—the honest answer is that a dedicated control plane makes more sense than another internal tool. SendOps is what we built for exactly this use case: it connects to your existing SES setup, surfaces the metrics and alerting you need, and gives your whole team visibility without touching your sending path or requiring AWS console access.

The Takeaway

SES is excellent infrastructure. The operational gap around it is real, and it's not something you can reason away by being careful. Bounce spikes happen. Lists get stale. Campaign sends hit bad segments. The teams that come through those events intact are the ones who knew the spike was happening before their account got flagged—not the ones who found out after.

Invest in observability before you need it. The window between "we should probably set up monitoring" and "why aren't our emails sending" is shorter than it looks.