Email Bounce Handling: Hard vs Soft & Suppression

Bounces are the loudest signal a mailbox provider will ever send you, and most pipelines treat them as background noise. Ignore them for a few days and reputation collapses; act on them within seconds and your inbox placement stays steady.

Hard bounce vs soft bounce: the SMTP taxonomy

Disciplined email bounce handling starts at the SMTP response. A reply that begins with 5 is a permanent failure: the address does not exist, the domain refuses you, or the policy at the receiving side has decided this conversation is over. A reply that begins with 4 is transient: the mailbox is full, the receiver is busy, greylisting kicked in, or your sending rate is being shaped.

You can squeeze a lot of meaning out of the enhanced status code that follows. The first digit repeats the class, the second describes the subject (addressing, mailbox, network, protocol), and the third pins down the specific cause. Treat the enhanced status as authoritative and the human-readable suffix as a hint.

Reply       Typical text                 Classification
550 5.1.1   User unknown                 hard, address does not exist
550 5.1.10  Recipient address rejected   hard, null MX or address invalid
553 5.7.1   Sender not authorized        hard, policy block (SPF/DMARC failure)
552 5.2.2   Mailbox full                 hard at most providers, soft at some
421 4.7.0   Try again later              soft, rate or reputation throttling
451 4.3.0   Temporary local problem      soft, retry with backoff
452 4.2.2   Mailbox temporarily full     soft, retry then escalate if persistent

The grey area sits in 5.2.2 and 4.2.2: Gmail says full mailboxes are temporary; Yahoo treats persistently full mailboxes as permanent after a while. The right answer is not to argue with the code, but to encode the rule that any 5xx blocks the address immediately, and any 4xx retries until a bounded window expires.
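The rule is small enough to sketch directly. The following is a minimal illustration, not a definitive implementation: `classify_reply` is a hypothetical name, and it assumes the enhanced status, when present, takes precedence over the basic reply code.

```python
from typing import Optional


def classify_reply(smtp_code: str, enhanced: Optional[str] = None) -> str:
    """Return 'hard' for permanent failures and 'soft' for transient ones.

    The class digit of the enhanced status wins when it is present;
    the basic reply code is only the fallback.
    """
    source = enhanced if enhanced else smtp_code
    klass = source.strip()[:1]
    if klass == "5":
        return "hard"   # suppress immediately
    if klass == "4":
        return "soft"   # retry until the bounded window expires
    # Anything else (for example a 2xx success) is not a bounce at all.
    raise ValueError(f"not a bounce reply: {smtp_code!r} {enhanced!r}")
```

Under this sketch a `552 5.2.2` is treated as hard and a `452 4.2.2` as soft, which is exactly the "do not argue with the code" rule: the provider's choice of class decides the bucket.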

Parse the DSN, do not regex the body

When a remote server bounces a message after it has accepted it, you receive a Delivery Status Notification (DSN) at the return path. The shape is standardized: a multipart/report MIME message (RFC 6522) carrying a machine-readable message/delivery-status part (RFC 3464). Read that part. Do not grep the human prose, because every provider phrases the failure differently and they change wording without warning.

Content-Type: message/delivery-status

Reporting-MTA: dns; mx.sender.example

Final-Recipient: rfc822; user@destination.example
Action: failed
Status: 5.1.1
Diagnostic-Code: smtp; 550 5.1.1 The email account that you tried to reach does not exist

The Status: line is the source of truth. Bucket on the first digit of the enhanced code. Class 5 goes straight to the suppression list. Class 4 goes back to the retry queue with a counter and a deadline. Everything else is a parser bug, and a parser bug should fail loudly during development, not silently in production.
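One way to read the machine-readable part is Python's standard `email` package, which exposes a message/delivery-status part as a list of header blocks. The sketch below assumes the DSN arrives as a string and that you only request failure notifications; `parse_dsn` and `bucket_for_status` are hypothetical names.

```python
import email
from typing import List, Tuple


def bucket_for_status(status: str) -> str:
    """Map an enhanced status such as '5.1.1' to 'suppress' or 'retry'."""
    fields = status.split()
    klass = fields[0].split(".")[0] if fields else ""
    if klass == "5":
        return "suppress"
    if klass == "4":
        return "retry"
    # Fail loudly: an unexpected class means the parser or the input is wrong.
    raise ValueError(f"unexpected DSN status: {status!r}")


def parse_dsn(raw: str) -> List[Tuple[str, str, str]]:
    """Return (address, status, bucket) for every per-recipient block."""
    msg = email.message_from_string(raw)
    results = []
    for part in msg.walk():
        if part.get_content_type() != "message/delivery-status":
            continue
        if not part.is_multipart():
            continue  # malformed part with no header blocks
        for block in part.get_payload():
            status = block.get("Status")
            recipient = block.get("Final-Recipient")
            if status is None or recipient is None:
                continue  # per-message block (Reporting-MTA and friends)
            # Final-Recipient looks like "rfc822; user@destination.example".
            address = recipient.split(";", 1)[-1].strip()
            results.append((address, status.strip(), bucket_for_status(status)))
    return results
```

The human-readable text part and the Diagnostic-Code are never consulted for the decision; they are evidence to store, not input to the classifier.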

The suppression list is the durable record

A suppression list is the persistent record of every address you have decided not to send to, with the reason and the evidence. It lives in your database, not in the mail server's memory and not in a Redis cache that evaporates on deploy. Every send path consults it before enqueueing a message, and every bounce or complaint writes to it.

CREATE TABLE suppressions (
    address     citext PRIMARY KEY,
    reason      text NOT NULL CHECK (reason IN (
                  'hard_bounce',
                  'complaint',
                  'spam_trap',
                  'manual',
                  'unsubscribe',
                  'soft_bounce_giveup'
                )),
    smtp_code   text,
    enhanced    text,
    diagnostic  text,
    source      text NOT NULL,
    first_seen  timestamptz NOT NULL DEFAULT now(),
    last_seen   timestamptz NOT NULL DEFAULT now(),
    expires_at  timestamptz
);

CREATE INDEX suppressions_reason_first_seen_idx
    ON suppressions (reason, first_seen);

A few details matter. The address column uses citext so User@Example.com and user@example.com collapse to one row. The expires_at column is nullable on purpose: hard bounces and complaints should never expire, while soft-bounce give-ups and policy-driven suppressions usually should. Write with an upsert keyed on address so repeated bounces from a retry storm collapse into one row and refresh last_seen rather than throwing a unique-violation. Treat the table as append-mostly; downgrading a reason from hard_bounce to manual is almost always a mistake.
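The upsert described above might look like the following sketch. It assumes a DB-API cursor that accepts named `%(name)s` placeholders (the style psycopg uses); `record_suppression` is a hypothetical helper, and the CASE expressions are one possible way to keep a permanent row from being downgraded.

```python
UPSERT_SUPPRESSION = """
INSERT INTO suppressions
    (address, reason, smtp_code, enhanced, diagnostic, source, expires_at)
VALUES
    (%(address)s, %(reason)s, %(smtp_code)s, %(enhanced)s,
     %(diagnostic)s, %(source)s, %(expires_at)s)
ON CONFLICT (address) DO UPDATE SET
    last_seen  = now(),
    -- A permanent row (expires_at IS NULL) keeps its reason and never expires.
    reason     = CASE WHEN suppressions.expires_at IS NULL
                      THEN suppressions.reason ELSE EXCLUDED.reason END,
    expires_at = CASE WHEN suppressions.expires_at IS NULL
                      THEN NULL ELSE EXCLUDED.expires_at END
"""


def record_suppression(cursor, address, reason, source,
                       smtp_code=None, enhanced=None,
                       diagnostic=None, expires_at=None):
    """Insert a suppression, or refresh last_seen if the address exists."""
    cursor.execute(UPSERT_SUPPRESSION, {
        "address": address,
        "reason": reason,
        "smtp_code": smtp_code,
        "enhanced": enhanced,
        "diagnostic": diagnostic,
        "source": source,
        "expires_at": expires_at,  # None means the row never expires
    })
```

With this shape, a hard bounce arriving for an address that currently holds an expiring soft_bounce_giveup row upgrades it to a permanent one, while a later soft event cannot weaken an existing hard_bounce or complaint.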

Robust email bounce handling also means consulting this table at every entry point: the REST API, the SMTP relay, the templated send. If the address is suppressed, you reject the send and report it to the caller; you do not silently drop or queue.
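A shared guard called from every entry point keeps that check consistent. This is a sketch under the same DB-API cursor assumption as the schema above; `ensure_sendable` and `SuppressedError` are hypothetical names.

```python
ACTIVE_SUPPRESSION = """
SELECT reason FROM suppressions
WHERE address = %(address)s
  AND (expires_at IS NULL OR expires_at > now())
"""


class SuppressedError(Exception):
    """Raised so the caller sees the rejection instead of a silent drop."""

    def __init__(self, address, reason):
        super().__init__(f"{address} is suppressed ({reason})")
        self.address = address
        self.reason = reason


def ensure_sendable(cursor, address):
    """Raise SuppressedError if an unexpired suppression row exists."""
    cursor.execute(ACTIVE_SUPPRESSION, {"address": address})
    row = cursor.fetchone()
    if row is not None:
        raise SuppressedError(address, row[0])
```

The REST API, the SMTP relay, and the templated send would each call the guard before enqueueing and translate `SuppressedError` into their own error response.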

Retry policy for soft bounces

Soft bounces deserve patience, not optimism. Use jittered exponential backoff so a single overloaded MX does not get hammered by a thundering herd on the next minute boundary. A workable schedule for transactional traffic looks like 1 minute, 5 minutes, 25 minutes, 2 hours, then 12 hours.
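As a rough sketch, the schedule above can be expressed as a table of base delays with random jitter applied to each one. The 20% jitter width and the name `retry_delay` are assumptions made for illustration.

```python
import random

# Base delays in seconds: 1 min, 5 min, 25 min, 2 h, 12 h.
BASE_DELAYS = [60, 300, 1500, 7200, 43200]


def retry_delay(attempt, jitter=0.2, rng=random):
    """Return the delay in seconds before retry number `attempt` (1-based).

    Returns None once the schedule is exhausted, which is the signal
    to stop retrying and give up on the message.
    """
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    if attempt > len(BASE_DELAYS):
        return None
    base = BASE_DELAYS[attempt - 1]
    # Spread retries across +/- jitter so they do not land on the same instant.
    return base * rng.uniform(1 - jitter, 1 + jitter)
```

Passing a seeded `random.Random` instance as `rng` makes the delays reproducible in tests without changing production behavior.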

The give-up window should be short. For password resets and verification codes, anything past 30 minutes is useless to the user; for receipts, 24 hours is a sane cap; for shipping notifications, 48 hours. Never carry a transactional retry past 72 hours, because by then the user has moved on and the receiver treats the message as abandoned mail.

When the give-up fires, do not just log and drop. Write a row to suppressions with reason soft_bounce_giveup and set expires_at = now() + interval '30 days'. That lets a future signup with the same address recover the relationship without manual cleanup, while still stopping the bleed today.
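The windows and the expiring suppression can be tied together in one small function. This is a sketch under assumptions: the message kinds, the constant names, and `give_up_suppression` are hypothetical, and unknown kinds fall back to the 72-hour cap.

```python
from datetime import datetime, timedelta

GIVE_UP_AFTER = {
    "password_reset": timedelta(minutes=30),
    "verification": timedelta(minutes=30),
    "receipt": timedelta(hours=24),
    "shipping": timedelta(hours=48),
}
HARD_CAP = timedelta(hours=72)        # never retry past this
SUPPRESSION_TTL = timedelta(days=30)  # how long the give-up row lives


def give_up_suppression(kind: str, first_attempt: datetime, now: datetime):
    """Return None while retries may continue.

    Once the window for this kind of message has elapsed, return the
    values for the suppression row that should be written.
    """
    window = min(GIVE_UP_AFTER.get(kind, HARD_CAP), HARD_CAP)
    if now - first_attempt < window:
        return None
    return {
        "reason": "soft_bounce_giveup",
        "expires_at": now + SUPPRESSION_TTL,
    }
```

In production both timestamps should be timezone-aware and taken from the same clock, so the comparison is not skewed by worker drift.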

Key the retry counter on (message_id, recipient) rather than the queue row, so requeues from worker restarts do not silently double the attempt budget. The send budget is a property of the message, not of the queue plumbing.
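The keying can be illustrated with a small counter. This sketch keeps the counts in memory purely for demonstration; a real implementation would need a durable store so the budget survives worker restarts. `AttemptBudget` is a hypothetical name.

```python
from collections import Counter


class AttemptBudget:
    """Track attempts per (message_id, recipient), not per queue row."""

    def __init__(self, max_attempts: int):
        self.max_attempts = max_attempts
        self._attempts = Counter()

    def try_consume(self, message_id: str, recipient: str) -> bool:
        """Record one attempt; return False when the budget is spent."""
        key = (message_id, recipient.lower())
        if self._attempts[key] >= self.max_attempts:
            return False
        self._attempts[key] += 1
        return True
```

A requeued job for the same message and recipient hits the same key, so it draws from the existing budget instead of starting a fresh one.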

Webhook-driven suppression beats polling

If your upstream provider exposes a bounce report API, do not poll it. Polling adds five to fifteen minutes of lag, and mailbox providers expect you to stop within seconds of the first hard bounce. Subscribe to the bounce and complaint webhooks, and do the suppression write inside the same transaction that marks the message terminal.

{
  "event": "bounce",
  "bounce_type": "permanent",
  "smtp_code": "550",
  "enhanced_status": "5.1.1",
  "diagnostic": "smtp; 550 5.1.1 user unknown",
  "recipient": "user@destination.example",
  "message_id": "01HF7K9P3M2N4Q8R6T0V5W2X3Y",
  "timestamp": "2026-05-19T10:14:32Z"
}

Ack the webhook quickly, then do the work asynchronously. A handler that spends 800 ms writing to Postgres before returning 200 is one slow query away from the provider's timeout, and a timed-out delivery gets retried, often duplicating the event. Return 2xx within a few hundred milliseconds and push the suppression upsert into an internal queue. Make the handler idempotent on the tuple (message_id, recipient, event) so retries collapse cleanly. Target a p99 of under 5 seconds from bounce-at-receiver to address-suppressed-in-your-database.
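A framework-free sketch of that handler follows. The field names match the sample payload above; the in-memory `queue.Queue` and the `_seen` set are stand-ins for whatever internal queue and deduplication store you actually run, and `handle_bounce_webhook` is a hypothetical name.

```python
import json
import queue

work_queue = queue.Queue()  # stand-in for the internal queue
_seen = set()               # stand-in for a durable dedupe store


def handle_bounce_webhook(body: bytes) -> int:
    """Return the HTTP status to send; only cheap work happens here."""
    try:
        event = json.loads(body)
        key = (event["message_id"], event["recipient"].lower(), event["event"])
    except (ValueError, KeyError, TypeError, AttributeError):
        return 400  # malformed payload: tell the provider not to retry it
    if key not in _seen:
        _seen.add(key)
        work_queue.put(event)  # the suppression upsert runs in a worker
    return 200  # a duplicate delivery is acknowledged but not re-enqueued
```

Because the upsert itself is keyed on address, a duplicate that slips past this check still collapses into one row; the dedupe here only saves queue work.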

Spam traps and complaint spikes

Hard bounces are the friendly warnings. Spam traps and complaints are the punishments.

Pristine traps are addresses that were never opted in; they appear in scraped lists and recycled domain catches. Hitting one tells Spamhaus, SURBL, and the receiving provider that you are not curating your list. Recycled traps are old, dead addresses that the provider has repurposed as sentinels, usually after a period in which they hard-bounced. You avoid recycled traps by suppressing on the first hard bounce; you hit them when you retry a 550 "just in case."

Complaint rates have hard thresholds at consumer mailbox providers. Gmail Postmaster Tools shows a user-reported spam rate, and once that rate creeps above 0.1% you see throttling; above 0.3% you are routed to bulk for most of your traffic. A single morning of unsuppressed hard bounces on a Gmail or Yahoo lane can cost you a week of inbox placement, and there is no support ticket to file.
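To make the thresholds concrete, here is a small worked check. It is an approximation: it assumes you compute the rate yourself as complaints divided by delivered messages, which will not match a provider's own dashboard exactly, and `complaint_risk` is a hypothetical name.

```python
def complaint_risk(complaints: int, delivered: int) -> str:
    """Classify a complaint rate against the 0.1% and 0.3% thresholds."""
    if delivered <= 0:
        raise ValueError("delivered must be positive")
    rate = complaints / delivered
    if rate > 0.003:
        return "bulk_routing"  # above 0.3%
    if rate > 0.001:
        return "throttling"    # above 0.1%
    return "ok"
```

At 10,000 delivered messages, that puts the first threshold at just over 10 complaints and the second at just over 30, which is why a small spike matters.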

Consumer mailbox providers expect seconds, not minutes

Gmail, Yahoo, and Microsoft all expect you to stop mailing an address as soon as it hard-bounces, and they enforce that expectation without notice. The penalty is silent: first throttling on new connections, then greylisting on resumed ones, then bulk-folder routing for the whole sending domain. You do not get an error message. You get a slow decline in opens and clicks that looks at first like a seasonality dip.

The implementation that meets the expectation is small. Receive the webhook, validate the signature, enqueue the suppression upsert, return 200. The worker performs the upsert with ON CONFLICT (address) DO UPDATE and writes the terminal state on the original message in the same transaction. Nothing else needs to be synchronous.
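The synchronous half of that flow fits in a few lines. The signing scheme is an assumption: this sketch supposes the provider sends a hex-encoded HMAC-SHA256 of the raw request body, so check your provider's documentation for the real scheme. `signature_is_valid` and `receive_webhook` are hypothetical names.

```python
import hashlib
import hmac


def signature_is_valid(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Compare the expected HMAC with the received one in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


def receive_webhook(secret: bytes, body: bytes, signature_hex: str, enqueue) -> int:
    """Validate, enqueue, acknowledge. Everything else happens in the worker."""
    if not signature_is_valid(secret, body, signature_hex):
        return 401  # a forged event must never reach the suppression table
    enqueue(body)   # the worker performs the upsert and marks the message
    return 200
```

The signature is computed over the raw bytes before any JSON parsing, so a reformatted or re-serialized payload cannot pass verification by accident.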

Patterns that quietly kill sender reputation

These are the failure modes that show up in postmortems after a deliverability incident. Each one is small on its own and cumulative in production.

Retrying 5xx codes "just in case" — every retry to a confirmed dead address is a vote against you at the receiver.
Holding the suppression list in memory or in a cache only, then losing it on deploy — the next boot resends to every address you just learned to avoid.
Treating 4xx as permanent — wastes legitimate addresses, surfaces as user complaints, and trains your team to ignore real soft bounces.
Sharing one suppression list across unrelated brands or IP pools — a hard bounce on brand A should not silently block a different relationship on brand B.
Mixing marketing and transactional decisions in the same table without segmentation — an unsubscribe from a newsletter should not block a receipt for the same purchase.
Skipping signature verification on the bounce webhook — forged events can be used to suppress arbitrary addresses and lock users out.
Logging the diagnostic but not storing it — six weeks later, when a customer asks why mail stopped, the answer is "we do not know."

Solid email bounce handling is not glamorous, but it is the single highest-leverage piece of deliverability hygiene you will ever ship. Get the SMTP classification right, persist the decision, react in seconds, and the rest of the stack rewards you with stable inbox placement.