System Design · March 3, 2026 · 12 min read

Design a Notification System: System Design Interview Guide

Design multi-channel notifications with priority routing, rate limiting, and user preferences

Why Design a Notification System?

Notification systems come up frequently in system design interviews because they touch on event-driven architecture, message queues, user preferences, and multi-channel delivery. Every large application has one, and designing it well requires balancing reliability with user experience.

Multi-channel delivery: Push, email, SMS, in-app — each with different constraints
Priority and routing: Urgent alerts vs marketing — different SLAs
Rate limiting: Don't spam users, respect preferences
Reliability: Critical notifications (2FA, payment) must never be lost

Step 1: Requirements

Functional Requirements

Core features:
- Send notifications via multiple channels (push, email, SMS, in-app)
- User preference management (opt-in/out per channel per type)
- Priority levels (critical, high, medium, low)
- Template management (reusable notification templates)
- Delivery tracking and analytics

Out of scope:
- Content management / marketing campaigns
- A/B testing of notification content
- Rich media notifications (images, actions)
- Scheduling (send at user's local morning)

Non-Functional Requirements

Scale:
- 500M users
- 10B notifications per day
- Peak: 500K notifications/second
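
A quick back-of-envelope check helps put the scale numbers in context. The sketch below only uses the figures stated above; the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

users = 500_000_000                      # 500M users
notifications_per_day = 10_000_000_000   # 10B notifications per day
peak_per_second = 500_000                # stated peak

# Average load if traffic were spread evenly over the day.
average_per_second = notifications_per_day / SECONDS_PER_DAY
# How much headroom the stated peak implies over the average.
peak_to_average = peak_per_second / average_per_second
# Average notifications each user receives per day.
per_user_per_day = notifications_per_day / users

print(f"average: {average_per_second:,.0f}/s")
print(f"peak is {peak_to_average:.2f}x the average")
print(f"per user: {per_user_per_day:.0f}/day")
```

The average works out to roughly 116K notifications/second, so the 500K/second peak is a little over 4x the average, and each user receives about 20 notifications per day.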

Performance:
- Critical notifications (2FA, alerts): < 5 seconds delivery
- Standard notifications: < 30 seconds
- Marketing: within 1 hour

Reliability:
- Critical: at-least-once delivery with idempotent deduplication (effectively once from the user's point of view), 99.99% success rate
- Standard: at-least-once with bounded retries
- No duplicate notifications to users

Key insight: Not all notifications are equal.
Use priority-based routing with a different SLA guarantee per tier.

Step 2: High-Level Architecture

┌──────────────────────────────────────────────────────────────┐
│                     NOTIFICATION SOURCES                      │
│     (Services that trigger notifications via API/events)      │
└───────────────────────┬──────────────────────────────────────┘
                        │
                        ▼
┌──────────────────────────────────────────────────────────────┐
│                   NOTIFICATION SERVICE                        │
│         Validate, enrich, check preferences, route           │
└───────────────────────┬──────────────────────────────────────┘
                        │
              ┌─────────┼─────────┐
              │         │         │
              ▼         ▼         ▼
        ┌──────────┐ ┌─────┐ ┌──────┐
        │ CRITICAL │ │HIGH │ │ LOW  │
        │  QUEUE   │ │QUEUE│ │QUEUE │
        └────┬─────┘ └──┬──┘ └──┬───┘
             │          │       │
             ▼          ▼       ▼
        ┌──────────────────────────────────────────────────┐
        │              DELIVERY WORKERS                     │
        │    (Channel-specific: push, email, SMS, in-app)  │
        └──────┬──────────┬──────────┬──────────┬──────────┘
               │          │          │          │
               ▼          ▼          ▼          ▼
          ┌────────┐ ┌────────┐ ┌────────┐ ┌─────────┐
          │  APNs  │ │ SES /  │ │ Twilio │ │WebSocket│
          │  FCM   │ │SendGrid│ │        │ │   SSE   │
          └────────┘ └────────┘ └────────┘ └─────────┘

Step 3: Notification Processing Pipeline

When a service triggers a notification:

1. VALIDATION
   - Verify required fields (user_id, type, content)
   - Check notification type exists in template registry
   - Validate channel-specific requirements (email needs subject, etc.)

2. USER PREFERENCE CHECK
   - Query user preferences: "Does this user want push for this type?"
   - Check quiet hours (don't send at 3am unless critical)
   - Check rate limits (max 3 marketing per day)

3. TEMPLATE RENDERING
   - Load template for notification type
   - Inject user-specific variables (name, data)
   - Render per-channel variants (push is short, email is long)

4. DEDUPLICATION
   - Hash: (user_id, notification_type, content_hash, time_window)
   - If duplicate exists within window → skip
   - Prevents "liked your post" x 50 in 1 minute

5. PRIORITY ROUTING
   - Critical (2FA, payment) → critical queue, dedicated workers
   - Standard (social, updates) → high-priority queue
   - Marketing (promotions) → low-priority queue, rate-limited

6. CHANNEL DELIVERY
   - Push: send to APNs (iOS) or FCM (Android)
   - Email: send via SES/SendGrid
   - SMS: send via Twilio
   - In-app: write to user's notification inbox + WebSocket push
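
The deduplication step (step 4) can be sketched as follows. This is a minimal illustration, not a definitive implementation: `dedup_key`, `Deduplicator`, and the 60-second window are assumptions, and the in-memory set stands in for a shared store with expiring keys.

```python
import hashlib

def dedup_key(user_id: str, notification_type: str, content: str,
              now_epoch: float, window_seconds: int = 60) -> str:
    """Build a key that is identical for repeats inside the same time window."""
    content_hash = hashlib.sha256(content.encode("utf-8")).hexdigest()
    window = int(now_epoch // window_seconds)  # index of the fixed time bucket
    raw = f"{user_id}|{notification_type}|{content_hash}|{window}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

class Deduplicator:
    """In-memory stand-in for a shared key store with a TTL per key."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def is_duplicate(self, key: str) -> bool:
        if key in self._seen:
            return True          # already processed in this window: skip
        self._seen.add(key)
        return False

dedup = Deduplicator()
first = dedup_key("user_123", "like", "Alice liked your post", now_epoch=1000.0)
repeat = dedup_key("user_123", "like", "Alice liked your post", now_epoch=1010.0)
print(dedup.is_duplicate(first), dedup.is_duplicate(repeat))
```

Because the window is a fixed bucket, two identical events that straddle a bucket boundary get different keys. Whether that is acceptable depends on how strict the dedup guarantee needs to be.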

Step 4: Deep Dive — Priority Queues

Different priorities need different treatment:

CRITICAL (2FA codes, payment alerts, security):
- Dedicated queue with dedicated workers
- No batching — process immediately
- Retry aggressively (3 retries, 1 second apart)
- Dead letter queue → page on-call if DLQ grows
- SLA: < 5 seconds, 99.99% delivery

HIGH (social interactions, comments, follows):
- Standard queue with auto-scaling workers
- Batch processing allowed (up to 100ms batches)
- Retry with backoff (3 retries, exponential)
- SLA: < 30 seconds, 99.9% delivery

LOW (marketing, weekly digests, recommendations):
- Low-priority queue, processed during off-peak
- Heavy rate limiting (max 3 per user per day)
- No retry on failure (best-effort)
- SLA: within 1 hour, 95% delivery
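
The per-tier retry rules above can be captured in a small policy function. This is a sketch: the function name and the 1-second base delay for the exponential backoff are assumptions, since the text only specifies "3 retries, exponential".

```python
def retry_delays(priority: str) -> list[float]:
    """Seconds to wait before each retry attempt for a priority tier."""
    if priority == "critical":
        return [1.0, 1.0, 1.0]                     # 3 retries, 1 second apart
    if priority == "high":
        base = 1.0                                 # assumed base delay
        return [base * 2 ** i for i in range(3)]   # 1s, 2s, 4s
    if priority == "low":
        return []                                  # best-effort: no retries
    raise ValueError(f"unknown priority: {priority}")

for tier in ("critical", "high", "low"):
    print(tier, retry_delays(tier), "total wait:", sum(retry_delays(tier)))
```

Under these assumptions, the critical tier spends at most 3 seconds waiting between retries, which leaves some room inside the 5-second SLA, and the high tier spends 7 seconds, well inside its 30-second SLA.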

Implementation: Kafka with separate topics per priority
- notification.critical (CRITICAL) → 20 partitions, 20 consumers
- notification.standard (HIGH) → 50 partitions, auto-scaled consumers
- notification.marketing (LOW) → 10 partitions, rate-limited consumers
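
A routing function for these topics might look like the sketch below. It does not call a Kafka client. The mapping of the HIGH tier to `notification.standard` and the LOW tier to `notification.marketing` is an assumption, as is partitioning by a CRC32 hash of `user_id` (a real Kafka producer would apply its own partitioner to the message key).

```python
import zlib
from dataclasses import dataclass

@dataclass(frozen=True)
class TopicConfig:
    name: str
    partitions: int

# Topic names and partition counts come from the list above.
TOPICS = {
    "critical": TopicConfig("notification.critical", 20),
    "high": TopicConfig("notification.standard", 50),
    "low": TopicConfig("notification.marketing", 10),
}

def route(priority: str, user_id: str) -> tuple[str, int]:
    """Return (topic name, partition) for a notification."""
    topic = TOPICS[priority]  # raises KeyError for an unknown priority
    # A stable hash keeps all of one user's notifications on one partition.
    partition = zlib.crc32(user_id.encode("utf-8")) % topic.partitions
    return topic.name, partition

print(route("critical", "user_123"))
```

Keying by user keeps one user's notifications in order within a tier, at the cost of uneven load if a few users receive far more traffic than others.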

Step 5: Rate Limiting

Rate limiting prevents notification fatigue:

Per-user limits:
- Critical: unlimited (safety-critical)
- Social: max 20 per hour, aggregate if exceeded
- Marketing: max 3 per day

Aggregation strategy:
When rate limit hit, aggregate instead of dropping:
- "Alice, Bob, and 12 others liked your post"
- Collect events in buffer, send aggregated after cooldown
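
As an illustration of the aggregation step, the buffered events can be collapsed into a single message. The function name and the default action text are hypothetical.

```python
def aggregate_message(actors: list[str], action: str = "liked your post") -> str:
    """Collapse many similar events into one human-readable notification."""
    unique = list(dict.fromkeys(actors))  # drop repeats, keep first-seen order
    if not unique:
        raise ValueError("no events to aggregate")
    if len(unique) == 1:
        return f"{unique[0]} {action}"
    if len(unique) == 2:
        return f"{unique[0]} and {unique[1]} {action}"
    others = len(unique) - 2
    noun = "other" if others == 1 else "others"
    return f"{unique[0]}, {unique[1]}, and {others} {noun} {action}"

buffered = ["Alice", "Bob"] + [f"user_{i}" for i in range(12)]
print(aggregate_message(buffered))
```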

Implementation:
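- Redis fixed-window counter per (user_id, notification_type); a true sliding window would also need to weigh in the previous bucket's count
- Key: ratelimit:{user_id}:{type}:{hour_bucket}
- INCR on each notification (with a TTL so old buckets expire), check against limit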
- If exceeded: add to aggregation buffer
- Background job flushes aggregation buffers every 5 minutes
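
The counter logic above can be sketched without a Redis dependency. In this illustration a dictionary plays the role of the Redis counter, and `FixedWindowLimiter` and its fields are hypothetical names. With Redis, the increment would be an `INCR` on the same key, plus an expiry so old buckets are cleaned up.

```python
from collections import defaultdict

class FixedWindowLimiter:
    """In-memory stand-in for the per-(user, type, bucket) counter."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts: dict[str, int] = defaultdict(int)
        self.aggregation_buffer: dict[tuple[str, str], list[str]] = defaultdict(list)

    def allow(self, user_id: str, ntype: str, event: str, now_epoch: float) -> bool:
        bucket = int(now_epoch // self.window_seconds)
        key = f"ratelimit:{user_id}:{ntype}:{bucket}"
        self.counts[key] += 1                     # Redis equivalent: INCR key
        if self.counts[key] <= self.limit:
            return True                           # under the limit: deliver now
        # Over the limit: keep the event for a later aggregated notification.
        self.aggregation_buffer[(user_id, ntype)].append(event)
        return False

limiter = FixedWindowLimiter(limit=20, window_seconds=3600)  # social: 20 per hour
decisions = [limiter.allow("user_123", "social", f"like-{i}", 7200.0 + i)
             for i in range(22)]
print(decisions.count(True), "delivered,",
      len(limiter.aggregation_buffer[("user_123", "social")]), "buffered")
```

A fixed window can admit up to twice the limit across a bucket boundary (20 at the end of one hour and 20 at the start of the next), which is usually tolerable for notification fatigue but worth mentioning in an interview.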

Global rate limits:
- Per-channel: respect provider quotas (for example, FCM enforces per-project send quotas; check the provider's current documentation for the exact figure)
- Per-sender: prevent noisy services from starving others
- Implement token bucket at the delivery worker level
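
A token bucket for the delivery workers could look like this minimal sketch. The class and parameter names are illustrative. The caller passes in the current time (for example from `time.monotonic()`) so the logic stays easy to test.

```python
class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity      # start full so an initial burst is allowed
        self.updated = now

    def try_acquire(self, now: float, cost: float = 1.0) -> bool:
        # Refill based on elapsed time, never exceeding capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = max(self.updated, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                # caller should delay or requeue the send

bucket = TokenBucket(rate=2.0, capacity=2.0)
print([bucket.try_acquire(now=0.0) for _ in range(3)])
print(bucket.try_acquire(now=0.5))
```

One bucket per provider enforces the channel quota, and one bucket per sending service keeps a noisy sender from starving the others.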

Step 6: Delivery Tracking

Track every notification through its lifecycle:

States:
CREATED → QUEUED → SENT → DELIVERED → READ → CLICKED
                     ↓
                   FAILED → RETRIED → SENT (or DEAD_LETTERED)
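
The lifecycle can be enforced with an explicit transition table. The sketch below is one reading of the diagram: it assumes a failure happens after SENT and that a retry either leads back to SENT or ends in DEAD_LETTERED. The `transition` helper is hypothetical.

```python
from enum import Enum

class State(Enum):
    CREATED = "CREATED"
    QUEUED = "QUEUED"
    SENT = "SENT"
    DELIVERED = "DELIVERED"
    READ = "READ"
    CLICKED = "CLICKED"
    FAILED = "FAILED"
    RETRIED = "RETRIED"
    DEAD_LETTERED = "DEAD_LETTERED"

# Allowed next states for each state, following the diagram above.
ALLOWED = {
    State.CREATED: {State.QUEUED},
    State.QUEUED: {State.SENT},
    State.SENT: {State.DELIVERED, State.FAILED},
    State.DELIVERED: {State.READ},
    State.READ: {State.CLICKED},
    State.FAILED: {State.RETRIED},
    State.RETRIED: {State.SENT, State.DEAD_LETTERED},
    State.CLICKED: set(),
    State.DEAD_LETTERED: set(),
}

def transition(current: State, new: State) -> State:
    """Return the new state, or raise if the lifecycle does not allow it."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new

state = State.CREATED
for nxt in (State.QUEUED, State.SENT, State.FAILED, State.RETRIED, State.SENT):
    state = transition(state, nxt)
print(state.value)
```

Rejecting illegal transitions at write time keeps out-of-order provider callbacks (for example, a late "delivered" webhook after a dead-letter) from corrupting the tracked status.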

For each state transition:
- Write event to analytics pipeline (Kafka → data warehouse)
- Update notification status in database
- Push real-time metrics to monitoring dashboard

Delivery receipts:
- Push: APNs/FCM confirm that the provider accepted the message; confirming on-device delivery typically needs a client-side acknowledgment
- Email: track opens (pixel tracking) and clicks (link wrapping)
- SMS: Twilio provides delivery status webhooks
- In-app: mark as read when user opens notification panel

Failure handling:
- Invalid device token → mark token as invalid, remove
- Bounced email → mark email as invalid after 3 bounces
- SMS delivery failure → retry once, then skip
- All channels failed → log for investigation

Step 7: User Preferences

User preference model:

preferences = {
  user_id: "user_123",
  channels: {
    push: { enabled: true, quiet_hours: "22:00-07:00" },
    email: { enabled: true, frequency: "instant" },
    sms: { enabled: false },
    in_app: { enabled: true }
  },
  types: {
    social: { push: true, email: false, sms: false },
    security: { push: true, email: true, sms: true },
    marketing: { push: false, email: true, sms: false },
    updates: { push: true, email: "weekly_digest", sms: false }
  }
}

Storage:
- PostgreSQL for user preferences (read-heavy, cache in Redis)
- Default preferences per notification type (fallback)
- Global opt-out takes precedence over everything
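
Resolving which channels to use for a given notification type can be sketched as below. The `global_opt_out` field, the `defaults` argument, and the function name are assumptions added for illustration; the preference shape follows the model above.

```python
from typing import Optional

def channels_for(preferences: dict, ntype: str,
                 defaults: Optional[dict] = None) -> list[str]:
    """Return the channels to deliver on immediately for a notification type."""
    if preferences.get("global_opt_out"):       # hypothetical field; wins over everything
        return []
    # Fall back to per-type defaults when the user has no explicit setting.
    type_prefs = preferences.get("types", {}).get(ntype)
    if type_prefs is None:
        type_prefs = (defaults or {}).get(ntype, {})
    channel_settings = preferences.get("channels", {})
    selected = []
    for channel, wanted in type_prefs.items():
        enabled = channel_settings.get(channel, {}).get("enabled", False)
        # Only a literal True means "send now"; values such as
        # "weekly_digest" are left to a separate digest job.
        if wanted is True and enabled:
            selected.append(channel)
    return selected

prefs = {
    "user_id": "user_123",
    "channels": {
        "push": {"enabled": True, "quiet_hours": "22:00-07:00"},
        "email": {"enabled": True, "frequency": "instant"},
        "sms": {"enabled": False},
        "in_app": {"enabled": True},
    },
    "types": {
        "security": {"push": True, "email": True, "sms": True},
        "updates": {"push": True, "email": "weekly_digest", "sms": False},
    },
}
print(channels_for(prefs, "security"))
print(channels_for(prefs, "updates"))
```

In this sketch the channel-level switch overrides the type-level setting, so security notifications skip SMS because the user disabled SMS entirely. Whether critical types should override that is a product decision worth raising in the interview.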

Quiet hours:
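- Store as local times together with the user's timezone
- At send time, convert the current UTC time to the user's local time and compare against the window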
- Critical notifications bypass quiet hours
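
A quiet-hours check might look like the following sketch. The function names are illustrative, and the window format follows the "22:00-07:00" string in the preference model. The timezone is passed as a `tzinfo` object; in practice that would likely be a `zoneinfo.ZoneInfo` for the user's region so daylight saving changes are handled.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo

def in_quiet_hours(now_utc: datetime, user_tz: tzinfo, quiet_hours: str) -> bool:
    """True if the user's local time falls inside a window like '22:00-07:00'."""
    start_text, end_text = quiet_hours.split("-")
    start = time.fromisoformat(start_text)
    end = time.fromisoformat(end_text)
    local = now_utc.astimezone(user_tz).time()
    if start <= end:
        return start <= local < end          # window within a single day
    return local >= start or local < end     # window wraps past midnight

def should_hold(now_utc: datetime, user_tz: tzinfo,
                quiet_hours: str, priority: str) -> bool:
    """Critical notifications bypass quiet hours; everything else waits."""
    return priority != "critical" and in_quiet_hours(now_utc, user_tz, quiet_hours)

utc_minus_5 = timezone(timedelta(hours=-5))              # fixed offset for the demo
now = datetime(2026, 3, 3, 8, 0, tzinfo=timezone.utc)    # 03:00 local time
print(should_hold(now, utc_minus_5, "22:00-07:00", "high"))
print(should_hold(now, utc_minus_5, "22:00-07:00", "critical"))
```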

Key Takeaways for the Interview

Not all notifications are equal: Priority-based routing with different SLAs is the core insight
Rate limiting prevents fatigue: Aggregate instead of drop when limits are hit
Deduplication matters: Hash-based dedup within time windows prevents spam
User preferences are complex: Per-channel, per-type, with quiet hours and frequency options
Track everything: Full lifecycle tracking enables debugging and optimization
