System Design Interview Framework
Structured Approach to Tackle Any System Design Question
Why System Design Interviews Are Different
Unlike coding interviews with "correct" answers, system design is open-ended. Interviewers evaluate:
- Structured thinking: Can you break down ambiguity?
- Trade-off analysis: Do you understand pros/cons?
- Scalability awareness: Can you design for millions of users?
- Communication: Can you explain complex systems clearly?
There's no single "right" answer. A junior engineer might design Twitter with 1 server; a senior engineer considers 1 billion users, 500M tweets/day, and 5 data centers.
The 6-Step Framework
Step 1: Clarify Requirements (5 minutes)
Don't jump to solutions! Ask questions to scope the problem.
Step 2: Capacity Estimation (5 minutes)
Calculate traffic, storage, bandwidth to guide design decisions.
Step 3: High-Level Design (10 minutes)
Draw boxes and arrows showing main components and data flow.
Step 4: API Design (5 minutes)
Define RESTful or function interfaces for core functionality.
Step 5: Database Design (10 minutes)
Choose SQL vs NoSQL, define schema, plan for scale.
Step 6: Deep Dive (15 minutes)
Address bottlenecks, scaling, caching, monitoring, trade-offs.
Total: ~50 minutes. Adjust the split for a 45- or 60-minute interview.
Step 1: Clarify Requirements
Goal: Turn vague question into concrete requirements.
Functional Requirements (What should the system do?)
Example: "Design Twitter"
Ask:
✅ Post tweets? (Yes)
✅ Follow users? (Yes)
✅ Timeline: home feed + user profile? (Yes, both)
✅ Like/retweet? (Nice-to-have, out of scope)
✅ Search tweets? (Out of scope)
✅ Direct messages? (Out of scope)
✅ Trending topics? (Out of scope)
Result: Focus on core features only.
Non-Functional Requirements (How should it perform?)
Ask:
✅ Scale: How many users? (100M daily active users)
✅ Availability: More important than consistency? (Yes, eventual consistency OK)
✅ Latency: How fast? (Timeline loads < 1 second)
✅ Read vs Write: More reads or writes? (10:1 read-heavy)
Why these matter:
- 100M users → Need distributed system
- Availability > Consistency → Use NoSQL, caching
- Read-heavy → Focus on read optimization (caching, CDN)
- < 1s latency → Pre-compute timelines, use CDN
💡 Pro Tip
Write requirements on whiteboard/doc to reference later. Prevents scope creep: "Remember we decided search was out of scope."
Step 2: Capacity Estimation
Goal: Use rough numbers to guide design. Be transparent about assumptions.
Traffic Estimation
Example: Twitter-like System
Given:
- 100M daily active users (DAU)
- Each user views timeline 5 times/day
- Each timeline shows 20 tweets
- Users post 0.5 tweets/day on average
Read (Timeline Views):
- 100M users × 5 views = 500M timeline requests/day
- 500M / 86,400 seconds = ~6,000 requests/second (QPS)
- Peak (3x average) = 18,000 QPS
Write (Posting Tweets):
- 100M users × 0.5 tweets = 50M tweets/day
- 50M / 86,400 = ~600 tweets/second
- Peak = 1,800 tweets/second
Result: Read-heavy (10:1 ratio). Optimize reads with caching!
Storage Estimation
Tweets:
- 50M tweets/day × 280 chars × 2 bytes (Unicode) = ~28 GB/day text
- Plus metadata (user ID, timestamp, etc.) = ~30 GB/day
- 30 GB × 365 days = ~11 TB/year
Media (photos/videos):
- 20% of tweets have media
- 50M × 0.2 = 10M media uploads/day
- Avg 200 KB per image = 10M × 200 KB = 2 TB/day
- 2 TB × 365 = 730 TB/year
Total: ~750 TB/year
Result: Need distributed storage (S3, blob storage). Can't fit on 1 server!
Bandwidth Estimation
Incoming:
- 30 GB text + 2 TB media = ~2 TB/day
- 2 TB / 86,400 seconds = ~24 MB/second
Outgoing (users viewing tweets):
- 500M timeline views × 20 tweets × 300 bytes (avg) = 3 TB/day text
- Plus media views (assume 50% of media): 1 TB/day media
- 4 TB / 86,400 = ~46 MB/second
Result: Outgoing > incoming, confirming the read-heavy assumption.
⚠️ Common Mistake
Don't spend 20 minutes on precise calculations. Interviewers want to see you understand scale, not exact math. Say: "Roughly 10,000 QPS" not "9,847.3 QPS".
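The estimates above can be reproduced with a few lines of arithmetic; a back-of-envelope sketch, where every input is an assumption stated in the text:

```python
# Back-of-envelope capacity calculator mirroring the estimates above.
DAU = 100_000_000            # daily active users (assumed)
VIEWS_PER_USER = 5           # timeline views per user per day
TWEETS_PER_USER = 0.5        # tweets posted per user per day
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 3              # peak traffic ~3x the daily average

read_qps = DAU * VIEWS_PER_USER / SECONDS_PER_DAY
write_qps = DAU * TWEETS_PER_USER / SECONDS_PER_DAY

print(f"avg read QPS:  ~{read_qps:,.0f}")                 # "roughly 6,000"
print(f"peak read QPS: ~{read_qps * PEAK_FACTOR:,.0f}")   # "roughly 18,000"
print(f"avg write QPS: ~{write_qps:,.0f}")                # "roughly 600"

# Storage: 280 chars x 2 bytes (Unicode) of text per tweet
tweets_per_day = DAU * TWEETS_PER_USER
text_gb_per_day = tweets_per_day * 280 * 2 / 1e9
print(f"text storage:  ~{text_gb_per_day:.0f} GB/day")    # ~28 GB/day
```

In an interview you would do this mentally and round aggressively; the point of the sketch is that each result is a one-line product of stated assumptions.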
Step 3: High-Level Design
Goal: Draw 5-10 boxes showing architecture. Start simple, add complexity.
Version 1: Naive Single-Server
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Client │───────▶│ Server │───────▶│ Database │
└──────────┘ └──────────┘ └──────────┘
Works for:
✅ 100 users
❌ 100M users (single point of failure, can't scale)
Version 2: Add Load Balancer + Multiple Servers
┌──────────┐
│ Load │
┌───────────────│ Balancer │
│ └──────────┘
│ │
│ ┌───────────┼───────────┐
▼ ▼ ▼ ▼
┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│Server 1│ │Server 2│ │Server 3│ │Server N│
└────────┘ └────────┘ └────────┘ └────────┘
│ │ │ │
└─────────┴───────────┴───────────┘
│
┌────▼─────┐
│ Database │
└──────────┘
Improvements:
✅ Horizontal scaling: add more servers
✅ No single point of failure (if one server dies, the others take over)
❌ Database is still a bottleneck
Version 3: Add Caching Layer
┌──────────┐
│ Load │
│ Balancer │
└──────────┘
│
┌──────────────┼──────────────┐
▼ ▼ ▼
┌────────┐ ┌────────┐ ┌────────┐
│Server 1│ │Server 2│ │Server 3│
└────────┘ └────────┘ └────────┘
│ │ │
└──────────────┼──────────────┘
│
┌────▼─────┐
│ Redis │ ◄── Cache hot data
│ Cache │
└──────────┘
│
┌────▼─────┐
│ Database │ ◄── Cold storage
└──────────┘
Benefits:
✅ 80% of reads from cache (< 1ms latency)
✅ Database load reduced 5x
✅ Redis: in-memory, very fast
Version 4: Separate Read/Write + CDN
┌────────┐ ┌─────────┐ ┌────────────┐
│ User │─────▶│ CDN │─────▶│ Static │
│(Browser)│ │(Images, │ │ Assets │
└────────┘ │ JS, CSS)│ │ (S3/Blob) │
│ └─────────┘ └────────────┘
│
│ ┌──────────┐
└──────────▶│ Load │
│ Balancer │
└──────────┘
│
┌───────────┼───────────┐
▼ ▼ ▼
┌────────┐ ┌────────┐ ┌────────┐
│ Write │ │ Write │ │ Read │
│Server 1│ │Server 2│ │Servers │
└────────┘ └────────┘ └────────┘
│ │ │
└───────────┼───────────┘
│
┌───────────┴───────────┐
▼ ▼
┌─────────┐ ┌──────────┐
│ Primary │──────────▶│ Replicas │
│Database │ Replicate │ (Readers)│
│(Writer) │ └──────────┘
└─────────┘
Benefits:
✅ CDN: assets served from nearest edge location (20-200ms saved)
✅ Write servers: optimized for inserts (no caching)
✅ Read servers: optimized for queries (heavy caching)
✅ Database replication: reads scale horizontally
Step 4: API Design
Goal: Define clear interfaces. Use RESTful conventions.
// Post a tweet
POST /api/v1/tweets
Request Body:
{
"user_id": "uuid",
"text": "Hello world!",
"media_urls": ["https://cdn.example.com/img1.jpg"]
}
Response:
{
"tweet_id": "uuid",
"created_at": "2025-02-12T10:30:00Z"
}
// Get user timeline (home feed)
GET /api/v1/timeline?user_id={uuid}&cursor={cursor}&limit=20
Response:
{
"tweets": [
{
"tweet_id": "uuid",
"user_id": "uuid",
"username": "alice",
"text": "...",
"created_at": "...",
"media_urls": [...],
"likes_count": 42,
"retweets_count": 10
},
// ... 19 more
],
"next_cursor": "base64_encoded_timestamp"
}
// Follow a user
POST /api/v1/follow
Request Body:
{
"follower_id": "uuid",
"followee_id": "uuid"
}
Response:
{
"success": true
}
// Get user profile
GET /api/v1/users/{user_id}
Response:
{
"user_id": "uuid",
"username": "alice",
"bio": "...",
"followers_count": 1000,
"following_count": 500,
"tweets_count": 2000
}
Key Decisions to Mention:
- Pagination: Cursor-based (better for real-time feeds than offset)
- Rate Limiting: 300 tweets/hour per user, 1000 API calls/15min
- Authentication: JWT tokens in Authorization header
- Versioning: /api/v1/ allows future breaking changes
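Cursor-based pagination can be sketched in a few lines; the opaque cursor is just a base64-encoded timestamp, and `get_timeline_page` is a hypothetical stand-in for the real database query:

```python
import base64

def encode_cursor(created_at: str) -> str:
    """Opaque cursor = base64 of the last returned item's timestamp."""
    return base64.urlsafe_b64encode(created_at.encode()).decode()

def decode_cursor(cursor: str) -> str:
    return base64.urlsafe_b64decode(cursor.encode()).decode()

def get_timeline_page(tweets, cursor=None, limit=20):
    """tweets: list of dicts sorted newest-first by ISO-8601 'created_at'.
    Returns (page, next_cursor). ISO timestamps compare correctly as strings."""
    if cursor:
        since = decode_cursor(cursor)
        tweets = [t for t in tweets if t["created_at"] < since]
    page = tweets[:limit]
    next_cursor = encode_cursor(page[-1]["created_at"]) if page else None
    return page, next_cursor
```

Unlike offset pagination, the cursor stays stable when new tweets are inserted at the head of the feed: page 2 is defined as "everything older than the last item I saw", not "items 20-40".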
Step 5: Database Design
Goal: Choose appropriate database(s) and define schema.
SQL vs NoSQL Decision Matrix
| Factor | SQL (Postgres) | NoSQL (DynamoDB) |
|---|---|---|
| Schema | Fixed, enforced | Flexible ✓ |
| Transactions | ACID ✓ | Eventual consistency |
| Joins | Powerful ✓ | Difficult/expensive |
| Scaling | Vertical + sharding | Horizontal ✓ |
| Write Speed | Moderate | Very Fast ✓ |
For Twitter: Use Both!
-- PostgreSQL: User data (needs transactions)
CREATE TABLE users (
user_id UUID PRIMARY KEY,
username VARCHAR(50) UNIQUE NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT NOW(),
bio TEXT,
profile_image_url TEXT
);
CREATE TABLE follows (
follower_id UUID REFERENCES users(user_id),
followee_id UUID REFERENCES users(user_id),
created_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (follower_id, followee_id)
);
-- Why SQL: Need to enforce unique username, email constraints.
-- Following relationships need joins for "mutual follows" queries.
// DynamoDB (NoSQL): Tweets and Timeline (needs scale + speed)
{
TableName: "Tweets",
PartitionKey: "tweet_id", // UUID
SortKey: "created_at", // Timestamp
Attributes: {
user_id: "UUID",
username: "String", // Denormalized for fast reads!
text: "String",
media_urls: ["String"],
likes_count: "Number",
retweets_count: "Number"
}
// Global Secondary Index: user_id + created_at (for user profile view)
}
{
TableName: "Timeline",
PartitionKey: "user_id", // Owner of timeline
SortKey: "created_at", // Latest first
Attributes: {
tweet_id: "UUID",
// Fan-out on write: when user tweets, add to all followers' timelines
}
// TTL: 7 days (auto-delete old timeline entries)
}
// Why NoSQL:
// - 50M tweets/day needs horizontal scaling
// - Schema may evolve (polls, videos, etc.)
// - Read-heavy: denormalize for speed (store username in tweet)
// - Fan-out on write: pre-compute timelines in Timeline table
Step 6: Deep Dive & Scaling
Goal: Address potential bottlenecks and demonstrate senior thinking.
1. Timeline Generation: Fan-out Strategies
Problem: When user posts tweet, how do 10,000 followers see it?
Option A: Fan-out on Write (Push)
- Store tweet in each follower's timeline immediately
- Read: Fast (just query user's timeline table)
- Write: Slow for celebrities (1M followers = 1M writes)
Option B: Fan-out on Read (Pull)
- Store tweets only in user's own table
- Read: Slow (join tweets from all followed users)
- Write: Fast (1 write only)
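Both strategies can be sketched in-process; the dicts below are illustrative stand-ins for the Tweets and Timeline tables, and all names are hypothetical:

```python
from collections import defaultdict

followers = defaultdict(set)     # author -> set of follower ids
user_tweets = defaultdict(list)  # author -> own tweets, newest first
timelines = defaultdict(list)    # user -> pre-computed home timeline

def post_fanout_on_write(author, tweet):
    """Push: one write per follower at post time.
    Reads are cheap; a celebrity with 1M followers costs 1M writes."""
    user_tweets[author].insert(0, tweet)
    for f in followers[author]:
        timelines[f].insert(0, tweet)

def read_fanout_on_write(user, limit=20):
    """Reading a pushed timeline is a single lookup."""
    return timelines[user][:limit]

def read_fanout_on_read(user, following, limit=20):
    """Pull: merge tweets from everyone the user follows at read time.
    Writes are one insert; reads pay the merge cost."""
    merged = []
    for author in following:
        merged.extend(user_tweets[author])
    merged.sort(key=lambda t: t["created_at"], reverse=True)
    return merged[:limit]
```

The hybrid approach described next combines the two: `read_fanout_on_write` for the pre-computed part of the feed, plus a `read_fanout_on_read` merge limited to the handful of celebrities the user follows.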
Hybrid Solution (What Twitter Actually Does):
- Regular users (<10K followers): Fan-out on write
- Celebrities (>10K followers): Fan-out on read
- At read time: merge pre-computed timeline + celebrity tweets
- Best of both worlds!
2. Caching Strategy
What to Cache:
✅ Timeline: Top 100 tweets per user (Redis Sorted Set, TTL 5min)
✅ User profiles: Hot users (celebrities) cached (TTL 1hr)
✅ Tweet metadata: Likes/retweets count (updated async)
Cache Invalidation:
- New tweet: invalidate author's timeline + followers' timelines
- Use pub/sub (Redis) to notify cache servers
- Accept slight delay (eventual consistency)
Cache-Aside Pattern:
1. App checks cache
2. Cache miss: query database
3. Store in cache with TTL
4. Return to user
3. Database Sharding
Problem: The users table grows to 1 billion rows and no longer fits on one Postgres server.
Shard by user_id:
- Hash(user_id) % N → determines which database shard
- With N = 4 shards, each shard holds ~250M of the 1B users
- (Contiguous ranges like "Shard 1: users 0-249M" would be range-based sharding; hashing spreads hot users more evenly)
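Hash-based shard routing can be sketched as follows; the md5-based hash is an illustrative choice, the point being any stable, unsalted hash (Python's built-in `hash()` is salted per process, so it would route inconsistently):

```python
import hashlib

NUM_SHARDS = 4

def shard_for(user_id: str) -> int:
    """Map a user_id to a shard index via a stable hash mod N."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```

Note that growing from 4 to 5 shards remaps most keys under plain modulo hashing, which is exactly why the Mitigation list below suggests consistent hashing.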
Pros:
✅ Even distribution
✅ Each shard handles 250M users
Cons:
❌ Cross-shard queries hard (e.g., "users who follow user A and B")
❌ Rebalancing when adding shards is complex
Mitigation:
- Use consistent hashing to minimize re-sharding
- Denormalize to avoid cross-shard queries
4. Monitoring & Observability
Metrics to Track:
📊 Server health: CPU, memory, disk I/O per instance
📊 API latency: p50, p95, p99 per endpoint
📊 Error rates: 4xx, 5xx by endpoint
📊 Database: Query time, connection pool size, replication lag
📊 Cache: Hit rate (target >80%), eviction rate
Alerts:
🚨 P99 latency > 2 seconds
🚨 Error rate > 1%
🚨 Replication lag > 10 seconds
🚨 Cache hit rate < 70%
Tools:
- Prometheus + Grafana: Metrics dashboards
- Jaeger: Distributed tracing
- ELK Stack: Centralized logging
5. Security Considerations
- Rate Limiting: Prevent spam/abuse (300 tweets/hr, 1000 API calls/15min)
- Authentication: JWT tokens with 1hr expiry
- Authorization: Check tweet.user_id === auth.user_id before edit/delete
- Input Validation: Sanitize tweet text, validate URLs
- HTTPS: Encrypt all traffic (TLS 1.3)
- DDoS Protection: CloudFlare/AWS Shield at edge
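The per-user limits above could be enforced with a token bucket; a minimal single-process sketch (limits and names are illustrative, and a real deployment would keep the bucket state in Redis so all servers share it):

```python
import time

class TokenBucket:
    """Token bucket: holds up to `capacity` tokens, refilled at `rate`
    tokens/second. E.g. 300 tweets/hour -> rate=300/3600, capacity=300."""
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1   # spend one token for this request
            return True
        return False           # bucket empty: reject (HTTP 429)
```

Unlike a fixed window, the bucket smooths bursts: a user can spend saved-up capacity at once but cannot exceed the long-run rate.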
Trade-offs to Discuss
Interviewers LOVE when you mention trade-offs without being asked:
Consistency vs Availability (CAP Theorem)
"I chose eventual consistency for timelines because 1-2 second delay is acceptable for availability. For payment systems, I'd choose strong consistency."
Latency vs Consistency
"Caching reduces latency to 10ms but risks showing stale data for 1 minute. Acceptable for social media, not for stock prices."
Storage Cost vs Query Speed
"Denormalizing (storing username in tweet) costs 50 bytes × 50M tweets = 2.5GB extra, but avoids join, saving 100ms per query. Worth it for read-heavy system."
Complexity vs Performance
"Hybrid fan-out adds complexity (2 code paths) but handles both regular users and celebrities efficiently. Simpler fan-out on read would break for Elon Musk tweets."
Common Mistakes to Avoid
❌ Jumping to implementation too quickly
Ask clarifying questions first! "Should we support video tweets?"
❌ Focusing only on happy path
Discuss: What if server crashes? Database is down? User spams API?
❌ Ignoring scale
"Just use Postgres" works for 1K users, not 100M. Always consider scale from step 2.
❌ Over-engineering early
Start simple (monolith), then add complexity (microservices, Kafka) when explaining scale.
❌ Silent drawing
Narrate while drawing: "I'm adding a cache here to reduce database load..."
Sample Questions to Practice
- Beginner: Design URL Shortener (bit.ly), Design Pastebin, Design Rate Limiter
- Intermediate: Design Instagram, Design YouTube, Design Uber, Design WhatsApp
- Advanced: Design Google Search, Design Amazon, Design Netflix, Design Distributed Cache
Practice these with a friend or record yourself. The goal is to speak confidently and demonstrate structured thinking, not memorize solutions.
Key Takeaways
- ✅ Clarify first: Scope the problem before designing
- ✅ Estimate capacity: Use numbers to guide decisions
- ✅ Start simple: Monolith → Load balancer → Caching → Microservices
- ✅ Discuss trade-offs: Every decision has pros/cons
- ✅ Think about failure: What breaks at scale? How to recover?
- ✅ Communicate clearly: Draw, narrate, check understanding
- ✅ No perfect answer: Show thought process, adapt to feedback
Further Resources
- Book: "Designing Data-Intensive Applications" by Martin Kleppmann
- Course: "Grokking the System Design Interview" (educative.io)
- YouTube: System Design Interview channel, Gaurav Sen
- Practice: Use our System Design questions with real-world scenarios
Continue Learning
- Hash Maps: When and Why — Essential for caching layers in system design
- Big O Notation Explained — Understand scalability analysis fundamentals
- STAR Method for Behavioral Interviews — Prepare for the behavioral portion of your interview
- Start Practicing — Apply system design concepts to real scenarios