System Design Interview Framework
Structured Approach to Tackle Any System Design Question
Why System Design Interviews Are Different
Unlike coding interviews with "correct" answers, system design is open-ended. Interviewers evaluate:
- Structured thinking: Can you break down ambiguity?
- Trade-off analysis: Do you understand pros/cons?
- Scalability awareness: Can you design for millions of users?
- Communication: Can you explain complex systems clearly?
There's no single "right" answer. A junior engineer might design Twitter with 1 server; a senior engineer considers 1 billion users, 500M tweets/day, and 5 data centers.
The 6-Step Framework
Step 1: Clarify Requirements (5 minutes)
Don't jump to solutions! Ask questions to scope the problem.
Step 2: Capacity Estimation (5 minutes)
Calculate traffic, storage, bandwidth to guide design decisions.
Step 3: High-Level Design (10 minutes)
Draw boxes and arrows showing main components and data flow.
Step 4: API Design (5 minutes)
Define RESTful or function interfaces for core functionality.
Step 5: Database Design (10 minutes)
Choose SQL vs NoSQL, define schema, plan for scale.
Step 6: Deep Dive (15 minutes)
Address bottlenecks, scaling, caching, monitoring, trade-offs.
Total: ~50 minutes. Adjust the split for a 45- or 60-minute interview.
Step 1: Clarify Requirements
Goal: Turn vague question into concrete requirements.
Functional Requirements (What should the system do?)
Example: "Design Twitter"
Ask:
✅ Post tweets? (Yes)
✅ Follow users? (Yes)
✅ Timeline: home feed + user profile? (Yes, both)
✅ Like/retweet? (Nice-to-have, out of scope)
✅ Search tweets? (Out of scope)
✅ Direct messages? (Out of scope)
✅ Trending topics? (Out of scope)
Result: Focus on core features only.
Non-Functional Requirements (How should it perform?)
Ask:
✅ Scale: How many users? (100M daily active users)
✅ Availability: More important than consistency? (Yes, eventual consistency OK)
✅ Latency: How fast? (Timeline loads < 1 second)
✅ Read vs Write: More reads or writes? (10:1 read-heavy)
Why these matter:
- 100M users → Need distributed system
- Availability > Consistency → Use NoSQL, caching
- Read-heavy → Focus on read optimization (caching, CDN)
- < 1s latency → Pre-compute timelines, use CDN
💡 Pro Tip
Write requirements on whiteboard/doc to reference later. Prevents scope creep: "Remember we decided search was out of scope."
Step 2: Capacity Estimation
Goal: Use rough numbers to guide design. Be transparent about assumptions.
Traffic Estimation
Example: Twitter-like System
Given:
- 100M daily active users (DAU)
- Each user views timeline 5 times/day
- Each timeline shows 20 tweets
- Users post 0.5 tweets/day on average
Read (Timeline Views):
- 100M users × 5 views = 500M timeline requests/day
- 500M / 86,400 seconds = ~6,000 requests/second (QPS)
- Peak (3x average) = 18,000 QPS
Write (Posting Tweets):
- 100M users × 0.5 tweets = 50M tweets/day
- 50M / 86,400 = ~600 tweets/second
- Peak = 1,800 tweets/second
Result: Read-heavy (10:1 ratio). Optimize reads with caching!
Storage Estimation
Tweets:
- 50M tweets/day × 280 chars × 2 bytes (Unicode) = ~28 GB/day text
- Plus metadata (user ID, timestamp, etc.) = ~30 GB/day
- 30 GB × 365 days = ~11 TB/year
Media (photos/videos):
- 20% of tweets have media
- 50M × 0.2 = 10M media uploads/day
- Avg 200 KB per image = 10M × 200 KB = 2 TB/day
- 2 TB × 365 = 730 TB/year
Total: ~750 TB/year
Result: Need distributed storage (S3, blob storage). Can't fit on 1 server!
Bandwidth Estimation
Incoming:
- 30 GB text + 2 TB media = ~2 TB/day
- 2 TB / 86,400 seconds = ~24 MB/second
Outgoing (users viewing tweets):
- 500M timeline views × 20 tweets × 300 bytes (avg) = 3 TB/day text
- Plus media views (assume 50% of media): 1 TB/day media
- 4 TB / 86,400 = ~46 MB/second
Result: Outgoing > incoming, confirming the read-heavy assumption.
⚠️ Common Mistake
Don't spend 20 minutes on precise calculations. Interviewers want to see you understand scale, not exact math. Say: "Roughly 10,000 QPS" not "9,847.3 QPS".
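The estimates above can be reproduced with a few lines of arithmetic; a back-of-envelope sketch, where every input is an assumption stated in the text:

```python
# Back-of-envelope capacity calculator mirroring the estimates above.
DAU = 100_000_000            # daily active users (assumed)
VIEWS_PER_USER = 5           # timeline views per user per day
TWEETS_PER_USER = 0.5        # tweets posted per user per day
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 3              # peak traffic ~3x the daily average

read_qps = DAU * VIEWS_PER_USER / SECONDS_PER_DAY
write_qps = DAU * TWEETS_PER_USER / SECONDS_PER_DAY

print(f"avg read QPS:  ~{read_qps:,.0f}")                 # "roughly 6,000"
print(f"peak read QPS: ~{read_qps * PEAK_FACTOR:,.0f}")   # "roughly 18,000"
print(f"avg write QPS: ~{write_qps:,.0f}")                # "roughly 600"

# Storage: 280 chars x 2 bytes (Unicode) of text per tweet
tweets_per_day = DAU * TWEETS_PER_USER
text_gb_per_day = tweets_per_day * 280 * 2 / 1e9
print(f"text storage:  ~{text_gb_per_day:.0f} GB/day")    # ~28 GB/day
```

In an interview you would do this mentally and round aggressively; the point of the sketch is that each result is a one-line product of stated assumptions.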
Step 3: High-Level Design
Goal: Draw 5-10 boxes showing architecture. Start simple, add complexity.
Version 1: Naive Single-Server
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Client │───────▶│ Server │───────▶│ Database │
└──────────┘ └──────────┘ └──────────┘
Works for:
✅ 100 users
❌ 100M users (single point of failure, can't scale)
Version 2: Add Load Balancer + Multiple Servers
┌──────────┐
│ Load │
┌───────────────│ Balancer │
│ └──────────┘
│ │
│ ┌───────────┼───────────┐
▼ ▼ ▼ ▼
┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│Server 1│ │Server 2│ │Server 3│ │Server N│
└────────┘ └────────┘ └────────┘ └────────┘
│ │ │ │
└─────────┴───────────┴───────────┘
│
┌────▼─────┐
│ Database │
└──────────┘
Improvements:
✅ Horizontal scaling: add more servers
✅ No single point of failure (if one server dies, the others take over)
❌ Database is still a bottleneck
Version 3: Add Caching Layer
┌──────────┐
│ Load │
│ Balancer │
└──────────┘
│
┌──────────────┼──────────────┐
▼ ▼ ▼
┌────────┐ ┌────────┐ ┌────────┐
│Server 1│ │Server 2│ │Server 3│
└────────┘ └────────┘ └────────┘
│ │ │
└──────────────┼──────────────┘
│
┌────▼─────┐
│ Redis │ ◄── Cache hot data
│ Cache │
└──────────┘
│
┌────▼─────┐
│ Database │ ◄── Cold storage
└──────────┘
Benefits:
✅ 80% of reads from cache (< 1ms latency)
✅ Database load reduced 5x
✅ Redis: in-memory, very fast
Version 4: Separate Read/Write + CDN
┌────────┐ ┌─────────┐ ┌────────────┐
│ User │─────▶│ CDN │─────▶│ Static │
│(Browser)│ │(Images, │ │ Assets │
└────────┘ │ JS, CSS)│ │ (S3/Blob) │
│ └─────────┘ └────────────┘
│
│ ┌──────────┐
└──────────▶│ Load │
│ Balancer │
└──────────┘
│
┌───────────┼───────────┐
▼ ▼ ▼
┌────────┐ ┌────────┐ ┌────────┐
│ Write │ │ Write │ │ Read │
│Server 1│ │Server 2│ │Servers │
└────────┘ └────────┘ └────────┘
│ │ │
└───────────┼───────────┘
│
┌───────────┴───────────┐
▼ ▼
┌─────────┐ ┌──────────┐
│ Primary │──────────▶│ Replicas │
│Database │ Replicate │ (Readers)│
│(Writer) │ └──────────┘
└─────────┘
Benefits:
✅ CDN: assets served from nearest edge location (20-200ms saved)
✅ Write servers: optimized for inserts (no caching)
✅ Read servers: optimized for queries (heavy caching)
✅ Database replication: reads scale horizontally
Step 4: API Design
Goal: Define clear interfaces. Use RESTful conventions.
// Post a tweet
POST /api/v1/tweets
Request Body:
{
"user_id": "uuid",
"text": "Hello world!",
"media_urls": ["https://cdn.example.com/img1.jpg"]
}
Response:
{
"tweet_id": "uuid",
"created_at": "2025-02-12T10:30:00Z"
}
// Get user timeline (home feed)
GET /api/v1/timeline?user_id={uuid}&cursor={cursor}&limit=20
Response:
{
"tweets": [
{
"tweet_id": "uuid",
"user_id": "uuid",
"username": "alice",
"text": "...",
"created_at": "...",
"media_urls": [...],
"likes_count": 42,
"retweets_count": 10
},
// ... 19 more
],
"next_cursor": "base64_encoded_timestamp"
}
// Follow a user
POST /api/v1/follow
Request Body:
{
"follower_id": "uuid",
"followee_id": "uuid"
}
Response:
{
"success": true
}
// Get user profile
GET /api/v1/users/{user_id}
Response:
{
"user_id": "uuid",
"username": "alice",
"bio": "...",
"followers_count": 1000,
"following_count": 500,
"tweets_count": 2000
}
Key Decisions to Mention:
- Pagination: Cursor-based (better for real-time feeds than offset)
- Rate Limiting: 300 tweets/hour per user, 1000 API calls/15min
- Authentication: JWT tokens in Authorization header
- Versioning: /api/v1/ allows future breaking changes
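Cursor-based pagination can be sketched in a few lines; the opaque cursor is just a base64-encoded timestamp, and `get_timeline_page` is a hypothetical stand-in for the real database query:

```python
import base64

def encode_cursor(created_at: str) -> str:
    """Opaque cursor = base64 of the last returned item's timestamp."""
    return base64.urlsafe_b64encode(created_at.encode()).decode()

def decode_cursor(cursor: str) -> str:
    return base64.urlsafe_b64decode(cursor.encode()).decode()

def get_timeline_page(tweets, cursor=None, limit=20):
    """tweets: list of dicts sorted newest-first by ISO-8601 'created_at'.
    Returns (page, next_cursor). ISO timestamps compare correctly as strings."""
    if cursor:
        since = decode_cursor(cursor)
        tweets = [t for t in tweets if t["created_at"] < since]
    page = tweets[:limit]
    next_cursor = encode_cursor(page[-1]["created_at"]) if page else None
    return page, next_cursor
```

Unlike offset pagination, the cursor stays stable when new tweets are inserted at the head of the feed: page 2 is defined as "everything older than the last item I saw", not "items 20-40".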
Step 5: Database Design
Goal: Choose appropriate database(s) and define schema.
SQL vs NoSQL Decision Matrix
| Factor | SQL (Postgres) | NoSQL (DynamoDB) |
|---|---|---|
| Schema | Fixed, enforced | Flexible ✓ |
| Transactions | ACID ✓ | Eventual consistency |
| Joins | Powerful ✓ | Difficult/expensive |
| Scaling | Vertical + sharding | Horizontal ✓ |
| Write Speed | Moderate | Very Fast ✓ |
For Twitter: Use Both!
-- PostgreSQL: User data (needs transactions)
CREATE TABLE users (
user_id UUID PRIMARY KEY,
username VARCHAR(50) UNIQUE NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT NOW(),
bio TEXT,
profile_image_url TEXT
);
CREATE TABLE follows (
follower_id UUID REFERENCES users(user_id),
followee_id UUID REFERENCES users(user_id),
created_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (follower_id, followee_id)
);
-- Why SQL: Need to enforce unique username, email constraints.
-- Following relationships need joins for "mutual follows" queries.
// DynamoDB (NoSQL): Tweets and Timeline (needs scale + speed)
{
TableName: "Tweets",
PartitionKey: "tweet_id", // UUID
SortKey: "created_at", // Timestamp
Attributes: {
user_id: "UUID",
username: "String", // Denormalized for fast reads!
text: "String",
media_urls: ["String"],
likes_count: "Number",
retweets_count: "Number"
}
// Global Secondary Index: user_id + created_at (for user profile view)
}
{
TableName: "Timeline",
PartitionKey: "user_id", // Owner of timeline
SortKey: "created_at", // Latest first
Attributes: {
tweet_id: "UUID",
// Fan-out on write: when user tweets, add to all followers' timelines
}
// TTL: 7 days (auto-delete old timeline entries)
}
// Why NoSQL:
// - 50M tweets/day needs horizontal scaling
// - Schema may evolve (polls, videos, etc.)
// - Read-heavy: denormalize for speed (store username in tweet)
// - Fan-out on write: pre-compute timelines in Timeline table
Step 6: Deep Dive & Scaling
Goal: Address potential bottlenecks and demonstrate senior thinking.
1. Timeline Generation: Fan-out Strategies
Problem: When user posts tweet, how do 10,000 followers see it?
Option A: Fan-out on Write (Push)
- Store tweet in each follower's timeline immediately
- Read: Fast (just query user's timeline table)
- Write: Slow for celebrities (1M followers = 1M writes)
Option B: Fan-out on Read (Pull)
- Store tweets only in user's own table
- Read: Slow (join tweets from all followed users)
- Write: Fast (1 write only)
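Both strategies can be sketched in-process; the dicts below are illustrative stand-ins for the Tweets and Timeline tables, and all names are hypothetical:

```python
from collections import defaultdict

followers = defaultdict(set)     # author -> set of follower ids
user_tweets = defaultdict(list)  # author -> own tweets, newest first
timelines = defaultdict(list)    # user -> pre-computed home timeline

def post_fanout_on_write(author, tweet):
    """Push: one write per follower at post time.
    Reads are cheap; a celebrity with 1M followers costs 1M writes."""
    user_tweets[author].insert(0, tweet)
    for f in followers[author]:
        timelines[f].insert(0, tweet)

def read_fanout_on_write(user, limit=20):
    """Reading a pushed timeline is a single lookup."""
    return timelines[user][:limit]

def read_fanout_on_read(user, following, limit=20):
    """Pull: merge tweets from everyone the user follows at read time.
    Writes are one insert; reads pay the merge cost."""
    merged = []
    for author in following:
        merged.extend(user_tweets[author])
    merged.sort(key=lambda t: t["created_at"], reverse=True)
    return merged[:limit]
```

The hybrid approach described next combines the two: `read_fanout_on_write` for the pre-computed part of the feed, plus a `read_fanout_on_read` merge limited to the handful of celebrities the user follows.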
Hybrid Solution (What Twitter Actually Does):
- Regular users (<10K followers): Fan-out on write
- Celebrities (>10K followers): Fan-out on read
- At read time: merge pre-computed timeline + celebrity tweets
- Best of both worlds!
2. Caching Strategy
What to Cache:
✅ Timeline: Top 100 tweets per user (Redis Sorted Set, TTL 5min)
✅ User profiles: Hot users (celebrities) cached (TTL 1hr)
✅ Tweet metadata: Likes/retweets count (updated async)
Cache Invalidation:
- New tweet: invalidate author's timeline + followers' timelines
- Use pub/sub (Redis) to notify cache servers
- Accept slight delay (eventual consistency)
Cache-Aside Pattern:
1. App checks cache
2. Cache miss: query database
3. Store in cache with TTL
4. Return to user
3. Database Sharding
Problem: The users table grows to 1 billion rows and no longer fits on one Postgres server.
Shard by user_id:
- Hash(user_id) % N → determines which database shard
- With N = 4 shards, each shard holds ~250M of the 1B users
- (Contiguous ranges like "Shard 1: users 0-249M" would be range-based sharding; hashing spreads hot users more evenly)
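Hash-based shard routing can be sketched as follows; the md5-based hash is an illustrative choice, the point being any stable, unsalted hash (Python's built-in `hash()` is salted per process, so it would route inconsistently):

```python
import hashlib

NUM_SHARDS = 4

def shard_for(user_id: str) -> int:
    """Map a user_id to a shard index via a stable hash mod N."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```

Note that growing from 4 to 5 shards remaps most keys under plain modulo hashing, which is exactly why the Mitigation list below suggests consistent hashing.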
Pros:
✅ Even distribution
✅ Each shard handles 250M users
Cons:
❌ Cross-shard queries hard (e.g., "users who follow user A and B")
❌ Rebalancing when adding shards is complex
Mitigation:
- Use consistent hashing to minimize re-sharding
- Denormalize to avoid cross-shard queries
4. Monitoring & Observability
Metrics to Track:
📊 Server health: CPU, memory, disk I/O per instance
📊 API latency: p50, p95, p99 per endpoint
📊 Error rates: 4xx, 5xx by endpoint
📊 Database: Query time, connection pool size, replication lag
📊 Cache: Hit rate (target >80%), eviction rate
Alerts:
🚨 P99 latency > 2 seconds
🚨 Error rate > 1%
🚨 Replication lag > 10 seconds
🚨 Cache hit rate < 70%
Tools:
- Prometheus + Grafana: Metrics dashboards
- Jaeger: Distributed tracing
- ELK Stack: Centralized logging
5. Security Considerations
- Rate Limiting: Prevent spam/abuse (300 tweets/hr, 1000 API calls/15min)
- Authentication: JWT tokens with 1hr expiry
- Authorization: Check tweet.user_id === auth.user_id before edit/delete
- Input Validation: Sanitize tweet text, validate URLs
- HTTPS: Encrypt all traffic (TLS 1.3)
- DDoS Protection: CloudFlare/AWS Shield at edge
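The per-user limits above could be enforced with a token bucket; a minimal single-process sketch (limits and names are illustrative, and a real deployment would keep the bucket state in Redis so all servers share it):

```python
import time

class TokenBucket:
    """Token bucket: holds up to `capacity` tokens, refilled at `rate`
    tokens/second. E.g. 300 tweets/hour -> rate=300/3600, capacity=300."""
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1   # spend one token for this request
            return True
        return False           # bucket empty: reject (HTTP 429)
```

Unlike a fixed window, the bucket smooths bursts: a user can spend saved-up capacity at once but cannot exceed the long-run rate.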
Trade-offs to Discuss
Interviewers LOVE when you mention trade-offs without being asked:
Consistency vs Availability (CAP Theorem)
"I chose eventual consistency for timelines because 1-2 second delay is acceptable for availability. For payment systems, I'd choose strong consistency."
Latency vs Consistency
"Caching reduces latency to 10ms but risks showing stale data for 1 minute. Acceptable for social media, not for stock prices."
Storage Cost vs Query Speed
"Denormalizing (storing username in tweet) costs 50 bytes × 50M tweets = 2.5GB extra, but avoids join, saving 100ms per query. Worth it for read-heavy system."
Complexity vs Performance
"Hybrid fan-out adds complexity (2 code paths) but handles both regular users and celebrities efficiently. Simpler fan-out on read would break for Elon Musk tweets."
Common Mistakes to Avoid
❌ Jumping to implementation too quickly
Ask clarifying questions first! "Should we support video tweets?"
❌ Focusing only on happy path
Discuss: What if server crashes? Database is down? User spams API?
❌ Ignoring scale
"Just use Postgres" works for 1K users, not 100M. Always consider scale from step 2.
❌ Over-engineering early
Start simple (monolith), then add complexity (microservices, Kafka) when explaining scale.
❌ Silent drawing
Narrate while drawing: "I'm adding a cache here to reduce database load..."
Sample Questions to Practice
- Beginner: Design URL Shortener (bit.ly), Design Pastebin, Design Rate Limiter
- Intermediate: Design Instagram, Design YouTube, Design Uber, Design WhatsApp
- Advanced: Design Google Search, Design Amazon, Design Netflix, Design Distributed Cache
Practice these with a friend or record yourself. The goal is to speak confidently and demonstrate structured thinking, not memorize solutions.
Key Takeaways
- ✅ Clarify first: Scope the problem before designing
- ✅ Estimate capacity: Use numbers to guide decisions
- ✅ Start simple: Monolith → Load balancer → Caching → Microservices
- ✅ Discuss trade-offs: Every decision has pros/cons
- ✅ Think about failure: What breaks at scale? How to recover?
- ✅ Communicate clearly: Draw, narrate, check understanding
- ✅ No perfect answer: Show thought process, adapt to feedback
Further Resources
- Book: "Designing Data-Intensive Applications" by Martin Kleppmann
- Course: "Grokking the System Design Interview" (educative.io)
- YouTube: System Design Interview channel, Gaurav Sen
- Practice: Use our System Design questions with real-world scenarios
Continue Learning
- Hash Maps: When and Why — Essential for caching layers in system design
- Big O Notation Explained — Understand scalability analysis fundamentals
- STAR Method for Behavioral Interviews — Prepare for the behavioral portion of your interview
- Start Practicing — Apply system design concepts to real scenarios