Design Twitter: A Step-by-Step System Design Walkthrough
The complete guide to acing the most common system design interview question
Why Design Twitter?
Twitter (now X) is one of the most commonly asked system design questions in tech interviews. It tests your understanding of:
- Data modeling: Users, tweets, follows, likes
- Scale: 500M tweets/day, 300M users
- Trade-offs: Consistency vs latency for feeds
- Real-time systems: Notifications, trending
This guide walks through a complete interview answer, including the specific questions to ask, calculations to make, and architectures to draw.
Step 1: Clarify Requirements (5 minutes)
Never start designing immediately. Ask questions to scope the problem and show structured thinking.
Functional Requirements
Ask the interviewer:
"What features should I focus on?"
Typical scope for 45-minute interview:
- Post tweets (280 chars, optional media)
- Follow/unfollow users
- Home timeline (tweets from followed users)
- User profile (user's own tweets)
Usually OUT of scope (confirm with interviewer):
- Direct messages
- Search
- Trending topics (may be deep-dive)
- Likes/retweets (simple, mention briefly)
- Notifications (may be deep-dive)

Non-Functional Requirements
Ask the interviewer:
"What scale should I design for?"
Typical assumptions:
- 300M monthly active users (MAU)
- 100M daily active users (DAU)
- Users post 0.5 tweets/day on average
- Users view timeline 10 times/day
- 20% of tweets have media (images/video)
Performance requirements:
- Timeline loads < 500ms
- High availability (99.9%)
- Eventual consistency acceptable for social media
Key insight: Read-heavy system (roughly 20:1 read-to-write ratio)
This drives our architecture decisions.

Pro Tip
Write these requirements on the whiteboard. Reference them when making decisions: "Since we agreed eventual consistency is acceptable, we can use caching aggressively."
Step 2: Capacity Estimation (5 minutes)
Use back-of-envelope calculations to guide architecture decisions. Round aggressively; interviewers want to see your reasoning, not exact math.
Traffic Estimation
Write Traffic (Tweets):
- 100M DAU x 0.5 tweets/day = 50M tweets/day
- 50M / 86,400 sec = ~600 tweets/second
- Peak (3x average) = ~1,800 tweets/second
Read Traffic (Timeline Views):
- 100M DAU x 10 views/day = 1B timeline views/day
- 1B / 86,400 = ~12,000 reads/second
- Peak = ~36,000 reads/second
Ratio: 36,000 reads / 1,800 writes = 20:1 read-heavy
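These estimates are easy to sanity-check in code. A quick sketch, using only the assumptions stated above (unrounded values; the article rounds to ~600, ~1,800, ~12,000, and ~36,000):

```python
# Back-of-envelope capacity estimation, mirroring the numbers above.
DAU = 100_000_000
TWEETS_PER_USER_PER_DAY = 0.5
TIMELINE_VIEWS_PER_USER_PER_DAY = 10
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 3

tweets_per_day = DAU * TWEETS_PER_USER_PER_DAY          # 50M tweets/day
writes_per_sec = tweets_per_day / SECONDS_PER_DAY       # ~600/s
peak_writes = writes_per_sec * PEAK_FACTOR              # ~1,800/s

views_per_day = DAU * TIMELINE_VIEWS_PER_USER_PER_DAY   # 1B views/day
reads_per_sec = views_per_day / SECONDS_PER_DAY         # ~12,000/s
peak_reads = reads_per_sec * PEAK_FACTOR                # ~36,000/s

# Storage: ~500 bytes of text+metadata per tweet; 20% carry ~1MB media
text_gb_per_day = tweets_per_day * 500 / 1e9            # 25 GB/day
media_tb_per_day = tweets_per_day * 0.2 * 1e6 / 1e12    # 10 TB/day

print(f"read:write ratio: {peak_reads / peak_writes:.0f}:1")
```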
Implication: Optimize for reads (caching, pre-computation)

Storage Estimation
Tweet Storage:
- 50M tweets/day
- Tweet size: ~500 bytes (text + metadata)
- 50M x 500 bytes = 25 GB/day text data
- Per year: 25 GB x 365 = ~9 TB/year
Media Storage:
- 20% of tweets have media
- 10M media/day x 1MB average = 10 TB/day
- Per year: 10 TB x 365 = 3.6 PB/year
Key insight: Media dominates storage.
Solution: Store media in blob storage (S3), only store URLs in the database.

Memory Estimation (for Caching)
What to cache: Most recent timeline for active users
- Cache top 100 tweets per user timeline
- 100M users x (100 tweets x 500 bytes) = 5 TB
This is large but feasible with distributed cache (Redis cluster).
In practice: cache only for active users, evict inactive.
Target: ~500GB - 1TB cache cluster.

Step 3: High-Level Design (10 minutes)
Draw the architecture progressively. Start simple, add complexity.
Core Components
┌──────────────────────────────────────────────────────────────────┐
│ CLIENTS │
│ (Web, iOS, Android) │
└──────────────────────┬───────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ CDN │
│ (Static assets, cached media) │
└──────────────────────┬───────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ LOAD BALANCER │
│ (Route requests, health checks) │
└─────────┬──────────────┬──────────────┬──────────────────────────┘
│ │ │
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ API │ │ API │ │ API │
│ Server │ │ Server │ │ Server │
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
└──────────────┼──────────────┘
│
┌──────────────┴──────────────┐
│ │
▼ ▼
┌───────────┐ ┌───────────────┐
│ Redis │ │ Databases │
│ Cache │ │ (See below) │
└───────────┘ └───────────────┘

Database Architecture
Use different databases for different purposes (polyglot persistence):
┌─────────────────────────────────────────────────────────────────┐
│ DATA LAYER │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ PostgreSQL │ │ Cassandra │ │
│ │ │ │ │ │
│ │ - Users table │ │ - Tweets table │ │
│ │ - Follows table │ │ - Timeline table│ │
│ │ (Strong ACID) │ │ (High write │ │
│ │ │ │ throughput) │ │
│ └──────────────────┘ └──────────────────┘ │
│ │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ S3 │ │ Redis │ │
│ │ │ │ │ │
│ │ - Media files │ │ - Timeline cache│ │
│ │ - Images/video │ │ - User cache │ │
│ │ (Blob storage) │ │ - Rate limiting │ │
│ └──────────────────┘ └──────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Why this split:
- PostgreSQL: Users need ACID (unique usernames, email verification)
- Cassandra: Tweets are append-heavy, need horizontal scaling
- S3: Media is large, immutable, needs CDN integration
- Redis: Hot data, sub-millisecond reads for timelines

Step 4: API Design (5 minutes)
Define the key endpoints. Use RESTful conventions.
// ===== TWEET OPERATIONS =====
POST /api/v1/tweets
Request:
{
"text": "Hello world!",
"media_ids": ["uuid1", "uuid2"] // Pre-uploaded to S3
}
Response:
{
"id": "tweet_123abc",
"created_at": "2026-02-27T10:30:00Z",
"user": { "id": "user_456", "username": "alice" }
}
// ===== TIMELINE =====
GET /api/v1/timeline?cursor={cursor}&limit=20
Response:
{
"tweets": [
{
"id": "tweet_123",
"text": "Hello world!",
"user": { "id": "user_456", "username": "alice", "avatar_url": "..." },
"created_at": "2026-02-27T10:30:00Z",
"media_urls": ["https://cdn.example.com/..."],
"like_count": 42,
"retweet_count": 10
}
],
"next_cursor": "base64_encoded_timestamp"
}
// ===== FOLLOW =====
POST /api/v1/users/{user_id}/follow
DELETE /api/v1/users/{user_id}/follow
// ===== MEDIA UPLOAD =====
POST /api/v1/media/upload
Returns: pre-signed S3 URL for direct upload
After upload: returns media_id to attach to tweet

Key Design Decisions
- Cursor pagination: Better than offset for real-time feeds (handles new tweets during pagination)
- Pre-signed URLs: Client uploads directly to S3, reduces server load
- Denormalized user data: Include username in tweet response (no extra lookup)
- Rate limiting: 300 tweets/hour, 1000 API calls/15min (mention in headers)
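Cursor pagination can be as simple as base64-encoding the position of the last tweet returned. A minimal sketch (the field names are illustrative, not a fixed format):

```python
import base64
import json

def encode_cursor(last_created_at: str, last_tweet_id: str) -> str:
    """Pack the position of the last returned tweet into an opaque cursor."""
    raw = json.dumps({"created_at": last_created_at, "id": last_tweet_id})
    return base64.urlsafe_b64encode(raw.encode()).decode()

def decode_cursor(cursor: str) -> dict:
    """Recover the (created_at, id) position to resume from."""
    return json.loads(base64.urlsafe_b64decode(cursor.encode()).decode())

cursor = encode_cursor("2026-02-27T10:30:00Z", "tweet_123")
pos = decode_cursor(cursor)
# The next page query becomes: WHERE (created_at, id) < (:created_at, :id)
```

Because the cursor pins a position rather than an offset, tweets posted mid-pagination cannot shift the pages underneath the reader.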
Step 5: Deep Dive - Timeline Generation
This is the most interesting part of Twitter's architecture and where interviewers spend the most time. The core question: When a user posts a tweet, how do their followers see it?
Approach 1: Fan-out on Write (Push Model)
When user posts tweet:
1. Write tweet to Tweets table
2. Query all followers (could be millions)
3. Insert tweet_id into each follower's timeline
User posts → Write to 10,000 follower timelines
┌──────────┐
│ Tweet │
│ Posted │
└────┬─────┘
│
▼
┌──────────────────────────────────────────┐
│ Fan-out Worker │
│ For each follower: INSERT into timeline │
└───┬───────────────────────────────────────┘
│
├──► Timeline_user_1: [tweet_id, ...]
├──► Timeline_user_2: [tweet_id, ...]
├──► Timeline_user_3: [tweet_id, ...]
└──► ... (thousands of writes)
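A minimal sketch of such a fan-out worker, with in-memory dicts standing in for the Follows table and the Cassandra timeline table (both stand-ins are hypothetical; in production this runs asynchronously off a queue, in batches):

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the real stores.
followers_by_user = {"alice": ["bob", "carol", "dave"]}
timelines = defaultdict(list)  # user_id -> tweet_ids, newest first

def fan_out_on_write(author_id: str, tweet_id: str) -> int:
    """Push a new tweet_id into every follower's pre-computed timeline.

    Returns the number of timeline writes performed -- this is the
    'write amplification' the cons list warns about.
    """
    followers = followers_by_user.get(author_id, [])
    for follower_id in followers:
        timelines[follower_id].insert(0, tweet_id)
    return len(followers)

writes = fan_out_on_write("alice", "tweet_001")
```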
Pros:
+ Timeline read is fast (already pre-computed)
+ Simple read path: SELECT from user's timeline
Cons:
- Celebrity with 10M followers = 10M writes per tweet
- High write amplification
- Delay in tweet appearing for followersApproach 2: Fan-out on Read (Pull Model)
When user views timeline:
1. Query list of followed users
2. For each followed user, fetch recent tweets
3. Merge and sort by timestamp
4. Return top N tweets
User requests timeline → Query N users' tweets → Merge
┌──────────┐
│ Read │
│ Timeline │
└────┬─────┘
│
▼
┌─────────────────────────────────────────────┐
│ Get followed users (100 users) │
└────┬────────────────────────────────────────┘
│
├──► Fetch tweets from user_1
├──► Fetch tweets from user_2
├──► Fetch tweets from user_3
└──► ... (100 queries, parallelized)
│
▼
┌──────────────────────────────────────────────┐
│ Merge all tweets, sort by time, return 20 │
└──────────────────────────────────────────────┘
Pros:
+ Tweet post is fast (1 write)
+ No write amplification
+ Fresh data always
Cons:
- Read is slow (N queries + merge)
- Complex read path
- Latency increases with following count

The Hybrid Approach (What Twitter Actually Uses)
Insight: 99% of users have < 10K followers. 1% are celebrities.
Strategy:
- Regular users (< 10K followers): Fan-out on WRITE
→ Pre-compute timelines, fast reads
- Celebrities (> 10K followers): Fan-out on READ
→ Don't fan-out, merge at read time
At read time:
1. Fetch pre-computed timeline from cache
2. Query tweets from followed celebrities
3. Merge both, sort by time
4. Return combined feed
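The read-time merge is essentially a k-way merge by timestamp. A sketch with illustrative data shapes, where each entry is a (timestamp, tweet_id) pair sorted newest-first:

```python
import heapq
from itertools import islice

def merge_timeline(precomputed, celebrity_feeds, limit=20):
    """Merge the cached timeline with on-demand celebrity feeds.

    heapq.merge performs a lazy k-way merge over already-sorted inputs;
    reverse=True preserves the newest-first ordering.
    """
    merged = heapq.merge(precomputed, *celebrity_feeds, reverse=True)
    return [tweet_id for _, tweet_id in islice(merged, limit)]

cached = [(105, "t5"), (101, "t1")]   # pre-computed timeline from Redis
celeb_a = [(104, "t4"), (100, "t0")]  # celebrity feed fetched on demand
celeb_b = [(103, "t3")]
feed = merge_timeline(cached, [celeb_a, celeb_b], limit=3)
# feed == ["t5", "t4", "t3"]
```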
┌──────────────────────────────────────────────────────────────────┐
│ TIMELINE READ │
├──────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ ┌─────────────────────────────────┐ │
│ │ Pre-computed │ │ Celebrity tweets (fetched │ │
│ │ timeline from │ + │ on-demand from their feeds) │ │
│ │ Redis cache │ │ Elon, Taylor, etc. │ │
│ └────────┬────────┘ └──────────────┬──────────────────┘ │
│ │ │ │
│ └──────────────┬──────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────┐ │
│ │ Merge & Sort │ │
│ │ by timestamp │ │
│ └───────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────┐ │
│ │ Return top 20 │ │
│ │ tweets to user │ │
│ └───────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────┘
Why this works:
- Most timelines = cache read + merge ~3 celebrity feeds
- Cache hit = 10ms, merge = 20ms, total < 50ms
- Celebrity posting: no fan-out delay

Interview Insight
Mention that the threshold (10K followers) is tunable based on monitoring. Twitter has adjusted this over time. Show you understand it's a pragmatic engineering decision, not a fixed rule.
Step 6: Deep Dive - Database Schema
Users Table (PostgreSQL)
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
username VARCHAR(15) UNIQUE NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL,
display_name VARCHAR(50),
bio TEXT,
avatar_url TEXT,
followers_count INT DEFAULT 0,
following_count INT DEFAULT 0,
is_celebrity BOOLEAN DEFAULT FALSE, -- For hybrid fan-out
created_at TIMESTAMP DEFAULT NOW()
);
-- No separate index needed on username: the UNIQUE constraint
-- already creates one.
-- Why PostgreSQL: Strong consistency for unique username check,
-- ACID for account operations, complex queries for user search

Follows Table (PostgreSQL)
CREATE TABLE follows (
follower_id UUID REFERENCES users(id),
followee_id UUID REFERENCES users(id),
created_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (follower_id, followee_id)
);
-- Get who I follow (for timeline generation)
CREATE INDEX idx_follows_follower ON follows(follower_id);
-- Get my followers (for fan-out on write)
CREATE INDEX idx_follows_followee ON follows(followee_id);
-- Trigger to update follower/following counts
-- (or handle in application layer)

Tweets Table (Cassandra)
// Cassandra schema - optimized for write throughput
// Partition key: user_id (all tweets by user together)
// Clustering key: created_at DESC (recent tweets first)
CREATE TABLE tweets (
user_id UUID,
tweet_id TIMEUUID, -- Time-based UUID, auto-sorted
text TEXT,
media_urls LIST<TEXT>,
created_at TIMESTAMP,
PRIMARY KEY (user_id, tweet_id)
) WITH CLUSTERING ORDER BY (tweet_id DESC);
// Counters live in a separate table: Cassandra does not allow
// counter columns to mix with regular columns.
CREATE TABLE tweet_counts (
tweet_id TIMEUUID PRIMARY KEY,
like_count COUNTER,
retweet_count COUNTER
);
// Query: Get recent tweets by user (for profile page)
SELECT * FROM tweets WHERE user_id = ? LIMIT 20;
// Why Cassandra:
// - 50M tweets/day needs horizontal scaling
// - Time-series pattern (recent tweets = hot data)
// - Eventual consistency acceptable
// - Counters for like/retweet counts

Timeline Table (Cassandra)
// Pre-computed timelines from fan-out on write
CREATE TABLE timeline (
user_id UUID, -- Owner of this timeline
tweet_id TIMEUUID, -- Tweet in their feed
author_id UUID, -- Who posted it
PRIMARY KEY (user_id, tweet_id)
) WITH CLUSTERING ORDER BY (tweet_id DESC)
AND default_time_to_live = 604800; -- 7 day TTL
// Query: Get timeline for user
SELECT * FROM timeline WHERE user_id = ? LIMIT 20;
// TTL ensures old timeline entries auto-delete
// (user won't scroll back 7 days anyway)

Step 7: Scaling and Reliability
Caching Strategy
Cache Layer (Redis Cluster):
1. Timeline Cache
- Key: timeline:{user_id}
- Value: Sorted Set of (tweet_id, timestamp)
- TTL: 5 minutes (refresh on read)
- Size: ~50KB per user (1000 tweet IDs)
2. User Cache
- Key: user:{user_id}
- Value: JSON of user profile
- TTL: 1 hour
- Invalidate on profile update
3. Tweet Cache
- Key: tweet:{tweet_id}
- Value: JSON of tweet + embedded user info
- TTL: 24 hours (tweets are immutable)
Cache-aside pattern:
1. Check cache
2. On miss: query database
3. Store in cache with TTL
4. Return to client
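The cache-aside read path maps directly to code. A sketch with a plain dict standing in for Redis and a stub `fetch_from_db` standing in for the Cassandra query (both stand-ins are hypothetical):

```python
import time

cache = {}  # stand-in for Redis: key -> (value, expires_at)
TTL_SECONDS = 300  # 5 minutes, per the timeline cache policy above

def fetch_from_db(user_id: str) -> list:
    """Stub for the Cassandra timeline query."""
    return [f"tweet_for_{user_id}"]

def get_timeline(user_id: str) -> list:
    key = f"timeline:{user_id}"
    entry = cache.get(key)
    if entry and entry[1] > time.time():             # 1. check cache
        return entry[0]                              #    hit: serve from cache
    value = fetch_from_db(user_id)                   # 2. miss: query database
    cache[key] = (value, time.time() + TTL_SECONDS)  # 3. store with TTL
    return value                                     # 4. return to client
```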
Target: 95%+ cache hit rate for timeline reads

Handling Failures
What could go wrong?
1. Database failure
- Replication: Primary + 2 replicas per shard
- Automatic failover with consensus
- Timeline reads served from cache during outage
2. Cache failure
- Redis cluster with replicas
- Graceful degradation: fall back to database
- Circuit breaker to prevent cascade
3. Celebrity tweet goes viral
- Rate limit fan-out worker
- Queue backpressure
- Prioritize recent followers
4. DDoS attack
- CDN-level rate limiting (CloudFlare)
- Application-level rate limiting (Redis)
- Bot detection and CAPTCHA

Monitoring
Key metrics to track:
Latency:
- Timeline P50, P95, P99 latency
- Tweet post latency
- Fan-out completion time
Throughput:
- Tweets per second
- Timeline reads per second
- Cache hit rate (target: >95%)
Errors:
- Failed tweet posts
- Timeout rate
- 5xx error rate
Infrastructure:
- Database replication lag
- Cache memory usage
- Queue depth for fan-out workers
Alerting thresholds:
- P99 latency > 500ms
- Error rate > 0.1%
- Cache hit rate < 90%
- Replication lag > 10 seconds

Common Interview Questions
How do you handle a tweet from Elon Musk?
Fan-out on read. Don't push to 150M followers. When users load timeline, merge their pre-computed timeline with a real-time query of followed celebrities. Cache celebrity tweets aggressively.
What if a user follows 5,000 accounts?
Their pre-computed timeline still works (receives fan-out from followed non-celebrities). At read time, merge with ~50 celebrity feeds they follow. 50 parallel queries with caching is fast.
How do you implement trending topics?
Stream processing (Kafka + Flink). Count hashtags in 5-minute sliding windows. Apply decay function for recency. Normalize against baseline to avoid always-popular topics. Geographic partitioning for local trends.
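The real pipeline runs in Flink over Kafka, but the sliding-window idea fits in a few lines. A toy sketch (no decay or baseline normalization):

```python
from collections import Counter, deque

WINDOW_SECONDS = 300  # 5-minute window

events = deque()   # (timestamp, hashtag), oldest first
counts = Counter() # hashtag -> count within the window

def record(timestamp: float, hashtag: str) -> None:
    """Count a hashtag occurrence and evict events older than the window."""
    events.append((timestamp, hashtag))
    counts[hashtag] += 1
    while events and events[0][0] < timestamp - WINDOW_SECONDS:
        _, old_tag = events.popleft()
        counts[old_tag] -= 1
        if counts[old_tag] == 0:
            del counts[old_tag]

def trending(n: int = 10) -> list:
    return [tag for tag, _ in counts.most_common(n)]

record(0, "#ai"); record(10, "#ai"); record(20, "#rust")
record(400, "#rust")  # evicts everything older than t=100
```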
How do you handle duplicate tweets (spam)?
Content hash comparison, rate limiting per user, ML-based spam detection on write path. For retweets, store reference to original rather than duplicating content.
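Content-hash comparison is nearly a one-liner with hashlib. A sketch, where the normalization rules and the in-memory set (Redis with a TTL in production) are illustrative assumptions:

```python
import hashlib

seen_hashes = set()  # in production: a Redis set with a TTL per user

def is_duplicate(user_id: str, text: str) -> bool:
    """Flag a tweet if this user recently posted identical (normalized) text."""
    normalized = " ".join(text.lower().split())  # collapse case and whitespace
    digest = hashlib.sha256(f"{user_id}:{normalized}".encode()).hexdigest()
    if digest in seen_hashes:
        return True
    seen_hashes.add(digest)
    return False
```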
Summary Checklist
- Ask clarifying questions (scope, scale, priorities)
- Calculate: 50M tweets/day, 1B timeline reads/day, 20:1 read-heavy
- Draw: CDN, LB, API servers, cache, databases (polyglot)
- Explain hybrid fan-out: push for regular users, pull for celebrities
- Schema: Users/Follows in PostgreSQL, Tweets/Timeline in Cassandra
- Caching: Redis for timelines, 95%+ hit rate target
- Handle edge cases: viral tweets, failures, spam
- Discuss trade-offs at every decision point
Continue Learning
- System Design Interview Framework - The 6-step approach for any system design question
- Design URL Shortener - Another classic system design problem
- Practice System Design - Test your knowledge with interactive questions