Design a URL Shortener: The Complete Guide
Master this beginner-friendly system design problem with encoding math and scaling strategies
Why URL Shortener?
URL shorteners (like bit.ly, TinyURL) are a classic "beginner" system design question. Despite being simpler than Twitter or Uber, they test core concepts:
- Encoding algorithms: How to generate short, unique URLs
- Database design: Simple key-value with scale considerations
- Caching: Read-heavy workload optimization
- Analytics: Tracking clicks at scale
This guide shows you how to nail this interview in 30-45 minutes.
Step 1: Clarify Requirements (3 minutes)
Functional Requirements
Core features:
1. Shorten URL: given long URL, return short URL
2. Redirect: given short URL, redirect to original
3. Custom aliases: user can choose their own short code (optional)
4. Expiration: URLs can expire after N days (optional)
Usually OUT of scope:
- User accounts and authentication
- URL editing after creation
- Preview page before redirect
- QR code generation
Non-Functional Requirements
Scale:
- 100M URLs created per day
- Read:write ratio = 100:1 (10B redirects/day)
- URLs stored for 5 years
Performance:
- Redirect latency < 100ms
- URL creation < 500ms
- 99.9% availability
Constraints:
- Short URL as short as possible (7 characters ideal)
- No offensive words in generated URLs
Step 2: Capacity Estimation (5 minutes)
Let's do the math to guide our design decisions.
Traffic
Writes (URL creation):
- 100M URLs/day
- 100M / 86,400 sec = ~1,160 URLs/second
- Peak (3x): ~3,500 URLs/second
Reads (Redirects):
- 100:1 ratio → 10B redirects/day
- 10B / 86,400 = ~116,000 redirects/second
- Peak: ~350,000 redirects/second
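These figures follow from a few lines of arithmetic, using the assumed 100M writes/day, 100:1 read:write ratio, and a 3x peak factor:

```python
# Back-of-envelope traffic estimates from the stated assumptions.
SECONDS_PER_DAY = 86_400

writes_per_day = 100_000_000
write_qps = writes_per_day // SECONDS_PER_DAY    # ~1,160 URLs/second
peak_write_qps = write_qps * 3                   # ~3,500 at 3x peak

reads_per_day = writes_per_day * 100             # 100:1 read:write ratio
read_qps = reads_per_day // SECONDS_PER_DAY      # ~116,000 redirects/second
peak_read_qps = read_qps * 3                     # ~350,000 at peak
```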
Key insight: Extremely read-heavy (100:1)
Strategy: Aggressive caching, optimize for reads
Storage
Per URL record:
- Short code: 7 bytes
- Long URL: 500 bytes (average)
- Created timestamp: 8 bytes
- Expiration: 8 bytes
- User ID (optional): 8 bytes
- Total: ~550 bytes per URL
5 years of URLs:
- 100M/day x 365 days x 5 years = 182.5 billion URLs
- 182.5B x 550 bytes = ~100 TB
With replication (3x): ~300 TB
This is large but manageable with a distributed database.
Short URL Length Calculation
How many characters do we need?
Using base62 (a-z, A-Z, 0-9):
- 6 chars: 62^6 = 56.8 billion URLs
- 7 chars: 62^7 = 3.5 trillion URLs
Our 5-year need: 182.5 billion URLs
Answer: 7 characters is plenty (19x headroom)
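A quick sanity check of the length math, assuming the standard 62-character alphabet:

```python
# 62^L grows fast: 6 characters is too few for 5 years, 7 is ample.
five_year_need = 100_000_000 * 365 * 5   # 182.5 billion URLs

capacity_6 = 62 ** 6                     # ~56.8 billion
capacity_7 = 62 ** 7                     # ~3.5 trillion

assert capacity_6 < five_year_need < capacity_7
headroom = capacity_7 / five_year_need   # ~19x
```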
Using base64 (adds + and /):
- 7 chars: 64^7 = 4.4 trillion URLs
- But + and / are URL-unfriendly; stick with base62
Step 3: High-Level Design
┌─────────────────────────────────────────────────────────────────┐
│                             CLIENTS                             │
│               (Browsers, Mobile Apps, API clients)              │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                          LOAD BALANCER                          │
│                (Route by URL path, health checks)               │
└───────────────┬─────────────────────────────────┬───────────────┘
                │                                 │
                ▼                                 ▼
       ┌──────────────────┐              ┌──────────────────┐
       │  Write Service   │              │   Read Service   │
       │                  │              │    (Redirect)    │
       │  POST /shorten   │              │    GET /:code    │
       └────────┬─────────┘              └────────┬─────────┘
                │                                 │
                │        ┌──────────────────┐     │
                │        │      CACHE       │◄────┘
                │        │     (Redis)      │
                │        └────────┬─────────┘
                │                 │ (cache miss)
                │                 │
                ▼                 ▼
┌─────────────────────────────────────────────────────┐
│                      DATABASE                       │
│                (URL mappings table)                 │
└─────────────────────────────────────────────────────┘
Component Responsibilities
1. Write Service (URL Shortening)
- Validate input URL
- Generate short code (or validate custom alias)
- Check for collisions
- Store mapping in database
- Populate cache
- Return short URL
2. Read Service (Redirect)
- Parse short code from URL
- Check cache first
- On miss: query database, populate cache
- Return 301/302 redirect
- (Async) Log analytics event
3. Cache (Redis Cluster)
- Store short_code → long_url mappings
- ~500 bytes per entry, so ~50 MB per 100K URLs (short code + URL)
- Target: 99%+ cache hit rate
4. Database
- Primary storage for all mappings
- Handles custom alias uniqueness
- Stores metadata (created_at, expires_at)
Step 4: Short Code Generation
This is the core algorithm question. There are three main approaches:
Approach 1: Hash + Truncate
Algorithm:
1. Hash the long URL: MD5(long_url) = 128-bit hash
2. Take the first 43 bits (2^43 ≈ 8.8 trillion values)
3. Encode in base62 = 7-8 characters (62^7 ≈ 2^41.7, so 43 bits can occasionally need an 8th character)
Example:
MD5("https://example.com/very/long/path")
= "a1b2c3d4e5f6..."
→ Take 43 bits → base62 → "aB3x9Kp"
Pros:
+ Same URL always generates same short code (idempotent)
+ Stateless (no counter coordination needed)
Cons:
- Collisions! MD5 truncation WILL collide at scale
- Need collision handling (append counter, retry)
- Can't guarantee short code length
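A minimal sketch of this approach. The 41-bit truncation here is an illustrative choice that keeps the output within 7 base62 characters (62^7 ≈ 2^41.7); a production service would still need the collision handling described below:

```python
import hashlib

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def hash_short_code(long_url: str, bits: int = 41) -> str:
    # 1. Hash the long URL (128-bit MD5 digest).
    digest = hashlib.md5(long_url.encode()).digest()
    # 2. Keep only the top `bits` bits of the digest.
    n = int.from_bytes(digest, "big") >> (128 - bits)
    # 3. Encode the truncated value in base62.
    code = ""
    while n:
        n, rem = divmod(n, 62)
        code = ALPHABET[rem] + code
    return code or ALPHABET[0]
```

Because the code is a pure function of the URL, the same input always yields the same output; distinct URLs can still collide after truncation, which is the core weakness of this approach.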
Collision handling:
1. Check if short_code exists in DB
2. If collision: append "1", "2", etc. until unique
3. Or: use a Bloom filter for a fast pre-check
Approach 2: Counter + Base62 (Recommended)
Algorithm:
1. Get next number from distributed counter
2. Encode number in base62 = short code
Example:
Counter = 123456789
Base62(123456789) = "8M0kX"
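Encoding is a standard change-of-base loop. Note the result depends on the alphabet ordering; with digits, then uppercase, then lowercase, 123456789 encodes to "8M0kX" as in the example:

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def base62_encode(n: int) -> str:
    # Repeatedly divide by 62, mapping each remainder to a character.
    if n == 0:
        return ALPHABET[0]
    code = ""
    while n:
        n, rem = divmod(n, 62)
        code = ALPHABET[rem] + code
    return code
```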
Counter options:
- Auto-increment database column
- Redis INCR (atomic, fast)
- Distributed ID generator (Twitter Snowflake)
Pros:
+ No collisions (counter is always unique)
+ Predictable length
+ Simple implementation
Cons:
- Counter is single point of coordination
- Sequential URLs (predictable, security concern?)
- Different URLs get different codes (no deduplication)
Mitigation for predictability:
- Add random salt to counter before encoding
- Use ranges: Server 1 gets 1-1M, Server 2 gets 1M-2M
Approach 3: Pre-generate Keys
Algorithm:
1. Background job pre-generates millions of unique keys
2. Stores unused keys in "available_keys" table
3. On URL creation: pop a key from available pool
Pros:
+ No collision check at creation time
+ Very fast URL creation
+ No coordination needed at request time
Cons:
- Complex: need key generation service
- Wasted keys if URL creation fails
- Need to monitor key pool depletion
Use case: Very high write throughput where counter
coordination is a bottleneck.
Recommendation
For most interviews, go with Counter + Base62. It's simple, has no collisions, and Redis INCR handles 100K+ ops/second. Mention the other approaches to show depth.
Step 5: Database Design
Schema
CREATE TABLE url_mappings (
short_code VARCHAR(10) PRIMARY KEY,
long_url TEXT NOT NULL,
created_at TIMESTAMP DEFAULT NOW(),
expires_at TIMESTAMP,
user_id UUID, -- If supporting user accounts
click_count BIGINT DEFAULT 0 -- Denormalized for fast reads
);
-- Index for cleanup job (expired URLs)
CREATE INDEX idx_expires_at ON url_mappings(expires_at)
WHERE expires_at IS NOT NULL;
-- Index for user's URLs (if user accounts exist)
CREATE INDEX idx_user_id ON url_mappings(user_id)
WHERE user_id IS NOT NULL;
SQL vs NoSQL Decision
For URL Shortener, either works well:
SQL (PostgreSQL):
+ ACID transactions for custom alias uniqueness
+ Familiar, easy to query
+ Good enough scale with sharding
- Sharding adds complexity
NoSQL (DynamoDB, Cassandra):
+ Built for horizontal scaling
+ Simple key-value access pattern fits perfectly
+ Managed (DynamoDB) reduces ops burden
- Eventual consistency (usually fine here)
Recommendation: DynamoDB or similar
- Primary key: short_code (partition key)
- Provisioned capacity for predictable performance
- TTL attribute for auto-expiration
Sharding Strategy (if using SQL)
If you need to shard:
Option 1: Hash-based sharding on short_code
- shard = hash(short_code) % N
- Even distribution
- Works well for random access
Option 2: Range-based sharding
- Shard 1: codes starting with a-m
- Shard 2: codes starting with n-z
- Easy to understand, potential hot spots
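Hash-based routing can be sketched in a few lines. MD5 is used here only as a stable hash, since Python's built-in hash() is randomized per process and would route inconsistently across servers:

```python
import hashlib

def shard_for(short_code: str, num_shards: int) -> int:
    # Stable hash of the code, reduced modulo the shard count.
    digest = hashlib.md5(short_code.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```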
For interview: Hash-based is usually the right answer.
Mention consistent hashing for adding/removing shards.
Step 6: Caching Strategy
Cache is critical for 100:1 read-heavy workload.
What to cache:
- Key: short_code
- Value: long_url
- Size: ~500 bytes per entry
Cache sizing:
- 80/20 rule: 20% of URLs get 80% of traffic
- Cache 20% of URLs = 36 billion URLs
- 36B x 500 bytes = 18 TB
- Too large! Use LRU eviction
Practical cache size:
- Last 30 days of popular URLs
- 100M/day x 30 days x 20% = 600M URLs
- 600M x 500 bytes = 300 GB
- Fits in Redis cluster
Cache pattern: Cache-Aside
Read path:
1. Check Redis for short_code
2. Cache HIT: return long_url, redirect
3. Cache MISS: query database
4. Store in Redis with TTL (24 hours)
5. Return redirect
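The read path can be sketched as follows; plain dicts stand in for Redis and the database here, where a real service would use Redis GET/SETEX (with a TTL) and a DB driver:

```python
from typing import Optional

# In-memory stand-ins: `cache` for Redis, `database` for primary storage.
cache: dict = {}
database = {"aB3x9Kp": "https://example.com/very/long/path"}

def resolve(short_code: str) -> Optional[str]:
    # 1-2. Check the cache first; a hit skips the database entirely.
    if short_code in cache:
        return cache[short_code]
    # 3. Cache miss: fall back to the database.
    long_url = database.get(short_code)
    if long_url is not None:
        # 4. Populate the cache (a real Redis entry would carry a TTL).
        cache[short_code] = long_url
    # 5. Caller issues the 301/302 redirect (or 404 on None).
    return long_url
```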
Write path:
1. Generate short_code
2. Write to database
3. Write to cache (write-through)
4. Return short URL
Cache Performance Target
Without cache:
- 116,000 DB queries/second
- Database would need massive scaling
With 95% cache hit rate:
- 116,000 x 5% = 5,800 DB queries/second
- Much more manageable
With 99% cache hit rate:
- 116,000 x 1% = 1,160 DB queries/second
- Single database instance might handle this
Target: 99%+ cache hit rate
This is achievable because:
- Popular URLs are accessed repeatedly
- URL mappings never change (immutable)
- LRU naturally keeps hot URLs in cache
Step 7: Analytics at Scale
Tracking click analytics (who clicked, when, where) is a common follow-up question.
The Challenge
Naive approach:
- On each redirect, UPDATE click_count in database
- 116,000 updates/second = database dies
Problem:
- Write amplification on read path
- Redirect latency increases
- Database becomes the bottleneck
The Solution: Async Event Processing
Architecture:
┌─────────────┐    ┌─────────────┐    ┌─────────────────┐
│  Redirect   │───▶│    Kafka    │───▶│    Analytics    │
│  Service    │    │  (Events)   │    │    Processor    │
└──────┬──────┘    └─────────────┘    └────────┬────────┘
       │                                       │
       │ (redirect immediately)                ▼
       ▼                              ┌─────────────────┐
     User                             │  ClickHouse /   │
                                      │  TimescaleDB    │
                                      └─────────────────┘
Flow:
1. Redirect service: lookup URL, return 301 redirect
2. Async: publish click event to Kafka
{short_code, timestamp, ip, user_agent, referer}
3. Stream processor: aggregate events
4. Store in analytics database (optimized for time-series)
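The flow above can be simulated in-process; a queue.Queue stands in for the Kafka topic and a background thread for the stream processor, while real code would use a Kafka producer on the redirect path and a separate consumer group:

```python
import queue
import threading
import time

# Stand-ins: `events` for the Kafka topic, `clicks` for the analytics store.
events: queue.Queue = queue.Queue()
clicks: dict = {}

def record_click(short_code: str) -> None:
    # Called on the redirect path: enqueue and return immediately,
    # so the user-facing redirect is never blocked by analytics.
    events.put({"short_code": short_code, "ts": time.time()})

def consume() -> None:
    # Stream processor: aggregates click counts off the hot path.
    while True:
        event = events.get()
        if event is None:  # sentinel used to stop the worker
            break
        clicks[event["short_code"]] = clicks.get(event["short_code"], 0) + 1

worker = threading.Thread(target=consume, daemon=True)
worker.start()
```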
Benefits:
- Redirect latency unaffected (~10ms)
- Analytics scales independently
- Can replay events if needed
- Supports complex queries (by hour, geo, etc.)
Analytics Queries
-- Clicks per hour for a URL
SELECT
date_trunc('hour', clicked_at) as hour,
COUNT(*) as clicks
FROM click_events
WHERE short_code = 'aB3x9Kp'
AND clicked_at > NOW() - INTERVAL '7 days'
GROUP BY hour
ORDER BY hour;
-- Top URLs this week
SELECT
short_code,
COUNT(*) as clicks
FROM click_events
WHERE clicked_at > NOW() - INTERVAL '7 days'
GROUP BY short_code
ORDER BY clicks DESC
LIMIT 100;
-- Geographic distribution
SELECT
country,
COUNT(*) as clicks
FROM click_events
WHERE short_code = 'aB3x9Kp'
GROUP BY country
ORDER BY clicks DESC;
Step 8: Additional Considerations
301 vs 302 Redirect
301 (Permanent Redirect):
- Browser caches the redirect
- Future visits skip your server entirely
- Better for SEO (passes link juice)
- You lose analytics on repeat visits
302 (Temporary Redirect):
- Browser doesn't cache
- Every visit hits your server
- You track all clicks
- Slightly slower for users
Recommendation:
- 302 if analytics are important (most URL shorteners)
- 301 if SEO/performance matters and you don't need analytics
- Make it configurable per URL
Security Considerations
1. Malicious URL Prevention
- Check URLs against Google Safe Browsing API
- Rate limit URL creation per IP
- Require CAPTCHA for anonymous users
2. Preventing Enumeration
- Don't use sequential codes (counter approach vulnerability)
- Add randomness: base62(counter + random_salt)
- Or use random generation with collision check
3. Rate Limiting
- 100 URLs/hour per IP for anonymous
- 1000 URLs/hour for authenticated users
- 429 Too Many Requests with Retry-After header
4. Input Validation
- Validate URL format
- Maximum URL length (2048 chars)
- Block localhost, internal IPs
URL Expiration
Options:
1. TTL in database (DynamoDB native feature)
- Automatic deletion after expires_at
- No cleanup job needed
2. Soft delete
- Keep mapping but return 410 Gone
- Useful for analytics retention
3. Cleanup job
- Periodic job deletes expired URLs
- Run during low-traffic hours
Cache invalidation on expiration:
- Set Redis TTL to match URL expiration
- Or: check expiration on read, delete if expired
API Design
// ===== SHORTEN URL =====
POST /api/v1/shorten
Request:
{
"url": "https://example.com/very/long/path?with=params",
"custom_alias": "my-link", // Optional
"expires_in": 86400 // Optional: seconds until expiration
}
Response (201 Created):
{
"short_url": "https://short.ly/aB3x9Kp",
"short_code": "aB3x9Kp",
"long_url": "https://example.com/very/long/path?with=params",
"expires_at": "2026-02-28T10:30:00Z",
"created_at": "2026-02-27T10:30:00Z"
}
Error (409 Conflict - custom alias taken):
{
"error": "custom_alias_taken",
"message": "The alias 'my-link' is already in use"
}
// ===== REDIRECT =====
GET /:short_code
Response: 302 Redirect
Location: https://example.com/very/long/path?with=params
Error (404 Not Found):
{
"error": "url_not_found",
"message": "Short URL does not exist or has expired"
}
// ===== ANALYTICS =====
GET /api/v1/urls/:short_code/stats
Response:
{
"short_code": "aB3x9Kp",
"total_clicks": 15420,
"clicks_today": 230,
"clicks_by_country": {
"US": 8000,
"UK": 3000,
"DE": 2000
},
"clicks_by_day": [
{"date": "2026-02-27", "clicks": 230},
{"date": "2026-02-26", "clicks": 450}
]
}
Summary Checklist
- Clarify requirements: 100M URLs/day, 100:1 read ratio
- Calculate: 7 chars base62 = 3.5 trillion URLs (enough for 5 years)
- Choose encoding: Counter + Base62 (simple, no collisions)
- Database: DynamoDB or PostgreSQL with short_code as primary key
- Cache: Redis, target 99% hit rate, 300GB for 30 days of hot URLs
- Analytics: Async event streaming to avoid blocking redirects
- Security: Rate limiting, URL validation, enumeration prevention
Common Interview Questions
How do you handle custom aliases?
Check if alias exists in database. If not, use it. If yes, return 409 Conflict. Validate alias format (alphanumeric, 3-20 chars, no offensive words).
What if the same URL is shortened twice?
Two options: (1) Return existing short code (deduplicate) - requires secondary index on long_url, expensive at scale. (2) Create new short code each time - simpler, most services do this.
How do you scale the counter?
Redis INCR handles 100K+ ops/second on single node. For more: pre-allocate ranges to each server (Server 1 gets 1-1M, Server 2 gets 1M-2M). Or use distributed ID generator like Twitter Snowflake.
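The range-allocation idea can be sketched as a counter that leases blocks of IDs; the lease function here is a local stand-in for the central store that would hand out ranges:

```python
# Each app server leases a block of IDs up front and then increments
# locally; coordination happens only once per block, not per request.
BLOCK_SIZE = 1_000_000
_next_block_start = [0]  # stands in for a row in a central store

def lease_block():
    # In production this would be an atomic update against the central
    # store (e.g. a transactional UPDATE, or Redis INCRBY on a key).
    start = _next_block_start[0]
    _next_block_start[0] += BLOCK_SIZE
    return start, start + BLOCK_SIZE

class RangeCounter:
    def __init__(self, lease=lease_block):
        self._lease = lease
        self._next, self._end = lease()

    def next_id(self) -> int:
        if self._next >= self._end:
            # Current block exhausted: lease a fresh one.
            self._next, self._end = self._lease()
        value = self._next
        self._next += 1
        return value
```

Each ID would then be passed through base62 encoding to produce the short code.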
How do you prevent abuse?
Rate limiting by IP (100 URLs/hour anonymous). Check URLs against malware lists. CAPTCHA for suspicious patterns. Block known spam domains. Require email verification for high volume.
Continue Learning
- System Design Interview Framework - The 6-step approach for any system design question
- Design Twitter - A more complex system design problem
- Practice System Design - Test your knowledge with interactive questions