System Design · March 3, 2026 · 13 min read
Design Spotify: Music Streaming System Design Guide
Design audio streaming, recommendations, and playlist management for 500M+ users
Why Design Spotify?
Designing Spotify tests challenges that typical web applications do not: continuous media streaming, recommendation algorithms, and offline-first mobile design. The question is increasingly common in interviews at Spotify, Apple, and Amazon Music, and in general system design rounds.
- Audio streaming: Adaptive bitrate, buffer management, gapless playback
- Recommendations: Discover Weekly, Daily Mix, Radio — personalized at scale
- Content delivery: 100M+ songs, served globally with low latency
- Offline mode: Download management, DRM, storage optimization
Step 1: Requirements
Functional Requirements
Core features:
- Search for songs, artists, albums, playlists
- Stream audio in real-time
- Create and manage playlists
- Follow artists and users
- Personalized recommendations (home feed, Discover Weekly)
Out of scope:
- Podcasts
- Social features (group sessions)
- Artist upload portal
- Ads system (free tier)
Non-Functional Requirements
Scale:
- 500M monthly active users
- 200M concurrent streams during peak
- 100M+ song catalog
- Average song: 3.5 minutes, ~8.4 MB at 320 kbps (rounded up to 10 MB for estimates)
Performance:
- Song playback starts within 200ms of pressing play
- Gapless playback between songs
- Search results in < 300ms
- 99.9% availability
Key insight: This is a READ-heavy streaming system.
Audio files are immutable, which makes them ideal for aggressive caching.
Step 2: Capacity Estimation
Storage:
- 100M songs x 10MB average = 1 PB of audio files
- Multiple quality levels (96, 160, 320 kbps): file size scales with bitrate, so roughly 0.3 + 0.5 + 1 = ~1.8 PB total (budget ~2 PB before replication)
- Metadata: 100M songs x 1KB = 100 GB
Bandwidth:
- 200M concurrent streams x 320kbps = 64 Tbps peak
- This is enormous — CDN is absolutely essential
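As a rough check, the storage and bandwidth estimates above can be reproduced in a few lines of Python. The constants are the article's own assumptions and the function names are illustrative. Scaling the 10 MB figure by bitrate puts the three-quality catalog near 1.8 PB, so a flat 3x multiple of the 320 kbps figure is a conservative upper bound.

```python
# Back-of-the-envelope capacity check. All constants are the article's assumptions.
SONGS = 100_000_000             # catalog size
SONG_MB_AT_320 = 10             # rounded average size of one song at 320 kbps
BITRATES_KBPS = (96, 160, 320)  # quality levels stored per song
PEAK_STREAMS = 200_000_000      # concurrent streams at peak


def catalog_petabytes(bitrates=BITRATES_KBPS) -> float:
    """Total audio storage in PB, scaling file size linearly with bitrate."""
    mb_per_song = sum(SONG_MB_AT_320 * b / 320 for b in bitrates)
    return SONGS * mb_per_song * 1e6 / 1e15


def peak_egress_tbps(bitrate_kbps: int = 320) -> float:
    """Worst-case egress in Tbps if every peak stream uses the given bitrate."""
    return PEAK_STREAMS * bitrate_kbps * 1e3 / 1e12


if __name__ == "__main__":
    print(f"320 kbps only:      {catalog_petabytes((320,)):.1f} PB")
    print(f"all three bitrates: {catalog_petabytes():.1f} PB")
    print(f"peak egress:        {peak_egress_tbps():.0f} Tbps")
```

The egress figure assumes every listener streams at the highest quality from the network at the same time, so it is an upper bound; client caching and lower bitrates on mobile would reduce it.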
Daily streams:
- 500M users x average 30 min/day = 250M hours of audio/day
- ~4B individual song plays per day
Step 3: High-Level Architecture
┌─────────────────────────────────────────────────────────────┐
│                         CLIENT APPS                         │
│                (iOS, Android, Web, Desktop)                 │
│        Local cache, playback engine, offline storage        │
└───────────────────────┬─────────────────────────────────────┘
                        │
            ┌───────────┼───────────┐
            │           │           │
            ▼           ▼           ▼
      ┌──────────┐  ┌────────┐  ┌──────────┐
      │   CDN    │  │  API   │  │  Search  │
      │ (audio)  │  │Gateway │  │ Service  │
      └──────────┘  └───┬────┘  └──────────┘
                        │
          ┌─────────────┼─────────────┐
          │             │             │
          ▼             ▼             ▼
     ┌──────────┐  ┌──────────┐  ┌──────────┐
     │ Catalog  │  │ Playlist │  │   Reco   │
     │ Service  │  │ Service  │  │  Engine  │
     └────┬─────┘  └────┬─────┘  └────┬─────┘
          │             │             │
          ▼             ▼             ▼
     ┌──────────┐  ┌──────────┐  ┌──────────┐
     │ Metadata │  │ Playlist │  │    ML    │
     │    DB    │  │    DB    │  │ Pipeline │
     └──────────┘  └──────────┘  └──────────┘
Step 4: Deep Dive - Audio Streaming
Audio streaming is NOT like video streaming (no visual seeking).
Key differences from video:
- Audio files are small (3-10MB vs 1-10GB for video)
- Gapless playback matters (no buffering between songs)
- Bitrate adaptation is simpler (3 levels vs many for video)
Streaming flow:
1. Client requests song → API returns CDN URL + metadata
2. Client starts HTTP range request to CDN
3. CDN serves from edge cache (cache hit rate: 95%+)
4. Client buffers 10-30 seconds ahead
5. Adaptive bitrate: switch quality based on connection speed
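Step 5 of the flow can be sketched as a small selection function. This is a minimal sketch, not Spotify's actual algorithm: the 0.8 safety factor, the 5-second low-buffer threshold, and the name `pick_bitrate` are all assumptions made for illustration.

```python
BITRATES_KBPS = (96, 160, 320)  # the three quality levels from the article


def pick_bitrate(throughput_kbps: float, buffer_s: float,
                 safety: float = 0.8, low_buffer_s: float = 5.0) -> int:
    """Choose a bitrate from measured throughput and current buffer depth.

    - If the buffer is nearly empty, fall back to the lowest quality so
      playback does not stall.
    - Otherwise pick the highest bitrate that fits within a safety
      fraction of the measured throughput.
    """
    lowest = min(BITRATES_KBPS)
    if buffer_s < low_buffer_s:
        return lowest
    budget = throughput_kbps * safety
    affordable = [b for b in BITRATES_KBPS if b <= budget]
    return max(affordable) if affordable else lowest


if __name__ == "__main__":
    for throughput, buffered in [(1000, 20), (250, 20), (50, 20), (1000, 2)]:
        print(throughput, buffered, pick_bitrate(throughput, buffered))
```

Because there are only three levels, a simple threshold rule like this is usually enough; video players need more elaborate logic because they juggle many more renditions.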
File format:
- Ogg Vorbis (Spotify's primary) at 96/160/320 kbps
- Files pre-encoded and stored at all quality levels
- Each song stored as ~10 second chunks for efficient seeking
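To make the chunking idea concrete, here is a hedged sketch of how a client might map a seek position to the chunks it needs, assuming fixed 10-second chunks and a 30-second buffer-ahead target. The helper names are hypothetical, and a real player would also have to handle variable-length final chunks and container framing.

```python
import math

CHUNK_SECONDS = 10  # assumed fixed chunk length, per the article's "~10 second chunks"


def chunk_index(position_s: float) -> int:
    """Index of the chunk that contains the given playback position."""
    return int(position_s // CHUNK_SECONDS)


def chunks_to_fetch(position_s: float, duration_s: float,
                    buffer_ahead_s: float = 30.0) -> list[int]:
    """Chunk indices covering [position, position + buffer_ahead], clamped to the song."""
    last_chunk = math.ceil(duration_s / CHUNK_SECONDS) - 1
    first = chunk_index(position_s)
    end_s = min(position_s + buffer_ahead_s, duration_s)
    last = min(chunk_index(end_s), last_chunk)
    return list(range(first, last + 1))


if __name__ == "__main__":
    # Seeking to 1:35 in a 3.5-minute (210 s) song.
    print(chunks_to_fetch(95, 210))
```

Seeking then costs one index computation plus a fetch of the target chunk, rather than a scan through the file.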
Gapless playback:
- While current song plays, pre-fetch next song's first chunk
- Start decoding next song before current ends
- Crossfade or gap removal at the audio decoder level
Client-side caching:
- LRU cache of recently played songs (configurable size)
- Downloaded songs for offline (encrypted with device key)
- Reduces CDN bandwidth by ~30%
Step 5: Deep Dive - Search
Search across 100M songs, 10M artists, billions of playlists.
Architecture:
- Elasticsearch cluster for full-text search
- Separate indexes: songs, artists, albums, playlists, users
- Custom ranking that combines text relevance + popularity
Search ranking signals:
1. Text match quality (exact > prefix > fuzzy)
2. Popularity (stream count, follower count)
3. Recency (newer releases boosted slightly)
4. Personalization (artists you listen to ranked higher)
5. Region (local artists boosted in their country)
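One simple way to combine the five signals is a weighted sum. The sketch below assumes hand-picked weights and normalizations (log-scaled stream counts, exponential recency decay); none of these numbers come from Spotify, and a production system would more likely learn the ranking function from click data.

```python
import math
from dataclasses import dataclass

# Hypothetical scores and weights, chosen only to illustrate the idea.
MATCH_SCORE = {"exact": 1.0, "prefix": 0.7, "fuzzy": 0.4}
WEIGHTS = {"text": 0.5, "popularity": 0.25, "recency": 0.05,
           "personal": 0.15, "region": 0.05}


@dataclass
class Candidate:
    name: str
    match_type: str               # "exact", "prefix", or "fuzzy"
    stream_count: int
    days_since_release: int
    user_listens_to_artist: bool
    artist_country: str


def score(c: Candidate, user_country: str) -> float:
    """Weighted sum of the five ranking signals, each normalized to [0, 1]."""
    signals = {
        "text": MATCH_SCORE[c.match_type],
        "popularity": min(1.0, math.log10(1 + c.stream_count) / 10),
        "recency": math.exp(-c.days_since_release / 365),
        "personal": 1.0 if c.user_listens_to_artist else 0.0,
        "region": 1.0 if c.artist_country == user_country else 0.0,
    }
    return sum(WEIGHTS[name] * value for name, value in signals.items())


def rank(candidates: list[Candidate], user_country: str) -> list[Candidate]:
    """Return candidates ordered from best to worst score."""
    return sorted(candidates, key=lambda c: score(c, user_country), reverse=True)
```

With these weights, text match dominates: an exact match on a hugely popular track still beats a fuzzy match from a local artist the user follows, while personalization and region act as tie-breakers between results of similar text quality.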
Typeahead / autocomplete:
- Trie-based index for prefix matching
- Updated every few hours from search logs
- Top 5-10 suggestions in < 50ms
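A minimal trie for prefix suggestions might look like the following. It is a sketch under simplifying assumptions: queries are lowercased, popularity is a plain counter taken from search logs, and suggestions are collected by walking the subtree at query time. To reliably hit a sub-50ms budget at scale, a real service would more likely precompute the top suggestions at each node.

```python
class TrieNode:
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}  # char -> TrieNode
        self.count = 0      # > 0 marks the end of a stored query


class Typeahead:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, query: str, count: int = 1) -> None:
        """Add a query (or bump its popularity) from aggregated search logs."""
        node = self.root
        for ch in query.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.count += count

    def suggest(self, prefix: str, k: int = 5) -> list[str]:
        """Return up to k stored queries starting with prefix, most popular first."""
        prefix = prefix.lower()
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        found = []
        stack = [(node, prefix)]
        while stack:  # iterative depth-first walk of the subtree
            current, text = stack.pop()
            if current.count:
                found.append((current.count, text))
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        found.sort(key=lambda pair: (-pair[0], pair[1]))
        return [text for _, text in found[:k]]
```

Rebuilding the trie offline every few hours and swapping it in atomically keeps the read path lock-free.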
Handling misspellings:
- Levenshtein distance for fuzzy matching
- n-gram tokenization (break "metallica" into "met", "eta", "tal"...)
- Phonetic matching (Soundex/Metaphone) for name searches
Step 6: Recommendation Engine
Spotify's recommendations power Discover Weekly, Daily Mix, and Radio.
Three approaches combined:
1. Collaborative Filtering
"Users who liked X also liked Y"
- Build user-song matrix (500M users x 100M songs — sparse)
- Matrix factorization (ALS algorithm) to find latent features
- Find similar users, recommend their top songs
2. Content-Based Filtering
"Songs that sound like what you play"
- Audio features: tempo, key, energy, danceability, valence
- Extracted via ML models from raw audio
- Recommend songs with similar audio feature vectors
3. Natural Language Processing
"What people say about this music"
- Crawl blogs, reviews, social media for music descriptions
- Build word vectors for each song/artist
- Match user taste profile to song descriptions
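As an illustration of the content-based approach (approach 2), the sketch below averages the feature vectors of songs a user played into a taste profile and ranks unseen songs by cosine similarity. It assumes each feature has already been normalized to the 0 to 1 range; the function names and sample vectors are made up for the example.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def taste_profile(played: list[list[float]]) -> list[float]:
    """Average the feature vectors of the songs a user has played."""
    return [sum(column) / len(played) for column in zip(*played)]


def recommend(profile: list[float], catalog: dict[str, list[float]],
              exclude: set[str], k: int = 10) -> list[str]:
    """Top-k catalog songs by similarity to the profile, skipping known songs."""
    scored = [(cosine(profile, features), song_id)
              for song_id, features in catalog.items() if song_id not in exclude]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [song_id for _, song_id in scored[:k]]


if __name__ == "__main__":
    # Features here are (energy, danceability, acousticness), all hypothetical.
    profile = taste_profile([[0.9, 0.8, 0.1], [0.7, 0.9, 0.2]])
    catalog = {"high_energy": [0.8, 0.9, 0.1],
               "mellow": [0.1, 0.2, 0.9],
               "known": [0.9, 0.8, 0.1]}
    print(recommend(profile, catalog, exclude={"known"}, k=2))
```

In a combined system, this similarity would be one input alongside the collaborative-filtering and NLP scores rather than the final ranking.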
Discover Weekly pipeline:
1. Run collaborative filtering weekly (batch job)
2. Filter: remove songs user already knows
3. Filter: ensure genre diversity
4. Rank by predicted listen probability
5. Generate 30 songs per user
6. Cache results, serve Monday morning
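Steps 2 through 5 of the pipeline can be sketched as a single pass over scored candidates. This is an assumption-laden simplification: candidates arrive as `(track_id, genre, predicted_listen_probability)` tuples from the batch job, and genre diversity is enforced with a simple per-genre cap, which is only one of several possible diversity rules.

```python
from collections import Counter


def build_discover_weekly(candidates, known_track_ids,
                          size: int = 30, max_per_genre: int = 10) -> list[str]:
    """Filter known tracks, rank by predicted listen probability, cap each genre."""
    unknown = [c for c in candidates if c[0] not in known_track_ids]
    ranked = sorted(unknown, key=lambda c: c[2], reverse=True)

    per_genre = Counter()
    playlist = []
    for track_id, genre, _probability in ranked:
        if per_genre[genre] >= max_per_genre:
            continue  # genre already well represented, keep the list diverse
        playlist.append(track_id)
        per_genre[genre] += 1
        if len(playlist) == size:
            break
    return playlist


if __name__ == "__main__":
    candidates = [("t1", "rock", 0.9), ("t2", "rock", 0.8), ("t3", "rock", 0.7),
                  ("t4", "jazz", 0.6), ("t5", "pop", 0.95)]
    print(build_discover_weekly(candidates, {"t5"}, size=3, max_per_genre=2))
```

Since the output is computed once per user per week, the result can be written to a key-value store and served with a single lookup on Monday.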
Infrastructure:
- Apache Spark for batch recommendation jobs
- Feature store (user listening history, song features)
- A/B testing framework for algorithm improvements
Step 7: Playlist Service
Playlists are surprisingly complex at scale:
- Billions of playlists
- Collaborative playlists (multiple editors)
- Ordering, deduplication, version history
Data model:
- Playlist metadata: id, name, owner, description, cover_image
- Playlist tracks: ordered list of (track_id, added_by, added_at)
- Store as ordered array in database
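The data model above might be expressed as the following dataclasses. The field names follow the article; the `version` counter and the `add_track` and `move_track` helpers are assumptions added to show how ordering and version history could hang together.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class PlaylistTrack:
    track_id: str
    added_by: str
    added_at: datetime


@dataclass
class Playlist:
    id: str
    name: str
    owner: str
    description: str = ""
    cover_image: str = ""
    version: int = 0  # hypothetical: bumped on every edit to support history and undo
    tracks: list[PlaylistTrack] = field(default_factory=list)  # ordered

    def add_track(self, track_id: str, added_by: str,
                  position: Optional[int] = None) -> None:
        """Append a track, or insert it at the given position."""
        entry = PlaylistTrack(track_id, added_by, datetime.now(timezone.utc))
        if position is None:
            self.tracks.append(entry)
        else:
            self.tracks.insert(position, entry)
        self.version += 1

    def move_track(self, from_index: int, to_index: int) -> None:
        """Reorder by removing a track and reinserting it at the new index."""
        self.tracks.insert(to_index, self.tracks.pop(from_index))
        self.version += 1
```

Storing the whole ordered array in one row keeps reads cheap; the trade-off is that every edit rewrites the array, which matters for very long playlists.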
Collaborative playlists:
- Operational Transform or CRDT for concurrent edits
- In practice: last-write-wins per edit is often sufficient, because truly simultaneous conflicting edits are rare
- Version history allows undo
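To show what last-write-wins merging can look like, here is a sketch of an LWW element set, one of the simplest CRDTs. It tracks only which tracks are in the playlist, not their order, and it assumes every edit carries a comparable timestamp; ordered-list CRDTs and Operational Transform are considerably more involved. The class name `LWWTrackSet` is hypothetical.

```python
class LWWTrackSet:
    """Last-write-wins set of track ids: the newest add or remove per track wins."""

    def __init__(self):
        self.added = {}    # track_id -> timestamp of latest add
        self.removed = {}  # track_id -> timestamp of latest remove

    def add(self, track_id: str, ts: float) -> None:
        self.added[track_id] = max(ts, self.added.get(track_id, ts))

    def remove(self, track_id: str, ts: float) -> None:
        self.removed[track_id] = max(ts, self.removed.get(track_id, ts))

    def merge(self, other: "LWWTrackSet") -> "LWWTrackSet":
        """Combine two replicas; the result is the same in either merge order."""
        merged = LWWTrackSet()
        for replica in (self, other):
            for track_id, ts in replica.added.items():
                merged.add(track_id, ts)
            for track_id, ts in replica.removed.items():
                merged.remove(track_id, ts)
        return merged

    def tracks(self) -> set[str]:
        """Tracks whose latest add is strictly newer than their latest remove."""
        return {track_id for track_id, ts in self.added.items()
                if ts > self.removed.get(track_id, float("-inf"))}
```

Because merging takes a per-key maximum, two editors can apply their changes offline and converge to the same set once their replicas are exchanged.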
Popular playlists (millions of followers):
- Cache aggressively (read-heavy)
- Update propagation via pub/sub when playlist changes
- Eventual consistency is fine (30-second delay acceptable)
Key Takeaways for the Interview
- Audio ≠ video: Different streaming challenges — gapless playback, smaller files, simpler bitrate adaptation
- CDN is critical: 95%+ cache hit rate for audio. Pre-encode at all quality levels
- Three recommendation approaches: Collaborative filtering + content-based + NLP, combined
- Client caching saves bandwidth: Local LRU cache reduces CDN load by ~30%
- Search is multi-signal: Text relevance + popularity + personalization + region