System design interviews are not trivia tests. The interviewer is not checking whether you memorized the architecture of Twitter or can recite the CAP theorem. They want to see how you think through an ambiguous, open-ended problem. Can you break it down? Can you make decisions under uncertainty? Can you explain why you chose X over Y? Can you identify what matters and ignore what does not? These are the skills that matter in senior engineering roles, and the interview is designed to surface them in 45 minutes.
This lesson gives you the complete framework: how to structure your time, what to cover in each phase, which mistakes to avoid, and a checklist of components you should consider for any design problem.
The 45-Minute Framework
Every system design interview, regardless of the problem, follows the same structure. The time allocation is not a suggestion — it is a survival strategy. Candidates who skip phases or spend 20 minutes on requirements gathering fail.
Phase 1: Requirements & Scope           5 minutes
Phase 2: Back-of-Envelope Estimation    5 minutes
Phase 3: API Design                     5 minutes
Phase 4: High-Level Architecture       15 minutes
Phase 5: Deep Dive                     10 minutes
Phase 6: Wrap-Up & Extensions           5 minutes
------------------------------------------------
Total                                  45 minutes

Let’s walk through each phase.
Phase 1: Requirements and Scope (5 minutes)
The problem is deliberately vague. “Design a chat application” could mean Slack, WhatsApp, Discord, or a customer support widget. Your first job is to narrow it down.
Functional Requirements
Ask clarifying questions. Do not assume. The interviewer wants to see you ask.
You: "Before I start designing, I'd like to clarify a few things.
Is this a 1:1 chat, group chat, or both?"
Interviewer: "Both. Groups up to 500 members."
You: "Do we need to support media messages — images, files, voice?"
Interviewer: "Text and images for now."
You: "What about message history? Persistent or ephemeral?"
Interviewer: "Persistent. Users should see full chat history."
You: "Read receipts? Typing indicators? Online status?"
Interviewer: "Read receipts yes. Others are nice-to-have."

Write down the agreed requirements. This becomes your contract for the rest of the interview.
Functional Requirements:
- 1:1 and group chat (up to 500 members)
- Text and image messages
- Persistent message history
- Read receipts
- Push notifications for offline users

Non-Functional Requirements
These matter more than functional ones. They drive your architecture.
Non-Functional Requirements:
- Scale: 50M DAU, 500M messages/day
- Latency: messages delivered in < 500ms
- Availability: 99.99% uptime
- Durability: zero message loss
- Ordering: messages appear in send order within a chat

If the interviewer does not volunteer scale numbers, propose them yourself: “I’ll assume we’re designing for a scale similar to WhatsApp — around 50 million daily active users. Does that sound reasonable?”
Phase 2: Back-of-Envelope Estimation (5 minutes)
Estimation accomplishes two things: it proves you can reason about scale, and it reveals the technical constraints that drive your design decisions.
Template for Any Problem
Users and Traffic:
DAU: 50M
Avg messages/user/day: 10
Total messages/day: 500M
QPS (avg): 500M / 86400 = ~5,800
QPS (peak, 3x): ~17,400
Storage:
Avg message size: 200 bytes (text) or 200KB (image)
Text messages/day: 450M * 200B = 90GB/day
Image messages/day: 50M * 200KB = 10TB/day
5-year text storage: ~165TB
5-year image storage: ~18PB
Bandwidth:
Incoming: 10TB/day = ~115MB/s sustained
Outgoing (fan-out): With avg 5 recipients, 5x read amplification
~575MB/s outgoing
Connections:
Concurrent connections: 50M DAU, ~30% online at once = 15M WebSocket connections
Per server (10K conn): ~1,500 WebSocket servers

Do not spend more than 5 minutes on this. Round aggressively. The point is order-of-magnitude thinking, not precise math. If you calculated 5,800 QPS, say “roughly 6,000 QPS, peaking at around 18,000.”
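If it helps to sanity-check these numbers when practicing (not something you would do live in the interview), the whole template reduces to a few lines of arithmetic. The inputs are the assumed values from this phase; change them and the conclusions change with them.

```python
# Back-of-envelope numbers for the chat example, as quick arithmetic.
DAU = 50_000_000
MSGS_PER_USER_PER_DAY = 10
SECONDS_PER_DAY = 86_400
PEAK_MULTIPLIER = 3

total_msgs_per_day = DAU * MSGS_PER_USER_PER_DAY            # 500M
avg_qps = total_msgs_per_day / SECONDS_PER_DAY              # ~5,800
peak_qps = avg_qps * PEAK_MULTIPLIER                        # ~17,400

# Storage: 90% text at 200 B, 10% images at 200 KB, kept 5 years
text_bytes_per_day = 0.9 * total_msgs_per_day * 200         # 90 GB
image_bytes_per_day = 0.1 * total_msgs_per_day * 200_000    # 10 TB
five_year_text_tb = text_bytes_per_day * 365 * 5 / 1e12     # ~164 TB
five_year_image_pb = image_bytes_per_day * 365 * 5 / 1e15   # ~18 PB

print(f"avg QPS {avg_qps:,.0f}, peak {peak_qps:,.0f}")
```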
Estimation Shortcuts
These approximations are useful across many problems:
Time:
1 day = 86,400 seconds (~100K for quick math)
1 year = 31.5M seconds (~30M for quick math)
Storage:
1 char = 1 byte (ASCII) or 2-4 bytes (UTF-8)
Average tweet/message: 100-200 bytes
Average image (compressed): 200KB-500KB
Average video (1 min, compressed): 10MB
Scale references:
Twitter: 500M tweets/day, 300M MAU
WhatsApp: 100B messages/day, 2B MAU
YouTube: 500 hours uploaded/min, 1B hours watched/day
Uber: 20M rides/day
Network:
1 Gbps link: ~125 MB/s
Single server: 10K-50K concurrent connections
Single DB server: 5K-10K QPS (depends on query complexity)
Redis single node: 100K+ QPS

Phase 3: API Design (5 minutes)
Define the core API endpoints. This forces you to think about the interface before the implementation.
# Core APIs for a chat system
POST /api/v1/messages
Body: { chat_id, content, type: "text"|"image", media_url? }
Response: { message_id, timestamp, status }
GET /api/v1/chats/{chat_id}/messages?cursor=xxx&limit=50
Response: { messages: [...], next_cursor, has_more }
POST /api/v1/chats
Body: { type: "1:1"|"group", member_ids: [...], name? }
Response: { chat_id }
PUT /api/v1/messages/{message_id}/read
Response: { status: "ok" }
WebSocket /ws/v1/connect?token=xxx
Events: message.new, message.read, typing.start, typing.stop, presence.update

Keep it concise. Three to five endpoints. Use standard REST conventions. Mention pagination (cursor-based, not offset-based). Note which operations use WebSocket vs REST.
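The cursor-based pagination mentioned here can be sketched as follows. This is an illustrative implementation, not part of the design itself: the cursor encodes the sort key of the last row returned, so fetching the next page is an indexed range scan rather than an OFFSET query that degrades as the client paginates deeper.

```python
import base64
import json

def encode_cursor(created_at: int, message_id: int) -> str:
    raw = json.dumps({"t": created_at, "id": message_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(cursor: str):
    data = json.loads(base64.urlsafe_b64decode(cursor))
    return data["t"], data["id"]

def page(messages, cursor=None, limit=50):
    """messages: list of (created_at, message_id, content), newest first."""
    if cursor:
        t, mid = decode_cursor(cursor)
        # Resume strictly after the last row the client saw
        messages = [m for m in messages if (m[0], m[1]) < (t, mid)]
    batch = messages[:limit]
    next_cursor = encode_cursor(batch[-1][0], batch[-1][1]) if batch else None
    return {"messages": batch, "next_cursor": next_cursor,
            "has_more": len(messages) > limit}
```

Because the cursor is the sort key itself, results stay stable even when new messages are inserted between page fetches, which is exactly where offset pagination produces duplicates or gaps.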
Phase 4: High-Level Architecture (15 minutes)
This is the core of the interview. Draw the major components and explain how data flows through the system.
The Standard Building Blocks
Almost every system design uses some subset of these components. Start here and add what you need.
[Clients] --> [DNS] --> [CDN] --> [Load Balancer]
|
[API Gateway / Reverse Proxy]
/ \
[App Servers] [WebSocket Servers]
| |
[Cache (Redis)] [Message Queue (Kafka)]
| |
[Database] [Workers / Consumers]
(Primary + Replicas)
|
[Object Storage (S3)]

How to Present It
Walk through the architecture by following a request:
"Let me trace what happens when User A sends a message to User B.
1. User A's client sends the message over their WebSocket connection
to a WebSocket server.
2. The WebSocket server publishes the message to Kafka,
topic 'chat-messages', partitioned by chat_id.
3. A message consumer reads from Kafka, persists the message
to the database, and looks up which WebSocket server
User B is connected to.
4. The consumer sends the message to User B's WebSocket server
via an internal pub/sub channel (Redis Pub/Sub).
5. User B's WebSocket server pushes the message to User B's client.
6. If User B is offline, a separate consumer triggers a push
notification via APNs/FCM."

This approach is powerful because it shows data flow, not just boxes. The interviewer sees that you understand how the pieces fit together.
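Steps 3-6 of that trace can be sketched with in-memory stand-ins. Everything below is hypothetical scaffolding: real code would use a Kafka consumer, a database client, and Redis Pub/Sub where this sketch uses dicts and lists.

```python
message_store = []        # stands in for the database
connection_registry = {}  # user_id -> WebSocket server id (normally Redis)
delivered = []            # messages routed to online users
push_notifications = []   # notifications queued for offline users

def handle_message(msg: dict) -> None:
    # 3. Persist first: the database (fed via Kafka) is the source of truth
    message_store.append(msg)
    for recipient in msg["recipients"]:
        server = connection_registry.get(recipient)
        if server is not None:
            # 4-5. Route to the WebSocket server holding the connection
            delivered.append((server, recipient, msg["id"]))
        else:
            # 6. Offline: hand off to the push-notification pipeline
            push_notifications.append((recipient, msg["id"]))

connection_registry["user_b"] = "ws-server-7"
handle_message({"id": 1, "chat_id": 42,
                "recipients": ["user_b", "user_c"]})
```

Note the ordering: persistence happens before delivery, which is the invariant the failure-scenario discussion later relies on.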
Architecture Patterns by Problem Type
Different problems call for different patterns. Here is a quick reference:
Read-heavy (news feed, timeline):
- Fan-out on write (precompute feeds)
- Heavy caching (Redis, CDN)
- Read replicas
Write-heavy (logging, analytics, IoT):
- Append-only logs (Kafka, Kinesis)
- LSM-tree databases (Cassandra, RocksDB)
- Batch processing (Spark, Flink)
Real-time (chat, notifications, live updates):
- WebSocket / SSE for persistent connections
- In-memory pub/sub (Redis Pub/Sub)
- Message queues for async processing
Storage-heavy (file storage, media):
- Object storage (S3, GCS)
- Chunking and deduplication
- CDN for content delivery
Search-heavy (search engine, product catalog):
- Inverted index (Elasticsearch, Solr)
- Ranking and relevance scoring
- Autocomplete with tries or prefix indices

The Component Checklist
Use this checklist to make sure you have not missed a critical piece. You do not need every component for every problem — but you should consciously decide what to include and what to skip.
Networking and Traffic
| Component | When to Include | Key Decisions |
|---|---|---|
| DNS | Always (mention briefly) | Round-robin vs geo-routing |
| CDN | When serving static assets or media | Pull vs push CDN, cache TTL |
| Load Balancer | Always | L4 vs L7, algorithm (round-robin, least-connections, consistent hashing) |
| API Gateway | When you need auth, rate limiting, routing | Combined with LB or separate |
| Rate Limiter | When facing abuse or uneven traffic | Token bucket vs sliding window, per-user vs per-IP |
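As a concrete reference for the rate limiter row, here is a minimal token-bucket sketch. This is a single-process version for illustration; a distributed limiter would keep the bucket state in Redis and do the refill-and-take atomically.

```python
import time

class TokenBucket:
    """Tokens refill continuously; a request passes if a token is available."""

    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at bucket capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The capacity sets the burst size and the refill rate sets the sustained rate, which is why token bucket handles bursty traffic more gracefully than a fixed window.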
Compute
| Component | When to Include | Key Decisions |
|---|---|---|
| App Servers | Always | Stateless, horizontal scaling |
| WebSocket Servers | Real-time features | Connection management, server affinity |
| Workers / Consumers | Async processing | Queue-based, scaling with queue depth |
| Cron / Scheduler | Periodic tasks | Idempotency, distributed locking |
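The idempotency concern in the scheduler row deserves a sketch: a task can be delivered more than once (retries, crashed workers), so execution is guarded by a set of processed task keys. In production that set would live in Redis or the database with an atomic check-and-set; the in-memory version below just shows the shape of the guard.

```python
processed: set = set()
side_effects: list = []

def run_task(task_id: str, action) -> bool:
    """Returns True if the task ran, False if it was a duplicate delivery."""
    if task_id in processed:
        return False          # duplicate: skip, no double side effect
    action()
    processed.add(task_id)
    return True

# The same task delivered twice only executes once
run_task("send-digest-2024-01-01", lambda: side_effects.append("email"))
run_task("send-digest-2024-01-01", lambda: side_effects.append("email"))
```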
Data
| Component | When to Include | Key Decisions |
|---|---|---|
| SQL Database | Structured data, relationships, ACID | MySQL vs Postgres, sharding strategy |
| NoSQL Database | High write throughput, flexible schema | Cassandra, DynamoDB, MongoDB |
| Cache (Redis) | Read-heavy, latency-sensitive | Cache-aside vs write-through, TTL, eviction |
| Search Engine | Full-text search, ranking | Elasticsearch, indexing strategy |
| Object Storage (S3) | Files, images, videos | Bucket organization, lifecycle policies |
| Message Queue | Async, decoupling, buffering | Kafka vs RabbitMQ, partitioning, consumer groups |
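The partitioning decision in the Message Queue row is worth being able to sketch, because hashing the chat id to a partition is what preserves per-chat ordering in the chat design. The hash and partition count below are illustrative (this is not Kafka's actual default partitioner); the point is that the same key always maps to the same partition.

```python
import hashlib

NUM_PARTITIONS = 12  # assumed partition count for illustration

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Deterministic hash of the key -> stable partition assignment
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

# Same key -> same partition, every time, so one chat's messages
# are consumed in order by a single consumer
p1 = partition_for("chat:42")
p2 = partition_for("chat:42")
```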
Cross-Cutting Concerns
| Concern | What to Mention | Common Solutions |
|---|---|---|
| Monitoring | Metrics, logging, alerting | Prometheus, Grafana, ELK stack |
| Security | Auth, encryption, input validation | OAuth2, JWT, TLS, HTTPS |
| Data Privacy | GDPR, user data deletion | Soft deletes, anonymization |
| Disaster Recovery | Backups, multi-region | Cross-region replication, RTO/RPO targets |
Phase 5: Deep Dive (10 minutes)
The interviewer will ask you to go deeper on one or two areas. This is where you demonstrate real expertise. Common deep-dive topics:
Database Schema
-- Chat system example
CREATE TABLE messages (
message_id BIGINT PRIMARY KEY,
chat_id BIGINT NOT NULL,
sender_id BIGINT NOT NULL,
content TEXT,
message_type VARCHAR(10), -- 'text', 'image'
media_url VARCHAR(500),
created_at TIMESTAMP DEFAULT NOW(),
INDEX (chat_id, created_at)
);
-- Partition by chat_id for locality
-- Shard by chat_id % N for horizontal scaling

Be ready to discuss: why this schema, how it’s indexed, how it scales, what the access patterns are.
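The "shard by chat_id % N" comment in the schema reduces to a few lines. Because the chat id fully determines the shard, fetching a chat's history is always a single-shard query. The shard count and dict-backed shards below are illustrative stand-ins for real database nodes.

```python
NUM_SHARDS = 16  # assumed shard count for illustration

def shard_for(chat_id: int) -> int:
    return chat_id % NUM_SHARDS

def fetch_history(chat_id: int, shards: list) -> list:
    # One lookup, one shard -- no scatter-gather
    return shards[shard_for(chat_id)].get(chat_id, [])

shards = [dict() for _ in range(NUM_SHARDS)]
shards[shard_for(42)].setdefault(42, []).append("hello")
shards[shard_for(42)].setdefault(42, []).append("world")
```

One tradeoff worth naming if asked: plain modulo sharding makes adding shards painful, since most keys move when N changes. That is the motivation for the consistent hashing mentioned in the load balancer row of the checklist.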
Sharding Strategy
"I'd shard the messages table by chat_id.
Why chat_id and not user_id?
- All messages in a chat live on the same shard
- Fetching chat history is a single-shard query
- No need for scatter-gather across shards
If user_id were the shard key, loading a single chat
would require querying every shard (since members are
on different shards). That's N cross-shard queries
per page load -- it doesn't scale."

Caching Strategy
"For the chat system, I'd use Redis for two things:
1. Recent messages cache: last 50 messages per active chat
Key: chat:{chat_id}:recent
TTL: 24 hours (refreshed on access)
This avoids hitting the database for every chat open.
2. User presence: which users are online
Key: presence:{user_id}
TTL: 60 seconds (refreshed by heartbeat)
This avoids querying the WebSocket servers directly.
Cache invalidation: new messages append to the list
and trim to 50. Consistent with the database because
every write goes through the same service."

Failure Scenarios
This is where strong candidates separate themselves. Proactively discuss what breaks.
"What happens when a WebSocket server crashes?
1. All 10K connections on that server drop.
2. Clients detect the disconnect and reconnect
(with exponential backoff) to a different server.
3. On reconnect, the client sends its last received
message_id. The server replays any missed messages
from the database.
4. No messages are lost because they're persisted to
the database via Kafka before being pushed to clients.
The key invariant: Kafka and the database are the source
of truth. WebSocket servers are stateless delivery vehicles.
If one dies, clients reconnect and catch up."

How to Discuss Tradeoffs
Tradeoffs are the most important thing in a system design interview. Every decision has a cost. Articulating both sides shows maturity.
The Tradeoff Template
"I'm choosing X over Y because [reason].
The tradeoff is [downside of X].
We could mitigate that by [mitigation]."

Common Tradeoffs to Discuss
Consistency vs Availability
"For the chat system, I prioritize availability over strong consistency.
Users can tolerate seeing a message 500ms late, but they cannot tolerate
the chat being down. So I'll use eventual consistency with a conflict
resolution mechanism.
If this were a banking system, I'd flip this -- strong consistency
is non-negotiable for financial transactions, even at the cost
of higher latency."

SQL vs NoSQL
"I'm choosing Cassandra for the message store because:
- Write throughput: 500M messages/day needs horizontal write scaling
- Access pattern: always query by (chat_id, time_range) -- perfect for Cassandra's partition key model
- No complex joins needed
The tradeoff: no ACID transactions, no ad-hoc queries, harder operational model.
For user profiles and chat membership, I'd still use PostgreSQL
because those have relationships and need consistency."

Push vs Pull
"For the news feed, I'm using fan-out on write (push model):
- When a user posts, we precompute the feed for all their followers
- Reading the feed is a single cache/DB lookup -- very fast
- Tradeoff: celebrity users with 10M followers create write amplification
For celebrities (>10K followers), I'd switch to fan-out on read:
- Don't precompute. When a user opens their feed, merge in celebrity posts at read time
- This hybrid approach avoids the worst case of both strategies"

Cache-Aside vs Write-Through
"I'm using cache-aside (lazy loading):
- On cache miss, read from DB and populate cache
- On write, invalidate cache (not update)
- Tradeoff: first read after invalidation hits the DB (cache miss penalty)
Write-through would keep the cache always fresh, but it adds latency
to every write and caches data that might never be read.
For our read-heavy workload (100:1 read-write ratio), cache-aside
is the better fit."

Monolith vs Microservices
"At our scale (50M DAU), I'd use microservices for core paths:
- Message service, presence service, notification service
- Each can scale independently
- Teams can deploy independently
But I wouldn't split everything. Auth, rate limiting, and logging
stay in the API gateway. Over-decomposition creates more problems
than it solves -- distributed transactions, debugging difficulty,
network overhead."

Common Mistakes
These are the patterns that tank system design interviews.
Jumping to the solution. You draw a diagram in the first minute without understanding what you’re building. The interviewer wanted a file storage system and you designed a CDN. Always start with requirements.
No estimation. You propose Redis for caching but never calculated whether the data fits in memory. You suggest a single PostgreSQL instance but the write volume needs sharding. Estimation grounds your design in reality.
Ignoring non-functional requirements. Your design handles the happy path but you never discussed: What happens when a server crashes? What happens when the database is full? What happens when traffic spikes 10x? Non-functional requirements are what make a design production-ready.
Not discussing tradeoffs. You say “I’ll use Kafka” but not why. You pick DynamoDB without explaining what you give up compared to PostgreSQL. Every technology choice is a tradeoff. If you can’t articulate both sides, the interviewer assumes you don’t understand the choice.
Over-engineering. You propose Kubernetes, service mesh, multi-region active-active replication, and event sourcing for a system that handles 100 QPS. Match the complexity of your design to the scale of the problem.
Silent design. You think for three minutes without saying anything, then present a finished diagram. The interviewer cannot evaluate your thought process if they can’t hear it. Think out loud. Say “I’m considering X because…” even while you’re still deciding.
Single points of failure. You have one database with no replicas, one cache with no fallback, one service with no redundancy. Always ask yourself: “What happens if this component goes down?”
Communication Tips
Drive the conversation. Do not wait for the interviewer to tell you what to do next. Move through the phases yourself: “Now that we’ve agreed on requirements, let me do a quick estimation.”
Use the whiteboard (or shared doc) actively. Draw as you talk. Label components. Draw arrows showing data flow. A visual design is easier to discuss and critique than a verbal one.
Check in with the interviewer. After the high-level design: “Does this look reasonable so far? Is there an area you’d like me to dive deeper into?” This shows collaboration and lets the interviewer steer you toward what they want to evaluate.
Acknowledge uncertainty. “I’m not sure about the exact throughput of Kafka on a single broker, but I know it’s in the hundreds of thousands of messages per second range. For our load of 18K QPS, a small Kafka cluster should be sufficient.” This is better than either guessing a precise number or saying “I don’t know.”
Name your assumptions. “I’m assuming users are distributed globally, so I’ll include a CDN and multi-region deployment.” If the assumption is wrong, the interviewer will correct you. That is a good thing.
Phase 6: Wrap-Up and Extensions (5 minutes)
In the last few minutes, summarize your design and proactively mention extensions you did not have time to cover.
"To summarize: we have a chat system that handles 50M DAU with
WebSocket servers for real-time delivery, Kafka for reliable
message processing, Cassandra for message storage, Redis for
caching and presence, and a push notification service for
offline users.
If I had more time, I'd discuss:
- End-to-end encryption (Signal Protocol)
- Message search (Elasticsearch index on message content)
- Multi-region deployment for global latency
- Abuse detection and content moderation
- Message retention policies and GDPR compliance"

This shows breadth of knowledge without requiring time to design each extension in detail.
Practice Problem List
Here are the most common system design problems, grouped by difficulty. The first four in this course (Lessons 11-14) cover the most frequently asked ones. Practice at least one from each category.
Warm-Up (30-minute problems)
1. URL Shortener (TinyURL)
Focus: Hashing, database design, read-heavy scaling
2. Paste Bin
Focus: Object storage, expiration, content addressing
3. Rate Limiter
Focus: Token bucket, sliding window, distributed counting

Core (45-minute problems)
4. News Feed / Timeline (Facebook, Twitter)
Focus: Fan-out, ranking, caching, real-time updates
5. Chat System (WhatsApp, Slack)
Focus: WebSockets, message ordering, presence, group chat
6. Notification System
Focus: Multi-channel, templates, dedup, priority queues
7. File Storage (Dropbox, Google Drive)
Focus: Chunking, sync, dedup, conflict resolution
8. Web Crawler
Focus: URL frontier, politeness, dedup, distributed crawling

Advanced (60-minute problems)
9. Search Autocomplete (Typeahead)
Focus: Trie, ranking, precomputation, caching
10. Distributed Key-Value Store
Focus: Consistent hashing, replication, vector clocks
11. Video Streaming (YouTube, Netflix)
Focus: Encoding pipeline, adaptive bitrate, CDN, recommendations
12. Ride-Sharing (Uber, Lyft)
Focus: Geospatial indexing, matching, real-time tracking, surge pricing
13. Payment System (Stripe)
Focus: Idempotency, distributed transactions, reconciliation, PCI compliance
14. Distributed Task Scheduler
Focus: Priority queues, sharding, exactly-once execution, failure recovery

For Each Practice Problem, Cover:
1. Requirements (functional + non-functional)
2. Estimation (storage, QPS, bandwidth)
3. API (3-5 core endpoints)
4. High-level architecture (trace a request end-to-end)
5. Deep dive on 2 components (schema, caching, sharding)
6. Tradeoffs (at least 3 explicit tradeoff discussions)
7. Failure scenarios (what breaks, how you recover)

Quick-Reference: Estimation Cheat Sheet
Tape this to your wall. These numbers come up constantly.
Latency Numbers Every Engineer Should Know:
L1 cache reference: 1 ns
L2 cache reference: 4 ns
RAM reference: 100 ns
SSD random read: 16 us
HDD random read: 2 ms
Round trip same datacenter: 500 us
Round trip US coast-to-coast: 40 ms
Round trip US-to-Europe: 80 ms
Throughput References:
Single MySQL server: 5K-10K QPS (simple queries)
Single Redis server: 100K+ QPS
Single Kafka broker: 200K+ messages/s
Single Elasticsearch node: 5K-20K queries/s
Single app server (API): 1K-10K QPS (depends on logic)
Storage Conversions:
1 KB = 1,000 bytes
1 MB = 1,000 KB = 10^6 bytes
1 GB = 1,000 MB = 10^9 bytes
1 TB = 1,000 GB = 10^12 bytes
1 PB = 1,000 TB = 10^15 bytes
Availability:
99%: 3.65 days downtime/year
99.9%: 8.76 hours downtime/year
99.99%: 52.6 minutes downtime/year
99.999%: 5.26 minutes downtime/year

Key Takeaways
- Follow the framework religiously: requirements (5 min), estimation (5 min), API (5 min), high-level design (15 min), deep dive (10 min), wrap-up (5 min). Deviating from this structure is the most common cause of running out of time.
- Ask clarifying questions first. The problem is intentionally vague. Narrowing scope shows maturity. Writing down agreed requirements gives you a contract to design against.
- Estimation is not about precision. It is about proving you can reason about scale and using the results to justify design decisions. “We have 6K QPS so a single database won’t cut it” is the kind of insight estimation provides.
- Trace a request end-to-end through your architecture. This is more convincing than listing components. “User clicks send, the message hits the WebSocket server, goes to Kafka, gets persisted, and is pushed to the recipient” — that tells a story.
- Every technology choice is a tradeoff. State the tradeoff explicitly. “I chose Cassandra over PostgreSQL because we need horizontal write scaling, at the cost of giving up ACID transactions.” One-sided reasoning signals inexperience.
- Proactively discuss failure scenarios. What happens when a server crashes, a database goes down, or traffic spikes 10x? This separates senior candidates from mid-level ones.
- Think out loud. The interviewer is evaluating your thought process, not your final answer. Silent thinking for three minutes followed by a perfect diagram is less impressive than walking through your reasoning step by step.
- Do not over-engineer. A URL shortener does not need Kubernetes, event sourcing, and multi-region active-active replication. Match the complexity of your solution to the scale of the problem.
- Practice by writing, not just reading. For each problem, actually draw the architecture, write the schema, calculate the numbers, and articulate the tradeoffs out loud. Reading about system design and doing system design are different skills.
