System Design Masterclass
March 28, 2026 | 11 min read
Lesson 11 / 15

11. Designing a URL Shortener

TL;DR

A URL shortener maps long URLs to short codes via base62 encoding of a unique ID. Use a counter (auto-increment or distributed ID generator) for uniqueness. Cache hot URLs in Redis. Use 301 (permanent) or 302 (temporary) redirects. At scale: shard by short code, use read replicas, and CDN for redirect caching. Add analytics with async event logging.

The URL shortener is the “Hello World” of system design interviews. It seems simple — take a long URL, return a short one — but it touches on encoding algorithms, distributed ID generation, caching strategy, database design, and analytics pipelines. It is a great problem because every component has meaningful tradeoffs.

Let us design a URL shortener from scratch: define requirements, estimate scale, choose encoding strategies, design the data model, and architect the system for millions of URLs and billions of redirects.

URL shortener system architecture

Step 1: Requirements

Functional Requirements

  1. Shorten: Given a long URL, generate a unique short URL.
  2. Redirect: Given a short URL, redirect to the original long URL.
  3. Custom aliases: Users can optionally specify a custom short code.
  4. Expiration: URLs can optionally have an expiration time.
  5. Analytics: Track click counts, referrers, geolocation, and timestamps.

Non-Functional Requirements

  1. Low latency: Redirects must complete in under 10ms (p99).
  2. High availability: The redirect path cannot go down. 99.99% uptime.
  3. Scale: Handle 100M URLs created per month, 10B redirects per month.
  4. Durability: Once created, a URL mapping must not be lost.

Out of Scope (for now)

  • User accounts and authentication
  • Rate limiting (covered in a separate lesson)
  • Spam/phishing URL detection

Step 2: Back-of-Envelope Estimation

URL creation:
  100M new URLs per month
  = ~40 URLs per second (average)
  = ~400 URLs per second (peak, assuming 10x)

Redirects:
  10B redirects per month (100:1 read-to-write ratio)
  = ~4,000 redirects per second (average)
  = ~40,000 redirects per second (peak)

Storage:
  Each URL mapping: ~500 bytes (short code + long URL + metadata)
  100M URLs/month * 12 months * 5 years = 6B URLs
  6B * 500 bytes = 3 TB total storage

Bandwidth:
  40K redirects/sec * 500 bytes = 20 MB/s (modest)

Cache:
  80/20 rule: 20% of URLs get 80% of traffic
  Hot URL set: 6B * 20% * 500 bytes = 600 GB (fits in distributed Redis)

The system is heavily read-dominant. The write path (create URL) is 100x less frequent than the read path (redirect). This shapes every design decision.
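The estimates above are simple arithmetic, and it is worth sanity-checking them. A quick script (using the traffic and retention figures assumed above):

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # ~2.6M seconds

urls_per_month = 100_000_000
redirects_per_month = 10_000_000_000
bytes_per_mapping = 500
retention_years = 5

writes_per_sec = urls_per_month / SECONDS_PER_MONTH         # ~40/s
reads_per_sec = redirects_per_month / SECONDS_PER_MONTH     # ~4,000/s
total_urls = urls_per_month * 12 * retention_years          # 6B
storage_tb = total_urls * bytes_per_mapping / 1e12          # 3 TB
hot_cache_gb = total_urls * 0.20 * bytes_per_mapping / 1e9  # ~600 GB

print(round(writes_per_sec), round(reads_per_sec), storage_tb)
```

The peak numbers in the estimate simply multiply the averages by an assumed 10x burst factor.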

Step 3: Encoding Strategies

The core problem: how do you generate a unique short code for each URL?

URL shortener encoding — base62 and hash approaches

Approach 1: Base62 Encoding of a Counter

The most straightforward approach. Maintain a global counter. For each new URL, increment the counter and encode the number in base62.

import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62

def encode_base62(num: int) -> str:
    """Convert integer to base62 string."""
    if num == 0:
        return ALPHABET[0]

    result = []
    while num > 0:
        result.append(ALPHABET[num % BASE])
        num //= BASE

    return ''.join(reversed(result))

def decode_base62(s: str) -> int:
    """Convert base62 string back to integer."""
    num = 0
    for char in s:
        num = num * BASE + ALPHABET.index(char)
    return num

# Examples:
# encode_base62(1)           -> "1"
# encode_base62(1000000)     -> "4c92"
# encode_base62(56800235583) -> "ZZZZZZ"  (largest 6-char code: 62^6 - 1)

Code length analysis:

  Characters   Possible codes          Coverage
  6            62^6 = 56.8 billion     Enough for decades
  7            62^7 = 3.5 trillion     Overkill for most systems

A 7-character code gives us 3.5 trillion unique URLs. At 100M per month, that is 35,000 months (almost 3,000 years).
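The table's thresholds fall out of a one-liner: the shortest usable code length is the smallest n such that 62^n covers the number of URLs you expect to store.

```python
def min_code_length(num_urls: int, base: int = 62) -> int:
    """Smallest code length whose base^n keyspace covers num_urls."""
    n = 1
    while base ** n < num_urls:
        n += 1
    return n

print(min_code_length(6_000_000_000))  # 6 — covers the 5-year estimate
print(min_code_length(62**6 + 1))      # 7 — one past 56.8B needs another char
```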

How to maintain the counter in a distributed system:

Option A — Database auto-increment: Use a single MySQL/Postgres instance as the ID generator. Simple, but it is a single point of failure.

Option B — Range-based allocation: Each application server reserves a range of IDs (e.g., Server 1 gets 1-10000, Server 2 gets 10001-20000). A coordination service (ZooKeeper, etcd) manages range assignments.

class RangeCounter:
    """Each server gets a pre-allocated range of IDs."""

    def __init__(self, range_size=10000):
        self.range_size = range_size
        self.current = 0
        self.max = 0

    async def next_id(self) -> int:
        if self.current >= self.max:
            # Request new range from coordination service
            start = await self._allocate_range()
            self.current = start
            self.max = start + self.range_size

        self.current += 1
        return self.current - 1

    async def _allocate_range(self) -> int:
        # Atomic increment in ZooKeeper/etcd/Redis.
        # INCRBY returns the counter value *after* the increment, i.e. the
        # end of this server's range, so subtract range_size to get the start.
        end = await redis.incrby("url_counter", self.range_size)
        return end - self.range_size

Option C — Snowflake IDs: Use Twitter’s Snowflake algorithm to generate 64-bit unique IDs that are roughly time-ordered. Each worker generates IDs independently without coordination.

import time

class SnowflakeGenerator:
    """
    64-bit ID:
      1 bit unused | 41 bits timestamp | 10 bits machine ID | 12 bits sequence
    """
    EPOCH = 1609459200000  # Custom epoch (2021-01-01)

    def __init__(self, machine_id: int):
        self.machine_id = machine_id & 0x3FF  # 10 bits
        self.sequence = 0
        self.last_timestamp = -1

    def next_id(self) -> int:
        timestamp = int(time.time() * 1000) - self.EPOCH

        if timestamp == self.last_timestamp:
            self.sequence = (self.sequence + 1) & 0xFFF  # 12 bits
            if self.sequence == 0:
                # Wait for next millisecond
                while timestamp == self.last_timestamp:
                    timestamp = int(time.time() * 1000) - self.EPOCH
        else:
            self.sequence = 0

        self.last_timestamp = timestamp

        return (timestamp << 22) | (self.machine_id << 12) | self.sequence

The base62 encoding of a Snowflake ID will be longer (10-11 chars) because the IDs are 64-bit. If you want short codes (6-7 chars), use the range-based counter approach.
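A nice property of the layout above is that the fields can be unpacked from any generated ID, which helps with debugging. This mirrors the bit layout in SnowflakeGenerator:

```python
def decompose_snowflake(snowflake_id: int) -> dict:
    """Unpack timestamp/machine/sequence from a 64-bit Snowflake ID."""
    return {
        "timestamp_ms": snowflake_id >> 22,          # ms since custom epoch
        "machine_id": (snowflake_id >> 12) & 0x3FF,  # 10 bits
        "sequence": snowflake_id & 0xFFF,            # 12 bits
    }

# Rebuild an ID from known fields, then unpack it:
sid = (123_456_789 << 22) | (42 << 12) | 7
print(decompose_snowflake(sid))
# {'timestamp_ms': 123456789, 'machine_id': 42, 'sequence': 7}
```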

Approach 2: Hash-Based (MD5/SHA-256)

Hash the long URL and take the first N characters.

import hashlib

def hash_url(long_url: str, length: int = 7) -> str:
    """Generate short code by hashing the URL."""
    hash_hex = hashlib.md5(long_url.encode()).hexdigest()
    # Convert hex to base62
    hash_int = int(hash_hex, 16)
    code = encode_base62(hash_int)[:length]
    return code

The collision problem: With 7-character codes, different URLs can produce the same short code. You must handle this:

async def create_short_url(long_url: str) -> str:
    code = hash_url(long_url)

    # Check for collision
    attempts = 0
    while await db.exists("urls", code):
        existing = await db.get("urls", code)
        if existing["long_url"] == long_url:
            return code  # Same URL, return existing code

        # Collision! Append a suffix and re-hash
        attempts += 1
        code = hash_url(long_url + str(attempts))

    await db.insert("urls", {"code": code, "long_url": long_url})
    return code

Hash-based approaches give you the nice property that the same long URL always gets the same short code (deduplication). But collision handling adds complexity and requires a read-before-write pattern that is harder to scale.
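How often do collisions actually happen? A birthday-bound estimate (an approximation, assuming hash outputs are uniformly random over the 62^7 keyspace) puts a number on it:

```python
def expected_collisions(num_urls: int, code_length: int = 7, base: int = 62) -> float:
    """Birthday approximation: ~k^2 / (2N) colliding pairs among k codes."""
    keyspace = base ** code_length
    return num_urls ** 2 / (2 * keyspace)

# Roughly 1,400 collisions among the first 100M URLs — rare, but
# frequent enough that the retry loop above is not optional.
print(round(expected_collisions(100_000_000)))
```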

Approach 3: Pre-Generated Key Service

Generate all possible short codes in advance. Store them in a database. When a new URL needs a code, pop one from the pool.

import random

class KeyService:
    """Pre-generates keys and serves them from a pool."""

    def __init__(self):
        self.used_keys_table = "used_keys"
        self.available_keys_table = "available_keys"

    async def get_key(self) -> str:
        # Atomic pop from available keys
        key = await redis.spop("available_keys")
        if key is None:
            raise Exception("Key pool exhausted — generate more keys")
        await redis.sadd("used_keys", key)
        return key.decode()

    async def generate_keys(self, count: int):
        """Bulk-generate random base62 keys."""
        keys = set()
        while len(keys) < count:
            code = ''.join(random.choices(ALPHABET, k=7))
            keys.add(code)

        # Add to available pool (skip any that are already used).
        # Redis returns set members as bytes, so decode before diffing.
        used = {k.decode() for k in await redis.smembers("used_keys")}
        new_keys = keys - used
        if new_keys:
            await redis.sadd("available_keys", *new_keys)

This eliminates collision handling entirely and makes URL creation a simple key lookup. The downside is that you need a separate service managing the key pool, and you must monitor pool exhaustion.
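The refill logic is worth making concrete. Here is a toy in-memory model of the pool — plain Python sets standing in for the Redis sets, with a hypothetical low-water threshold that triggers a refill:

```python
import random
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

class InMemoryKeyPool:
    """Toy model of the key service: pop keys, refill below a threshold."""

    def __init__(self, refill_threshold: int = 1000, batch_size: int = 5000):
        self.available = set()
        self.used = set()
        self.refill_threshold = refill_threshold
        self.batch_size = batch_size

    def get_key(self) -> str:
        if len(self.available) < self.refill_threshold:
            self._refill()
        key = self.available.pop()
        self.used.add(key)
        return key

    def _refill(self):
        while len(self.available) < self.batch_size:
            code = ''.join(random.choices(ALPHABET, k=7))
            if code not in self.used:  # never reissue a used key
                self.available.add(code)

pool = InMemoryKeyPool()
k1, k2 = pool.get_key(), pool.get_key()
print(len(k1), k1 != k2)  # 7 True
```

In production the refill check becomes a monitoring alert on the size of the available set, and refills run as a background job rather than inline on the request path.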

Which Approach to Use?

  Approach            Pros                                 Cons                                            Best For
  Base62 counter      Simple, no collisions, predictable   Requires coordination for distributed counter   Most systems
  Hash-based          Same URL gets same code (dedup)      Collisions, read-before-write                   Systems needing URL dedup
  Pre-generated keys  No collisions, fast creation         Extra infrastructure, pool management           Very high write throughput

Recommendation: Start with the range-based counter + base62 encoding. It is simple, collision-free, and scales well.

Step 4: Database Design

Schema

CREATE TABLE urls (
    code        VARCHAR(7) PRIMARY KEY,
    long_url    TEXT NOT NULL,
    created_at  TIMESTAMP DEFAULT NOW(),
    expires_at  TIMESTAMP NULL,
    user_id     BIGINT NULL,
    click_count BIGINT DEFAULT 0
);

-- Index for looking up by long URL (dedup)
CREATE INDEX idx_long_url ON urls (long_url);

-- Index for expiration cleanup
CREATE INDEX idx_expires ON urls (expires_at) WHERE expires_at IS NOT NULL;

Database Choice

The URL table has a very simple access pattern:

  • Write: Insert a row (code -> long_url)
  • Read: Lookup by code (primary key lookup)

This is a key-value workload. Any database works, but some choices are better than others:

  • DynamoDB / Cassandra: Natural key-value stores. Partition by code. Excellent for this workload.
  • PostgreSQL / MySQL: Works fine at moderate scale. Use read replicas for the heavy read path.
  • Redis (as primary store): If your dataset fits in memory (~600 GB for 6B URLs at 100 bytes each, not counting long URLs). Extremely fast, but expensive and risky for durability.

For a production URL shortener, DynamoDB is the pragmatic choice: single-digit millisecond reads, automatic scaling, and no operational overhead.

Step 5: Read Path (Redirect)

URL shortener redirect flow with caching

The redirect path is the hot path. It must be fast and reliable.

@app.get("/{code}")
async def redirect(code: str, request: Request):
    # 1. Check cache first
    long_url = await redis_cache.get(f"url:{code}")

    if long_url is None:
        # 2. Cache miss — query database
        row = await db.get("urls", code)

        if row is None:
            raise HTTPException(status_code=404, detail="URL not found")

        if row["expires_at"] and row["expires_at"] < datetime.utcnow():
            raise HTTPException(status_code=410, detail="URL expired")

        long_url = row["long_url"]

        # 3. Populate cache (TTL of 24 hours)
        await redis_cache.setex(f"url:{code}", 86400, long_url)

    # 4. Log the click asynchronously (do not block the redirect)
    asyncio.create_task(log_click(code, request))

    # 5. Redirect
    return RedirectResponse(url=long_url, status_code=302)

301 vs 302 Redirect

This is a critical design decision:

301 (Moved Permanently): The browser caches the redirect. Subsequent visits to the short URL go directly to the long URL without hitting your server. Faster for users, but you lose analytics visibility.

302 (Found / Temporary): The browser does NOT cache the redirect. Every visit hits your server. Slightly slower, but you can track every click.

Most URL shorteners use 302 because analytics is a core feature. If you do not need analytics, 301 reduces server load dramatically.

Caching Strategy

The 80/20 rule applies strongly: a small fraction of URLs get most of the traffic.

# Multi-tier caching approach
class URLCache:
    def __init__(self):
        # L1: In-process LRU cache (1M entries, ~100MB)
        self.local_cache = LRUCache(maxsize=1_000_000)
        # L2: Redis cluster (distributed, shared across servers)
        self.redis = redis_cluster

    async def get(self, code: str) -> Optional[str]:
        # Check L1
        url = self.local_cache.get(code)
        if url:
            return url

        # Check L2
        url = await self.redis.get(f"url:{code}")
        if url:
            self.local_cache.put(code, url)
            return url

        return None

    async def put(self, code: str, long_url: str):
        self.local_cache.put(code, long_url)
        await self.redis.setex(f"url:{code}", 86400, long_url)

With this setup, the vast majority of redirects are served from the in-process cache (sub-millisecond) without even hitting Redis.
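The LRUCache used above is not a standard-library type; a minimal sketch built on collections.OrderedDict, where get() refreshes recency and put() evicts the least recently used entry once maxsize is reached:

```python
from collections import OrderedDict
from typing import Optional

class LRUCache:
    """Minimal LRU cache backed by an OrderedDict."""

    def __init__(self, maxsize: int = 1_000_000):
        self.maxsize = maxsize
        self.data = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key: str, value: str):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.maxsize:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(maxsize=2)
cache.put("a", "1"); cache.put("b", "2")
cache.get("a")                         # "a" is now most recent
cache.put("c", "3")                    # evicts "b"
print(cache.get("b"), cache.get("a"))  # None 1
```

For production use, functools.lru_cache or a library like cachetools covers the same ground with less code.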

CDN-Level Caching

For extremely hot URLs (viral content), put a CDN in front of the redirect service:

User -> CDN (Cloudflare/CloudFront) -> Load Balancer -> App Servers -> Cache -> DB

Configure the CDN to cache 302 redirects for a short TTL (e.g., 60 seconds). This absorbs traffic spikes without touching your infrastructure. The tradeoff: analytics granularity drops to 1-minute resolution for cached URLs.
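The 60-second TTL is communicated to the CDN through standard HTTP caching headers. A sketch of the header set a CDN-cacheable redirect response would carry (the helper name and TTL are illustrative, not from a specific framework):

```python
def redirect_response_headers(long_url: str, cdn_ttl_seconds: int = 60) -> dict:
    """Headers for a CDN-cacheable 302 redirect."""
    return {
        "Location": long_url,
        # public: shared caches (the CDN) may store the response;
        # max-age: how long it may be served without revalidating
        "Cache-Control": f"public, max-age={cdn_ttl_seconds}",
    }

print(redirect_response_headers("https://example.com/very/long/path"))
```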

Step 6: Write Path (Create URL)

@app.post("/api/shorten")
async def shorten(request: ShortenRequest):
    long_url = request.url
    custom_alias = request.custom_alias  # Optional
    expires_in = request.expires_in      # Optional (seconds)

    # Validate URL
    if not is_valid_url(long_url):
        raise HTTPException(status_code=400, detail="Invalid URL")

    # Handle custom alias
    if custom_alias:
        if not is_valid_alias(custom_alias):
            raise HTTPException(status_code=400, detail="Invalid alias")
        if await db.exists("urls", custom_alias):
            raise HTTPException(status_code=409, detail="Alias already taken")
        code = custom_alias
    else:
        # Generate code via counter + base62
        counter_value = await counter.next_id()
        code = encode_base62(counter_value)

    # Calculate expiration
    expires_at = None
    if expires_in:
        expires_at = datetime.utcnow() + timedelta(seconds=expires_in)

    # Store
    url_record = {
        "code": code,
        "long_url": long_url,
        "created_at": datetime.utcnow(),
        "expires_at": expires_at,
        "user_id": request.user_id,
        "click_count": 0
    }
    await db.insert("urls", url_record)

    # Pre-populate cache
    await cache.put(code, long_url)

    short_url = f"https://short.io/{code}"
    return {"short_url": short_url, "code": code, "expires_at": expires_at}

Handling Duplicate URLs

Should the same long URL always get the same short code? It depends:

  • bit.ly approach: Every creation generates a new short code, even for the same URL. Simpler, and different users can have different analytics for the same destination.
  • TinyURL approach: Same URL returns the same short code. Requires a lookup by long URL before creation (add an index on long_url).

If you need dedup, add a lookup step:

# Check if this URL was already shortened
existing = await db.query_one(
    "SELECT code FROM urls WHERE long_url = %s", [long_url]
)
if existing:
    return {"short_url": f"https://short.io/{existing['code']}"}

Step 7: Analytics

Tracking clicks is a core feature, but it must not slow down the redirect path. Use async event logging.

async def log_click(code: str, request: Request):
    """Fire-and-forget click logging."""
    event = {
        "code": code,
        "timestamp": time.time(),
        "ip": request.client.host,
        "user_agent": request.headers.get("user-agent"),
        "referer": request.headers.get("referer"),
        "country": geoip_lookup(request.client.host),
    }

    # Send to Kafka for async processing
    await kafka_producer.send("click-events", json.dumps(event).encode())

    # Increment counter in Redis (for real-time display)
    await redis.incr(f"clicks:{code}")

The analytics pipeline:

Click Event -> Kafka -> Stream Processor (Flink/Spark) -> Analytics DB (ClickHouse)
                                                       -> Update click_count in primary DB (batched)

ClickHouse is excellent for analytics queries like “clicks per day for URL X” or “top URLs by country.” It handles billions of rows with sub-second query times for aggregation workloads.

Step 8: Scaling the System

Database Sharding

At billions of URLs, a single database instance won’t cut it. Shard by the short code:

import zlib

def get_shard(code: str, num_shards: int = 16) -> int:
    """Consistent shard assignment based on short code."""
    # zlib.crc32 is stable across processes; Python's built-in hash()
    # is randomized per process and would route the same code to
    # different shards on different servers.
    return zlib.crc32(code.encode()) % num_shards

Since the primary access pattern is lookup by code, sharding by code means each read hits exactly one shard. No scatter-gather queries needed.

Read Replicas

The redirect path is 100x more frequent than creation. Add read replicas to each shard:

Write path: App Server -> Primary DB (shard N)
Read path:  App Server -> Cache -> Read Replica (shard N)

Replication lag is acceptable here — a new URL being unresolvable for 100ms after creation is not a problem.

Rate Limiting

Without rate limiting, someone can exhaust your key space or use your service as a spam distribution platform:

# Per-IP rate limiting for URL creation. The route decorator goes
# outermost so the handler the framework registers is the rate-limited one.
@app.post("/api/shorten")
@rate_limit(key="ip:{request.client.host}", limit=100, window=3600)
async def shorten(request: ShortenRequest):
    ...

URL Expiration Cleanup

Expired URLs consume storage. Run a periodic cleanup job:

# Cron job — runs every hour
async def cleanup_expired_urls():
    while True:
        # Note: DELETE ... LIMIT is MySQL syntax; on Postgres, delete via
        # a subquery over primary keys instead.
        deleted = await db.execute(
            "DELETE FROM urls WHERE expires_at < NOW() LIMIT 10000"
        )
        if deleted < 10000:
            break  # No more expired URLs
        await asyncio.sleep(0.1)  # Avoid overloading the DB

The Complete Architecture

                    +---------+
                    |   CDN   |
                    +----+----+
                         |
                    +----+----+
                    |   LB    |
                    +----+----+
                    /    |     \
          +--------+ +--------+ +--------+
          | App S1 | | App S2 | | App S3 |
          +---+----+ +---+----+ +---+----+
              |          |          |
         +----+----------+----------+----+
         |          Redis Cache          |
         +-------------------------------+
              |          |          |
         +----+----+ +--+-----+ +-+------+
         | DB Sh 0 | | DB Sh 1| | DB Sh N|
         | Primary | | Primary| | Primary|
         | + Read  | | + Read | | + Read |
         | Replica | | Replica| | Replica|
         +---------+ +--------+ +--------+
              |
         +----+----+
         |  Kafka  | <-- click events
         +----+----+
              |
         +----+------+
         | ClickHouse| <-- analytics queries
         +-----------+

Key Takeaways

  • Start with requirements and estimation. The numbers drive the design. A URL shortener is 100x read-heavy, which means caching and read replicas matter far more than write optimization.
  • Base62 encoding of a counter is the simplest and most reliable encoding strategy. Use range-based allocation for distributed counter generation.
  • Hash-based encoding adds collision handling complexity but provides natural deduplication. Choose based on whether you need dedup.
  • 302 redirects give you analytics visibility. 301 redirects reduce server load. Most shorteners choose 302.
  • Multi-tier caching (in-process LRU + Redis + CDN) makes the redirect path extremely fast. The vast majority of redirects should be served without hitting the database.
  • Analytics must be async. Never block the redirect path to log a click. Use Kafka or a similar queue for fire-and-forget event logging, and aggregate in a columnar database like ClickHouse.
  • Shard by the short code because the primary access pattern is key lookup. Each read hits exactly one shard.
  • Custom aliases add complexity (uniqueness checking, different length constraints) but are a common product requirement. Handle them as a special case in the write path.
  • Expiration needs a background cleanup job. Do not check expiration only at read time — you also need to reclaim storage.
  • The URL shortener interview question tests breadth. It touches encoding, distributed systems, caching, databases, and analytics. Walk through each layer methodically.