Scaling is one of the most fundamental concepts in system design. When your application grows and starts receiving more traffic, you need to scale your system to handle the increased load. There are two primary approaches to scaling: horizontal scaling (scaling out) and vertical scaling (scaling up).
Understanding when and how to use each approach is crucial for designing systems that can grow efficiently. This decision impacts not just performance, but also cost, reliability, and operational complexity.
Why This Matters
Every system design interview at FAANG companies will touch on scaling. Interviewers expect you to understand the trade-offs between horizontal and vertical scaling, and to make informed decisions based on the specific requirements of the system you're designing.
What is Vertical Scaling (Scaling Up)?
Vertical scaling, also known as "scaling up," involves adding more power (CPU, RAM, storage) to your existing server or machine. Instead of adding more machines, you upgrade the hardware of your current machine.
Vertical Scaling (Scaling Up)
Same server, more powerful hardware
How It Works
You start with a server that has, for example, 4 CPU cores and 16GB RAM
When you need more capacity, you upgrade to 8 CPU cores and 32GB RAM
Or upgrade to 16 CPU cores and 64GB RAM
The application code typically doesn't need to change
You're essentially making your single server more powerful
Real-World Example: AWS EC2 Instance Upgrade
You're running your application on an AWS EC2 t3.medium instance (2 vCPUs, 4GB RAM). As traffic increases, you upgrade to t3.xlarge (4 vCPUs, 16GB RAM), then to m5.2xlarge (8 vCPUs, 32GB RAM). This is vertical scaling - you're making the same instance more powerful.
Vertical Scaling: Instance Upgrade Journey
Key Insight: Vertical scaling is simple - just upgrade the hardware. No code changes, no architectural changes.
But you're limited by the most powerful hardware available, and you still have a single point of failure.
Advantages of Vertical Scaling
Simplicity: No code changes required in most cases. Your application continues to run on a single machine.
No Data Distribution: All data remains on one machine, so you don't need to worry about data partitioning or synchronization.
Lower Latency: No network calls between servers, so inter-process communication is faster.
Easier to Implement: Just upgrade hardware or move to a larger cloud instance.
Better for Stateful Applications: Applications that maintain state in memory work well with vertical scaling.
Disadvantages of Vertical Scaling
Hardware Limits: There's a physical limit to how powerful a single machine can be. You can't infinitely upgrade a server.
Single Point of Failure: If your one powerful server fails, your entire system goes down.
Downtime During Upgrades: Upgrading hardware often requires taking the server offline.
Hardware Cost Scaling: More powerful hardware becomes disproportionately more expensive. At scale, horizontal scaling with commodity hardware is typically more economical.
Limited Scalability: You can only scale as much as the most powerful available hardware allows.
What is Horizontal Scaling (Scaling Out)?
Horizontal scaling, also known as "scaling out," involves adding more machines or servers to your system. Instead of making one machine more powerful, you add more machines and distribute the load across them.
Horizontal Scaling (Scaling Out)
Multiple servers sharing the load
How It Works
You start with 1 server handling all requests
As traffic increases, you add a 2nd server, then a 3rd, 4th, and so on
Requests are distributed across all servers using a load balancer
Each server runs the same application code
You're essentially creating a cluster of servers
Real-World Example: Web Application Cluster
Your web application starts with 1 server. As users grow, you add 2 more servers behind a load balancer. Now you have 3 servers sharing the load. When traffic spikes, you can quickly add 5 more servers (total 8) to handle the surge. This is horizontal scaling - you're adding more machines rather than making one machine more powerful.
Scaling Journey: From 1 to 8 Servers
Key Insight: Each server maintains the same capacity (1K req/s), but total capacity grows linearly with the number of servers.
The load balancer automatically distributes traffic evenly, and you can add or remove servers without downtime.
Advantages of Horizontal Scaling
Nearly Unlimited Scalability: You can keep adding servers as needed (within cloud provider quotas), so capacity can grow far beyond what any single machine allows.
High Availability: If one server fails, others continue serving traffic. The system remains operational.
Cost Efficiency at Scale: Commodity hardware is typically more cost-effective at scale. 10 servers with 4GB RAM each often provide better price/performance than 1 server with 40GB RAM.
No Downtime: You can add or remove servers without taking the system offline.
Better Fault Tolerance: System can survive individual server failures.
Geographic Distribution: You can distribute servers across different regions for better performance.
Disadvantages of Horizontal Scaling
Increased Complexity: You need load balancers, service discovery, distributed state management, etc.
Data Distribution Challenges: Data needs to be partitioned or replicated across servers, which adds complexity.
Network Latency: Communication between servers happens over the network, which is slower than in-memory communication.
State Management: Stateless applications work best. Stateful applications require session management or external state stores.
Code Changes May Be Required: Applications may need to be refactored to work in a distributed environment.
Side-by-Side Comparison
Visual Comparison: Vertical vs Horizontal Scaling
Left: One server getting more powerful | Right: Multiple servers sharing load
| Aspect | Vertical Scaling | Horizontal Scaling |
|---|---|---|
| Definition | Adding more power to the existing machine | Adding more machines to the system |
| Scalability Limit | Limited by hardware maximums | Nearly unlimited (cloud limits) |
| Scaling Efficiency | Diminishing returns (hardware limits) | Linear scaling (add more servers) |
| Fault Tolerance | Single point of failure | High availability |
| Implementation Complexity | Simple (just upgrade hardware) | Complex (load balancing, distribution) |
| Downtime During Scaling | Usually requires downtime | No downtime (can add/remove live) |
| Data Management | All data on one machine | Data distributed across machines |
| Network Calls | Minimal (mostly in-memory) | Frequent (inter-server communication) |
| Best For | Small to medium applications, stateful apps | Large applications, stateless apps, high traffic |
| Cloud Examples | AWS: upgrade EC2 instance type; GCP: upgrade VM machine type | AWS: add more EC2 instances; GCP: add more VM instances |
When to Use Each Approach
The decision between horizontal and vertical scaling isn't always clear-cut. Here's a comprehensive guide to help you make the right choice based on your specific requirements.
Use Vertical Scaling When:
Small to Medium Traffic: Your application doesn't need to handle millions of requests per second. If you're serving less than 10,000 requests/second, vertical scaling might be sufficient.
Stateful Applications: Applications that maintain significant state in memory:
Gaming servers (player state, game world state)
Real-time analytics (in-memory aggregations)
In-memory databases (Redis for complex data structures)
Machine learning inference servers (model in memory)
Simple Architecture: You want to keep the system simple and avoid distributed systems complexity. Fewer moving parts = fewer failure points.
Database Servers (Write Masters): Many databases benefit from vertical scaling initially:
PostgreSQL master: Single powerful machine for writes
MySQL master: Strong consistency requires single node
MongoDB primary: Single primary for write operations
Budget Constraints: You have limited budget and can't invest in distributed infrastructure (load balancers, service discovery, monitoring).
Low Latency Requirements: Applications where network latency between servers would be problematic.
Single-Tenant Applications: Applications serving a single organization or use case where traffic is predictable.
Example: Single-Node Database
A PostgreSQL database for a small to medium application can start with vertical scaling. As data grows, you upgrade from 16GB RAM to 64GB RAM, then to 256GB RAM. This works well until you hit hardware limits or need high availability.
# PostgreSQL Vertical Scaling Journey (Typical Path)
Phase 1: Small Scale (Vertical Scaling)
Instance: t3.medium (4GB RAM) - 10K rows
Instance: t3.xlarge (16GB RAM) - 100K rows
Instance: m5.2xlarge (32GB RAM) - 1M rows
Instance: m5.4xlarge (64GB RAM) - 10M rows
Instance: m5.8xlarge (128GB RAM) - 50M rows
Phase 2: Hit Limits (Time to Scale Horizontally)
At this point, vertical scaling becomes impractical:
- Hardware limits reached
- Single point of failure risk
- Upgrade downtime becomes unacceptable
Phase 3: Hybrid Approach (Recommended)
- Keep powerful master for writes (vertical)
- Add read replicas for reads (horizontal)
- Consider database sharding for writes (horizontal)
- Or migrate to managed service (RDS, Aurora)
Use Horizontal Scaling When:
High Traffic: You need to handle millions of requests per second (like Twitter, Facebook, Netflix). Single server can't handle the load.
Stateless Applications: Web servers, API servers that don't maintain session state:
REST APIs
Microservices
Static file servers
API gateways
High Availability Required: System must remain operational even if servers fail. 99.9%+ uptime requirements.
Variable Traffic: Traffic patterns are unpredictable:
Spikes during events (Super Bowl, product launches)
Daily swings between business hours and overnight traffic
Example: Stateless Web Application with Auto-Scaling
A web application serving HTTP requests is typically stateless. You can easily add more web servers behind a load balancer. When traffic increases, you add 10 more servers. When traffic decreases, you remove servers to save costs. This is the classic horizontal scaling pattern.
# Horizontal Scaling with Auto-Scaling
Base Configuration:
- Application Servers: 2x t3.medium
- Load Balancer: 1x AWS Application Load Balancer
- Auto-scaling group: 2-20 instances
Normal Traffic (9 AM - 5 PM):
- Servers running: 4
- Handles: 400 requests/second
- Cost: $120/month (servers) + $20/month (ALB)
Peak Traffic (Product Launch):
- Auto-scales up to: 15 servers
- Handles: 1,500 requests/second
- Cost: $450/month (servers) + $20/month (ALB)
Low Traffic (Midnight - 6 AM):
- Auto-scales down to: 2 servers
- Handles: 100 requests/second
- Cost: $60/month (servers) + $20/month (ALB)
# Key Benefit: Pay only for what you use
Decision Framework
Start with Vertical Scaling if: Traffic < 1,000 requests/second, budget < $500/month, team < 5 engineers, simple architecture acceptable.
Move to Horizontal Scaling when: Traffic > 5,000 requests/second, need 99.9%+ uptime, traffic is variable, team can handle operational complexity.
Use Hybrid Approach: Most production systems use both - vertical for databases (write masters), horizontal for application servers.
Real-World Examples from Top Companies
Netflix - Horizontal Scaling at Massive Scale
Netflix uses horizontal scaling extensively. They run thousands of microservices across hundreds of thousands of servers. When a popular show releases, they can quickly scale up by adding more servers to handle the traffic spike. This would be impossible with vertical scaling.
Infrastructure: AWS EC2 instances across multiple regions (horizontal scaling)
Scale: Can scale from hundreds to thousands of servers in minutes using auto-scaling
Content Delivery: Distributes content delivery across multiple regions and CDN edge locations
Microservices: Each service scales independently based on demand
Cost Optimization: Uses spot instances and auto-scaling to optimize costs
Netflix Scaling Example
When "Stranger Things" Season 4 launched, Netflix's traffic spiked by 300%. Their horizontal scaling infrastructure automatically added thousands of servers across multiple regions to handle the load. Within 15 minutes, they scaled from ~50,000 servers to ~150,000 servers. This would be impossible with vertical scaling.
Instagram - Hybrid Approach
Instagram uses a hybrid approach, which is common for large-scale applications. Their application servers scale horizontally (thousands of servers), but their database infrastructure uses both approaches strategically:
Horizontal Scaling:
Application servers (Python Django): Thousands of servers behind load balancers
Cache servers: Redis clusters with hundreds of nodes
CDN: CloudFront edge locations globally
Vertical Scaling:
Database master nodes: Very powerful machines (256GB+ RAM, 32+ CPU cores)
Why? Write operations need strong consistency and low latency
Single master avoids distributed transaction complexity
Hybrid for Reads:
Database read replicas: Multiple replicas (horizontal) for read scaling
Each replica is also vertically scaled (powerful machines)
Key Insight: Hybrid is Common
Most large-scale systems use a hybrid approach. Application servers scale horizontally, but databases often use vertical scaling for write masters and horizontal scaling (read replicas) for reads. This balances performance, consistency, and scalability.
Google Search - Horizontal Scaling at Global Scale
Google's search infrastructure is one of the largest horizontally scaled systems in the world:
Scale: Millions of servers distributed globally across hundreds of data centers
Architecture: Each data center has thousands of servers handling different functions
Traffic: Can handle billions of search queries per day (3.5+ billion searches daily)
Scaling Strategy: Uses horizontal scaling for both compute and storage
Geographic Distribution: Servers in every major region for low latency
Fault Tolerance: Can lose entire data centers without service interruption
Small SaaS Application - Vertical to Horizontal Journey
A typical small SaaS application follows this scaling journey:
Phase 1 - Vertical Scaling (Months 0-6):
Starts on a single server (e.g., DigitalOcean droplet with 2GB RAM, $12/month)
Upgrades to 4GB RAM ($24/month) as users grow
Upgrades to 8GB RAM ($48/month) when traffic increases
Simple, cost-effective, no code changes needed
Phase 2 - Hybrid (Months 6-12):
Application server: Still vertical (16GB RAM, $96/month)
Database: Separate server, vertical scaling (8GB RAM, $48/month)
Cache: Add Redis on separate server (2GB RAM, $24/month)
Phase 3 - Horizontal Migration (Months 12+):
Make application stateless (move sessions to Redis)
Add load balancer (AWS ALB, $20/month)
Deploy 3 application servers (3x 4GB = $72/month)
Add database read replicas for read scaling
Total: ~$140/month with high availability
How to Discuss Scaling in System Design Interviews
In FAANG interviews, you'll be asked about scaling strategies. Here's how to approach these discussions effectively.
Step-by-Step Interview Approach
1. Start with Requirements
# Questions to Ask Interviewer
Traffic Patterns:
- "What's the expected traffic? (QPS, concurrent users)"
- "Is traffic consistent or variable? (spikes, seasonal)"
- "What's the growth projection?"
Application Characteristics:
- "Is the application stateless or stateful?"
- "What are the read/write ratios?"
- "What are the latency requirements?"
Constraints:
- "Any budget constraints?"
- "What's the timeline for scaling?"
- "Any existing infrastructure?"
2. Analyze the Use Case
# Decision Framework
Choose Vertical Scaling When:
✅ Small to medium scale (< 10K requests/second)
✅ Stateful application (hard to make stateless)
✅ Simple architecture preferred
✅ Predictable traffic
✅ Single region deployment
Choose Horizontal Scaling When:
✅ Large scale (> 10K requests/second)
✅ Stateless or can be made stateless
✅ High availability required (99.9%+)
✅ Variable/unpredictable traffic
✅ Global/multi-region deployment
✅ Need auto-scaling
3. Discuss Trade-offs
Always mention trade-offs. Interviewers want to see you understand the implications:
Vertical Scaling: "Simple to implement, but hits hardware limits and creates single point of failure"
Horizontal Scaling: "More scalable and fault-tolerant, but requires stateless design and load balancing infrastructure"
4. Mention Hybrid Approach
Most real-world systems use a hybrid approach. Always mention this:
# Hybrid Approach Example
"Most large systems use a hybrid approach:
- Application servers: Horizontal scaling (stateless, behind load balancer)
- Database master: Vertical scaling (powerful machine for writes)
- Database replicas: Horizontal scaling (multiple read replicas)
- Cache layer: Horizontal scaling (Redis cluster)
- CDN: Horizontal scaling (edge locations globally)
This balances performance, consistency, and scalability."
Common Interview Questions
Question 1: "When would you choose vertical vs horizontal scaling?"
# Strong Answer Structure
1. Start with use case:
"It depends on the application characteristics and scale..."
2. Vertical scaling when:
- Small to medium scale
- Stateful application
- Simple architecture needed
- Predictable traffic
3. Horizontal scaling when:
- Large scale
- Stateless application
- High availability needed
- Variable traffic
4. Mention hybrid:
"In practice, most systems use both - vertical for database
masters, horizontal for application servers and reads."
5. Discuss trade-offs:
"Vertical is simpler but limited. Horizontal is more complex
but nearly unlimited scalability."
Question 2: "How would you migrate from vertical to horizontal scaling?"
# Migration Strategy Answer
1. Make application stateless:
"First, I'd move session state to external storage (Redis/database).
This is critical for horizontal scaling."
2. Add load balancer:
"Introduce a load balancer (ALB, Nginx) to distribute traffic."
3. Deploy multiple instances:
"Deploy application to multiple servers behind load balancer."
4. Scale database:
"Add read replicas for database reads, keep powerful master for writes."
5. Add caching:
"Implement caching layer (Redis) to reduce database load."
6. Monitor and optimize:
"Continuously monitor and add/remove servers based on traffic."
Question 3: "What are the challenges of horizontal scaling?"
# Challenges Answer
1. Stateless Design:
"Application must be stateless - no in-memory session state.
This may require refactoring."
2. Data Consistency:
"With multiple servers, ensuring data consistency becomes
challenging. Need to consider CAP theorem."
3. Load Balancing:
"Need load balancer infrastructure and choose right algorithm."
4. Monitoring Complexity:
"More servers means more monitoring, logging, alerting needed."
5. Network Overhead:
"Inter-server communication adds latency and network overhead."
6. Database Bottleneck:
"Application servers can scale, but database becomes bottleneck.
Need read replicas, caching, or sharding."
Red Flags to Avoid
❌ "Always use horizontal scaling" - Not true, depends on use case
❌ "Vertical scaling is never good" - Wrong, it's simpler for small scale
❌ "Cost is the only factor" - Missing complexity, operational overhead
❌ "One size fits all" - Not considering application characteristics
Key Points to Emphasize
✅ Start simple: Begin with vertical scaling, migrate when needed
✅ Hybrid approach: Most systems use both strategies
✅ Stateless first: Critical for horizontal scaling
✅ Database scaling: Don't forget database when scaling application
✅ Trade-offs: Always discuss pros and cons
✅ Real-world examples: Mention how companies actually do it
Migration Strategies
Most applications start with vertical scaling and eventually migrate to horizontal scaling. Understanding migration strategies is important for system design interviews.
Phase 1: Start with Vertical Scaling
Launch application on a single server
Monitor performance and resource usage
Upgrade server as traffic grows (vertical scaling)
Keep it simple and cost-effective for initial growth
Phase 2: Identify Scaling Bottlenecks
Monitor CPU, memory, disk I/O, network I/O
Identify which resource is the bottleneck
Determine if vertical scaling can still solve the problem
Evaluate complexity vs benefits of horizontal scaling
Phase 3: Migrate to Horizontal Scaling
Stateless First: Make application stateless (move session storage to Redis/database)
Monitor and Optimize: Continuously monitor and add/remove servers as needed
Migration Example: E-commerce Application
Phase 1: Start with single server (4GB RAM) handling 100 requests/second Phase 2: Upgrade to 16GB RAM server handling 500 requests/second Phase 3: Add load balancer + 3 servers (each 4GB RAM) handling 2000 requests/second Phase 4: Scale to 10 servers handling 10,000 requests/second Phase 5: Add database read replicas, caching layer, CDN
Trade-offs Analysis
Performance
Vertical: Lower latency (no network calls)
Horizontal: Network latency between servers
Winner: Depends on use case
Reliability
Vertical: Single point of failure
Horizontal: High availability
Winner: Horizontal scaling
Scalability
Vertical: Limited by hardware
Horizontal: Nearly unlimited
Winner: Horizontal scaling
Scaling Efficiency
Vertical: Diminishing returns, hardware limits
Horizontal: Linear scaling, nearly unlimited
Winner: Horizontal scaling (for large scale)
Complexity
Vertical: Simple implementation
Horizontal: Complex distributed system
Winner: Vertical scaling
Flexibility
Vertical: Fixed capacity
Horizontal: Dynamic scaling
Winner: Horizontal scaling
Common Mistakes to Avoid
❌ Mistake 1: Always Choosing Horizontal Scaling
Many engineers assume horizontal scaling is always better. However, for small applications, vertical scaling is simpler and more cost-effective. Don't over-engineer.
❌ Mistake 2: Ignoring State Management
When migrating to horizontal scaling, forgetting to make the application stateless is a common mistake. Session data stored in memory will be lost when requests hit different servers.
❌ Mistake 3: Not Considering Database Scaling
Scaling application servers horizontally is useless if the database becomes the bottleneck. Always consider database scaling (read replicas, sharding) when scaling application servers.
Load Balancing
Load balancing is a critical component of distributed systems architecture. When you have multiple servers handling requests, a load balancer acts as a traffic director, distributing incoming requests across available servers to ensure optimal resource utilization, maximize throughput, minimize response time, and avoid overloading any single server.
Understanding load balancing is essential for designing scalable systems. Every FAANG interview will touch on load balancing when discussing horizontal scaling, high availability, and distributed architectures. This topic covers everything from basic concepts to advanced algorithms and real-world implementations.
Why Load Balancing Matters
Without load balancing, you can't effectively scale horizontally. A load balancer is the gateway that makes multiple servers work as one cohesive system. It's the difference between a well-architected distributed system and a collection of independent servers.
What is Load Balancing?
Load balancing is the process of distributing network traffic or application requests across multiple servers (also called backend servers, upstream servers, or server pool). The component that performs this distribution is called a load balancer.
Load Balancer Architecture
Load balancer distributes client requests across multiple backend servers
Key Functions of a Load Balancer
Request Distribution: Routes incoming requests to available servers based on configured algorithms
Health Monitoring: Continuously checks server health and removes unhealthy servers from rotation
Session Management: Handles session persistence (sticky sessions) when required
SSL Termination: Can handle SSL/TLS encryption/decryption, offloading this from application servers
High Availability: Provides redundancy - if one server fails, traffic routes to healthy servers
Why Load Balancing is Essential
1. Prevents Server Overload
Without load balancing, all requests might hit a single server, causing it to become overwhelmed while other servers remain idle. A load balancer ensures even distribution of load.
Example: E-commerce Site During Sale
During Black Friday, an e-commerce site receives 10,000 requests per second. Without a load balancer, all requests hit Server 1, causing it to crash. With a load balancer distributing across 10 servers, each server handles ~1,000 requests/second, maintaining performance.
2. Enables Horizontal Scaling
Load balancing is the foundation of horizontal scaling. You can add or remove servers dynamically, and the load balancer automatically adjusts traffic distribution.
3. Provides High Availability
If a server fails, the load balancer detects it (via health checks) and stops sending traffic to it. Users experience no downtime because other servers continue handling requests.
4. Improves Performance
By distributing load, each server operates at optimal capacity. Response times improve because no single server is overwhelmed.
5. Enables Geographic Distribution
Load balancers can route traffic to servers in different geographic regions, reducing latency for users worldwide.
Load Balancing Algorithms - Deep Dive
The algorithm a load balancer uses determines how requests are distributed. Each algorithm has specific use cases, advantages, and trade-offs. Understanding these is crucial for system design interviews.
1. Round Robin
How it works: Requests are distributed sequentially across servers in rotation. Server 1 gets request 1, Server 2 gets request 2, Server 3 gets request 3, then back to Server 1 for request 4, and so on.
Round Robin Algorithm
How Round Robin Works
The load balancer maintains a simple counter that cycles through the server list. When a request arrives, it selects the server at the current index, then increments the index (wrapping around to 0 when it reaches the end). This ensures each server receives requests in a predictable, rotating pattern.
Example Flow: With 3 servers, Request 1 goes to Server 1, Request 2 to Server 2, Request 3 to Server 3, then Request 4 cycles back to Server 1, and the pattern continues. This works perfectly when all servers have identical capacity and requests take similar processing time.
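To make the counter mechanism concrete, here is a minimal round-robin selector in Python; the server names are hypothetical, and a production balancer would add locking and health awareness.
# Round Robin selector (illustrative Python sketch)
class RoundRobinBalancer:
    def __init__(self, servers):
        self.servers = servers
        self.index = 0  # counter that cycles through the server list

    def get_next_server(self):
        server = self.servers[self.index]
        self.index = (self.index + 1) % len(self.servers)  # wrap back to 0 at the end
        return server

lb = RoundRobinBalancer(["server1", "server2", "server3"])
print([lb.get_next_server() for _ in range(4)])
# ['server1', 'server2', 'server3', 'server1']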
Advantages:
Simple to implement and understand
Ensures even distribution when servers have equal capacity
No server state tracking required
Works well for stateless applications
Disadvantages:
Doesn't consider server load or capacity
Can overload a slow server if it receives requests at the same rate as fast servers
Doesn't account for request processing time differences
Not ideal when servers have different capabilities
When to Use:
All servers have identical capacity and performance
Requests are similar in processing time
Stateless applications
Simple use cases where even distribution is sufficient
2. Least Connections
How it works: The load balancer tracks the number of active connections to each server and routes new requests to the server with the fewest active connections.
Least Connections: Dynamic Load Balancing
Key Insight: The load balancer maintains a real-time count of active connections per server. When a new request arrives, it automatically routes to the server with the fewest active connections, ensuring optimal load distribution even when requests have varying processing times.
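A minimal sketch of that bookkeeping in Python; the connection counts and server names are hypothetical, and a real balancer updates the counts as connections open and close.
# Least Connections selector (illustrative Python sketch)
active_connections = {"server1": 12, "server2": 4, "server3": 9}

def get_next_server():
    # Route to the server with the fewest active connections right now
    return min(active_connections, key=active_connections.get)

def on_connection_open(server):
    active_connections[server] += 1

def on_connection_close(server):
    active_connections[server] -= 1

chosen = get_next_server()   # "server2" (only 4 active connections)
on_connection_open(chosen)   # count stays incremented for the life of the connection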
Advantages:
Adapts to varying request processing times
Better for long-lived connections (WebSockets, database connections)
Automatically handles servers with different processing speeds
More intelligent than round robin for stateful connections
Disadvantages:
Requires tracking connection state (more memory overhead)
More complex implementation
Doesn't consider server CPU/memory load, only connection count
May not be optimal if connections have very different processing requirements
3. Weighted Round Robin
How it works: Similar to round robin, but each server is assigned a weight (priority). Servers with higher weights receive more requests proportionally.
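One simple way to implement this is to expand each server into the rotation as many times as its weight, as in the Python sketch below; the weights are hypothetical, and production balancers typically use a smoother interleaving.
# Weighted Round Robin via an expanded rotation (illustrative sketch)
from itertools import cycle

weights = {"server1": 5, "server2": 3, "server3": 1}  # server1 gets 5 of every 9 requests

rotation = cycle([name for name, w in weights.items() for _ in range(w)])

def get_next_server():
    return next(rotation)

print([get_next_server() for _ in range(9)])
# ['server1', 'server1', 'server1', 'server1', 'server1', 'server2', 'server2', 'server2', 'server3']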
4. IP Hash
How it works: The load balancer hashes the client's IP address and uses that hash to determine which server handles the request. The same IP always routes to the same server (unless the server pool changes).
Advantages:
Provides session persistence (sticky sessions) without additional configuration
Simple to implement
Deterministic routing (same IP = same server)
Useful for caching scenarios
Disadvantages:
Uneven distribution if IP addresses aren't uniformly distributed
Problems with NAT (multiple users behind one IP)
Server removal/addition causes hash redistribution
Can create hotspots if many requests come from same IP
When to Use:
When you need session persistence
Applications with server-side caching based on client
When client IP distribution is relatively uniform
Stateful applications requiring sticky sessions
5. Least Response Time
How it works: Routes requests to the server with the lowest average response time. The load balancer continuously monitors response times from each server.
# Least Response Time Implementation (Python sketch)
servers = [
    {"id": "server1", "avg_response_time_ms": 50, "active_requests": 10},
    {"id": "server2", "avg_response_time_ms": 30, "active_requests": 5},
    {"id": "server3", "avg_response_time_ms": 80, "active_requests": 15},
]

def get_next_server():
    # Find the server with the minimum average response time
    return min(servers, key=lambda s: s["avg_response_time_ms"])

# get_next_server() → server2 (30 ms is the lowest)
# After each response completes, update that server's avg_response_time_ms
Advantages:
Automatically adapts to server performance
Routes to fastest-responding servers
Handles varying server loads dynamically
Optimizes user experience (lowest latency)
Disadvantages:
Requires continuous monitoring and metrics collection
More complex implementation
Response time measurements can be noisy
May cause oscillation if response times fluctuate
When to Use:
When response time is critical
Servers with varying performance characteristics
Real-time applications requiring low latency
When you have good monitoring infrastructure
6. Consistent Hashing
Consistent hashing minimizes key redistribution when servers are added or removed. Unlike simple hashing (`hash(key) % num_servers`), which remaps all keys when server count changes, consistent hashing only remaps ~1/n keys (where n = number of servers).
How It Works
Consistent hashing uses a hash ring - a circle with hash values from 0 to 2^32-1. Both servers and keys are hashed and placed on this ring. To find which server handles a key:
Hash the key to get a position on the ring
Move clockwise from that position
The first server encountered handles that key
Consistent Hashing - Hash Ring
Keys hash to ring positions, route to next server clockwise. Removing a server only affects keys between that server and the previous one.
Virtual Nodes
Without virtual nodes, servers might cluster on the ring, causing uneven distribution. Virtual nodes solve this: each physical server is represented by multiple positions on the ring (typically 150-200 per server). This ensures even distribution.
Example: With 3 servers and 150 virtual nodes each, each server gets ~33% of the ring (150/450 positions). Industry standard is 150-200 virtual nodes per server (used by DynamoDB, Cassandra).
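A compact hash-ring sketch in Python using only the standard library; the MD5 hash and the virtual-node count here are illustrative choices, not what any particular system uses.
# Consistent hashing ring with virtual nodes (illustrative sketch)
import bisect
import hashlib

class HashRing:
    def __init__(self, servers, vnodes=150):
        self.ring = {}            # ring position -> physical server
        self.sorted_keys = []     # sorted positions for clockwise lookup
        for server in servers:
            for i in range(vnodes):
                pos = self._hash(f"{server}#{i}")
                self.ring[pos] = server
                bisect.insort(self.sorted_keys, pos)

    def _hash(self, value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2**32)

    def get_server(self, key):
        pos = self._hash(key)
        # First ring position clockwise from the key (wrapping around to 0)
        idx = bisect.bisect(self.sorted_keys, pos) % len(self.sorted_keys)
        return self.ring[self.sorted_keys[idx]]

ring = HashRing(["server1", "server2", "server3"])
print(ring.get_server("user:12345"))  # the same key always maps to the same server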
Advantages:
Minimal Redistribution: Adding/removing a server only remaps ~1/n keys (vs 100% with simple hashing)
Scalability: Works efficiently with hundreds of servers
Fault Tolerance: Failed servers' keys automatically move to next server
Industry Proven: Used by DynamoDB, Cassandra, Memcached, Riak
Disadvantages:
More complex than simple hashing (requires sorted ring structure)
Virtual nodes required for even distribution (adds memory overhead)
Distribution is probabilistic, not perfectly even
When to Use:
Distributed caching (Redis, Memcached)
Database sharding (Cassandra, DynamoDB)
CDN edge server selection
Load balancing with frequent server changes
7. Rendezvous Hashing (Highest Random Weight)
Rendezvous hashing is an alternative to consistent hashing. Instead of a hash ring, it calculates a weight for each server-key pair and selects the server with the highest weight.
How It Works
For a given key, calculate `hash(server + key)` for each server. The server with the highest hash value handles that key. This is deterministic - the same key always goes to the same server.
Adding a server only remaps the keys for which the new server now has the highest weight (roughly 1/(n+1) of them), and those keys move only to the new server; nothing shuffles between existing servers. This makes it attractive for caches, since most entries stay put and avoid invalidation.
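A minimal highest-random-weight selector in Python; MD5 is used here purely for illustration as the combining hash.
# Rendezvous hashing: the server with the highest hash(server + key) wins
import hashlib

def weight(server, key):
    return int(hashlib.md5(f"{server}:{key}".encode()).hexdigest(), 16)

def get_server(servers, key):
    # O(n) per lookup, but no ring structure or virtual nodes required
    return max(servers, key=lambda s: weight(s, key))

servers = ["server1", "server2", "server3"]
print(get_server(servers, "user:12345"))
# Adding "server4" moves only the keys for which server4 now has the highest weight;
# every other key keeps its current server.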
Rendezvous vs Consistent Hashing
Rendezvous: Simpler (no ring, no virtual nodes), O(n) lookup, good for < 50 servers
Consistent: More complex (ring + virtual nodes), O(log n) lookup, better for 100+ servers
When to Use:
Small to medium clusters (< 50 servers)
Frequent server additions (existing keys stay put)
Memory-constrained environments
Simpler implementation preferred
Layer 4 vs Layer 7 Load Balancing
Load balancers operate at different OSI model layers. Understanding the difference is crucial for system design interviews and real-world implementations.
Layer 4 (L4) Load Balancing - Transport Layer
Also known as: Network Load Balancing, Connection-level load balancing
How it works: Makes routing decisions based on information from the transport layer (TCP/UDP). It looks at source IP, destination IP, source port, and destination port. It does NOT inspect the application data (HTTP headers, URLs, etc.).
# Layer 4 Load Balancing Decision
Packet Information:
Source IP: 192.168.1.100
Source Port: 54321
Dest IP: 10.0.0.1 (Load Balancer)
Dest Port: 80
Protocol: TCP
# Load balancer makes decision based ONLY on:
# - Source IP/Port
# - Destination IP/Port
# - Protocol
# Does NOT see:
# - HTTP method (GET, POST)
# - URL path (/api/users, /api/products)
# - HTTP headers
# - Request body
Characteristics:
Speed: Very fast - minimal processing overhead
Transparency: Backend servers see original client IP (if using direct server return)
Protocol Agnostic: Works with any TCP/UDP protocol (HTTP, HTTPS, FTP, SMTP, etc.)
Simple: Less complex, fewer features
No Content Awareness: Cannot route based on URL, headers, or content
Use Cases:
Simple round-robin or least-connections distribution
When application-level routing isn't needed
Examples:
AWS Network Load Balancer (NLB)
HAProxy in TCP mode
F5 BIG-IP (LTM in Layer 4 mode)
Linux Virtual Server (LVS)
Layer 7 (L7) Load Balancing - Application Layer
Also known as: Application Load Balancing, Content-based load balancing, HTTP(S) load balancing
How it works: Makes routing decisions based on application-layer data. For HTTP/HTTPS, it can inspect URLs, HTTP headers, cookies, and request content. It terminates the connection and creates a new one to the backend server.
# Layer 7 Load Balancing Decision
HTTP Request:
Method: GET
URL: /api/users/123
Headers: Cookie: session_id=abc123
User-Agent: Mozilla/5.0
Body: (empty for GET)
# Load balancer can make decisions based on:
# - URL path (/api/users → user-service, /api/products → product-service)
# - HTTP method (GET → read-replica, POST → master)
# - Headers (Cookie → sticky session, Authorization → specific backend)
# - Query parameters
# - Request body content
# Can also:
# - Modify requests (add headers, rewrite URLs)
# - Terminate SSL/TLS
# - Perform content-based routing
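To make those rules concrete, here is a small Python sketch of path- and method-based routing; the service pools and rules are hypothetical, not a real load balancer configuration.
# Content-based (L7) routing rules (illustrative sketch)
ROUTES = [
    {"prefix": "/api/users",    "pool": "user-service"},
    {"prefix": "/api/products", "pool": "product-service"},
]

def route_request(method, path):
    for rule in ROUTES:
        if path.startswith(rule["prefix"]):                   # URL-prefix routing
            suffix = "read" if method == "GET" else "write"   # method-based split
            return f"{rule['pool']}-{suffix}"
    return "default-pool"

print(route_request("GET", "/api/users/123"))   # user-service-read
print(route_request("POST", "/api/products"))   # product-service-write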
Characteristics:
Intelligent Routing: Can route based on content, URL, headers
Content Modification: Can modify requests/responses (header injection, URL rewriting)
Higher Overhead: More processing required (parsing HTTP, inspecting content)
Backend Sees LB IP: Backend servers see load balancer IP, not client IP (unless X-Forwarded-For header is used)
Use Cases:
Microservices architecture (route /api/users to user-service, /api/orders to order-service)
Content-based routing (route based on URL patterns)
When you need SSL termination at the load balancer
API gateway functionality
When you need to inspect or modify HTTP content
Cookie-based session persistence
Examples:
AWS Application Load Balancer (ALB)
NGINX (as reverse proxy/load balancer)
HAProxy in HTTP mode
F5 BIG-IP (LTM in Layer 7 mode)
Traefik
Kong API Gateway
Comparison Table
| Aspect | Layer 4 (L4) | Layer 7 (L7) |
|---|---|---|
| OSI Layer | Transport Layer (4) | Application Layer (7) |
| Decision Based On | IP addresses, ports, protocol | URL, HTTP headers, content |
| Performance | Very fast (lower latency) | Slower (higher latency due to processing) |
| Protocol Support | Any TCP/UDP protocol | Primarily HTTP/HTTPS |
| SSL Termination | No (pass-through) | Yes (can terminate SSL) |
| Content Awareness | No | Yes |
| Routing Flexibility | Limited (IP/port based) | High (content-based routing) |
| Client IP Visibility | Yes (with DSR) | No (requires X-Forwarded-For) |
| Use Case | High throughput, simple distribution | Microservices, content routing, API gateway |
| Examples | AWS NLB, HAProxy TCP mode | AWS ALB, NGINX, HAProxy HTTP mode |
Hybrid Approach: L4 + L7
Many production systems use both layers. L4 load balancer distributes traffic to L7 load balancers, which then perform content-based routing to application servers. This provides both high throughput (L4) and intelligent routing (L7).
Health Checks and Failover
Health checks are critical for load balancer reliability. They continuously monitor backend servers and automatically remove unhealthy servers from the rotation, ensuring users only hit healthy servers.
How Health Checks Work
The load balancer periodically sends health check requests to each backend server. Based on the response, it determines if the server is healthy or unhealthy.
# Health Check Flow
1. Load Balancer sends HTTP GET /health to Server 1
2. Server 1 responds with:
- Status: 200 OK (healthy)
- Response time: 10ms
- Body: {"status": "healthy", "database": "connected"}
3. Load Balancer evaluates:
- Response code: 200 (✓)
- Response time: 10ms < 50ms threshold (✓)
- Response body contains "healthy" (✓)
→ Server 1 marked as HEALTHY
4. If Server 1 fails health check:
- Response: 500 Internal Server Error
- OR: Timeout (no response in 5 seconds)
- OR: Response time > 50ms
→ Server 1 marked as UNHEALTHY
→ Removed from rotation
→ Traffic stops routing to Server 1
2. TCP Health Checks
How it works: The load balancer attempts to open a TCP connection to the server's port.
Success Criteria: Connection established successfully
Use Case: Simple connectivity check, faster than HTTP
Limitation: Doesn't verify the application is actually working
3. Custom Health Checks
Can use any protocol or method
Script-based health checks
External monitoring integration
Health Check Configuration Parameters
| Parameter | Description | Typical Values |
|---|---|---|
| Interval | How often to check | 10-60 seconds |
| Timeout | Max time to wait for response | 2-10 seconds |
| Unhealthy Threshold | Consecutive failures to mark unhealthy | 2-3 failures |
| Healthy Threshold | Consecutive successes to mark healthy | 2-3 successes |
| Path | Health check endpoint | /health, /healthz, /ping |
| Expected Status | HTTP status code for healthy | 200 OK |
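The threshold parameters translate into a small state machine per target. Below is a Python sketch of that logic using the threshold values from the table; the class shape and values are illustrative.
# Health check state tracking per target (illustrative sketch)
UNHEALTHY_THRESHOLD = 2   # consecutive failures before removal from rotation
HEALTHY_THRESHOLD = 2     # consecutive successes before re-adding

class TargetHealth:
    def __init__(self):
        self.healthy = True
        self.fail_streak = 0
        self.success_streak = 0

    def record_check(self, passed):
        if passed:
            self.success_streak += 1
            self.fail_streak = 0
            if not self.healthy and self.success_streak >= HEALTHY_THRESHOLD:
                self.healthy = True    # put the target back into rotation
        else:
            self.fail_streak += 1
            self.success_streak = 0
            if self.healthy and self.fail_streak >= UNHEALTHY_THRESHOLD:
                self.healthy = False   # stop routing traffic to this target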
Failover Mechanisms
1. Automatic Failover
When a server fails health checks, the load balancer automatically removes it from rotation. No manual intervention required.
2. Graceful Degradation
If all servers in a pool become unhealthy, the load balancer can:
Return 503 Service Unavailable
Route to backup servers (if configured)
Serve cached responses (if available)
Display maintenance page
3. Circuit Breaker Pattern
Advanced load balancers implement circuit breakers: if a server fails repeatedly, it's marked as "circuit open" and not checked for a cooldown period, reducing unnecessary load.
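A sketch of that cooldown behavior in Python; the failure limit and cooldown period are illustrative values.
# Circuit breaker for a repeatedly failing backend (illustrative sketch)
import time

FAILURE_LIMIT = 5
COOLDOWN_SECONDS = 30

class Circuit:
    def __init__(self):
        self.failures = 0
        self.open_until = 0.0   # while "open", the backend is skipped entirely

    def allow_traffic(self):
        return time.time() >= self.open_until

    def record_failure(self):
        self.failures += 1
        if self.failures >= FAILURE_LIMIT:
            self.open_until = time.time() + COOLDOWN_SECONDS  # open the circuit
            self.failures = 0

    def record_success(self):
        self.failures = 0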
Real-World Example: AWS ALB Health Checks
AWS Application Load Balancer health check configuration:
# AWS ALB Health Check Configuration
Health Check Protocol: HTTP
Health Check Port: traffic-port (uses same port as target)
Health Check Path: /health
Interval: 30 seconds
Timeout: 5 seconds
Healthy Threshold: 2 consecutive successes
Unhealthy Threshold: 2 consecutive failures
Success Codes: 200
# Behavior:
# - Checks every 30 seconds
# - If 2 consecutive checks fail → mark unhealthy
# - If 2 consecutive checks succeed → mark healthy
# - Unhealthy targets receive no traffic
# - Health checks continue even when unhealthy
Load Balancer Types
Load balancers come in different forms: hardware appliances, software solutions, and cloud-managed services. Each has its place in system architecture.
1. Hardware Load Balancers
Definition: Physical appliances dedicated to load balancing. These are specialized devices optimized for high-performance traffic distribution.
Examples:
F5 BIG-IP: Industry-leading hardware load balancer, supports L4 and L7
2. Software Load Balancers
Definition: Load balancing software that runs on standard servers or VMs. Examples include NGINX, HAProxy, Traefik, and Envoy.
3. Cloud-Managed Load Balancers
Definition: Load balancing offered as a fully managed service by cloud providers.
Examples:
AWS Elastic Load Balancing:
Classic Load Balancer (CLB): Legacy, supports both L4 and L7 (being phased out)
Gateway Load Balancer: For deploying third-party virtual appliances
Google Cloud Load Balancers:
Global HTTP(S) Load Balancer: L7, global distribution
Global TCP/UDP Load Balancer: L4, global distribution
Regional Load Balancer: Regional distribution
Internal Load Balancer: For internal traffic within VPC
Azure Load Balancers:
Azure Load Balancer: L4, regional
Application Gateway: L7, web application firewall
Traffic Manager: DNS-based global load balancing
Advantages:
Fully managed (no infrastructure to maintain)
Automatically scalable
High availability built-in
Integrated with cloud services
Pay-as-you-go model (scales with usage)
Global distribution capabilities
Disadvantages:
Vendor lock-in
Ongoing costs (can be expensive at scale)
Less control over configuration
Limited customization compared to self-hosted
When to Use:
Cloud-native applications
When you want to minimize operations overhead
Microservices on cloud platforms
When you need global distribution
Serverless architectures
AWS ALB vs NLB: When to Use Each
| Feature | Application Load Balancer (ALB) | Network Load Balancer (NLB) |
|---|---|---|
| Layer | Layer 7 (Application) | Layer 4 (Network) |
| Performance | High (millions of requests/sec) | Ultra-high (millions of connections/sec) |
| Added Latency | Higher (milliseconds of request processing) | Lower (sub-millisecond in typical cases) |
| Content-Based Routing | Yes (URL, headers, host) | No |
| SSL Termination | Yes | Yes |
| Use Case | Microservices, HTTP/HTTPS apps | High-performance TCP/UDP, gaming |
| Cost | $0.0225 per ALB-hour + $0.008 per LCU-hour | $0.0225 per NLB-hour + $0.006 per LCU-hour |
Session Persistence (Sticky Sessions)
Session persistence, also called "sticky sessions," ensures that requests from the same client are always routed to the same backend server. This is necessary for stateful applications that store session data on the server.
Why Session Persistence is Needed
In a stateless application, any server can handle any request. But if your application stores session data (like shopping cart, user preferences, temporary data) in server memory, you need to ensure the same user always hits the same server.
Problem Without Session Persistence
User adds item to cart on Server 1 (cart stored in Server 1's memory). User makes another request, but load balancer routes to Server 2. Server 2 doesn't have the cart data → user sees empty cart!
Session Persistence Methods
1. Cookie-Based Session Persistence
Load balancer sets a cookie containing server identifier. Subsequent requests include this cookie, and the load balancer routes to the specified server.
# Cookie-Based Session Persistence Flow
1. First Request:
Client → Load Balancer → Server 1
Server 1 responds, Load Balancer sets cookie:
Set-Cookie: AWSALB=server1_hash; Path=/
2. Subsequent Requests:
Client → Load Balancer (includes cookie: AWSALB=server1_hash)
Load Balancer reads cookie → routes to Server 1
# If Server 1 becomes unhealthy:
# Load balancer removes cookie, routes to healthy server
# New session starts on different server
2. IP Hash-Based Persistence
Uses client IP address to determine which server handles requests. Same IP always routes to same server (as discussed in algorithms section).
3. Application-Controlled Session Affinity
Application embeds server identifier in responses (URLs, forms). Load balancer reads this and routes accordingly.
Trade-offs of Session Persistence
Advantages:
Enables stateful applications
Server-side session storage possible
Can optimize caching per user
Disadvantages:
Uneven load distribution (some servers may get more traffic)
Server failure causes session loss
Difficult to scale (can't easily add/remove servers)
Breaks horizontal scaling benefits
Best Practice: Make Applications Stateless
Instead of using sticky sessions, modern best practice is to make applications stateless:
Store session data in external store (Redis, database, Memcached)
Any server can handle any request
True horizontal scaling
No session loss on server failure
# Stateless Application Pattern
# Session data stored in Redis (external), not in server memory
import redis

class ShoppingCart:
    def __init__(self, session_id):
        self.session_id = session_id
        self.redis = redis.Redis()

    def add_item(self, item):
        # Store in Redis, not server memory
        self.redis.hset(f"cart:{self.session_id}", item.id, item.data)

    def get_items(self):
        # Retrieve from Redis
        return self.redis.hgetall(f"cart:{self.session_id}")

# Any server can handle any request
# No sticky sessions needed
# Horizontal scaling works perfectly
Real-World Examples
Netflix - Multi-Layer Load Balancing
Netflix uses a sophisticated multi-layer load balancing architecture:
DNS Load Balancing: Routes users to nearest data center
L4 Load Balancers: Distribute traffic across regions
L7 Load Balancers: Route to specific microservices
Service Mesh: Envoy proxies handle inter-service communication
This architecture handles billions of requests per day across thousands of microservices, integrated with AWS services (CloudWatch, Auto Scaling) for monitoring and capacity management.
Google - Global Load Balancing
Google's infrastructure uses global load balancing:
Anycast IP addresses route to nearest data center
Multi-region load balancing for redundancy
Intelligent routing based on latency and capacity
Handles millions of queries per second
Common Mistakes to Avoid
❌ Mistake 1: Using Sticky Sessions Unnecessarily
Many developers use sticky sessions as a quick fix, but this breaks horizontal scaling. Always prefer stateless applications with external session storage.
❌ Mistake 2: Not Configuring Health Checks Properly
Health checks that are too aggressive can mark healthy servers as unhealthy. Health checks that are too lenient can route traffic to failing servers. Find the right balance.
❌ Mistake 3: Single Load Balancer (No Redundancy)
A single load balancer is a single point of failure. Always use multiple load balancers in different availability zones with health checks and failover.
❌ Mistake 4: Wrong Algorithm for Use Case
Using round robin for long-lived connections or least connections for simple HTTP requests. Match the algorithm to your use case.
Interview Tips
How to Discuss Load Balancing in Interviews
Start with Requirements: Ask about traffic patterns, connection types, session requirements
Choose the Right Algorithm: Explain why you chose a specific algorithm based on requirements
Layer Selection: Discuss L4 vs L7 and when to use each
Health Checks: Always mention health checks and failover mechanisms
Redundancy: Discuss multiple load balancers for high availability
Session Management: Explain stateless design vs sticky sessions trade-offs
Key Points to Emphasize
✅ Load balancing enables horizontal scaling
✅ Health checks are critical for reliability
✅ Stateless applications are preferred over sticky sessions
✅ L4 for performance, L7 for intelligent routing
✅ Always have redundant load balancers
✅ Choose algorithm based on use case
Summary
Load balancing is fundamental to distributed systems. Key takeaways:
Purpose: Distribute traffic across multiple servers for scalability and availability
Algorithms: Round robin, least connections, weighted, IP hash, consistent hashing - each has specific use cases
L4 vs L7: L4 for performance, L7 for content-based routing
Health Checks: Essential for automatic failover and reliability
Types: Hardware, software, or cloud-managed - choose based on requirements
Best Practice: Stateless applications with external session storage
In the next section, we'll explore Database Scaling, which builds on load balancing concepts.
📝 Quiz: Test Your Understanding
Question 1: What is the main difference between Layer 4 and Layer 7 load balancing?
a) L4 is faster, L7 is slower
b) L4 is for HTTP, L7 is for TCP
c) L4 routes based on IP/port, L7 routes based on content/URL
d) There is no difference
Question 2: Which load balancing algorithm is best for long-lived connections like WebSockets?
a) Round Robin
b) Least Connections
c) IP Hash
d) Weighted Round Robin
Question 3: What is the main advantage of consistent hashing?
a) It's the fastest algorithm
b) It provides sticky sessions
c) It's the simplest to implement
d) Minimal redistribution when servers are added/removed
Question 4: Why are health checks critical for load balancers?
a) They automatically remove unhealthy servers from rotation
b) They improve server performance
c) They reduce latency
d) They enable sticky sessions
Question 5: What is the best practice for session management in horizontally scaled applications?
a) Use sticky sessions with IP hash
b) Store sessions in server memory
c) Make applications stateless with external session storage
Database Scaling
Database scaling is one of the most critical and challenging aspects of system design. As applications grow, databases often become the primary bottleneck. Understanding how to scale databases effectively is essential for building systems that can handle millions of users and billions of requests.
This topic covers the fundamental strategies for scaling databases: read replicas for scaling read operations, sharding for distributing data across multiple databases, replication strategies for high availability, and the CAP theorem for understanding trade-offs in distributed databases.
1. Read Replicas
Read replicas are copies of the primary (master) database that handle read operations. The primary database handles all write operations, and changes are replicated to read replicas asynchronously.
How Read Replicas Work
All writes go to the primary database
Primary database replicates changes to read replicas (asynchronously)
Read operations are distributed across read replicas
This allows horizontal scaling of read capacity
Read Replicas Architecture
Writes go to primary, reads distributed across replicas. Replication happens asynchronously.
Advantages:
Horizontal Read Scaling: Add more replicas to handle more read traffic
Geographic Distribution: Place replicas in different regions for lower latency
High Availability: If primary fails, promote a replica to primary
Offload Primary: Reduces load on primary database
Backup: Replicas can serve as backups
Disadvantages:
Replication Lag: Replicas may have stale data (eventual consistency)
Storage Cost: Each replica requires full database storage
Write Bottleneck: All writes still go through primary
Complexity: Need to handle replication failures and lag
Replication Modes: Synchronous vs Asynchronous
Understanding the replication mode is critical for system design interviews.
# Synchronous Replication
Process:
1. Client writes to primary
2. Primary writes to WAL
3. Primary sends to replica
4. Primary WAITS for replica acknowledgment
5. Primary commits transaction
6. Primary responds to client
Trade-offs:
✅ Zero data loss (if primary fails, data is on replica)
❌ Higher latency (network round-trip)
❌ Lower availability (if replica fails, write blocks)
❌ Lower throughput
Use Case: Financial systems, critical data
# Asynchronous Replication
Process:
1. Client writes to primary
2. Primary writes to WAL
3. Primary commits transaction
4. Primary responds to client (immediately)
5. Primary sends to replica (asynchronously, in background)
6. Replica applies changes
Trade-offs:
✅ Lower latency (no wait for replica)
✅ Higher availability (replica failure doesn't block writes)
✅ Higher throughput
❌ Possible data loss (if primary fails before replication)
❌ Replication lag (replicas may be behind)
Use Case: Most web applications, read scaling
Replication Lag: The Critical Problem
Replication lag is the time between when data is written to the primary and when it appears on replicas. This creates the "read-after-write" consistency problem.
# Replication Lag Scenarios
Scenario 1: Normal Operation
T0: Write to primary (balance = 1000)
T1: Primary commits (50ms)
T2: Replica receives update (100ms)
T3: Replica applies update (150ms)
Lag: 150ms (acceptable for most use cases)
Scenario 2: High Load
T0: Write to primary
T1: Primary commits (50ms)
T2: Replica receives update (500ms) ← Network congestion
T3: Replica applies update (2000ms) ← Replica overloaded
Lag: 2000ms (2 seconds - problematic!)
Scenario 3: Network Partition
T0: Write to primary
T1: Primary commits
T2: Replica cannot receive (network down)
Lag: ∞ (infinite - replica is stale until partition heals)
# Measuring Lag
MySQL: SHOW SLAVE STATUS → Seconds_Behind_Master
PostgreSQL: SELECT * FROM pg_stat_replication → replay_lag
MongoDB: rs.printSlaveReplicationInfo() → lag time
Read-After-Write Consistency Solutions
Interviewers frequently ask: "How do you ensure a user sees their own write immediately after writing?"
# Solution 1: Sticky Sessions (Session Affinity)
Approach:
- Route user's reads to same replica that will receive their writes
- Use load balancer with session affinity
- User always reads from replica that's "caught up" with their writes
Implementation:
User writes to primary → Primary replicates to Replica 1
User reads → Route to Replica 1 (same replica)
Replica 1 has the write (or will have it soon)
Trade-offs:
✅ Simple to implement
❌ Load imbalance (some replicas get more traffic)
❌ Doesn't work if user writes from different locations
# Solution 2: Read from Primary After Write
Approach:
- Track recent writes per user (last N seconds)
- If user wrote recently, read from primary
- Otherwise, read from replica
Implementation:
User writes → Mark user as "recent writer" (TTL: 1 second)
User reads → Check if "recent writer"
- If yes: Read from primary
- If no: Read from replica
Trade-offs:
✅ Guarantees read-after-write consistency
❌ Increases load on primary
❌ More complex application logic
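A sketch of Solution 2 using a short-lived per-user flag in Redis; the key name and TTL are illustrative and should roughly match your observed replication lag.
# Read-after-write routing: recent writers read from the primary (sketch)
import redis

r = redis.Redis()
RECENT_WRITE_TTL = 1  # seconds

def record_write(user_id):
    # Called right after the user writes to the primary
    r.set(f"recent_write:{user_id}", 1, ex=RECENT_WRITE_TTL)

def choose_database(user_id):
    # Recent writers read from the primary; everyone else hits a replica
    return "primary" if r.exists(f"recent_write:{user_id}") else "replica"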
# Solution 3: Timeline Consistency
Approach:
- Track replication lag per replica
- Only read from replicas with lag < threshold
- If all replicas lagged, read from primary
Implementation:
Monitor: Replica 1 lag = 50ms, Replica 2 lag = 200ms, Replica 3 lag = 500ms
Threshold: 100ms
Route reads to: Replica 1 only (or primary if all lagged)
Trade-offs:
✅ Distributes load across replicas
✅ Maintains consistency
❌ Some replicas may be underutilized
❌ Requires lag monitoring
Real-World Example: Instagram
Instagram uses read replicas extensively with a sophisticated routing strategy:
Feed Reads: Route to read replicas (eventual consistency acceptable)
Post Writes: Write to primary, then read from primary for next 1 second
Profile Updates: Write to primary, read from primary for 5 seconds (critical data)
Analytics: Read from dedicated analytics replicas (can be minutes behind)
This allows Instagram to scale read capacity to billions of users while maintaining consistency where it matters.
2. Database Sharding
Sharding (also called horizontal partitioning) splits a database into smaller, independent pieces called "shards". Each shard contains a subset of the data and can be stored on a separate database server.
Why Sharding?
When a single database can't handle the load (either due to size or query volume), sharding distributes data and queries across multiple databases. This allows horizontal scaling of both reads and writes.
Sharding Strategies
1. Range-Based Sharding
Data is partitioned based on a range of values (e.g., user IDs 1-1000 in shard 1, 1001-2000 in shard 2).
Pros: Simple to implement, efficient range queries
Cons: Uneven key distribution can create hotspots
2. Hash-Based Sharding
Data is partitioned by applying a hash function to the sharding key (e.g., hash(user_id) % number_of_shards) to pick a shard.
Pros: Even data distribution across shards
Cons: Hard to add/remove shards (requires rehashing all data)
3. Directory-Based Sharding
A lookup service (directory) maps keys to shards. This allows flexible shard assignment.
Pros: Flexible, easy to rebalance
Cons: Single point of failure (directory), additional lookup overhead
Database Sharding
Shard router determines which shard handles each request based on sharding key.
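A sketch of hash-based shard routing in Python; the shard names are hypothetical, and the modulo approach shown is the simplest form (see the rehashing caveat above).
# Hash-based shard router (illustrative sketch)
import hashlib

SHARDS = ["users_shard_0", "users_shard_1", "users_shard_2", "users_shard_3"]

def shard_for(user_id):
    # Hash the sharding key and map it onto one of the shards
    h = int(hashlib.md5(str(user_id).encode()).hexdigest(), 16)
    return SHARDS[h % len(SHARDS)]

print(shard_for(12345))  # every query for this user goes to the same shard
# Caveat: changing len(SHARDS) remaps most keys; consistent hashing avoids this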
Challenges with Sharding:
Cross-Shard Queries: Joins across shards are expensive or impossible
Rebalancing: Adding/removing shards requires data migration
Hotspots: Uneven data distribution can overload specific shards
Complexity: Application logic must handle shard routing
When to Use:
Database size exceeds single server capacity
Write throughput exceeds single server capacity
Need to scale both reads and writes horizontally
Data can be partitioned without frequent cross-shard queries
3. Vertical vs Horizontal Partitioning
Vertical Partitioning
Splits a table by columns - different columns stored in different tables or databases. Useful when some columns are accessed more frequently than others.
Use Case: When you have wide tables with columns accessed at different frequencies
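For contrast with the sharded layout shown below, here is a sketch of the same users table split by columns; the column groupings are hypothetical.
# Vertically partitioned (by columns)
Table users_core:    id, username, email, password_hash   (hot columns, read on every request)
Table users_profile: id, bio, avatar_url, preferences     (cold columns, read occasionally)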
Horizontal Partitioning (Sharding)
Splits a table by rows - different rows stored in different databases. This is what we call "sharding".
# Horizontally partitioned (sharded)
Shard 1: Users with id 1-1,000,000
Shard 2: Users with id 1,000,001-2,000,000
Shard 3: Users with id 2,000,001-3,000,000
Use Case: When table size or query volume exceeds single server capacity
4. Replication: Deep Technical Dive
Understanding how replication actually works at the database engine level is critical for system design interviews. This section covers the internal mechanisms, algorithms, and real-world implementations.
4.1 How Replication Works: The Fundamentals
Write-Ahead Log (WAL) - The Foundation
Most databases use a Write-Ahead Log (WAL) for replication. Before any data change is written to the actual data files, it's first written to a log file. This log becomes the source of truth for replication.
# Write-Ahead Log Process
1. Client sends: UPDATE users SET balance = 1000 WHERE id = 123
2. Primary Database:
a) Write to WAL: "UPDATE users SET balance = 1000 WHERE id = 123" (log entry #1001)
b) Apply change to data file (in-memory buffer)
c) Acknowledge to client: "OK"
d) Flush data to disk (asynchronously)
3. Replication Process:
a) Replica reads WAL entry #1001
b) Replica applies same change: UPDATE users SET balance = 1000 WHERE id = 123
c) Replica updates its own data file
d) Replica acknowledges: "Applied log entry #1001"
# Key Point: WAL ensures durability and enables replication
# If primary crashes, WAL can be replayed to recover state
Replication Methods: Statement-Based vs Row-Based
Statement-Based Replication (SBR)
The master logs the actual SQL statement, and replicas execute the same statement.
# Master receives:
UPDATE users SET balance = balance + 100 WHERE id = 123
# Master logs to binlog:
"UPDATE users SET balance = balance + 100 WHERE id = 123"
# Replica receives and executes:
UPDATE users SET balance = balance + 100 WHERE id = 123
# Problem: Non-deterministic functions cause issues
# Example: NOW(), RAND(), UUID() produce different values when the replica re-executes the statement
# e.g. UPDATE users SET last_login = NOW() stores a different timestamp on the replica
# Divergence also compounds: if the replica's data already differs (master balance=1000, replica=900),
# re-executing "balance = balance + 100" gives master=1100, replica=1000 (WRONG!)
Pros: Compact log size, human-readable
Cons: Non-deterministic functions cause inconsistencies, slower on replicas
Row-Based Replication (RBR)
The master logs the actual row changes (before/after values), and replicas apply the exact changes.
# Master receives:
UPDATE users SET balance = balance + 100 WHERE id = 123
# Master logs to binlog (row-based):
Row Change Event:
Table: users
Action: UPDATE
Before: {id: 123, balance: 1000}
After: {id: 123, balance: 1100}
# Replica receives and applies:
Directly set balance = 1100 (doesn't re-execute the calculation)
# Result: Guaranteed consistency, even with non-deterministic functions
Pros: Guaranteed consistency, works with any function
Cons: Larger log size, less human-readable
Mixed Replication
In MIXED mode, MySQL logs statements by default but automatically switches to row-based logging for non-deterministic statements. (Note: since MySQL 5.7.7 the default binlog_format is ROW.)
4.2 MySQL Replication: Deep Dive
MySQL Binary Log (binlog) Architecture
MySQL uses a binary log (binlog) to record all changes. The replication process involves multiple components:
# MySQL Replication Components
1. Master Components:
- binlog: Binary log file (sequential, append-only)
- binlog index: Tracks all binlog files
- Dump thread: Sends binlog events to replicas
2. Replica Components:
- Relay log: Temporary storage for events from master
- I/O thread: Connects to master, reads binlog, writes to relay log
- SQL thread: Reads relay log, applies changes to replica database
# Replication Flow:
Master: Write → binlog → Dump Thread → Network → Replica I/O Thread → Relay Log → SQL Thread → Replica DB
MySQL Replication Process Step-by-Step
# Detailed MySQL Replication Process
Step 1: Initial Setup (One-time)
- Master: Enable binlog (log_bin = ON)
- Replica: Configure master connection info
- Replica: Take snapshot of master data (mysqldump or XtraBackup)
- Replica: Load snapshot into replica database
- Replica: Record master's binlog position (e.g., "binlog.000001:1542")
Step 2: Continuous Replication
a) Replica I/O Thread connects to master
b) Replica requests: "Send me events from binlog.000001:1542"
c) Master Dump Thread reads binlog from position 1542
d) Master sends events to replica I/O thread
e) Replica I/O thread writes events to relay log
f) Replica SQL thread reads relay log
g) Replica SQL thread applies changes to replica database
h) Replica updates position: "binlog.000001:2000"
i) Process repeats
Step 3: Handling Failures
- If I/O thread disconnects: Reconnects and resumes from last position
- If SQL thread fails: Stops applying, I/O thread continues receiving
- If master fails: Replica can be promoted (manual or automatic)
MySQL Replication Lag: Causes and Measurement
Replication lag is the delay between when a change is written to the master and when it appears on the replica. This is critical to understand in interviews.
# Causes of Replication Lag
1. Network Latency:
- Master and replica in different data centers
- Example: 50ms network latency = minimum 50ms lag
2. Replica Load:
- Replica handling too many read queries
- SQL thread can't keep up with I/O thread
- Solution: Add more replicas, reduce read load per replica
3. Single-Threaded SQL Thread:
- MySQL replica SQL thread is single-threaded (until MySQL 5.6)
- Master can write in parallel, replica applies sequentially
- Solution: Parallel replication (MySQL 5.6+), use row-based replication
4. Large Transactions:
- Single transaction with millions of rows
- Replica must apply entire transaction atomically
- Solution: Break into smaller transactions
# Measuring Replication Lag
MySQL provides:
SHOW SLAVE STATUS\G
Key metrics:
- Seconds_Behind_Master: Estimated lag in seconds
- Read_Master_Log_Pos: Last binlog position read
- Exec_Master_Log_Pos: Last binlog position applied
- Read_Master_Log_Pos - Exec_Master_Log_Pos: binlog bytes received but not yet applied (position-based lag)
4.3 PostgreSQL Replication: Deep Dive
PostgreSQL Write-Ahead Log (WAL)
PostgreSQL uses WAL files (16MB segments) for replication. The replication mechanism is different from MySQL.
# PostgreSQL WAL Architecture
1. WAL Segments:
- Each segment: 16MB (default)
- Naming: 000000010000000000000001 (24 hex digits)
- Format: 8 hex digits timeline ID + 8 hex digits log sequence number + 8 hex digits segment number
2. WAL Records:
- Each change creates a WAL record
- Record contains: transaction ID, table OID, row data, operation type
- Records are written sequentially
3. Replication Methods:
a) WAL Shipping (File-Based):
- Master archives WAL files
- Replica copies WAL files
- Replica replays WAL files
- High latency (file-based)
b) Streaming Replication (Real-Time):
- Replica connects to master
- Master streams WAL records as they're written
- Replica applies in real-time
- Low latency (streaming)
PostgreSQL Streaming Replication Process
# PostgreSQL Streaming Replication
Step 1: Setup
- Master: wal_level = replica (or logical)
- Master: max_wal_senders = 10 (allows 10 replicas)
- Replica: Configure the connection to the primary (recovery.conf before PostgreSQL 12; primary_conninfo plus a standby.signal file in 12+)
- Replica: Take base backup (pg_basebackup)
Step 2: Streaming Process
a) Replica connects to master via replication protocol
b) Replica sends: "Start streaming from WAL position X"
c) Master identifies WAL position X in its WAL files
d) Master streams WAL records starting from position X
e) Replica receives WAL records into receive buffer
f) Replica applies WAL records to its database
g) Replica sends acknowledgment: "Applied up to position Y"
h) Master tracks replica's progress
i) Process continues in real-time
Step 3: Synchronous vs Asynchronous
- Asynchronous (default): Master doesn't wait for replica
- Synchronous: Master waits for replica acknowledgment
- synchronous_standby_names = 'replica1'
- Trade-off: Lower latency vs higher durability
4.4 Consensus Algorithms: Raft and Paxos
For multi-master replication and leader election, databases use consensus algorithms to ensure all nodes agree on the same state.
Raft Algorithm: Leader Election and Log Replication
Raft is a consensus algorithm designed to be understandable. It's used by etcd, Consul, and many distributed databases.
# Raft Algorithm Overview
Raft has three key components:
1. Leader Election
2. Log Replication
3. Safety (ensuring consistency)
# Raft States
- Leader: Handles all client requests, replicates to followers
- Follower: Receives updates from leader, votes in elections
- Candidate: Temporary state during leader election
# Leader Election Process
Step 1: Election Timeout
- Each follower has random election timeout (150-300ms)
- If follower doesn't hear from leader, becomes candidate
Step 2: Candidate Requests Votes
- Candidate increments its term (election term)
- Candidate votes for itself
- Candidate sends RequestVote RPC to all other nodes
- Request includes: candidate's term, last log index, last log term
Step 3: Followers Vote
- Follower votes "yes" if:
a) Candidate's term >= follower's term
b) Candidate's log is at least as up-to-date as follower's
- Follower votes "no" if already voted for another candidate
Step 4: Majority Wins
- Candidate becomes leader if receives majority votes
- Leader sends heartbeat to all followers
- If no majority: election timeout, new election
# Log Replication Process
Step 1: Client Request
- Client sends write request to leader
- Leader appends entry to its log (not yet committed)
Step 2: Replication
- Leader sends AppendEntries RPC to all followers
- Followers append entry to their logs
- Followers send acknowledgment
Step 3: Commitment
- Leader waits for majority acknowledgments
- Leader commits entry (applies to state machine)
- Leader sends commit message to followers
- Followers commit and apply to state machine
- Leader responds to client: "Success"
# Safety Guarantees
- Election Safety: At most one leader per term
- Log Matching: If two logs have entry with same index and term, they're identical
- Leader Completeness: Committed entries from previous terms are present in new leader's log
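A minimal sketch of the follower's vote-granting rule described above, assuming a simple in-memory state object with current_term, voted_for, and log fields (hypothetical; real implementations also persist state and handle RPC plumbing):
def handle_request_vote(state, term, candidate_id, last_log_index, last_log_term):
    """Decide whether this follower grants its vote (simplified Raft rule)."""
    if term < state.current_term:
        return False  # Reject candidates from an older term
    if term > state.current_term:
        state.current_term = term
        state.voted_for = None  # New term: this node has not voted yet
    # Candidate's log must be at least as up-to-date as ours (compare term first, then index)
    our_last_term = state.log[-1].term if state.log else 0
    our_last_index = len(state.log)
    log_ok = (last_log_term, last_log_index) >= (our_last_term, our_last_index)
    # Grant at most one vote per term
    if log_ok and state.voted_for in (None, candidate_id):
        state.voted_for = candidate_id
        return True
    return False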
Paxos Algorithm: The Classic Consensus Protocol
Paxos is the foundational consensus algorithm, used by Google's Chubby, Amazon's DynamoDB (modified), and many distributed systems.
# Paxos Algorithm (Simplified)
Paxos has three roles:
1. Proposers: Propose values
2. Acceptors: Accept proposals
3. Learners: Learn the chosen value
# Basic Paxos Process
Phase 1: Prepare
a) Proposer sends Prepare(n) to majority of acceptors
- n = proposal number (unique, increasing)
b) Acceptor responds:
- If n > highest proposal seen: Promise not to accept proposals < n
- Include highest proposal number accepted (if any)
c) Proposer collects promises from majority
Phase 2: Accept
a) Proposer sends Accept(n, v) to majority
- v = value (if no previous value, use proposer's value)
- If previous value exists, use that value
b) Acceptor accepts if n >= highest proposal number promised
c) Acceptor responds with acceptance
d) Proposer collects acceptances from majority
Phase 3: Learn
- Once majority accepts, value is "chosen"
- Learners learn the chosen value
- All nodes eventually learn the same value
# Key Properties
- Safety: Only one value can be chosen
- Liveness: Eventually a value is chosen (if no failures)
- Fault Tolerance: Works with up to (N-1)/2 failures (N = total nodes)
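A minimal sketch of an acceptor's Prepare/Accept handling under the rules above (in-memory only; a real Paxos acceptor must persist its promises to disk):
class Acceptor:
    def __init__(self):
        self.promised_n = -1        # Highest proposal number promised
        self.accepted_n = -1        # Highest proposal number accepted
        self.accepted_value = None

    def prepare(self, n):
        # Phase 1: promise not to accept proposals numbered < n
        if n > self.promised_n:
            self.promised_n = n
            return ("promise", self.accepted_n, self.accepted_value)
        return ("reject", None, None)

    def accept(self, n, value):
        # Phase 2: accept if we haven't promised a higher proposal number
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted_n = n
            self.accepted_value = value
            return "accepted"
        return "rejected"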
4.5 DynamoDB Replication: Deep Dive
Amazon DynamoDB uses a unique replication model based on the Dynamo paper. Understanding this is critical for interviews.
DynamoDB Architecture
# DynamoDB Replication Model
1. Data Partitioning:
- Table partitioned across multiple nodes using consistent hashing
- Each partition has 3 replicas (by default)
- Replicas stored in different availability zones
2. Replication Strategy:
- Synchronous replication to 2 of 3 replicas
- Asynchronous replication to 3rd replica
- Quorum-based reads and writes
3. Quorum Reads:
- Read from R replicas (R = read quorum, typically 2)
- Wait for R responses
- Return most recent version (based on vector clock)
- If versions conflict, return all versions (client resolves)
4. Quorum Writes:
- Write to W replicas (W = write quorum, typically 2)
- Wait for W acknowledgments
- Asynchronously replicate to remaining replicas
- Rule: R + W > N (where N = total replicas, typically 3)
- This ensures read-write overlap (at least one node sees both)
# Example: N=3, R=2, W=2
- Write: Update replicas 1 and 2
- Read: Read from replicas 2 and 3
- Overlap: Replica 2 sees both read and write (ensures consistency)
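An illustrative sketch of the quorum rule (R + W > N) - not DynamoDB's actual implementation - assuming each replica exposes hypothetical put/get methods and stores (version, value) pairs:
N, R, W = 3, 2, 2
assert R + W > N  # Guarantees at least one replica sees both the write and the read

def quorum_write(replicas, key, value, version):
    acks = 0
    for replica in replicas:
        if replica.put(key, value, version):  # Hypothetical replica API
            acks += 1
        if acks >= W:
            return True  # Remaining replicas catch up asynchronously
    return False

def quorum_read(replicas, key):
    # Collect R responses and return the value with the highest version
    responses = [replica.get(key) for replica in replicas[:R]]  # (version, value) pairs
    return max(responses, key=lambda r: r[0])[1]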
DynamoDB Vector Clocks and Conflict Resolution
DynamoDB uses vector clocks to track causality and resolve conflicts.
# Vector Clocks in DynamoDB
Vector Clock: [Node1: 5, Node2: 3, Node3: 7]
- Node1 has seen 5 events from itself
- Node1 has seen 3 events from Node2
- Node1 has seen 7 events from Node3
# Conflict Resolution
Scenario: Two clients update same item concurrently
Client A (via Node1):
Write: balance = 1000
Vector Clock: [Node1: 1, Node2: 0, Node3: 0]
Client B (via Node2):
Write: balance = 2000
Vector Clock: [Node1: 0, Node2: 1, Node3: 0]
Read (from Node1 and Node2):
Node1 returns: balance = 1000, VC = [1,0,0]
Node2 returns: balance = 2000, VC = [0,1,0]
Conflict detected: Neither vector clock is "happens-before" the other
DynamoDB returns both versions: [1000, 2000]
Application must resolve (last-write-wins, merge, etc.)
# Last-Write-Wins (DynamoDB Default)
- DynamoDB uses timestamp for conflict resolution
- If timestamps conflict, uses node ID
- Client can also provide custom conflict resolution
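A minimal sketch of the vector-clock comparison used to detect the conflict above, with clocks represented as dicts mapping node id to counter:
def happens_before(vc_a, vc_b):
    """True if vc_a <= vc_b component-wise, with at least one strictly smaller counter."""
    nodes = set(vc_a) | set(vc_b)
    return (all(vc_a.get(n, 0) <= vc_b.get(n, 0) for n in nodes)
            and any(vc_a.get(n, 0) < vc_b.get(n, 0) for n in nodes))

def compare(vc_a, vc_b):
    if happens_before(vc_a, vc_b):
        return "a happened before b (b wins)"
    if happens_before(vc_b, vc_a):
        return "b happened before a (a wins)"
    return "concurrent - conflict, return both versions"

# The scenario above: neither clock dominates, so both versions are returned to the client
print(compare({"Node1": 1}, {"Node2": 1}))  # concurrent - conflict, return both versions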
DynamoDB Consistency Levels
# DynamoDB Read Consistency
1. Eventually Consistent Reads (Default):
- Read from any replica
- May return stale data
- Lower latency, lower cost
- Use case: Non-critical reads
2. Strongly Consistent Reads:
- Read from all replicas
- Wait for quorum
- Returns latest data
- Higher latency, higher cost
- Use case: Critical reads (account balance)
# Example:
Eventually Consistent:
- Read from replica 1 (might be 100ms behind)
- Latency: 10ms
- Cost: 1 read unit
Strongly Consistent:
- Read from replicas 1, 2, 3
- Wait for all responses
- Return latest
- Latency: 30ms (network to 3 replicas)
- Cost: 2 read units (2x more expensive)
4.6 MongoDB Replication: Replica Sets
MongoDB uses replica sets for replication. Understanding the oplog and election process is key.
MongoDB Replica Set Architecture
# MongoDB Replica Set
Replica Set Members:
1. Primary: One primary handles all writes
2. Secondaries: Multiple secondaries replicate from primary
3. Arbiter: Optional, votes in elections but doesn't store data
# Oplog (Operations Log)
- Capped collection (fixed size, circular)
- Stores all write operations
- Format: {ts: Timestamp, op: "i"|"u"|"d", ns: "db.collection", o: data}
- Example: {ts: Timestamp(1234567890, 1), op: "i", ns: "test.users", o: {_id: 1, name: "John"}}
# Replication Process
Step 1: Write to Primary
- Client writes to primary
- Primary writes to oplog
- Primary applies to data files
- Primary responds to client
Step 2: Replication to Secondaries
- Secondary queries primary's oplog: "Give me entries after timestamp X"
- Primary sends oplog entries
- Secondary applies oplog entries to its data files
- Secondary updates its timestamp
Step 3: Heartbeat and Election
- Primary sends heartbeat every 2 seconds
- If secondary doesn't receive heartbeat for 10 seconds, starts election
- Election: Secondary with highest priority or most recent oplog becomes primary
MongoDB Read Preferences and Write Concerns
# MongoDB Read Preferences
1. primary (Default):
- Read only from primary
- Strong consistency
- Use case: Critical reads
2. primaryPreferred:
- Read from primary, fallback to secondary if primary unavailable
- Use case: Prefer consistency, but allow availability
3. secondary:
- Read only from secondaries
- Eventual consistency
- Use case: Analytics, reporting
4. secondaryPreferred:
- Read from secondary, fallback to primary
- Use case: Distribute read load
5. nearest:
- Read from nearest node (lowest latency)
- Use case: Geographic distribution
# MongoDB Write Concerns
Write Concern specifies how many nodes must acknowledge write:
w: 1 (Default)
- Primary acknowledges
- Fast, but data might be lost if primary fails
w: "majority"
- Majority of nodes acknowledge
- Slower, but safer
- Use case: Critical writes
w: 2
- At least 2 nodes acknowledge
- Balance between speed and safety
j: true (Journal)
- Write must be written to journal (durable)
- Ensures data survives crash
- Use case: Critical data
# Example:
db.users.insert({name: "John"}, {writeConcern: {w: "majority", j: true}})
- Write to primary
- Replicate to majority of secondaries
- Wait for journal flush
- Acknowledge to client
4.7 Replication Lag: Deep Dive
Understanding replication lag is critical. Interviewers will ask: "What happens if a user writes data and immediately reads from a replica?"
Read-After-Write Consistency Problem
# The Problem
Scenario:
1. User writes: UPDATE balance = 1000 WHERE id = 123 (to primary)
2. User immediately reads: SELECT balance FROM users WHERE id = 123 (from replica)
3. Replica hasn't received update yet (replication lag = 200ms)
4. User sees: balance = 500 (stale data!)
# Solutions
Solution 1: Read from Primary After Write
- Track recent writes (last 1 second) in application
- If user wrote recently, read from primary
- Otherwise, read from replica
- Trade-off: Some reads go to primary (increases load)
Solution 2: Sticky Sessions
- Route user's reads to same replica that received their writes
- Requires session affinity
- Trade-off: Load imbalance, complex routing
Solution 3: Wait for Replication
- After write, wait for replica to acknowledge
- Then allow reads from replica
- Trade-off: Higher write latency
Solution 4: Timeline Consistency
- Track replication lag per replica
- Only read from replicas with lag < threshold (e.g., 100ms)
- Trade-off: Some replicas might be unavailable
Measuring and Monitoring Replication Lag
# Monitoring Replication Lag
MySQL:
SHOW SLAVE STATUS\G
- Seconds_Behind_Master: Lag in seconds
- Read_Master_Log_Pos: Last position read
- Exec_Master_Log_Pos: Last position applied
PostgreSQL:
SELECT * FROM pg_stat_replication;
- write_lag, flush_lag, replay_lag: time-based lag (interval columns, PostgreSQL 10+)
- Byte lag: pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)
MongoDB:
rs.printSlaveReplicationInfo()
- Shows lag for each secondary
# Application-Level Monitoring
- Write timestamp to primary
- Read timestamp from replica
- Calculate difference = replication lag
- Alert if lag > threshold (e.g., 1 second)
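A minimal sketch of the heartbeat approach described above, assuming separate primary/replica connections and a small heartbeat table (helper names are hypothetical):
import time

def write_heartbeat(primary_db):
    # A scheduled job periodically records the current time on the primary
    primary_db.execute("UPDATE heartbeat SET ts = ? WHERE id = 1", time.time())

def measure_lag(replica_db, alert, threshold_seconds=1.0):
    # Read the heartbeat back from the replica; the difference is the replication lag
    row = replica_db.query("SELECT ts FROM heartbeat WHERE id = 1")
    lag = time.time() - row["ts"]
    if lag > threshold_seconds:
        alert(f"Replication lag {lag:.2f}s exceeds {threshold_seconds}s")
    return lag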
4.8 Conflict Resolution in Multi-Master Replication
When multiple masters can accept writes, conflicts are inevitable. Understanding conflict resolution strategies is essential.
# Conflict Resolution Strategies
1. Last-Write-Wins (LWW):
- Use timestamp to determine winner
- Simple, but can lose data
- Example: DynamoDB default
2. Vector Clocks:
- Track causality
- Detect conflicts
- Return all versions, client resolves
- Example: DynamoDB, Riak
3. Operational Transformation:
- Transform operations to resolve conflicts
- Used in collaborative editing (Google Docs)
- Complex, but preserves intent
4. CRDTs (Conflict-Free Replicated Data Types):
- Data structures designed for conflict-free merging
- Examples: Counters, Sets, Maps
- Automatically resolve conflicts
- Example: Riak, Redis
5. Application-Level Resolution:
- Database returns all conflicting versions
- Application logic resolves (merge, choose, etc.)
- Most flexible, but most complex
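A minimal sketch of a CRDT grow-only counter (G-Counter), one of the conflict-free types listed above: each node increments only its own slot, and merges take the per-node maximum, so merge order never matters:
class GCounter:
    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node_id -> count

    def increment(self, amount=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Conflict-free merge: take the max per node
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

# Two replicas increment independently, then converge after merging in any order
a, b = GCounter("node-a"), GCounter("node-b")
a.increment(3); b.increment(2)
a.merge(b); b.merge(a)
assert a.value() == b.value() == 5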
5. CAP Theorem: Deep Dive
The CAP theorem (Consistency, Availability, Partition Tolerance) is fundamental to understanding distributed systems. However, it's often misunderstood. Let's dive deep into what it really means.
5.1 The Three Properties Defined
Consistency (C)
Linearizability - All nodes see the same data at the same time. After a write completes, all subsequent reads (from any node) must return that value or a more recent value.
# Consistency Example
Node 1: Write x = 100 (at time T1)
Node 2: Read x (at time T2, where T2 > T1)
Result: Node 2 MUST see x = 100 (not stale value)
# Violation of Consistency:
Node 1: Write x = 100
Node 2: Read x → returns 50 (stale value) ❌
Node 3: Read x → returns 100 (correct value)
This is inconsistent - nodes see different values
Availability (A)
Every request to a non-failing node receives a response (not an error or timeout). The system continues operating even if some nodes fail.
# Availability Example
System with 3 nodes:
- Node 1 fails
- Node 2 and 3 continue serving requests ✅
# Violation of Availability:
- Node 1 fails
- System stops responding to all requests ❌
- Returns "Service Unavailable" error
Partition Tolerance (P)
The system continues operating despite network partitions (nodes can't communicate). This is unavoidable in distributed systems - network partitions WILL happen.
# Network Partition Example
Original Network:
Node 1 ←→ Node 2 ←→ Node 3
Network Partition:
Cluster A: Node 1, Node 2 (can communicate)
Cluster B: Node 3 (isolated)
Partition-Tolerant System:
- Both clusters continue operating ✅
- May have consistency issues (expected)
Non-Partition-Tolerant System:
- System stops operating ❌
- Not practical for distributed systems
5.2 CAP Theorem: The Real Meaning
During a network partition, you must choose between Consistency and Availability. You cannot have both.
# CAP Theorem Scenarios
Scenario: Network Partition
Cluster A: Node 1, Node 2
Cluster B: Node 3
Client in Cluster A writes: x = 100
Client in Cluster B reads: x = ?
Option 1: Choose Consistency (CP)
- Cluster B cannot read (doesn't have latest value)
- Returns error or blocks until partition heals
- System is unavailable for Cluster B ❌
- But maintains consistency ✅
Option 2: Choose Availability (AP)
- Cluster B returns stale value (x = 50)
- System is available ✅
- But violates consistency ❌
Option 3: Choose CA (Impossible)
- Can't have both during partition
- Only possible in non-distributed system (single node)
- Not practical for distributed systems
5.3 Real-World CAP Implementations
CP Systems: Consistency + Partition Tolerance
# CP Systems Examples
1. PostgreSQL (with synchronous replication):
- During partition: Blocks writes until majority available
- Ensures consistency
- Trade-off: May be unavailable
2. MongoDB (with write concern "majority"):
- Requires majority to acknowledge writes
- During partition: If no majority, blocks writes
- Ensures consistency
- Trade-off: Availability
3. HBase:
- Strong consistency model
- During partition: May block operations
- Ensures consistency
- Trade-off: Availability
# CP System Behavior:
- Prioritizes consistency over availability
- During partition: May return errors or block
- Use case: Financial systems, critical data
AP Systems: Availability + Partition Tolerance
# AP Systems Examples
1. DynamoDB:
- Always accepts reads and writes
- During partition: Returns available data (may be stale)
- Eventual consistency
- Trade-off: May return inconsistent data
2. Cassandra:
- Always accepts reads and writes
- During partition: Returns data from available nodes
- Eventual consistency
- Trade-off: Consistency
3. CouchDB:
- Multi-master replication
- Always available
- Conflict resolution required
- Trade-off: Consistency
# AP System Behavior:
- Prioritizes availability over consistency
- During partition: Continues serving requests
- May return stale or conflicting data
- Use case: Social media, content delivery
5.4 CAP Theorem Nuances and Common Misconceptions
Misconception 1: "You choose CAP once for your system"
Reality: CAP is a per-operation choice. Different operations can have different CAP trade-offs.
# Example: DynamoDB
Read Operation:
- Eventually Consistent Read: AP (available, may be inconsistent)
- Strongly Consistent Read: CP (consistent, may be unavailable)
Write Operation:
- Standard Write: AP (available, eventual consistency)
- Conditional Write: CP (consistent, may fail if condition not met)
# System can be both AP and CP depending on operation!
Misconception 2: "CP means always consistent"
Reality: CP systems are consistent ONLY when there's no partition. During normal operation, they can be both consistent and available.
Misconception 3: "Partitions are rare"
Reality: Network partitions are common in distributed systems. They can be caused by:
Network switches failing
Router misconfigurations
Firewall rules
Network congestion
Data center connectivity issues
5.5 Beyond CAP: ACID vs BASE
ACID (Traditional Databases)
# ACID Properties
Atomicity: All or nothing
- Transaction either completes fully or not at all
- Example: Transfer $100 (debit $100, credit $100) - both or neither
Consistency: Database remains in valid state
- Constraints are always satisfied
- Example: Balance cannot be negative
Isolation: Concurrent transactions don't interfere
- Transactions see consistent view
- Example: Read committed, serializable isolation levels
Durability: Committed changes persist
- Written to disk, survives crashes
- Example: WAL ensures durability
# ACID systems generally make CP-style trade-offs (consistency prioritized over availability)
# Trade-off: Lower availability, higher latency
BASE (NoSQL Databases)
# BASE Properties
Basically Available: System is available most of the time
- Accepts requests even during failures
- Example: DynamoDB always accepts reads/writes
Soft State: System state may change without input
- Replication may update state
- Example: Replica eventually receives updates
Eventual Consistency: System will become consistent
- Given enough time, all nodes see same data
- Example: After replication lag, all replicas have same data
# BASE systems generally make AP-style trade-offs (availability prioritized over consistency)
# Trade-off: Lower consistency, higher availability
5.6 Choosing the Right CAP Trade-off
# Decision Framework
Choose CP (Consistency + Partition Tolerance) when:
- Data correctness is critical
- Financial transactions
- User account balances
- Inventory management
- Can tolerate temporary unavailability
- Examples: PostgreSQL, MongoDB (with strong consistency)
Choose AP (Availability + Partition Tolerance) when:
- High availability is critical
- Slight inconsistency is acceptable
- Social media feeds
- Content delivery
- Recommendations
- Can handle eventual consistency
- Examples: DynamoDB, Cassandra, CouchDB
# Hybrid Approach (Common in Practice):
- Critical operations: CP (e.g., payment processing)
- Non-critical operations: AP (e.g., recommendations)
- Different consistency levels per operation
- Example: E-commerce site
* Orders: CP (must be consistent)
* Product recommendations: AP (can be eventually consistent)
Common Mistakes to Avoid
❌ Sharding too early: Sharding adds complexity - only shard when necessary
❌ Ignoring replication lag: Read replicas have lag - don't read immediately after write
❌ Cross-shard queries: Avoid joins across shards - they're expensive
❌ Wrong sharding key: Choose a key that distributes data evenly
❌ Not planning for growth: Design sharding strategy that can scale
How to Discuss Database Scaling in Interviews
Database scaling is a critical topic in system design interviews. Here's how to approach it effectively.
Step-by-Step Interview Approach
1. Start with Read Replicas
# Always Start Here
"I'd first implement read replicas because:
1. Simple to implement (most databases support it)
2. Solves read scaling without complexity
3. No application code changes needed (just route reads)
4. Provides high availability (failover to replica)
This handles 80% of scaling needs for most applications."
2. Discuss Replication Lag
# Critical Point to Mention
"However, read replicas introduce replication lag:
- Writes go to master, then replicate to replicas
- Replicas may be seconds behind master
- This means eventual consistency
I'd handle this by:
- Reading from master for critical reads (user's own data)
- Reading from replicas for non-critical reads (other users' data)
- Using version numbers or timestamps to detect stale data"
3. When to Consider Sharding
# When Read Replicas Aren't Enough
"Sharding becomes necessary when:
- Write throughput exceeds single master capacity
- Database size exceeds single server limits
- Geographic distribution needed
I'd choose sharding strategy based on:
- Range-based: For time-series, sequential data
- Hash-based: For even distribution, no hotspots
- Directory-based: For complex routing, flexibility"
Common Interview Questions
Question 1: "How would you scale a database for a social media app?"
# Strong Answer Structure
1. Start with read replicas:
"Social media is read-heavy (10:1 read:write ratio).
I'd add read replicas to handle feed generation, profile views."
2. Discuss sharding strategy:
"For writes, I'd shard by user_id using hash-based sharding.
This ensures even distribution and allows parallel writes."
3. Mention caching:
"I'd add Redis cache for hot data (user profiles, trending posts).
This reduces database load significantly."
4. Discuss consistency:
"For user's own posts, read from master (strong consistency).
For feed/timeline, read from replicas (eventual consistency is OK)."
Question 2: "How do you handle replication lag?"
# Comprehensive Answer
1. Acknowledge the problem:
"Replication lag is inevitable with async replication.
Replicas can be seconds behind master."
2. Solutions:
- Read-your-writes: Route user's own reads to master
- Version vectors: Track data versions, detect stale reads
- Quorum reads: Read from multiple replicas, use latest
- Timeline consistency: Route reads based on write timestamp
3. Trade-offs:
"Synchronous replication eliminates lag but reduces write throughput.
For most apps, eventual consistency is acceptable."
Question 3: "When would you use master-slave vs master-master?"
# Decision Framework
Master-Slave (Most Common):
✅ Simpler to implement
✅ No conflict resolution needed
✅ Clear write path (always master)
❌ Single point of failure for writes
❌ Geographic limitations
Master-Master:
✅ Write scaling (writes to both)
✅ Geographic distribution
✅ Higher availability
❌ Complex conflict resolution
❌ Eventual consistency challenges
❌ More complex to implement
"I'd use master-slave for most cases, master-master only if
I need write scaling and can handle conflict resolution."
Key Points to Emphasize
✅ Start simple: Read replicas first, sharding only when needed
Caching: Deep Dive
Caching is one of the most critical performance optimization techniques in system design. Understanding caching strategies, patterns, and trade-offs is essential for building high-performance systems. This topic covers everything from browser caching to distributed caching systems used by major tech companies.
In FAANG interviews, you'll be asked about cache invalidation strategies, cache eviction policies, distributed caching architectures, and how to handle cache consistency. This deep dive covers all of that with real-world examples from AWS, Google Cloud, Azure, and industry leaders.
1. Cache Levels: The Caching Hierarchy
Caching exists at multiple levels in a system. Understanding each level and when to use it is crucial for system design.
1.1 Browser Cache
The browser cache stores resources locally on the user's device. This is the fastest cache but has limited control from the server side.
# Browser Cache Mechanisms
1. HTTP Cache Headers:
Cache-Control: max-age=3600, public
- max-age: Cache for 3600 seconds (1 hour)
- public: Can be cached by any cache
- private: Only browser can cache
- no-cache: Must revalidate before use
- no-store: Don't cache at all
ETag: "abc123"
- Resource version identifier
- Browser sends If-None-Match: "abc123"
- Server responds 304 Not Modified if unchanged
Last-Modified: Wed, 21 Oct 2015 07:28:00 GMT
- Resource modification time
- Browser sends If-Modified-Since
- Server responds 304 if not modified
2. Cache Storage:
- Memory Cache: Fast, limited size (~100MB)
- Disk Cache: Slower, larger size (~1GB)
- Service Worker Cache: Programmatic control
# Real-World Example: Google
- Static assets (JS, CSS): max-age=31536000 (1 year)
- HTML: max-age=3600, must-revalidate
- Images: max-age=86400 (1 day)
- API responses: no-cache (always fresh)
1.2 CDN Cache (Content Delivery Network)
CDN caches content at edge locations worldwide. CDNs are an important part of the caching hierarchy for delivering content globally with low latency.
1.3 Reverse Proxy Cache
A reverse proxy (like Nginx, Varnish) sits in front of application servers and caches responses.
Reverse Proxy Caching Architecture
How It Works: The reverse proxy intercepts requests before they reach application servers.
If the response is cached, it's served immediately. If not, the request is forwarded to the backend,
the response is cached, and returned to the client. Real-World: Varnish Cache (used by Wikipedia, The Guardian)
can handle 100K+ requests/second.
1.4 Application Cache
Application-level caching stores frequently accessed data in memory (in-process) or in a separate cache service (out-of-process).
Application Cache Types: Comparison
Key Trade-off: In-process caches are faster (nanoseconds) but limited to a single server.
Out-of-process caches (like Redis) are shared across servers and survive restarts, but add network latency (microseconds).
Most systems use out-of-process caches for scalability, with in-process caches as an optimization layer for extremely hot data.
1.5 Database Cache
Databases have their own caching mechanisms (query cache, buffer pool).
Database-Level Caching
Databases have built-in caching mechanisms that operate transparently. Understanding these helps you make better caching decisions at the application level.
MySQL Query Cache
Caches SELECT query results
Invalidated on table updates
Note: Disabled in MySQL 8.0 (deprecated)
PostgreSQL Shared Buffers
Caches frequently accessed pages
Reduces disk I/O significantly
Tuned based on available RAM
Best Practice: Let the database handle its own caching. Focus your application-level caching on expensive queries,
frequently accessed data, and computed results. Use read replicas for read-heavy workloads to distribute database load.
2. Cache Patterns: How to Use Caching
Different cache patterns solve different problems. Understanding when to use each pattern is critical for system design interviews.
2.1 Cache-Aside (Lazy Loading)
Cache-Aside is the most common pattern. The application is responsible for loading data into the cache.
Cache-Aside Pattern: Read and Write Flows
✓ Advantages
Simple to implement
Cache failures don't break application
Flexible cache invalidation
✗ Disadvantages
Cache miss penalty (2 round trips)
Possible stale data if invalidation fails
Application must manage cache logic
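A minimal cache-aside sketch in the same style as the patterns below, assuming a Redis client and a db helper (hypothetical names):
import json

CACHE_TTL = 3600  # 1 hour

def get_user(redis, db, user_id):
    # 1. Check the cache first
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)  # Cache hit
    # 2. On a miss, load from the database and populate the cache
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    redis.setex(f"user:{user_id}", CACHE_TTL, json.dumps(user))
    return user

def update_user(redis, db, user_id, data):
    # 3. On writes, update the database and invalidate the cached copy
    db.execute("UPDATE users SET ... WHERE id = ?", user_id, data)
    redis.delete(f"user:{user_id}")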
Real-World: Facebook's Cache-Aside
Facebook uses cache-aside extensively. Their Memcached infrastructure handles billions of requests per second using this pattern.
Reference: "Scaling Memcached at Facebook" - NSDI 2013
Cache hit rate: ~99% for hot data
Reduces database load by 10-100x
2.2 Write-Through Cache
In Write-Through, data is written to both cache and database simultaneously.
# Write-Through Pattern
Write Flow:
1. Write to cache
2. Write to database
3. Return success (after both complete)
Read Flow:
1. Check cache
2. If miss: Read from database, update cache
# Implementation:
def update_user(user_id, data):
    # Write to cache
    cache_key = f"user:{user_id}"
    redis.setex(cache_key, 3600, json.dumps(data))
    # Write to database
    db.update("UPDATE users SET ... WHERE id = ?", user_id, data)
    # Both must succeed (or use a transaction / rollback on failure)
# Advantages:
✅ Cache always consistent with database
✅ No stale data
✅ Read performance (data always in cache after write)
# Disadvantages:
❌ Higher write latency (2 writes)
❌ Cache failures affect writes
❌ Writes unnecessary data to cache (if not read soon)
Real-World: AWS ElastiCache Write-Through
AWS ElastiCache (Redis/Memcached) is commonly used with write-through for critical data that must always be consistent.
2.3 Write-Behind (Write-Back)
In Write-Behind, data is written to cache immediately, and database writes happen asynchronously.
# Write-Behind Pattern
Write Flow:
1. Write to cache immediately
2. Return success to client
3. Asynchronously write to database (background job)
Read Flow:
1. Check cache (should always hit after write)
# Implementation:
def update_user(user_id, data):
    # Write to cache immediately
    cache_key = f"user:{user_id}"
    redis.setex(cache_key, 3600, json.dumps(data))
    # Queue database write (async)
    queue.enqueue(write_to_db, user_id, data)
    return "Success"  # Return immediately

def write_to_db(user_id, data):
    # Background job writes to database
    db.update("UPDATE users SET ... WHERE id = ?", user_id, data)
# Advantages:
✅ Very fast writes (cache only)
✅ High write throughput
✅ Database writes can be batched
# Disadvantages:
❌ Risk of data loss (cache failure before DB write)
❌ Complex failure handling
❌ Eventual consistency (cache ahead of DB)
Real-World: Kafka + Write-Behind
Many systems use Kafka as a write-behind buffer. Writes go to cache and Kafka, then consumers write to the database asynchronously.
2.4 Refresh-Ahead
Refresh-Ahead proactively refreshes cache entries before they expire.
# Refresh-Ahead Pattern
Mechanism:
1. Cache entry has TTL (e.g., 1 hour)
2. Background thread checks entries
3. If entry expires in < threshold (e.g., 10 minutes):
a) Asynchronously refresh from database
b) Update cache with new data
c) Reset TTL
# Implementation:
def refresh_cache_entries():
    for key in redis.keys("user:*"):
        ttl = redis.ttl(key)
        if ttl < 600:  # Less than 10 minutes remaining
            user_id = extract_id(key)
            # Async refresh
            async_refresh(user_id)
# Advantages:
✅ Reduces cache miss penalty
✅ Smoother user experience
✅ Predictable performance
# Disadvantages:
❌ Wastes resources refreshing unused entries
❌ Complex to implement
❌ May refresh data that's never accessed
3. Cache Eviction Policies
When cache is full, which entries should be removed? Different eviction policies optimize for different use cases.
3.1 LRU (Least Recently Used)
LRU evicts the least recently accessed entry. This is the most common policy.
# LRU Implementation (Simplified)
Data Structure: Doubly Linked List + Hash Map
Operations:
- Access: Move to head (O(1))
- Evict: Remove from tail (O(1))
- Insert: Add to head (O(1))
Example:
Cache: [A, B, C] (A = most recent)
Access C: [C, A, B]
Access B: [B, C, A]
Insert D (cache full): [D, B, C] (A evicted)
# Real-World Usage:
- Redis default (with maxmemory-policy allkeys-lru)
- Memcached LRU
- CPU cache (hardware)
- Browser cache
# Advantages:
✅ Good for temporal locality (recent = likely to be used)
✅ Simple to implement
✅ Works well for most workloads
# Disadvantages:
❌ Poor for scan patterns (evicts everything)
❌ Doesn't consider access frequency
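A minimal LRU cache sketch using Python's OrderedDict, matching the linked-list-plus-hash-map idea above:
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # Insertion order doubles as recency order

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # Mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # Evict the least recently used entry

cache = LRUCache(3)
for k in "ABC":
    cache.put(k, k)
cache.get("A")       # A is now most recently used
cache.put("D", "D")  # Cache full: evicts B, the least recently used
assert cache.get("B") is None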
3.2 LFU (Least Frequently Used)
LFU evicts the least frequently accessed entry.
# LFU Implementation
Data Structure: Frequency buckets + Hash Map
Operations:
- Access: Increment frequency, move to higher bucket
- Evict: Remove from lowest frequency bucket
Example:
Access counts: A=10, B=5, C=3, D=1
Evict: D (lowest frequency)
Access D: D=2, still lowest
Access C: C=4, now D is lowest
# Real-World Usage:
- Redis (maxmemory-policy allkeys-lfu)
- Some CDN configurations
- Content recommendation systems
# Advantages:
✅ Good for stable access patterns
✅ Preserves frequently accessed data
✅ Better for long-term caching
# Disadvantages:
❌ Poor for changing access patterns
❌ New entries evicted quickly (frequency = 0)
❌ More complex than LRU
3.3 FIFO (First In First Out)
FIFO evicts the oldest entry (by insertion time).
# FIFO Implementation
Data Structure: Queue
Operations:
- Insert: Add to tail
- Evict: Remove from head
Example:
Insert: A, B, C
Evict: A (oldest)
Insert: D
Evict: B (now oldest)
# Real-World Usage:
- Simple caching scenarios
- Network buffers
- Some database buffer pools
# Advantages:
✅ Very simple
✅ Predictable behavior
✅ Low overhead
# Disadvantages:
❌ Doesn't consider access patterns
❌ May evict frequently used data
❌ Poor hit rate for most workloads
3.4 Random Eviction
Random eviction randomly selects an entry to evict. Rarely used in practice.
3.5 TTL-Based Eviction
Entries expire after a Time-To-Live (TTL). This is often combined with other policies.
# TTL-Based Eviction
Mechanism:
- Each entry has expiration time
- Background thread periodically checks
- Removes expired entries
# Implementation (Redis):
redis.setex("user:123", 3600, data) # Expires in 1 hour
redis.expire("user:123", 3600) # Set TTL on existing key
# Real-World Usage:
- Session data (expires after inactivity)
- API rate limiting (reset after window)
- Temporary data (one-time tokens)
# Advantages:
✅ Automatic cleanup
✅ Prevents stale data
✅ Simple to understand
# Disadvantages:
❌ May evict data that's still needed
❌ Requires background cleanup
❌ TTL tuning can be difficult
4. Distributed Caching: Redis and Memcached
For large-scale systems, distributed caching is essential. Understanding Redis and Memcached architectures is critical for interviews.
4.1 Redis Architecture Deep Dive
Redis is an in-memory data structure store. It's more than a cache - it's a data structure server.
Redis Data Structures
# Redis Data Types
1. Strings:
SET user:123 "John Doe"
GET user:123
- Use case: Simple key-value caching
2. Hashes:
HSET user:123 name "John" age 30
HGET user:123 name
- Use case: Object caching (user profiles)
3. Lists:
LPUSH queue:emails "email1"
RPOP queue:emails
- Use case: Queues, recent items
4. Sets:
SADD tags:post:123 "tech" "programming"
SMEMBERS tags:post:123
- Use case: Tags, unique items
5. Sorted Sets:
ZADD leaderboard 100 "player1"
ZREVRANGE leaderboard 0 9
- Use case: Leaderboards, rankings
6. Bitmaps:
SETBIT user:123:online 1
GETBIT user:123:online
- Use case: Feature flags, analytics
7. HyperLogLog:
PFADD visitors:2024 "user1" "user2"
PFCOUNT visitors:2024
- Use case: Unique count estimation
8. Streams:
XADD events * user_id 123 action "login"
XREAD STREAMS events 0
- Use case: Event streaming, logs
Redis Persistence
# Redis Persistence Options
1. RDB (Redis Database Backup):
- Periodic snapshots
- Point-in-time backups
- Fast recovery
- May lose recent data (between snapshots)
2. AOF (Append-Only File):
- Logs every write operation
- More durable (can lose max 1 second)
- Slower recovery (replay log)
- Larger file size
3. RDB + AOF (Hybrid):
- Best of both worlds
- RDB for fast recovery
- AOF for durability
- Redis 4.0+ default
# Configuration:
save 900 1 # Save if 1 key changed in 900 seconds
save 300 10 # Save if 10 keys changed in 300 seconds
appendonly yes
appendfsync everysec
Redis Cluster Architecture
# Redis Cluster (Sharding)
Architecture:
- 16384 hash slots
- Each key hashes to a slot
- Each node handles subset of slots
- Replication: Each master has 1+ replicas
Example:
Node 1: Slots 0-5460 (Master) + Replica
Node 2: Slots 5461-10922 (Master) + Replica
Node 3: Slots 10923-16383 (Master) + Replica
Key Routing:
Key "user:123" → CRC16("user:123") % 16384 = Slot 7000
Slot 7000 → Node 2
Request routed to Node 2
# Failover:
- If Node 2 master fails, replica promoted
- Cluster continues operating
- Automatic failover (Redis Sentinel or Cluster mode)
# Real-World: Twitter
- Uses Redis Cluster extensively
- Handles millions of operations/second
- Reference: "Scaling Redis at Twitter" - RedisConf 2014
AWS ElastiCache for Redis
# AWS ElastiCache Redis Features
1. Managed Service:
- Automatic backups
- Automatic failover
- Monitoring and alerts
- Multi-AZ support
2. Cluster Mode:
- Up to 500 nodes
- Automatic sharding
- Horizontal scaling
3. Performance:
- Sub-millisecond latency
- Millions of operations/second
- In-memory performance
4. Use Cases:
- Session storage
- Real-time analytics
- Leaderboards
- Rate limiting
# Configuration Options:
- Single-AZ: Lower latency, single point of failure
- Multi-AZ: Higher availability, automatic failover
- Cluster mode: Horizontal scaling, automatic sharding
# Reference:
AWS ElastiCache Documentation
https://aws.amazon.com/elasticache/
4.2 Memcached Architecture
Memcached is a simple, high-performance distributed memory caching system.
# Memcached vs Redis
Memcached:
✅ Simpler (just key-value)
✅ Faster for simple operations
✅ Lower memory overhead
✅ Better for horizontal scaling
❌ No persistence
❌ No data structures (just strings)
❌ No replication (client-side sharding)
Redis:
✅ Rich data structures
✅ Persistence options
✅ Built-in replication
✅ More features (pub/sub, Lua scripting)
❌ More complex
❌ Higher memory overhead
❌ Slower for simple operations
# When to Use Each:
Memcached:
- Simple key-value caching
- High throughput needed
- Data can be lost (cache only)
- Example: Facebook, Wikipedia
Redis:
- Need data structures
- Need persistence
- Need advanced features
- Example: Twitter, GitHub, Stack Overflow
Real-World: Facebook's Memcached
Facebook operates one of the largest Memcached deployments in the world:
Scale: Thousands of servers, petabytes of memory
Traffic: Billions of requests per second
Hit Rate: ~99% for hot data
Reference: "Scaling Memcached at Facebook" - NSDI 2013
5. Cache Invalidation Strategies
Cache invalidation is one of the hardest problems in computer science. Understanding different strategies is essential.
5.1 Time-Based Invalidation (TTL)
# TTL-Based Invalidation
Mechanism:
- Set expiration time on cache entry
- Entry automatically expires
- Next access triggers refresh
# Implementation:
def get_user(user_id):
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    # Cache expired or miss
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    redis.setex(f"user:{user_id}", 3600, json.dumps(user))  # 1 hour TTL
    return user
# Advantages:
✅ Simple
✅ Automatic cleanup
✅ Predictable staleness
# Disadvantages:
❌ May serve stale data (up to TTL duration)
❌ TTL tuning difficult
❌ Cache stampede if many requests after expiration
5.2 Event-Based Invalidation
# Event-Based Invalidation
Mechanism:
- Database changes trigger cache invalidation
- Real-time or near-real-time
- Ensures cache consistency
# Implementation (Database Triggers - illustrative pseudocode; in practice the trigger writes to an outbox table or you use CDC, since SQL triggers cannot publish to Redis directly):
CREATE TRIGGER invalidate_user_cache
AFTER UPDATE ON users
FOR EACH ROW
BEGIN
-- Publish to message queue
PUBLISH cache:invalidate "user:{NEW.id}"
END;
# Application Listens:
redis.subscribe("cache:invalidate", function(channel, key) {
redis.delete(key)
})
# Implementation (Change Data Capture):
- Use CDC tool (Debezium, AWS DMS)
- Capture database changes
- Publish to Kafka
- Cache service consumes and invalidates
# Advantages:
✅ Immediate invalidation
✅ No stale data
✅ Cache always consistent
# Disadvantages:
❌ Complex to implement
❌ Requires infrastructure (message queue, CDC)
❌ Higher latency (event processing)
Real-World: Netflix Cache Invalidation
Netflix uses event-driven cache invalidation:
Content updates trigger invalidation events
Events flow through Kafka
Cache services (EVCache) invalidate based on events
Reference: "EVCache: Distributed In-Memory Caching" - Netflix Tech Blog
5.3 Version-Based Invalidation
# Version-Based Invalidation
Mechanism:
- Each cache entry has a version
- Database changes increment version
- Cache compares versions
# Implementation:
def get_user(user_id):
    # Get from cache with version
    cached = redis.hgetall(f"user:{user_id}")
    if cached:
        cached_version = cached['version']
        # Check if version changed
        db_version = db.query("SELECT version FROM users WHERE id = ?", user_id)
        if cached_version == db_version:
            return json.loads(cached['data'])
    # Version mismatch or miss - refresh
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    redis.hset(f"user:{user_id}", mapping={
        'data': json.dumps(user),
        'version': user['version']
    })
    return user
# Advantages:
✅ Efficient (only refresh if changed)
✅ No TTL needed
✅ Always fresh
# Disadvantages:
❌ Requires version column in database
❌ Extra database query for version check
❌ More complex logic
5.4 Cache Stampede Prevention
Cache stampede occurs when many requests try to refresh the same expired cache entry simultaneously.
# Cache Stampede Problem
Scenario:
- Cache entry expires
- 1000 requests arrive simultaneously
- All 1000 miss cache
- All 1000 query database
- Database overloaded!
# Solutions:
1. Probabilistic Early Expiration:
- Expire entry slightly before TTL
- Randomize expiration time
- Spread refresh requests over time
2. Lock-Based Refresh:
- First request acquires lock
- Other requests wait or get stale data
- Only one database query
3. Background Refresh:
- Refresh before expiration
- Always serve from cache
- No user-facing latency
# Implementation (Lock-Based):
def get_user(user_id):
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    # Try to acquire lock
    lock_acquired = redis.set(f"user:{user_id}:lock", "1", ex=10, nx=True)
    if lock_acquired:
        # This request refreshes the cache
        user = db.query("SELECT * FROM users WHERE id = ?", user_id)
        redis.setex(f"user:{user_id}", 3600, json.dumps(user))
        redis.delete(f"user:{user_id}:lock")
        return user
    else:
        # Another request is refreshing; wait briefly, then retry the cache
        time.sleep(0.1)
        cached = redis.get(f"user:{user_id}")
        if cached:
            return json.loads(cached)
        # Fallback to database if still not cached
        return db.query("SELECT * FROM users WHERE id = ?", user_id)
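And a minimal sketch of probabilistic early expiration (solution 1 in the list above), assuming the same hypothetical redis and db helpers; the log-of-random weighting follows a commonly used formulation sometimes called "XFetch":
import json
import math
import random

CACHE_TTL = 3600  # seconds
DELTA = 1.0       # Rough cost (seconds) of recomputing the value
BETA = 1.0        # Higher beta = refresh earlier / more aggressively

def get_user(redis, db, user_id):
    key = f"user:{user_id}"
    cached = redis.get(key)
    ttl_remaining = redis.ttl(key)  # Seconds until the entry actually expires
    # Serve the cached value unless this request "wins" the early-refresh lottery;
    # log(random()) is negative, so the closer to expiry, the more likely a refresh.
    if cached and ttl_remaining + DELTA * BETA * math.log(max(random.random(), 1e-12)) > 0:
        return json.loads(cached)
    # Miss, or this request was chosen to refresh early
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    redis.setex(key, CACHE_TTL, json.dumps(user))
    return user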
6. Cache Coherency and Consistency
Maintaining consistency across multiple cache layers and distributed caches is challenging.
6.1 Cache Coherency Models
# Cache Coherency Models
1. Strong Consistency:
- All caches see updates immediately
- Requires synchronization
- High latency, low throughput
- Use case: Financial data
2. Eventual Consistency:
- Caches eventually converge
- Lower latency, higher throughput
- May serve stale data temporarily
- Use case: Most web applications
3. Weak Consistency:
- No guarantees about when updates appear
- Highest performance
- May never converge
- Use case: Analytics, non-critical data
# Real-World: Google's Cache Coherency
- Uses eventual consistency for most caches
- Strong consistency only for critical operations
- Reference: "Large-scale Incremental Processing" - Google Research
6.2 Distributed Cache Consistency
# Handling Consistency in Distributed Caches
Problem:
- Multiple cache nodes
- Updates must propagate
- Network partitions possible
Solutions:
1. Write-Through to All Nodes:
- Write to all cache nodes
- Ensures consistency
- High latency (wait for all)
2. Write-Through to Primary:
- Write to primary node
- Replicate to others asynchronously
- Lower latency, eventual consistency
3. Invalidation Messages:
- Invalidate on all nodes
- Next read refreshes from database
- Ensures eventual consistency
# Real-World: Redis Cluster
- Writes go to master node
- Replicated to replicas asynchronously
- Reads can go to any node (may be stale)
- Strong consistency option: Read from master
Common Mistakes to Avoid
❌ Caching everything: Only cache expensive operations or frequently accessed data
❌ Ignoring cache invalidation: Stale data can cause serious bugs
❌ Not handling cache failures: Application should work without cache
❌ Wrong eviction policy: Choose based on access patterns
❌ Cache stampede: Implement lock-based or probabilistic refresh
❌ Not monitoring cache metrics: Hit rate, latency, memory usage
How to Discuss Caching in Interviews
Caching is one of the most common topics in system design interviews. Here's how to approach it effectively.
Step-by-Step Interview Approach
1. Identify What to Cache
# Questions to Ask Yourself
"What should I cache?"
- Frequently accessed data
- Expensive computations (database queries, API calls)
- Static or semi-static content
- User sessions
"What should I NOT cache?"
- Frequently changing data
- User-specific sensitive data (unless encrypted)
- Data larger than cache capacity
- Rarely accessed data
2. Choose Cache Pattern
# Decision Framework
Cache-Aside (Most Common):
✅ Simple, flexible
✅ Cache failures don't break app
✅ Good for read-heavy workloads
Use when: General purpose caching
Write-Through:
✅ Cache always consistent
✅ Good for write-heavy workloads
❌ Higher write latency
Use when: Strong consistency needed
Write-Behind:
✅ Fastest writes
✅ High throughput
❌ Risk of data loss
Use when: Write performance critical, can tolerate eventual consistency
3. Discuss Cache Invalidation
# Critical Interview Point
"Cache invalidation is hard. I'd use:
1. TTL-based: Simple, automatic, but may serve stale data
2. Event-driven: Real-time, but complex infrastructure needed
3. Version-based: Efficient, but requires version tracking
For most cases, I'd combine TTL with event-driven invalidation
for critical data."
Common Interview Questions
Question 1: "How would you implement caching for a news website?"
# Strong Answer Structure
1. What to cache:
"Articles and images are read-heavy and rarely change after publishing.
I'd cache rendered articles at the CDN and in Redis using cache-aside, with a short TTL on the homepage."
2. Invalidation:
"TTL-based expiration for most content, plus event-driven invalidation when an article is updated or corrected."
Question 2: "What happens if your cache goes down?"
# Critical Interview Question
1. Application should still work:
"Cache is an optimization, not a requirement.
Application should degrade gracefully."
2. Fallback strategies:
- Direct database query (slower but works)
- Circuit breaker pattern (prevent cascade failures)
- Stale cache serving (if acceptable)
3. Monitoring:
"Monitor cache hit rate, latency.
Alert if hit rate drops significantly."
Question 3: "How do you prevent cache stampede?"
# Common Follow-up
1. Problem:
"When cache expires, many requests try to refresh
simultaneously, overwhelming database."
2. Solutions:
- Lock-based: First request acquires lock, others wait
- Probabilistic early expiration: Expire slightly before TTL
- Background refresh: Refresh before expiration
- Stale-while-revalidate: Serve stale, refresh in background
3. Implementation:
"I'd use distributed lock (Redis) to ensure only one
request refreshes cache at a time."
Key Points to Emphasize
✅ Cache-aside is most common: Simple, flexible, handles failures well
✅ Cache invalidation is hard: Always discuss strategies
✅ Handle cache failures: Application must work without cache
✅ Monitor cache metrics: Hit rate, latency, memory usage