
System Design Fundamentals to Advanced

Master the art of designing scalable, distributed systems for FAANG interviews

1. Horizontal vs Vertical Scaling

⏱️ Reading Time: 20-25 minutes 📊 Difficulty: Fundamental 🎯 Interview Level: L4-L6

Overview & Introduction

Scaling is one of the most fundamental concepts in system design. When your application grows and starts receiving more traffic, you need to scale your system to handle the increased load. There are two primary approaches to scaling: horizontal scaling (scaling out) and vertical scaling (scaling up).

Understanding when and how to use each approach is crucial for designing systems that can grow efficiently. This decision impacts not just performance, but also cost, reliability, and operational complexity.

Why This Matters

Every system design interview at FAANG companies will touch on scaling. Interviewers expect you to understand the trade-offs between horizontal and vertical scaling, and to make informed decisions based on the specific requirements of the system you're designing.

What is Vertical Scaling (Scaling Up)?

Vertical scaling, also known as "scaling up," involves adding more power (CPU, RAM, storage) to your existing server or machine. Instead of adding more machines, you upgrade the hardware of your current machine.

Diagram: Vertical Scaling (Scaling Up) - a single server is upgraded in place from 2 vCPUs / 4 GB RAM to 8 vCPUs / 32 GB RAM.

Same server, more powerful hardware

How It Works

  • You start with a server that has, for example, 4 CPU cores and 16GB RAM
  • When you need more capacity, you upgrade to 8 CPU cores and 32GB RAM
  • Or upgrade to 16 CPU cores and 64GB RAM
  • The application code typically doesn't need to change
  • You're essentially making your single server more powerful
Real-World Example: AWS EC2 Instance Upgrade

You're running your application on an AWS EC2 t3.medium instance (2 vCPUs, 4GB RAM). As traffic increases, you upgrade to t3.xlarge (4 vCPUs, 16GB RAM), then to m5.2xlarge (8 vCPUs, 32GB RAM). This is vertical scaling - you're making the same instance more powerful.

Vertical Scaling: Instance Upgrade Journey

  • Before: t3.medium EC2 instance - 2 vCPUs, 4 GB RAM, 5 Gbps network, ~1K req/s
  • After: m5.2xlarge EC2 instance - 8 vCPUs, 32 GB RAM, 10 Gbps network, ~8K req/s
  • Key changes: same application code, same deployment process, no code changes needed; 4x more vCPUs, 8x more RAM, 2x network bandwidth, 8x capacity increase
  • Caveats: eventually hits hardware limits, remains a single point of failure, downtime during the upgrade

Key Insight: Vertical scaling is simple - just upgrade the hardware. No code changes, no architectural changes. But you're limited by the most powerful hardware available, and you still have a single point of failure.

Advantages of Vertical Scaling

  • Simplicity: No code changes required in most cases. Your application continues to run on a single machine.
  • No Data Distribution: All data remains on one machine, so you don't need to worry about data partitioning or synchronization.
  • Lower Latency: No network calls between servers, so inter-process communication is faster.
  • Easier to Implement: Just upgrade hardware or move to a larger cloud instance.
  • Better for Stateful Applications: Applications that maintain state in memory work well with vertical scaling.

Disadvantages of Vertical Scaling

  • Hardware Limits: There's a physical limit to how powerful a single machine can be. You can't infinitely upgrade a server.
  • Single Point of Failure: If your one powerful server fails, your entire system goes down.
  • Downtime During Upgrades: Upgrading hardware often requires taking the server offline.
  • Hardware Cost Scaling: More powerful hardware becomes exponentially more expensive. At scale, horizontal scaling with commodity hardware is typically more economical.
  • Limited Scalability: You can only scale as much as the most powerful available hardware allows.

What is Horizontal Scaling (Scaling Out)?

Horizontal scaling, also known as "scaling out," involves adding more machines or servers to your system. Instead of making one machine more powerful, you add more machines and distribute the load across them.

Diagram: Horizontal Scaling (Scaling Out) - a load balancer distributes traffic across Servers 1-3 (each 2 CPU, 4 GB RAM), with Servers 4-6 and more added as load grows.

Multiple servers sharing the load

How It Works

  • You start with 1 server handling all requests
  • As traffic increases, you add a 2nd server, then a 3rd, 4th, and so on
  • Requests are distributed across all servers using a load balancer
  • Each server runs the same application code
  • You're essentially creating a cluster of servers
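The distribution step above can be sketched with a minimal round-robin balancer. This is a sketch only: the server names are illustrative, and real load balancers (ALB, Nginx, HAProxy) add health checks, weighting, and connection tracking.

```python
from itertools import cycle

class RoundRobinBalancer:
    """Hand each incoming request to the next server in the pool, in turn."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._ring = cycle(self.servers)

    def next_server(self):
        return next(self._ring)

lb = RoundRobinBalancer(["server-1", "server-2", "server-3"])
assignments = [lb.next_server() for _ in range(6)]
# Six requests land evenly: each server receives exactly two.
```

Round-robin is the simplest distribution algorithm; least-connections and IP-hash are common alternatives when requests vary in cost or need session affinity.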
Real-World Example: Web Application Cluster

Your web application starts with 1 server. As users grow, you add 2 more servers behind a load balancer. Now you have 3 servers sharing the load. When traffic spikes, you can quickly add 5 more servers (total 8) to handle the surge. This is horizontal scaling - you're adding more machines rather than making one machine more powerful.

Scaling Journey: From 1 to 8 Servers

  • Stage 1 - Initial setup: 1 server (2 vCPU, 4 GB RAM) handling 100 req/s; capacity 1K req/s
  • Stage 2 - After growth: load balancer + 3 servers; 300 req/s distributed; capacity 3K req/s
  • Stage 3 - Traffic spike: load balancer + 8 servers; 800 req/s distributed; capacity 8K req/s; can scale down when traffic decreases

Key Insight: Each server maintains the same capacity (1K req/s), but total capacity grows linearly with the number of servers. The load balancer automatically distributes traffic evenly, and you can add or remove servers without downtime.

Advantages of Horizontal Scaling

  • Nearly Unlimited Scalability: You can add as many servers as needed (within cloud provider limits). There's no theoretical limit.
  • High Availability: If one server fails, others continue serving traffic. The system remains operational.
  • Cost Efficiency at Scale: Commodity hardware is typically more cost-effective at scale. 10 servers with 4GB RAM each often provide better price/performance than 1 server with 40GB RAM.
  • No Downtime: You can add or remove servers without taking the system offline.
  • Better Fault Tolerance: System can survive individual server failures.
  • Geographic Distribution: You can distribute servers across different regions for better performance.

Disadvantages of Horizontal Scaling

  • Increased Complexity: You need load balancers, service discovery, distributed state management, etc.
  • Data Distribution Challenges: Data needs to be partitioned or replicated across servers, which adds complexity.
  • Network Latency: Communication between servers happens over the network, which is slower than in-memory communication.
  • State Management: Stateless applications work best. Stateful applications require session management or external state stores.
  • Code Changes May Be Required: Applications may need to be refactored to work in a distributed environment.

Side-by-Side Comparison

Visual Comparison: Vertical vs Horizontal Scaling

Left: One server getting more powerful | Right: Multiple servers sharing load

Aspect | Vertical Scaling | Horizontal Scaling
Definition | Adding more power to the existing machine | Adding more machines to the system
Scalability Limit | Limited by hardware maximums | Nearly unlimited (cloud limits)
Scaling Efficiency | Diminishing returns (hardware limits) | Linear scaling (add more servers)
Fault Tolerance | Single point of failure | High availability
Implementation Complexity | Simple (just upgrade hardware) | Complex (load balancing, distribution)
Downtime During Scaling | Usually requires downtime | None (add/remove servers live)
Data Management | All data on one machine | Data distributed across machines
Network Calls | Minimal (mostly in-memory) | Frequent (inter-server communication)
Best For | Small to medium applications, stateful apps | Large applications, stateless apps, high traffic
Cloud Examples | AWS: upgrade EC2 instance type; GCP: upgrade VM machine type | AWS: add more EC2 instances; GCP: add more VM instances

When to Use Each Approach

The decision between horizontal and vertical scaling isn't always clear-cut. Here's a comprehensive guide to help you make the right choice based on your specific requirements.

Use Vertical Scaling When:

  • Small to Medium Traffic: Your application doesn't need to handle millions of requests per second. If you're serving less than 10,000 requests/second, vertical scaling might be sufficient.
  • Stateful Applications: Applications that maintain significant state in memory:
    • Gaming servers (player state, game world state)
    • Real-time analytics (in-memory aggregations)
    • In-memory databases (Redis for complex data structures)
    • Machine learning inference servers (model in memory)
  • Simple Architecture: You want to keep the system simple and avoid distributed systems complexity. Fewer moving parts = fewer failure points.
  • Database Servers (Write Masters): Many databases benefit from vertical scaling initially:
    • PostgreSQL master: Single powerful machine for writes
    • MySQL master: Strong consistency requires single node
    • MongoDB primary: Single primary for write operations
  • Budget Constraints: You have limited budget and can't invest in distributed infrastructure (load balancers, service discovery, monitoring).
  • Low Latency Requirements: Applications where network latency between servers would be problematic:
    • High-frequency trading systems
    • Real-time collaboration tools (operational transforms)
    • In-memory data processing
  • Single-Tenant Applications: Applications serving a single organization or use case where traffic is predictable.
Example: Single-Node Database

A PostgreSQL database for a small to medium application can start with vertical scaling. As data grows, you upgrade from 16GB RAM to 64GB RAM, then to 256GB RAM. This works well until you hit hardware limits or need high availability.

# PostgreSQL Vertical Scaling Journey (Typical Path)

Phase 1: Small Scale (Vertical Scaling)
  - t3.medium (4GB RAM) - 10K rows
  - t3.xlarge (16GB RAM) - 100K rows
  - m5.2xlarge (32GB RAM) - 1M rows
  - m5.4xlarge (64GB RAM) - 10M rows
  - m5.8xlarge (128GB RAM) - 50M rows

Phase 2: Hit Limits (Time to Scale Horizontally)
At this point, vertical scaling becomes impractical:
  - Hardware limits reached
  - Single point of failure risk
  - Upgrade downtime becomes unacceptable

Phase 3: Hybrid Approach (Recommended)
  - Keep a powerful master for writes (vertical)
  - Add read replicas for reads (horizontal)
  - Consider database sharding for writes (horizontal)
  - Or migrate to a managed service (RDS, Aurora)

Use Horizontal Scaling When:

  • High Traffic: You need to handle millions of requests per second (like Twitter, Facebook, Netflix). Single server can't handle the load.
  • Stateless Applications: Web servers, API servers that don't maintain session state:
    • REST APIs
    • Microservices
    • Static file servers
    • API gateways
  • High Availability Required: System must remain operational even if servers fail. 99.9%+ uptime requirements.
  • Variable Traffic: Traffic patterns are unpredictable:
    • Spikes during events (Super Bowl, product launches)
    • Seasonal variations (holiday shopping, tax season)
    • Viral content (social media, news sites)
  • Cost Optimization: You want to optimize costs by using commodity hardware. Horizontal scaling is more cost-effective at scale.
  • Geographic Distribution: You need to serve users from multiple regions with low latency. Deploy servers in each region.
  • Auto-Scaling Requirements: Need to automatically add/remove servers based on traffic (cloud auto-scaling).
  • Multi-Tenant Applications: SaaS applications serving multiple customers with varying load patterns.
Example: Web Application Servers

A web application serving HTTP requests is typically stateless. You can easily add more web servers behind a load balancer. When traffic increases, you add 10 more servers. When traffic decreases, you remove servers to save costs. This is the classic horizontal scaling pattern.

# Horizontal Scaling with Auto-Scaling

Base Configuration:
  - Application servers: 2x t3.medium
  - Load balancer: 1x AWS Application Load Balancer
  - Auto-scaling group: 2-20 instances

Normal Traffic (9 AM - 5 PM):
  - Servers running: 4
  - Handles: 400 requests/second
  - Cost: $120/month (servers) + $20/month (ALB)

Peak Traffic (Product Launch):
  - Servers: 15 (automatically added)
  - Handles: 1,500 requests/second
  - Cost: $450/month (servers) + $20/month (ALB)

Low Traffic (Midnight - 6 AM):
  - Auto-scales down to 2 servers
  - Handles: 100 requests/second
  - Cost: $60/month (servers) + $20/month (ALB)

# Key Benefit: Pay only for what you use
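The scale-up/scale-down decision in this example can be sketched as a capacity calculation. The 100 req/s-per-server figure and the 2-20 instance bounds are taken from the example above; a real auto-scaling group reacts to CloudWatch metrics (CPU, request count per target) rather than a raw req/s number.

```python
import math

def desired_servers(current_rps, capacity_per_server=100, min_servers=2, max_servers=20):
    """How many servers the auto-scaling group should run to absorb
    current_rps, clamped to the group's configured bounds."""
    needed = math.ceil(current_rps / capacity_per_server)
    return max(min_servers, min(max_servers, needed))

normal = desired_servers(400)     # daytime traffic  -> 4 servers
peak = desired_servers(1500)      # product launch   -> 15 servers
overnight = desired_servers(100)  # lull -> clamped up to the 2-server floor
```

The clamping matters in practice: the floor keeps the service available during lulls, and the ceiling caps runaway costs during traffic spikes.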
Decision Framework

Start with Vertical Scaling if: Traffic < 1,000 requests/second, budget < $500/month, team < 5 engineers, simple architecture acceptable.

Move to Horizontal Scaling when: Traffic > 5,000 requests/second, need 99.9%+ uptime, traffic is variable, team can handle operational complexity.

Use Hybrid Approach: Most production systems use both - vertical for databases (write masters), horizontal for application servers.

Real-World Examples from Top Companies

Netflix - Horizontal Scaling at Massive Scale

Netflix uses horizontal scaling extensively. They run thousands of microservices across hundreds of thousands of servers. When a popular show releases, they can quickly scale up by adding more servers to handle the traffic spike. This would be impossible with vertical scaling.

  • Infrastructure: AWS EC2 instances across multiple regions (horizontal scaling)
  • Scale: Can scale from hundreds to thousands of servers in minutes using auto-scaling
  • Content Delivery: Distributes content delivery across multiple regions and CDN edge locations
  • Microservices: Each service scales independently based on demand
  • Cost Optimization: Uses spot instances and auto-scaling to optimize costs
Netflix Scaling Example

When "Stranger Things" Season 4 launched, Netflix's traffic spiked by 300%. Their horizontal scaling infrastructure automatically added thousands of servers across multiple regions to handle the load. Within 15 minutes, they scaled from ~50,000 servers to ~150,000 servers. This would be impossible with vertical scaling.

Instagram - Hybrid Approach

Instagram uses a hybrid approach, which is common for large-scale applications. Their application servers scale horizontally (thousands of servers), but their database infrastructure uses both approaches strategically:

  • Horizontal Scaling:
    • Application servers (Python Django): Thousands of servers behind load balancers
    • Cache servers: Redis clusters with hundreds of nodes
    • CDN: CloudFront edge locations globally
  • Vertical Scaling:
    • Database master nodes: Very powerful machines (256GB+ RAM, 32+ CPU cores)
    • Why? Write operations need strong consistency and low latency
    • Single master avoids distributed transaction complexity
  • Hybrid for Reads:
    • Database read replicas: Multiple replicas (horizontal) for read scaling
    • Each replica is also vertically scaled (powerful machines)
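The read/write split described above can be sketched as a simple router. This is a sketch under assumptions: the node names are invented, and production systems add connection pooling and replication-lag checks before sending reads to a replica.

```python
import random

class ReadWriteRouter:
    """Send writes to the single master; spread reads across replicas."""

    WRITE_OPS = {"INSERT", "UPDATE", "DELETE"}

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = list(replicas)

    def route(self, operation):
        if operation.upper() in self.WRITE_OPS:
            return self.master               # writes need the single source of truth
        return random.choice(self.replicas)  # reads scale horizontally

router = ReadWriteRouter("pg-master", ["pg-replica-1", "pg-replica-2"])
router.route("UPDATE")  # always routed to "pg-master"
router.route("SELECT")  # routed to one of the replicas
```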
Key Insight: Hybrid is Common

Most large-scale systems use a hybrid approach. Application servers scale horizontally, but databases often use vertical scaling for write masters and horizontal scaling (read replicas) for reads. This balances performance, consistency, and scalability.

Google Search - Horizontal Scaling at Global Scale

Google's search infrastructure is one of the largest horizontally scaled systems in the world:

  • Scale: Millions of servers distributed globally across hundreds of data centers
  • Architecture: Each data center has thousands of servers handling different functions
  • Traffic: Can handle billions of search queries per day (3.5+ billion searches daily)
  • Scaling Strategy: Uses horizontal scaling for both compute and storage
  • Geographic Distribution: Servers in every major region for low latency
  • Fault Tolerance: Can lose entire data centers without service interruption

Small SaaS Application - Vertical to Horizontal Journey

A typical small SaaS application follows this scaling journey:

  1. Phase 1 - Vertical Scaling (Months 0-6):
    • Starts on a single server (e.g., DigitalOcean droplet with 2GB RAM, $12/month)
    • Upgrades to 4GB RAM ($24/month) as users grow
    • Upgrades to 8GB RAM ($48/month) when traffic increases
    • Simple, cost-effective, no code changes needed
  2. Phase 2 - Hybrid (Months 6-12):
    • Application server: Still vertical (16GB RAM, $96/month)
    • Database: Separate server, vertical scaling (8GB RAM, $48/month)
    • Cache: Add Redis on separate server (2GB RAM, $24/month)
  3. Phase 3 - Horizontal Migration (Months 12+):
    • Make application stateless (move sessions to Redis)
    • Add load balancer (AWS ALB, $20/month)
    • Deploy 3 application servers (3x 4GB = $72/month)
    • Add database read replicas for read scaling
    • Total: ~$140/month with high availability

How to Discuss Scaling in System Design Interviews

In FAANG interviews, you'll be asked about scaling strategies. Here's how to approach these discussions effectively.

Step-by-Step Interview Approach

1. Start with Requirements
1. Start with Requirements

# Questions to Ask Interviewer

Traffic Patterns:
  - "What's the expected traffic? (QPS, concurrent users)"
  - "Is traffic consistent or variable? (spikes, seasonal)"
  - "What's the growth projection?"

Application Characteristics:
  - "Is the application stateless or stateful?"
  - "What are the read/write ratios?"
  - "What are the latency requirements?"

Constraints:
  - "Any budget constraints?"
  - "What's the timeline for scaling?"
  - "Any existing infrastructure?"
2. Analyze the Use Case
# Decision Framework

Choose Vertical Scaling When:
  ✅ Small to medium scale (< 10K requests/second)
  ✅ Stateful application (hard to make stateless)
  ✅ Simple architecture preferred
  ✅ Predictable traffic
  ✅ Single-region deployment

Choose Horizontal Scaling When:
  ✅ Large scale (> 10K requests/second)
  ✅ Stateless, or can be made stateless
  ✅ High availability required (99.9%+)
  ✅ Variable/unpredictable traffic
  ✅ Global/multi-region deployment
  ✅ Need auto-scaling
3. Discuss Trade-offs

Always mention trade-offs. Interviewers want to see you understand the implications:

  • Vertical Scaling: "Simple to implement, but hits hardware limits and creates single point of failure"
  • Horizontal Scaling: "More scalable and fault-tolerant, but requires stateless design and load balancing infrastructure"
4. Mention Hybrid Approach

Most real-world systems use a hybrid approach. Always mention this:

# Hybrid Approach Example

"Most large systems use a hybrid approach:
  - Application servers: Horizontal scaling (stateless, behind a load balancer)
  - Database master: Vertical scaling (powerful machine for writes)
  - Database replicas: Horizontal scaling (multiple read replicas)
  - Cache layer: Horizontal scaling (Redis cluster)
  - CDN: Horizontal scaling (edge locations globally)

This balances performance, consistency, and scalability."

Common Interview Questions

Question 1: "When would you choose vertical vs horizontal scaling?"
# Strong Answer Structure

1. Start with the use case: "It depends on the application characteristics and scale..."

2. Vertical scaling when:
  - Small to medium scale
  - Stateful application
  - Simple architecture needed
  - Predictable traffic

3. Horizontal scaling when:
  - Large scale
  - Stateless application
  - High availability needed
  - Variable traffic

4. Mention hybrid: "In practice, most systems use both - vertical for database masters, horizontal for application servers and reads."

5. Discuss trade-offs: "Vertical is simpler but limited. Horizontal is more complex but offers nearly unlimited scalability."
Question 2: "How would you migrate from vertical to horizontal scaling?"
# Migration Strategy Answer

1. Make the application stateless: "First, I'd move session state to external storage (Redis/database). This is critical for horizontal scaling."
2. Add a load balancer: "Introduce a load balancer (ALB, Nginx) to distribute traffic."
3. Deploy multiple instances: "Deploy the application to multiple servers behind the load balancer."
4. Scale the database: "Add read replicas for database reads; keep a powerful master for writes."
5. Add caching: "Implement a caching layer (Redis) to reduce database load."
6. Monitor and optimize: "Continuously monitor and add/remove servers based on traffic."
Question 3: "What are the challenges of horizontal scaling?"
# Challenges Answer

1. Stateless design: "The application must be stateless - no in-memory session state. This may require refactoring."
2. Data consistency: "With multiple servers, ensuring data consistency becomes challenging. Need to consider the CAP theorem."
3. Load balancing: "Need load balancer infrastructure and the right distribution algorithm."
4. Monitoring complexity: "More servers means more monitoring, logging, and alerting."
5. Network overhead: "Inter-server communication adds latency and network overhead."
6. Database bottleneck: "Application servers can scale, but the database becomes the bottleneck. Need read replicas, caching, or sharding."

Red Flags to Avoid

  • "Always use horizontal scaling" - Not true, depends on use case
  • "Vertical scaling is never good" - Wrong, it's simpler for small scale
  • "Just add more servers" - Ignoring stateless requirement, database bottleneck
  • "Cost is the only factor" - Missing complexity, operational overhead
  • "One size fits all" - Not considering application characteristics

Key Points to Emphasize

  • Start simple: Begin with vertical scaling, migrate when needed
  • Hybrid approach: Most systems use both strategies
  • Stateless first: Critical for horizontal scaling
  • Database scaling: Don't forget database when scaling application
  • Trade-offs: Always discuss pros and cons
  • Real-world examples: Mention how companies actually do it

Migration Strategies

Most applications start with vertical scaling and eventually migrate to horizontal scaling. Understanding migration strategies is important for system design interviews.

Phase 1: Start with Vertical Scaling

  • Launch application on a single server
  • Monitor performance and resource usage
  • Upgrade server as traffic grows (vertical scaling)
  • Keep it simple and cost-effective for initial growth

Phase 2: Identify Scaling Bottlenecks

  • Monitor CPU, memory, disk I/O, network I/O
  • Identify which resource is the bottleneck
  • Determine if vertical scaling can still solve the problem
  • Evaluate complexity vs benefits of horizontal scaling

Phase 3: Migrate to Horizontal Scaling

  • Stateless First: Make application stateless (move session storage to Redis/database)
  • Add Load Balancer: Introduce a load balancer (AWS ALB, Nginx, HAProxy)
  • Deploy Multiple Instances: Deploy application to multiple servers
  • Database Scaling: Add read replicas, implement caching
  • Monitor and Optimize: Continuously monitor and add/remove servers as needed
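The "Stateless First" step above can be illustrated with an in-memory stand-in for an external session store such as Redis. The class, server names, and session ID are invented for the sketch; the point is that once state lives outside the app servers, any server can handle any request.

```python
class ExternalSessionStore:
    """Stand-in for Redis: session state lives outside every app server."""

    def __init__(self):
        self._sessions = {}

    def set(self, session_id, data):
        self._sessions[session_id] = data

    def get(self, session_id):
        return self._sessions.get(session_id)

def handle_request(server_name, session_id, store):
    """Any server can serve any request because it reads shared state."""
    user = store.get(session_id)
    return f"{server_name} served {user}"

store = ExternalSessionStore()
store.set("sess-abc123", "alice")

first = handle_request("app-server-1", "sess-abc123", store)
second = handle_request("app-server-2", "sess-abc123", store)
# Both servers see the same session, so the load balancer can route freely.
```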
Migration Example: E-commerce Application

Phase 1: Start with single server (4GB RAM) handling 100 requests/second
Phase 2: Upgrade to 16GB RAM server handling 500 requests/second
Phase 3: Add load balancer + 3 servers (each 4GB RAM) handling 2000 requests/second
Phase 4: Scale to 10 servers handling 10,000 requests/second
Phase 5: Add database read replicas, caching layer, CDN

Trade-offs Analysis

Performance
  • Vertical: Lower latency (no network calls)
  • Horizontal: Network latency between servers
  • Winner: Depends on use case
Reliability
  • Vertical: Single point of failure
  • Horizontal: High availability
  • Winner: Horizontal scaling
Scalability
  • Vertical: Limited by hardware
  • Horizontal: Nearly unlimited
  • Winner: Horizontal scaling
Scaling Efficiency
  • Vertical: Diminishing returns, hardware limits
  • Horizontal: Linear scaling, nearly unlimited
  • Winner: Horizontal scaling (for large scale)
Complexity
  • Vertical: Simple implementation
  • Horizontal: Complex distributed system
  • Winner: Vertical scaling
Flexibility
  • Vertical: Fixed capacity
  • Horizontal: Dynamic scaling
  • Winner: Horizontal scaling

Common Mistakes to Avoid

❌ Mistake 1: Always Choosing Horizontal Scaling

Many engineers assume horizontal scaling is always better. However, for small applications, vertical scaling is simpler and more cost-effective. Don't over-engineer.

❌ Mistake 2: Ignoring State Management

When migrating to horizontal scaling, forgetting to make the application stateless is a common mistake. Session data stored in memory will be lost when requests hit different servers.

❌ Mistake 3: Not Considering Database Scaling

Scaling application servers horizontally is useless if the database becomes the bottleneck. Always consider database scaling (read replicas, sharding) when scaling application servers.

❌ Mistake 4: Underestimating Operational Complexity

Horizontal scaling requires monitoring, load balancing, service discovery, health checks, and more. Don't underestimate the operational overhead.

Interview Tips

How to Discuss Scaling in Interviews

  1. Start with Requirements: Ask about expected traffic, growth projections, and availability requirements.
  2. Recommend Vertical First: For most systems, start with vertical scaling and explain when you'd migrate to horizontal.
  3. Discuss Trade-offs: Always mention both approaches and their trade-offs, even if you recommend one.
  4. Consider the Full Stack: Don't just scale application servers - consider databases, caches, and other components.
  5. Mention Cost: Discuss cost implications of your scaling decisions.
  6. Plan for Migration: Explain how you'd migrate from vertical to horizontal scaling when needed.

Red Flags to Avoid

  • ❌ Recommending horizontal scaling for a small application
  • ❌ Not considering database scaling when scaling application servers
  • ❌ Ignoring cost implications
  • ❌ Not discussing fault tolerance and availability
  • ❌ Forgetting about state management in horizontally scaled systems

Key Points to Emphasize

  • ✅ Start simple (vertical), scale to complex (horizontal) when needed
  • ✅ Consider the entire system, not just one component
  • ✅ Discuss trade-offs explicitly
  • ✅ Mention real-world examples from companies
  • ✅ Consider cost, complexity, and operational overhead

Summary

Understanding horizontal vs vertical scaling is fundamental to system design. Here are the key takeaways:

  • Vertical Scaling: Add more power to existing machine. Simple, but limited and expensive at scale.
  • Horizontal Scaling: Add more machines. Complex, but unlimited and cost-effective at scale.
  • Best Practice: Start with vertical scaling, migrate to horizontal when needed.
  • Consider: Always think about the entire stack (application, database, cache) when making scaling decisions.
  • Trade-offs: Every decision has trade-offs - discuss them explicitly in interviews.

In the next section, we'll explore Load Balancing, which is essential for horizontal scaling.

📝 Quiz: Test Your Understanding

Complete this quiz to mark this topic as completed

Question 1: What is the primary difference between horizontal and vertical scaling?

  • a) Horizontal scaling is cheaper, vertical scaling is more expensive
  • b) Horizontal scaling adds more machines, vertical scaling upgrades existing machine
  • c) Horizontal scaling is for databases, vertical scaling is for applications
  • d) There is no difference

Question 2: When should you use vertical scaling?

  • a) When you need to handle millions of requests per second
  • b) When you need high availability
  • c) When you have small to medium traffic and want simplicity
  • d) When you need geographic distribution

Question 3: What is a key advantage of horizontal scaling?

  • a) Nearly unlimited scalability and high availability
  • b) Lower latency between components
  • c) Simpler implementation
  • d) No need for load balancers

Question 4: Which of the following is a disadvantage of vertical scaling?

  • a) Requires complex distributed systems
  • b) Network latency between servers
  • c) Need for load balancers
  • d) Single point of failure and hardware limits

Question 5: For a system needing 32 vCPUs and 64GB RAM, which approach is more cost-effective?

  • a) 1x m5.8xlarge (vertical) - $1,472/month
  • b) 16x t3.medium (horizontal) - ~$506/month
  • c) Both cost the same
  • d) Depends on the application type

6. Cache Coherency and Consistency

Maintaining consistency across multiple cache layers and distributed caches is challenging.

6.1 Cache Coherency Models

# Cache Coherency Models

1. Strong Consistency:
  - All caches see updates immediately
  - Requires synchronization
  - High latency, low throughput
  - Use case: Financial data

2. Eventual Consistency:
  - Caches eventually converge
  - Lower latency, higher throughput
  - May serve stale data temporarily
  - Use case: Most web applications

3. Weak Consistency:
  - No guarantees about when updates appear
  - Highest performance
  - May never converge
  - Use case: Analytics, non-critical data

# Real-World: Google's Cache Coherency
  - Uses eventual consistency for most caches
  - Strong consistency only for critical operations
  - Reference: "Large-scale Incremental Processing" - Google Research

6.2 Distributed Cache Consistency

# Handling Consistency in Distributed Caches

Problem:
  - Multiple cache nodes
  - Updates must propagate
  - Network partitions possible

Solutions:

1. Write-Through to All Nodes:
  - Write to all cache nodes
  - Ensures consistency
  - High latency (wait for all)

2. Write-Through to Primary:
  - Write to the primary node
  - Replicate to others asynchronously
  - Lower latency, eventual consistency

3. Invalidation Messages:
  - Invalidate on all nodes
  - Next read refreshes from the database
  - Ensures eventual consistency

# Real-World: Redis Cluster
  - Writes go to the master node
  - Replicated to replicas asynchronously
  - Reads can go to any node (may be stale)
  - Strong consistency option: read from the master
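The invalidation-message solution can be sketched with an in-process broadcast. This is a sketch only: a real deployment would publish invalidations over Redis pub/sub or a message queue, and the node names are illustrative.

```python
class CacheNode:
    """One cache node holding its own local copies."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def invalidate(self, key):
        self.data.pop(key, None)  # drop the stale copy; next read refetches

class InvalidationBus:
    """On a database write, tell every node to drop its copy of the key."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def publish_write(self, key):
        for node in self.nodes:
            node.invalidate(key)

nodes = [CacheNode("us-east"), CacheNode("eu-west")]
for node in nodes:
    node.data["user:7"] = "old-profile"

InvalidationBus(nodes).publish_write("user:7")  # write happened; caches purged
stale_copies = sum("user:7" in node.data for node in nodes)
# stale_copies is 0: the next read on any node refreshes from the database.
```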

7. Cloud Provider Caching Solutions

7.1 AWS ElastiCache

# AWS ElastiCache

Supported Engines:
  - Redis (6.x, 7.x)
  - Memcached (1.6.x)

Features:
  - Automatic failover (Redis)
  - Multi-AZ deployment
  - Automatic backups
  - Encryption at rest and in transit
  - VPC integration
  - CloudWatch monitoring

Configuration Options:
  - Basic: Single node, development/testing
  - Standard: Multi-AZ, production-ready
  - Cluster: Horizontal scaling, large scale

Instance Types:
  - Small: 0.5-1 GB (development)
  - Medium: 13-26 GB (small production)
  - Large: 52+ GB (large production)

Use Cases:
  - Session storage
  - Real-time analytics
  - Leaderboards
  - Rate limiting

Reference: https://aws.amazon.com/elasticache/

7.2 Google Cloud Memorystore

# Google Cloud Memorystore

Supported Engines:
  - Redis (6.x, 7.x)
  - Memcached (1.6.x)

Features:
  - Automatic scaling
  - High availability with automatic failover
  - VPC peering and integration
  - IAM authentication and integration
  - Monitoring and alerts

Tiers:
  - Basic: Single node, no replication
  - Standard: Primary + replica, high availability

Use Cases:
  - Application caching
  - Session storage
  - Real-time analytics

Reference: https://cloud.google.com/memorystore

7.3 Azure Cache for Redis

# Azure Cache for Redis

Tiers:
  - Basic: Single node, no SLA (development)
  - Standard: 2 nodes (primary + replica), 99.9% SLA (production)
  - Premium: Enhanced features, geo-replication, 99.95% SLA

Features:
  - Redis persistence (Premium)
  - Geo-replication (Premium)
  - Virtual network integration
  - Private endpoints

Sizes:
  - Small: 250 MB - 1 GB (development/small apps)
  - Medium: 2.5 GB - 6 GB (medium apps)
  - Large: 13 GB+ (large scale apps)

Reference: https://azure.microsoft.com/services/cache/

Common Mistakes to Avoid

  • Cache everything: Only cache expensive operations or frequently accessed data
  • Ignoring cache invalidation: Stale data can cause serious bugs
  • Not handling cache failures: Application should work without cache
  • Wrong eviction policy: Choose based on access patterns
  • Cache stampede: Implement lock-based or probabilistic refresh
  • Not monitoring cache metrics: Hit rate, latency, memory usage

How to Discuss Caching in Interviews

Caching is one of the most common topics in system design interviews. Here's how to approach it effectively.

Step-by-Step Interview Approach

1. Identify What to Cache
# Questions to Ask Yourself

"What should I cache?"
- Frequently accessed data
- Expensive computations (database queries, API calls)
- Static or semi-static content
- User sessions

"What should I NOT cache?"
- Frequently changing data
- User-specific sensitive data (unless encrypted)
- Data larger than cache capacity
- Data accessed rarely
2. Choose Cache Pattern
# Decision Framework

Cache-Aside (Most Common):
✅ Simple, flexible
✅ Cache failures don't break app
✅ Good for read-heavy workloads
Use when: General purpose caching

Write-Through:
✅ Cache always consistent
✅ Good for write-heavy workloads
❌ Higher write latency
Use when: Strong consistency needed

Write-Behind:
✅ Fastest writes
✅ High throughput
❌ Risk of data loss
Use when: Write performance critical, can tolerate eventual consistency
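The cache-aside pattern recommended above fits in a few lines. In this sketch an in-memory dict stands in for a real cache such as Redis, and `db_query` is a placeholder for an expensive database read; both names are illustrative.

```python
# Cache-aside: the application owns the cache. Reads check the cache first
# and populate it on a miss; writes invalidate so the next read refreshes.

cache = {}

def db_query(key):
    return f"value-for-{key}"       # stands in for an expensive DB read

def get(key):
    if key in cache:                # 1. try the cache
        return cache[key]
    value = db_query(key)           # 2. miss: read from the database
    cache[key] = value              # 3. populate the cache for next time
    return value

def update(key):
    # After writing the database, invalidate rather than update the cache;
    # the next read repopulates it, which avoids write-ordering races.
    cache.pop(key, None)

assert get("user:1") == "value-for-user:1"   # miss, then populated
assert "user:1" in cache                     # subsequent reads hit cache
```

Because every cache operation is optional here, a cache outage only costs latency, which is exactly the failure behavior the framework above credits cache-aside with.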
3. Discuss Cache Invalidation
# Critical Interview Point

"Cache invalidation is hard. I'd use:
1. TTL-based: Simple, automatic, but may serve stale data
2. Event-driven: Real-time, but complex infrastructure needed
3. Version-based: Efficient, but requires version tracking

For most cases, I'd combine TTL with event-driven invalidation for critical data."
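The TTL-based option is the simplest to demonstrate. This sketch stores an expiry timestamp alongside each value and treats an expired entry as a miss; the 5-second TTL and the `now` parameter (injected for deterministic testing) are illustrative choices.

```python
import time

# TTL-based invalidation: each entry carries an expiry timestamp.
# Expired entries are deleted on read and treated as cache misses.

cache = {}
TTL_SECONDS = 5

def put(key, value, now=None):
    now = time.time() if now is None else now
    cache[key] = (value, now + TTL_SECONDS)

def get(key, now=None):
    now = time.time() if now is None else now
    entry = cache.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if now >= expires_at:           # expired: evict and report a miss
        del cache[key]
        return None
    return value

put("article:42", "Breaking news", now=0)
assert get("article:42", now=1) == "Breaking news"   # within TTL
assert get("article:42", now=10) is None             # past TTL: stale, evicted
```

This also shows the trade-off named above: between a real update and the TTL expiry, readers see stale data, which is why critical keys get event-driven invalidation on top.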

Common Interview Questions

Question 1: "How would you implement caching for a news website?"
# Strong Answer

1. Multi-level caching:
"I'd use multiple cache levels:
- CDN: Static assets (images, CSS, JS)
- Application cache (Redis): Article content, metadata
- Browser cache: User preferences"

2. Cache strategy:
"For articles:
- Cache-aside pattern
- TTL: 5 minutes (articles don't change often)
- Invalidate on article update
- Cache popular articles longer"

3. Handle cache stampede:
"Use lock-based refresh to prevent all requests hitting the database when the cache expires."
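The multi-level lookup from that answer can be sketched as a chain of fallbacks. Here plain dicts stand in for each level: a per-process local cache, a shared cache (standing in for Redis), and the database; all names are illustrative.

```python
# Two-level cache lookup: check the fastest level first, fall back level
# by level, and promote values upward on a hit so later reads are cheap.

local_cache = {}                     # per-process, fastest
shared_cache = {}                    # stands in for Redis, shared by servers
database = {"article:1": "Full article body"}

def get_article(key):
    if key in local_cache:           # level 1: in-process
        return local_cache[key]
    if key in shared_cache:          # level 2: shared cache
        local_cache[key] = shared_cache[key]   # promote on hit
        return local_cache[key]
    value = database.get(key)        # last resort: the database
    if value is not None:
        shared_cache[key] = value    # populate both levels on the way back
        local_cache[key] = value
    return value

assert get_article("article:1") == "Full article body"
assert "article:1" in shared_cache and "article:1" in local_cache
```

In a real deployment the CDN and browser cache sit in front of this chain, but they follow the same principle: each level absorbs traffic the level behind it never sees.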
Question 2: "What happens when cache fails?"
# Critical Interview Question

1. Application should still work:
"Cache is an optimization, not a requirement. The application should degrade gracefully."

2. Fallback strategies:
- Direct database query (slower but works)
- Circuit breaker pattern (prevent cascade failures)
- Stale cache serving (if acceptable)

3. Monitoring:
"Monitor cache hit rate and latency. Alert if the hit rate drops significantly."
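Graceful degradation means wrapping every cache call so a cache outage falls through to the database instead of failing the request. This is a minimal sketch: `CacheDown`, the `cache_available` flag, and the dict-backed stores are all illustrative stand-ins for real infrastructure.

```python
# Graceful degradation: the cache is an optimization. If it is down,
# skip it and serve from the database (slower, but the request succeeds).

class CacheDown(Exception):
    pass

cache_available = True
cache = {}
database = {"user:1": "Alice"}

def cache_get(key):
    if not cache_available:          # simulates an unreachable cache
        raise CacheDown("cache unreachable")
    return cache.get(key)

def get_user(key):
    try:
        value = cache_get(key)
        if value is not None:
            return value
    except CacheDown:
        pass                         # degrade: treat the cache as absent
    return database.get(key)         # direct database query fallback

assert get_user("user:1") == "Alice"     # cache miss, DB fallback
cache_available = False
assert get_user("user:1") == "Alice"     # cache down, request still works
```

In production the `except` branch would also record a metric, since a rising cache-failure or falling hit rate is exactly what the monitoring point above says to alert on.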
Question 3: "How do you prevent cache stampede?"
# Common Follow-up

1. Problem:
"When the cache expires, many requests try to refresh it simultaneously, overwhelming the database."

2. Solutions:
- Lock-based: First request acquires lock, others wait
- Probabilistic early expiration: Expire slightly before TTL
- Background refresh: Refresh before expiration
- Stale-while-revalidate: Serve stale, refresh in background

3. Implementation:
"I'd use a distributed lock (Redis) to ensure only one request refreshes the cache at a time."
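The lock-based solution can be demonstrated with threads. In this sketch a process-local `threading.Lock` stands in for the distributed Redis lock mentioned above, and the double-check after acquiring it is what guarantees only one refresher hits the database.

```python
import threading

# Lock-based stampede prevention: only the lock holder recomputes an
# expired entry; concurrent callers wait, then reuse the fresh value.

cache = {}
refresh_lock = threading.Lock()
db_calls = 0

def expensive_db_read(key):
    global db_calls
    db_calls += 1                    # count how often the DB is actually hit
    return f"value-for-{key}"

def get(key):
    if key in cache:                 # fast path: no lock on a hit
        return cache[key]
    with refresh_lock:               # only one refresher at a time
        if key in cache:             # re-check: another thread may have
            return cache[key]        # refreshed while we waited
        cache[key] = expensive_db_read(key)
        return cache[key]

threads = [threading.Thread(target=get, args=("hot",)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert db_calls == 1                 # ten concurrent callers, one DB read
```

A single global lock serializes all refreshes; a real implementation would lock per key (e.g. a Redis `SET key value NX` lock keyed by the cache key) so unrelated hot keys do not block each other.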

Key Points to Emphasize

  • Cache-aside is most common: Simple, flexible, handles failures well
  • Cache invalidation is hard: Always discuss strategies
  • Handle cache failures: Application must work without cache
  • Monitor cache metrics: Hit rate, latency, memory usage
  • Multi-level caching: Browser, CDN, application, database
  • Real-world examples: Facebook Memcached, Twitter Redis

Red Flags to Avoid

  • "Cache everything" - Only cache what makes sense
  • "Ignore cache failures" - Application must work without cache
  • "No invalidation strategy" - Stale data causes bugs
  • "One cache pattern fits all" - Choose based on use case
  • "No monitoring" - Need to track cache effectiveness

📝 Quiz: Test Your Understanding

Complete this quiz to mark this topic as completed

Question 1: What is the main difference between cache-aside and write-through patterns?

  • a) Cache-aside: Application manages cache, write-through: Cache and DB written together
  • b) Cache-aside is faster, write-through is slower
  • c) Cache-aside uses Redis, write-through uses Memcached
  • d) There is no difference

Question 2: What is cache stampede and how is it typically prevented?

  • a) Cache overflow, prevented by increasing cache size
  • b) Cache corruption, prevented by checksums
  • c) Many requests refreshing expired cache simultaneously, prevented by locks or probabilistic expiration
  • d) Cache inconsistency, prevented by write-through

Question 3: What is the main advantage of LRU eviction policy?

  • a) It considers access frequency
  • b) It works well for temporal locality (recent items likely to be used again)
  • c) It never evicts frequently used data
  • d) It's the fastest eviction algorithm

Question 4: What is the main difference between Redis and Memcached?

  • a) Redis is faster, Memcached is slower
  • b) Redis is for caching, Memcached is for databases
  • c) Redis doesn't support persistence, Memcached does
  • d) Redis has rich data structures and persistence, Memcached is simple key-value only

Question 5: What is the primary advantage of write-behind cache pattern?

  • a) Very fast writes (cache only) and high write throughput
  • b) Guaranteed consistency with database
  • c) No risk of data loss
  • d) Simplest to implement