Case study

Netflix

Streaming & original content

Overview

Netflix couples globally distributed playback with heavy personalization: rows, artwork, and rankings adapt per profile and region. Engineering students often see “a video app”; at scale it is a data, delivery, and device-matrix problem under strict quality-of-experience (QoE) goals.

The company is also known for chaos engineering and cell-based architectures. Both are resilience ideas that translate well to classroom discussions of failure injection and blast radius.

Technical problems at scale

Recommendation and ranking under latency SLOs

Home and detail surfaces call ranking services with tight latency budgets. Caching, approximate precomputed results, and graceful fallbacks keep the path to playback responsive even when ML tiers degrade.
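The fallback chain described above can be sketched in a few lines of Python. This is a minimal illustration, not Netflix's implementation: rank_with_budget, FALLBACK_ROWS, and the row titles are hypothetical names, and a thread pool with a timeout stands in for whatever RPC deadline mechanism a real service would use.

```python
import concurrent.futures as cf

# Hypothetical precomputed rows, e.g. refreshed offline per region.
FALLBACK_ROWS = ["Trending Now", "Popular in Your Region"]

# Shared worker pool standing in for an RPC client (assumption).
_POOL = cf.ThreadPoolExecutor(max_workers=4)


def rank_with_budget(rank_fn, profile_id, budget_s, cache):
    """Return (rows, source), waiting at most about budget_s for live ranking.

    Order of preference: live ranking, last good result for this profile,
    then generic precomputed rows.
    """
    future = _POOL.submit(rank_fn, profile_id)
    try:
        rows = future.result(timeout=budget_s)
        cache[profile_id] = rows  # remember the last good answer
        return rows, "live"
    except cf.TimeoutError:
        future.cancel()  # best effort; a call that is already running continues
    except Exception:
        pass  # ranking tier failed outright; fall through to fallbacks

    if profile_id in cache:
        return cache[profile_id], "cache"
    return FALLBACK_ROWS, "fallback"
```

Returning the source label alongside the rows is a design choice for the sketch: it lets the caller log how often the degraded paths are served, which is the number a latency SLO discussion usually needs.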

Open Connect and edge footprint

ISP-hosted appliances and large edge caches serve most traffic close to viewers, which reduces origin load. Cache fill, eviction, and manifest signing all interact with DRM and regional compliance.
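To make "fill" and "eviction" concrete, here is a toy edge cache in Python. It assumes on-demand fill and least-recently-used eviction purely for illustration; the text does not say which policies Open Connect appliances use, and EdgeCache and its fields are hypothetical names. DRM and manifest signing are left out.

```python
from collections import OrderedDict


class EdgeCache:
    """Byte-bounded cache: fills from origin on a miss, evicts LRU entries."""

    def __init__(self, capacity_bytes, fetch_from_origin):
        self.capacity = capacity_bytes
        self.fetch = fetch_from_origin  # callable: key -> bytes
        self.items = OrderedDict()      # key -> bytes, oldest first
        self.used = 0
        self.origin_fetches = 0         # proxy for origin load

    def get(self, key):
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            return self.items[key]
        data = self.fetch(key)           # cache miss: go to origin
        self.origin_fetches += 1
        self._store(key, data)
        return data

    def _store(self, key, data):
        if len(data) > self.capacity:
            return  # too large to cache; served as pass-through
        while self.used + len(data) > self.capacity:
            _, evicted = self.items.popitem(last=False)  # drop the oldest
            self.used -= len(evicted)
        self.items[key] = data
        self.used += len(data)
```

With an 8-byte capacity and 4-byte segments, requesting a, b, a, c evicts b (the least recently used) and costs three origin fetches; a later request for b is a fourth. Counting origin_fetches under different request patterns is a quick way to see why popular titles dominate cache hit rates.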

Encoding ladders and per-title optimization

Adaptive bitrate (ABR) streaming needs multiple renditions per asset. Complexity analysis and per-shot tuning decide how many renditions to encode and at which bitrates, trading storage cost against rebuffering on constrained networks.
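The trade-off can be illustrated with a small Python sketch. The ladder values, the 0.8 safety factor, and both function names are assumptions chosen for the example; production players use richer signals such as buffer level, and this is only the simplest throughput-based rule.

```python
# Hypothetical encoding ladder for one title, in kilobits per second.
LADDER_KBPS = [235, 750, 1750, 3000, 5800]


def pick_rendition(ladder_kbps, throughput_kbps, safety=0.8):
    """Pick the highest bitrate that fits within a fraction of measured throughput.

    Falls back to the lowest rendition when nothing fits, so playback can
    start (possibly with rebuffering) instead of failing.
    """
    budget = throughput_kbps * safety
    eligible = [b for b in sorted(ladder_kbps) if b <= budget]
    return eligible[-1] if eligible else min(ladder_kbps)


def storage_mb(ladder_kbps, duration_s):
    """Approximate storage for all renditions of one asset, in megabytes."""
    total_kbits = sum(ladder_kbps) * duration_s
    return total_kbits / 8 / 1000  # kilobits -> kilobytes -> megabytes
```

For example, a client measuring 4000 kbps gets the 3000 kbps rendition, while one measuring 100 kbps is forced onto 235 kbps. Adding a lower rung helps the second client but increases the storage_mb total for every title, which is the cost side of the trade.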

Playback on fragmented devices

Smart TVs, consoles, and old mobile OS versions mean that codec support, DRM, and app rollout strategies differ by cohort. This is classic long-tail client engineering.
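One way to picture the device matrix is as a capability match between what a device can decode and what has been packaged. The Python sketch below assumes a fixed codec preference order and invented device profiles; Device, CODEC_PREFERENCE, and pick_stream are hypothetical names, not part of any real player SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    codecs: frozenset  # codecs the device can decode
    drm: frozenset     # DRM systems the device supports


# Assumed preference order, most to least bandwidth-efficient.
CODEC_PREFERENCE = ["av1", "hevc", "h264"]


def pick_stream(device, available):
    """Return the first (codec, drm) pair the device supports, or None.

    `available` is the set of (codec, drm) combinations packaged for a title.
    """
    for codec in CODEC_PREFERENCE:
        if codec not in device.codecs:
            continue
        for drm in sorted(device.drm):  # sorted for a deterministic choice
            if (codec, drm) in available:
                return codec, drm
    return None  # no playable combination for this cohort
```

A None result is the interesting case for discussion: it means a cohort cannot play the title at all, so either another combination must be packaged or the cohort must be moved off the catalog.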

Systems & patterns you will hear about

  • Personalization & ML serving
  • Global CDN (Open Connect)
  • HLS / DASH & ABR
  • Microservices / cells
  • Chaos & resilience testing

Case-study angles

Sketch what happens when the home-ranking service is slow but the playback manifest service is healthy. What does the client show, and for how long?

Compare pushing new encodes to the edge vs invalidating stale segments during a bad deploy.