Guidelines
As you will see, each of the real system design problems can be broken down into the fundamentals explained in the first two parts of the course. The same flow can be applied to any of the problems: list the functional and non-functional requirements, do back-of-the-envelope calculations, select the database type, sketch a system overview and break it into components, explain the distributed-systems problems that arise and how to solve them, and finish with telemetry/instrumentation for data-driven decision making. Below is an index of all the topics and the part of the course that explains each one:
- General problems every system has to take care of - Three Concerns in Software Systems
- Relational vs Document vs Graph based databases - Data Models and Query Language
- Primary indexes and how databases work at a lower level, with a focus on B-trees and LSM-trees - Storage and Retrieval
- How to encode/decode data, schemas, backward and forward compatibility - Encoding and Evolution
- All about replication - Replication and Data Replication
- Partitioning - Partitioning and Data Partitioning
- Asynchronous Messaging - Asynchronous Messaging
- Autoscaling and rate limiting - AutoScaling
- Caching, pessimistic/optimistic concurrency - Caching
- Transactions, isolation levels, MVCC, write skews and phantoms - Transactions
- System Model, Fencing tokens, Unreliable clocks - The Trouble with Distributed Systems
- Total order and causal broadcast, two-phase commit (2PC), CAP theorem, linearizability, and Eventual vs Strong Consistency - Consistency and Consensus and Data Consistency
- Batch Processing, Distributed FileSystems and MapReduce - Batch Processing
- Streams, change data capture, messaging systems, stream joins, idempotent operations, fan-out, load balancing - Stream Processing
- Lambda architecture, federated databases, unbundled databases, dataflow systems, and how streams can replace distributed transactions - The Future of Data Systems
- Telemetry, collecting instrumentation, and identifying bottlenecks - Telemetry
The suggested approach is to go through the fundamentals in the order laid out above before jumping to the problems, but you could also start with the problems and read more about the fundamental topics referenced in the solutions. Below are the lessons learned from each of the real system design problems:
- Transactions at a low level, B-trees, replication/quorum, write-ahead log for recovery/fault tolerance, how we design distributed databases - Amazon Aurora
- Distributed transactions, 2PC, 2PL, multi-version concurrency, strong consistency/linearizability - Google Spanner Design
- Distributed cache solution, cache-aside pattern, Availability Zones - Facebook Memcached
- Replication, partitioning, consistent hashing, crash recovery analysis, and heartbeat. Focus on telemetry - Key-Value Store
- Rate limiting algorithms, problems with distributed systems and how to solve them - Rate Limiter
- Back-of-the-envelope calculations, generating unique IDs in distributed databases, top K problem, probabilistic data structures, Lambda Architecture and MapReduce - Tiny URL
- Designing for different user categories, monitoring, graph database, fan-out on read/write (push/pull model) - Social Network
- Websockets, CDN, image/media hashing/checking for duplicates - Chat System
- Search Service, Elastic, Search Consumer, and Kafka flow, distributed transactions - Hotel Booking System
- Archive cold storage, Redis TTL - Ecommerce Platform
- Independent operations/messages with a queue and multiple workers (prioritization, classification of different types of messages), exactly-once semantics - Notification System
- Workflow/Dataflow engine (events, milestones), CDN, Transcoding - Video Sharing Platform
- UDP vs TCP, recording video calls, TURN and STUN servers - Video Calling System
- Trie, storing a data structure in a DB/cache, real-time processing with fast and slow paths - Search Autocomplete System
- Applied graph algorithms, k-d trees, segmentation - Google Maps
- How to use maps, segmentation, compaction/splitting - Design Uber
- Delta Sync, Compression, Conflict-free replicated data types (CRDTs), Strong vs Eventual Consistency, State and Operation based Replication, Broadcast - Cloud Storage Service