Course Overview
The field of distributed systems studies the design, implementation, and behavior of systems that involve independent components that communicate by passing messages to one another over a network. In addition to the usual challenges of concurrency, distributed systems may be characterized by unbounded latency between components and independent failure of components, making them challenging to reason about and debug.
Some of the foundational distributed systems concepts we’ll explore in this course are:
Time and asynchrony. No two computers can reason about each other’s perception of time. What does it mean to talk about time when we don’t share a clock?
Fault tolerance and replication. Given that computers crash and messages get lost, how can we write protocols and algorithms that have adequate redundancy to tolerate failure? Maybe if I think a computer will crash, it’s a good idea to run the same computation on more than one! Maybe if I think messages will be lost, I should send the same message more than once!
Consistency and consensus. Is our system storing the right data and providing the right responses? I might have two “replicas” that aren’t actually replicas! If replicas disagree, how do we know which one is right?
Parallelism. Why deal with all the pain of distributed systems? Sometimes, if you throw a bunch of computers at a problem, you can do things faster – much faster.
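To make the "time without a shared clock" question a bit more concrete, here is a minimal sketch of a Lamport clock, one of the logical clocks that appears in the lecture outline. It is only an illustration under simple assumptions: the class name, method names, and the two example processes (alice and bob) are made up for this sketch, and messages are modeled as plain integers passed by hand rather than sent over a real network.

```python
from dataclasses import dataclass


@dataclass
class LamportClock:
    """Hypothetical logical clock for one process; it counts events, not seconds."""
    time: int = 0

    def tick(self) -> int:
        # Any local event advances the counter by one.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is itself an event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, message_time: int) -> int:
        # On receipt, move past both our own time and the sender's timestamp.
        self.time = max(self.time, message_time) + 1
        return self.time


# Two processes that share no physical clock (names are illustrative).
alice = LamportClock()
bob = LamportClock()

alice.tick()            # local event on alice
stamp = alice.send()    # alice sends a message carrying her timestamp
bob.receive(stamp)      # bob's clock jumps ahead of the send
```

The point of the sketch is that bob's receive always gets a larger timestamp than alice's send, even though neither process ever consults a wall clock.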
Table of Contents
Lecture 1: logistics/administrivia; distributed systems: what and why?
Lecture 3: happens-before recap; partial orders; total orders; Lamport clocks; vector clocks
***
The course overview is copied from the course website.