distributed systemshomes.sice.indiana.edu/prateeks/ds/lec1.pdf · 2019-01-09 · welcome! •what...

Post on 02-Apr-2020

3 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Distributed SystemsCSCI-B 534/ENGR E-510

Spring 2019Instructor: Prateek Sharma

Welcome!• What is a distributed system? • Where are distributed systems found?• Why you should take this course? • Small taste of challenges in distributed systems• Course contents, outline, structure, etc.

What Is A Distributed System?• Collection of autonomous computing elements that appears to its users as a

single coherent system• Computing elements: hardware devices or software processes • Single coherent system: Users and applications perceive a single system

Nodes / Servers / Processes

Communication link

Where Distributed Systems Fit In

Computer Architecture

Operating Systems

Computer Networks

Distributed Systems

Web Services (e-commerce)

“Big Data” Processing

Cloud Storage

Machine Learning

Internet of Things

Abstraction Level

Distributed Systems Are Everywhere• Large-scale Internet Services

• Web clusters for high-traffic websites

• Cloud storage • Dropbox, Google Drive,...

• Large-scale data processing• Map-reduce to process TB’s of data

• Graph processing • Social network analysis

• Large-scale machine learning• Model training and inference

• Sensor Networks • Internet of Things

• Modern multi-core architectures

Large-Scale Distributed Systems

• Conventionally: application deployed on a single server • Warehouse-scale computing: meet increasing computing needs of applications• How to handle computing, storage, and networking needs of millions of users?

Challenges In Distributed Systems• Nodes are independent and can be uncoupled or only loosely coupled• No global clock!

• How to order events within a distributed system?

• How to organize a distributed system? • Node membership: fixed, dynamic

• Communication between nodes• Structured: Each node has well-defined set of neighbors (tree, ring)• Unstructured: Nodes can communicate with any other node

• Fault-tolerance: Nodes can fail in multiple ways • Data consistency• Performance

Designing Distributed Software Systems• Scalability: Handle massive changes in user requests (10X)• Performance• Reliability and Resiliency: Handle partial failures gracefully• Usability: Abstractions and Interfaces • Jeff Dean: Building Software Systems At Google and Lessons Learned

https://www.youtube.com/watch?v=modXC5IWTJI

Coherent Systems• Collection of nodes as a whole appears the same to the end-user • Single coherent system: Users/applications perceive a single system • Users cannot determine location of computation/data

• Or details about the data replication

Nodes

Some Challenges In Distributed Systems1. Distributed reads and writes2. How to build distributed systems --- Middleware 3. Two Generals Problem

Conventional Program Semanticsdef foo():X=0 ;X=1 ;print(X); 

• We expect X==1, because that was the last write • Writes take effect “in order” of their issue

• Aka “Strong consistency”

Distributed Reads and Writes

• How to ensure that data written can be retrieved from any server?

Distributed Reads and Writes

• How to ensure that data written can be retrieved from any server? • Replication!• Broadcast values after a write

Tradeoffs In Distributed Systems• Previous consistency example:• Sacrifice latency for “strong consistency”• Or, have a “looser” consistency for lower latency• Another classic tradeoff is Cost vs. Performance:

Low Cost, High Performance(Doesn’t exist)

High Cost, High Performance

Low Cost, Low Performance High Cost, Low Performance(Undesirable)

Middleware: The OS of Distributed Systems• Commonly used components and functions for distributed applications

Middleware Goals• Resource Sharing• Distribution Transparency• Openness (Other nodes can join)• Scalability

Two Generals Problem• Two Roman Generals want to co-ordinate an attack on the enemy

• Both must attack simultaneously. Otherwise, both will lose

• Only way to communicate is via a messenger • But messengers can get captured/lost. • Perfectly-reliable communication system not available Task: Design a protocol

that ensures the two generals always attack simultaneously

Impossibility Proof of Two Generals Problem• Claim: There is no non-trivial protocol that guarantees that the two generals

will always attack simultaneously • Proof by induction on the number of messages• Let d messages be delivered at the time of attack• Base case: d=0. Claim holds (Impossible without any delivered messages)• Suppose impossibility claim holds for d=n. Then, we’ll show for d=n+1• Consider message n+1

• Sender attacks without knowing if message is delivered or not • Receiver must then attack too, even if msg not received • So the last message (n+1) was irrelevant, and n messages suffice • But that’s a contradiction: since n+1 was supposed to be the smallest number of messages

Common Knowledge• Solving the Two Generals Problem requires common knowledge • Common knowledge cannot be achieved with unreliable communication

channels

What Is This Course About?

Course Outline Major theme: Combine fundamentals of distributed computing with real-world systems and applications 1. Architecture of Distributed Systems 2. Components of distributed systems: processes, threads, and virtualization3. Communication in distributed systems and RPC’s4. Ordering of events, logical clocks [theory]5. Vector clocks and their applications6. Distributed data processing (Map-Reduce) 7. Mutual exclusion8. Leader election9. Global state and snapshots

Course Outline Continued10. Data consistency and CRDT’s11. Distributed Caching12. Key-value stores (Riak, Voldemort, Redis)13. Storage: NFS14. Storage: CAP theorem 15. Consensus: Raft, Paxos16. Fault-tolerance: Spark 17. Distributed ML 18. Cluster Scheduling 19. Block Chains

Course Prerequisites • Distributed Systems is fundamentally an advanced topic • Even “easy” problems in conventional non-distributed computing are hard or

even impossible in distributed settings • We will focus both on rigorous distributed algorithms and engineering large

distributed systems

•“You can have a second computer if you can show you know how to use the first one”---Paul Barham

• You must be proficient in:• Systems and network programming

Processes, threads, sockets, file-handling, basic UNIX IPC (pipes, etc.)• Algorithmic thinking and discrete math proof techniques (such as induction)

Logistics

Course Web-pagehttp://homes.sice.indiana.edu/prateeks/dist-sys-course.html

Course Structure and Text Book• Lectures: Tuesday and Thursday --- 1pm to 2.15 pm • Questions encouraged!

• Points for class participation

• Readings assigned for each lecture• Read before coming to class!

• Text-book: “Distributed Systems”. Maarten van Steen and Andrew Tanenbaum• Soft-copy available on the web

• Most lectures will also discuss research papers • Reference book for distributed algorithms:

• “Elements of Distributed Computing”. Vijay Garg

Evaluation Components• Programming Assignments [20%]

• 2—4 spread throughout the semester• Usually language agnostic (choose any popular programming language of your choice)• Poll: Python, C++, Java, Go, Rust

• Homework [10%]• Exams [30%]

• Mid-term(s)

• Final Project [30%]• Build a distributed system. End-goal is to have a conference-style paper • Groups of 3

• Class Participation: Quizzes etc. [10%]

Resources• http://homes.sice.indiana.edu/prateeks/dist-sys-course.html• Check course webpage frequently!• Office hours:

• Prateek: Thu 3—4 in Luddy Hall 4126• Vibhatha: TBA

• Canvas:• Discussions • Submitting assignments

Next Time• Distributed systems building blocks

• Refresher on Operating System Processes and Threads

top related