This repository contains a small framework, written in Python, for emulating asynchronous and synchronous distributed systems.
The stepping emulator requires the following packages to run:
- PyQt6
- pynput
- cryptography

These packages can be installed using pip as shown below:
pip install --user -r requirements.txt

These packages can also be installed using conda as shown below:
conda env create -f environment.yml
conda activate ds
A FAQ can be found here
Exercises will be described later in this document.
In general, avoid changing any of the files in the emulators subdirectory. Instead, restrict your implementation to extending `emulators.Device` and `emulators.MessageStub`. Your implementation/solution for, for instance, exercise 1 should go into the `exercises/exercise1.py` file. I will provide new templates as the course progresses.
You should be able to execute your solution to exercise 1 using the following lines:
python exercise_runner.py --lecture 1 --algorithm Gossip --type sync --devices 3
python exercise_runner.py --lecture 1 --algorithm Gossip --type async --devices 3
python exercise_runner.py --lecture 1 --algorithm Gossip --type stepping --devices 3
The first line will execute your implementation of the `Gossip` algorithm in a synchronous setting with three devices, while the second line will execute it in an asynchronous setting. The third line will execute your implementation with the stepping emulator in a synchronous setting, launching a GUI to visualize your implementation; the setting can be adjusted during execution.
For usage of the framework, see `exercises/demo.py` for a lightweight example. The example can be run with:
python exercise_runner.py --lecture 0 --algorithm PingPong --type async --devices 3
The stepping emulator runs the algorithm in steps: one message is sent or received for each step. The stepping emulator can be controlled with the following keyboard input:
- `space`: step a single time through messages
- `f`: fast-forward through messages
- `enter`: kill the stepper daemon and finish the algorithm
- `tab`: show all messages currently waiting to be transmitted
- `s`: pick the next message waiting to be transmitted to transmit next
- `e`: toggle between sync and async emulation
The framework can also be launched with an interface by executing the following line:
python exercise_runner_overlay.py
Your solution can then be executed through this GUI.
If the stepping emulator is chosen, the framework will launch with a GUI visualising different aspects of your algorithm.
If you have any extensions or improvements you are welcome to create a pull request.
However, pull requests should provide some significant improvement; changes in style, renaming of variables, etc. are strongly discouraged, as such changes would likely break the solutions of your fellow students.
Implement the following gossiping problem in `exercises/exercise1.py`.
A number of persons initially know one distinct secret each.
In each message, a person discloses all their secrets to the recipient.
These individuals can communicate only in pairs (no conference calls), but it is possible that different pairs of people talk concurrently. For all the tasks below, you should consider the following two scenarios:
- Scenario 1: a person may call any other person, thus the network is a total graph.
- Scenario 2: the persons are organized in a bi-directional circle, where each person can only pass messages to the left and the right (use the modulo operator).
In both scenarios you should use the `async` network; details of the differences between `sync` and `async` will be given in the third lecture.
Your tasks are as follows:
- Implement the above behaviour; you have the freedom to pick which person to talk to, when to send a message, etc.
- Try to minimize the number of messages.
- How few messages are enough?
- Is your solution optimal? And in what sense?
You can have several copies of the `Gossip` class; just give each class another name in the `exercise1.py` file, for instance `ImprovedGossip`.
You should then be able to call the framework with your new class via:
python exercise_runner.py --lecture 1 --algorithm ImprovedGossip --type async --devices 3
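Before designing your protocol, it may help to separate the per-person state from the emulator plumbing. The following framework-agnostic sketch only illustrates that state and the update performed on every received message; the names (`merge_secrets`, `ring_neighbours`, etc.) are made up for this sketch and are not part of the template:

```python
# Illustrative only: the core state and updates of the gossip exercise,
# independent of the emulator API (all names here are hypothetical).

def initial_secrets(person_id: int) -> set[int]:
    # Each person starts out knowing exactly one distinct secret.
    return {person_id}

def merge_secrets(known: set[int], received: set[int]) -> set[int]:
    # A message discloses all of the sender's secrets; the recipient keeps the union.
    return known | received

def done(known: set[int], number_of_people: int) -> bool:
    # A person is finished once they know every secret.
    return len(known) == number_of_people

def ring_neighbours(person_id: int, number_of_people: int) -> tuple[int, int]:
    # Scenario 2: only the left and right neighbour on the circle may be called;
    # the modulo operator makes the indices wrap around.
    return (person_id - 1) % number_of_people, (person_id + 1) % number_of_people

if __name__ == "__main__":
    n = 3
    known = merge_secrets(initial_secrets(0), initial_secrets(1))  # person 0 hears from person 1
    print(done(known, n))            # False: secret 2 is still missing
    print(ring_neighbours(0, n))     # (2, 1)
```

Inside your `Gossip` device the same union would be applied to every incoming message; the interesting design decisions (who calls whom, and when to stop) are exactly what the tasks above ask you to minimize.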
- Similarly to the first lecture, in the `__init__` of `RipCommunication`, create a ring topology (that is, set up who are the neighbors of each device); a small framework-agnostic sketch follows this list.
- Implement the RIP protocol (fill in the missing code in `merge_tables`), described in [DS, fifth edition], pages 115-118. (NOTICE: to run/debug the protocol, you must first implement the network topology described in "task 2.0" below.)
- Now that you have a ring topology, consider a ring size of 10 devices.
- How many messages are sent in total before the routing_tables of all nodes are synchronized?
- How can you "know" that the routing tables are complete and you can start using the network to route packets? Consider the general case of the internet, and the specific case of our toy ring network.
- For the ring network, consider an approach similar to:

  ```python
  def routing_table_complete(self):
      if len(self.routing_table) < self.number_of_devices - 1:
          return False
      return True
  ```

  Does it work? Each routing table should believe it is completed just one row short. How many times do the routing tables appear to be completed?
- Try this other approach, which works better:

  ```python
  def routing_table_complete(self):
      if len(self.routing_table) < self.number_of_devices - 1:
          return False
      for row in self.routing_table:
          (next_hop, distance) = self.routing_table[row]
          if distance > (self.number_of_devices / 2):
              return False
      return True
  ```
- Send a `RoutableMessage` after the routing tables are ready. Consider the termination problem. Can a node quit right after receiving the `RoutableMessage` for itself? What happens to the rest of the nodes?
- What happens if a link has a negative cost? How many messages are sent before the `routing_tables` converge?
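As a starting point for the ring set-up and the table merge mentioned above, here is a framework-agnostic sketch of the two pieces (plain distance-vector logic with a hop count of 1 per link; the data layout `destination -> (next_hop, distance)` mirrors the tuples used in `routing_table_complete` above, but the template's actual structures and `merge_tables` signature may differ):

```python
# Illustrative sketch only: ring neighbours plus a distance-vector merge.

def ring_neighbours(device_id: int, number_of_devices: int) -> list[int]:
    # Left and right neighbour on the ring, wrapping around with modulo.
    return [(device_id - 1) % number_of_devices, (device_id + 1) % number_of_devices]

def merge_tables(me: int, my_table: dict, neighbour: int, neighbour_table: dict) -> bool:
    """Fold a neighbour's routing table into ours; return True if anything changed."""
    changed = False
    # The neighbour itself is reachable directly (distance 0 before adding our hop).
    offers = {neighbour: 0, **{dest: dist for dest, (_, dist) in neighbour_table.items()}}
    for dest, dist in offers.items():
        if dest == me:
            continue                       # no route to ourselves
        candidate = (neighbour, dist + 1)  # every link costs one hop in this sketch
        if dest not in my_table or candidate[1] < my_table[dest][1]:
            my_table[dest] = candidate
            changed = True
    return changed

if __name__ == "__main__":
    table = {}
    # Node 0 hears from its neighbour 1, which already has a route to node 2.
    merge_tables(0, table, 1, {2: (2, 1)})
    print(table)   # {1: (1, 1), 2: (1, 2)}
```

The "has anything changed?" return value is what typically drives the protocol: a node only re-advertises its table to its neighbours when a merge actually changed something, which is also a handy hook for the message-counting questions above.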
Please consult the Moodle page; this exercise is not run via this framework.
Look at `exercises/exercise4.py`; here you should find the `SuzukiKasami` class implementing Suzuki-Kasami's Mutex Algorithm. For all exercises today, you can use the `sync` network type, but most algorithms should work for `async` also.
- Examine the algorithm
  - Make a doodle on the blackboard/paper showing a few processes, their state, and messages exchanged. Use e.g. a sequence diagram.
  - Define the purpose of the vectors `_rn` and `_ln` (a plain-Python sketch of this bookkeeping follows this list).
- Discuss the following situations
  - Is it possible that a node receives a token request message after the corresponding request has been granted? Sketch a scenario.
  - How can a node know which nodes have ungranted requests?
  - How does the queue grow?
- Characterize the algorithm's performance and correctness:
  - Is the algorithm correct? (ME1, ME2, ME3)
  - How does it perform? (bandwidth, enter/exit delay)
  - How does it cope with failures? / How can it be made fault tolerant?
- Bonus exercise, modifying the `TokenRing` class of `exercises/exercise4.py`:
  - Implement heartbeats in the token-ring algorithm for failure detection,
  - Make it robust against node failures, and
  - Make it possible for new processes to join the ring.
- Extracurricular exercise/challenge (only if you have nothing better to do over the weekend)
  - Extend the mutex algorithm implementations s.t. the `do_work()` call starts an asynchronous process (e.g. a future) which later calls a `release()` method on the mutex classes.
  - Check that the algorithms still work, and modify where needed.
  - Submit a pull-request!
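For the `_rn`/`_ln` question above, it can help to see the textbook bookkeeping written out. The sketch below is the standard Suzuki-Kasami logic in plain Python with illustrative names; it is not the class from `exercise4.py`, and a real token transfer would of course carry `ln` and the queue inside a message:

```python
from collections import deque

class SuzukiKasamiSketch:
    """Standard Suzuki-Kasami bookkeeping (illustrative, not the framework class)."""

    def __init__(self, my_id: int, n: int, has_token: bool):
        self.id = my_id
        self.rn = [0] * n            # like _rn: highest request number seen per node
        self.has_token = has_token
        self.in_cs = False
        # Token state, only meaningful while we hold the token:
        self.ln = [0] * n            # like _ln: request number last granted per node
        self.queue = deque()         # nodes with granted-but-not-yet-served requests

    def on_request(self, sender: int, number: int) -> None:
        self.rn[sender] = max(self.rn[sender], number)
        # Grant immediately if we hold an idle token and the request is still outstanding.
        if self.has_token and not self.in_cs and self.rn[sender] == self.ln[sender] + 1:
            self.pass_token_to(sender)

    def on_exit_cs(self) -> None:
        self.in_cs = False
        self.ln[self.id] = self.rn[self.id]           # our own request is now served
        for k in range(len(self.rn)):                 # queue every outstanding request
            if self.rn[k] == self.ln[k] + 1 and k not in self.queue:
                self.queue.append(k)
        if self.queue:
            self.pass_token_to(self.queue.popleft())

    def pass_token_to(self, receiver: int) -> None:
        self.has_token = False
        print(f"{self.id}: token (ln={self.ln}, queue={list(self.queue)}) -> {receiver}")

if __name__ == "__main__":
    node0 = SuzukiKasamiSketch(0, 3, has_token=True)
    node0.on_request(sender=2, number=1)   # rn[2]=1 == ln[2]+1 and token is idle: grant
```

Comparing `rn` (what each node has asked for) with `ln` (what the token has already granted) is exactly how the algorithm answers the "which requests are still ungranted?" question above.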
- Identify two problems with Reliable Multicast over IP
  - What is a practical problem for Reliable Multicast over IP?
  - What is a theoretical problem for Reliable Multicast over IP?
- Identify all the events in the following picture
  - Compute the Lamport clocks for each event
  - Compute the vector clock for each event
  - What is the difference in the orderings produced by vector and Lamport clocks? (The standard update rules are sketched after this list.)
- Design (and implement) Totally Ordered FIFO multicast (both requirements must be met). You can use the `TOSEQMulticast` from `exercise5.py` as a starting point.
  - Is it reliable?
    - If not, can it become so?
    - If it is, argue why!
- Discuss FIFO ordering in two overlapping multicast groups
  - FIFO is no longer guaranteed; how is it broken, and how do you fix it?
- Bonus exercise: Fix the ISIS algorithm!
  - Hint: how (and when) do you identify a tie?
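For the clock exercise above, these are the standard update rules written out as a small, framework-independent sketch (handy for checking hand-computed answers; the function names are illustrative):

```python
# Lamport clock rules: tick on local/send events, max-then-tick on receive.

def lamport_send(clock: int) -> int:
    return clock + 1                      # the result is also the message timestamp

def lamport_receive(clock: int, stamp: int) -> int:
    return max(clock, stamp) + 1

# Vector clock rules: only our own entry advances on local/send events,
# element-wise max followed by our own tick on receive.

def vector_send(vc: list[int], me: int) -> list[int]:
    vc = vc.copy()
    vc[me] += 1
    return vc

def vector_receive(vc: list[int], stamp: list[int], me: int) -> list[int]:
    vc = [max(a, b) for a, b in zip(vc, stamp)]
    vc[me] += 1
    return vc

if __name__ == "__main__":
    print(lamport_receive(3, stamp=7))                        # 8
    print(vector_receive([2, 0, 1], stamp=[1, 4, 1], me=0))   # [3, 4, 1]
```

The difference asked about above shows up here: Lamport timestamps give one consistent total order that also relates concurrent events, while vector clocks are comparable only when one vector dominates the other, so they recover exactly the happened-before partial order.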
- Study the pseudo-code in the slides (on Moodle) and complete the implementation of the `King` algorithm in `exercise6.py`
  - How does the algorithm deal with a Byzantine king (try f=1, the first king is Byzantine)?
  - Why does the algorithm satisfy Byzantine integrity?
  - Sketch/discuss a modification of your implementation such that the algorithm works in an `async` network, but loses its termination guarantee
    - What would happen with a Byzantine king?
    - What would happen with a slow king?
    - What about the combination of the above?
- Bonus Exercise: Implement the Paxos algorithm in `exercise6.py`. See the pseudocode on Moodle (use the video for reference when in doubt) for the two interesting roles (proposer and acceptor). A sketch of the acceptor's bookkeeping follows this list.
  - Identify messages sent/received by each role
    - Investigate `PAXOSNetwork`
  - Implement each role but the learner
    - Assume that each device is both a `Proposer` and an `Acceptor` (the `Learner` is provided)
    - A class combining/forwarding messages is provided (`PAXOS`).
    - Your job is to implement the missing functionality in `Proposer` and `Acceptor`; search for "TODO"
  - Demonstrate that your code works in an `async` environment
    - Try with a large number of devices (for instance 20 or 30)
    - Discuss how you can use Paxos in "continued consensus", where you have to agree on the order of entries in a log-file
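For the Paxos bonus exercise above, a compact reference for what the acceptor has to remember can be useful. The sketch below is single-decree Paxos in plain Python with illustrative names (`Promise`, `on_prepare`, ...); the actual `Proposer`/`Acceptor` classes and message stubs to fill in are the ones in `exercise6.py`:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Promise:
    ok: bool
    accepted: Optional[Tuple[int, str]]   # (ballot, value) accepted earlier, if any

class AcceptorSketch:
    """Single-decree Paxos acceptor (illustrative names, not the template's API)."""

    def __init__(self):
        self.promised = -1       # highest ballot we have promised not to undercut
        self.accepted = None     # (ballot, value) of the last proposal we accepted

    def on_prepare(self, ballot: int) -> Promise:
        if ballot > self.promised:
            self.promised = ballot
            return Promise(True, self.accepted)   # promise + report any prior accepted value
        return Promise(False, None)               # reject: we promised a higher ballot

    def on_accept(self, ballot: int, value: str) -> bool:
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True                           # the learner can count this acceptance
        return False

if __name__ == "__main__":
    a = AcceptorSketch()
    print(a.on_prepare(1))       # Promise(ok=True, accepted=None)
    print(a.on_accept(1, "x"))   # True
    print(a.on_prepare(0))       # Promise(ok=False, accepted=None): lower ballot rejected
    print(a.on_prepare(2))       # Promise(ok=True, accepted=(1, 'x'))
```

The proposer side mirrors this: after collecting promises from a majority it must propose the value attached to the highest-ballot promise it saw, and only if no promise carried a value may it propose its own.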
- DS5ed exercises 18.10 and 18.13
- Sketch an architecture for the following three systems: a bulletin board (simple Reddit), a bank, a version control system (e.g. Git)
  - Identify the system types (with respect to CAP).
  - Which replication type is suitable, and for which parts of the system?
  - If you go for a gossip solution, what is a suitable update frequency?
- BONUS Exercise: Implement the Bully algorithm (DS 5ed, page 660) in `exercise7.py`
  - In which replication scheme is it useful?
  - What is the "extra cost" of electing a new leader in replication?
- Compare GFS and Chubby, and identify use cases that are better for one or the other solution.
- Consider the code in `exercise8.py`, which sketches GFS, and complete the "append record" implementation.
  - Take a look at how the GFS master is implemented, and how it translates file + chunk index to chunk handle.
  - Sketch a design for the "passive replication" solution of the GFS. Consider how many types of messages you need, when they should be sent, etc.
  - Implement your solution, starting from the handler of `RecordAppendReqMessage` in the `GfsChunkserver` class.
  - Try out your solution with a larger number of clients, to have more concurrent changes to the "file".
- Implement fault tolerance mechanisms (heartbeat detection and failover) in the system; a small master-side bookkeeping sketch follows the run command below:
  - Heartbeat detection: the GFS master should periodically receive heartbeats from each chunkserver to monitor their health.
    - Modify the `GfsChunkserver` class to send periodic heartbeat messages to the `GfsMaster`.
    - Update the `GfsMaster` class to maintain a list of active chunkservers based on received heartbeats.
  - Failover mechanism: when a chunkserver fails (i.e., stops sending heartbeats), the requests should be redirected to available replicas.
    - Introduce a simple mechanism to simulate a chunkserver failing.
    - Update the `GfsMaster` to detect a missing heartbeat and mark the corresponding chunkserver as failed.
    - Ensure the client (`GfsClient`) can reroute operations to other replicas if a chunkserver has failed.
- BONUS Exercise: Add shadow masters to the system. Clients and chunk servers will still interact with the first master to change the file system, but the shadow master can take over if the master shuts down.
NOTICE: To execute the code, issue for example:
python exercise_runner.py --lecture 8 --algorithm GfsNetwork --type async --devices 7
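For the heartbeat/failover task above, the master-side bookkeeping can be as small as a "last seen" record per chunkserver. The sketch below is framework-agnostic (the real `GfsMaster`/`GfsChunkserver` exchange framework messages and, in the sync emulator, would count rounds rather than wall-clock seconds), with illustrative names throughout:

```python
import time

class HeartbeatMonitorSketch:
    """Illustrative master-side heartbeat bookkeeping (not the GfsMaster API)."""

    def __init__(self, timeout_seconds: float = 5.0):
        self.timeout = timeout_seconds
        self.last_seen: dict[int, float] = {}   # chunkserver id -> time of last heartbeat
        self.failed: set[int] = set()

    def on_heartbeat(self, server_id: int) -> None:
        self.last_seen[server_id] = time.monotonic()
        self.failed.discard(server_id)          # a returning server counts as healthy again

    def sweep(self) -> None:
        # Called periodically: anything silent for longer than the timeout is marked failed.
        now = time.monotonic()
        for server_id, seen in self.last_seen.items():
            if now - seen > self.timeout:
                self.failed.add(server_id)

    def alive(self) -> set[int]:
        # The master only hands out chunk locations on servers believed to be alive.
        return set(self.last_seen) - self.failed

if __name__ == "__main__":
    monitor = HeartbeatMonitorSketch(timeout_seconds=0.01)
    monitor.on_heartbeat(1)
    monitor.on_heartbeat(2)
    time.sleep(0.02)
    monitor.on_heartbeat(2)     # only server 2 keeps reporting in
    monitor.sweep()
    print(monitor.alive())      # {2}
```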
- How do MapReduce, Spark, and Pregel differ? Discuss the following:
  - What are some major design and architectural differences in the three systems?
  - What are the target use cases for the different systems? When do they perform well or badly?
  - What are the trade-offs in terms of performance, scalability, and complexity for the three systems?
- Consider the code in `exercise9.py`, which sketches MapReduce, and complete it. (The classic word-count mapper/reducer pair is sketched after the run command below.)
  - Unzip the file books.zip in ex9data/books.
  - The Master is pretty much complete. The same can be said for the client. Take a look at how the Master is supposed to interact with Mappers and Reducers.
  - Consider how to implement the Reducers. Take into account that we are simulating "local storage in the mappers" using memory.
  - Look for the TODOs, and implement your solution.
  - Try to change the number of mappers and reducers, and look at the "performance". In particular, look at how many rounds are needed to complete the job with the "sync" simulator.
- Add simulation for stragglers (slow workers)
  - Modify the `MapReduceWorker` `run` or `do_some_work` method to occasionally add a random delay for specific workers, which can rarely be very large.
  - Modify the `MapReduceMaster` to track the progress of each worker and reassign uncompleted tasks if a worker takes too long.
  - Discuss the real-world relevance of stragglers in distributed systems (e.g., slow network nodes, overloaded servers) and how they affect system throughput and overall performance.
NOTICE: To execute the code, issue for example:
python exercise_runner.py --lecture 9 --algorithm MapReduceNetwork --type async --devices 6
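To make the Mapper/Reducer split concrete, here is the classic word-count pair in plain Python. It only shows what the two roles compute; how the Master ships the work around as framework messages is exactly what `exercise9.py` asks you to complete (the function names below are illustrative):

```python
from collections import defaultdict

def map_words(document: str) -> list[tuple[str, int]]:
    # Mapper: emit one (word, 1) pair per occurrence.
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle/group step: collect all values belonging to the same key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    # Reducer: fold all values for one key into a single result.
    return key, sum(values)

if __name__ == "__main__":
    pairs = map_words("the cat sat on the mat")
    counts = dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())
    print(counts["the"])   # 2
```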
- There are "exercises" (actually, questions) on the Moodle page. I suggest starting with them.
- Consider the code in `exercise10.py`, which sketches a blockchain similar to Bitcoin. Consider that transactions are just strings and we will do no checks on the transactions. I had to add a very "random" termination condition. I associate a miner with each client; the code will never stop if there is an odd number of devices.
  - Take a look at the Block and the Blockchain classes (they are NOT devices) and consider how the blockchain is supposed to grow.
  - Design the logic for when a miner sends a blockchain (with its new block) to another miner. What do you do when you receive a new block? What if there is a fork? Can it happen? How do you manage it to preserve the "longest chain" rule? (A minimal longest-chain sketch follows the run command below.)
  - Look for the TODOs, and implement your solution.
  - Try the code for both sync and async devices. Does it work in both cases?
- Consider the modified blockchain: we will simulate an attack that controls a percentage of the total computing power. The goal of the attacker is to create a fork of the blockchain that ignores the last 5 blocks, and eventually make the fork the dominant chain in the network.
  - Think about how you can exploit majority control of the network to alter the blockchain history by introducing a fork.
  - How can the attackers ensure that their fork becomes the longest chain and eventually gets accepted by other miners?
  - How do you deal with new transactions in both the original chain and the forked chain?
  - Take a look at the existing `BlockchainNetwork` and `BlockchainMiner` classes. Modify these to implement your blockchain attack.
  - Create the `BlockchainAttacker`, which misbehaves if `self.id()` is lower than a fraction of the total number of nodes.
  - Initiate a fork of the blockchain using the attackers.
  - Make the attackers collude to ensure that the new fork surpasses the original.
  - Observe how the other miners react to your fork and whether they eventually switch to your chain.
  - Play around with the number of attackers and discuss their impact on the network; can they create a new longest chain without majority control?
NOTICE: To execute the code, issue for example:
python exercise_runner.py --lecture 10 --algorithm BlockchainNetwork --type async --devices 4
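The "what do I do when a competing chain arrives?" decision asked about above is often easiest to reason about as a pure function over whole chains. The sketch below is framework-agnostic (the template's Block and Blockchain classes may look different) and only encodes the longest-valid-chain rule; proof-of-work checking is deliberately left out:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BlockSketch:
    index: int
    prev_hash: str
    hash: str
    transactions: tuple        # transactions are plain strings in this exercise

def chain_is_valid(chain: list[BlockSketch]) -> bool:
    # Every block must reference the hash of its predecessor.
    return all(block.prev_hash == chain[i].hash for i, block in enumerate(chain[1:]))

def choose_chain(mine: list[BlockSketch], incoming: list[BlockSketch]) -> list[BlockSketch]:
    """Longest-chain rule: adopt the incoming chain only if it is valid and strictly longer."""
    if chain_is_valid(incoming) and len(incoming) > len(mine):
        return incoming        # this is also how forks resolve: the longer branch wins
    return mine

if __name__ == "__main__":
    genesis = BlockSketch(0, "", "h0", ())
    ours = [genesis, BlockSketch(1, "h0", "h1", ("tx1",))]
    theirs = ours + [BlockSketch(2, "h1", "h2", ("tx2",))]
    print(len(choose_chain(ours, theirs)))   # 3: the longer valid fork is adopted
```

The attacker exercise is the flip side of the same rule: a colluding group only wins if it can out-grow the honest branch, so that `choose_chain` eventually prefers its fork at every honest miner.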
- There are "exercises" on the Moodle page. I suggest starting with them.
- Consider the code in `exercise11.py`, which sets up the finger tables for Chord nodes. I have a client, always connected to the same node, which issues some PUTs.
  - Take a look at how the finger tables are populated, but please use the slides, since the code can be quite cryptic.
  - Design the logic for the routing process, thus: when do I end the routing process? Who should I send the message to, if I am not the destination? (A small routing sketch follows the run command below.)
  - Look for the TODOs, and implement your solution.
  - If you have time, implement the JOIN process for device 1.
NOTICE: To execute the code, issue for example:
python exercise_runner.py --lecture 11 --algorithm ChordNetwork --type async --devices 10
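The routing decision above can be phrased independently of how the finger tables are built: either my successor is responsible for the key (routing ends), or I forward to the closest preceding finger. The sketch below uses a plain list of fingers and illustrative names; treat the slides as the authority for the exact interval definitions:

```python
M = 4                                  # identifier bits, i.e. a ring of 2**M ids (illustrative)

def between(x: int, a: int, b: int) -> bool:
    """True if x lies in the half-open ring interval (a, b], modulo 2**M."""
    x, a, b = x % 2**M, a % 2**M, b % 2**M
    if a < b:
        return a < x <= b
    return x > a or x <= b             # the interval wraps around 0

def next_hop(me: int, successor: int, fingers: list[int], key: int):
    """Return None when routing ends (our successor owns the key), else the node to forward to."""
    if between(key, me, successor):
        return None                    # deliver/store the PUT at the successor
    # Closest preceding finger: the highest finger strictly between us and the key.
    for finger in reversed(fingers):
        if finger != me and finger != key and between(finger, me, key):
            return finger
    return successor                   # fall back to the successor

if __name__ == "__main__":
    # Node 1 in a 16-id ring with nodes {1, 4, 6, 11}: successor 4, fingers for +1, +2, +4, +8.
    print(next_hop(me=1, successor=4, fingers=[4, 4, 6, 11], key=3))   # None: node 4 owns key 3
    print(next_hop(me=1, successor=4, fingers=[4, 4, 6, 11], key=9))   # 6: forward towards the key
```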
- There are "exercises" on the Moodle page. I suggest starting with them.
- Consider the code in `exercise12.py`, which creates the topology for your IoT wireless network. The goal is to implement AODV.
  - Please note that you can `self.medium().send()` messages only to the nodes in `self.neighbors`. This simulates a wireless network with limited range.
  - Design the logic for the Route Request process. What can you use as a broadcast id? Design also the Route Reply process, which should be much easier. (A small RREQ-handling sketch follows the run command below.)
  - Look for the TODOs, and implement your solution.
- Now consider a node going offline in the network. Simulate a node failure in the AODV-based network and observe how the network handles it.
- Discuss how you expect the network to handle the disconnection of a node.
- Implement logic that simulates a node failure for one of the nodes by disconnecting it from its neighbours.
- Observe the network's routing behaviour after the node fails. Does it match your expectations?
NOTICE: To execute the code, issue for example:
python exercise_runner.py --lecture 12 --algorithm AodvNode --type async --devices 10
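For the Route Request design above, the two pieces you usually need are duplicate suppression (keyed by an originator plus broadcast-id pair) and reverse-route learning, so that the Route Reply can travel back hop by hop. The sketch below is framework-agnostic and all names are illustrative; in the template the "forward" case would mean rebroadcasting with `self.medium().send()` to every node in `self.neighbors`:

```python
class AodvRreqSketch:
    """Illustrative RREQ handling for a single node (not the template's API)."""

    def __init__(self, my_id: int):
        self.id = my_id
        self.seen: set[tuple[int, int]] = set()   # (originator, broadcast_id) pairs already handled
        self.routes: dict[int, int] = {}          # destination -> next hop

    def on_rreq(self, originator: int, broadcast_id: int, destination: int, sender: int) -> str:
        """Return 'drop', 'reply' or 'forward' so the caller knows what to do next."""
        if (originator, broadcast_id) in self.seen:
            return "drop"                         # we already took part in this flood
        self.seen.add((originator, broadcast_id))
        # Reverse route: the originator is reachable through whoever handed us the RREQ.
        self.routes.setdefault(originator, sender)
        if destination == self.id or destination in self.routes:
            return "reply"                        # send an RREP back along the reverse route
        return "forward"                          # rebroadcast the RREQ to all neighbours

if __name__ == "__main__":
    node = AodvRreqSketch(my_id=5)
    print(node.on_rreq(originator=1, broadcast_id=7, destination=9, sender=2))   # forward
    print(node.on_rreq(originator=1, broadcast_id=7, destination=9, sender=3))   # drop (duplicate)
    print(node.routes)   # {1: 2}
```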