Tetris, a model predictive control (MPC)-based container scheduling strategy to judiciously migrate long-running workloads for cluster load balancing.
Tetris comprises three pieces of modules including a container resource predictor, an MPC-based container scheduler, and a container migration module of K8s. After users submit workload requests to the containerized cluster, Tetris first leverages the predictor to estimate the resource consumption of containers over a given time window
As Tetris aims to jointly optimize the load imbalance degree and migration cost, we first build a discrete-time dynamic model to capture the load imbalance degree of clusters and migration cost of containers and next formulate a container scheduling optimization problem based on the dynamic model.
By using the variance of resource consumption of all cluster servers to represent the load imbalance degree of the cluster, we can model the cluster load imbalance degree at each timeslot as
To denote the migration cost by the sum of the unit cost of migrated containers, the migration cost is formulated as
To sum up, we further formulate the cost function
Based on our discrete-time dynamic model above, we proceed to formulate the long-term optimization problem of container scheduling based on MPC. At each time
- Containerd 1.6.6
- Kubernetes 1.24.3
- Python 3.8.8
- numpy
- pandas
- matplotlib
- sklearn
- scipy
- simpy
- pyDOE
The project of Tetris includes two parts:
- Simulator_Alibaba_trace: The trace-driven simulator based on Alibaba cluster trace v2018.
- Prototype-K8s: The prototype system of Tetris based on K8s.
$ git clone https://github.com/icloud-ecnu/Tetris.git
$ cd Tetris
$ python3 -m pip install --upgrade pip
$ pip install -r requirements.txt
Download the dataset from Alibaba cluster trace v2018. We have also support simulations with Alibaba cluster trace v2022
Run the script to download the processed dataset:
-
If you want to get the data in 1 hour,
$ cd Tetris/Simulator_Alibaba_trace/data/ $ ./data_download_1h.sh
-
If you want to get the data in 10 seconds,
$ cd Tetris/Simulator_Alibaba_trace/data/ $ sh ./data_download_10s.sh
$ cd Tetris/Simulator_Alibaba_trace
$ python3 main.py
After you run the script, you will get the results in Tetris/Simulator_Alibaba_trace/metric/
, which is a CSV file.
*Noted that if you want to use the data in 10 seconds in experiments, please replace filepath = './data/resource_1h/'
in main.py
with filepath = './data/resource_10s/'
.
Run the commands to send requests to Apache Tomcat server:
# Tomcat
$ sudo apt-get update
$ sudo apt-get install apache2-utils
$ cd Tetris/Prototype-K8s
$ sh request-tomcat.sh
# Redis
$ cd Tetris/Prototype-K8s
$ sh request-redis.sh
# Resnet
$ cd Tetris/Prototype-K8s
$ sh request-resnet.sh
*Remember to replace the IP address in request.sh
with the IP address of the master in your own K8s cluster.
$ cd Tetris/Prototype-K8s
$ nohup python3 -u ./runk8s.py --algo=tetris > ./log/runk8s_tetris.log &
After you run the script, you will get the log in Tetris/Prototype-K8s/log/
which is a .txt file, and the results in Tetris/Prototype-K8s/metric/
, which is a .csv file.
After the program runs, the information and running results of container migrations will be output on the screen and also be stored in the specified path (see above for details). The files can show different metrics of the algorithm, whose meanings are as follows:
clock | eval_bal | eval_mig | sum | sums | violation |
---|---|---|---|---|---|
Timeslot | The load imbalance degree in this timeslot | The migration cost in this timeslot | The value of the optimization objective in this timeslot | The cumulative value of the optimization objective | The number of SLO violations in this timeslot |
For example, the table below is a partial display of the experimental results using Tetris:
clock | eval_bal | eval_mig | sum | sums | violation |
---|---|---|---|---|---|
0 | 39783.11131 | 3.1 | 39832.56251475 | 39832.56251475 | 0 |
1 | 31311.3932865 | 5.16 | 31393.7056065 | 71226.2681212501 | 0 |
2 | 26269.59334975 | 1.56 | 26294.47846975 | 97520.7465910001 | 0 |
Fei Xu, Xiyue Shen, Shuohao Lin, Li Chen, Fen Xiao, Zhi Zhou, and Fangming Liu. "Tetris: Scheduling Long-Running Workloads for Load Balancing in Shared Containerized Clusters," accepted by IEEE Transactions on Service Computing, 2024.