Apollo - a tool for picking the best cloud configuration for big data analytics jobs running on clouds. Cloud configuration cotains the VM type (for example ec2.a1.large in Amazon EC2) and the number VM instances. The models we support for picking the best cloud configuration include Bayesian Optimizaion (BO) and Random Forest (RF).
Copyright: Institute of Software, Chinese Academy of Sciences
Contact: [email protected]
- Download the project to local environment. Require python 2.7 at least.
- Execute
python run.py
to run the server. - Use RESTful API to select the next cloud configuration.
Initialization
GET: http://URL/pribo
RESPONSE:
{
'count':X,
'ram':X,
'cpu_count':X,
'netType':X,
'diskType':X
}
Set a configuration
PUT: http://URL/pribo/
parameters:
{
'count':X,
'ram':X,
'cpu_count':X,
'netType':X,
'diskType':X,
'time':X
}
RESPONSE:
{
'count':X,
'ram':X,
'cpu_count':X,
'netType':X,
'diskType':X
}
Do following steps to select the best cloud configuration.
Client:
Send Initialization request to server.
Server:
Randomly selecting 3 different cloud configurations to initialize the process, reture the details of configurations.
Client:
Running the job using those 3 cloud configurations, put the job running time to server.
Server:
Training the model using BO or RF model, then return the next cloud configuration which will achieve optimal job running time.
Client:
Apply the new cloud configuration, and feedback the job running time to server.
Server:
Keep on training the model with new data, until the result is confidence enough to be the best.
End.