Rank tasks running as docker containers in a cluster.
Task Ranker runs as a cron job on a specified schedule. Each time the task ranker is run, it fetches data from Prometheus, filters the data as required and then submits it to a task ranking strategy. The task ranking strategy uses the data received to calibrate currently running tasks on the cluster and then rank them accordingly. The results of the strategy are then fed back to the user through callbacks.
You will need a working Go environment (version 1.12 or later) and a Linux environment.
Run the below command to download and install Task Ranker.
go get github.com/pradykaushik/task-ranker
Task Ranker can be used in environments where,
- Prometheus is used to collect container specific metrics from hosts on the cluster that are running docker containers.
- cAdvisor, a docker-native metrics exporter, is run on the hosts to export resource isolation and usage information of running containers.
See cAdvisor docs for more information on how to monitor cAdvisor with Prometheus.
cAdvisor prefixes all container labels with container_label_. Given that the Task Ranker only talks to Prometheus, the labels provided should also include this prefix.
For example, let us say that we launch a task in a docker container using the command below.
docker run --label task_id="1234" -t repository/name:version
cAdvisor would then export container_label_task_id as the container label.
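For illustration, this prefixed label is what a Task Ranker label matcher (covered below) would reference. A minimal sketch, assuming the task_id label from the command above:
// Hypothetical matcher referencing the cAdvisor-prefixed form of the
// docker label "task_id" used in the example command above.
&query.LabelMatcher{Type: query.TaskID, Label: "container_label_task_id", Operator: query.Equal, Value: "1234"}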
Task Ranker configuration requires two components to be configured and provided.
- DataFetcher - Responsible for fetching data from Prometheus, filtering it using the provided labels and submitting it to the chosen strategy.
  - Endpoint: Prometheus HTTP API endpoint.
- Ranking Strategy - Uses the data to calibrate currently running tasks and then rank them accordingly.
  - Labels: Used for filtering the time series data using the specified label matching operation.
  - Receiver of the task ranking results.
Task Ranker is configured as shown below. The code snippet shows how Task Ranker can be configured to,
- fetch time series data from a Prometheus server running at http://localhost:9090.
- fetch data every 5 seconds.
- use the cpushares strategy to rank tasks.
- consider only metrics that have a non-empty container_label_task_id label (container_label_task_id!="").
- consider only tasks running on localhost (container_label_task_host="localhost").
- use container_label_task_id as the dedicated label to help retrieve the task identifier.
- use container_label_task_host as the dedicated label to help retrieve the hostname on which the task is running.
- use dummyTaskRanksReceiver as the receiver of ranked tasks.
// Receiver of the task ranking results.
type dummyTaskRanksReceiver struct{}

func (r *dummyTaskRanksReceiver) Receive(rankedTasks entities.RankedTasks) {
    log.Println(rankedTasks)
}

// Data fetcher that queries the Prometheus HTTP API endpoint.
prometheusDataFetcher, err = prometheus.NewDataFetcher(
    prometheus.WithPrometheusEndpoint("http://localhost:9090"))

// Task Ranker that runs every 5 seconds and ranks tasks using the cpushares strategy.
tRanker, err = New(
    WithDataFetcher(prometheusDataFetcher),
    WithSchedule("?/5 * * * * *"),
    WithStrategy("cpushares", []*query.LabelMatcher{
        {Type: query.TaskID, Label: "container_label_task_id", Operator: query.NotEqual, Value: ""},
        {Type: query.TaskHostname, Label: "container_label_task_host", Operator: query.Equal, Value: "localhost"},
    }, new(dummyTaskRanksReceiver), 1*time.Second))
The task ranker schedule (in seconds) SHOULD be a positive multiple of the Prometheus scrape interval. This simplifies the calculation of the time difference between data points fetched from successive query executions. For example, if Prometheus scrapes targets every 5 seconds, a schedule that runs the Task Ranker every 5, 10, or 15 seconds is valid.
You can now also configure the strategies using initialization options. This also allows for configuring the time duration of range queries, enabling fine-grained control over the number of data points over which the strategy is applied. See below code snippet for strategy configuration using options.
WithStrategyOptions("dummyStrategy",
strategies.WithLabelMatchers([]*query.LabelMatcher{...}
strategies.WithTaskRanksReceiver(new(testTaskRanksReceiver)),
strategies.WithRange(query.Seconds, 5)))
Note: Currently, none of the strategies implemented (cpushares and cpuutil) support range queries.
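The sketch below shows how the earlier configuration could be expressed using strategy initialization options in place of WithStrategy. It assumes the same data fetcher, label matchers, and receiver as the earlier snippet; WithRange is omitted since cpushares does not support range queries.
// Sketch: configuring the strategy through initialization options.
tRanker, err = New(
    WithDataFetcher(prometheusDataFetcher),
    WithSchedule("?/5 * * * * *"),
    WithStrategyOptions("cpushares",
        strategies.WithLabelMatchers([]*query.LabelMatcher{
            {Type: query.TaskID, Label: "container_label_task_id", Operator: query.NotEqual, Value: ""},
            {Type: query.TaskHostname, Label: "container_label_task_host", Operator: query.Equal, Value: "localhost"},
        }),
        strategies.WithTaskRanksReceiver(new(dummyTaskRanksReceiver))))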
Dedicated Label Matchers can be used to retrieve the task ID and host information from data retrieved from Prometheus. Strategies can mandate the requirement for one or more dedicated labels.
Currently, the following dedicated label matchers are supported.
- TaskID - This is used to flag a label as one that can be used to fetch the unique identifier of a task.
- TaskHostname - This is used to flag a label as one that can be used to fetch the name of the host on which the task is running.
Strategies can demand that one or more dedicated labels be provided. For instance, if a strategy ranks all tasks running on the cluster, then it can mandate only the TaskID dedicated label. On the other hand, if a strategy ranks colocated tasks, then it can mandate both the TaskID and TaskHostname dedicated labels.
Dedicated label matchers will need to be provided when using strategies that demand them.
The below code snippet shows how a dedicated label can be provided when configuring the Task Ranker.
WithStrategy("strategy-name", []*query.LabelMatcher{
{Type: query.TaskID, Label: "taskid_label", Operator: query.NotEqual, Value: ""},
... // Other label matchers.
})
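If a strategy also mandates the TaskHostname dedicated label (for instance, one that ranks colocated tasks), a matcher for it would be provided as well. A sketch, using hypothetical label names:
WithStrategy("strategy-name", []*query.LabelMatcher{
    // "taskid_label" and "taskhost_label" are hypothetical label names.
    {Type: query.TaskID, Label: "taskid_label", Operator: query.NotEqual, Value: ""},
    {Type: query.TaskHostname, Label: "taskhost_label", Operator: query.NotEqual, Value: ""},
    ... // Other label matchers.
})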
Once the Task Ranker has been configured, you can start it by calling tRanker.Start(). Call tRanker.Stop() to stop the task ranker.
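A minimal lifecycle sketch, assuming tRanker was configured as in the earlier snippet:
// Start the Task Ranker. It now runs on the configured cron schedule.
tRanker.Start()

// ... application work while tasks are ranked in the background ...

// Stop the Task Ranker once ranking is no longer needed.
tRanker.Stop()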
Run ./create_test_env to,
- bring up a docker-compose installation running Prometheus and cAdvisor.
- run tasks in docker containers.
Each container is allocated different cpu-shares. For more information on running Prometheus and cAdvisor locally see here.
Once you have Prometheus and cAdvisor running (test by running curl http://localhost:9090/metrics or by using the browser), run the below command to run tests.
go test -v ./...
The task ranking results are displayed on the console. Below is what the output will look like.
HOST = localhost
========================================================================
[TaskID = <task id>,Hostname = localhost,Weight = <weight>,], Rank = 0
[TaskID = <task id>,Hostname = localhost,Weight = <weight>,], Rank = 1
[TaskID = <task id>,Hostname = localhost,Weight = <weight>,], Rank = 2
...
[TaskID = <task id>,Hostname = localhost,Weight = <weight>,], Rank = n
========================================================================
Once finished testing, tear down the test environment by running ./tear_down_test_env.
Task Ranker uses logrus for logging. To prevent Task Ranker logs from mixing with logs from the application that uses it, console logging is disabled. There are two types of logs, as described below.
- Task Ranker logs - These logs are Task Ranker specific and correspond to the functioning of the library. They are written to a file named task_ranker_logs_<timestamp>.log.
- Task Ranking Results logs - These are the results of task ranking using one of the task ranking strategies. They are written to a file named task_ranking_results_<timestamp>.log. To simplify parsing, these logs are written in JSON format.
By default, all topics are enabled for logging. Logging for specific topics can be disabled by setting an environment variable named TASK_RANKER_LOGS_DISABLE_TOPICS as shown below.
export TASK_RANKER_LOGS_DISABLE_TOPICS=topic1,topic2,...
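The same can be done programmatically from Go. A sketch, under the assumption that the variable is read when the Task Ranker initializes its logger, so it must be set before the Task Ranker is created (topic names are placeholders):
// Assumption: the variable is read during Task Ranker initialization,
// so set it before calling New(...). "topic1" and "topic2" are placeholders.
os.Setenv("TASK_RANKER_LOGS_DISABLE_TOPICS", "topic1,topic2")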
The list of topics available can be viewed here.
Follow instructions here to set up Prometheus and cAdvisor on bare-metal.