
pradykaushik/task-ranker


Task Ranker


Rank tasks running as docker containers in a cluster.

Task Ranker runs as a cron job on a specified schedule. Each time it runs, it fetches data from Prometheus, filters the data using the configured label matchers and submits it to a task ranking strategy. The strategy uses this data to calibrate the tasks currently running on the cluster and rank them accordingly. The results are then delivered to the user through a callback.

You will need a working Go environment (version 1.12 or later) on Linux.

How To Use?

Run the following command to download and install Task Ranker.

go get github.com/pradykaushik/task-ranker

Environment

Task Ranker can be used in environments where,

  • Prometheus is used to collect container-specific metrics from the cluster hosts that run Docker containers.
  • cAdvisor, a Docker-native metrics exporter, runs on those hosts to export resource isolation and usage information for the running containers.

See cAdvisor docs for more information on how to monitor cAdvisor with Prometheus.

Container Label Prefixes

cAdvisor prefixes all container labels with container_label_. Because Task Ranker only talks to Prometheus, the labels you provide should also include this prefix. For example, suppose we launch a task in a Docker container using the command below.

docker run --label task_id="1234" -t repository/name:version

cAdvisor would then export this label as container_label_task_id.
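The following sketch maps a Docker label name to the name you would use in a label matcher. The helper cadvisorLabelName is hypothetical. It also replaces characters that are invalid in Prometheus label names, such as dots, with underscores. That replacement is our assumption about how cAdvisor sanitizes names, and this README does not state it.

```go
package main

import (
	"fmt"
	"regexp"
)

// invalidLabelChars matches characters that may not appear in a Prometheus
// label name (valid names match [a-zA-Z_][a-zA-Z0-9_]*).
var invalidLabelChars = regexp.MustCompile(`[^a-zA-Z0-9_]`)

// cadvisorLabelName is a hypothetical helper that returns the label name we
// expect cAdvisor to export for a Docker label.
// ASSUMPTION: cAdvisor replaces invalid characters with underscores.
func cadvisorLabelName(dockerLabel string) string {
	return invalidLabelChars.ReplaceAllString("container_label_"+dockerLabel, "_")
}

func main() {
	fmt.Println(cadvisorLabelName("task_id"))   // container_label_task_id
	fmt.Println(cadvisorLabelName("task.host")) // container_label_task_host
}
```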

Configuration

Configuring Task Ranker requires two components.

  1. DataFetcher - Responsible for fetching data from Prometheus, filtering it using the provided labels and submitting it to the chosen strategy.
  2. Ranking Strategy - Uses the data to calibrate currently running tasks and then rank them accordingly. A strategy is configured with,
    • Labels: used to filter the time series data using the specified label matching operation.
    • Receiver: receives the task ranking results.
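The sketch below makes the division of labour concrete. It wires a fetcher, a strategy and a receiver together in the same way a single scheduled run is described above. All types and functions in it (sample, rankedTask, fetcher, strategy, receiver, runOnce, byValueDesc) are hypothetical and are not Task Ranker's API. The toy strategy ranks higher values first, starting at rank 0.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical types illustrating the pipeline; not Task Ranker's API.
type sample struct {
	TaskID string
	Value  float64
}

type rankedTask struct {
	TaskID string
	Rank   int
}

// fetcher plays the role of the DataFetcher (query Prometheus and filter).
type fetcher func() []sample

// strategy ranks tasks using the fetched samples.
type strategy func([]sample) []rankedTask

// receiver is the callback that is handed the ranking results.
type receiver func([]rankedTask)

// runOnce is what each scheduled (cron) execution does: fetch, rank, deliver.
func runOnce(f fetcher, s strategy, r receiver) {
	r(s(f()))
}

// byValueDesc is a toy strategy: a higher value gets a lower (better) rank.
func byValueDesc(samples []sample) []rankedTask {
	sorted := append([]sample(nil), samples...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Value > sorted[j].Value })
	ranked := make([]rankedTask, len(sorted))
	for i, s := range sorted {
		ranked[i] = rankedTask{TaskID: s.TaskID, Rank: i}
	}
	return ranked
}

func main() {
	f := func() []sample {
		return []sample{{"t1", 256}, {"t2", 1024}, {"t3", 512}}
	}
	runOnce(f, byValueDesc, func(rt []rankedTask) { fmt.Println(rt) })
	// [{t2 0} {t3 1} {t1 2}]
}
```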

The code snippet below configures Task Ranker to,

  • fetch time series data from a Prometheus server running at http://localhost:9090.
  • fetch data every 5 seconds.
  • use the cpushares strategy to rank tasks.
  • keep only metrics where container_label_task_id!="".
  • keep only metrics where container_label_task_host="localhost".
  • use container_label_task_id as the dedicated label for retrieving the task identifier.
  • use container_label_task_host as the dedicated label for retrieving the name of the host on which the task is running.
  • use dummyTaskRanksReceiver as the receiver of ranked tasks.
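
// dummyTaskRanksReceiver logs the ranked tasks it receives.
type dummyTaskRanksReceiver struct{}

func (r *dummyTaskRanksReceiver) Receive(rankedTasks entities.RankedTasks) {
	log.Println(rankedTasks)
}

// The statements below run inside a function (for example, main), with the
// task-ranker packages entities, prometheus and query imported, along with
// the standard log and time packages. New, WithDataFetcher, WithSchedule and
// WithStrategy come from the root task-ranker package.
prometheusDataFetcher, err := prometheus.NewDataFetcher(
    prometheus.WithPrometheusEndpoint("http://localhost:9090"))
if err != nil {
    log.Fatal(err)
}

tRanker, err := New(
    WithDataFetcher(prometheusDataFetcher),
    WithSchedule("?/5 * * * * *"), // Every 5 seconds (cron expression with a seconds field).
    WithStrategy("cpushares", []*query.LabelMatcher{
        {Type: query.TaskID, Label: "container_label_task_id", Operator: query.NotEqual, Value: ""},
        {Type: query.TaskHostname, Label: "container_label_task_host", Operator: query.Equal, Value: "localhost"},
    }, new(dummyTaskRanksReceiver), 1*time.Second)) // Assumed: 1*time.Second is the Prometheus scrape interval.
if err != nil {
    log.Fatal(err)
}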

The Task Ranker schedule (in seconds) SHOULD be a positive multiple of the Prometheus scrape interval. This simplifies calculating the time difference between data points fetched by successive query executions.

Strategies can also be configured using initialization options. These options additionally allow configuring the time duration of range queries, which gives fine-grained control over the number of data points the strategy is applied to. The snippet below configures a strategy using options.

WithStrategyOptions("dummyStrategy",
    strategies.WithLabelMatchers([]*query.LabelMatcher{...}),
    strategies.WithTaskRanksReceiver(new(testTaskRanksReceiver)),
    strategies.WithRange(query.Seconds, 5)) // Range queries spanning 5 seconds.

Note: Currently, none of the implemented strategies (cpushares and cpuutil) support range queries.

Dedicated Label Matchers

Dedicated label matchers identify which labels in the data retrieved from Prometheus hold the task ID and the host information.

Currently, the following dedicated label matchers are supported.

  1. TaskID - Flags a label as the one holding the unique identifier of a task.
  2. TaskHostname - Flags a label as the one holding the name of the host on which the task is running.

Strategies can demand that one or more dedicated labels be provided. For instance, a strategy that ranks all tasks running on the cluster can mandate only the TaskID dedicated label. A strategy that ranks colocated tasks, on the other hand, can mandate both the TaskID and TaskHostname dedicated labels.

Dedicated label matchers must be provided when using strategies that demand them.
The code snippet below shows how to provide a dedicated label when configuring Task Ranker.

WithStrategy("strategy-name", []*query.LabelMatcher{
    {Type: query.TaskID, Label: "taskid_label", Operator: query.NotEqual, Value: ""},
    ... // Other label matchers.
}, new(dummyTaskRanksReceiver), 1*time.Second)
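The sketch below shows how a strategy that ranks colocated tasks could use the two dedicated labels. It pulls the task ID and hostname out of each series' label set and groups task IDs by host. matcherType, dedicatedLabels and groupByHost are hypothetical stand-ins for query.TaskID, query.TaskHostname and the strategy internals. They are not Task Ranker's code.

```go
package main

import (
	"fmt"
	"sort"
)

// matcherType is a hypothetical stand-in for the dedicated label types.
type matcherType int

const (
	taskID matcherType = iota
	taskHostname
)

// dedicatedLabels maps each dedicated label type to the configured label name.
type dedicatedLabels map[matcherType]string

// groupByHost extracts the task ID and hostname from each series' labels
// using the dedicated labels, and groups the task IDs by host.
func groupByHost(series []map[string]string, dl dedicatedLabels) map[string][]string {
	hosts := make(map[string][]string)
	for _, labels := range series {
		id, okID := labels[dl[taskID]]
		host, okHost := labels[dl[taskHostname]]
		if !okID || !okHost {
			continue // Skip series that lack a dedicated label.
		}
		hosts[host] = append(hosts[host], id)
	}
	for _, ids := range hosts {
		sort.Strings(ids) // Deterministic order within each host.
	}
	return hosts
}

func main() {
	dl := dedicatedLabels{
		taskID:       "container_label_task_id",
		taskHostname: "container_label_task_host",
	}
	series := []map[string]string{
		{"container_label_task_id": "2", "container_label_task_host": "localhost"},
		{"container_label_task_id": "1", "container_label_task_host": "localhost"},
		{"container_label_task_id": "3", "container_label_task_host": "node-2"},
	}
	fmt.Println(groupByHost(series, dl)) // map[localhost:[1 2] node-2:[3]]
}
```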

Start the Task Ranker

Once Task Ranker has been configured, start it by calling tRanker.Start().

Stop the Task Ranker

Call tRanker.Stop() to stop the task ranker.

Test Locally

Setup

Run ./create_test_env to,

  1. bring up a docker-compose installation running Prometheus and cAdvisor.
  2. run tasks in docker containers.

Each container is allocated a different number of CPU shares. For more information on running Prometheus and cAdvisor locally, see here.
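CPU shares are relative weights, and they only take effect when containers compete for the same CPUs. The sketch below computes the fraction of CPU time each container would get if all of them were CPU-bound on the same CPUs. It illustrates Docker's weighting only and is not necessarily how the cpushares strategy computes its weights. The container names are hypothetical.

```go
package main

import "fmt"

// cpuFractions returns shares_i / sum(shares): each container's expected
// fraction of CPU time when all containers compete for the same CPUs.
func cpuFractions(shares map[string]int) map[string]float64 {
	total := 0
	for _, s := range shares {
		total += s
	}
	fractions := make(map[string]float64, len(shares))
	if total == 0 {
		return fractions
	}
	for name, s := range shares {
		fractions[name] = float64(s) / float64(total)
	}
	return fractions
}

func main() {
	fmt.Println(cpuFractions(map[string]int{"task-a": 1024, "task-b": 512, "task-c": 512}))
	// map[task-a:0.5 task-b:0.25 task-c:0.25]
}
```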

Verify that Prometheus and cAdvisor are running, for example by running curl http://localhost:9090/metrics or by opening that URL in a browser.

Test

Then run the following command to run the tests.

go test -v ./...

The task ranking results are printed to the console in the following format.

HOST = localhost
========================================================================
		
[TaskID = <task id>,Hostname = localhost,Weight = <weight>,], Rank = 0
[TaskID = <task id>,Hostname = localhost,Weight = <weight>,], Rank = 1
[TaskID = <task id>,Hostname = localhost,Weight = <weight>,], Rank = 2
...
[TaskID = <task id>,Hostname = localhost,Weight = <weight>,], Rank = n
========================================================================

Tear-Down

Once finished testing, tear down the test environment by running ./tear_down_test_env.

Logs

Task Ranker uses logrus for logging. To keep Task Ranker logs from mixing with the logs of the application using it, console logging is disabled. Task Ranker produces the two types of logs described below.

  1. Task Ranker logs - Logs specific to the functioning of the library. These logs are written to a file named task_ranker_logs_<timestamp>.log.
  2. Task Ranking Results logs - The results produced by the task ranking strategy in use. These logs are written to a file named task_ranking_results_<timestamp>.log. To simplify parsing, these logs are written in JSON format.

Disable Topics from Task-Ranker Logs

By default, all topics are enabled for logging. To disable logging for specific topics, set the environment variable TASK_RANKER_LOGS_DISABLE_TOPICS to a comma-separated list of topic names, as shown below.

export TASK_RANKER_LOGS_DISABLE_TOPICS=topic1,topic2,...

The list of available topics can be viewed here.
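The sketch below shows one way a comma-separated value like this can be parsed into a set. disabledTopics is a hypothetical helper. This README does not say whether Task Ranker trims whitespace around topic names, so it is safest to leave spaces out of the value.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// disabledTopics is a hypothetical helper that turns a value such as
// "topic1,topic2" into a set, trimming spaces and ignoring empty entries.
func disabledTopics(value string) map[string]bool {
	set := make(map[string]bool)
	for _, t := range strings.Split(value, ",") {
		if t = strings.TrimSpace(t); t != "" {
			set[t] = true
		}
	}
	return set
}

func main() {
	topics := disabledTopics(os.Getenv("TASK_RANKER_LOGS_DISABLE_TOPICS"))
	fmt.Println(len(topics), "topic(s) disabled")
}
```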

Bare Metal Setup

Follow the instructions here to set up Prometheus and cAdvisor on bare metal.