Fix #2
wlandau-lilly committed Dec 8, 2023
1 parent 481eb48 commit 93f4db8
Showing 4 changed files with 222 additions and 64 deletions.
3 changes: 2 additions & 1 deletion R/crew_aws_batch_monitor.R
@@ -6,7 +6,8 @@
#' @param job_queue Character of length 1, name of the AWS Batch
#' job queue.
#' @param job_definition Character of length 1, name of the AWS Batch
#' job definition.
#' job definition. The job definition might or might not exist
#' at the time `crew_aws_batch_monitor()` is called. Either way is fine.
#' @param log_group Character of length 1,
#' AWS Batch CloudWatch log group to get job logs.
#' The default log group is often "/aws/batch/job", but not always.
107 changes: 87 additions & 20 deletions README.Rmd
@@ -93,46 +93,113 @@ str(groups$SecurityGroups[[1L]])
#> $ VpcId : chr "vpc-00000"
```

# Job definition management
# Job management

You will most likely need to create custom job definitions for your use case. Typically this involves choosing a container image in [AWS Elastic Container Registry (ECR)](https://docs.aws.amazon.com/AmazonECR/latest/userguide/getting-started-cli.html) and specifying the resource requirements of jobs. The `paws.compute` package makes it straightforward to manage job definitions. Please see the AWS Batch functions at <https://www.paws-r-sdk.com/docs/reference_index/> to register, describe, and deregister job definitions. To see how ECR image URLs work, visit <https://docs.aws.amazon.com/AmazonECR/latest/userguide/getting-started-cli.html>.
With `crew.aws.batch`, your `crew` controller automatically submits jobs to AWS Batch. These jobs may fail or linger for any number of reasons, which could impede work and increase costs. So before you use `crew_controller_aws_batch()`, please learn how to monitor and terminate AWS Batch jobs manually.

To ["register" (create or overwrite) a job definition](https://www.paws-r-sdk.com/docs/batch_register_job_definition/), use the R code below and replace the values with the ones you want.
`crew.aws.batch` defines a "monitor" class to help you take control of jobs and job definitions. Create a monitor object with `crew_aws_batch_monitor()`. You will need to supply a job definition name and a job queue name.

```{r}
client <- paws.compute::batch()
client$register_job_definition(
  jobDefinitionName = "JOB_DEFINITION_NAME",
  type = "container",
  containerProperties = list(
    image = "AWS_ACCOUNT_ID.dkr.ecr.AWS_REGION.amazonaws.com/ECR_REPOSITORY_NAME:IMAGE_TAG",
    vcpus = 2,
    memory = 4000
  )
monitor <- crew_aws_batch_monitor(
  job_definition = "YOUR_JOB_DEFINITION_NAME",
  job_queue = "YOUR_JOB_QUEUE_NAME"
)
```

To collect information about existing job definitions, you can either ask for all job definitions,
The job definition may or may not exist at this point. If it does not exist, you can register it with `register()`, a simplified, limited-scope method which creates container-based job definitions with the `"awslogs"` log driver (for CloudWatch).^[The log group supplied to `crew_aws_batch_monitor()` must be valid. The default is `"/aws/batch/job"`, which may not exist if your system administrator has a custom logging policy.] Below, your container image can be as simple as a Docker Hub identifier (like `"alpine:latest"`) or the full URI of an ECR image.^[For the `crew` controller, you will definitely want an image with R and `crew` installed. For the purposes of testing the monitor, `"alpine:latest"` will work.]

```{r}
client$describe_job_definitions()
monitor$register(
image = "AWS_ACCOUNT_ID.dkr.ecr.AWS_REGION.amazonaws.com/ECR_REPOSITORY_NAME:IMAGE_TAG",
platform_capabilities = "EC2",
memory_units = "gigabytes",
memory = 8,
cpus = 2
)
```

You can submit individual AWS Batch jobs to test your computing environment.

```{r}
job1 <- monitor$submit(name = "job1", command = c("echo", "hello\nworld"))
job2 <- monitor$submit(name = "job2", command = c("echo", "job\nsubmitted"))
job2
#> # A tibble: 1 × 3
#> name id arn
#> <chr> <chr> <chr>
#> 1 job2 c38d55ad-4a86-4371-9994-6ea8882f5726 arn:aws:batch:us-east-2:0…
```

Method `status()` checks the status of an individual job.

```{r}
monitor$status(id = job2$id)
#> # A tibble: 1 × 8
#> name id arn status reason created started stopped
#> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 job2 c38d55ad-4a86-43… arn:… runnable NA 1.70e12 NA NA
```

The `jobs()` method gets the status of all the jobs within the job queue and job definition you originally supplied to `crew_aws_batch_monitor()`. This may include many more jobs than the ones you submitted during the life cycle of the current `monitor` object.

```{r}
monitor$jobs()
#> # A tibble: 2 × 8
#> name id arn status reason created started stopped
#> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 job1 653df636-ac74-43… arn:… succeeded Essen… 1.70e12 1.70e12 1.70e12
#> 2 job2 c38d55ad-4a86-43… arn:… runnable NA 1.70e12 NA NA
```

The [job state](https://docs.aws.amazon.com/batch/latest/userguide/job_states.html) can be `"submitted"`, `"pending"`, `"runnable"`, `"starting"`, `"running"`, `"succeeded"`, or `"failed"`. The monitor has a method for each job state to get only the jobs with that state.

```{r}
monitor$succeeded()
#> # A tibble: 1 × 8
#> name id arn status reason created started stopped
#> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 job1 653df636-ac74-43… arn:… succeeded NA 1.70e12 1.70e12 1.70e12
```
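
For example, the methods for the other states follow the same pattern. A brief sketch (assuming the methods are named after the states themselves, as `succeeded()` is above; the output depends on what is currently in your queue):

```{r}
# Assumed state-named methods, analogous to succeeded() above:
monitor$runnable() # jobs waiting for compute resources
monitor$running()  # jobs currently executing
```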

In addition, there is an `active()` method for just states `"submitted"`, `"pending"`, `"runnable"`, `"starting"`, and `"running"`, and there is an `inactive()` method for just the `"succeeded"` and `"failed"` states.

```{r}
monitor$inactive()
#> # A tibble: 1 × 8
#> name id arn status reason created started stopped
#> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 job1 653df636-ac74-43… arn:… succeeded NA 1.70e12 1.70e12 1.70e12
```

To terminate a job, use the `terminate()` method. This has the effect of both canceling and terminating the job, although you may not see the change right away if the job is currently `"runnable"`. Manually terminated jobs are listed as failed.

```{r}
monitor$terminate(id = job2$id)
```
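
Because manually terminated jobs are listed as failed, one way to confirm the termination went through is to wait a moment and then query the failed jobs. This is just a sketch: it assumes a `failed()` state method (analogous to `succeeded()` above), and the timing depends on the state the job was in when you terminated it.

```{r}
# Give AWS Batch time to process the termination, then check that job2
# shows up among the failed jobs (failed() is the assumed state method).
Sys.sleep(30)
monitor$failed()
```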

or just a specific version of a specific job definition.
To get the CloudWatch logs of a job, use the `log()` method. This method returns a `tibble` with the log messages and numeric timestamps.

```{r}
client$describe_job_definitions("JOB_DEFINITION_NAME:1")
log <- monitor$log(id = job1$id)
log
#> # A tibble: 2 × 3
#> message timestamp ingestion_time
#> <chr> <dbl> <dbl>
#> 1 hello 1702068378163 1702068378245
#> 2 world 1702068378163 1702068378245
```

To delete a job definition, specify the name and version of the job definition (or the full [Amazon Resource Name](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html), or ARN).
If the log messages are too long to conveniently view in the `tibble`, you can print them to your screen with `cat()` or `writeLines()`.

```{r}
client$deregister_job_definitions("JOB_DEFINITION_NAME:1")
writeLines(log$message)
#> hello
#> world
```

# Package usage
# Using `crew` with AWS Batch workers

To start using `crew.aws.batch`, first create a controller object. Also supply the names of your job queue and job definition, as well as any optional flags and settings you may need.
To start using `crew.aws.batch` in earnest, first create a controller object. Also supply the names of your job queue and job definition, as well as any optional flags and settings you may need. If you do not already have a job definition, the "monitor" object described above can help you create one.

```{r}
library(crew.aws.batch)
173 changes: 131 additions & 42 deletions README.md
@@ -110,64 +110,146 @@ str(groups$SecurityGroups[[1L]])
#> $ VpcId : chr "vpc-00000"
```

# Job definition management

You will most likely need to create custom job definitions for your use
case. Typically this involves choosing a container image in [AWS Elastic
Container Registry
(ECR)](https://docs.aws.amazon.com/AmazonECR/latest/userguide/getting-started-cli.html)
and specifying the resource requirements of jobs. The `paws.compute`
package makes it straightforward to manage job
definitions. Please see the AWS Batch functions at
<https://www.paws-r-sdk.com/docs/reference_index/> to register,
describe, and deregister job definitions. To see how ECR image URLs
work, visit
<https://docs.aws.amazon.com/AmazonECR/latest/userguide/getting-started-cli.html>.

To [“register” (create or overwrite) a job
definition](https://www.paws-r-sdk.com/docs/batch_register_job_definition/),
use the R code below and replace the values with the ones you want.
# Job management

With `crew.aws.batch`, your `crew` controller automatically submits jobs
to AWS Batch. These jobs may fail or linger for any number of reasons,
which could impede work and increase costs. So before you use
`crew_controller_aws_batch()`, please learn how to monitor and terminate
AWS Batch jobs manually.

`crew.aws.batch` defines a “monitor” class to help you take control of
jobs and job definitions. Create a monitor object with
`crew_aws_batch_monitor()`. You will need to supply a job definition
name and a job queue name.

``` r
client <- paws.compute::batch()
client$register_job_definition(
  jobDefinitionName = "JOB_DEFINITION_NAME",
  type = "container",
  containerProperties = list(
    image = "AWS_ACCOUNT_ID.dkr.ecr.AWS_REGION.amazonaws.com/ECR_REPOSITORY_NAME:IMAGE_TAG",
    vcpus = 2,
    memory = 4000
  )
monitor <- crew_aws_batch_monitor(
  job_definition = "YOUR_JOB_DEFINITION_NAME",
  job_queue = "YOUR_JOB_QUEUE_NAME"
)
```

To collect information about existing job definitions, you can either
ask for all job definitions,
The job definition may or may not exist at this point. If it does not
exist, you can register it with `register()`, a simplified,
limited-scope method which creates container-based job definitions with
the `"awslogs"` log driver (for CloudWatch).[^3] Below, your container
image can be as simple as a Docker Hub identifier (like
`"alpine:latest"`) or the full URI of an ECR image.[^4]

``` r
client$describe_job_definitions()
monitor$register(
image = "AWS_ACCOUNT_ID.dkr.ecr.AWS_REGION.amazonaws.com/ECR_REPOSITORY_NAME:IMAGE_TAG",
platform_capabilities = "EC2",
memory_units = "gigabytes",
memory = 8,
cpus = 2
)
```

or just a specific version of a specific job definition.
You can submit individual AWS Batch jobs to test your computing
environment.

``` r
client$describe_job_definitions("JOB_DEFINITION_NAME:1")
job1 <- monitor$submit(name = "job1", command = c("echo", "hello\nworld"))
job2 <- monitor$submit(name = "job2", command = c("echo", "job\nsubmitted"))
job2
#> # A tibble: 1 × 3
#> name id arn
#> <chr> <chr> <chr>
#> 1 job2 c38d55ad-4a86-4371-9994-6ea8882f5726 arn:aws:batch:us-east-2:0…
```

To delete a job definition, specify the name and version of the job
definition (or the full [Amazon Resource
Name](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html),
or ARN).
Method `status()` checks the status of an individual job.

``` r
client$deregister_job_definitions("JOB_DEFINITION_NAME:1")
monitor$status(id = job2$id)
#> # A tibble: 1 × 8
#> name id arn status reason created started stopped
#> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 job2 c38d55ad-4a86-43… arn:… runnable NA 1.70e12 NA NA
```

# Package usage
The `jobs()` method gets the status of all the jobs within the job queue
and job definition you originally supplied to
`crew_aws_batch_monitor()`. This may include many more jobs than the
ones you submitted during the life cycle of the current `monitor`
object.

``` r
monitor$jobs()
#> # A tibble: 2 × 8
#> name id arn status reason created started stopped
#> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 job1 653df636-ac74-43… arn:… succeeded Essen… 1.70e12 1.70e12 1.70e12
#> 2 job2 c38d55ad-4a86-43… arn:… runnable NA 1.70e12 NA NA
```

To start using `crew.aws.batch`, first create a controller object. Also
supply the names of your job queue and job definition, as well as any
optional flags and settings you may need.
The [job
state](https://docs.aws.amazon.com/batch/latest/userguide/job_states.html)
can be `"submitted"`, `"pending"`, `"runnable"`, `"starting"`,
`"running"`, `"succeeded"`, or `"failed"`. The monitor has a method for
each job state to get only the jobs with that state.

``` r
monitor$succeeded()
#> # A tibble: 1 × 8
#> name id arn status reason created started stopped
#> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 job1 653df636-ac74-43… arn:… succeeded NA 1.70e12 1.70e12 1.70e12
```
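
For example, the methods for the other states follow the same pattern. A
brief sketch (assuming the methods are named after the states
themselves, as `succeeded()` is above; the output depends on what is
currently in your queue):

``` r
# Assumed state-named methods, analogous to succeeded() above:
monitor$runnable() # jobs waiting for compute resources
monitor$running()  # jobs currently executing
```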

In addition, there is an `active()` method for just states
`"submitted"`, `"pending"`, `"runnable"`, `"starting"`, and `"running"`,
and there is an `inactive()` method for just the `"succeeded"` and
`"failed"` states.

``` r
monitor$inactive()
#> # A tibble: 1 × 8
#> name id arn status reason created started stopped
#> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 job1 653df636-ac74-43… arn:… succeeded NA 1.70e12 1.70e12 1.70e12
```

To terminate a job, use the `terminate()` method. This has the effect of
both canceling and terminating the job, although you may not see the
change right away if the job is currently `"runnable"`. Manually
terminated jobs are listed as failed.

``` r
monitor$terminate(id = job2$id)
```
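
Because manually terminated jobs are listed as failed, one way to
confirm the termination went through is to wait a moment and then query
the failed jobs. This is just a sketch: it assumes a `failed()` state
method (analogous to `succeeded()` above), and the timing depends on the
state the job was in when you terminated it.

``` r
# Give AWS Batch time to process the termination, then check that job2
# shows up among the failed jobs (failed() is the assumed state method).
Sys.sleep(30)
monitor$failed()
```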

To get the CloudWatch logs of a job, use the `log()` method. This method
returns a `tibble` with the log messages and numeric timestamps.

``` r
log <- monitor$log(id = job1$id)
log
#> # A tibble: 2 × 3
#> message timestamp ingestion_time
#> <chr> <dbl> <dbl>
#> 1 hello 1702068378163 1702068378245
#> 2 world 1702068378163 1702068378245
```

If the log messages are too long to conveniently view in the `tibble`,
you can print them to your screen with `cat()` or `writeLines()`.

``` r
writeLines(log$message)
#> hello
#> world
```

# Using `crew` with AWS Batch workers

To start using `crew.aws.batch` in earnest, first create a controller
object. Also supply the names of your job queue and job definition, as
well as any optional flags and settings you may need. If you do not
already have a job definition, the “monitor” object described above can
help you create one.

``` r
library(crew.aws.batch)
@@ -255,7 +337,7 @@ By contributing to this project, you agree to abide by its terms.
citation("crew.aws.batch")
To cite package 'crew.aws.batch' in publications use:

Landau WM (2023). _crew.aws.batch: A Crew Launcher Plugin for AWS
Landau WM (????). _crew.aws.batch: A Crew Launcher Plugin for AWS
Batch_. R package version 0.0.0.9001,
https://github.com/wlandau/crew.aws.batch,
<https://wlandau.github.io/crew.aws.batch/>.
@@ -265,7 +347,6 @@ A BibTeX entry for LaTeX users is
@Manual{,
title = {crew.aws.batch: A Crew Launcher Plugin for AWS Batch},
author = {William Michael Landau},
year = {2023},
note = {R package version 0.0.0.9001,
https://github.com/wlandau/crew.aws.batch},
url = {https://wlandau.github.io/crew.aws.batch/},
@@ -281,3 +362,11 @@ https://github.com/wlandau/crew.aws.batch},
TLS encryption turned on (default:
`tls = crew_tls(mode = "automatic")`). Please understand and comply
with all the security policies of your organization.

[^3]: The log group supplied to `crew_aws_batch_monitor()` must be
valid. The default is `"/aws/batch/job"`, which may not exist if
your system administrator has a custom logging policy.

[^4]: For the `crew` controller, you will definitely want an image with
R and `crew` installed. For the purposes of testing the monitor,
`"alpine:latest"` will work.
3 changes: 2 additions & 1 deletion man/crew_aws_batch_monitor.Rd
