[rapids] removed spark tests, updated to a more recent rapids release #1219
Merged
96 commits
260c707 [gpu] clean-up of sources.list and keyring file assertion (cjac)
c248aaf merge from master (cjac)
f942492 merged from custom-images/examples/secure-boot/install_gpu_driver.sh (cjac)
d6a86cb added comments for difficut to understand functions (cjac)
3e8007e tested with 24.06 ; using conda for cuda 12 (cjac)
c385546 tested with 24.06 ; using conda for cuda 12 (cjac)
4bf628a removed os check functions and the use of them (cjac)
e370b80 capturing runtime of mamba install (cjac)
cecf837 retry failed mamba with conda (cjac)
6f91fb1 increase machine type ; reduce disk size ; test 11.8 (12.4 is default) (cjac)
48205a8 spark does not yet have 24.08.0 (cjac)
d085df2 tested with 2.1 and 2.2 (cjac)
aae3c86 always create environment ; run test scripts with python from envs/da… (cjac)
eb95860 skipping dask with yarn runtime tests for now (cjac)
9a4d536 added copyright block (cjac)
97dd7ad temporary changes to improve test performance (cjac)
86c7671 increasing machine type, attempting 2024.06 again now that I have fix… (cjac)
151597f refactored code a bit (cjac)
a1ab571 how did this get in this change? (cjac)
62262db we are seeing an error in this config file ; investigate (cjac)
77f9fa0 temporary changes to improve test performance (cjac)
8ccbc27 Adding disable shielded boot flag and disk type ssd flag to enhance t… (prince-cs)
25f0d96 tested on debian11 w/ cuda11 (cjac)
c6991e8 added skein tests for dask-yarn (cjac)
52f5fec accidentally using the wrong bigtable.sh in this PR ; checking out ma… (cjac)
aad851a using correct conda env for dask-yarn environment (cjac)
e20aa9a added skein test for dask (cjac)
fd9449b that was the wrong filename (cjac)
c69d951 perform the skein tests before skipping the dask ones (cjac)
5b23ddb whitespace changes (cjac)
536aef9 removing the excessive logging (cjac)
b476bae taking master hostname from argv ; added array test (cjac)
f7aed92 defining two separate services to ease debugging (cjac)
c9d41f4 dask service tests are passing (cjac)
b6273c8 refactored yarn tests to its own py file ; updated rapids.sh to separ… (cjac)
8d18024 tested with debian and rocky (cjac)
f88df7b added skein test (cjac)
d71470f reduced operations slightly when setting master hostname (cjac)
aa68bc8 python operators. amirite? (cjac)
facb14b status fails ; list-units | grep works (cjac)
8559fdd explicitly including cudf (cjac)
c3ea723 corrected variable name (cjac)
6a14ff1 working with cuda12 + yarn as dask runtime (cjac)
8e93293 removed pinning for numba as per jakirkham (cjac)
1b82dc1 easing the version constraints some (cjac)
7d65472 fully changing the variable name (cjac)
7cdf483 removing test_skein.py (cjac)
ca74b49 removed extra lines from rebase (cjac)
2e7979f reducing line count (cjac)
de965fa relaxed cuda version to 11.8 (cjac)
d01e349 disabling rocky9 tests for now (cjac)
6aa28a3 skipping the whole test on rocky9 for now (cjac)
467ce89 trying 24.08 (cjac)
33b8d5e increase max cluster age for rocky9 ; using CUDA_VERSION=11.8 for non… (cjac)
2c1c6a0 increase timeout for init actions as well as max-age from previous co… (cjac)
f4b6dda reverted attempt to change a r/o variable (cjac)
d72bb06 trying with 24.08 (cjac)
e22cb45 removing spark from the rapids tests (cjac)
973c81b 2.2.20 is known to work (cjac)
9963dfb using new fangled key management path (cjac)
5bbb8fc explicitly specifying path to curl ; also installing curl (cjac)
ee13c9a perform update before install (cjac)
c28bb4b modified to run as a custom-images script (cjac)
531a472 remove delta from master for gpu/ (cjac)
062f087 recently tested to have worked with n1-standard-4 and 54GB (cjac)
050f8c4 reduce log noise from Dockerfile (cjac)
aa4afb9 removing delta from dask on master (cjac)
c75d120 update verify_dask_instance test to use systemd unit defined in dask … (cjac)
85ac0ac removing quotes from systemctl command (cjac)
3314334 protecting from empty string state (cjac)
c158a55 replacing removed dask-runtime=yarn instance test (cjac)
3eda60d [dask-rapids] merge from custom-images (cjac)
dbfa4c0 revert to master (cjac)
1c9c7fe refactored to match dask ; removed all spark code paths (see spark-ra… (cjac)
1c7a31d added some testing helpers and documentation (cjac)
caf9307 dask-yarn tests do not work ; disabling until new release of dask-yar… (cjac)
7fdda0c increase max idle time ; print the command to be run (cjac)
dd12f02 cleaned up comment positioning and content (cjac)
5cd3951 using ram disk for temp files if we have it (cjac)
3519fe0 double quotes will allow temp directory variable to be expanded corre… (cjac)
12e253d using else instead of is_rocky (cjac)
e8a44fe corrected release version names (cjac)
caab9be revert to mainline (cjac)
6d900bf simplify and modernize this comment (cjac)
13cb723 default to using internal IP ; have not yet renamed rapids to dask-ra… (cjac)
aec628d prepare layout for rename of rapids to dask-rapids (cjac)
8c67d21 reduce noise from docker run (cjac)
a31f10c reduce noise in docker build (cjac)
a6fa424 removing older GPU from list (cjac)
e5b6e3f removing delta from master (cjac)
f0f906a Merge branch 'GoogleCloudDataproc:master' into rapids-20240806 (cjac)
5b93e3a Thread.yield() (cjac)
38ba6e3 improved documentation (cjac)
91907ae default to non-private ip ; maybe that is why this last run failed (cjac)
6d8c32b revert dataproc_test_case.py to last known good (cjac)
7c8ce57 using correct df command ; using greater or equal to rapids version ;… (cjac)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
# This Dockerfile builds the container from which rapids tests are run | ||
# This process needs to be executed manually from a git clone | ||
# | ||
# See manual-test-runner.sh for instructions | ||
|
||
FROM gcr.io/cloud-builders/gcloud | ||
|
||
RUN useradd -m -d /home/ia-tests -s /bin/bash ia-tests | ||
|
||
RUN apt-get -qq update \ | ||
&& apt-get -y -qq install \ | ||
apt-transport-https apt-utils \ | ||
ca-certificates libmime-base64-perl gnupg \ | ||
curl jq less screen > /dev/null 2>&1 && apt-get clean | ||
|
||
# Install bazel signing key, repo and package | ||
ENV bazel_kr_path=/usr/share/keyrings/bazel-release.pub.gpg | ||
ENV bazel_repo_data="http://storage.googleapis.com/bazel-apt stable jdk1.8" | ||
|
||
RUN /usr/bin/curl -s https://bazel.build/bazel-release.pub.gpg \ | ||
| gpg --dearmor -o "${bazel_kr_path}" \ | ||
&& echo "deb [arch=amd64 signed-by=${bazel_kr_path}] ${bazel_repo_data}" \ | ||
| dd of=/etc/apt/sources.list.d/bazel.list status=none \ | ||
&& apt-get update -qq | ||
|
||
RUN apt-get autoremove -y -qq && \ | ||
apt-get install -y -qq default-jdk python3-setuptools bazel > /dev/null 2>&1 && \ | ||
apt-get clean | ||
|
||
|
||
# Install here any utilities you find useful when troubleshooting | ||
RUN apt-get -y -qq install emacs-nox vim uuid-runtime > /dev/null 2>&1 && apt-get clean | ||
|
||
WORKDIR /init-actions | ||
|
||
USER ia-tests | ||
COPY --chown=ia-tests:ia-tests . /init-actions | ||
|
||
ENTRYPOINT ["/bin/bash"] | ||
#CMD ["/bin/bash"] |
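The signing-key and repository step in the Dockerfile above can be sanity-checked outside the image build. The following is a minimal sketch and not part of the PR: `bazel_sources_line` is a hypothetical helper that assembles the same `deb [...]` entry the Dockerfile writes to `/etc/apt/sources.list.d/bazel.list`, taking the keyring path and repo data from the two `ENV` lines as arguments.

```shell
#!/bin/bash
# Hypothetical helper (not in the PR): build the apt source entry that the
# Dockerfile writes for the bazel repository.
bazel_sources_line() {
  local keyring_path="$1"  # e.g. the value of bazel_kr_path
  local repo_data="$2"     # e.g. the value of bazel_repo_data
  printf 'deb [arch=amd64 signed-by=%s] %s\n' "${keyring_path}" "${repo_data}"
}

# Values copied from the ENV lines in the Dockerfile.
bazel_sources_line \
  /usr/share/keyrings/bazel-release.pub.gpg \
  "http://storage.googleapis.com/bazel-apt stable jdk1.8"
```

Comparing this output with the contents of `bazel.list` inside a built container is one way to confirm that the `signed-by` path matches the keyring written by `gpg --dearmor`.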
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
# | ||
Review comment: Are these files committed by mistake? |
||
# For debugging, uncomment the following line | ||
# | ||
|
||
# screen -L -t monitor 0 /bin/bash | ||
|
||
screen -L -t 2.0-debian10 1 sh -c '/bin/bash -x rapids/run-bazel-tests.sh 2.0-debian10 ; exec /bin/bash' | ||
#screen -L -t 2.0-rocky8 2 sh -c '/bin/bash -x rapids/run-bazel-tests.sh 2.0-rocky8 ; exec /bin/bash' | ||
#screen -L -t 2.0-ubuntu18 3 sh -c '/bin/bash -x rapids/run-bazel-tests.sh 2.0-ubuntu18 ; exec /bin/bash' | ||
|
||
#screen -L -t 2.1-debian11 4 sh -c '/bin/bash -x rapids/run-bazel-tests.sh 2.1-debian11 ; exec /bin/bash' | ||
#screen -L -t 2.1-rocky8 5 sh -c '/bin/bash -x rapids/run-bazel-tests.sh 2.1-rocky8 ; exec /bin/bash' | ||
#screen -L -t 2.1-ubuntu20 6 sh -c '/bin/bash -x rapids/run-bazel-tests.sh 2.1-ubuntu20 ; exec /bin/bash' | ||
|
||
#screen -L -t 2.2-debian12 7 sh -c '/bin/bash -x rapids/run-bazel-tests.sh 2.2-debian12 ; exec /bin/bash' | ||
#screen -L -t 2.2-rocky9 8 sh -c '/bin/bash -x rapids/run-bazel-tests.sh 2.2-rocky9 ; exec /bin/bash' | ||
#screen -L -t 2.2-ubuntu22 9 sh -c '/bin/bash -x rapids/run-bazel-tests.sh 2.2-ubuntu22 ; exec /bin/bash' |
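The screenrc entries above all follow one pattern: a window title, a window number, and a `run-bazel-tests.sh` invocation that falls through to an interactive shell. As a rough sketch (not part of the PR), the lines could be generated rather than written by hand. `screenrc_line` is a hypothetical function, and the window numbers here are simply assigned in sequence, which differs from the fixed numbers used in the file.

```shell
#!/bin/bash
# Hypothetical generator (not in the PR): print one screenrc line per image
# version, in the same shape as the hand-written entries.
screenrc_line() {
  local window="$1"   # screen window number
  local version="$2"  # Dataproc image version, e.g. 2.2-debian12
  printf "screen -L -t %s %s sh -c '/bin/bash -x rapids/run-bazel-tests.sh %s ; exec /bin/bash'\n" \
    "${version}" "${window}" "${version}"
}

window=1
for version in 2.0-debian10 2.1-debian11 2.2-debian12; do
  screenrc_line "${window}" "${version}"
  window=$((window + 1))
done
```

The trailing `exec /bin/bash` in each entry keeps the window open after the test run finishes, so its output stays available for inspection.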
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
{ | ||
"PROJECT_ID":"example-yyyy-nn", | ||
"PURPOSE":"cuda-pre-init", | ||
"BUCKET":"my-bucket-name", | ||
"IMAGE_VERSION":"2.2-debian12", | ||
"ZONE":"us-west4-ñ" | ||
} |
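Note that the sample above defines `ZONE`, while the runner script in this PR reads `.REGION` from `env.json`; with the sample copied unchanged, `jq -r .REGION` would print the string `null`. A pre-flight check along the following lines could catch that before any `gcloud` command runs. This is a sketch under assumptions: `missing_keys` is a hypothetical function, and the list of required keys is taken from the `jq` calls in the runner script.

```shell
#!/bin/bash
# Hypothetical pre-flight check (not in the PR): report keys that are missing
# or empty in a JSON config file. Requires jq, which the Dockerfile installs.
missing_keys() {
  local config_file="$1"; shift
  local key value
  for key in "$@"; do
    # jq -r prints "null" for an absent key; `// empty` turns that into no output.
    value="$(jq -r --arg k "${key}" '.[$k] // empty' "${config_file}")"
    if [[ -z "${value}" ]]; then
      echo "${key}"
    fi
  done
}

# Keys read by the runner script; only checked when env.json is present.
if [[ -f env.json ]]; then
  missing_keys env.json PROJECT_ID REGION BUCKET IMAGE_VERSION
fi
```

An empty result means every listed key has a non-empty value; any key name printed is one the runner would otherwise receive as `null`.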
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,77 @@ | ||
#!/bin/bash | ||
|
||
# This script sets up the gcloud environment and launches tests in a screen session | ||
# | ||
# To bootstrap and run the tests: | ||
# | ||
# git clone git@github.com:GoogleCloudDataproc/initialization-actions | ||
# cd initialization-actions | ||
# git checkout rapids-20240806 | ||
# cp rapids/env.json.sample env.json | ||
# vi env.json | ||
# docker build -f rapids/Dockerfile -t rapids-init-actions-runner:latest . | ||
# time docker run -it rapids-init-actions-runner:latest rapids/manual-test-runner.sh | ||
# | ||
# The bazel run(s) happen in separate screen windows. | ||
# To see a list of screen windows, press ^a " | ||
# Num Name | ||
# | ||
# 0 monitor | ||
# 1 2.0-debian10 | ||
# 2 sh | ||
|
||
|
||
readonly timestamp="$(date +%F-%H-%M)" | ||
export BUILD_ID="$(uuidgen)" | ||
|
||
tmp_dir="/tmp/${BUILD_ID}" | ||
log_dir="${tmp_dir}/logs" | ||
mkdir -p "${log_dir}" | ||
|
||
IMAGE_VERSION="$1" | ||
if [[ -z "${IMAGE_VERSION}" ]] ; then | ||
IMAGE_VERSION="$(jq -r .IMAGE_VERSION env.json)" ; fi ; export IMAGE_VERSION | ||
export PROJECT_ID="$(jq -r .PROJECT_ID env.json)" | ||
export REGION="$(jq -r .REGION env.json)" | ||
export BUCKET="$(jq -r .BUCKET env.json)" | ||
|
||
gcs_log_dir="gs://${BUCKET}/${BUILD_ID}/logs" | ||
|
||
function exit_handler() { | ||
RED='\e[0;31m' | ||
GREEN='\e[0;32m' | ||
NC='\e[0m' | ||
echo 'Cleaning up before exiting.' | ||
|
||
# TODO: list clusters which match our BUILD_ID and clean them up | ||
# TODO: remove any test related resources in the project | ||
|
||
echo 'Uploading local logs to GCS bucket.' | ||
gsutil -m rsync -r "${log_dir}/" "${gcs_log_dir}/" | ||
|
||
if [[ -f "${tmp_dir}/tests_success" ]]; then | ||
echo -e "${GREEN}Workflow succeeded, check logs at ${log_dir}/ or ${gcs_log_dir}/${NC}" | ||
exit 0 | ||
else | ||
echo -e "${RED}Workflow failed, check logs at ${log_dir}/ or ${gcs_log_dir}/${NC}" | ||
exit 1 | ||
fi | ||
} | ||
|
||
trap exit_handler EXIT | ||
|
||
# screen session name | ||
session_name="manual-rapids-tests" | ||
|
||
gcloud config set project "${PROJECT_ID}" | ||
gcloud config set dataproc/region "${REGION}" | ||
gcloud auth login | ||
gcloud config set compute/region "${REGION}" | ||
|
||
export INTERNAL_IP_SSH="true" | ||
|
||
# Run tests in screen session so we can monitor the container in another window | ||
screen -US "${session_name}" -c rapids/bazel.screenrc | ||
|
||
|
||
|
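The runner script's `exit_handler` decides the final exit status from a marker file, `tests_success`, which is presumably written by the test run (that part is not shown in this excerpt). The pattern can be tried in isolation. The following is a stripped-down sketch, not the PR's code: `run_with_marker` and `report` are hypothetical names, and the log upload is left out.

```shell
#!/bin/bash
# Hypothetical, stripped-down version of the pattern (not the PR's code):
# the exit status is decided by whether a marker file exists at exit time.
run_with_marker() {
  local work_dir="$1"       # directory that may contain the marker file
  local create_marker="$2"  # "yes" to simulate a successful test run
  (
    report() {
      if [[ -f "${work_dir}/tests_success" ]]; then
        echo "Workflow succeeded"
        exit 0
      else
        echo "Workflow failed"
        exit 1
      fi
    }
    # The trap fires when this subshell exits, whatever happened before.
    trap report EXIT
    if [[ "${create_marker}" == "yes" ]]; then
      touch "${work_dir}/tests_success"
    fi
  )
}

demo_dir="$(mktemp -d)"
run_with_marker "${demo_dir}" no  || echo "first run returned non-zero"
run_with_marker "${demo_dir}" yes && echo "second run returned zero"
rm -rf "${demo_dir}"
```

Because the status comes from the marker rather than from the last command, a run that is interrupted before the marker is written is reported as failed.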
PTAL