From 8abc5f2fb3cd57d2578c42ffe77c821accf69289 Mon Sep 17 00:00:00 2001 From: Julien Phalip Date: Fri, 15 Sep 2023 14:50:46 -0700 Subject: [PATCH 1/2] Simplify acceptance tests Use default values for some environment variables to make it easier to run the tests --- README-template.md | 43 +++++++++--------- README.md | 41 ++++++++--------- cloudbuild/presubmit.sh | 2 +- .../hive/bigquery/connector/TestUtils.java | 17 +++++-- .../acceptance/AcceptanceTestConstants.java | 6 +-- .../acceptance/AcceptanceTestUtils.java | 25 +++++++++-- .../DataprocAcceptanceTestBase.java | 44 +++++++++++-------- .../integration/IntegrationTestsBase.java | 13 +----- 8 files changed, 108 insertions(+), 83 deletions(-) diff --git a/README-template.md b/README-template.md index d09dba75..7d1fdaa4 100644 --- a/README-template.md +++ b/README-template.md @@ -40,7 +40,11 @@ To build the connector jar: ### Prerequisite -Make sure you have the BigQuery Storage API enabled in your GCP project. Follow [these instructions](https://cloud.google.com/bigquery/docs/reference/storage/#enabling_the_api). +Enable the BigQuery Storage API for your project: + +```sh +gcloud services enable bigquerystorage.googleapis.com +``` ### Option 1: connectors init action @@ -614,19 +618,19 @@ There are multiple options to override the default behavior and to provide custo for specific users, specific groups, or for all users that run the Hive query by default using the below properties: - - `bq.impersonation.service.account.for.user.` (not set by default) + - `bq.impersonation.service.account.for.user.` (not set by default) - The service account to be impersonated for a specific user. You can specify multiple - properties using that pattern for multiple users. + The service account to be impersonated for a specific user. You can specify multiple + properties using that pattern for multiple users. - - `bq.impersonation.service.account.for.group.` (not set by default) + - `bq.impersonation.service.account.for.group.` (not set by default) - The service account to be impersonated for a specific group. You can specify multiple - properties using that pattern for multiple groups. + The service account to be impersonated for a specific group. You can specify multiple + properties using that pattern for multiple groups. - - `bq.impersonation.service.account` (not set by default) + - `bq.impersonation.service.account` (not set by default) - Default service account to be impersonated for all users. + Default service account to be impersonated for all users. If any of the above properties are set then the service account specified will be impersonated by generating a short-lived credentials when accessing BigQuery. @@ -711,7 +715,7 @@ export PROJECT=my-gcp-project export BIGLAKE_LOCATION=us export BIGLAKE_REGION=us-central1 export BIGLAKE_CONNECTION=hive-integration-tests -export BIGLAKE_BUCKET=${USER}-biglake-test +export BIGLAKE_BUCKET=${PROJECT}-biglake-tests ``` Create the test BigLake connection: @@ -773,19 +777,16 @@ You must use Java version 8, as it's the version that Hive itself uses. Make sur Acceptance tests create Dataproc clusters with the connector and run jobs to verify it. -The following environment variables must be set and **exported** first. 
- -* `GOOGLE_APPLICATION_CREDENTIALS` - the full path to a credentials JSON, either a service account or the result of a - `gcloud auth login` run -* `GOOGLE_CLOUD_PROJECT` - The Google cloud platform project used to test the connector -* `TEST_BUCKET` - The GCS bucked used to test writing to BigQuery during the integration tests -* `ACCEPTANCE_TEST_BUCKET` - The GCS bucked used to test writing to BigQuery during the acceptance tests - To run the acceptance tests: -```sh -./mvnw verify -Pdataproc21,acceptance -``` +1. Enable the Dataproc API for your project: + ```sh + gcloud services enable dataproc.googleapis.com + ``` +2. Run the tests: + ```sh + ./mvnw verify -Pdataproc21,acceptance + ``` If you want to avoid rebuilding `shaded-dependencies` and `shaded-test-dependencies` when there is no changes in these modules, you can break it down into several steps, and only rerun the necessary steps: diff --git a/README.md b/README.md index c6f56c49..7d1fdaa4 100644 --- a/README.md +++ b/README.md @@ -40,7 +40,11 @@ To build the connector jar: ### Prerequisite -Make sure you have the BigQuery Storage API enabled in your GCP project. Follow [these instructions](https://cloud.google.com/bigquery/docs/reference/storage/#enabling_the_api). +Enable the BigQuery Storage API for your project: + +```sh +gcloud services enable bigquerystorage.googleapis.com +``` ### Option 1: connectors init action @@ -614,19 +618,19 @@ There are multiple options to override the default behavior and to provide custo for specific users, specific groups, or for all users that run the Hive query by default using the below properties: - - `bq.impersonation.service.account.for.user.` (not set by default) + - `bq.impersonation.service.account.for.user.` (not set by default) - The service account to be impersonated for a specific user. You can specify multiple - properties using that pattern for multiple users. + The service account to be impersonated for a specific user. You can specify multiple + properties using that pattern for multiple users. - - `bq.impersonation.service.account.for.group.` (not set by default) + - `bq.impersonation.service.account.for.group.` (not set by default) - The service account to be impersonated for a specific group. You can specify multiple - properties using that pattern for multiple groups. + The service account to be impersonated for a specific group. You can specify multiple + properties using that pattern for multiple groups. - - `bq.impersonation.service.account` (not set by default) + - `bq.impersonation.service.account` (not set by default) - Default service account to be impersonated for all users. + Default service account to be impersonated for all users. If any of the above properties are set then the service account specified will be impersonated by generating a short-lived credentials when accessing BigQuery. @@ -773,19 +777,16 @@ You must use Java version 8, as it's the version that Hive itself uses. Make sur Acceptance tests create Dataproc clusters with the connector and run jobs to verify it. -The following environment variables must be set and **exported** first. 
- -* `GOOGLE_APPLICATION_CREDENTIALS` - the full path to a credentials JSON, either a service account or the result of a - `gcloud auth login` run -* `GOOGLE_CLOUD_PROJECT` - The Google cloud platform project used to test the connector -* `TEST_BUCKET` - The GCS bucked used to test writing to BigQuery during the integration tests -* `ACCEPTANCE_TEST_BUCKET` - The GCS bucked used to test writing to BigQuery during the acceptance tests - To run the acceptance tests: -```sh -./mvnw verify -Pdataproc21,acceptance -``` +1. Enable the Dataproc API for your project: + ```sh + gcloud services enable dataproc.googleapis.com + ``` +2. Run the tests: + ```sh + ./mvnw verify -Pdataproc21,acceptance + ``` If you want to avoid rebuilding `shaded-dependencies` and `shaded-test-dependencies` when there is no changes in these modules, you can break it down into several steps, and only rerun the necessary steps: diff --git a/cloudbuild/presubmit.sh b/cloudbuild/presubmit.sh index 4090644a..7c54c972 100644 --- a/cloudbuild/presubmit.sh +++ b/cloudbuild/presubmit.sh @@ -27,7 +27,7 @@ readonly ACTION=$1 readonly PROFILES="dataproc21" readonly MVN="./mvnw -B -e -Dmaven.repo.local=/workspace/.repository" -export TEST_BUCKET=dataproc-integ-tests +export INTEGRATION_BUCKET=dataproc-integ-tests export BIGLAKE_BUCKET=dataproc-integ-tests export BIGLAKE_CONNECTION=hive-integration-tests diff --git a/connector/src/test/java/com/google/cloud/hive/bigquery/connector/TestUtils.java b/connector/src/test/java/com/google/cloud/hive/bigquery/connector/TestUtils.java index 9509b57b..f3b836a8 100644 --- a/connector/src/test/java/com/google/cloud/hive/bigquery/connector/TestUtils.java +++ b/connector/src/test/java/com/google/cloud/hive/bigquery/connector/TestUtils.java @@ -46,7 +46,7 @@ public class TestUtils { public static final String MANAGED_TEST_TABLE_NAME = "managed_test"; public static final String FIELD_TIME_PARTITIONED_TABLE_NAME = "field_time_partitioned"; public static final String INGESTION_TIME_PARTITIONED_TABLE_NAME = "ingestion_time_partitioned"; - public static final String TEST_BUCKET_ENV_VAR = "TEST_BUCKET"; + public static final String INTEGRATION_BUCKET_ENV_VAR = "INTEGRATION_BUCKET"; // The BigLake bucket and connection must be created before running the tests. // Also, the connection's service account must be given permission to access the bucket. @@ -211,8 +211,9 @@ public static String getBigLakeBucket() { * Returns the name of the bucket used to store temporary Avro files when testing the indirect * write method. This bucket is created automatically when running the tests. */ - public static String getTestBucket() { - return System.getenv().getOrDefault(TEST_BUCKET_ENV_VAR, getProject() + "-integration-tests"); + public static String getIntegrationTestBucket() { + return System.getenv() + .getOrDefault(INTEGRATION_BUCKET_ENV_VAR, getProject() + "-integration-tests"); } public static void createBqDataset(String dataset) { @@ -269,7 +270,15 @@ private static Storage getStorageClient() { } public static void createBucket(String bucketName) { - getStorageClient().create(BucketInfo.newBuilder(bucketName).setLocation(LOCATION).build()); + try { + getStorageClient().create(BucketInfo.newBuilder(bucketName).setLocation(LOCATION).build()); + } catch (StorageException e) { + if (e.getCode() == 409) { + // The bucket already exists, which is okay. 
+ return; + } + throw e; + } } public static void uploadBlob(String bucketName, String objectName, byte[] contents) { diff --git a/connector/src/test/java/com/google/cloud/hive/bigquery/connector/acceptance/AcceptanceTestConstants.java b/connector/src/test/java/com/google/cloud/hive/bigquery/connector/acceptance/AcceptanceTestConstants.java index 6b235487..8f602f49 100644 --- a/connector/src/test/java/com/google/cloud/hive/bigquery/connector/acceptance/AcceptanceTestConstants.java +++ b/connector/src/test/java/com/google/cloud/hive/bigquery/connector/acceptance/AcceptanceTestConstants.java @@ -15,17 +15,13 @@ */ package com.google.cloud.hive.bigquery.connector.acceptance; -import com.google.common.base.Preconditions; import org.apache.parquet.Strings; public class AcceptanceTestConstants { public static final String REGION = "us-west1"; public static final String DATAPROC_ENDPOINT = REGION + "-dataproc.googleapis.com:443"; - public static final String PROJECT_ID = - Preconditions.checkNotNull( - System.getenv("GOOGLE_CLOUD_PROJECT"), - "Please set the 'GOOGLE_CLOUD_PROJECT' environment variable"); + public static final String ACCEPTANCE_BUCKET_ENV_VAR = "ACCEPTANCE_BUCKET"; public static final boolean CLEAN_UP_CLUSTER = Strings.isNullOrEmpty(System.getenv("CLEAN_UP_CLUSTER")) diff --git a/connector/src/test/java/com/google/cloud/hive/bigquery/connector/acceptance/AcceptanceTestUtils.java b/connector/src/test/java/com/google/cloud/hive/bigquery/connector/acceptance/AcceptanceTestUtils.java index 22b579c7..184a830c 100644 --- a/connector/src/test/java/com/google/cloud/hive/bigquery/connector/acceptance/AcceptanceTestUtils.java +++ b/connector/src/test/java/com/google/cloud/hive/bigquery/connector/acceptance/AcceptanceTestUtils.java @@ -15,6 +15,7 @@ */ package com.google.cloud.hive.bigquery.connector.acceptance; +import com.google.auth.oauth2.GoogleCredentials; import com.google.cloud.WriteChannel; import com.google.cloud.bigquery.BigQuery; import com.google.cloud.bigquery.BigQuery.DatasetDeleteOption; @@ -75,7 +76,6 @@ public String getMarker() { } // must be set in order to run the acceptance test - static final String BUCKET = System.getenv("ACCEPTANCE_TEST_BUCKET"); private static final BigQuery bq = BigQueryOptions.getDefaultInstance().getService(); static Storage storage = @@ -163,8 +163,27 @@ public static BlobId uploadToGcs(ByteBuffer content, String destinationUri, Stri return blobId; } - public static String createTestBaseGcsDir(String testId) { - return String.format("gs://%s/hivebq-tests/%s", BUCKET, testId); + public static String getAcceptanceProject() { + String project = System.getenv("GOOGLE_CLOUD_PROJECT"); + if (project != null) { + return project; + } + try { + return GoogleCredentials.getApplicationDefault().getQuotaProjectId(); + } catch (IOException e) { + throw new RuntimeException(e); + } + } + + public static String getAcceptanceTestBucket() { + return System.getenv() + .getOrDefault( + AcceptanceTestConstants.ACCEPTANCE_BUCKET_ENV_VAR, + AcceptanceTestUtils.getAcceptanceProject() + "-acceptance-tests"); + } + + public static String getTestBaseGcsDir(String testId) { + return String.format("gs://%s/hivebq-tests/%s", getAcceptanceTestBucket(), testId); } public static Blob getBlob(String gcsDirUri, String fileSuffix) throws URISyntaxException { diff --git a/connector/src/test/java/com/google/cloud/hive/bigquery/connector/acceptance/DataprocAcceptanceTestBase.java 
b/connector/src/test/java/com/google/cloud/hive/bigquery/connector/acceptance/DataprocAcceptanceTestBase.java index 266d6831..28a4d7c6 100644 --- a/connector/src/test/java/com/google/cloud/hive/bigquery/connector/acceptance/DataprocAcceptanceTestBase.java +++ b/connector/src/test/java/com/google/cloud/hive/bigquery/connector/acceptance/DataprocAcceptanceTestBase.java @@ -23,7 +23,6 @@ import static com.google.cloud.hive.bigquery.connector.acceptance.AcceptanceTestConstants.CONNECTOR_JAR_DIRECTORY; import static com.google.cloud.hive.bigquery.connector.acceptance.AcceptanceTestConstants.CONNECTOR_JAR_PREFIX; import static com.google.cloud.hive.bigquery.connector.acceptance.AcceptanceTestConstants.DATAPROC_ENDPOINT; -import static com.google.cloud.hive.bigquery.connector.acceptance.AcceptanceTestConstants.PROJECT_ID; import static com.google.cloud.hive.bigquery.connector.acceptance.AcceptanceTestConstants.REGION; import static com.google.cloud.hive.bigquery.connector.acceptance.AcceptanceTestUtils.createBqDataset; import static com.google.cloud.hive.bigquery.connector.acceptance.AcceptanceTestUtils.deleteBqDatasetAndTables; @@ -34,8 +33,8 @@ import static com.google.cloud.hive.bigquery.connector.acceptance.AcceptanceTestUtils.uploadConnectorJar; import static com.google.common.truth.Truth.assertThat; +import com.google.cloud.hive.bigquery.connector.TestUtils; import com.google.cloud.hive.bigquery.connector.acceptance.AcceptanceTestUtils.ClusterProperty; -import com.google.common.collect.ImmutableList; import com.google.common.collect.ImmutableMap; import java.time.Duration; import java.util.Collections; @@ -65,8 +64,6 @@ public class DataprocAcceptanceTestBase { protected static final ClusterProperty DISABLE_CONSCRYPT = ClusterProperty.of("dataproc:dataproc.conscrypt.provider.enable", "false", "nc"); - protected static final ImmutableList DISABLE_CONSCRYPT_LIST = - ImmutableList.builder().add(DISABLE_CONSCRYPT).build(); private AcceptanceTestContext context; @@ -82,16 +79,19 @@ protected static AcceptanceTestContext setup( String dataprocImageVersion, List clusterProperties) throws Exception { String testId = generateTestId(dataprocImageVersion, clusterProperties); String clusterName = generateClusterName(testId); - String testBaseGcsDir = AcceptanceTestUtils.createTestBaseGcsDir(testId); + String testBaseGcsDir = AcceptanceTestUtils.getTestBaseGcsDir(testId); String connectorJarUri = testBaseGcsDir + "/connector.jar"; String connectorInitActionUri = testBaseGcsDir + "/connectors.sh"; Map properties = clusterProperties.stream() .collect(Collectors.toMap(ClusterProperty::getKey, ClusterProperty::getValue)); - String bqProject = PROJECT_ID; + String bqProject = AcceptanceTestUtils.getAcceptanceProject(); + ; String bqDataset = "hivebq_test_dataset_" + testId.replace("-", "_"); String bqTable = "hivebq_test_table_" + testId.replace("-", "_"); + TestUtils.createBucket(AcceptanceTestUtils.getAcceptanceTestBucket()); + uploadConnectorJar(CONNECTOR_JAR_DIRECTORY, CONNECTOR_JAR_PREFIX, connectorJarUri); uploadConnectorInitAction(CONNECTOR_INIT_ACTION_PATH, connectorInitActionUri); @@ -99,12 +99,7 @@ protected static AcceptanceTestContext setup( createBqDataset(bqProject, bqDataset); createClusterIfNeeded( - clusterName, - dataprocImageVersion, - testId, - properties, - connectorJarUri, - connectorInitActionUri); + clusterName, dataprocImageVersion, properties, connectorJarUri, connectorInitActionUri); AcceptanceTestContext testContext = new AcceptanceTestContext( @@ -155,7 +150,6 @@ private 
interface ThrowingConsumer { protected static void createClusterIfNeeded( String clusterName, String dataprocImageVersion, - String testId, Map properties, String connectorJarUri, String connectorInitActionUri) @@ -165,12 +159,20 @@ protected static void createClusterIfNeeded( clusterName, dataprocImageVersion, properties, connectorJarUri, connectorInitActionUri); System.out.println("Cluster spec:\n" + clusterSpec); System.out.println("Creating cluster " + clusterName + " ..."); - cluster(client -> client.createClusterAsync(PROJECT_ID, REGION, clusterSpec).get()); + cluster( + client -> + client + .createClusterAsync(AcceptanceTestUtils.getAcceptanceProject(), REGION, clusterSpec) + .get()); } protected static void deleteCluster(String clusterName) throws Exception { System.out.println("Deleting cluster " + clusterName + " ..."); - cluster(client -> client.deleteClusterAsync(PROJECT_ID, REGION, clusterName).get()); + cluster( + client -> + client + .deleteClusterAsync(AcceptanceTestUtils.getAcceptanceProject(), REGION, clusterName) + .get()); } private static void cluster(ThrowingConsumer command) throws Exception { @@ -189,7 +191,7 @@ private static Cluster createClusterSpec( String connectorInitActionUri) { return Cluster.newBuilder() .setClusterName(clusterName) - .setProjectId(PROJECT_ID) + .setProjectId(AcceptanceTestUtils.getAcceptanceProject()) .setConfig( ClusterConfig.newBuilder() .addInitializationActions( @@ -257,12 +259,18 @@ private Job runAndWait(String testName, Job job, Duration timeout) throws Except try (JobControllerClient jobControllerClient = JobControllerClient.create( JobControllerSettings.newBuilder().setEndpoint(DATAPROC_ENDPOINT).build())) { - Job request = jobControllerClient.submitJob(PROJECT_ID, REGION, job); + Job request = + jobControllerClient.submitJob(AcceptanceTestUtils.getAcceptanceProject(), REGION, job); String jobId = request.getReference().getJobId(); System.err.println(String.format("%s job ID: %s", testName, jobId)); CompletableFuture finishedJobFuture = CompletableFuture.supplyAsync( - () -> waitForJobCompletion(jobControllerClient, PROJECT_ID, REGION, jobId)); + () -> + waitForJobCompletion( + jobControllerClient, + AcceptanceTestUtils.getAcceptanceProject(), + REGION, + jobId)); Job jobInfo = finishedJobFuture.get(timeout.getSeconds(), TimeUnit.SECONDS); return jobInfo; } diff --git a/connector/src/test/java/com/google/cloud/hive/bigquery/connector/integration/IntegrationTestsBase.java b/connector/src/test/java/com/google/cloud/hive/bigquery/connector/integration/IntegrationTestsBase.java index 4ecc10e8..b83fa67c 100644 --- a/connector/src/test/java/com/google/cloud/hive/bigquery/connector/integration/IntegrationTestsBase.java +++ b/connector/src/test/java/com/google/cloud/hive/bigquery/connector/integration/IntegrationTestsBase.java @@ -20,7 +20,6 @@ import com.google.cloud.bigquery.*; import com.google.cloud.hive.bigquery.connector.config.HiveBigQueryConfig; -import com.google.cloud.storage.StorageException; import com.klarna.hiverunner.HiveRunnerExtension; import com.klarna.hiverunner.HiveShell; import com.klarna.hiverunner.annotations.HiveSQL; @@ -68,16 +67,8 @@ public class IntegrationTestsBase { @BeforeAll public static void setUpAll() { - testBucketName = getTestBucket(); - - // Create the temp bucket for indirect writes if it does not exist. - try { - createBucket(testBucketName); - } catch (StorageException e) { - if (e.getCode() == 409) { - // The bucket already exists, which is okay. 
- } - } + testBucketName = getIntegrationTestBucket(); + createBucket(testBucketName); // Upload datasets to the BigLake bucket. uploadBlob( From 6d6a9f9f5e81047a7418ac0275b824d7b5e24b90 Mon Sep 17 00:00:00 2001 From: Julien Phalip Date: Fri, 15 Sep 2023 15:04:20 -0700 Subject: [PATCH 2/2] Small doc fix --- README-template.md | 6 +++--- README.md | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/README-template.md b/README-template.md index 7d1fdaa4..c8860510 100644 --- a/README-template.md +++ b/README-template.md @@ -788,15 +788,15 @@ To run the acceptance tests: ./mvnw verify -Pdataproc21,acceptance ``` -If you want to avoid rebuilding `shaded-dependencies` and `shaded-test-dependencies` when there is no changes in these +If you want to avoid rebuilding `shaded-dependencies` and `shaded-acceptance-tests-dependencies` when there is no changes in these modules, you can break it down into several steps, and only rerun the necessary steps: ```sh # Install hive-bigquery-parent/pom.xml to Maven local repo mvn install:install-file -Dpackaging=pom -Dfile=hive-bigquery-parent/pom.xml -DpomFile=hive-bigquery-parent/pom.xml -# Build and install shaded-dependencies and shaded-test-dependencies jars to Maven local repo -mvn clean install -pl shaded-dependencies,shaded-test-dependencies -Pdataproc21 -DskipTests +# Build and install shaded-deps-dataproc21 and shaded-acceptance-tests-dependencies jars to Maven local repo +mvn clean install -pl shaded-deps-dataproc21,shaded-acceptance-tests-dependencies -Pdataproc21 -DskipTests # Build and test connector mvn clean verify -pl connector -Pdataproc21,acceptance diff --git a/README.md b/README.md index 7d1fdaa4..c8860510 100644 --- a/README.md +++ b/README.md @@ -788,15 +788,15 @@ To run the acceptance tests: ./mvnw verify -Pdataproc21,acceptance ``` -If you want to avoid rebuilding `shaded-dependencies` and `shaded-test-dependencies` when there is no changes in these +If you want to avoid rebuilding `shaded-dependencies` and `shaded-acceptance-tests-dependencies` when there is no changes in these modules, you can break it down into several steps, and only rerun the necessary steps: ```sh # Install hive-bigquery-parent/pom.xml to Maven local repo mvn install:install-file -Dpackaging=pom -Dfile=hive-bigquery-parent/pom.xml -DpomFile=hive-bigquery-parent/pom.xml -# Build and install shaded-dependencies and shaded-test-dependencies jars to Maven local repo -mvn clean install -pl shaded-dependencies,shaded-test-dependencies -Pdataproc21 -DskipTests +# Build and install shaded-deps-dataproc21 and shaded-acceptance-tests-dependencies jars to Maven local repo +mvn clean install -pl shaded-deps-dataproc21,shaded-acceptance-tests-dependencies -Pdataproc21 -DskipTests # Build and test connector mvn clean verify -pl connector -Pdataproc21,acceptance
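
For reference, the defaults introduced by this patch can still be overridden explicitly. A minimal sketch, assuming placeholder values: `GOOGLE_CLOUD_PROJECT`, `INTEGRATION_BUCKET`, and `ACCEPTANCE_BUCKET` are the variables the patched tests read, while the project and bucket names below are only examples.

```sh
# Optional overrides. If unset, the tests derive the bucket names from the project
# (<project>-integration-tests and <project>-acceptance-tests), and the acceptance
# tests resolve the project from GOOGLE_CLOUD_PROJECT or the ADC quota project.
export GOOGLE_CLOUD_PROJECT=my-gcp-project
export INTEGRATION_BUCKET=my-integration-bucket
export ACCEPTANCE_BUCKET=my-acceptance-bucket
./mvnw verify -Pdataproc21,acceptance
```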