spark-bigtable-connector: Scala Version Upgrade (2.12 -> 2.13) #52

Open: wants to merge 5 commits into base: main
Changes from 3 commits
18 changes: 14 additions & 4 deletions examples/scala-sbt/build.sbt
@@ -14,16 +14,26 @@
  * limitations under the License.
  */
 
-name := "spark-bigtable-example"
+/** build settings for scala 2.12 */
+/*
+name := "spark-bigtable-example-scala2.12"
+version := "0.1"
+scalaVersion := "2.12.18"
+val sparkBigtable = "spark-bigtable-scala2.12"
+*/
 
+/** build settings for scala 2.13 */
+name := "spark-bigtable-example-scala2.13"
+version := "0.1"
+scalaVersion := "2.13.14"
+val sparkBigtable = "spark-bigtable-scala2.13"
 
-scalaVersion := "2.12.19"
-val sparkVersion = "3.0.1"
+val sparkVersion = "3.5.1"
 
-libraryDependencies += "com.google.cloud.spark.bigtable" % "spark-bigtable_2.12" % "0.1.0"
+resolvers += Resolver.mavenLocal
+
+libraryDependencies ++= Seq(
+  "com.google.cloud.spark.bigtable" % sparkBigtable % "0.2.1",
+  "org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
+  "org.slf4j" % "slf4j-reload4j" % "1.7.36",
+)
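As a side note on the build file above: instead of keeping one Scala version's settings commented out, sbt's built-in cross-building could produce both artifacts from a single definition. A minimal sketch, assuming the connector artifact names follow this PR's `spark-bigtable-scala2.12`/`spark-bigtable-scala2.13` convention (the derived-name trick is an assumption, not part of the PR):

```scala
// build.sbt — hypothetical cross-built variant of the example project.
name := "spark-bigtable-example"
version := "0.1"

scalaVersion := "2.13.14"
crossScalaVersions := Seq("2.12.18", "2.13.14")

val sparkVersion = "3.5.1"

resolvers += Resolver.mavenLocal

// Derive the connector artifact name from the Scala binary version,
// yielding "spark-bigtable-scala2.12" or "spark-bigtable-scala2.13".
libraryDependencies ++= Seq(
  "com.google.cloud.spark.bigtable" % s"spark-bigtable-scala${scalaBinaryVersion.value}" % "0.2.1",
  "org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
  "org.slf4j" % "slf4j-reload4j" % "1.7.36",
)
```

Running `sbt +package` would then build the example once per version in `crossScalaVersions`.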
@@ -56,8 +56,12 @@ object WordCount extends App {
     )
     .drop("frequency_double")
 
+  // The short name "bigtable" was not working (it threw "Data Source not found");
+  // it works when the fully qualified class name below is provided.
+  val bigtableFormat = "com.google.cloud.spark.bigtable.BigtableDefaultSource"
+
   dfToWrite.write
-    .format("bigtable")
+    .format(bigtableFormat)
     .option("catalog", catalog)
     .option("spark.bigtable.project.id", projectId)
     .option("spark.bigtable.instance.id", instanceId)
@@ -66,7 +70,7 @@
   println("DataFrame was written to Bigtable.")
 
   val readDf = spark.read
-    .format("bigtable")
+    .format(bigtableFormat)
     .option("catalog", catalog)
     .option("spark.bigtable.project.id", projectId)
     .option("spark.bigtable.instance.id", instanceId)

Review comments on the bigtableFormat change:

Reviewer: Does this work without this PR's changes? If so, this would be a breaking change and we should try to understand why it is failing.

Author: Resolved. Now it is working with "bigtable".
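For context on why the short name can break: Spark resolves `.format("bigtable")` through `java.util.ServiceLoader`, looking up `META-INF/services/org.apache.spark.sql.sources.DataSourceRegister` entries on the classpath; the `ServicesResourceTransformer` in this PR's pom.xml is what merges those files into the shaded jar, and if the entry is missing only the fully qualified class name resolves. A minimal stdlib-only sketch of that lookup mechanism (all names here are hypothetical stand-ins, not the connector's real classes):

```scala
import java.net.URLClassLoader
import java.nio.file.Files
import java.util.ServiceLoader
import scala.jdk.CollectionConverters._

// Hypothetical stand-ins for Spark's DataSourceRegister and the connector class.
trait FormatRegister { def shortName(): String }
class DemoBigtableSource extends FormatRegister { override def shortName(): String = "bigtable" }

object ShortNameDemo {
  // Resolve a short format name the way Spark does: scan service files on the
  // classpath and pick the provider whose shortName() matches.
  def resolve(format: String): Option[String] = {
    // Simulate a jar's META-INF/services entry in a temp directory.
    val dir = Files.createTempDirectory("svc-demo")
    val services = Files.createDirectories(dir.resolve("META-INF/services"))
    Files.write(
      services.resolve(classOf[FormatRegister].getName),
      classOf[DemoBigtableSource].getName.getBytes("UTF-8")
    )
    val loader = new URLClassLoader(Array(dir.toUri.toURL), getClass.getClassLoader)
    ServiceLoader.load(classOf[FormatRegister], loader).asScala
      .find(_.shortName() == format)
      .map(_.getClass.getName)
  }
}
```

Without the service file the lookup finds nothing, which matches the "Data Source not found" symptom; a fully qualified class name bypasses this lookup entirely.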
204 changes: 198 additions & 6 deletions pom.xml
@@ -15,7 +15,7 @@
   limitations under the License.
 -->
 <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
-    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
   <modelVersion>4.0.0</modelVersion>
 
   <groupId>com.google.cloud.spark.bigtable</groupId>
@@ -25,7 +25,6 @@
   <name>Spark Bigtable Connector Build Parent</name>
   <description>Parent project for all the Spark Bigtable Connector artifacts</description>
   <url>https://github.com/GoogleCloudDataproc/spark-bigtable-connector</url>
-
   <licenses>
     <license>
       <name>Apache License, Version 2.0</name>
@@ -52,8 +51,9 @@
   </scm>
 
   <modules>
-    <module>spark-bigtable_2.12</module>
-    <module>spark-bigtable_2.12-it</module>
+    <module>spark-bigtable-scala2.12</module>
+    <module>spark-bigtable-scala2.13</module>
+    <module>spark-bigtable-core-it</module>
   </modules>
 
   <distributionManagement>
@@ -77,8 +77,8 @@
     <bigtable.java.version>2.42.0</bigtable.java.version>
     <bigtable.java.emulator.version>0.175.0</bigtable.java.emulator.version>
 
-    <maven.compiler.source>1.8</maven.compiler.source>
-    <maven.compiler.target>1.8</maven.compiler.target>
+    <maven.compiler.source>17</maven.compiler.source>
+    <maven.compiler.target>17</maven.compiler.target>
     <scalatest.version>3.2.16</scalatest.version>
     <surefire.version>3.0.0-M5</surefire.version>
     <junit.version>4.13.2</junit.version>
@@ -94,6 +94,198 @@
     <commons-lang.version>2.6</commons-lang.version>
     <openlineage.version>1.22.0</openlineage.version>
   </properties>
+  <dependencies>

Review comments on the new <dependencies> block:

Reviewer: Should some (all?) of these dependencies be on the -core package? Things like google-cloud-bigtable and such are dependencies for core, but since the parent package doesn't really have any source code, we can leave these out of here and keep this pom focused on package assembly.

Author: core just provides all the source files. Created a new module (spark-bigtable); scala2.12 and scala2.13 are placed inside it, and the common dependencies are now part of this new module.

+    <dependency>
+      <groupId>com.google.cloud</groupId>
+      <artifactId>google-cloud-bigtable</artifactId>
+      <version>${bigtable.java.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>com.google.api.grpc</groupId>
+      <artifactId>grpc-google-cloud-bigtable-admin-v2</artifactId>
+      <version>${bigtable.java.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.slf4j</groupId>
+      <artifactId>slf4j-reload4j</artifactId>
+      <version>${slf4j-reload4j.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>io.openlineage</groupId>
+      <artifactId>spark-extension-interfaces</artifactId>
+      <version>${openlineage.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>io.openlineage</groupId>
+      <artifactId>spark-extension-entrypoint</artifactId>
+      <version>1.0.0</version>
+      <scope>provided</scope>
+    </dependency>
+
+    <!-- Test dependencies -->
+    <dependency>
+      <groupId>com.google.cloud</groupId>
+      <artifactId>google-cloud-bigtable-emulator</artifactId>
+      <version>${bigtable.java.emulator.version}</version>
+      <scope>test</scope>
+    </dependency>
+  </dependencies>
+
+  <build>
+    <plugins>
+      <plugin>
+        <groupId>net.alchim31.maven</groupId>
+        <artifactId>scala-maven-plugin</artifactId>
+        <version>4.7.2</version>
+        <executions>
+          <execution>
+            <goals>
+              <goal>compile</goal>
+              <goal>testCompile</goal>
+            </goals>
+          </execution>
+        </executions>
+      </plugin>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-compiler-plugin</artifactId>
+        <version>3.10.1</version>
+      </plugin>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-shade-plugin</artifactId>
+        <version>3.4.1</version>
+        <executions>
+          <execution>
+            <phase>package</phase>
+            <goals>
+              <goal>shade</goal>
+            </goals>
+            <configuration>
+              <transformers>
+                <transformer
+                    implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
+                <transformer
+                    implementation="org.apache.maven.plugins.shade.resource.ApacheNoticeResourceTransformer">
+                  <addHeader>false</addHeader>
+                </transformer>
+                <transformer
+                    implementation="org.apache.maven.plugins.shade.resource.ApacheLicenseResourceTransformer">
+                </transformer>
+                <transformer
+                    implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
+                  <manifestEntries>
+                    <Main-Class>${app.main.class}</Main-Class>
+                    <X-Compile-Source-JDK>${maven.compile.source}</X-Compile-Source-JDK>
+                    <X-Compile-Target-JDK>${maven.compile.target}</X-Compile-Target-JDK>
+                  </manifestEntries>
+                </transformer>
+              </transformers>
+              <artifactSet>
+                <excludes>
+                  <exclude>org.slf4j:slf4j-reload4j</exclude>
+                  <exclude>org.slf4j:slf4j-api</exclude>
+                  <exclude>ch.qos.reload4j:reload4j</exclude>
+                </excludes>
+              </artifactSet>
+              <relocations>
+                <relocation>
+                  <pattern>io.netty</pattern>
+                  <shadedPattern>com.google.cloud.spark.bigtable.repackaged.io.netty</shadedPattern>
+                </relocation>
+                <relocation>
+                  <pattern>io.grpc</pattern>
+                  <shadedPattern>com.google.cloud.spark.bigtable.repackaged.io.grpc</shadedPattern>
+                </relocation>
+                <relocation>
+                  <pattern>com.google</pattern>
+                  <shadedPattern>com.google.cloud.spark.bigtable.repackaged.com.google</shadedPattern>
+                  <excludes>
+                    <exclude>com.google.cloud.spark.bigtable.**</exclude>
+                  </excludes>
+                </relocation>
+                <relocation>
+                  <pattern>io.openlineage.spark.shade</pattern>
+                  <shadedPattern>com.google.cloud.spark.bigtable.repackaged.io.openlineage.spark.shade</shadedPattern>
+                </relocation>
+              </relocations>
+            </configuration>
+          </execution>
+        </executions>
+        <configuration>
+          <filters>
+            <filter>
+              <artifact>*:*</artifact>
+              <excludes>
+                <exclude>META-INF/*.SF</exclude>
+                <exclude>META-INF/*.DSA</exclude>
+                <exclude>META-INF/*.RSA</exclude>
+              </excludes>
+            </filter>
+          </filters>
+        </configuration>
+      </plugin>
+      <!-- Enable scalatest. -->
+      <plugin>
+        <groupId>org.scalatest</groupId>
+        <artifactId>scalatest-maven-plugin</artifactId>
+        <version>2.2.0</version>
+        <configuration>
+          <reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
+          <junitxml>.</junitxml>
+          <filereports>WDF TestSuite.txt</filereports>
+          <runpath>../third_party/hbase-spark-connector/hbase-connectors/src/test</runpath>
+        </configuration>
+        <executions>
+          <execution>
+            <id>test</id>
+            <goals>
+              <goal>test</goal>
+            </goals>
+          </execution>
+        </executions>
+      </plugin>
+      <plugin>
+        <groupId>org.codehaus.mojo</groupId>
+        <artifactId>license-maven-plugin</artifactId>
+        <version>2.4.0</version>
+        <executions>
+          <execution>
+            <id>default-cli</id>
+            <phase>generate-resources</phase>
+            <goals>
+              <goal>add-third-party</goal>
+            </goals>
+            <configuration>
+              <excludedScopes>test,provided</excludedScopes>
+              <generateBundle>true</generateBundle>
+            </configuration>
+          </execution>
+        </executions>
+      </plugin>
+      <!-- Include the source code of dependencies with reciprocal licenses in the JAR. -->
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-dependency-plugin</artifactId>
+        <version>3.6.0</version>
+        <executions>
+          <execution>
+            <id>src-dependencies</id>
+            <phase>validate</phase>
+            <goals>
+              <goal>unpack-dependencies</goal>
+            </goals>
+            <configuration>
+              <includeArtifactIds>javax.annotation-api</includeArtifactIds>
+              <classifier>sources</classifier>
+              <failOnMissingClassifierArtifact>false</failOnMissingClassifierArtifact>
+              <outputDirectory>${project.build.directory}/sources</outputDirectory>
+            </configuration>
+          </execution>
+        </executions>
+      </plugin>
+    </plugins>
+  </build>
 
   <profiles>
     <profile>
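The `<relocations>` rules above rewrite package prefixes inside the shaded jar so the connector's bundled copies of Netty, gRPC, and Google libraries cannot clash with the versions Spark ships, while the `excludes` entry carves the connector's own `com.google.cloud.spark.bigtable` namespace out of the `com.google` rule. A string-level sketch of that prefix-mapping logic (the real plugin rewrites bytecode; the `**` glob is modeled here as a plain prefix match):

```scala
// Sketch of maven-shade-style relocation matching, mirroring this PR's pom.xml rules.
final case class Relocation(pattern: String, shadedPattern: String, excludes: List[String] = Nil)

object RelocationDemo {
  val rules: List[Relocation] = List(
    Relocation("io.netty", "com.google.cloud.spark.bigtable.repackaged.io.netty"),
    Relocation("io.grpc", "com.google.cloud.spark.bigtable.repackaged.io.grpc"),
    Relocation("com.google", "com.google.cloud.spark.bigtable.repackaged.com.google",
      excludes = List("com.google.cloud.spark.bigtable.")),
    Relocation("io.openlineage.spark.shade",
      "com.google.cloud.spark.bigtable.repackaged.io.openlineage.spark.shade"),
  )

  // Map a fully qualified class name to its shaded name, honoring excludes.
  def relocate(className: String): String =
    rules.find { r =>
      className.startsWith(r.pattern + ".") && !r.excludes.exists(className.startsWith)
    } match {
      case Some(r) => r.shadedPattern + className.stripPrefix(r.pattern)
      case None    => className
    }
}
```

For example, `io.grpc.ManagedChannel` maps into the `repackaged.io.grpc` namespace, while the connector's own `com.google.cloud.spark.bigtable.BigtableDefaultSource` is left untouched by the exclude.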
@@ -22,11 +22,11 @@
     <groupId>com.google.cloud.spark.bigtable</groupId>
     <artifactId>spark-bigtable-connector</artifactId>
     <version>0.2.1</version> <!-- ${NEXT_VERSION_FLAG} -->
-    <relativePath>../</relativePath>
+    <relativePath>../pom.xml</relativePath>
   </parent>
 
   <groupId>com.google.cloud.spark.bigtable</groupId>
-  <artifactId>spark-bigtable_2.12-it</artifactId>
+  <artifactId>spark-bigtable-core-it</artifactId>
   <name>Google Bigtable - Spark Connector Integration Tests</name>
   <version>0.2.1</version> <!-- ${NEXT_VERSION_FLAG} -->