Commit 2672d09

Merge branch 'master' of https://github.com/pravega/flink-connectors into docusaurus

Claudio Fahey committed Mar 24, 2021
2 parents 739e9b2 + 2cbdbac commit 2672d09
Showing 15 changed files with 298 additions and 121 deletions.
62 changes: 62 additions & 0 deletions .github/workflows/build-artifacts.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
name: build-artifacts

on: [push, pull_request, workflow_dispatch]
# `workflow_dispatch` makes manually triggered CI/CD runs possible.
# A workflow file (like this one) with `workflow_dispatch` under `on` must exist on the **master** or default branch,
# or there will be no UI for a manual trigger. https://github.community/t/workflow-dispatch-event-not-working/128856/2

env:
GRADLE_OPTS: "-Xms128m -Xmx1024m"
ORG_GRADLE_PROJECT_logOutput: true

jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2

- name: Set up JDK 11
uses: actions/setup-java@v1
with:
java-version: '11' # major or semver Java version will be acceptable, see https://github.com/marketplace/actions/setup-java-jdk#basic
java-package: jdk # (jre, jdk, or jdk+fx) - defaults to jdk
architecture: x64 # (x64 or x86) - defaults to x64

- name: Cache gradle modules
uses: actions/cache@v2
with:
# gradle packages need to be cached
path: |
.gradle
$HOME/.gradle
$HOME/.m2
# key to identify the specific cache
# so if we upgrade some modules, the key(hash) of `gradle.properties` will change
# and we will rerun the download to get the newest packages
key: ${{ runner.os }}-gradle-${{ hashFiles('**/*.gradle*', '**/gradle-wrapper.properties') }}

- name: Grant execute permission for gradlew
run: chmod +x gradlew

- name: Build via Gradle
run: ./gradlew clean build

- name: Build via Gradle with Scala 2.11
run: |
./gradlew clean build -PflinkScalaVersion=2.11
bash <(curl -s https://codecov.io/bash) -t 9c42ff48-d98f-4444-af05-cf734aa1dbd0
snapshot:
needs: [build]
runs-on: ubuntu-latest
# only publish the snapshot when it is a push on the master or the release branch (starts with r0.x or r1.x)
if: ${{ github.event_name == 'push' && (github.ref == 'refs/heads/master' || startsWith(github.ref, 'refs/heads/r0.') || startsWith(github.ref, 'refs/heads/r1.')) }}
env:
BINTRAY_USER: ${{ secrets.BINTRAY_USER }}
BINTRAY_KEY: ${{ secrets.BINTRAY_KEY }}
steps:
# this job needs its own checkout; steps do not carry over from the build job
- uses: actions/checkout@v2

- name: Publish to repo
run: ./gradlew publishToRepo -PpublishUrl=jcenterSnapshot -PpublishUsername=$BINTRAY_USER -PpublishPassword=$BINTRAY_KEY

- name: Publish to repo with Scala 2.11
run: ./gradlew publishToRepo -PpublishUrl=jcenterSnapshot -PpublishUsername=$BINTRAY_USER -PpublishPassword=$BINTRAY_KEY -PflinkScalaVersion=2.11
4 changes: 0 additions & 4 deletions .gitmodules

This file was deleted.

61 changes: 0 additions & 61 deletions .travis.yml

This file was deleted.

14 changes: 13 additions & 1 deletion README.md
@@ -24,6 +24,19 @@ The connectors can be used to build end-to-end stream processing pipelines (see

- Table API support to access Pravega Streams for both **Batch** and **Streaming** use cases.

## Compatibility Matrix
The [master](https://github.com/pravega/flink-connectors) branch will always have the most recent
supported versions of Flink and Pravega.

| Git Branch | Pravega Version | Java Version To Build Connector | Java Version To Run Connector | Flink Version | Status | Artifact Link |
|-------------------------------------------------------------------------------------|------|---------|--------------|------|-------------------|----------------------------------------------------------------------------------------|
| [master](https://github.com/pravega/flink-connectors) | 0.10 | Java 11 | Java 8 or 11 | 1.12 | Under Development | http://oss.jfrog.org/jfrog-dependencies/io/pravega/pravega-connectors-flink-1.12_2.12/ |
| [r0.10-flink1.11](https://github.com/pravega/flink-connectors/tree/r0.10-flink1.11) | 0.10 | Java 11 | Java 8 or 11 | 1.11 | Under Development | http://oss.jfrog.org/jfrog-dependencies/io/pravega/pravega-connectors-flink-1.11_2.12/ |
| [r0.10-flink1.10](https://github.com/pravega/flink-connectors/tree/r0.10-flink1.10) | 0.10 | Java 11 | Java 8 or 11 | 1.10 | Under Development | http://oss.jfrog.org/jfrog-dependencies/io/pravega/pravega-connectors-flink-1.10_2.12/ |
| [r0.9](https://github.com/pravega/flink-connectors/tree/r0.9) | 0.9 | Java 11 | Java 8 or 11 | 1.11 | Released | https://repo1.maven.org/maven2/io/pravega/pravega-connectors-flink-1.11_2.12/0.9.0/ |
| [r0.9-flink1.10](https://github.com/pravega/flink-connectors/tree/r0.9-flink1.10) | 0.9 | Java 11 | Java 8 or 11 | 1.10 | Released | https://repo1.maven.org/maven2/io/pravega/pravega-connectors-flink-1.10_2.12/0.9.0/ |
| [r0.9-flink1.9](https://github.com/pravega/flink-connectors/tree/r0.9-flink1.9) | 0.9 | Java 11 | Java 8 or 11 | 1.9 | Released | https://repo1.maven.org/maven2/io/pravega/pravega-connectors-flink-1.9_2.12/0.9.0/ |

## Documentation
To learn more about how to build and use the Flink Connector library, follow the connector documentation [here](http://pravega.io/).

@@ -33,4 +33,3 @@ More examples on how to use the connectors with Flink application can be found i

Flink connectors for Pravega is 100% open source and community-driven. All components are available
under [Apache 2 License](https://www.apache.org/licenses/LICENSE-2.0.html) on GitHub.

182 changes: 182 additions & 0 deletions documentation/src/docs/dev-guide.md
@@ -0,0 +1,182 @@
<!--
Copyright (c) Dell Inc., or its subsidiaries. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
-->
# Flink Connector - Dev Guide
Learn how to build your own applications that use the Flink connector for Pravega.


# Prerequisites
To complete this guide, you need:

* JDK 8 or 11 installed, with JAVA_HOME configured appropriately
* Pravega running (see [here](https://pravega.io/docs/latest/getting-started/) to get started with Pravega)
* Gradle or Maven installed



# Goal
In this guide, we will create a straightforward example application that writes data collected from an external network stream into a Pravega Stream and then reads the data back from the Pravega Stream.
We recommend that you follow the instructions from [Bootstrapping the Project](#Bootstrapping-the-Project) onwards to create the application step by step.
However, you can go straight to the completed example at [flink-connector-examples](https://github.com/pravega/pravega-samples/tree/master/flink-connector-examples).




# Starting Flink
Download a Flink release and un-tar it. We use Flink 1.11.2 here.
```
$ tar -xzf flink-1.11.2-bin-scala_2.11.tgz
$ cd flink-1.11.2-bin-scala_2.11
```
Start a cluster
```
$ ./bin/start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host.
Starting taskexecutor daemon on host.
```
When you are finished, you can quickly stop the cluster and all running components:
```
$ ./bin/stop-cluster.sh
```

# Bootstrapping the Project

Use Gradle or Maven to bootstrap a sample application against Pravega. Let's create a word count application as an example.
### Gradle
You can follow the instructions [here](https://ci.apache.org/projects/flink/flink-docs-stable/dev/project-configuration.html#gradle) to create a Gradle project.

Add the snippet below to the `dependencies` section of `build.gradle` in the app directory; the connector dependencies should be part of the shadow JAR. For the Flink connector dependency, choose the connector that matches your Flink major version (and your Scala version, if you use Scala), along with the same Pravega version you run.
```
compile group: 'org.apache.flink', name: 'flink-streaming-java_2.12', version: '1.11.2'
flinkShadowJar group: 'io.pravega', name: 'pravega-connectors-flink-1.11_2.12', version: '0.9.0'
```
Define the custom configuration `flinkShadowJar`:
```
// Explicitly define the libraries we want to be included in the "flinkShadowJar" configuration!
configurations {
flinkShadowJar // dependencies which go into the shadowJar
// always exclude these (also from transitive dependencies) since they are provided by Flink
flinkShadowJar.exclude group: 'org.apache.flink', module: 'force-shading'
flinkShadowJar.exclude group: 'com.google.code.findbugs', module: 'jsr305'
flinkShadowJar.exclude group: 'org.slf4j'
flinkShadowJar.exclude group: 'org.apache.logging.log4j'
}
```

Invoke `gradle clean shadowJar` to build/package the project. You will find a JAR file that contains your application, plus connectors and libraries that you may have added as dependencies to the application: `build/libs/<project-name>-<version>-all.jar`.
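The `shadowJar` task comes from the Gradle Shadow plugin, which the snippets above assume is already applied. A minimal sketch of the plugin block (the version shown is an assumption; pick the one matching your Gradle release, as described in the Flink project-configuration docs):
```
plugins {
    id 'java'
    // Provides the shadowJar task used above; the version here is an assumption
    id 'com.github.johnrengelman.shadow' version '6.1.0'
}
```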


### Maven

You can check [maven-quickstart](https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/project-configuration.html#maven-quickstart) to find how to start with Maven.

Add the dependencies below to the Maven POM; these dependencies should be part of the shadow JAR:
```
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_2.12</artifactId>
<version>1.11.2</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>io.pravega</groupId>
<artifactId>pravega-connectors-flink-1.11_2.12</artifactId>
<version>0.9.0</version>
</dependency>
```

Invoke `mvn clean package` to build/package your project. You will find a JAR file that contains your application, plus connectors and libraries that you may have added as dependencies to the application: `target/<artifact-id>-<version>.jar`.




## Create an application that writes to Pravega

Let's first create a Pravega configuration that reads from the command-line arguments:
```java
ParameterTool params = ParameterTool.fromArgs(args);
PravegaConfig pravegaConfig = PravegaConfig
.fromParams(params)
.withDefaultScope("my_scope");
```
Then we need to initialize the Flink execution environment
```java
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
```
Create a data stream that reads input data from a socket connection:
```java
DataStream<String> dataStream = env.socketTextStream(host, port);
```
A Pravega Stream may be used as a data sink within a Flink program using an instance of `io.pravega.connectors.flink.FlinkPravegaWriter`. We add an instance of the writer to the dataflow program:
```java
FlinkPravegaWriter<String> writer = FlinkPravegaWriter.<String>builder()
.withPravegaConfig(pravegaConfig)
.forStream(stream)
.withSerializationSchema(new SimpleStringSchema())
.build();
dataStream.addSink(writer).name("Pravega Sink");
```
Then we execute the job within the Flink environment
```java
env.execute("PravegaWriter");
```
Executing the above lines creates and runs the PravegaWriter job.

## Create an application that reads from Pravega
Creating a Pravega reader is similar to creating a writer.
First, create a Pravega configuration that reads from the command-line arguments:
```java
ParameterTool params = ParameterTool.fromArgs(args);
PravegaConfig pravegaConfig = PravegaConfig
.fromParams(params)
.withDefaultScope("my_scope");
```
Initialize the Flink execution environment
```java
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
```
A Pravega Stream may be used as a data source within a Flink streaming program using an instance of `io.pravega.connectors.flink.FlinkPravegaReader`. The reader reads a given Pravega Stream (or multiple streams) as a DataStream:
```java
FlinkPravegaReader<String> source = FlinkPravegaReader.<String>builder()
.withPravegaConfig(pravegaConfig)
.forStream(stream)
.withDeserializationSchema(new SimpleStringSchema())
.build();
```
Then create a data stream that counts each word over a 10-second time window:
```java
DataStream<WordCount> dataStream = env.addSource(source).name("Pravega Stream")
.flatMap(new Tokenizer()) // The Tokenizer() splits the line into words and emits a stream of WordCount(word, 1) records
.keyBy("word")
.timeWindow(Time.seconds(10))
.sum("count");
```
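The `Tokenizer` itself is not shown in this guide. As a sketch, the core splitting logic it might apply could look like the plain-Java helper below (the class and method names are illustrative assumptions; in the real job this logic would live inside a Flink `FlatMapFunction` that emits one `WordCount(word, 1)` per token):
```java
import java.util.Arrays;

public class Tokenizer {
    // Splits a line into lowercase words, dropping punctuation and empty tokens.
    public static String[] tokenize(String line) {
        return Arrays.stream(line.toLowerCase().split("\\W+"))
                .filter(token -> !token.isEmpty())
                .toArray(String[]::new);
    }
}
```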
Create an output sink that prints to stdout for verification:
```java
dataStream.print();
```
Then we execute the job within the Flink environment
```java
env.execute("PravegaReader");
```

## Run in the Flink environment
First build your application. From Flink's perspective, the connector to Pravega is part of the streaming application (not part of Flink's core runtime), so the connector code must be part of the application's code artifact (JAR file). Typically, a Flink application is bundled as a `fat-jar` (also known as an `uber-jar`), such that all its dependencies are embedded.

Make sure Pravega and Flink are running. Use the packaged JAR, and run:
```
flink run -c <classname> ${your-app}.jar --controller <pravega-controller-uri>
```

# What’s next?
This guide covered the creation of an application that uses the Flink connector to read from and write to a Pravega stream. However, there is much more. We recommend continuing the journey by going through the [Flink connector documentation](https://pravega.io/docs/latest/connectors/flink-connector/) and checking other examples at [flink-connector-examples](https://github.com/pravega/pravega-samples/tree/master/flink-connector-examples).
4 changes: 2 additions & 2 deletions documentation/src/docs/getting-started.md
@@ -36,12 +36,12 @@ The connectors can be used to build end-to-end stream processing pipelines (see

Building the connectors from the source is only necessary when we want to use or contribute to the latest (*unreleased*) version of the Pravega Flink connectors.

-The connector project is linked to a specific version of Pravega, based on a [git submodule](https://git-scm.com/book/en/v2/Git-Tools-Submodules) pointing to a commit-id. By default the sub-module option is disabled and the build step will make use of the Pravega version defined in the `gradle.properties` file. You could override this option by enabling `usePravegaVersionSubModule` flag in `gradle.properties` to `true`.
+The connector project is linked to a specific version of Pravega and the version is defined in the `gradle.properties` file (`pravegaVersion`).

Check out the source code repository as follows:

```
-git clone --recursive https://github.com/pravega/flink-connectors.git
+git clone https://github.com/pravega/flink-connectors.git
```

After cloning the repository, the project can be built by running the below command in the project root directory `flink-connectors`.
5 changes: 1 addition & 4 deletions gradle.properties
@@ -10,7 +10,7 @@

# 3rd party Versions.
checkstyleToolVersion=7.1
-flinkVersion=1.11.2
+flinkVersion=1.12.2
flinkScalaVersion=2.12
jacksonVersion=2.8.9
lombokVersion=1.18.4
@@ -34,9 +34,6 @@ pravegaVersion=0.10.0-2815.7461614-SNAPSHOT
schemaRegistryVersion=0.2.0-50.7d32981-SNAPSHOT
apacheCommonsVersion=3.7

# flag to indicate if Pravega sub-module should be used instead of the version defined in 'pravegaVersion'
usePravegaVersionSubModule=false

# These properties are only needed for publishing to maven central
# Pravega Signing Key
signing.keyId=
1 change: 0 additions & 1 deletion pravega
Submodule pravega deleted from 746161