Commit

chore: rename to raystack (#226)
SSL/TLS support for Dagger Kafka Source
Add support for kafka producer config linger.ms in kafka sink
feat: stencil schema auto refresh and fix typehandler bug
fix:[longbow] excluded module google-cloud-bigtable from minimaljar
feat: bump version to 0.9.6
ravisuhag authored Jul 15, 2023
1 parent 203524c commit 1ee28ab
Showing 766 changed files with 5,709 additions and 4,100 deletions.
@@ -1,4 +1,4 @@
-name: core-dependencies
+name: dependencies-publish

on:
workflow_dispatch
52 changes: 0 additions & 52 deletions CHANGELOG.md

This file was deleted.

62 changes: 35 additions & 27 deletions README.md
@@ -1,57 +1,63 @@
# Dagger
-![build workflow](https://github.com/odpf/dagger/actions/workflows/build.yml/badge.svg)
-![package workflow](https://github.com/odpf/dagger/actions/workflows/package.yml/badge.svg)
+![build workflow](https://github.com/raystack/dagger/actions/workflows/build.yml/badge.svg)
+![package workflow](https://github.com/raystack/dagger/actions/workflows/package.yml/badge.svg)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg?logo=apache)](LICENSE)
-[![Version](https://img.shields.io/github/v/release/odpf/dagger?logo=semantic-release)](https://github.com/odpf/dagger/releases/latest)
+[![Version](https://img.shields.io/github/v/release/raystack/dagger?logo=semantic-release)](https://github.com/raystack/dagger/releases/latest)

Dagger or Data Aggregator is an easy-to-use, configuration over code, cloud-native framework built on top of Apache Flink
for stateful processing of data. With Dagger, you don't need to write custom applications or complicated code to process
data as a stream. Instead, you can write SQL queries and UDFs to do the processing and analysis on streaming data.

![](docs/static/img/overview/dagger_overview.png)

## Key Features

Discover why to use Dagger

- **Processing:** Dagger can transform, aggregate, join and enrich streaming data, both real-time and historical.
- **Scale:** Dagger scales in an instant, both vertically and horizontally for high performance streaming sink and zero data drops.
- **Extensibility:** Add your own sink to dagger with a clearly defined interface or choose from already provided ones. Use Kafka and/or Parquet Files as stream sources.
- **Flexibility:** Add custom business logic in form of plugins \(UDFs, Transformers, Preprocessors and Post Processors\) independent of the core logic.
-- **Metrics:** Always know what’s going on with your deployment with built-in [monitoring](https://odpf.github.io/dagger/docs/reference/metrics) of throughput, response times, errors and more.
+- **Metrics:** Always know what’s going on with your deployment with built-in [monitoring](https://raystack.github.io/dagger/docs/reference/metrics) of throughput, response times, errors and more.

## What problems Dagger solves?
-* Map reduce -> [SQL](https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sql.html)
-* Enrichment -> [Post Processors](https://odpf.github.io/dagger/docs/advance/post_processor)
-* Aggregation -> [SQL](https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sql.html), [UDFs](https://odpf.github.io/dagger/docs/guides/use_udf)
-* Masking -> [Hash Transformer](https://odpf.github.io/dagger/docs/reference/transformers#HashTransformer)
-* Deduplication -> [Deduplication Transformer](https://odpf.github.io/dagger/docs/reference/transformers#DeDuplicationTransformer)
-* Realtime long window processing -> [Longbow](https://odpf.github.io/dagger/docs/advance/longbow)
-
-To know more, follow the detailed [documentation](https://odpf.github.io/dagger/).
+- Map reduce -> [SQL](https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sql.html)
+- Enrichment -> [Post Processors](https://raystack.github.io/dagger/docs/advance/post_processor)
+- Aggregation -> [SQL](https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sql.html), [UDFs](https://raystack.github.io/dagger/docs/guides/use_udf)
+- Masking -> [Hash Transformer](https://raystack.github.io/dagger/docs/reference/transformers#HashTransformer)
+- Deduplication -> [Deduplication Transformer](https://raystack.github.io/dagger/docs/reference/transformers#DeDuplicationTransformer)
+- Realtime long window processing -> [Longbow](https://raystack.github.io/dagger/docs/advance/longbow)
+
+To know more, follow the detailed [documentation](https://raystack.github.io/dagger/).
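As an illustration of the SQL-driven processing the README describes (Dagger executes Flink SQL), an aggregation job might look like the sketch below; `booking_stream`, its fields, and the window size are hypothetical placeholders, not taken from this repository:

```sql
-- Hypothetical: count events per country over 1-minute tumbling windows.
SELECT
  country_code,
  COUNT(1) AS booking_count,
  TUMBLE_END(rowtime, INTERVAL '60' SECOND) AS window_end
FROM booking_stream
GROUP BY
  country_code,
  TUMBLE(rowtime, INTERVAL '60' SECOND)
```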

## Usage

Explore the following resources to get started with Dagger:

-* [Guides](https://odpf.github.io/dagger/docs/guides/overview) provides guidance on [creating Dagger](https://odpf.github.io/dagger/docs/guides/create_dagger) with different sinks.
-* [Concepts](https://odpf.github.io/dagger/docs/concepts/overview) describes all important Dagger concepts.
-* [Advance](https://odpf.github.io/dagger/docs/advance/overview) contains details regarding advance features of Dagger.
-* [Reference](https://odpf.github.io/dagger/docs/reference/overview) contains details about configurations, metrics and other aspects of Dagger.
-* [Contribute](https://odpf.github.io/dagger/docs/contribute/contribution) contains resources for anyone who wants to contribute to Dagger.
-* [Usecase](https://odpf.github.io/dagger/docs/usecase/overview) describes examples use cases which can be solved via Dagger.
-* [Examples](https://odpf.github.io/dagger/docs/examples/overview) contains tutorials to try out some of Dagger's features with real-world usecases
+- [Guides](https://raystack.github.io/dagger/docs/guides/overview) provides guidance on [creating Dagger](https://raystack.github.io/dagger/docs/guides/create_dagger) with different sinks.
+- [Concepts](https://raystack.github.io/dagger/docs/concepts/overview) describes all important Dagger concepts.
+- [Advance](https://raystack.github.io/dagger/docs/advance/overview) contains details regarding advance features of Dagger.
+- [Reference](https://raystack.github.io/dagger/docs/reference/overview) contains details about configurations, metrics and other aspects of Dagger.
+- [Contribute](https://raystack.github.io/dagger/docs/contribute/contribution) contains resources for anyone who wants to contribute to Dagger.
+- [Usecase](https://raystack.github.io/dagger/docs/usecase/overview) describes examples use cases which can be solved via Dagger.
+- [Examples](https://raystack.github.io/dagger/docs/examples/overview) contains tutorials to try out some of Dagger's features with real-world usecases

## Running locally

-Please follow this [Dagger Quickstart Guide](https://odpf.github.io/dagger/docs/guides/quickstart) for setting up a local running Dagger consuming from Kafka or to set up a Docker Compose for Dagger.
+Please follow this [Dagger Quickstart Guide](https://raystack.github.io/dagger/docs/guides/quickstart) for setting up a local running Dagger consuming from Kafka or to set up a Docker Compose for Dagger.

-**Note:** Sample configuration for running a basic dagger can be found [here](https://odpf.github.io/dagger/docs/guides/create_dagger#common-configurations). For detailed configurations, refer [here](https://odpf.github.io/dagger/docs/reference/configuration).
+**Note:** Sample configuration for running a basic dagger can be found [here](https://raystack.github.io/dagger/docs/guides/create_dagger#common-configurations). For detailed configurations, refer [here](https://raystack.github.io/dagger/docs/reference/configuration).

-Find more detailed steps on local setup [here](https://odpf.github.io/dagger/docs/guides/create_dagger).
+Find more detailed steps on local setup [here](https://raystack.github.io/dagger/docs/guides/create_dagger).

## Running on cluster
-Refer [here](https://odpf.github.io/dagger/docs/guides/deployment) for details regarding Dagger deployment.
+Refer [here](https://raystack.github.io/dagger/docs/guides/deployment) for details regarding Dagger deployment.

## Running tests

```sh
# Running unit tests
$ ./gradlew clean test
```
@@ -67,12 +73,14 @@ $ ./gradlew clean

Development of Dagger happens in the open on GitHub, and we are grateful to the community for contributing bug fixes and improvements. Read below to learn how you can take part in improving Dagger.

-Read our [contributing guide](https://odpf.github.io/dagger/docs/contribute/contribution) to learn about our development process, how to propose bug fixes and improvements, and how to build and test your changes to Dagger.
+Read our [contributing guide](https://raystack.github.io/dagger/docs/contribute/contribution) to learn about our development process, how to propose bug fixes and improvements, and how to build and test your changes to Dagger.

-To help you get your feet wet and get you familiar with our contribution process, we have a list of [good first issues](https://github.com/odpf/dagger/labels/good%20first%20issue) that contain bugs which have a relatively limited scope. This is a great place to get started.
+To help you get your feet wet and get you familiar with our contribution process, we have a list of [good first issues](https://github.com/raystack/dagger/labels/good%20first%20issue) that contain bugs which have a relatively limited scope. This is a great place to get started.

## Credits
-This project exists thanks to all the [contributors](https://github.com/odpf/dagger/graphs/contributors).
+This project exists thanks to all the [contributors](https://github.com/raystack/dagger/graphs/contributors).

## License

Dagger is [Apache 2.0](LICENSE) licensed.
2 changes: 1 addition & 1 deletion build.gradle
@@ -15,7 +15,7 @@ subprojects {
apply plugin: 'idea'
apply plugin: 'checkstyle'

-group 'io.odpf'
+group 'org.raystack'

checkstyle {
toolVersion '7.6.1'
4 changes: 2 additions & 2 deletions dagger-common/build.gradle
@@ -58,6 +58,7 @@ dependencies {
compileOnly group: 'org.apache.flink', name: 'flink-table', version: flinkVersion
compileOnly group: 'org.apache.flink', name: 'flink-table-api-java-bridge_2.11', version: flinkVersion
compileOnly group: 'org.apache.flink', name: 'flink-connector-kafka_2.11', version: flinkVersion
+compileOnly 'org.raystack:stencil:0.4.0'

dependenciesCommonJar ('org.apache.hadoop:hadoop-client:2.8.3') {
exclude module:"commons-cli"
@@ -67,7 +68,6 @@ dependencies {
dependenciesCommonJar 'org.apache.flink:flink-metrics-dropwizard:' + flinkVersion
dependenciesCommonJar 'org.apache.flink:flink-json:' + flinkVersion
dependenciesCommonJar 'com.jayway.jsonpath:json-path:2.4.0'
-dependenciesCommonJar 'io.odpf:stencil:0.2.1'
dependenciesCommonJar 'com.google.code.gson:gson:2.8.2'
dependenciesCommonJar 'org.apache.parquet:parquet-column:1.12.2'

@@ -127,7 +127,7 @@ publishing {
repositories {
maven {
name = "GitHubPackages"
-url = "https://maven.pkg.github.com/odpf/dagger"
+url = "https://maven.pkg.github.com/raystack/dagger"
credentials {
username = System.getenv("GITHUB_ACTOR")
password = System.getenv("GITHUB_TOKEN")

This file was deleted.

@@ -1,4 +1,4 @@
-package io.odpf.dagger.common.configuration;
+package org.raystack.dagger.common.configuration;

import org.apache.flink.api.java.utils.ParameterTool;

@@ -1,14 +1,24 @@
-package io.odpf.dagger.common.core;
+package org.raystack.dagger.common.core;

public class Constants {
public static final String SCHEMA_REGISTRY_STENCIL_ENABLE_KEY = "SCHEMA_REGISTRY_STENCIL_ENABLE";
public static final boolean SCHEMA_REGISTRY_STENCIL_ENABLE_DEFAULT = false;
public static final String SCHEMA_REGISTRY_STENCIL_URLS_KEY = "SCHEMA_REGISTRY_STENCIL_URLS";
public static final String SCHEMA_REGISTRY_STENCIL_URLS_DEFAULT = "";
public static final String SCHEMA_REGISTRY_STENCIL_FETCH_TIMEOUT_MS = "SCHEMA_REGISTRY_STENCIL_FETCH_TIMEOUT_MS";
-public static final Integer SCHEMA_REGISTRY_STENCIL_FETCH_TIMEOUT_MS_DEFAULT = 60000;
+public static final Integer SCHEMA_REGISTRY_STENCIL_FETCH_TIMEOUT_MS_DEFAULT = 10000;
public static final String SCHEMA_REGISTRY_STENCIL_FETCH_HEADERS_KEY = "SCHEMA_REGISTRY_STENCIL_FETCH_HEADERS";
public static final String SCHEMA_REGISTRY_STENCIL_FETCH_HEADERS_DEFAULT = "";
+public static final String SCHEMA_REGISTRY_STENCIL_CACHE_AUTO_REFRESH_KEY = "SCHEMA_REGISTRY_STENCIL_CACHE_AUTO_REFRESH";
+public static final boolean SCHEMA_REGISTRY_STENCIL_CACHE_AUTO_REFRESH_DEFAULT = false;
+public static final String SCHEMA_REGISTRY_STENCIL_CACHE_TTL_MS_KEY = "SCHEMA_REGISTRY_STENCIL_CACHE_TTL_MS";
+public static final Long SCHEMA_REGISTRY_STENCIL_CACHE_TTL_MS_DEFAULT = 900000L;
+public static final String SCHEMA_REGISTRY_STENCIL_REFRESH_STRATEGY_KEY = "SCHEMA_REGISTRY_STENCIL_REFRESH_STRATEGY";
+public static final String SCHEMA_REGISTRY_STENCIL_REFRESH_STRATEGY_DEFAULT = "LONG_POLLING";
+public static final String SCHEMA_REGISTRY_STENCIL_FETCH_BACKOFF_MIN_MS_KEY = "SCHEMA_REGISTRY_STENCIL_FETCH_BACKOFF_MIN_MS";
+public static final Long SCHEMA_REGISTRY_STENCIL_FETCH_BACKOFF_MIN_MS_DEFAULT = 60000L;
+public static final String SCHEMA_REGISTRY_STENCIL_FETCH_RETRIES_KEY = "SCHEMA_REGISTRY_STENCIL_FETCH_RETRIES";
+public static final Integer SCHEMA_REGISTRY_STENCIL_FETCH_RETRIES_DEFAULT = 4;

public static final String UDF_TELEMETRY_GROUP_KEY = "udf";
public static final String GAUGE_ASPECT_NAME = "value";
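The constants above back the schema auto-refresh feature named in the commit message. A configuration sketch using those keys (key names come from the constants above; the timeout, TTL, strategy, retry, and backoff values are the defaults shown there, while the URL is a made-up placeholder and the two enable flags are flipped to true purely for illustration):

```properties
SCHEMA_REGISTRY_STENCIL_ENABLE=true
SCHEMA_REGISTRY_STENCIL_URLS=http://stencil.example.com/v1beta1/namespaces/example/schemas/example
SCHEMA_REGISTRY_STENCIL_FETCH_TIMEOUT_MS=10000
SCHEMA_REGISTRY_STENCIL_FETCH_RETRIES=4
SCHEMA_REGISTRY_STENCIL_FETCH_BACKOFF_MIN_MS=60000
SCHEMA_REGISTRY_STENCIL_CACHE_AUTO_REFRESH=true
SCHEMA_REGISTRY_STENCIL_CACHE_TTL_MS=900000
SCHEMA_REGISTRY_STENCIL_REFRESH_STRATEGY=LONG_POLLING
```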
@@ -1,7 +1,7 @@
-package io.odpf.dagger.common.core;
+package org.raystack.dagger.common.core;

-import io.odpf.dagger.common.configuration.Configuration;
-import io.odpf.dagger.common.exceptions.DaggerContextException;
+import org.raystack.dagger.common.configuration.Configuration;
+import org.raystack.dagger.common.exceptions.DaggerContextException;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
@@ -0,0 +1,58 @@
package org.raystack.dagger.common.core;

import com.google.protobuf.Descriptors;

import java.io.Serializable;
import java.util.HashMap;
import java.util.List;
import java.util.Map;


public class FieldDescriptorCache implements Serializable {
private final Map<String, Integer> fieldDescriptorIndexMap = new HashMap<>();
private final Map<String, Integer> protoDescriptorArityMap = new HashMap<>();

public FieldDescriptorCache(Descriptors.Descriptor descriptor) {

cacheFieldDescriptorMap(descriptor);
}

public void cacheFieldDescriptorMap(Descriptors.Descriptor descriptor) {

if (protoDescriptorArityMap.containsKey(descriptor.getFullName())) {
return;
}
List<Descriptors.FieldDescriptor> descriptorFields = descriptor.getFields();
protoDescriptorArityMap.putIfAbsent(descriptor.getFullName(), descriptorFields.size());

for (Descriptors.FieldDescriptor fieldDescriptor : descriptorFields) {
fieldDescriptorIndexMap.putIfAbsent(fieldDescriptor.getFullName(), fieldDescriptor.getIndex());
}

for (Descriptors.FieldDescriptor fieldDescriptor : descriptorFields) {
if (fieldDescriptor.getType().toString().equals("MESSAGE")) {
cacheFieldDescriptorMap(fieldDescriptor.getMessageType());

}
}
}

public int getOriginalFieldIndex(Descriptors.FieldDescriptor fieldDescriptor) {
if (!fieldDescriptorIndexMap.containsKey(fieldDescriptor.getFullName())) {
throw new IllegalArgumentException("The Field Descriptor " + fieldDescriptor.getFullName() + " was not found in the cache");
}
return fieldDescriptorIndexMap.get(fieldDescriptor.getFullName());
}

public boolean containsField(String fieldName) {

return fieldDescriptorIndexMap.containsKey(fieldName);
}

public int getOriginalFieldCount(Descriptors.Descriptor descriptor) {
if (!protoDescriptorArityMap.containsKey(descriptor.getFullName())) {
throw new IllegalArgumentException("The Proto Descriptor " + descriptor.getFullName() + " was not found in the cache");
}
return protoDescriptorArityMap.get(descriptor.getFullName());
}
}
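The containsKey early-return in cacheFieldDescriptorMap above is what lets the recursive walk terminate when a message type refers to itself. A self-contained, stdlib-only sketch of that memoized walk; the Message/Field mini-types here are hypothetical stand-ins for protobuf's descriptors, not part of Dagger:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DescriptorWalkSketch {
    // Minimal stand-ins for protobuf descriptor types (illustrative only).
    static class Message {
        final String fullName;
        final List<Field> fields = new ArrayList<>();
        Message(String fullName) { this.fullName = fullName; }
    }
    static class Field {
        final String fullName;
        final int index;
        final Message messageType; // non-null when the field is itself a message
        Field(String fullName, int index, Message messageType) {
            this.fullName = fullName; this.index = index; this.messageType = messageType;
        }
    }

    static final Map<String, Integer> FIELD_INDEX = new HashMap<>();
    static final Map<String, Integer> ARITY = new HashMap<>();

    // Same shape as FieldDescriptorCache.cacheFieldDescriptorMap: recording the
    // message's arity before recursing is what stops infinite recursion on cycles.
    static void cache(Message m) {
        if (ARITY.containsKey(m.fullName)) {
            return;
        }
        ARITY.put(m.fullName, m.fields.size());
        for (Field f : m.fields) {
            FIELD_INDEX.putIfAbsent(f.fullName, f.index);
        }
        for (Field f : m.fields) {
            if (f.messageType != null) {
                cache(f.messageType);
            }
        }
    }

    public static void main(String[] args) {
        // A self-referential type, like a protobuf message that contains itself.
        Message node = new Message("example.Node");
        node.fields.add(new Field("example.Node.value", 0, null));
        node.fields.add(new Field("example.Node.next", 1, node));
        cache(node); // terminates despite the cycle
        System.out.println(ARITY.get("example.Node"));            // 2
        System.out.println(FIELD_INDEX.get("example.Node.next")); // 1
    }
}
```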