
Releases: YotpoLtd/metorikku

v0.0.46

06 Jun 21:16

Improvements:

  • Updated to Spark 2.4.3

Fixes

  • Fixed a bug where the Kafka input with schema registry did not support newly added columns
  • Fixed Hive save so it never overwrites an S3 path unless explicitly requested; partitions and schema are now always updated in Hive after the save (see the sketch below)
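A rough sketch of the intended save semantics in plain Spark, for orientation only; the table, database and S3 path are placeholders and this is not Metorikku's actual writer code:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().enableHiveSupport().getOrCreate()
val df = spark.table("staging.events") // placeholder input

// Append by default instead of overwriting the S3 path behind the table.
df.write
  .mode(SaveMode.Append)
  .format("parquet")
  .option("path", "s3://my-bucket/events") // hypothetical external location
  .saveAsTable("analytics.events")

// After the save, make sure Hive picks up new partitions.
spark.sql("MSCK REPAIR TABLE analytics.events")
```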

v0.0.44

03 Jun 04:44

Improvements:

  • Added a new job schema file (metric and test schemas will follow soon), thanks @ramizshahock
  • Support inline job configuration: you can now pass a job JSON as a CLI parameter to Metorikku (we use this with Airflow, for example, where the JSON is generated in Airflow)

Fixes

  • Releases to Maven Central (apparently we hadn't been releasing to Maven Central for quite a while...)

v0.0.43

02 May 11:53

Improvements:

  • Support Kafka topic patterns
  • Support writing full dataframes to InfluxDB with all columns as tags
  • Added support for watermarks in streaming metrics
  • Support streaming for all writers using batchMode (a plain-Spark sketch of topic patterns, watermarks and batch mode follows this list)
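For context, here is what these streaming features look like in plain Spark Structured Streaming; the topic pattern, broker address, column names and sink path are all illustrative, and Metorikku drives the equivalent behavior from its configuration:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.window

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribePattern", "events_.*") // topic pattern instead of a fixed topic list
  .load()
  .selectExpr("CAST(value AS STRING) AS value", "timestamp")

val counts = stream
  .withWatermark("timestamp", "10 minutes") // data later than 10 minutes is dropped
  .groupBy(window($"timestamp", "5 minutes"))
  .count()

// Spark 2.4's foreachBatch is one way to reuse a batch writer on a stream ("batch mode").
counts.writeStream.foreachBatch { (batch: DataFrame, batchId: Long) =>
  batch.write.mode("append").parquet("s3://my-bucket/counts") // hypothetical sink
}.start()
```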

Fixes

  • Fixed the Hive implementation
  • Use Hive metastore in E2E instead of MySQL DB
  • Support tests without mocks
  • Multiple bug fixes

v0.0.42

28 Apr 17:43
Releasing 0.0.42

v0.0.41

07 Mar 14:53

New:

  • Added an Elasticsearch writer (see the sketch below)
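A hedged sketch of what an Elasticsearch output boils down to, using the elasticsearch-hadoop Spark connector directly; Metorikku wires this up from its output configuration, so the nodes, port and index below are placeholders:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().getOrCreate()
val df = spark.table("metrics.daily_counts") // placeholder metric result

df.write
  .format("org.elasticsearch.spark.sql")
  .option("es.nodes", "localhost")
  .option("es.port", "9200")
  .mode(SaveMode.Append)
  .save("daily_counts/doc") // index/type
```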

v0.0.40

04 Mar 10:57

New:

  • Added the ability to define a schema for a file input: supply schemaPath for the input with a path to a JSON schema file and it will be used as the schema (see the sketch after this list)
  • Added new flags for jobs: cacheOnPreview (caches the dataframe on each preview) and showQuery (prints the query before each step)
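A sketch of applying an externally defined schema to a file input in plain Spark. Assumption: the schema file holds Spark's own JSON schema representation; the exact JSON schema flavor Metorikku expects for schemaPath may differ, and the paths are placeholders:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DataType, StructType}

val spark = SparkSession.builder().getOrCreate()

val schemaJson = scala.io.Source.fromFile("/path/to/schema.json").mkString
val schema = DataType.fromJson(schemaJson).asInstanceOf[StructType]

val input = spark.read
  .schema(schema) // skip schema inference and use the supplied schema
  .json("s3://my-bucket/raw/events/") // hypothetical input path
```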

Improvements:

  • Added support for named outputs (supporting multiple outputs of the same type)
  • Added repartition and coalesce flags to all outputs (applied to the output dataframe before the writer runs; see the sketch after this list)
  • Added a new File writer (essentially the CSV/JSON/Parquet writers combined)
  • File input/output is now generic and supports all formats via a new format parameter
  • File input now supports multiple files using the , separator
  • Added full documentation for the metric config file
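A plain-Spark illustration of the input/output behaviors listed above; paths, formats and partition counts are placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Multiple input files of the same format (comma-separated in the config)
// translate to a multi-path load:
val events = spark.read.format("csv")
  .option("header", "true")
  .load("s3://my-bucket/a.csv", "s3://my-bucket/b.csv")

// repartition/coalesce are applied to the output dataframe before the writer runs:
events.repartition(10).write.format("parquet").save("s3://my-bucket/out/repartitioned")
events.coalesce(1).write.format("json").save("s3://my-bucket/out/single_file")
```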

Fixes

  • Fixed the README

v0.0.38

20 Feb 13:50

New:

  • Added support for Apache Hive metastore read/write (see the sketch below)
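What reading from and writing to the Hive metastore looks like in plain Spark; Metorikku drives the equivalent behavior from its input/output configuration, and the database and table names here are placeholders:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .enableHiveSupport() // table metadata goes through the Hive metastore
  .getOrCreate()

val source = spark.table("raw_db.page_views") // read a Hive table as an input
source.groupBy("page").count()
  .write.mode(SaveMode.Overwrite)
  .saveAsTable("analytics_db.page_view_counts") // write the result back as a Hive table
```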

Improvements:

  • Updated Metorikku to Spark 2.4.0

Fixes

  • Merged all file writers (JSON, Parquet, CSV) into a single writer

v0.0.37

12 Feb 16:58

Improvements:

  • Added a new custom code step called DropColumns to support dropping columns from a table, which is currently not supported directly in Spark SQL (the DataFrame equivalent is sketched below)
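For reference, the DataFrame equivalent of dropping columns is a one-liner in plain Spark; the table and column names below are placeholders, and this is only an illustration of what the step achieves:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
val table = spark.table("enriched_events")

// Keep every other column as-is, drop only the listed ones.
val trimmed = table.drop("internal_id", "debug_payload")
```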

v0.0.36

12 Feb 15:43

New:

  • Added support for writing instrumentation directly to InfluxDB
  • Added the ability to add and use custom Scala code (UDFs) in a metric (see the sketch below)
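A sketch of registering a Scala UDF so it can be called from a metric's SQL step. How Metorikku loads the custom code is configuration-driven and not shown here; the UDF name, logic and table are illustrative:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Register a Scala function as a SQL-callable UDF.
spark.udf.register("normalize_country", (country: String) =>
  Option(country).map(_.trim.toUpperCase).orNull)

// The registered UDF can then be used in a plain SQL step:
spark.sql(
  """SELECT normalize_country(country) AS country, COUNT(*) AS users
    |FROM users
    |GROUP BY normalize_country(country)""".stripMargin)
```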

Improvements:

  • Added support for testing streaming inputs in the Metorikku tester
  • Allow ignoring a failed step in a metric
  • Added a new custom code step called AlignTable that aligns one table to another: it makes sure all columns are present and in the same order as the reference table, which is useful for unions between different tables (a sketch of the idea follows this list)
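A hypothetical helper showing the idea behind AlignTable: reorder one dataframe's columns to match a reference table and fill any missing columns with typed nulls so the two can be unioned. This is an illustration, not Metorikku's actual implementation:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit}

def alignTo(df: DataFrame, reference: DataFrame): DataFrame = {
  val projected = reference.schema.fields.map { field =>
    if (df.columns.contains(field.name)) col(field.name)
    else lit(null).cast(field.dataType).as(field.name) // missing column becomes a typed null
  }
  df.select(projected: _*)
}

// Usage: alignTo(newEvents, historicalEvents).union(historicalEvents)
```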

Fixes

  • Complete refactor of all the code, making it more readable and consistent

v0.0.35

09 Jan 11:47

New:

  • Added Cassandra input support (see the sketch below)
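What a Cassandra input boils down to in plain Spark, using the spark-cassandra-connector; the keyspace, table and host are placeholders and Metorikku configures this from its input definition:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("spark.cassandra.connection.host", "127.0.0.1")
  .getOrCreate()

val users = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "analytics", "table" -> "users"))
  .load()
```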

Improvements:

  • Added a first E2E test around Kafka using the new dockerized version of Metorikku (check out the Docker section in the README)

Fixes

  • Fixed a bug in the Kafka lag writer that caused it to fail on offsets larger than an integer