Releases: apache/orc

v2.0.3

14 Nov 16:43

Bug Fix

  • ORC-1796: [C++] Fix wrong result returned when hasNull is missing

Test

  • ORC-1680: Bump bcpkix-jdk18on to 1.78
  • ORC-1702: Bump bcpkix-jdk18on to 1.78.1
  • ORC-1756: Bump snappy-java to 1.1.10.6 in bench module
  • ORC-1756: Upgrade snappy-java to 1.1.10.7 in bench module
  • ORC-1770: Upgrade parquet to 1.14.2 in bench module
  • ORC-1776: Remove MacOS 12 from GitHub Action CI and docs
  • ORC-1778: Upgrade Spark to 4.0.0-preview2 in bench module
  • ORC-1783: Add MacOS 15 to GitHub Action MacOS CI and docs
  • ORC-1790: Upgrade parquet to 1.14.3 in bench module
  • ORC-1800: Upgrade bcpkix-jdk18on to 1.79

Build and Dependency Changes

v1.9.5

14 Nov 16:25

Changelog

BugFix

  • ORC-1741 Respect decimal reader isRepeating flag

Test

v1.8.8

11 Nov 15:05

Changelog

Bug

ORC-1696: Fix ClassCastException when reading avro decimal type in benchmark
ORC-1738: [C++] Wrong Int128 maximum value

Test

ORC-1793: Upgrade Spark to 3.4.4

Documentation

ORC-1540: Remove MacOS 11 from GitHub Action CI

v1.7.11

13 Sep 15:48
660ce9a

Bug Fix

  • ORC-1602: [C++] limit compression block size
  • ORC-1738: [C++] Fix wrong Int128 maximum value

Test

  • ORC-1540: Remove MacOS 11 from GitHub Action CI and docs
  • ORC-1556: Add Rocky Linux 9 Docker Test
  • ORC-1557: Add GitHub Action CI for Docker Test
  • ORC-1561: Remove Java11 and clang variants from docker/os-list.txt in branch-1.7
  • ORC-1578: Fix SparkBenchmark on sales data according to SPARK-40918
  • ORC-1696: Fix ClassCastException when reading avro decimal type in benchmark

v2.0.2

15 Aug 16:21
801b2b9

Improvements (tools)

  • ORC-1724: JsonFileDump utility should print user metadata
  • ORC-1740: Avoid the dump tool repeatedly parsing ColumnStatistics
  • ORC-1742: Support printing the id, name and type of each column in dump tool

Bug Fix

  • ORC-1732: [C++] Fix detecting Homebrew-installed Protobuf on MacOS
  • ORC-1733: [C++][CMake] Fix CMAKE_MODULE_PATH not to use PROJECT_SOURCE_DIR
  • ORC-1738: [C++] Fix wrong Int128 maximum value
  • ORC-1741: Respect decimal reader isRepeating flag
  • ORC-1749: Fix supportVectoredIO for hadoop version string with optional patch labels
  • ORC-1751: [C++] fix syntax error in ThirdpartyToolchain

Test

  • ORC-1694: Upgrade gson to 2.9.0 for Benchmarks Hive
  • ORC-1697: Fix IllegalArgumentException when reading json timestamp type in benchmark
  • ORC-1700: Write parquet decimal type data in Benchmark using FIXED_LEN_BYTE_ARRAY type
  • ORC-1743: Upgrade Spark to 4.0.0-preview1
  • ORC-1744: Add ubuntu-24.04 to GitHub Action
  • ORC-1746: Bump netty-all to 4.1.110.Final in bench module
  • ORC-1752: Fix NumberFormatException when reading json timestamp type in benchmark
  • ORC-1753: Use Avro 1.12.0 in bench module

Build and Dependency Changes

v1.9.4

17 Jul 02:07
7878691

Changelog

BugFix

  • ORC-1696 Fix ClassCastException when reading avro decimal type in benchmark
  • ORC-1721 Upgrade aircompressor to 0.27
  • ORC-1738 Wrong Int128 maximum value

Test

  • ORC-1619 Add MacOS 14 to GitHub Action
  • ORC-1699 Fix SparkBenchmark in Parquet format according to SPARK-40918

Task

  • ORC-1540 Remove MacOS 11 from GitHub Action CI

v2.0.1

15 May 04:31
23044d7

Improvements (tools)

  • ORC-1644: Add merge tool to merge multiple ORC files into a single ORC file
  • ORC-1647: Tips for supporting ORC in the convert command
  • ORC-1667: Add check tool to check the index of the specified column

Bug Fix

  • ORC-1646: Close the reader when reading the schema with the convert command
  • ORC-1654: [C++] Count up EvaluatedRowGroupCount correctly
  • ORC-1684: [C++] Find tzdb without TZDIR when in conda-environments
  • ORC-1688: [C++] Do not access TZDB if there is no timestamp type
  • ORC-1696: Fix ClassCastException when reading avro decimal type in benchmark

Task

  • ORC-1649: [C++][Conan] Add 2.0.0 to conan recipe and update release guide
  • ORC-1669: [C++] Deprecate HDFS support
  • ORC-1686: [C++] Avoid using std::filesystem

Test

  • ORC-1648: Add test to convert ORC in the convert command
  • ORC-1663: [C++] Enable TestTimezone.testMissingTZDB on Windows
  • ORC-1672: Remove test packages o.a.o.tools.check
  • ORC-1673: Remove test packages o.a.o.tools.[count|merge|sizes]
  • ORC-1676: Use Hive 4.0.0 in benchmark
  • ORC-1681: Remove redundant import statement in tests to fix checkstyle failures
  • ORC-1699: Fix SparkBenchmark in Parquet format according to SPARK-40918
  • ORC-1704: Migration to Scala 2.13 of Apache Spark 3.5.1 at SparkBenchmark
  • ORC-1707: Fix sun.util.calendar IllegalAccessException when SparkBenchmark runs on JDK17
  • ORC-1708: Support data/compress options in Hive benchmark

Build and Dependency Changes

Documentation

  • ORC-1668: Add merge command to Java tools documentation

v1.8.7

14 Apr 12:47
8d3c982

Changelog

Bug

ORC-1528: Fix readBytes potential overflow in RecordReaderUtils.ChunkReader#create
ORC-1602: [C++] limit compression block size

Test

ORC-1556: Add Rocky Linux 9 Docker Test
ORC-1557: Add GitHub Action CI for Docker Test
ORC-1560: Remove Java11 and clang variants from docker/os-list.txt in branch-1.8
ORC-1562: Bump guava to 33.0.0-jre
ORC-1578: Fix SparkBenchmark on sales data according to SPARK-40918
ORC-1621: Switch to oraclelinux9 from rocky9

Documentation

ORC-1536: Remove hive-storage-api link from maven-javadoc-plugin
ORC-1563: Fix orc.bloom.filter.fpp default value and orc.compress notes of Spark and Hive config docs

v1.9.3

21 Mar 05:18

Changelog

BugFix

  • ORC-634 Fix the json output for double NaN and infinite
  • ORC-1553 Reading information from Row group, where there are 0 records of SArg column
  • ORC-1563 Fix orc.bloom.filter.fpp default value and orc.compress notes of Spark and Hive config docs
  • ORC-1578 Fix SparkBenchmark according to SPARK-40918
  • ORC-1586 Fix IllegalAccessError when SparkBenchmark runs on JDK17
  • ORC-1602 [C++] limit compression block size
  • ORC-1607 Fix testDoubleNaNAndInfinite to use TestFileDump.checkOutput
  • ORC-1609 Fix the compilation problem of TestJsonFileDump in branch 1.9

Test

  • ORC-1556 Add Rocky Linux 9 Docker Test
  • ORC-1557 Add GitHub Action CI for Docker Test
  • ORC-1559 Remove Java11 and clang variants from docker/os-list.txt from branch-1.9

Task

  • ORC-1532 Upgrade opencsv to 5.9
  • ORC-1536 Remove hive-storage-api link from maven-javadoc-plugin
  • ORC-1576 Upgrade spark.jackson.version to 2.15.2 in bench module
  • ORC-1591 Lower log level from INFO to DEBUG in *ReaderImpl/WriterImpl/PhysicalFsWriter
  • ORC-1592 Suppress KeyProvider missing log
  • ORC-1616 Upgrade aircompressor to 0.26
  • ORC-1618 Disable building tests for snappy

Documentation

  • ORC-1535 Remove generated Java docs from source tree

v2.0.0

08 Mar 21:20
46eb6ff

This is a new major release for which we cannot provide a changelog.

Summary of notable changes

ORC-1547: Spin-off ORC Format
ORC-1572: Use Apache ORC Format 1.0.0
ORC-1507: Support Java 21
ORC-1512: Drop Java 8/11 and make Java 17 by default
ORC-1577: Use ZSTD as the default compression
ORC-1430: Use Hadoop 3.3.5 shaded clients
ORC-1456: Update Hadoop to 3.3.6
ORC-1251: Use Hadoop Vectored IO
ORC-1463: Support brotli codec
ORC-1100: Support vcpkg
ORC-1620: Add Apple Silicon Test Coverage

New Feature

ORC-998: Refactor compression output buffer within OutStream for better portability
ORC-1088: Support ZSTD_JNI and column compress to set compression level
ORC-1100: Support vcpkg
ORC-1251: Use Hadoop Vectored IO
ORC-1387: [C++] Support schema evolution from decimal to numeric/decimal
ORC-1440: Check for protobuf config based module
ORC-1463: Support brotli codec
ORC-1507: Use Zulu JDK distribution and switch from 21-ea to 21
ORC-1512: Drop Java 8/11 and make Java 17 by default
ORC-1531: Create orc-format module and repo
ORC-1545: Use orc-format 1.0.0-SNAPSHOT
ORC-1546: Use orc-format 1.0.0-alpha
ORC-1547: Spin-off ORC Format
ORC-1551: Use orc-format 1.0.0-beta
ORC-1572: Use Apache ORC Format 1.0.0
ORC-1585: [C++] Add orc-format_ep as a dependency of orc

Improvement

ORC-1459: Mark DataBuffer::size() and DataBuffer::capacity() as const
ORC-1460: specification: Clarify how dictionary entries are sorted
ORC-1461: Mark Int128::getHighBits() and Int128::getLowBits() as const
ORC-1472: Replace deprecated method in TestMurmur3.java
ORC-1479: Enhance example usage message to use Uber jar
ORC-1481: [C++] Better error message when TZDB is unavailable
ORC-1504: Add lower bound check in get API for DynamicIntArray
ORC-1506: Replacing deprecated valueOf() with recommended forNumber()
ORC-1509: Auto grant contributor role to first-time contributors
ORC-1520: Remove JDK 8 settings from pom
ORC-1567: Add the -ignoreExtension configuration to the sizes and count commands of orc-tools
ORC-1570: Add supportVectoredIO API to HadoopShimsCurrent and use it
ORC-1571: Supports displaying raw data size in the meta command of orc-tools
ORC-1577: Use ZSTD as the default compression
ORC-1580: Change default DataBuffer constructor to use reserve instead of resize
ORC-1595: Add a short-cut to skip tiny inputs for ZstdCodec.compress
ORC-1596: Remove redundant Zstd.isError JNI usage
ORC-1597: Set bloom filter fpp to 1%
ORC-1600: Reduce getStaticMemoryManager sync block in OrcFile
ORC-1601: Reduce get HadoopShims sync block in HadoopShimsFactory
ORC-1610: Reduce the number of hash computations in CuckooSetBytes
ORC-1613: Zstd decompression supports direct buffer
ORC-1631: Supports summary output in sizes command
ORC-1637: [C++] Port conan recipe from upstream conan center
ORC-1638: Avoid System.exit(0) in count command
ORC-1639: [C++] Reduce unnecessary compiler flags in CMake
ORC-1641: Remove sourceFileExcludes from maven-javadoc-plugin
ORC-1642: Avoid System.exit(0) in scan command
ORC-1593: Set orc.compression.zstd.level to 3 by default

Bug Fix

ORC-634: Fix the json output for double NaN and infinite
ORC-1455: [C++] Fix build failure on non-x86 with unused macro in CpuInfoUtil.cc
ORC-1473: Zero-copy zeroCopyReadRanges and releaseBuffer bugs
ORC-1476: Maven build fail with unsupported platform: protoc-3.17.3-osx-aarch_64.exe
ORC-1480: [C++] Build failed when the BUILD_CPP_ENABLE_METRICS is ON
ORC-1500: [C++] The partition field does not support English special characters
ORC-1528: When using the orc.min.disk.seek.size configuration to read extremely large ORC files, a java.nio.BufferOverflowException may occur.
ORC-1553: Reading information from Row group, where there are 0 records of SArg column
ORC-1563: Fix orc.bloom.filter.fpp default value and orc.compress notes of Spark and Hive config docs
ORC-1568: Use readDiskRanges if orc.use.zerocopy is enabled
ORC-1575: Use ASF Archive URL instead of Download URL
ORC-1578: Fix SparkBenchmark according to SPARK-40918
ORC-1588: Fix incorrect Decimal assert in LeafFilterFactory
ORC-1602: [C++] limit compression block size

Task

ORC-1422: Setting version to 2.0.0-SNAPSHOT
ORC-1434: Remove org.apache.hadoop from dependabot.yml
ORC-1484: Use JIRA_ACCESS_TOKEN in merge_orc_pr.py
ORC-1485: Enable checkstyle checks for test classes
ORC-1486: Fix checkstyle violations for tests in orc-core module
ORC-1492: Fix checkstyle violations for tests in mapreduce, tools, bench modules
ORC-1496: Use iterator to suggest backporting branches
ORC-1515: Skip publishing orc-example module
ORC-1516: Fix minor typo in comments in IOUtils
ORC-1518: Remove findbugs folders
ORC-1529: Fix minor typos in pom.xml
ORC-1530: Rename variables in RecordReaderUtils.ChunkReader#create
ORC-1535: Remove generated Java docs from source tree
ORC-1536: Remove hive-storage-api link from maven-javadoc-plugin
ORC-1540: Remove MacOS 11 from GitHub Action CI
ORC-1542: Use Pattern Matching for instanceof (JEP-394)
ORC-1549: Update libhdfspp.tar.gz by adding #include <cstdint>
ORC-1569: Remove HadoopShimsPre2_3, HadoopShimsPre2_6, HadoopShimsPre2_7 classes
ORC-1579: Add ASF Generative Tooling Guidance to PR template
ORC-1591: Lower log level from INFO ...
