Releases: apache/orc
v2.0.3
Bug Fix
- ORC-1796: [C++] Fix return wrong result if lack of has null
Test
- ORC-1680: Bump bcpkix-jdk18on to 1.78
- ORC-1702: Bump bcpkix-jdk18on to 1.78.1
- ORC-1756: Bump snappy-java to 1.1.10.6 in bench module
- ORC-1756: Upgrade snappy-java to 1.1.10.7 in bench module
- ORC-1770: Upgrade parquet to 1.14.2 in bench module
- ORC-1776: Remove MacOS 12 from GitHub Action CI and docs
- ORC-1778: Upgrade Spark to 4.0.0-preview2 in bench module
- ORC-1783: Add MacOS 15 to GitHub Action MacOS CI and docs
- ORC-1790: Upgrade parquet to 1.14.3 in bench module
- ORC-1800: Upgrade bcpkix-jdk18on to 1.79
Build and Dependency Changes
- ORC-1608: Upgrade Hadoop to 3.4.0
- ORC-1750: Bump protobuf-java to 3.25.4
- ORC-1769: Upgrade zstd-jni to 1.5.6-5
- ORC-1775: Upgrade aircompressor to 2.0.2
- ORC-1777: Bump protobuf-java to 3.25.5
- ORC-1781: Upgrade zstd-jni to 1.5.6-6
- ORC-1782: Upgrade Hadoop to 3.4.1
- ORC-1784: Upgrade Maven to 3.9.9
- ORC-1785: Upgrade commons-csv to 1.12.0
- ORC-1791: Remove commons-lang3 dependency
v1.9.5
v1.8.8
v1.7.11
Bug Fix
Test
- ORC-1540: Remove MacOS 11 from GitHub Action CI and docs
- ORC-1556: Add Rocky Linux 9 Docker Test
- ORC-1557: Add GitHub Action CI for Docker Test
- ORC-1561: Remove Java11 and clang variants from docker/os-list.txt in branch-1.7
- ORC-1578: Fix SparkBenchmark on sales data according to SPARK-40918
- ORC-1696: Fix ClassCastException when reading avro decimal type in benchmark
v2.0.2
Improvements (tools)
- ORC-1724: JsonFileDump utility should print user metadata
- ORC-1740: Avoid the dump tool repeatedly parsing ColumnStatistics
- ORC-1742: Support print the id, name and type of each column in dump tool
Bug Fix
- ORC-1732: [C++] Fix detecting Homebrew-installed Protobuf on MacOS
- ORC-1733: [C++][CMake] Fix CMAKE_MODULE_PATH not to use PROJECT_SOURCE_DIR
- ORC-1738: [C++] Fix wrong Int128 maximum value
- ORC-1741: Respect decimal reader isRepeating flag
- ORC-1749: Fix supportVectoredIO for hadoop version string with optional patch labels
- ORC-1751: [C++] fix syntax error in ThirdpartyToolchain
Test
- ORC-1694: Upgrade gson to 2.9.0 for Benchmarks Hive
- ORC-1697: Fix IllegalArgumentException when reading json timestamp type in benchmark
- ORC-1700: Write parquet decimal type data in Benchmark using FIXED_LEN_BYTE_ARRAY type
- ORC-1743: Upgrade Spark to 4.0.0-preview1
- ORC-1744: Add ubuntu-24.04 to GitHub Action
- ORC-1746: Bump netty-all to 4.1.110.Final in bench module
- ORC-1752: Fix NumberFormatException when reading json timestamp type in benchmark
- ORC-1753: Use Avro 1.12.0 in bench module
Build and Dependency Changes
v1.9.4
Changelog
BugFix
- ORC-1696 Fix ClassCastException when reading avro decimal type in benchmark
- ORC-1721 Upgrade aircompressor to 0.27
- ORC-1738 Wrong Int128 maximum value
Test
- ORC-1619 Add MacOS 14 to GitHub Action
- ORC-1699 Fix SparkBenchmark in Parquet format according to SPARK-40918
Task
- ORC-1540 Remove MacOS 11 from GitHub Action CI
v2.0.1
Improvements (tools)
- ORC-1644: Add merge tool to merge multiple ORC files into a single ORC file
- ORC-1647: Tips for supporting ORC in the convert command
- ORC-1667: Add check tool to check the index of the specified column
Bug Fix
- ORC-1646: Close the reader when reading the schema with the convert command
- ORC-1654: [C++] Count up EvaluatedRowGroupCount correctly
- ORC-1684: [C++] Find tzdb without TZDIR when in conda-environments
- ORC-1688: [C++] Do not access TZDB if there is no timestamp type
- ORC-1696: Fix ClassCastException when reading avro decimal type in benchmark
Task
- ORC-1649: [C++][Conan] Add 2.0.0 to conan recipe and update release guide
- ORC-1669: [C++] Deprecate HDFS support
- ORC-1686: [C++] Avoid using std::filesystem
Test
- ORC-1648: Add test to convert ORC in the convert command
- ORC-1663: [C++] Enable TestTimezone.testMissingTZDB on Windows
- ORC-1672: Remove test packages o.a.o.tools.check
- ORC-1673: Remove test packages o.a.o.tools.[count|merge|sizes]
- ORC-1676: Use Hive 4.0.0 in benchmark
- ORC-1681: Remove redundant import statement in tests to fix checkstyle failures
- ORC-1699: Fix SparkBenchmark in Parquet format according to SPARK-40918
- ORC-1704: Migration to Scala 2.13 of Apache Spark 3.5.1 at SparkBenchmark
- ORC-1707: Fix sun.util.calendar IllegalAccessException when SparkBenchmark runs on JDK17
- ORC-1708: Support data/compress options in Hive benchmark
Build and Dependency Changes
- ORC-1670: Upgrade zstd-jni to 1.5.6-1
- ORC-1679: Bump zstd-jni to 1.5.6-2
- ORC-1695: Upgrade gson to 2.10.1
- ORC-1698: Upgrade commons-cli to 1.7.0
- ORC-1705: Upgrade zstd-jni to 1.5.6-3
- ORC-1714: Bump commons-csv to 1.11.0
- ORC-1715: Bump org.objenesis:objenesis to 3.3
Documentation
- ORC-1668: Add merge command to Java tools documentation
v1.8.7
Changelog
Bug
ORC-1528: Fix readBytes potential overflow in RecordReaderUtils.ChunkReader#create
ORC-1602: [C++] limit compression block size
Test
ORC-1556: Add Rocky Linux 9 Docker Test
ORC-1557: Add GitHub Action CI for Docker Test
ORC-1560: Remove Java11 and clang variants from docker/os-list.txt in branch-1.8
ORC-1562: Bump guava to 33.0.0-jre
ORC-1578: Fix SparkBenchmark on sales data according to SPARK-40918
ORC-1621: Switch to oraclelinux9 from rocky9
Documentation
ORC-1536: Remove hive-storage-api link from maven-javadoc-plugin
ORC-1563: Fix orc.bloom.filter.fpp default value and orc.compress notes of Spark and Hive config docs
v1.9.3
Changelog
BugFix
- ORC-634 Fix the json output for double NaN and infinite
- ORC-1553 Reading information from Row group, where there are 0 records of SArg column
- ORC-1563 Fix orc.bloom.filter.fpp default value and orc.compress notes of Spark and Hive config docs
- ORC-1578 Fix SparkBenchmark according to SPARK-40918
- ORC-1586 Fix IllegalAccessError when SparkBenchmark runs on JDK17
- ORC-1602 [C++] limit compression block size
- ORC-1607 Fix testDoubleNaNAndInfinite to use TestFileDump.checkOutput
- ORC-1609 Fix the compilation problem of TestJsonFileDump in branch 1.9
Test
- ORC-1556 Add Rocky Linux 9 Docker Test
- ORC-1557 Add GitHub Action CI for Docker Test
- ORC-1559 Remove Java11 and clang variants from docker/os-list.txt from branch-1.9
Task
- ORC-1532 Upgrade opencsv to 5.9
- ORC-1536 Remove hive-storage-api link from maven-javadoc-plugin
- ORC-1576 Upgrade spark.jackson.version to 2.15.2 in bench module
- ORC-1591 Lower log level from INFO to DEBUG in *ReaderImpl/WriterImpl/PhysicalFsWriter
- ORC-1592 Suppress KeyProvider missing log
- ORC-1616 Upgrade aircompressor to 0.26
- ORC-1618 Disable building tests for snappy
Documentation
- ORC-1535 Remove generated Java docs from source tree
v2.0.0
This is a new major release for which we cannot provide a changelog.
Summary of notable changes
ORC-1547: Spin-off ORC Format
ORC-1572: Use Apache ORC Format 1.0.0
ORC-1507: Support Java 21
ORC-1512: Drop Java 8/11 and make Java 17 by default
ORC-1577: Use ZSTD as the default compression
ORC-1430: Use Hadoop 3.3.5 shaded clients
ORC-1456: Update Hadoop to 3.3.6
ORC-1251: Use Hadoop Vectored IO
ORC-1463: Support brotli codec
ORC-1100: Support vcpkg
ORC-1620: Add Apple Silicon Test Coverage
New Feature
ORC-998: Refactor compression output buffer within OutStream for better portability
ORC-1088: Support ZSTD_JNI and column compress to set compression level
ORC-1100: Support vcpkg
ORC-1251: Use Hadoop Vectored IO
ORC-1387: [C++] Support schema evolution from decimal to numeric/decimal
ORC-1440: Check for protobuf config based module
ORC-1463: Support brotli codec
ORC-1507: Use Zulu JDK distribution and switch from 21-ea to 21
ORC-1512: Drop Java 8/11 and make Java 17 by default
ORC-1531: Create orc-format module and repo
ORC-1545: Use orc-format 1.0.0-SNAPSHOT
ORC-1546: Use orc-format 1.0.0-alpha
ORC-1547: Spin-off ORC Format
ORC-1551: Use orc-format 1.0.0-beta
ORC-1572: Use Apache ORC Format 1.0.0
ORC-1585: [C++] Add orc-format_ep as a dependency of orc
Improvement
ORC-1459: Mark DataBuffer::size() and DataBuffer::capacity() as const
ORC-1460: specification: Clarify how dictionary entries are sorted
ORC-1461: Mark Int128::getHighBits() and Int128::getLowBits() as const
ORC-1472: Replace deprecated method in TestMurmur3.java
ORC-1479: Enhance example usage message to use Uber jar
ORC-1481: [C++] Better error message when TZDB is unavailable
ORC-1504: Add lower bound check in get API for DynamicIntArray
ORC-1506: Replacing deprecated valueOf() with recommended forNumber()
ORC-1509: Auto grant contributor role to first-time contributors
ORC-1520: Remove JDK 8 settings from pom
ORC-1567: Add the -ignoreExtension configuration to the sizes and count commands of orc-tools
ORC-1570: Add supportVectoredIO API to HadoopShimsCurrent and use it
ORC-1571: Supports displaying raw data size in the meta command of orc-tools
ORC-1577: Use ZSTD as the default compression
ORC-1580: Change default DataBuffer constructor to use reserve instead of resize
ORC-1595: Add a short-cut to skip tiny inputs for ZstdCodec.compress
ORC-1596: Remove redundant Zstd.isError JNI usage
ORC-1597: Set bloom filter fpp to 1%
ORC-1600: Reduce getStaticMemoryManager sync block in OrcFile
ORC-1601: Reduce get HadoopShims sync block in HadoopShimsFactory
ORC-1610: Reduce the number of hash computation in CuckooSetBytes
ORC-1613: Zstd decompression supports direct buffer
ORC-1631: Supports summary output in sizes command
ORC-1637: [C++] Port conan recipe from upstream conan center
ORC-1638: Avoid System.exit(0) in count command
ORC-1639: [C++] Reduce unnecessary compiler flags in CMake
ORC-1641: Remove sourceFileExcludes from maven-javadoc-plugin
ORC-1642: Avoid System.exit(0) in scan command
ORC-1593: Set orc.compression.zstd.level to 3 by default
Bug Fix
ORC-634: Fix the json output for double NaN and infinite
ORC-1455: [C++] Fix build failure on non-x86 with unused macro in CpuInfoUtil.cc
ORC-1473: Zero-copy zeroCopyReadRanges and releaseBuffer bugs
ORC-1476: Maven build fail with unsupported platform: protoc-3.17.3-osx-aarch_64.exe
ORC-1480: [C++] Build failed when the BUILD_CPP_ENABLE_METRICS is ON
ORC-1500: [C++] The partition field does not support English special characters
ORC-1528: When using the orc.min.disk.seek.size configuration to read extremely large ORC files, a java.nio.BufferOverflowException may occur.
ORC-1553: Reading information from Row group, where there are 0 records of SArg column
ORC-1563: Fix orc.bloom.filter.fpp default value and orc.compress notes of Spark and Hive config docs
ORC-1568: Use readDiskRanges if orc.use.zerocopy is enabled
ORC-1575: Use ASF Archive URL instead Download URL
ORC-1578: Fix SparkBenchmark according to SPARK-40918
ORC-1588: Fix incorrect Decimal assert in LeafFilterFactory
ORC-1602: [C++] limit compression block size
Task
ORC-1422: Setting version to 2.0.0-SNAPSHOT
ORC-1434: Remove org.apache.hadoop from dependabot.yml
ORC-1484: Use JIRA_ACCESS_TOKEN in merge_orc_pr.py
ORC-1485: Enable checkstyle checks for test classes
ORC-1486: Fix checkstyle violations for tests in orc-core module
ORC-1492: Fix checkstyle violations for tests in mapreduce, tools, bench modules
ORC-1496: Use iterator to suggest backporting branches
ORC-1515: Skip publishing orc-example module
ORC-1516: Fix minor typo in comments in IOUtils
ORC-1518: Remove findbugs folders
ORC-1529: Fix minor typos in pom.xml
ORC-1530: Rename variables in RecordReaderUtils.ChunkReader#create
ORC-1535: Remove generated Java docs from source tree
ORC-1536: Remove hive-storage-api link from maven-javadoc-plugin
ORC-1540: Remove MacOS 11 from GitHub Action CI
ORC-1542: Use Pattern Matching for instanceof (JEP-394)
ORC-1549: Update libhdfspp.tar.gz by adding #include <cstdint>
ORC-1569: Remove HadoopShimsPre2_3, HadoopShimsPre2_6, HadoopShimsPre2_7 classes
ORC-1579: Add ASF Generative Tooling Guidance to PR template
ORC-1591: Lower log level from INFO ...