This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Add deserialization of Bytes -> Decimal #1534

Open: jaychia wants to merge 2 commits into main from jay/bytes-decimal-pr

Conversation

jaychia
Contributor

@jaychia jaychia commented Aug 9, 2023

Arrow2 already supports Parquet FixedLenByteArray -> Decimal conversion.

This PR adds support for Parquet (variable-length) ByteArray -> Decimal conversion, reusing most of the logic from the FixedLenByteArray conversion.

@ritchie46
Collaborator

> This PR adds support for Parquet (variable-length) ByteArray

I don't understand. Why would a decimal be encoded as variable-length binary?

@jaychia jaychia force-pushed the jay/bytes-decimal-pr branch from 7c57f50 to 8e6d836 Compare September 5, 2023 20:54
@codecov

codecov bot commented Sep 5, 2023

Codecov Report

Patch coverage has no change and project coverage change: -0.05% ⚠️

Comparison is base (87ab844) 83.02% compared to head (ab04856) 82.98%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1534      +/-   ##
==========================================
- Coverage   83.02%   82.98%   -0.05%     
==========================================
  Files         391      391              
  Lines       42786    42814      +28     
==========================================
+ Hits        35523    35529       +6     
- Misses       7263     7285      +22     
Files Changed Coverage Δ
src/io/parquet/read/deserialize/simple.rs 82.73% <0.00%> (-3.54%) ⬇️
src/io/parquet/read/schema/convert.rs 93.73% <0.00%> (-0.49%) ⬇️

... and 6 files with indirect coverage changes


@jaychia
Contributor Author

jaychia commented Sep 5, 2023

Hi @ritchie46, apologies for the late reply!

According to the Parquet spec, decimals can be encoded as int32, int64, fixed_len_byte_array, or binary.

See: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal

> binary: precision is not limited, but is required. The minimum number of bytes to store the unscaled value should be used.
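
For illustration only (this is not the code from this PR): the same section of the Parquet spec says the unscaled value is stored as big-endian two's complement, so a variable-length value can be widened to an `i128` by sign-extending it. The helper name `decimal_bytes_to_i128` is hypothetical, and returning `None` for empty input or for values wider than 16 bytes is an assumption of this sketch, not behavior defined by arrow2.

```rust
/// Hypothetical helper: widen a big-endian two's-complement byte slice
/// (as used for Parquet `binary` decimals) into an i128 unscaled value.
///
/// Assumption of this sketch: empty input and inputs longer than
/// 16 bytes are rejected with `None` rather than handled.
fn decimal_bytes_to_i128(bytes: &[u8]) -> Option<i128> {
    if bytes.is_empty() || bytes.len() > 16 {
        return None;
    }
    // The most significant bit of the first byte carries the sign.
    // Pad with 0xFF for negative values and 0x00 otherwise.
    let fill = if bytes[0] & 0x80 != 0 { 0xFF } else { 0x00 };
    let mut buf = [fill; 16];
    // Right-align the input so the low-order bytes line up.
    buf[16 - bytes.len()..].copy_from_slice(bytes);
    Some(i128::from_be_bytes(buf))
}
```

With scale 2, for example, the two bytes `[0x30, 0x39]` widen to the unscaled value 12345, which represents 123.45. A fixed-length value of the same width would go through the same sign extension, which is presumably why most of the FixedLenByteArray logic can be shared.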

@ariesdevil
Contributor
