Numeric improvement and fix #65
Conversation
Force-pushed from 8eaab00 to da43062
> * `numeric(9 < P <= 18, S)` is represented as `INT64` with `DECIMAL` logical type
> * `numeric(18 < P <= 38, S)` is represented as `FIXED_LEN_BYTE_ARRAY(9-16)` with `DECIMAL` logical type
> * `numeric(38 < P, S)` is represented as `BYTE_ARRAY` with `STRING` logical type
> * `numeric` (precision and scale not specified) is allowed by Postgres. These are represented by a default precision (38) and scale (16) instead of being written as strings. You get a runtime error if your table tries to read or write a numeric value which is not allowed by the default precision and scale (22 integral digits before the decimal point, 16 digits after the decimal point).
I believe this is true, but just being explicit in my understanding: so effectively if you want `numeric` with larger precision than 38 you need to explicitly define it there, it just will not end up being an actual stored numeric, but reads/writes will work to convert transparently from the Postgres side?
True. An example:

```
pg_parquet=# create table test(x numeric(40,2));
CREATE TABLE
pg_parquet=# insert into test values (1.23);
INSERT 0 1
pg_parquet=# copy test to '/tmp/test.parquet';
COPY 1
pg_parquet=# select * from parquet.metadata('/tmp/test.parquet');
-[ RECORD 1 ]-----------+-------------------------
uri | /tmp/test.parquet
row_group_id | 0
row_group_num_rows | 1
row_group_num_columns | 1
row_group_bytes | 65
column_id | 0
file_offset | 0
num_values | 1
path_in_schema | x
type_name | BYTE_ARRAY
stats_null_count | 0
stats_distinct_count |
stats_min | 1.23
stats_max | 1.23
compression | SNAPPY
encodings | PLAIN,RLE,RLE_DICTIONARY
index_page_offset |
dictionary_page_offset | 4
data_page_offset | 28
total_compressed_size | 69
total_uncompressed_size | 65
pg_parquet=# copy test from '/tmp/test.parquet';
COPY 1
```
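To complement the `numeric(40,2)` case above, here is a rough sketch of how bounded precisions within the decimal range are expected to map to the Parquet physical types quoted earlier (table names and the expected `type_name` values are illustrative, not actual test output):

```sql
-- precision <= 18: expected to be written as INT64 with the DECIMAL logical type
CREATE TABLE test_small (x numeric(15, 2));
INSERT INTO test_small VALUES (123.45);
COPY test_small TO '/tmp/test_small.parquet';
SELECT path_in_schema, type_name
FROM parquet.metadata('/tmp/test_small.parquet');   -- expected: x | INT64

-- 18 < precision <= 38: expected to be FIXED_LEN_BYTE_ARRAY with the DECIMAL logical type
CREATE TABLE test_mid (x numeric(30, 2));
INSERT INTO test_mid VALUES (123.45);
COPY test_mid TO '/tmp/test_mid.parquet';
SELECT path_in_schema, type_name
FROM parquet.metadata('/tmp/test_mid.parquet');     -- expected: x | FIXED_LEN_BYTE_ARRAY
```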
```diff
@@ -65,8 +65,8 @@ pub(crate) struct PgToArrowAttributeContext {
     is_geometry: bool,
     is_map: bool,
     attribute_contexts: Option<Vec<PgToArrowAttributeContext>>,
     scale: Option<usize>,
```
What are the changes here for? Related to the arm support?
Postgres returns scale and precision as `i32` since negative scales are allowed. But we adjust negative scale, e.g. `(5, -2) => (7, 0)`. Hence, we can make it `u32` (not `usize`, since `pgrx::Numeric<P, S>` requires `P` and `S` to be `u32`).
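To make the negative-scale adjustment concrete, here is a hypothetical end-to-end sketch (declaring a negative scale requires PostgreSQL 15 or newer; the table name and the expected values are assumptions based on the `(5, -2) => (7, 0)` rule above):

```sql
-- A negative declared scale rounds values to the left of the decimal point
CREATE TABLE test_neg (x numeric(5, -2));
INSERT INTO test_neg VALUES (1234);   -- Postgres rounds and stores this as 1200

-- copy to adjusts (5, -2) to (7, 0), so the parquet file holds 1200 as a scale-0 decimal
COPY test_neg TO '/tmp/test_neg.parquet';

-- copy from converts the scale-0 value back into the numeric(5, -2) column
COPY test_neg FROM '/tmp/test_neg.parquet';
SELECT * FROM test_neg;               -- expected: 1200 twice (original row plus copied row)
```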
```rust
parquet_dest.copy_options.row_group_size = row_group_size;
parquet_dest.copy_options.row_group_size_bytes = row_group_size_bytes;
parquet_dest.copy_options.compression = compression;
parquet_dest.copy_options.compression_level = compression_level;
```
Is this part of this change, or just an independent refactor?
Sorry for the confusion, it's independent.
Force-pushed from 25ff037 to d97f8ad
Force-pushed from d97f8ad to 62cf7e3
**Problem**

Previously, we were writing unbounded numerics, which do not specify precision and scale (i.e. `numeric`), as text since they can be too large to represent as a parquet decimal. Most of the time users ignore the precision for numeric columns, so those columns were written as text. That prevented execution engines from pushing down some operators on the numeric type.

**Improvement**

We start to read/write unbounded numerics as `numeric(38, 16)` to the parquet file. We throw a runtime error if an unbounded numeric value exceeds 22 digits before the decimal point or 16 digits after the decimal point. For the users that bump into the error, we give a hint to change the column type to a `numeric(p,s)` with precision and scale specified, to get rid of the error (see the sketch after this description).

**Fixes**

Arrow to pg conversions were not correct for some cases:

* when the value has no decimal point (e.g. as formatted by `Decimal128Type::format_decimal` by arrow),
* when the declared scale is negative (handled by adding `abs(scale)` to `precision` and setting the `scale` to 0, which means `numeric(5,-2) => numeric(7,0)` at the parquet file; copy from can convert it back to `numeric(5,-2)`).

These cases are fixed and covered by tests.
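To illustrate the default behaviour and the overflow error described above, here is a minimal sketch. The table name, the 26-digit test value, and the exact error/hint wording are illustrative assumptions, not taken from the implementation or its tests.

```sql
-- Unbounded numeric is now written as decimal(38, 16) instead of text
CREATE TABLE test_unbounded (x numeric);
INSERT INTO test_unbounded VALUES (123.456);
COPY test_unbounded TO '/tmp/test_unbounded.parquet';   -- fits the 22 + 16 digit budget

-- A value with more than 22 digits before the decimal point no longer fits decimal(38, 16)
INSERT INTO test_unbounded VALUES (12345678901234567890123456);   -- 26 integral digits
COPY test_unbounded TO '/tmp/test_unbounded.parquet';
-- expected: runtime error with a hint to declare an explicit numeric(p, s)

-- Declaring precision and scale explicitly avoids the error
-- (with precision > 38 the column is written as a string, per the mapping quoted earlier)
ALTER TABLE test_unbounded ALTER COLUMN x TYPE numeric(40, 3);
COPY test_unbounded TO '/tmp/test_unbounded.parquet';
```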