Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option "--split-tables-larger-than" is not being honored for tables with pk of int or when switching to use "ctid" #872

Open
shane-borden opened this issue Aug 23, 2024 · 0 comments

Comments

@shane-borden
Copy link

Attempting to use the --split-tables-larger-than and in the case of a table with a pk of int or when switching to use CTID, pgcopydb is not splitting the copy. For example, given the following table:

([email protected]:5432) [tpcc] > \d+ stock
                                                    Table "public.stock"
    Column    |         Type          | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------------+-----------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 s_i_id       | integer               |           | not null |         | plain    |             |              |
 s_w_id       | integer               |           | not null |         | plain    |             |              |
 s_ytd        | integer               |           | not null |         | plain    |             |              |
 s_quantity   | smallint              |           | not null |         | plain    |             |              |
 s_order_cnt  | smallint              |           | not null |         | plain    |             |              |
 s_remote_cnt | smallint              |           | not null |         | plain    |             |              |
 s_dist_01    | character(24)         |           | not null |         | extended |             |              |
 s_dist_02    | character(24)         |           | not null |         | extended |             |              |
 s_dist_03    | character(24)         |           | not null |         | extended |             |              |
 s_dist_04    | character(24)         |           | not null |         | extended |             |              |
 s_dist_05    | character(24)         |           | not null |         | extended |             |              |
 s_dist_06    | character(24)         |           | not null |         | extended |             |              |
 s_dist_07    | character(24)         |           | not null |         | extended |             |              |
 s_dist_08    | character(24)         |           | not null |         | extended |             |              |
 s_dist_09    | character(24)         |           | not null |         | extended |             |              |
 s_dist_10    | character(24)         |           | not null |         | extended |             |              |
 s_data       | character varying(50) |           | not null |         | extended |             |              |
Indexes:
    "stock_i1" PRIMARY KEY, btree (s_i_id, s_w_id)
Access method: heap

([email protected]:5432) [tpcc] > \dt+ stock
                                  List of relations
 Schema | Name  | Type  | Owner | Persistence | Access method |  Size  | Description
--------+-------+-------+-------+-------------+---------------+--------+-------------
 public | stock | table | tpcc  | permanent   | heap          | 108 GB |
(1 row)

And specifying the following option, no split occurs:

pgcopydb list table-parts --schema-name public --table-name order_line --split-tables-larger-than 50GB
14:33:52.344 1099240 INFO   Running pgcopydb version 0.17-1.pgdg110+1 from "/usr/bin/pgcopydb"
14:33:52.387 1099240 INFO   Using work dir "/home/shaneborden_google_com/pgcopy_tmp/pgcopydb"
14:33:52.388 1099240 INFO   Table public.order_line is 97 GB large which is larger than --split-tables-larger-than 50 GB, and does not have a unique column of type integer: splitting by CTID
14:33:52.388 1099240 INFO   Table public.order_line COPY will be split 0-ways
        Part |          Min |          Max |        Count
-------------+--------------+--------------+-------------

If I switch to use a table with a singular pk, it still does not work:

([email protected]:5432) [postgres] > \d+ uuid_mod_test
                                                      Table "public.uuid_mod_test"
     Column     |              Type              | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
----------------+--------------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 id             | integer                        |           | not null |         | plain    |             |              |
 pkid           | text                           |           |          |         | extended |             |              |
 value          | numeric                        |           |          |         | main     |             |              |
 product_id     | integer                        |           |          |         | plain    |             |              |
 effective_date | timestamp(3) without time zone |           |          |         | plain    |             |              |
Indexes:
    "uuid_mod_test_pkey" PRIMARY KEY, btree (id)
    "func_mod_uuid_idx" btree (mod(abs(hashtext(pkid)), 12))
Access method: heap

([email protected]:5432) [postgres] > \dt+ uuid_mod_test
                                       List of relations
 Schema |     Name      | Type  |  Owner   | Persistence | Access method |  Size  | Description
--------+---------------+-------+----------+-------------+---------------+--------+-------------
 public | uuid_mod_test | table | postgres | permanent   | heap          | 893 MB |
(1 row)

And utilizing the following command the split still will not work:

pgcopydb list table-parts --schema-name public --table-name uuid_mod_test --split-tables-larger-than 1MB
15:05:47.567 1099748 INFO   Running pgcopydb version 0.17-1.pgdg110+1 from "/usr/bin/pgcopydb"
15:05:47.609 1099748 INFO   Using work dir "/home/shaneborden_google_com/pgcopy_tmp/postgres/pgcopydb"
15:05:47.610 1099748 INFO   Table public.uuid_mod_test COPY will be split 0-ways
        Part |          Min |          Max |        Count
-------------+--------------+--------------+-------------

What is needed to split these tables?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant