From 1a24cd1eea6bdd2fa2fc802fda43c52cacf7a51d Mon Sep 17 00:00:00 2001 From: Mel Kiyama Date: Tue, 4 Feb 2020 08:32:29 -0800 Subject: [PATCH 001/102] docs - resource group support of runaway query detection (#9508) * docs - resource group support of runaway query detection update GUC runaway_detector_activation_percent Add cross reference in --Admin Guide resource group memory management topic --CREATE RESOURCE GROUP parameter MEMORY_AUDITOR This will be backported to 5X)_STABLE * docs - minor edit * docs - review comment updates * docs - simplified description for resource groups --replaced requirement for vmtracker mem. auditor w/ admin_group, and default_group --Added global shared memory example from Simon * docs - created an Admin Guide section for resource group automatic query termination. * docs - fix math error --- .../admin_guide/workload_mgmt_resgroups.xml | 31 +++++++++- .../dita/ref_guide/config_params/guc-list.xml | 58 ++++++++++++++----- .../config_params/guc_category-list.xml | 4 ++ .../sql_commands/CREATE_RESOURCE_GROUP.xml | 13 ++++- 4 files changed, 88 insertions(+), 18 deletions(-) diff --git a/gpdb-doc/dita/admin_guide/workload_mgmt_resgroups.xml b/gpdb-doc/dita/admin_guide/workload_mgmt_resgroups.xml index 0ba491273ebc..769a007bdeed 100644 --- a/gpdb-doc/dita/admin_guide/workload_mgmt_resgroups.xml +++ b/gpdb-doc/dita/admin_guide/workload_mgmt_resgroups.xml @@ -50,13 +50,14 @@
  • +
  • +
  • -
  • @@ -228,6 +229,14 @@ +

    + For queries managed by resource groups that are configured to use the + vmtracker memory auditor, Greenplum Database supports the automatic + termination of queries based on the amount of memory the queries are using. See the server + configuration parameter . +

    @@ -547,7 +556,7 @@ SET statement_mem='10 MB';

    - Using Resource Groups + Configuring and Using Resource Groups Significant Greenplum Database performance degradation has been observed when enabling resource group-based workload management on RedHat 6.x and CentOS 6.x @@ -822,6 +831,24 @@ gpstart

    + + Configuring Automatic Query Termination + +

    When resource groups have a global shared memory pool, the server configuration parameter + sets the percent of utilized global shared memory that + triggers the termination of queries that are managed by resource groups that are configured + to use the vmtracker memory auditor, such as admin_group + and default_group.

    +

    Resource groups have a global shared memory pool when the sum of the + MEMORY_LIMIT attribute values configured for all resource groups is less + than 100. For example, if you have 3 resource groups configured with + MEMORY_LIMIT values of 10, 20, and 30, then global shared memory is 40% + = 100% - (10% + 20% + 30%).

    +

    For information about global shared memory, see .
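    As a minimal sketch of that arithmetic (the group names and limits below are hypothetical and not part of this patch), three resource groups whose MEMORY_LIMIT values sum to 60 leave a 40% global shared memory pool, and the current termination threshold can then be checked with SHOW:

        -- 100 - (10 + 20 + 30) = 40% of resource group memory remains as the global shared pool.
        CREATE RESOURCE GROUP rg_etl   WITH (CPU_RATE_LIMIT=10, MEMORY_LIMIT=10);
        CREATE RESOURCE GROUP rg_adhoc WITH (CPU_RATE_LIMIT=10, MEMORY_LIMIT=20);
        CREATE RESOURCE GROUP rg_batch WITH (CPU_RATE_LIMIT=10, MEMORY_LIMIT=30);

        -- Percent of utilized global shared memory that triggers automatic query termination.
        SHOW runaway_detector_activation_percent;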

    + +
    Assigning a Resource Group to a Role diff --git a/gpdb-doc/dita/ref_guide/config_params/guc-list.xml b/gpdb-doc/dita/ref_guide/config_params/guc-list.xml index f07eaeabd087..a200d758fcc8 100644 --- a/gpdb-doc/dita/ref_guide/config_params/guc-list.xml +++ b/gpdb-doc/dita/ref_guide/config_params/guc-list.xml @@ -4328,7 +4328,8 @@ gp_resource_manager

    Identifies the resource management scheme currently enabled in the Greenplum Database - cluster. The default scheme is to use resource queues.

    + cluster. The default scheme is to use resource queues. For information about Greenplum + Database resource management, see .

    @@ -8001,20 +8002,49 @@ runaway_detector_activation_percent - The runaway_detector_activation_percent server configuration parameter - is enforced only when resource queue-based resource management is active. -

    Sets the percentage of Greenplum Database vmem memory that triggers the termination of - queries. If the percentage of vmem memory that is utilized for a Greenplum Database segment - exceeds the specified value, Greenplum Database terminates queries based on memory usage, - starting with the query consuming the largest amount of memory. Queries are terminated until - the percentage of utilized vmem is below the specified percentage.

    +

    For queries that are managed by resource queues or resource groups, this parameter + determines when Greenplum Database terminates running queries based on the amount of memory + the queries are using. A value of 100 disables the automatic termination of queries based on + the percentage of memory that is utilized.

    +

    Either the resource queue or the resource group management scheme can be active in + Greenplum Database; both schemes cannot be active at the same time. The server configuration + parameter controls which scheme is + active.
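    A quick way to confirm which scheme is active on a running system (a sketch; the returned value is queue or group):

        SHOW gp_resource_manager;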

    +

    When resource queues are enabled - This parameter sets the percent of utilized + Greenplum Database vmem memory that triggers the termination of queries. If the percentage + of vmem memory that is utilized for a Greenplum Database segment exceeds the specified + value, Greenplum Database terminates queries managed by resource queues based on memory + usage, starting with the query consuming the largest amount of memory. Queries are + terminated until the percentage of utilized vmem is below the specified percentage.

    Specify the maximum vmem value for active Greenplum Database segment instances with the - server configuration parameter .

    -

    For example, if vmem memory is set to 10GB, and the value of - runaway_detector_activation_percent is 90 (90%), Greenplum Database - starts terminating queries when the utilized vmem memory exceeds 9 GB.

    -

    A value of 0 disables the automatic termination of queries based on percentage of vmem that - is utilized.

    + server configuration parameter .

    +

    For example, if vmem memory is set to 10GB, and this parameter is 90 (90%), Greenplum + Database starts terminating queries when the utilized vmem memory exceeds 9 GB.

    +

    For information about resource queues, see .
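    As a hedged sketch of how those two values relate on a live system (the numbers shown are examples only):

        SHOW gp_vmem_protect_limit;                -- per-segment vmem limit, in MB (e.g. 10240 for 10GB)
        SHOW runaway_detector_activation_percent;  -- e.g. 90
        -- 90% of 10240MB is roughly 9216MB, so termination begins near 9GB of utilized vmem.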

    +

    When resource groups are enabled - This parameter sets the percent of utilized + resource group global shared memory that triggers the termination of queries that are + managed by resource groups that are configured to use the vmtracker memory + auditor, such as admin_group and default_group. For + information about memory auditors, see .

    +

    Resource groups have a global shared memory pool when the sum of the + MEMORY_LIMIT attribute values configured for all resource groups is less + than 100. For example, if you have 3 resource groups configured with + MEMORY_LIMIT values of 10, 20, and 30, then global shared memory is 40% + = 100% - (10% + 20% + 30%). See .

    +

    If the percentage of utilized global shared memory exceeds the specified value, Greenplum + Database terminates queries based on memory usage, selecting from queries managed by the + resource groups that are configured to use the vmtracker memory auditor. + Greenplum Database starts with the query consuming the largest amount of memory. Queries are + terminated until the percentage of utilized global shared memory is below the specified + percentage.

    +

    For example, if global shared memory is 10GB, and this parameter is 90 (90%), Greenplum + Database starts terminating queries when the utilized global shared memory exceeds 9 GB.

    +

    For information about resource groups, see .
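    To see whether a global shared memory pool exists, the configured limits can be inspected with a query along these lines (a sketch that assumes the gp_toolkit.gp_resgroup_config view; when the memory_limit values sum to less than 100, the remainder is the global shared memory pool this parameter applies to):

        SELECT groupname, memory_limit
        FROM gp_toolkit.gp_resgroup_config
        ORDER BY groupname;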

    diff --git a/gpdb-doc/dita/ref_guide/config_params/guc_category-list.xml b/gpdb-doc/dita/ref_guide/config_params/guc_category-list.xml index cbed481a38e0..9b3096289b64 100644 --- a/gpdb-doc/dita/ref_guide/config_params/guc_category-list.xml +++ b/gpdb-doc/dita/ref_guide/config_params/guc_category-list.xml @@ -1190,6 +1190,10 @@

    memory_spill_ratio

    +

    + runaway_detector_activation_percent +

    statement_mem

    diff --git a/gpdb-doc/dita/ref_guide/sql_commands/CREATE_RESOURCE_GROUP.xml b/gpdb-doc/dita/ref_guide/sql_commands/CREATE_RESOURCE_GROUP.xml index 08cc4068b97f..6f256c80565e 100644 --- a/gpdb-doc/dita/ref_guide/sql_commands/CREATE_RESOURCE_GROUP.xml +++ b/gpdb-doc/dita/ref_guide/sql_commands/CREATE_RESOURCE_GROUP.xml @@ -88,8 +88,17 @@ MEMORY_AUDITOR {vmtracker | cgroup} - The memory auditor for the resource group. Greenplum Database employs virtual memory tracking for role resources and cgroup memory tracking for resources used by external components. The default MEMORY_AUDITOR is vmtracker. When you create a resource group with vmtracker memory auditing, Greenplum Database tracks that resource group's memory internally. - When you create a resource group specifying the cgroup MEMORY_AUDITOR, Greenplum Database defers the accounting of memory used by that resource group to cgroups. CONCURRENCY must be zero (0) for a resource group that you create for external components such as PL/Container. You cannot assign a resource group that you create for external components to a Greenplum Database role. + The memory auditor for the resource group. Greenplum Database employs virtual memory + tracking for role resources and cgroup memory tracking for resources used by external + components. The default MEMORY_AUDITOR is vmtracker. + When you create a resource group with vmtracker memory auditing, + Greenplum Database tracks that resource group's memory internally. + When you create a resource group specifying the cgroup + MEMORY_AUDITOR, Greenplum Database defers the accounting of memory used + by that resource group to cgroups. CONCURRENCY must be zero (0) for a + resource group that you create for external components such as PL/Container. You cannot + assign a resource group that you create for external components to a Greenplum Database + role. From 3053682094f34dae0c80c2a6e229da2317453259 Mon Sep 17 00:00:00 2001 From: Mel Kiyama Date: Tue, 4 Feb 2020 16:32:30 -0800 Subject: [PATCH 002/102] reorganize ddboost replication information. (#9520) * reorganize ddboost replication information. --move replication info. into separate topic. --update toc * docs - updated docs based on review comments. --created sections for gpbackup and gpbackup_manager --added link to example config. files. --- .../managing/backup-ddboost-plugin.xml | 53 +++---------------- .../dita/admin_guide/managing/backup.ditamap | 4 +- .../admin_guide/managing/replication-ddb.xml | 53 +++++++++++++++++++ 3 files changed, 64 insertions(+), 46 deletions(-) create mode 100644 gpdb-doc/dita/admin_guide/managing/replication-ddb.xml diff --git a/gpdb-doc/dita/admin_guide/managing/backup-ddboost-plugin.xml b/gpdb-doc/dita/admin_guide/managing/backup-ddboost-plugin.xml index 5e4ba4b75acb..b1d6201a289f 100644 --- a/gpdb-doc/dita/admin_guide/managing/backup-ddboost-plugin.xml +++ b/gpdb-doc/dita/admin_guide/managing/backup-ddboost-plugin.xml @@ -10,8 +10,8 @@ gprestore utilities to perform faster backups to the Dell EMC Data Domain storage appliance. You can also replicate a backup on a separate, remote Data Domain system for disaster recovery with - gpbackup or gpbackup_manager.

    + gpbackup or gpbackup_manager. + For information about replication, see .

    To use the DD Boost storage plugin application, you first create a configuration file to specify the location of the plugin, the DD Boost login, and the backup location. When you run gpbackup or gprestore, you specify the configuration file @@ -20,25 +20,12 @@

    If you perform a backup operation with the gpbackup option --plugin-config, you must also specify the --plugin-config option when you restore the backup with gprestore.

    -

    With gpbackup, you can replicate a backup set on a separate Data Domain - system for disaster recovery as part of a backup process. You add the backup replication - options to the configuration file. Set the replication option to - on and add the options that the plugin uses to access the remote Data - Domain system that stores the replicated backup.

    -

    With the gpbackup_manager replicate-backup command, you can replicate a - backup set that has been backed up by gpbackup to a Data Domain system. When - you run backup_manager replicate-backup, you specify a DD Boost configuration - file. The configuration file contains the same type of information that is in the - configuration file used to replicate a backup set with gpbackup.

    -

    To restore data from a replicated backup, you can use gprestore with the DD - Boost storage plugin and specify the location of the backup in the DD Boost configuration - file.

    -

    For information about replicating backups, see Notes.

    DD Boost Storage Plugin Configuration File Format

    The configuration file specifies the absolute path to the Greenplum Database DD Boost - storage plugin executable, DD Boost connection credentials, and Data Domain location.

    + storage plugin executable, DD Boost connection credentials, and Data Domain location. The + configuration file is required only on the master host. The DD Boost storage plugin + application must be in the same location on every Greenplum Database host.

    The DD Boost storage plugin configuration file uses the YAML 1.1 document format and implements its own schema for specifying the DD Boost information.

    @@ -131,8 +118,7 @@ is off. This option is ignored when you perform replication with the gpbackup_manager replicate-backup command. For information - about replication, see Notes. + about replication,see . replication-streams @@ -198,8 +184,8 @@
    -
    - Example +
    + Examples

    This is an example DD Boost storage plugin configuration file that is used in the next gpbackup example command. The name of the file is ddboost-test-config.yaml.

    @@ -249,35 +235,12 @@ options: license. Open source Greenplum Database cannot use the DD Boost software, but can back up to a Dell EMC Data Domain system mounted as an NFS share on the Greenplum master and segment hosts.

    -

    The DD Boost storage plugin application must be in the same location on every Greenplum - Database host. The configuration file is required only on the master host.

    When you perform a backup with the DD Boost storage plugin, the plugin stores the backup files in this location in the Data Domain storage unit.

    <directory>/backups/<datestamp>/<timestamp>

    Where <directory> is the location you specified in the DD Boost configuration file, and <datestamp> and <timestamp> are the backup date and time stamps.

    -

    You can replicate a backup created with gpbackup from the Data Domain - system where the backup is stored to a remote Data Domain system with - gpbackup or gpbackup_manager. Both methods require a DD - Boost configuration file that includes options that specify Data Domain system locations and - DD Boost configuration. There are some differences in how some of the options are - handled:

      -
    • When using gpbackup, the replication option must be - set to on. The replication-streams option is ignored, - the default value is used.
    • -
    • When using the gpbackup_manager replicate-backup command, the - replication option is ignored. The command always attempts to - replicate a backup.
    • -

    -

    When replicating a backup, the Data Domain system where the backup is stored must have - access to the remote Data Domain system where the replicated backup is stored.

    -

    When replicating a backup, the DD Boost storage plugin replicates the backup set on the - remote Data Domain system with DD Boost managed file replication.

    -

    Performing a backup operation with replication increases the time required to perform a - backup. The backup set is copied to the local Data Domain system, and then replicated on the - remote Data Domain system using DD Boost managed file replication. The backup operation - completes after the backup set is replicated on the remote system.

    diff --git a/gpdb-doc/dita/admin_guide/managing/backup.ditamap b/gpdb-doc/dita/admin_guide/managing/backup.ditamap index 7abea14df2e5..407fb36bdacb 100644 --- a/gpdb-doc/dita/admin_guide/managing/backup.ditamap +++ b/gpdb-doc/dita/admin_guide/managing/backup.ditamap @@ -21,7 +21,9 @@ - + + + diff --git a/gpdb-doc/dita/admin_guide/managing/replication-ddb.xml b/gpdb-doc/dita/admin_guide/managing/replication-ddb.xml new file mode 100644 index 000000000000..317c6662d320 --- /dev/null +++ b/gpdb-doc/dita/admin_guide/managing/replication-ddb.xml @@ -0,0 +1,53 @@ + + + + + Replicating Backups + +

    You can use gpbackup or gpbackup_manager + with the DD Boost storage plugin to replicate a backup from one Data Domain system to a + second, remote, Data Domain system for disaster recovery. You can replicate a backup as part + of the backup process, or replicate an existing backup set as a separate operation. Both + methods require a DD Boost configuration file that includes options that specify Data Domain system + locations and DD Boost configuration. The DD Boost storage plugin replicates the backup set + on the remote Data Domain system with DD Boost managed file replication.

    +

    When replicating a backup, the Data Domain system where the backup is stored must have + access to the remote Data Domain system where the replicated backup is stored.

    +

    To restore data from a replicated backup, use gprestore with the DD Boost + storage plugin and specify the location of the backup in the DD + Boost configuration file.

    +

    For example configuration files, see in .

    +
    Replicate a Backup as Part of the Backup Process

    Use the + gpbackup utility to replicate a backup set as part of the backup + process.

    To enable replication during a backup, add the backup replication options + to the configuration file. Set the configuration file replication option + to on and add the options that the plugin uses to access the remote Data + Domain system that stores the replicated backup.

    When using + gpbackup, the replication option must be set to + on.

    The configuration file replication-streams + option is ignored; the default value is used.

    Performing a backup operation with + replication increases the time required to perform a backup. The backup set is copied to the + local Data Domain system, and then replicated on the remote Data Domain system using DD + Boost managed file replication. The backup operation completes after the backup set is + replicated on the remote system.
    +
    + Replicate an Existing Backup +

    Use the gpbackup_manager replicate-backup command to replicate an + existing backup set that is on a Data Domain system and was created by + gpbackup.

    +

    When you run gpbackup_manager replicate-backup, specify a DD Boost + configuration file that contains the same type of information that is in the configuration + file used to replicate a backup set with gpbackup.

    +

    When using the gpbackup_manager replicate-backup command, the + configuration file replication option is ignored. The command always + attempts to replicate a backup.

    +
    + +
    +
    From dd64a0c7fd90f9f2a52c7abe27436ef857dbe1cf Mon Sep 17 00:00:00 2001 From: Shreedhar Hardikar Date: Tue, 14 Jan 2020 10:54:47 -0600 Subject: [PATCH 003/102] Support MCV based cardinality estimation for all text related types This commit enables the MCVs for text related types such as varchar, name etc to be passed to ORCA so that it can estimate the cardinalities for columns containing text related types. Prior to this commit, ORCA would estimate the cardinality to be 40% of the tuples which would cause mis-estimation for certain queries. Co-authored-by: Abhijit Subramanya Co-authored-by: Shreedhar Hardikar --- src/backend/gpopt/gpdbwrappers.cpp | 40 +++++++++++++++++++ .../translate/CTranslatorRelcacheToDXL.cpp | 2 + .../translate/CTranslatorScalarToDXL.cpp | 19 ++++++--- .../gpopt/translate/CTranslatorUtils.cpp | 2 + src/include/gpopt/gpdbwrappers.h | 6 +++ .../gpopt/translate/CTranslatorScalarToDXL.h | 2 +- src/include/utils/builtins.h | 2 + 7 files changed, 67 insertions(+), 6 deletions(-) diff --git a/src/backend/gpopt/gpdbwrappers.cpp b/src/backend/gpopt/gpdbwrappers.cpp index 56a1c1b5bb45..bee161f468a5 100644 --- a/src/backend/gpopt/gpdbwrappers.cpp +++ b/src/backend/gpopt/gpdbwrappers.cpp @@ -2643,6 +2643,26 @@ gpdb::IsCompositeType return false; } +bool +gpdb::IsTextRelatedType + ( + Oid typid + ) +{ + GP_WRAP_START; + { + /* catalog tables: pg_type */ + char typcategory; + bool typispreferred; + get_type_category_preferred(typid, &typcategory, &typispreferred); + + return typcategory == TYPCATEGORY_STRING; + } + GP_WRAP_END; + return false; +} + + int gpdb::GetIntFromValue ( @@ -3218,6 +3238,16 @@ gpdb::MakeGpPolicy GP_WRAP_END; } +uint32 +gpdb::HashChar(Datum d) +{ + GP_WRAP_START; + { + return DatumGetUInt32(DirectFunctionCall1(hashchar, d)); + } + GP_WRAP_END; +} + uint32 gpdb::HashBpChar(Datum d) { @@ -3238,6 +3268,16 @@ gpdb::HashText(Datum d) GP_WRAP_END; } +uint32 +gpdb::HashName(Datum d) +{ + GP_WRAP_START; + { + return DatumGetUInt32(DirectFunctionCall1(hashname, d)); + } + GP_WRAP_END; +} + uint32 gpdb::UUIDHash(Datum d) { diff --git a/src/backend/gpopt/translate/CTranslatorRelcacheToDXL.cpp b/src/backend/gpopt/translate/CTranslatorRelcacheToDXL.cpp index 3667395c1010..dea900a8c7fa 100644 --- a/src/backend/gpopt/translate/CTranslatorRelcacheToDXL.cpp +++ b/src/backend/gpopt/translate/CTranslatorRelcacheToDXL.cpp @@ -1618,6 +1618,7 @@ CTranslatorRelcacheToDXL::RetrieveType BOOL is_hashable = gpdb::IsOpHashJoinable(ptce->eq_opr, oid_type); BOOL is_merge_joinable = gpdb::IsOpMergeJoinable(ptce->eq_opr, oid_type); BOOL is_composite_type = gpdb::IsCompositeType(oid_type); + BOOL is_text_related_type = gpdb::IsTextRelatedType(oid_type); // get standard aggregates CMDIdGPDB *mdid_min = GPOS_NEW(mp) CMDIdGPDB(gpdb::GetAggregate("min", oid_type)); @@ -1666,6 +1667,7 @@ CTranslatorRelcacheToDXL::RetrieveType is_hashable, is_merge_joinable, is_composite_type, + is_text_related_type, mdid_type_relid, mdid_type_array, ptce->typlen diff --git a/src/backend/gpopt/translate/CTranslatorScalarToDXL.cpp b/src/backend/gpopt/translate/CTranslatorScalarToDXL.cpp index 844f63128adb..21da2abfd7f2 100644 --- a/src/backend/gpopt/translate/CTranslatorScalarToDXL.cpp +++ b/src/backend/gpopt/translate/CTranslatorScalarToDXL.cpp @@ -2175,12 +2175,12 @@ CTranslatorScalarToDXL::TranslateGenericDatumToDXL } LINT lint_value = 0; - if (CMDTypeGenericGPDB::HasByte2IntMapping(mdid)) + if (CMDTypeGenericGPDB::HasByte2IntMapping(md_type)) { - lint_value = ExtractLintValueFromDatum(mdid, is_null, bytes, 
length); + lint_value = ExtractLintValueFromDatum(md_type, is_null, bytes, length); } - return CMDTypeGenericGPDB::CreateDXLDatumVal(mp, mdid, type_modifier, is_null, bytes, length, lint_value, double_value); + return CMDTypeGenericGPDB::CreateDXLDatumVal(mp, mdid, md_type, type_modifier, is_null, bytes, length, lint_value, double_value); } @@ -2435,13 +2435,14 @@ CTranslatorScalarToDXL::ExtractByteArrayFromDatum LINT CTranslatorScalarToDXL::ExtractLintValueFromDatum ( - IMDId *mdid, + const IMDType *md_type, BOOL is_null, BYTE *bytes, ULONG length ) { - GPOS_ASSERT(CMDTypeGenericGPDB::HasByte2IntMapping(mdid)); + IMDId *mdid = md_type->MDId(); + GPOS_ASSERT(CMDTypeGenericGPDB::HasByte2IntMapping(md_type)); LINT lint_value = 0; if (is_null) @@ -2476,6 +2477,14 @@ CTranslatorScalarToDXL::ExtractLintValueFromDatum { hash = gpdb::HashBpChar((Datum) bytes); } + else if (mdid->Equals(&CMDIdGPDB::m_mdid_char)) + { + hash = gpdb::HashChar((Datum) bytes); + } + else if (mdid->Equals(&CMDIdGPDB::m_mdid_name)) + { + hash = gpdb::HashName((Datum) bytes); + } else { hash = gpdb::HashText((Datum) bytes); diff --git a/src/backend/gpopt/translate/CTranslatorUtils.cpp b/src/backend/gpopt/translate/CTranslatorUtils.cpp index 35f6e5c12311..574efe775e14 100644 --- a/src/backend/gpopt/translate/CTranslatorUtils.cpp +++ b/src/backend/gpopt/translate/CTranslatorUtils.cpp @@ -2379,10 +2379,12 @@ CTranslatorUtils::CreateDXLProjElemConstNULL } else { + const IMDType *md_type = md_accessor->RetrieveType(mdid); datum_dxl = CMDTypeGenericGPDB::CreateDXLDatumVal ( mp, mdid, + md_type, default_type_modifier, true /*fConstNull*/, NULL, /*pba */ diff --git a/src/include/gpopt/gpdbwrappers.h b/src/include/gpopt/gpdbwrappers.h index 1b7371218706..eb6a64889bc7 100644 --- a/src/include/gpopt/gpdbwrappers.h +++ b/src/include/gpopt/gpdbwrappers.h @@ -566,6 +566,8 @@ namespace gpdb { // check whether a type is composite bool IsCompositeType(Oid typid); + bool IsTextRelatedType(Oid typid); + // get integer value from an Integer value node int GetIntFromValue(Node *node); @@ -657,10 +659,14 @@ namespace gpdb { int numsegments); + uint32 HashChar(Datum d); + uint32 HashBpChar(Datum d); uint32 HashText(Datum d); + uint32 HashName(Datum d); + uint32 UUIDHash(Datum d); void * GPDBMemoryContextAlloc(MemoryContext context, Size size); diff --git a/src/include/gpopt/translate/CTranslatorScalarToDXL.h b/src/include/gpopt/translate/CTranslatorScalarToDXL.h index b63785e1618c..d389b89fb052 100644 --- a/src/include/gpopt/translate/CTranslatorScalarToDXL.h +++ b/src/include/gpopt/translate/CTranslatorScalarToDXL.h @@ -477,7 +477,7 @@ namespace gpdxl static LINT ExtractLintValueFromDatum ( - IMDId *mdid, + const IMDType *md_type, BOOL is_null, BYTE *bytes, ULONG len diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h index 52f3c9ca712c..640527cec3e0 100644 --- a/src/include/utils/builtins.h +++ b/src/include/utils/builtins.h @@ -795,12 +795,14 @@ extern Datum bpchar_larger(PG_FUNCTION_ARGS); extern Datum bpchar_smaller(PG_FUNCTION_ARGS); extern Datum bpcharlen(PG_FUNCTION_ARGS); extern Datum bpcharoctetlen(PG_FUNCTION_ARGS); +extern Datum hashchar(PG_FUNCTION_ARGS); extern Datum hashbpchar(PG_FUNCTION_ARGS); extern Datum bpchar_pattern_lt(PG_FUNCTION_ARGS); extern Datum bpchar_pattern_le(PG_FUNCTION_ARGS); extern Datum bpchar_pattern_gt(PG_FUNCTION_ARGS); extern Datum bpchar_pattern_ge(PG_FUNCTION_ARGS); extern Datum btbpchar_pattern_cmp(PG_FUNCTION_ARGS); +extern Datum hashname(PG_FUNCTION_ARGS); extern Datum 
hashtext(PG_FUNCTION_ARGS); extern Datum hashvarlena(PG_FUNCTION_ARGS); From 7c9bfc01156044307becc296917cc3c3856bc96e Mon Sep 17 00:00:00 2001 From: Abhijit Subramanya Date: Mon, 3 Feb 2020 16:38:10 -0800 Subject: [PATCH 004/102] Bump ORCA version to v3.90.0 --- concourse/tasks/compile_gpdb.yml | 2 +- config/orca.m4 | 4 ++-- configure | 4 ++-- depends/conanfile_orca.txt | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/concourse/tasks/compile_gpdb.yml b/concourse/tasks/compile_gpdb.yml index daf62314bac4..686178ddff10 100644 --- a/concourse/tasks/compile_gpdb.yml +++ b/concourse/tasks/compile_gpdb.yml @@ -19,5 +19,5 @@ params: BLD_TARGETS: OUTPUT_ARTIFACT_DIR: gpdb_artifacts CONFIGURE_FLAGS: - ORCA_TAG: v3.88.0 + ORCA_TAG: v3.90.0 RC_BUILD_TYPE_GCS: diff --git a/config/orca.m4 b/config/orca.m4 index 2f2f46f60404..2eb2195e674d 100644 --- a/config/orca.m4 +++ b/config/orca.m4 @@ -40,10 +40,10 @@ AC_RUN_IFELSE([AC_LANG_PROGRAM([[ #include ]], [ -return strncmp("3.88.", GPORCA_VERSION_STRING, 5); +return strncmp("3.90.", GPORCA_VERSION_STRING, 5); ])], [AC_MSG_RESULT([[ok]])], -[AC_MSG_ERROR([Your ORCA version is expected to be 3.88.XXX])] +[AC_MSG_ERROR([Your ORCA version is expected to be 3.90.XXX])] ) AC_LANG_POP([C++]) ])# PGAC_CHECK_ORCA_VERSION diff --git a/configure b/configure index fff493ff3f87..0324cee4fea6 100755 --- a/configure +++ b/configure @@ -14948,7 +14948,7 @@ int main () { -return strncmp("3.88.", GPORCA_VERSION_STRING, 5); +return strncmp("3.90.", GPORCA_VERSION_STRING, 5); ; return 0; @@ -14958,7 +14958,7 @@ if ac_fn_cxx_try_run "$LINENO"; then : { $as_echo "$as_me:${as_lineno-$LINENO}: result: ok" >&5 $as_echo "ok" >&6; } else - as_fn_error $? "Your ORCA version is expected to be 3.88.XXX" "$LINENO" 5 + as_fn_error $? "Your ORCA version is expected to be 3.90.XXX" "$LINENO" 5 fi rm -f core *.core core.conftest.* gmon.out bb.out conftest$ac_exeext \ diff --git a/depends/conanfile_orca.txt b/depends/conanfile_orca.txt index 99a1cb07b3c4..46371b2a1212 100644 --- a/depends/conanfile_orca.txt +++ b/depends/conanfile_orca.txt @@ -1,5 +1,5 @@ [requires] -orca/v3.88.0@gpdb/stable +orca/v3.90.0@gpdb/stable [imports] include, * -> build/include From f98f833198f047728bc7b79372f824d281795473 Mon Sep 17 00:00:00 2001 From: Alexandra Wang Date: Mon, 3 Feb 2020 16:02:01 -0800 Subject: [PATCH 005/102] Avoid possibility of out-of-bound write for neededColumnContextWalker neededColumnContextWalker() is called to scan through VARs for targetlist, quals, etc.. It should only look at VARS for the table being scanned and avoid all other VARS. Currently, we are not aware of any plans which can produce situation where neededColumnContextWalker() will encounter some other VARs. But for GPDB5, we get OUTER vars here if Index scan is right tree for NestedLoop join. Hence, seems better to have the protective code to not write out-of-bound. Adds test to cover the scenario as well which is missing currently. 
Co-authored-by: Ashwin Agrawal Reviewed-by: Richard Guo Reviewed-by: Asim R P Co-authored-by: Alexandra Wang --- src/backend/executor/execQual.c | 9 ++-- src/test/regress/expected/indexjoin.out | 40 +++++++++++++++++- .../regress/expected/indexjoin_optimizer.out | 41 ++++++++++++++++++- src/test/regress/sql/indexjoin.sql | 23 ++++++++++- 4 files changed, 106 insertions(+), 7 deletions(-) diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c index 004998c41f8c..94a4db9b1c37 100644 --- a/src/backend/executor/execQual.c +++ b/src/backend/executor/execQual.c @@ -6676,11 +6676,12 @@ neededColumnContextWalker(Node *node, neededColumnContext *c) { Var *var = (Var *)node; - if (var->varattno > 0) - { - Assert(var->varattno <= c->n); + if (IS_SPECIAL_VARNO(var->varno)) + return false; + + if (var->varattno > 0 && var->varattno <= c->n) c->mask[var->varattno - 1] = true; - } + /* * If all attributes are included, * set all entries in mask to true. diff --git a/src/test/regress/expected/indexjoin.out b/src/test/regress/expected/indexjoin.out index 38ae08bd0e3b..b3fb1fa74c8f 100644 --- a/src/test/regress/expected/indexjoin.out +++ b/src/test/regress/expected/indexjoin.out @@ -132,4 +132,42 @@ ORDER BY 1 asc ; 201011261320 | 3 (8 rows) -set optimizer_enable_hashjoin = on; +-- Test Index Scan on CO table as the right tree of a NestLoop join. +create table no_index_table(fake_col1 int, fake_col2 int, fake_col3 int, a int, b int) distributed by (a, b); +insert into no_index_table values (1,1,1,1,1); +create table with_index_table(x int, y int) with (appendonly=true, orientation=column); +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'x' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +create index with_index_table_index on with_index_table (x); +insert into with_index_table select i, 1 from generate_series(1, 20)i; +set enable_material to off; +set enable_seqscan to off; +set enable_mergejoin to off; +set enable_hashjoin to off; +set enable_nestloop to on; +set optimizer_enable_materialize to off; +set optimizer_enable_hashjoin to off; +explain (costs off) +SELECT * from with_index_table td JOIN no_index_table ro ON td.y = ro.a AND td.x = ro.b; + QUERY PLAN +--------------------------------------------------------------- + Gather Motion 3:1 (slice2; segments: 3) + -> Nested Loop + -> Redistribute Motion 3:3 (slice1; segments: 3) + Hash Key: ro.b + -> Seq Scan on no_index_table ro + -> Bitmap Heap Scan on with_index_table td + Recheck Cond: (x = ro.b) + Filter: (ro.a = y) + -> Bitmap Index Scan on with_index_table_index + Index Cond: (x = ro.b) + Optimizer: Postgres query optimizer +(11 rows) + +SELECT * from with_index_table td JOIN no_index_table ro ON td.y = ro.a AND td.x = ro.b; + x | y | fake_col1 | fake_col2 | fake_col3 | a | b +---+---+-----------+-----------+-----------+---+--- + 1 | 1 | 1 | 1 | 1 | 1 | 1 +(1 row) + +reset all; diff --git a/src/test/regress/expected/indexjoin_optimizer.out b/src/test/regress/expected/indexjoin_optimizer.out index ec9fbdece482..fccda453e3c6 100644 --- a/src/test/regress/expected/indexjoin_optimizer.out +++ b/src/test/regress/expected/indexjoin_optimizer.out @@ -136,4 +136,43 @@ ORDER BY 1 asc ; 201011261320 | 3 (8 rows) -set optimizer_enable_hashjoin = on; +-- Test Index Scan on CO table as the right tree of a NestLoop join. 
+create table no_index_table(fake_col1 int, fake_col2 int, fake_col3 int, a int, b int) distributed by (a, b); +insert into no_index_table values (1,1,1,1,1); +create table with_index_table(x int, y int) with (appendonly=true, orientation=column); +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'x' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +create index with_index_table_index on with_index_table (x); +insert into with_index_table select i, 1 from generate_series(1, 20)i; +set enable_material to off; +set enable_seqscan to off; +set enable_mergejoin to off; +set enable_hashjoin to off; +set enable_nestloop to on; +set optimizer_enable_materialize to off; +set optimizer_enable_hashjoin to off; +explain (costs off) +SELECT * from with_index_table td JOIN no_index_table ro ON td.y = ro.a AND td.x = ro.b; + QUERY PLAN +--------------------------------------------------------------- + Gather Motion 3:1 (slice2; segments: 3) + -> Nested Loop + Join Filter: true + -> Redistribute Motion 3:3 (slice1; segments: 3) + Hash Key: no_index_table.b + -> Seq Scan on no_index_table + -> Bitmap Heap Scan on with_index_table + Recheck Cond: (x = no_index_table.b) + Filter: (y = no_index_table.a) + -> Bitmap Index Scan on with_index_table_index + Index Cond: (x = no_index_table.b) + Optimizer: Pivotal Optimizer (GPORCA) version 3.90.0 +(12 rows) + +SELECT * from with_index_table td JOIN no_index_table ro ON td.y = ro.a AND td.x = ro.b; + x | y | fake_col1 | fake_col2 | fake_col3 | a | b +---+---+-----------+-----------+-----------+---+--- + 1 | 1 | 1 | 1 | 1 | 1 | 1 +(1 row) + +reset all; diff --git a/src/test/regress/sql/indexjoin.sql b/src/test/regress/sql/indexjoin.sql index a99d9944c32c..ef612f3901bd 100644 --- a/src/test/regress/sql/indexjoin.sql +++ b/src/test/regress/sql/indexjoin.sql @@ -2528,4 +2528,25 @@ WHERE tq.sym = tt.symbol AND GROUP BY 1 ORDER BY 1 asc ; -set optimizer_enable_hashjoin = on; +-- Test Index Scan on CO table as the right tree of a NestLoop join. 
+create table no_index_table(fake_col1 int, fake_col2 int, fake_col3 int, a int, b int) distributed by (a, b); +insert into no_index_table values (1,1,1,1,1); + +create table with_index_table(x int, y int) with (appendonly=true, orientation=column); +create index with_index_table_index on with_index_table (x); +insert into with_index_table select i, 1 from generate_series(1, 20)i; + +set enable_material to off; +set enable_seqscan to off; +set enable_mergejoin to off; +set enable_hashjoin to off; +set enable_nestloop to on; + +set optimizer_enable_materialize to off; +set optimizer_enable_hashjoin to off; + +explain (costs off) +SELECT * from with_index_table td JOIN no_index_table ro ON td.y = ro.a AND td.x = ro.b; +SELECT * from with_index_table td JOIN no_index_table ro ON td.y = ro.a AND td.x = ro.b; + +reset all; From ae9dd076f900c857f27d7388c887ed673671c057 Mon Sep 17 00:00:00 2001 From: David Yozie Date: Wed, 5 Feb 2020 11:27:24 -0800 Subject: [PATCH 006/102] Docs - change versioning form 6.3 -> 6.4 --- gpdb-doc/book/config.yml | 20 ++-- .../source/subnavs/cloud-subnav.erb | 4 +- .../source/subnavs/gpdb-landing-subnav.erb | 18 +-- .../source/subnavs/pxf-subnav.erb | 108 +++++++++--------- gpdb-doc/book/redirects.rb | 8 +- gpdb-doc/dita/admin_guide/admin_guide.ditamap | 8 +- .../best_practices/best-practices.ditamap | 8 +- .../dita/install_guide/install_guide.ditamap | 4 +- gpdb-doc/dita/ref_guide/ref_guide.ditamap | 8 +- .../security-guide/security-guide.ditamap | 4 +- .../dita/utility_guide/utility_guide.ditamap | 8 +- 11 files changed, 99 insertions(+), 99 deletions(-) diff --git a/gpdb-doc/book/config.yml b/gpdb-doc/book/config.yml index ae45be65d3bd..66637d999909 100644 --- a/gpdb-doc/book/config.yml +++ b/gpdb-doc/book/config.yml @@ -6,19 +6,19 @@ sections: - repository: name: markdown at_path: common - directory: 6-3/common + directory: 6-4/common subnav_template: gpdb-landing-subnav - repository: name: markdown at_path: pxf - directory: 6-3/pxf + directory: 6-4/pxf subnav_template: pxf-subnav - repository: name: markdown at_path: cloud - directory: 6-3/cloud + directory: 6-4/cloud subnav_template: cloud-subnav @@ -26,7 +26,7 @@ dita_sections: - repository: name: dita at_path: install_guide - directory: 6-3/install_guide + directory: 6-4/install_guide ditamap_location: install_guide.ditamap ditaval_location: ../gpdb-oss-webhelp.ditaval @@ -34,7 +34,7 @@ dita_sections: - repository: name: dita at_path: analytics - directory: 6-3/analytics + directory: 6-4/analytics ditamap_location: analytics.ditamap ditaval_location: ../gpdb-oss-webhelp.ditaval @@ -42,35 +42,35 @@ dita_sections: - repository: name: dita at_path: admin_guide - directory: 6-3/admin_guide + directory: 6-4/admin_guide ditamap_location: admin_guide.ditamap ditaval_location: ../gpdb-oss-webhelp.ditaval - repository: name: dita at_path: security-guide - directory: 6-3/security-guide + directory: 6-4/security-guide ditamap_location: security-guide.ditamap ditaval_location: ../gpdb-oss-webhelp.ditaval - repository: name: dita at_path: best_practices - directory: 6-3/best_practices + directory: 6-4/best_practices ditamap_location: best-practices.ditamap ditaval_location: ../gpdb-oss-webhelp.ditaval - repository: name: dita at_path: utility_guide - directory: 6-3/utility_guide + directory: 6-4/utility_guide ditamap_location: utility_guide.ditamap ditaval_location: ../gpdb-oss-webhelp.ditaval - repository: name: dita at_path: ref_guide - directory: 6-3/ref_guide + directory: 6-4/ref_guide ditamap_location: 
ref_guide.ditamap ditaval_location: ../gpdb-oss-webhelp.ditaval diff --git a/gpdb-doc/book/master_middleman/source/subnavs/cloud-subnav.erb b/gpdb-doc/book/master_middleman/source/subnavs/cloud-subnav.erb index fda26fd23d45..972d56ddab12 100644 --- a/gpdb-doc/book/master_middleman/source/subnavs/cloud-subnav.erb +++ b/gpdb-doc/book/master_middleman/source/subnavs/cloud-subnav.erb @@ -3,10 +3,10 @@ diff --git a/gpdb-doc/book/master_middleman/source/subnavs/gpdb-landing-subnav.erb b/gpdb-doc/book/master_middleman/source/subnavs/gpdb-landing-subnav.erb index a3195108f547..f00f749ed9e3 100644 --- a/gpdb-doc/book/master_middleman/source/subnavs/gpdb-landing-subnav.erb +++ b/gpdb-doc/book/master_middleman/source/subnavs/gpdb-landing-subnav.erb @@ -3,32 +3,32 @@ diff --git a/gpdb-doc/book/master_middleman/source/subnavs/pxf-subnav.erb b/gpdb-doc/book/master_middleman/source/subnavs/pxf-subnav.erb index 2619d44c6d4f..85dee6aa8b7b 100644 --- a/gpdb-doc/book/master_middleman/source/subnavs/pxf-subnav.erb +++ b/gpdb-doc/book/master_middleman/source/subnavs/pxf-subnav.erb @@ -3,114 +3,114 @@ diff --git a/gpdb-doc/book/redirects.rb b/gpdb-doc/book/redirects.rb index 290ec9496fbb..a88f8df2f326 100644 --- a/gpdb-doc/book/redirects.rb +++ b/gpdb-doc/book/redirects.rb @@ -1,4 +1,4 @@ -r301 '/', '/6-3/common/gpdb-features.html' -r301 '/index.html', '/6-3/common/gpdb-features.html' -r301 '/6-3/index.html', '/6-3/common/gpdb-features.html' -r301 %r{(.*)/homenav.html}, '/6-3/common/gpdb-features.html' \ No newline at end of file +r301 '/', '/6-4/common/gpdb-features.html' +r301 '/index.html', '/6-4/common/gpdb-features.html' +r301 '/6-4/index.html', '/6-4/common/gpdb-features.html' +r301 %r{(.*)/homenav.html}, '/6-4/common/gpdb-features.html' \ No newline at end of file diff --git a/gpdb-doc/dita/admin_guide/admin_guide.ditamap b/gpdb-doc/dita/admin_guide/admin_guide.ditamap index 56ec19654600..5529e9793dfa 100644 --- a/gpdb-doc/dita/admin_guide/admin_guide.ditamap +++ b/gpdb-doc/dita/admin_guide/admin_guide.ditamap @@ -1,10 +1,10 @@ - - + + diff --git a/gpdb-doc/dita/best_practices/best-practices.ditamap b/gpdb-doc/dita/best_practices/best-practices.ditamap index 8225fd72f362..20b8a607b51e 100644 --- a/gpdb-doc/dita/best_practices/best-practices.ditamap +++ b/gpdb-doc/dita/best_practices/best-practices.ditamap @@ -2,10 +2,10 @@ Greenplum Database Best Practices - - + + diff --git a/gpdb-doc/dita/install_guide/install_guide.ditamap b/gpdb-doc/dita/install_guide/install_guide.ditamap index 8b07aacf3e9d..bbd46c7aa36e 100644 --- a/gpdb-doc/dita/install_guide/install_guide.ditamap +++ b/gpdb-doc/dita/install_guide/install_guide.ditamap @@ -1,8 +1,8 @@ - + diff --git a/gpdb-doc/dita/ref_guide/ref_guide.ditamap b/gpdb-doc/dita/ref_guide/ref_guide.ditamap index e8f55ba895e3..6a8278a14ca8 100644 --- a/gpdb-doc/dita/ref_guide/ref_guide.ditamap +++ b/gpdb-doc/dita/ref_guide/ref_guide.ditamap @@ -1,10 +1,10 @@ - - + + diff --git a/gpdb-doc/dita/security-guide/security-guide.ditamap b/gpdb-doc/dita/security-guide/security-guide.ditamap index 50ba3a667b85..03046c731792 100644 --- a/gpdb-doc/dita/security-guide/security-guide.ditamap +++ b/gpdb-doc/dita/security-guide/security-guide.ditamap @@ -1,9 +1,9 @@ Security Configuration Guide - - diff --git a/gpdb-doc/dita/utility_guide/utility_guide.ditamap b/gpdb-doc/dita/utility_guide/utility_guide.ditamap index 70f973291e70..16eaab4bda9e 100644 --- a/gpdb-doc/dita/utility_guide/utility_guide.ditamap +++ b/gpdb-doc/dita/utility_guide/utility_guide.ditamap @@ -1,10 +1,10 @@ - - + 
+ From e459e34e4529a0ea0a538e77c78948e36277ff5f Mon Sep 17 00:00:00 2001 From: Hubert Zhang Date: Thu, 6 Feb 2020 14:22:49 +0800 Subject: [PATCH 007/102] Skip column acl check in gp_acquire_sample_rows Using 'select pg_catalog.gp_acquire_sample_rows(...)' instead of 'select * from pg_catalog.gp_acquire_sample_rows(...) as (...)' to avoid specify columns in function return value explicitly. The old style requires USAGE privilege on each columns which is not consistent with GPDB 5X. The following SQL failed to pass acl check in master now: revoke all on schema public from public; create role gmuser1; grant create on schema public to gmuser1; create extension citext; create table testid (id int , test citext); alter table testid owner to gmuser1; analyze testid; Idea from Ashwin Agrawal Idea from Taylor Vesely Reviewed-by: Zhenghua Lyu (cherry picked from commit e33727360d529c517e829d98ef46912486911c77) --- src/backend/commands/analyze.c | 259 +++++++++++++++--- src/test/regress/expected/analyze.out | 18 ++ .../regress/expected/incremental_analyze.out | 2 +- src/test/regress/sql/analyze.sql | 19 ++ 4 files changed, 259 insertions(+), 39 deletions(-) diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c index 18775aed935a..0eae28d74184 100644 --- a/src/backend/commands/analyze.c +++ b/src/backend/commands/analyze.c @@ -115,6 +115,7 @@ #include "utils/syscache.h" #include "utils/timestamp.h" #include "utils/tqual.h" +#include "utils/typcache.h" #include "catalog/heap.h" #include "cdb/cdbappendonlyam.h" @@ -152,6 +153,9 @@ typedef struct typedef BlockSamplerData *BlockSampler; +/* Fix attr number of return record of function gp_acquire_sample_rows */ +#define FIX_ATTR_NUM 3 + /* Per-index data for ANALYZE */ typedef struct AnlIndexData { @@ -2341,6 +2345,154 @@ acquire_index_number_of_blocks(Relation indexrel, Relation tablerel) } } +/* + * parse_record_to_string + * + * CDB: a copy of record_in, but only parse the record string + * into separate strs for each column. + */ +static void +parse_record_to_string(char *string, TupleDesc tupdesc, char** values, bool *nulls) +{ + char *ptr; + int ncolumns; + int i; + bool needComma; + StringInfoData buf; + + Assert(string != NULL); + Assert(values != NULL); + Assert(nulls != NULL); + + ncolumns = tupdesc->natts; + needComma = false; + + /* + * Scan the string. We use "buf" to accumulate the de-quoted data for + * each column, which is then fed to the appropriate input converter. 
+ */ + ptr = string; + + /* Allow leading whitespace */ + while (*ptr && isspace((unsigned char) *ptr)) + ptr++; + if (*ptr++ != '(') + { + ereport(ERROR, + (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), + errmsg("malformed record literal: \"%s\"", string), + errdetail("Missing left parenthesis."))); + } + + initStringInfo(&buf); + + for (i = 0; i < ncolumns; i++) + { + /* Ignore dropped columns in datatype, but fill with nulls */ + if (tupdesc->attrs[i]->attisdropped) + { + values[i] = NULL; + nulls[i] = true; + continue; + } + + if (needComma) + { + /* Skip comma that separates prior field from this one */ + if (*ptr == ',') + ptr++; + else + { + /* *ptr must be ')' */ + ereport(ERROR, + (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), + errmsg("malformed record literal: \"%s\"", string), + errdetail("Too few columns."))); + } + } + + /* Check for null: completely empty input means null */ + if (*ptr == ',' || *ptr == ')') + { + values[i] = NULL; + nulls[i] = true; + } + else + { + /* Extract string for this column */ + bool inquote = false; + + resetStringInfo(&buf); + while (inquote || !(*ptr == ',' || *ptr == ')')) + { + char ch = *ptr++; + + if (ch == '\0') + { + ereport(ERROR, + (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), + errmsg("malformed record literal: \"%s\"", + string), + errdetail("Unexpected end of input."))); + } + if (ch == '\\') + { + if (*ptr == '\0') + { + ereport(ERROR, + (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), + errmsg("malformed record literal: \"%s\"", + string), + errdetail("Unexpected end of input."))); + } + appendStringInfoChar(&buf, *ptr++); + } + else if (ch == '"') + { + if (!inquote) + inquote = true; + else if (*ptr == '"') + { + /* doubled quote within quote sequence */ + appendStringInfoChar(&buf, *ptr++); + } + else + inquote = false; + } + else + appendStringInfoChar(&buf, ch); + } + + values[i] = palloc(strlen(buf.data) + 1); + memcpy(values[i], buf.data, strlen(buf.data) + 1); + nulls[i] = false; + } + + /* + * Prep for next column + */ + needComma = true; + } + + if (*ptr++ != ')') + { + ereport(ERROR, + (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), + errmsg("malformed record literal: \"%s\"", string), + errdetail("Too many columns."))); + } + /* Allow trailing whitespace */ + while (*ptr && isspace((unsigned char) *ptr)) + ptr++; + if (*ptr) + { + ereport(ERROR, + (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), + errmsg("malformed record literal: \"%s\"", string), + errdetail("Junk after right parenthesis."))); + } +} + /* * Collect a sample from segments. * @@ -2358,15 +2510,20 @@ acquire_sample_rows_dispatcher(Relation onerel, bool inh, int elevel, */ Bitmapset **colLargeRowIndexes = acquire_func_colLargeRowIndexes; TupleDesc relDesc = RelationGetDescr(onerel); - TupleDesc newDesc; + TupleDesc funcTupleDesc; + TupleDesc sampleTupleDesc; AttInMetadata *attinmeta; StringInfoData str; int sampleTuples; /* 32 bit - assume that number of tuples will not > 2B */ - char **values; + char **funcRetValues; + bool *funcRetNulls; + char **values; int numLiveColumns; int perseg_targrows; + int ncolumns; CdbPgResults cdb_pgresults = {NULL, 0}; int i; + int index = 0; Assert(targrows > 0.0); @@ -2406,35 +2563,18 @@ acquire_sample_rows_dispatcher(Relation onerel, bool inh, int elevel, /* * Construct SQL command to dispatch to segments. + * + * Did not use 'select * from pg_catalog.gp_acquire_sample_rows(...) as (..);' + * here. Because it requires to specify columns explicitly which leads to + * permission check on each columns. 
This is not consistent with GPDB5 and + * may result in different behaviour under different acl configuration. */ initStringInfo(&str); - appendStringInfo(&str, "select * from pg_catalog.gp_acquire_sample_rows(%u, %d, '%s')", + appendStringInfo(&str, "select pg_catalog.gp_acquire_sample_rows(%u, %d, '%s');", RelationGetRelid(onerel), perseg_targrows, inh ? "t" : "f"); - /* special columns */ - appendStringInfoString(&str, " as ("); - appendStringInfoString(&str, "totalrows pg_catalog.float8, "); - appendStringInfoString(&str, "totaldeadrows pg_catalog.float8, "); - appendStringInfoString(&str, "oversized_cols_bitmap pg_catalog.text"); - - /* table columns */ - for (i = 0; i < relDesc->natts; i++) - { - Form_pg_attribute attr = relDesc->attrs[i]; - Oid typid = gp_acquire_sample_rows_col_type(attr->atttypid); - - if (attr->attisdropped) - continue; - - appendStringInfo(&str, ", %s %s", - quote_identifier(NameStr(attr->attname)), - format_type_be(typid)); - } - - appendStringInfoString(&str, ")"); - /* * Execute it. */ @@ -2446,21 +2586,46 @@ acquire_sample_rows_dispatcher(Relation onerel, bool inh, int elevel, * * Some datatypes need special treatment, so we cannot use the relation's * original tupledesc. + * + * Also create tupledesc of return record of function gp_acquire_sample_rows. */ - newDesc = CreateTupleDescCopy(relDesc); + sampleTupleDesc = CreateTupleDescCopy(relDesc); + ncolumns = numLiveColumns + FIX_ATTR_NUM; + + funcTupleDesc = CreateTemplateTupleDesc(ncolumns, false); + TupleDescInitEntry(funcTupleDesc, (AttrNumber) 1, "", FLOAT8OID, -1, 0); + TupleDescInitEntry(funcTupleDesc, (AttrNumber) 2, "", FLOAT8OID, -1, 0); + TupleDescInitEntry(funcTupleDesc, (AttrNumber) 3, "", TEXTOID, -1, 0); + for (i = 0; i < relDesc->natts; i++) { Form_pg_attribute attr = relDesc->attrs[i]; + Oid typid = gp_acquire_sample_rows_col_type(attr->atttypid); - newDesc->attrs[i]->atttypid = typid; + sampleTupleDesc->attrs[i]->atttypid = typid; + + if (!attr->attisdropped) + { + TupleDescInitEntry(funcTupleDesc, (AttrNumber) 4 + index, "", + typid, attr->atttypmod, attr->attndims); + + index++; + } } - attinmeta = TupleDescGetAttInMetadata(newDesc); + + /* For RECORD results, make sure a typmod has been assigned */ + Assert(funcTupleDesc->tdtypeid == RECORDOID && funcTupleDesc->tdtypmod < 0); + assign_record_type_typmod(funcTupleDesc); + + attinmeta = TupleDescGetAttInMetadata(sampleTupleDesc); /* * Read the result set from each segment. Gather the sample rows *rows, * and sum up the summary rows for grand 'totalrows' and 'totaldeadrows'. */ + funcRetValues = (char **) palloc0(funcTupleDesc->natts * sizeof(char *)); + funcRetNulls = (bool *) palloc(funcTupleDesc->natts * sizeof(bool)); values = (char **) palloc0(relDesc->natts * sizeof(char *)); sampleTuples = 0; *totalrows = 0; @@ -2495,32 +2660,42 @@ acquire_sample_rows_dispatcher(Relation onerel, bool inh, int elevel, for (int rowno = 0; rowno < PQntuples(pgresult); rowno++) { - if (!PQgetisnull(pgresult, rowno, 0)) + /* + * We cannot use record_in function to get row record here. + * Since the result row may contain just the totalrows info where the data columns + * are NULLs. Consider domain: 'create domain dnotnull varchar(15) NOT NULL;' + * NULLs are not allowed in data columns. 
+ */ + char * rowStr = PQgetvalue(pgresult, rowno, 0); + + if (rowStr == NULL) + elog(ERROR, "got NULL pointer from return value of gp_acquire_sample_rows"); + + parse_record_to_string(rowStr, funcTupleDesc, funcRetValues, funcRetNulls); + + if (!funcRetNulls[0]) { /* This is a summary row. */ if (got_summary) elog(ERROR, "got duplicate summary row from gp_acquire_sample_rows"); this_totalrows = DatumGetFloat8(DirectFunctionCall1(float8in, - CStringGetDatum(PQgetvalue(pgresult, rowno, 0)))); + CStringGetDatum(funcRetValues[0]))); this_totaldeadrows = DatumGetFloat8(DirectFunctionCall1(float8in, - CStringGetDatum(PQgetvalue(pgresult, rowno, 1)))); + CStringGetDatum(funcRetValues[1]))); got_summary = true; } else { /* This is a sample row. */ - int index; - if (sampleTuples >= targrows) elog(ERROR, "too many sample rows received from gp_acquire_sample_rows"); /* Read the 'toolarge' bitmap, if any */ - if (colLargeRowIndexes && !PQgetisnull(pgresult, rowno, 2)) + if (colLargeRowIndexes && !funcRetNulls[2]) { char *toolarge; - - toolarge = PQgetvalue(pgresult, rowno, 2); + toolarge = funcRetValues[2]; if (strlen(toolarge) != numLiveColumns) elog(ERROR, "'toolarge' bitmap has incorrect length"); @@ -2547,10 +2722,10 @@ acquire_sample_rows_dispatcher(Relation onerel, bool inh, int elevel, if (attr->attisdropped) continue; - if (PQgetisnull(pgresult, rowno, 3 + index)) + if (funcRetNulls[3 + index]) values[i] = NULL; else - values[i] = PQgetvalue(pgresult, rowno, 3 + index); + values[i] = funcRetValues[3 + index]; index++; /* Move index to the next result set attribute */ } @@ -2581,6 +2756,14 @@ acquire_sample_rows_dispatcher(Relation onerel, bool inh, int elevel, (*totalrows) += this_totalrows; (*totaldeadrows) += this_totaldeadrows; } + for (i = 0; i < funcTupleDesc->natts; i++) + { + if (funcRetValues[i]) + pfree(funcRetValues[i]); + } + pfree(funcRetValues); + pfree(funcRetNulls); + pfree(values); cdbdisp_clearCdbPgResults(&cdb_pgresults); diff --git a/src/test/regress/expected/analyze.out b/src/test/regress/expected/analyze.out index 53c02c636188..08f7f8476638 100644 --- a/src/test/regress/expected/analyze.out +++ b/src/test/regress/expected/analyze.out @@ -918,6 +918,10 @@ select relname, reltuples from pg_class where relname like 'aocs_analyze_test%' (2 rows) reset default_statistics_target; +-- Test column name called totalrows +create table test_tr (totalrows int4); +analyze test_tr; +drop table test_tr; -- -- Test with both a dropped column and an oversized column -- (github issue https://github.com/greenplum-db/gpdb/issues/9503) @@ -934,3 +938,17 @@ select attname, null_frac, avg_width, n_distinct from pg_stats where tablename = d | 0 | 5 | -1 (3 rows) +-- Test analyze without USAGE privilege on schema +create schema test_ns; +revoke all on schema test_ns from public; +create role nsuser1; +grant create on schema test_ns to nsuser1; +set search_path to 'test_ns'; +create extension citext; +create table testid (id int , test citext); +alter table testid owner to nsuser1; +analyze testid; +drop table testid; +drop extension citext; +drop schema test_ns; +drop role nsuser1; diff --git a/src/test/regress/expected/incremental_analyze.out b/src/test/regress/expected/incremental_analyze.out index cc856d59ca4d..5853fa474d3b 100644 --- a/src/test/regress/expected/incremental_analyze.out +++ b/src/test/regress/expected/incremental_analyze.out @@ -1696,7 +1696,7 @@ INSERT INTO foo SELECT i, i%9, i%100 FROM generate_series(1,500)i; ANALYZE VERBOSE rootpartition foo; INFO: analyzing "public.foo" 
inheritance tree INFO: column c of partition foo_1_prt_1 is not analyzed, so ANALYZE will collect sample for stats calculation -INFO: Executing SQL: select * from pg_catalog.gp_acquire_sample_rows(17861, 400, 't') as (totalrows pg_catalog.float8, totaldeadrows pg_catalog.float8, oversized_cols_bitmap pg_catalog.text, a integer, b integer, c integer) +INFO: Executing SQL: select pg_catalog.gp_acquire_sample_rows(17861, 400, 't') -- Testing auto merging root statistics for all columns -- where column attnums are differents due to dropped columns -- and split partitions. diff --git a/src/test/regress/sql/analyze.sql b/src/test/regress/sql/analyze.sql index 3f67ae6502c6..7d24e4a276cc 100644 --- a/src/test/regress/sql/analyze.sql +++ b/src/test/regress/sql/analyze.sql @@ -450,6 +450,11 @@ select relname, reltuples from pg_class where relname like 'aocs_analyze_test%' reset default_statistics_target; +-- Test column name called totalrows +create table test_tr (totalrows int4); +analyze test_tr; +drop table test_tr; + -- -- Test with both a dropped column and an oversized column -- (github issue https://github.com/greenplum-db/gpdb/issues/9503) @@ -459,3 +464,17 @@ insert into analyze_dropped_col values('a','bbb', repeat('x', 5000), 'dddd'); alter table analyze_dropped_col drop column b; analyze analyze_dropped_col; select attname, null_frac, avg_width, n_distinct from pg_stats where tablename ='analyze_dropped_col'; +-- Test analyze without USAGE privilege on schema +create schema test_ns; +revoke all on schema test_ns from public; +create role nsuser1; +grant create on schema test_ns to nsuser1; +set search_path to 'test_ns'; +create extension citext; +create table testid (id int , test citext); +alter table testid owner to nsuser1; +analyze testid; +drop table testid; +drop extension citext; +drop schema test_ns; +drop role nsuser1; From 6d5195e7d675e4463ab3878ae4b6c97106e73919 Mon Sep 17 00:00:00 2001 From: Ashwin Agrawal Date: Wed, 5 Feb 2020 15:26:34 -0800 Subject: [PATCH 008/102] Make gpcheckcat independent of master dbid gpcheckcat hard-coded master dbid to 1 for various queries. This assumption is flawed. There is no restriction master can only have dbid 1, it can be any value. For example, failover to standby and gpcheckat is not usable with that assumption. Hence, run-time find the value of master's dbid using the info that it's content-id is always -1 and use the same. 
Co-authored-by: Alexandra Wang (cherry picked from commit d1f19ca96ccf0b4bd20fa34d9cafc499a49ac2ad) --- gpMgmt/bin/gpcheckcat | 54 +++++++++++-------- .../gppylib/test/unit/test_unit_gpcheckcat.py | 1 + 2 files changed, 34 insertions(+), 21 deletions(-) diff --git a/gpMgmt/bin/gpcheckcat b/gpMgmt/bin/gpcheckcat index c0e2d56a60f0..56a739421241 100755 --- a/gpMgmt/bin/gpcheckcat +++ b/gpMgmt/bin/gpcheckcat @@ -123,6 +123,7 @@ class Global(): self.opt['-E'] = False + self.master_dbid = None self.cfg = None self.dbname = None self.firstdb = None @@ -469,12 +470,12 @@ def checkDistribPolicy(): where pk.contype in('p', 'u') and d.policytype = 'p' and d.distkey = '' ''' - db = connect2(GV.cfg[1]) + db = connect2(GV.cfg[GV.master_dbid]) try: curs = db.query(qry) err = [] for row in curs.dictresult(): - err.append([GV.cfg[1], ('nspname', 'relname', 'constraint'), row]) + err.append([GV.cfg[GV.master_dbid], ('nspname', 'relname', 'constraint'), row]) if not err: logger.info('[OK] randomly distributed tables') @@ -515,7 +516,7 @@ def checkDistribPolicy(): err = [] for row in curs.dictresult(): - err.append([GV.cfg[1], ('nspname', 'relname', 'constraint'), row]) + err.append([GV.cfg[GV.master_dbid], ('nspname', 'relname', 'constraint'), row]) if not err: logger.info('[OK] unique constraints') @@ -597,7 +598,7 @@ def checkPartitionIntegrity(): curs = db.query(qry) err = [] for row in curs.dictresult(): - err.append([GV.cfg[1], ('nspname', 'relname', 'oid'), row]) + err.append([GV.cfg[GV.master_dbid], ('nspname', 'relname', 'oid'), row]) if not err: logger.info('[OK] partition with oids check') @@ -702,7 +703,7 @@ def checkPartitionIntegrity(): err = [] for row in curs.dictresult(): - err.append([GV.cfg[1], cols, row]) + err.append([GV.cfg[GV.master_dbid], cols, row]) if not err: logger.info('[OK] partition distribution policy check') @@ -1904,7 +1905,7 @@ def checkDepend(): logger.info('-----------------------------------') logger.info('Checking Object Dependencies') - db = connect2(GV.cfg[1], utilityMode=False) + db = connect2(GV.cfg[GV.master_dbid], utilityMode=False) # Catalogs that link up to pg_depend/pg_shdepend qry = "select relname from pg_class where relnamespace=%d and relhasoids" % PG_CATALOG_OID @@ -2066,7 +2067,7 @@ def checkOwners(): # # - Between 3.3 and 4.0 the ao segment columns migrated from pg_class # to pg_appendonly. - db = connect2(GV.cfg[1], utilityMode=False) + db = connect2(GV.cfg[GV.master_dbid], utilityMode=False) qry = ''' select distinct n.nspname, coalesce(o.relname, c.relname) as relname, a.rolname, m.rolname as master_rolname @@ -2120,7 +2121,7 @@ def checkOwners(): # - Ignore implementation types of pg_class entries - they should be # in the check above since ALTER TABLE is required to fix them, not # ALTER TYPE. 
- db = connect2(GV.cfg[1], utilityMode=False) + db = connect2(GV.cfg[GV.master_dbid], utilityMode=False) qry = ''' select distinct n.nspname, t.typname, a.rolname, m.rolname as master_rolname from gp_dist_random('pg_type') r @@ -2183,7 +2184,7 @@ def closeDbs(): # ------------------------------------------------------------------------------- def getCatObj(namestr): - db = connect2(GV.cfg[1], utilityMode=False) + db = connect2(GV.cfg[GV.master_dbid], utilityMode=False) try: cat = GV.catalog.getCatalogTable(namestr) except Exception, e: @@ -2253,7 +2254,7 @@ def checkTableACL(cat): # Execute the query try: - db = connect2(GV.cfg[1], utilityMode=False) + db = connect2(GV.cfg[GV.master_dbid], utilityMode=False) curs = db.query(qry) nrows = curs.ntuples() @@ -2292,7 +2293,7 @@ def checkForeignKey(cat_tables=None): if not cat_tables: cat_tables = GV.catalog.getCatalogTables() - db_connection = connect2(GV.cfg[1], utilityMode=False) + db_connection = connect2(GV.cfg[GV.master_dbid], utilityMode=False) try: foreign_key_check = ForeignKeyCheck(db_connection, logger, GV.opt['-S'], autoCast) foreign_key_issues = foreign_key_check.runCheck(cat_tables) @@ -2380,7 +2381,7 @@ def checkTableMissingEntry(cat): # Execute the query try: - db = connect2(GV.cfg[1], utilityMode=False) + db = connect2(GV.cfg[GV.master_dbid], utilityMode=False) curs = db.query(qry) nrows = curs.ntuples() @@ -2540,7 +2541,7 @@ def checkTableInconsistentEntry(cat): # Execute the query try: - db = connect2(GV.cfg[1], utilityMode=False) + db = connect2(GV.cfg[GV.master_dbid], utilityMode=False) curs = db.query(qry) nrows = curs.ntuples() @@ -2680,7 +2681,7 @@ def checkTableDuplicateEntry(cat): # Execute the query try: - db = connect2(GV.cfg[1], utilityMode=False) + db = connect2(GV.cfg[GV.master_dbid], utilityMode=False) curs = db.query(qry) nrows = curs.ntuples() @@ -2743,7 +2744,7 @@ def duplicateEntryQuery(catname, pkey): def checkUniqueIndexViolation(): logger.info('-----------------------------------') logger.info('Performing check: checking for violated unique indexes') - db_connection = connect2(GV.cfg[1], utilityMode=False) + db_connection = connect2(GV.cfg[GV.master_dbid], utilityMode=False) violations = UniqueIndexViolationCheck().runCheck(db_connection) @@ -2782,7 +2783,7 @@ def checkOrphanedToastTables(): logger.info('-----------------------------------') logger.info('Performing check: checking for orphaned TOAST tables') - db_connection = connect2(GV.cfg[1], utilityMode=False) + db_connection = connect2(GV.cfg[GV.master_dbid], utilityMode=False) checker = OrphanedToastTablesCheck() check_passed = checker.runCheck(db_connection) @@ -3195,7 +3196,7 @@ def checkSegmentRepair(maybeRemove, catmod_guc, seg): # ------------------------------------------------------------------------------- def getCatalog(): # Establish a connection to the master & looks up info in the catalog - db = connect2(GV.cfg[1], utilityMode=False) + db = connect2(GV.cfg[GV.master_dbid], utilityMode=False) return GPCatalog(db) @@ -3302,7 +3303,7 @@ def getOidFromPK(catname, pkeys): pkeystr=pkeystr) try: - db = connect2(GV.cfg[1], utilityMode=False) + db = connect2(GV.cfg[GV.master_dbid], utilityMode=False) curs = db.query(qry) if (len(curs.dictresult()) == 0): raise QueryException("No such entry '%s' in %s" % (pkeystr, catname)) @@ -3917,7 +3918,7 @@ class GPObject: # Collect all tables with missing issues for later reporting if len(self.missingIssues): - db = connect2(GV.cfg[1], utilityMode=False) + db = connect2(GV.cfg[GV.master_dbid], 
utilityMode=False) oid_query = "select (select nspname from pg_namespace where oid=relnamespace) || '.' || relname from pg_class where oid=%d" type_query = "select (select nspname from pg_namespace where oid=relnamespace) || '.' || relname from pg_class where reltype=%d" for issues in self.missingIssues.values() : @@ -4086,7 +4087,7 @@ def getRelInfo(objects): """.format(oids=','.join(map(str, oids))) try: - db = connect2(GV.cfg[1], utilityMode=False) + db = connect2(GV.cfg[GV.master_dbid], utilityMode=False) curs = db.query(qry) for row in curs.getresult(): (oid, relname, nspname, relkind, paroid) = row @@ -4170,7 +4171,7 @@ def checkcatReport(): # Report tables with missing attributes in a more usable format if len(GV.missing_attr_tables) or len(GV.extra_attr_tables): # Expand partition tables - db = connect2(GV.cfg[1], utilityMode=False) + db = connect2(GV.cfg[GV.master_dbid], utilityMode=False) parent_tables = [t[0] for t in db.query("SELECT DISTINCT (schemaname || '.' || tablename) FROM pg_partitions").getresult()] partition_leaves_sql = """ SELECT x.partitionschemaname || '.' || x.partitiontablename @@ -4277,6 +4278,17 @@ def main(): GV.report_cfg = getReportConfiguration() GV.max_content = max([GV.cfg[dbid]['content'] for dbid in GV.cfg]) + + for dbid in GV.cfg: + if (GV.cfg[dbid]['content'] == -1): + GV.master_dbid = dbid + break + + if GV.master_dbid is None: + myprint("Error: master configuration info not found in gp_segment_configuration\n") + setError(ERROR_NOREPAIR) + sys.exit(GV.retcode) + GV.catalog = getCatalog() leaked_schema_dropper = LeakedSchemaDropper() diff --git a/gpMgmt/bin/gppylib/test/unit/test_unit_gpcheckcat.py b/gpMgmt/bin/gppylib/test/unit/test_unit_gpcheckcat.py index fbadbfbeec46..b0209da34eb4 100755 --- a/gpMgmt/bin/gppylib/test/unit/test_unit_gpcheckcat.py +++ b/gpMgmt/bin/gppylib/test/unit/test_unit_gpcheckcat.py @@ -41,6 +41,7 @@ def setUp(self): ('arbitrary_catalog_table', ['pkey1', 'pkey2'], [('r1', 'r2'), ('r3', 'r4')])] self.foreign_key_check.runCheck.return_value = issues_list + self.subject.GV.master_dbid = 0 self.subject.GV.cfg = {0:dict(hostname='host0', port=123, id=1, address='123', datadir='dir', content=-1, dbid=0), 1:dict(hostname='host1', port=123, id=1, address='123', datadir='dir', content=1, dbid=1)} self.subject.GV.checkStatus = True From 4d917e3b417fb79e2d932799f7dab3894cabc384 Mon Sep 17 00:00:00 2001 From: David Yozie Date: Fri, 7 Feb 2020 17:30:48 -0800 Subject: [PATCH 009/102] Docs analytics fixes (#9550) * Add gpss link * Correct madlib typo * Remove broken link (unneeded) * Fix link to gptext/fts comparison --- gpdb-doc/dita/analytics/madlib.xml | 5 ++--- gpdb-doc/dita/analytics/overview.xml | 11 ++++++----- gpdb-doc/dita/analytics/text.xml | 10 +++++----- 3 files changed, 13 insertions(+), 13 deletions(-) diff --git a/gpdb-doc/dita/analytics/madlib.xml b/gpdb-doc/dita/analytics/madlib.xml index b70ad0df5815..bc223705b83b 100644 --- a/gpdb-doc/dita/analytics/madlib.xml +++ b/gpdb-doc/dita/analytics/madlib.xml @@ -20,7 +20,7 @@ methods on structured and unstructured data. For Greenplum and MADlib version compatibility, refer to MADbib FAQ.

    + format="html" scope="external">MADlib FAQ.

    MADlib’s suite of SQL-based algorithms run at scale within a single Greenplum Database engine without needing to transfer data between the database and other tools.

    MADlib is part of the database fabric with no changes to the Greenplum architecture. This @@ -66,8 +66,7 @@

    MADlib can be used with PivotalR, an R client package that enables users to interact with data resident in the Greenplum Database. PivotalR can be considered as a wrapper around MADlib that translates R code into SQL to run on MPP databases and is designed for users - familiar with R but with data sets that are too large for R. For more details see .

    + familiar with R but with data sets that are too large for R.

    The R language is an open-source language that is used for statistical computing. PivotalR is an R package that enables users to interact with data resident in Greenplum Database using the R client. Using PivotalR requires that MADlib is installed on the Greenplum diff --git a/gpdb-doc/dita/analytics/overview.xml b/gpdb-doc/dita/analytics/overview.xml index 7c1d7831c16c..932002489a6b 100644 --- a/gpdb-doc/dita/analytics/overview.xml +++ b/gpdb-doc/dita/analytics/overview.xml @@ -84,11 +84,12 @@

  • Access the data where it lives, therefore integrate data and analytics in one place. Pivotal Greenplum is infrastructure-agnostic and runs on bare metal, private cloud, and public cloud deployments.
  • -
  • Use a multitude of data extensions. Greenplum supports Apache Kafka connector, extensions - for HDFS, Hive, and HBase as well as reading/writing data from/to cloud storage, including - Amazon S3 objects. Review the capabilities of Greenplum Platform Extension Framework (PXF), which provides +
  • Use a multitude of data extensions. Greenplum supports Apache Kafka integration, extensions for HDFS, Hive, and HBase as well as + reading/writing data from/to cloud storage, including Amazon S3 objects. Review the + capabilities of Greenplum Platform Extension Framework (PXF), which provides connectors that enable you to access data stored in sources external to your Greenplum Database deployment.
  • Use familiar and leading BI and advanced analytics software that are ODBC/JDBC compatible, or have native integrations, diff --git a/gpdb-doc/dita/analytics/text.xml b/gpdb-doc/dita/analytics/text.xml index 7a7d9c30546f..83bd312dd89d 100644 --- a/gpdb-doc/dita/analytics/text.xml +++ b/gpdb-doc/dita/analytics/text.xml @@ -24,11 +24,11 @@ cluster and provides Greenplum Database functions you can use to create Solr indexes, query them, and receive results in the database session.

    Both of these systems provide powerful, enterprise-quality document indexing and searching - services. GPText, with Solr, has many capabilities that are not available with Greenplum - Database text search. In particular, GPText is better for advanced text analysis applications. - For a comparative between these methods, see Comparing Greenplum - Database Text Search with Pivotal GPText.

    + services. GPText, with Solr, has many capabilities that are not available with Greenplum + Database text search. In particular, GPText is better for advanced text analysis + applications. For a comparative between these methods, see Comparing Greenplum Database Text Search + with Pivotal GPText.

  • From ac468b85cca3642ec072b7ed150c35579fa07686 Mon Sep 17 00:00:00 2001 From: "Huiliang.liu" Date: Tue, 11 Feb 2020 09:05:51 +0800 Subject: [PATCH 010/102] packaging gpversion.py into gpdb clients tarball (#9534) (#9556) gpload uses gpversion.py to parse gpdb version. So that it can compatible with gpdb5 and gpdb6. Then we can only maintain one gpload version and some new features or bug fix could be used by gpdb5 customers. so we package gppylib.gpversion into gpdb clients tarball --- concourse/scripts/compile_gpdb.bash | 3 +++ 1 file changed, 3 insertions(+) diff --git a/concourse/scripts/compile_gpdb.bash b/concourse/scripts/compile_gpdb.bash index 465baf25cd89..3f5bca671ca3 100755 --- a/concourse/scripts/compile_gpdb.bash +++ b/concourse/scripts/compile_gpdb.bash @@ -194,6 +194,9 @@ function export_gpdb_clients() { TARBALL="${GPDB_ARTIFACTS_DIR}/${GPDB_CL_FILENAME}" pushd ${GREENPLUM_CL_INSTALL_DIR} source ./greenplum_clients_path.sh + mkdir -p bin/ext/gppylib + cp ${GREENPLUM_INSTALL_DIR}/lib/python/gppylib/__init__.py ./bin/ext/gppylib + cp ${GREENPLUM_INSTALL_DIR}/lib/python/gppylib/gpversion.py ./bin/ext/gppylib python -m compileall -q -x test . chmod -R 755 . tar -czf "${TARBALL}" ./* From e5f8487c6387a93b66c9842471ce754e59f50739 Mon Sep 17 00:00:00 2001 From: Kalen Krempely Date: Thu, 6 Feb 2020 14:17:37 -0800 Subject: [PATCH 011/102] Fix typo in ExecutionError __str__ Co-authored-by: Mark Sliva --- gpMgmt/bin/gppylib/commands/base.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gpMgmt/bin/gppylib/commands/base.py b/gpMgmt/bin/gppylib/commands/base.py index b9fae1ecd139..e605eef452a1 100755 --- a/gpMgmt/bin/gppylib/commands/base.py +++ b/gpMgmt/bin/gppylib/commands/base.py @@ -379,7 +379,7 @@ def __init__(self, summary, cmd): def __str__(self): # TODO: improve dumping of self.cmd - return "ExecutionError: '%s' occured. Details: '%s' %s" % \ + return "ExecutionError: '%s' occurred. Details: '%s' %s" % \ (self.summary, self.cmd.cmdStr, self.cmd.get_results().printResult()) From aa2cef04123f53f9392c67c06e9bfb4923c41fe9 Mon Sep 17 00:00:00 2001 From: Jamie McAtamney Date: Mon, 3 Feb 2020 10:06:59 -0800 Subject: [PATCH 012/102] Allow cluster to start when standby host is down Previously, gpstart could not start the cluster if a standby master host was configured but currently down. In order to check whether the standby was supposed to be the acting master (and prevent the master from being started if that was the case), gpstart needed to access the standby host to retrieve the TimeLineID of the standby, and if the standby host was down the master would not start. This commit modifies gpstart to assume that the master host is the acting master if the standby is unreachable, so that it never gets into a state where neither the master nor the standby can be started. 
Co-authored-by: Kalen Krempely Co-authored-by: Mark Sliva Co-authored-by: Adam Berlin (cherry picked from commit 29c759ab8c1f4179e46b51c91a808e76f6747075) --- .../gppylib/test/unit/test_unit_gpstart.py | 94 ++++++++++++++----- gpMgmt/bin/gpstart | 37 +++++--- 2 files changed, 97 insertions(+), 34 deletions(-) diff --git a/gpMgmt/bin/gppylib/test/unit/test_unit_gpstart.py b/gpMgmt/bin/gppylib/test/unit/test_unit_gpstart.py index 403585b6d9c0..409aecafcacf 100644 --- a/gpMgmt/bin/gppylib/test/unit/test_unit_gpstart.py +++ b/gpMgmt/bin/gppylib/test/unit/test_unit_gpstart.py @@ -5,10 +5,13 @@ from mock import Mock, patch -from gparray import Segment, GpArray +from gppylib.gparray import Segment, GpArray from gppylib.operations.startSegments import StartSegmentsResult from gppylib.test.unit.gp_unittest import GpTestCase, run_tests from gppylib.commands import gp +from gppylib.commands.base import ExecutionError +from gppylib.commands.pg import PgControlData +from gppylib.mainUtils import UserAbortedException class GpStart(GpTestCase): @@ -52,7 +55,6 @@ def setUp(self): patch("gpstart.gp.MasterStart.local"), patch("gpstart.pg.DbStatus.local"), patch("gpstart.TableLogger"), - patch('gpstart.PgControlData'), ]) self.mock_start_result = self.get_mock_from_apply_patch('StartSegmentsOperation') @@ -78,6 +80,12 @@ def setUp(self): def tearDown(self): super(GpStart, self).tearDown() + def setup_gpstart(self): + parser = self.subject.GpStart.createParser() + options, args = parser.parse_args() + gpstart = self.subject.GpStart.createProgram(options, args) + return gpstart + def test_option_master_success_without_auto_accept(self): sys.argv = ["gpstart", "-m"] self.mock_userinput.ask_yesno.return_value = True @@ -85,10 +93,7 @@ def test_option_master_success_without_auto_accept(self): self.mock_os_path_exists.side_effect = os_exists_check - parser = self.subject.GpStart.createParser() - options, args = parser.parse_args() - - gpstart = self.subject.GpStart.createProgram(options, args) + gpstart = self.setup_gpstart() return_code = gpstart.run() self.assertEqual(self.mock_userinput.ask_yesno.call_count, 1) @@ -104,10 +109,7 @@ def test_option_master_success_with_auto_accept(self): self.mock_os_path_exists.side_effect = os_exists_check - parser = self.subject.GpStart.createParser() - options, args = parser.parse_args() - - gpstart = self.subject.GpStart.createProgram(options, args) + gpstart = self.setup_gpstart() return_code = gpstart.run() self.assertEqual(self.mock_userinput.ask_yesno.call_count, 0) @@ -120,9 +122,7 @@ def test_output_to_stdout_and_log_for_master_only_happens_before_heap_checksum(s self.mock_userinput.ask_yesno.return_value = True self.subject.unix.PgPortIsActive.local.return_value = False self.mock_os_path_exists.side_effect = os_exists_check - parser = self.subject.GpStart.createParser() - options, args = parser.parse_args() - gpstart = self.subject.GpStart.createProgram(options, args) + gpstart = self.setup_gpstart() return_code = gpstart.run() @@ -139,9 +139,7 @@ def test_skip_checksum_validation_succeeds(self): self.mock_heap_checksum.return_value.get_segments_checksum_settings.return_value = ([1], [1]) self.subject.unix.PgPortIsActive.local.return_value = False self.mock_os_path_exists.side_effect = os_exists_check - parser = self.subject.GpStart.createParser() - options, args = parser.parse_args() - gpstart = self.subject.GpStart.createProgram(options, args) + gpstart = self.setup_gpstart() return_code = gpstart.run() @@ -160,9 +158,7 @@ def 
test_log_when_heap_checksum_validation_fails(self): start_failure.addFailure(self.mirror1, "fictitious reason", gp.SEGSTART_ERROR_CHECKSUM_MISMATCH) self.mock_start_result.return_value.startSegments.return_value.getFailedSegmentObjs.return_value = start_failure.getFailedSegmentObjs() - parser = self.subject.GpStart.createParser() - options, args = parser.parse_args() - gpstart = self.subject.GpStart.createProgram(options, args) + gpstart = self.setup_gpstart() return_code = gpstart.run() self.assertEqual(return_code, 1) @@ -172,9 +168,7 @@ def test_log_when_heap_checksum_validation_fails(self): def test_standby_startup_skipped(self): sys.argv = ["gpstart", "-a", "-y"] - parser = self.subject.GpStart.createParser() - options, args = parser.parse_args() - gpstart = self.subject.GpStart.createProgram(options, args) + gpstart = self.setup_gpstart() return_value = gpstart._start_standby() self.assertFalse(return_value) @@ -205,6 +199,58 @@ def test_prepare_segment_start_returns_up_and_down_segments(self): self.assertItemsEqual(up, [primary1, mirror0]) self.assertItemsEqual(down, [primary0, mirror1]) + @patch("gppylib.commands.pg.PgControlData.run") + @patch("gppylib.commands.pg.PgControlData.get_value", return_value="2") + def test_fetch_tli_returns_TimeLineID_when_standby_is_accessible(self, mock1, mock2): + gpstart = self.setup_gpstart() + + self.assertEqual(gpstart.fetch_tli("", "foo"), 2) + + @patch("gpstart.GpStart.shutdown_master_only") + @patch("gppylib.commands.pg.PgControlData.run") + @patch("gppylib.commands.pg.PgControlData.get_value", side_effect=ExecutionError("foobar", Mock())) + def test_fetch_tli_returns_0_when_standby_is_not_accessible_and_user_proceeds(self, mock_value, mock_run, mock_shutdown): + gpstart = self.setup_gpstart() + self.mock_userinput.ask_yesno.return_value = True + + self.assertEqual(gpstart.fetch_tli("", "foo"), 0) + self.assertFalse(mock_shutdown.called) + + @patch("gpstart.GpStart.shutdown_master_only") + @patch("gppylib.commands.pg.PgControlData.run") + @patch("gppylib.commands.pg.PgControlData.get_value", side_effect=ExecutionError("foobar", Mock())) + def test_fetch_tli_raises_exception_when_standby_is_not_accessible_and_user_aborts(self, mock_value, mock_run, mock_shutdown): + gpstart = self.setup_gpstart() + self.mock_userinput.ask_yesno.return_value = False + + with self.assertRaises(UserAbortedException): + gpstart.fetch_tli("", "foo") + self.assertTrue(mock_shutdown.called) + + @patch("gpstart.GpStart.shutdown_master_only") + @patch("gppylib.commands.pg.PgControlData.run") + @patch("gppylib.commands.pg.PgControlData.get_value", side_effect=ExecutionError("cmd foobar failed", Mock())) + def test_fetch_tli_logs_warning_when_standby_is_not_accessible(self, mock_value, mock_run, mock_shutdown): + gpstart = self.setup_gpstart() + self.mock_userinput.ask_yesno.return_value = False + + with self.assertRaises(UserAbortedException): + gpstart.fetch_tli("", "foo") + self.subject.logger.warning.assert_any_call(StringContains("Received error: ExecutionError: 'cmd foobar failed' occurred.")) + self.subject.logger.warning.assert_any_call("Continue only if you are certain that the standby is not acting as the master.") + + @patch("gpstart.GpStart.shutdown_master_only") + @patch("gppylib.commands.pg.PgControlData.run") + @patch("gppylib.commands.pg.PgControlData.get_value", side_effect=ExecutionError("foobar", Mock())) + def test_fetch_tli_logs_non_interactive_warning_when_standby_is_not_accessible(self, mock_value, mock_run, mock_shutdown): + gpstart = 
self.setup_gpstart() + gpstart.interactive = False + + with self.assertRaises(UserAbortedException): + gpstart.fetch_tli("", "foo") + self.assertTrue(mock_shutdown.called) + self.subject.logger.warning.assert_any_call("Non interactive mode detected. Not starting the cluster. Start the cluster in interactive mode.") + def _createGpArrayWith2Primary2Mirrors(self): self.master = Segment.initFromString( "1|-1|p|p|s|u|mdw|mdw|5432|/data/master") @@ -235,5 +281,9 @@ def os_exists_check(arg): return False +class StringContains(str): + def __eq__(self, other): + return self in other + if __name__ == '__main__': run_tests() diff --git a/gpMgmt/bin/gpstart b/gpMgmt/bin/gpstart index 232b0297ece1..878d8816443f 100755 --- a/gpMgmt/bin/gpstart +++ b/gpMgmt/bin/gpstart @@ -128,14 +128,7 @@ class GpStart: logger.info("Skipping Standby activation status checking.") logger.info("Shutting down master") - cmd = gp.GpStop("Shutting down master", masterOnly=True, - fast=True, quiet=logging_is_quiet(), - verbose=logging_is_verbose(), - datadir=self.master_datadir, - parallel=self.parallel, - logfileDirectory=self.logfileDirectory) - cmd.run() - logger.debug("results of forcing master shutdown: %s" % cmd) + self.shutdown_master_only() # TODO: check results of command. finally: @@ -225,14 +218,34 @@ class GpStart: self._remove_postmaster_tmpfile(self.port) + def shutdown_master_only(self): + cmd = gp.GpStop("Shutting down master", masterOnly=True, + fast=True, quiet=logging_is_quiet(), + verbose=logging_is_verbose(), + datadir=self.master_datadir, + parallel=self.parallel, + logfileDirectory=self.logfileDirectory) + cmd.run() + logger.debug("results of forcing master shutdown: %s" % cmd) + def fetch_tli(self, data_dir_path, remoteHost=None): if not remoteHost: - controldata = PgControlData("Latest checkpoint's TimeLineID", data_dir_path) + controldata = PgControlData("fetching pg_controldata locally", data_dir_path) else: - controldata = PgControlData("Latest checkpoint's TimeLineID", data_dir_path, REMOTE, remoteHost) + controldata = PgControlData("fetching pg_controldata remotely", data_dir_path, REMOTE, remoteHost) - controldata.run(validateAfter=True) - return int(controldata.get_value("Latest checkpoint's TimeLineID")) + try: + controldata.run(validateAfter=True) + return int(controldata.get_value("Latest checkpoint's TimeLineID")) + except base.ExecutionError as err: + logger.warning("Standby host is unreachable, cannot determine whether the standby is currently acting as the master. Received error: %s" % err) + logger.warning("Continue only if you are certain that the standby is not acting as the master.") + if not self.interactive or not userinput.ask_yesno(None, "\nContinue with startup", 'N'): + if not self.interactive: + logger.warning("Non interactive mode detected. Not starting the cluster. Start the cluster in interactive mode.") + self.shutdown_master_only() + raise UserAbortedException() + return 0 # a 0 won't lead to standby promotion, as TimeLineIDs start at 1 def _check_standby_activated(self): logger.debug("Checking if standby has been activated...") From 133c0c9e615ffda019d2db46273a65ac3c824519 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Wed, 12 Feb 2020 14:02:18 +0200 Subject: [PATCH 013/102] Allocate temporary hash tables in right memory context. We were passing the parent memory context as NULL, which caused them to be allocated permanently. That was surely not intended. 
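In dynahash terms the bug is a bad parent for the hash table's private memory
context: hash_create() consults hash_ctl.hcxt only when the HASH_CONTEXT flag
is passed, and handing it a NULL parent leaves the table's context unattached,
so it is never reclaimed with the query. A hedged sketch of the intended
pattern for a temporary table (backend-only code; the helper name and the
key/entry shapes are illustrative, and it assumes the call sites pass
HASH_CONTEXT, as the flags below do):

    #include "postgres.h"
    #include "utils/hsearch.h"

    /*
     * Sketch: build a throwaway hash table whose storage hangs off the
     * caller's CurrentMemoryContext, so resetting or deleting that context
     * reclaims the table automatically.
     */
    static HTAB *
    create_scratch_hash(const char *name, long nelem, Size entrysize)
    {
        HASHCTL     hash_ctl;

        MemSet(&hash_ctl, 0, sizeof(hash_ctl));
        hash_ctl.keysize = sizeof(uint32);
        hash_ctl.entrysize = entrysize;
        hash_ctl.hash = tag_hash;
        hash_ctl.hcxt = CurrentMemoryContext;   /* not NULL */

        return hash_create(name, nelem, &hash_ctl,
                           HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
    }

The actual fix below keeps the existing keys and hash functions and only swaps
the NULL parent for CurrentMemoryContext.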
--- src/backend/cdb/cdbpartindex.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/src/backend/cdb/cdbpartindex.c b/src/backend/cdb/cdbpartindex.c index c881f7b0a30c..1f61ef36b9e3 100644 --- a/src/backend/cdb/cdbpartindex.c +++ b/src/backend/cdb/cdbpartindex.c @@ -479,7 +479,6 @@ constructIndexHashKey(Oid partOid, static void createIndexHashTables() { - MemoryContext context = NULL; HASHCTL hash_ctl; /* @@ -493,7 +492,7 @@ createIndexHashTables() hash_ctl.hash = key_string_hash; hash_ctl.match = key_string_compare; hash_ctl.keycopy = key_string_copy; - hash_ctl.hcxt = context; + hash_ctl.hcxt = CurrentMemoryContext; PartitionIndexHash = hash_create("Partition Index Hash", INITIAL_NUM_LOGICAL_INDEXES_ESTIMATE, &hash_ctl, @@ -508,7 +507,7 @@ createIndexHashTables() hash_ctl.keysize = sizeof(uint32); hash_ctl.entrysize = sizeof(LogicalIndexInfoHashEntry); hash_ctl.hash = tag_hash; - hash_ctl.hcxt = context; + hash_ctl.hcxt = CurrentMemoryContext; LogicalIndexInfoHash = hash_create("Logical Index Info Hash", INITIAL_NUM_LOGICAL_INDEXES_ESTIMATE, &hash_ctl, From 3ddbed8eccb27dddd397848a800778c16cbcdc90 Mon Sep 17 00:00:00 2001 From: Asim R P Date: Thu, 13 Feb 2020 10:50:09 +0530 Subject: [PATCH 014/102] Incremental recovery and rebalance should run pg_rewind in parallel Incremental recovery and rebalance operations involve running pg_rewind against failed primaries. This patch changes gprecoverseg such that pg_rewind is invoked in parallel, using the WorkerPool interface, for each affected segment in the cluster. There is no reason to rewind segments one after the other. Fixes Github issue #9466 Reviewed by: Mark Sliva and Paul Guo (cherry picked from commit 43ad9d05fef11ef491305ea1f7b24404974860b5) --- .../gppylib/operations/buildMirrorSegments.py | 70 +++++++++++++------ .../unit/test_unit_buildmirrorsegments.py | 64 +++++++++++++++++ 2 files changed, 111 insertions(+), 23 deletions(-) create mode 100644 gpMgmt/bin/gppylib/test/unit/test_unit_buildmirrorsegments.py diff --git a/gpMgmt/bin/gppylib/operations/buildMirrorSegments.py b/gpMgmt/bin/gppylib/operations/buildMirrorSegments.py index 14142b0e3f9b..bb62c5058369 100644 --- a/gpMgmt/bin/gppylib/operations/buildMirrorSegments.py +++ b/gpMgmt/bin/gppylib/operations/buildMirrorSegments.py @@ -191,6 +191,16 @@ def getAdditionalWarnings(self): """ return self.__additionalWarnings + class RewindSegmentInfo: + """ + Which segments to run pg_rewind during incremental recovery. The + targetSegment is of type gparray.Segment. + """ + def __init__(self, targetSegment, sourceHostname, sourcePort): + self.targetSegment = targetSegment + self.sourceHostname = sourceHostname + self.sourcePort = sourcePort + def buildMirrors(self, actionName, gpEnv, gpArray): """ Build the mirrors. @@ -270,7 +280,8 @@ def buildMirrors(self, actionName, gpEnv, gpArray): # figure out what needs to be started or transitioned mirrorsToStart = [] - rewindInfo = [] + # Map of mirror dbid to GpMirrorListToBuild.RewindSegmentInfo objects + rewindInfo = {} primariesToConvert = [] convertPrimaryUsingFullResync = [] fullResyncMirrorDbIds = {} @@ -281,14 +292,15 @@ def buildMirrors(self, actionName, gpEnv, gpArray): mirrorsToStart.append(seg) primarySeg = toRecover.getLiveSegment() - # Append to rewindInfo to execute pg_rewind later if we are not + # Add to rewindInfo to execute pg_rewind later if we are not # using full recovery. 
We will run pg_rewind on incremental recovery # if the target mirror does not have recovery.conf file because # segment failover happened. The check for recovery.conf file will # happen in the same remote SegmentRewind Command call. if not toRecover.isFullSynchronization() \ and seg.getSegmentRole() == gparray.ROLE_MIRROR: - rewindInfo.append((seg, primarySeg.getSegmentHostName(), primarySeg.getSegmentPort())) + rewindInfo[seg.getSegmentDbId()] = GpMirrorListToBuild.RewindSegmentInfo( + seg, primarySeg.getSegmentHostName(), primarySeg.getSegmentPort()) # The change in configuration to of the mirror to down requires that # the primary also be marked as unsynchronized. @@ -342,12 +354,12 @@ def run_pg_rewind(self, rewindInfo): rewindFailedSegments = [] # Run pg_rewind on all the targets - for targetSegment, sourceHostName, sourcePort in rewindInfo: + for rewindSeg in rewindInfo.values(): # Do CHECKPOINT on source to force TimeLineID to be updated in pg_control. # pg_rewind wants that to make incremental recovery successful finally. - self.__logger.debug('Do CHECKPOINT on %s (port: %d) before running pg_rewind.' % (sourceHostName, sourcePort)) - dburl = dbconn.DbURL(hostname=sourceHostName, - port=sourcePort, + self.__logger.debug('Do CHECKPOINT on %s (port: %d) before running pg_rewind.' % (rewindSeg.sourceHostname, rewindSeg.sourcePort)) + dburl = dbconn.DbURL(hostname=rewindSeg.sourceHostname, + port=rewindSeg.sourcePort, dbname='template1') conn = dbconn.connect(dburl, utility=True) dbconn.execSQL(conn, "CHECKPOINT") @@ -358,23 +370,35 @@ def run_pg_rewind(self, rewindInfo): # mode. It should be safe to remove the postmaster.pid # file since we do not expect the failed segment to be up. self.remove_postmaster_pid_from_remotehost( - targetSegment.getSegmentHostName(), - targetSegment.getSegmentDataDirectory()) - - # Run pg_rewind to do incremental recovery. - cmd = gp.SegmentRewind('segment rewind', - targetSegment.getSegmentHostName(), - targetSegment.getSegmentDataDirectory(), - sourceHostName, - sourcePort, + rewindSeg.targetSegment.getSegmentHostName(), + rewindSeg.targetSegment.getSegmentDataDirectory()) + + # Note the command name, we use the dbid later to + # correlate the command results with GpMirrorToBuild + # object. + cmd = gp.SegmentRewind('rewind dbid: %s' % + rewindSeg.targetSegment.getSegmentDbId(), + rewindSeg.targetSegment.getSegmentHostName(), + rewindSeg.targetSegment.getSegmentDataDirectory(), + rewindSeg.sourceHostname, + rewindSeg.sourcePort, verbose=gplog.logging_is_verbose()) - try: - cmd.run(True) - self.__logger.debug('pg_rewind results: %s' % cmd.results) - except base.ExecutionError as e: - self.__logger.debug("pg_rewind failed for target directory %s." % targetSegment.getSegmentDataDirectory()) - self.__logger.warning("Incremental recovery failed for dbid %s. You must use gprecoverseg -F to recover the segment." % targetSegment.getSegmentDbId()) - rewindFailedSegments.append(targetSegment) + self.__pool.addCommand(cmd) + + if self.__quiet: + self.__pool.join() + else: + base.join_and_indicate_progress(self.__pool) + + for cmd in self.__pool.getCompletedItems(): + self.__logger.debug('pg_rewind results: %s' % cmd.results) + if not cmd.was_successful(): + dbid = int(cmd.name.split(':')[1].strip()) + self.__logger.debug("%s failed" % cmd.name) + self.__logger.warning("Incremental recovery failed for dbid %d. You must use gprecoverseg -F to recover the segment." 
% dbid) + rewindFailedSegments.append(rewindInfo[dbid].targetSegment) + + self.__pool.empty_completed_items() return rewindFailedSegments diff --git a/gpMgmt/bin/gppylib/test/unit/test_unit_buildmirrorsegments.py b/gpMgmt/bin/gppylib/test/unit/test_unit_buildmirrorsegments.py new file mode 100644 index 000000000000..ac0c8c07f6c1 --- /dev/null +++ b/gpMgmt/bin/gppylib/test/unit/test_unit_buildmirrorsegments.py @@ -0,0 +1,64 @@ +from mock import * +from gp_unittest import * +from gppylib.gparray import GpArray, Segment +from gppylib.commands.base import WorkerPool + +class GpMirrorListToBuildTestCase(GpTestCase): + + def setUp(self): + self.pool = WorkerPool() + + def tearDown(self): + # All background threads must be stopped, or else the test runner will + # hang waiting. Join the stopped threads to make sure we're completely + # clean for the next test. + self.pool.haltWork() + self.pool.joinWorkers() + super(GpMirrorListToBuildTestCase, self).tearDown() + + def test_pg_rewind_parallel_execution(self): + self.apply_patches([ + # Mock CHECKPOINT command in run_pg_rewind() as successful + patch('gppylib.db.dbconn.connect', return_value=Mock()), + patch('gppylib.db.dbconn.execSQL', return_value=Mock()), + # Mock the command to remove postmaster.pid as successful + patch('gppylib.commands.base.Command.run', return_value=Mock()), + patch('gppylib.commands.base.Command.get_return_code', return_value=0), + # Mock all pg_rewind commands to be not successful + patch('gppylib.commands.base.Command.was_successful', return_value=False) + ]) + from gppylib.operations.buildMirrorSegments import GpMirrorListToBuild + # WorkerPool is the only valid parameter required in this test + # case. The test expects the workers to get a pg_rewind + # command to run (and the command should fail to run). + g = GpMirrorListToBuild(1, self.pool, 1,1) + rewindInfo = {} + p0 = Segment.initFromString("2|0|p|p|s|u|sdw1|sdw1|40000|/data/primary0") + p1 = Segment.initFromString("3|1|p|p|s|u|sdw2|sdw2|40001|/data/primary1") + m0 = Segment.initFromString("4|0|m|m|s|u|sdw2|sdw2|50000|/data/mirror0") + m1 = Segment.initFromString("5|1|m|m|s|u|sdw1|sdw1|50001|/data/mirror1") + rewindInfo[p0.dbid] = GpMirrorListToBuild.RewindSegmentInfo( + p0, p0.address, p0.port) + rewindInfo[p1.dbid] = GpMirrorListToBuild.RewindSegmentInfo( + p1, p1.address, p1.port) + rewindInfo[m0.dbid] = GpMirrorListToBuild.RewindSegmentInfo( + m0, m0.address, m0.port) + rewindInfo[m1.dbid] = GpMirrorListToBuild.RewindSegmentInfo( + m1, m1.address, m1.port) + + # Test1: all 4 pg_rewind commands should fail due the "was_successful" patch + failedSegments = g.run_pg_rewind(rewindInfo) + self.assertEqual(len(failedSegments), 4) + # The returned list of failed segments should contain items of + # type gparray.Segment + failedSegments.remove(p0) + self.assertTrue(failedSegments[0].getSegmentDbId() > 0) + + # Test2: patch it such that no failures this time + patch('gppylib.commands.base.Command.was_successful', return_value=True).start() + failedSegments = g.run_pg_rewind(rewindInfo) + self.assertEqual(len(failedSegments), 0) + +if __name__ == '__main__': + run_tests() + From 1c46340a22d8aa3a476c251ac5dc40de8abf8bc3 Mon Sep 17 00:00:00 2001 From: Paul Guo Date: Thu, 13 Feb 2020 15:56:52 +0800 Subject: [PATCH 015/102] Various dtm related code cleanup. (#9543) Main changes are: - Merge isQDContext() and isQEContext() since the later is a bit buggy and there is no need to separate them in gpdb master now. 
- Remove an incorrect or unnecessary switch in notifyCommittedDtxTransaction(). - Rename some two phase variables or functions since they could be used in one phase also. - Remove some unnecessary Assert code (some are because previous code logic has judged; some are due to obvious reasons). - Rename DTX_STATE_PERFORMING_ONE_PHASE_COMMIT to DTX_STATE_NOTIFYING_ONE_PHASE_COMMIT to make code more align with 2PC code. - Remove useless state DTX_STATE_FORCED_COMMITTED. Reviewed-by: Hubert Zhang Reviewed-by: Gang Xiong Cherry-picked from 83da7ddfcfc9d9e1c5a918260c863231c9f84f53 --- src/backend/cdb/cdbdistributedsnapshot.c | 6 +- src/backend/cdb/cdbdtxcontextinfo.c | 6 +- src/backend/cdb/cdblocaldistribxact.c | 8 +- src/backend/cdb/cdbtm.c | 167 ++++++++------------- src/backend/cdb/cdbtmutils.c | 2 - src/backend/cdb/dispatcher/cdbdisp_dtx.c | 6 +- src/backend/cdb/dispatcher/cdbdisp_query.c | 16 +- src/backend/executor/execMain.c | 10 +- src/backend/executor/nodeSubplan.c | 6 +- src/backend/utils/error/elog.c | 3 +- src/backend/utils/gpmon/gpmon.c | 2 +- src/include/cdb/cdbdisp_dtx.h | 2 +- src/include/cdb/cdblocaldistribxact.h | 2 - src/include/cdb/cdbtm.h | 22 ++- 14 files changed, 103 insertions(+), 155 deletions(-) diff --git a/src/backend/cdb/cdbdistributedsnapshot.c b/src/backend/cdb/cdbdistributedsnapshot.c index 49f31da824f7..ff2b0471189a 100644 --- a/src/backend/cdb/cdbdistributedsnapshot.c +++ b/src/backend/cdb/cdbdistributedsnapshot.c @@ -84,7 +84,6 @@ DistributedSnapshotWithLocalMapping_CommittedTest( * Is this local xid in a process-local cache we maintain? */ if (LocalDistribXactCache_CommittedFind(localXid, - ds->distribTransactionTimeStamp, &distribXid)) { /* @@ -132,9 +131,7 @@ DistributedSnapshotWithLocalMapping_CommittedTest( /* * Since we did not find it in our process local cache, add it. */ - LocalDistribXactCache_AddCommitted( - localXid, - ds->distribTransactionTimeStamp, + LocalDistribXactCache_AddCommitted(localXid, distribXid); } else @@ -145,7 +142,6 @@ DistributedSnapshotWithLocalMapping_CommittedTest( * transaction, it must be local-only. */ LocalDistribXactCache_AddCommitted(localXid, - ds->distribTransactionTimeStamp, /* distribXid */ InvalidDistributedTransactionId); return DISTRIBUTEDSNAPSHOT_COMMITTED_IGNORE; diff --git a/src/backend/cdb/cdbdtxcontextinfo.c b/src/backend/cdb/cdbdtxcontextinfo.c index 5e38155c924a..4a38616929c7 100644 --- a/src/backend/cdb/cdbdtxcontextinfo.c +++ b/src/backend/cdb/cdbdtxcontextinfo.c @@ -49,7 +49,7 @@ DtxContextInfo_CreateOnMaster(DtxContextInfo *dtxContextInfo, bool inCursor, dtxContextInfo->distributedXid = getDistributedTransactionId(); if (dtxContextInfo->distributedXid != InvalidDistributedTransactionId) { - dtxContextInfo->distributedTimeStamp = getDtxStartTime(); + dtxContextInfo->distributedTimeStamp = getDtmStartTime(); getDistributedTransactionIdentifier(dtxContextInfo->distributedId); dtxContextInfo->curcid = curcid; @@ -121,9 +121,9 @@ DtxContextInfo_CreateOnMaster(DtxContextInfo *dtxContextInfo, bool inCursor, dtxContextInfo->curcid); elog((Debug_print_full_dtm ? LOG : DEBUG5), - "DtxContextInfo_CreateOnMaster txnOptions = 0x%x, needTwoPhase = %s, explicitBegin = %s, isoLevel = %s, readOnly = %s.", + "DtxContextInfo_CreateOnMaster txnOptions = 0x%x, needDtx = %s, explicitBegin = %s, isoLevel = %s, readOnly = %s.", txnOptions, - (isMppTxOptions_NeedTwoPhase(txnOptions) ? "true" : "false"), + (isMppTxOptions_NeedDtx(txnOptions) ? "true" : "false"), (isMppTxOptions_ExplicitBegin(txnOptions) ? 
"true" : "false"), IsoLevelAsUpperString(mppTxOptions_IsoLevel(txnOptions)), (isMppTxOptions_ReadOnly(txnOptions) ? "true" : "false")); diff --git a/src/backend/cdb/cdblocaldistribxact.c b/src/backend/cdb/cdblocaldistribxact.c index 462f82d5c7de..a44c6057d08a 100644 --- a/src/backend/cdb/cdblocaldistribxact.c +++ b/src/backend/cdb/cdblocaldistribxact.c @@ -197,9 +197,7 @@ static struct LocalDistribXactCache bool -LocalDistribXactCache_CommittedFind( - TransactionId localXid, - DistributedTransactionTimeStamp distribTransactionTimeStamp, +LocalDistribXactCache_CommittedFind(TransactionId localXid, DistributedTransactionId *distribXid) { LocalDistribXactCacheEntry *entry; @@ -265,9 +263,7 @@ LocalDistribXactCache_CommittedFind( } void -LocalDistribXactCache_AddCommitted( - TransactionId localXid, - DistributedTransactionTimeStamp distribTransactionTimeStamp, +LocalDistribXactCache_AddCommitted(TransactionId localXid, DistributedTransactionId distribXid) { LocalDistribXactCacheEntry *entry; diff --git a/src/backend/cdb/cdbtm.c b/src/backend/cdb/cdbtm.c index 74dbe746c70c..85d502834240 100644 --- a/src/backend/cdb/cdbtm.c +++ b/src/backend/cdb/cdbtm.c @@ -96,7 +96,7 @@ int max_tm_gxacts = 100; * bits 2-4 for iso level * bit 5 is for read-only */ -#define GP_OPT_NEED_TWO_PHASE 0x0001 +#define GP_OPT_NEED_DTX 0x0001 #define GP_OPT_ISOLATION_LEVEL_MASK 0x000E #define GP_OPT_READ_UNCOMMITTED (1 << 1) @@ -119,7 +119,7 @@ static void doNotifyingCommitPrepared(void); static void doNotifyingAbort(void); static void retryAbortPrepared(void); static void doQEDistributedExplicitBegin(); -static void currentDtxActivateTwoPhase(void); +static void currentDtxActivate(void); static void setCurrentDtxState(DtxState state); static bool isDtxQueryDispatcher(void); @@ -155,39 +155,10 @@ requireDistributedTransactionContext(DtxContext requiredCurrentContext) } } -/** - * Does DistributedTransactionContext indicate that this is acting as a QD? - */ static bool -isQDContext(void) +isDtxContext(void) { - switch (DistributedTransactionContext) - { - case DTX_CONTEXT_QD_DISTRIBUTED_CAPABLE: - case DTX_CONTEXT_QD_RETRY_PHASE_2: - return true; - default: - return false; - } -} - -/** - * Does DistributedTransactionContext indicate that this is acting as a QE? 
- */ -static bool -isQEContext() -{ - switch (DistributedTransactionContext) - { - case DTX_CONTEXT_QE_ENTRY_DB_SINGLETON: - case DTX_CONTEXT_QE_AUTO_COMMIT_IMPLICIT: - case DTX_CONTEXT_QE_TWO_PHASE_EXPLICIT_WRITER: - case DTX_CONTEXT_QE_TWO_PHASE_IMPLICIT_WRITER: - case DTX_CONTEXT_QE_READER: - return true; - default: - return false; - } + return DistributedTransactionContext != DTX_CONTEXT_LOCAL_ONLY; } /*========================================================================= @@ -195,7 +166,7 @@ isQEContext() */ DistributedTransactionTimeStamp -getDtxStartTime(void) +getDtmStartTime(void) { if (shmDistribTimeStamp != NULL) return *shmDistribTimeStamp; @@ -206,7 +177,7 @@ getDtxStartTime(void) DistributedTransactionId getDistributedTransactionId(void) { - if (isQDContext() || isQEContext()) + if (isDtxContext()) return MyTmGxact->gxid; else return InvalidDistributedTransactionId; @@ -215,7 +186,7 @@ getDistributedTransactionId(void) DistributedTransactionTimeStamp getDistributedTransactionTimestamp(void) { - if (isQDContext() || isQEContext()) + if (isDtxContext()) return MyTmGxact->distribTimeStamp; else return 0; @@ -226,8 +197,7 @@ getDistributedTransactionIdentifier(char *id) { Assert(MyTmGxactLocal != NULL); - if ((isQDContext() || isQEContext()) && - MyTmGxact->gxid != InvalidDistributedTransactionId) + if (isDtxContext() && MyTmGxact->gxid != InvalidDistributedTransactionId) { /* * The length check here requires the identifer have a trailing @@ -256,13 +226,13 @@ isPreparedDtxTransaction(void) * the current dtx is clean and we aren't in a user-started global transaction. */ bool -isCurrentDtxTwoPhaseActivated(void) +isCurrentDtxActivated(void) { return MyTmGxactLocal->state != DTX_STATE_NONE; } static void -currentDtxActivateTwoPhase(void) +currentDtxActivate(void) { /* * Bump 'shmGIDSeq' and assign it to 'MyTmGxact->gxid', this needs to be atomic. 
@@ -295,7 +265,7 @@ currentDtxActivateTwoPhase(void) (errmsg("reached the limit of %u global transactions per start", LastDistributedTransactionId))); - MyTmGxact->distribTimeStamp = getDtxStartTime(); + MyTmGxact->distribTimeStamp = getDtmStartTime(); MyTmGxact->sessionId = gp_session_id; setCurrentDtxState(DTX_STATE_ACTIVE_DISTRIBUTED); } @@ -322,7 +292,7 @@ notifyCommittedDtxTransactionIsNeeded(void) return false; } - if (!isCurrentDtxTwoPhaseActivated()) + if (!isCurrentDtxActivated()) { elog(DTM_DEBUG5, "notifyCommittedDtxTransaction nothing to do (two phase not activated)"); return false; @@ -332,7 +302,7 @@ notifyCommittedDtxTransactionIsNeeded(void) } /* - * Notify commited a global transaction, called by user commit + * Notify committed a global transaction, called by user commit * or by CommitTransaction */ void @@ -340,11 +310,10 @@ notifyCommittedDtxTransaction(void) { Assert(Gp_role == GP_ROLE_DISPATCH); Assert(DistributedTransactionContext == DTX_CONTEXT_QD_DISTRIBUTED_CAPABLE); - Assert(isCurrentDtxTwoPhaseActivated()); + Assert(isCurrentDtxActivated()); switch(MyTmGxactLocal->state) { - case DTX_STATE_PREPARED: case DTX_STATE_INSERTED_COMMITTED: doNotifyingCommitPrepared(); break; @@ -359,13 +328,13 @@ notifyCommittedDtxTransaction(void) } void -setupTwoPhaseTransaction(void) +setupDtxTransaction(void) { if (!IsTransactionState()) elog(ERROR, "DTM transaction is not active"); - if (!isCurrentDtxTwoPhaseActivated()) - currentDtxActivateTwoPhase(); + if (!isCurrentDtxActivated()) + currentDtxActivate(); if (MyTmGxactLocal->state != DTX_STATE_ACTIVE_DISTRIBUTED) elog(ERROR, "DTM transaction state (%s) is invalid", DtxStateToString(MyTmGxactLocal->state)); @@ -393,9 +362,9 @@ doDispatchSubtransactionInternalCmd(DtxProtocolCommand cmdType) } if (cmdType == DTX_PROTOCOL_COMMAND_SUBTRANSACTION_BEGIN_INTERNAL && - !isCurrentDtxTwoPhaseActivated()) + !isCurrentDtxActivated()) { - currentDtxActivateTwoPhase(); + currentDtxActivate(); } serializedDtxContextInfo = qdSerializeDtxContextInfo(&serializedDtxContextInfoLen, @@ -442,7 +411,7 @@ doPrepareTransaction(void) elog(DTM_DEBUG5, "doPrepareTransaction moved to state = %s", DtxStateToString(MyTmGxactLocal->state)); - Assert(MyTmGxactLocal->twophaseSegments != NIL); + Assert(MyTmGxactLocal->dtxSegments != NIL); succeeded = currentDtxDispatchProtocolCommand(DTX_PROTOCOL_COMMAND_PREPARE, true); /* @@ -538,22 +507,20 @@ doNotifyingOnePhaseCommit(void) bool succeeded; volatile int savedInterruptHoldoffCount; - if (MyTmGxactLocal->twophaseSegments == NULL) + if (MyTmGxactLocal->dtxSegments == NIL) return; elog(DTM_DEBUG5, "doNotifyingOnePhaseCommit entering in state = %s", DtxStateToString(MyTmGxactLocal->state)); Assert(MyTmGxactLocal->state == DTX_STATE_ONE_PHASE_COMMIT); - setCurrentDtxState(DTX_STATE_PERFORMING_ONE_PHASE_COMMIT); + setCurrentDtxState(DTX_STATE_NOTIFYING_ONE_PHASE_COMMIT); savedInterruptHoldoffCount = InterruptHoldoffCount; - Assert(MyTmGxactLocal->twophaseSegments != NIL); - succeeded = currentDtxDispatchProtocolCommand(DTX_PROTOCOL_COMMAND_COMMIT_ONEPHASE, true); if (!succeeded) { - Assert(MyTmGxactLocal->state == DTX_STATE_PERFORMING_ONE_PHASE_COMMIT); + Assert(MyTmGxactLocal->state == DTX_STATE_NOTIFYING_ONE_PHASE_COMMIT); elog(ERROR, "one phase commit failed"); } } @@ -574,7 +541,7 @@ doNotifyingCommitPrepared(void) SIMPLE_FAULT_INJECTOR("dtm_broadcast_commit_prepared"); savedInterruptHoldoffCount = InterruptHoldoffCount; - Assert(MyTmGxactLocal->twophaseSegments != NIL); + Assert(MyTmGxactLocal->dtxSegments != 
NIL); PG_TRY(); { succeeded = currentDtxDispatchProtocolCommand(DTX_PROTOCOL_COMMAND_COMMIT_PREPARED, true); @@ -699,7 +666,7 @@ retryAbortPrepared(void) PG_TRY(); { - MyTmGxactLocal->twophaseSegments = cdbcomponent_getCdbComponentsList(); + MyTmGxactLocal->dtxSegments = cdbcomponent_getCdbComponentsList(); succeeded = currentDtxDispatchProtocolCommand(DTX_PROTOCOL_COMMAND_RETRY_ABORT_PREPARED, true); if (!succeeded) ereport(WARNING, @@ -751,7 +718,7 @@ doNotifyingAbort(void) * occur before the command is actually dispatched, no need to dispatch DTX for * such cases. */ - if (!MyTmGxactLocal->writerGangLost && MyTmGxactLocal->twophaseSegments) + if (!MyTmGxactLocal->writerGangLost && MyTmGxactLocal->dtxSegments) { succeeded = currentDtxDispatchProtocolCommand(DTX_PROTOCOL_COMMAND_ABORT_NO_PREPARED, false); @@ -864,7 +831,7 @@ prepareDtxTransaction(void) return; } - if (!isCurrentDtxTwoPhaseActivated()) + if (!isCurrentDtxActivated()) { Assert(MyTmGxactLocal->state == DTX_STATE_NONE); Assert(Gp_role != GP_ROLE_DISPATCH || MyTmGxact->gxid == InvalidDistributedTransactionId); @@ -879,7 +846,7 @@ prepareDtxTransaction(void) * segments. */ if (!ExecutorDidWriteXLog() || - (!markXidCommitted && list_length(MyTmGxactLocal->twophaseSegments) < 2)) + (!markXidCommitted && list_length(MyTmGxactLocal->dtxSegments) < 2)) { setCurrentDtxState(DTX_STATE_ONE_PHASE_COMMIT); return; @@ -908,7 +875,7 @@ rollbackDtxTransaction(void) DtxContextToString(DistributedTransactionContext)); return; } - if (!isCurrentDtxTwoPhaseActivated()) + if (!isCurrentDtxActivated()) { elog(DTM_DEBUG5, "rollbackDtxTransaction nothing to do (two phase not activate)"); return; @@ -945,7 +912,7 @@ rollbackDtxTransaction(void) break; case DTX_STATE_ONE_PHASE_COMMIT: - case DTX_STATE_PERFORMING_ONE_PHASE_COMMIT: + case DTX_STATE_NOTIFYING_ONE_PHASE_COMMIT: setCurrentDtxState(DTX_STATE_NOTIFYING_ABORT_NO_PREPARED); break; @@ -1120,7 +1087,7 @@ tmShmemInit(void) * after the statement. */ int -mppTxnOptions(bool needTwoPhase) +mppTxnOptions(bool needDtx) { int options = 0; @@ -1129,8 +1096,8 @@ mppTxnOptions(bool needTwoPhase) IsoLevelAsUpperString(DefaultXactIsoLevel), (DefaultXactReadOnly ? "true" : "false"), IsoLevelAsUpperString(XactIsoLevel), (XactReadOnly ? "true" : "false")); - if (needTwoPhase) - options |= GP_OPT_NEED_TWO_PHASE; + if (needDtx) + options |= GP_OPT_NEED_DTX; if (XactIsoLevel == XACT_READ_COMMITTED) options |= GP_OPT_READ_COMMITTED; @@ -1144,13 +1111,13 @@ mppTxnOptions(bool needTwoPhase) if (XactReadOnly) options |= GP_OPT_READ_ONLY; - if (isCurrentDtxTwoPhaseActivated() && MyTmGxactLocal->explicitBeginRemembered) + if (isCurrentDtxActivated() && MyTmGxactLocal->explicitBeginRemembered) options |= GP_OPT_EXPLICT_BEGIN; elog(DTM_DEBUG5, - "mppTxnOptions txnOptions = 0x%x, needTwoPhase = %s, explicitBegin = %s, isoLevel = %s, readOnly = %s.", + "mppTxnOptions txnOptions = 0x%x, needDtx = %s, explicitBegin = %s, isoLevel = %s, readOnly = %s.", options, - (isMppTxOptions_NeedTwoPhase(options) ? "true" : "false"), (isMppTxOptions_ExplicitBegin(options) ? "true" : "false"), + (isMppTxOptions_NeedDtx(options) ? "true" : "false"), (isMppTxOptions_ExplicitBegin(options) ? "true" : "false"), IsoLevelAsUpperString(mppTxOptions_IsoLevel(options)), (isMppTxOptions_ReadOnly(options) ? 
"true" : "false")); return options; @@ -1179,9 +1146,9 @@ isMppTxOptions_ReadOnly(int txnOptions) } bool -isMppTxOptions_NeedTwoPhase(int txnOptions) +isMppTxOptions_NeedDtx(int txnOptions) { - return ((txnOptions & GP_OPT_NEED_TWO_PHASE) != 0); + return ((txnOptions & GP_OPT_NEED_DTX) != 0); } /* isMppTxOptions_ExplicitBegin: @@ -1204,14 +1171,14 @@ currentDtxDispatchProtocolCommand(DtxProtocolCommand dtxProtocolCommand, bool ra dtxFormGID(gid, getDistributedTransactionTimestamp(), getDistributedTransactionId()); return doDispatchDtxProtocolCommand(dtxProtocolCommand, gid, badgang, raiseError, - MyTmGxactLocal->twophaseSegments, NULL, 0); + MyTmGxactLocal->dtxSegments, NULL, 0); } bool doDispatchDtxProtocolCommand(DtxProtocolCommand dtxProtocolCommand, char *gid, bool *badGangs, bool raiseError, - List *twophaseSegments, + List *dtxSegments, char *serializedDtxContextInfo, int serializedDtxContextInfoLen) { @@ -1223,7 +1190,7 @@ doDispatchDtxProtocolCommand(DtxProtocolCommand dtxProtocolCommand, struct pg_result **results; - if (!twophaseSegments) + if (!dtxSegments) return true; dtxProtocolCommandStr = DtxProtocolCommandToString(dtxProtocolCommand); @@ -1231,18 +1198,18 @@ doDispatchDtxProtocolCommand(DtxProtocolCommand dtxProtocolCommand, if (Test_print_direct_dispatch_info) elog(INFO, "Distributed transaction command '%s' to %s", dtxProtocolCommandStr, - segmentsToContentStr(twophaseSegments)); + segmentsToContentStr(dtxSegments)); ereport(DTM_DEBUG5, (errmsg("dispatchDtxProtocolCommand: %d ('%s'), direct content #: %s", dtxProtocolCommand, dtxProtocolCommandStr, - segmentsToContentStr(twophaseSegments)))); + segmentsToContentStr(dtxSegments)))); ErrorData *qeError; results = CdbDispatchDtxProtocolCommand(dtxProtocolCommand, dtxProtocolCommandStr, gid, - &qeError, &resultCount, badGangs, twophaseSegments, + &qeError, &resultCount, badGangs, dtxSegments, serializedDtxContextInfo, serializedDtxContextInfoLen); if (qeError) @@ -1392,8 +1359,8 @@ resetGxact() MyTmGxactLocal->explicitBeginRemembered = false; MyTmGxactLocal->badPrepareGangs = false; MyTmGxactLocal->writerGangLost = false; - MyTmGxactLocal->twophaseSegmentsMap = NULL; - MyTmGxactLocal->twophaseSegments = NIL; + MyTmGxactLocal->dtxSegmentsMap = NULL; + MyTmGxactLocal->dtxSegments = NIL; MyTmGxactLocal->isOnePhaseCommit = false; setCurrentDtxState(DTX_STATE_NONE); } @@ -1415,7 +1382,7 @@ getNextDistributedXactStatus(TMGALLXACTSTATUS *allDistributedXactStatus, TMGXACT static void clearAndResetGxact(void) { - Assert(isCurrentDtxTwoPhaseActivated()); + Assert(isCurrentDtxActivated()); LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); ProcArrayEndGxact(); @@ -1615,9 +1582,8 @@ setupRegularDtxContext(void) Assert(DistributedTransactionContext == DTX_CONTEXT_LOCAL_ONLY); if (isDtxQueryDispatcher()) - { setDistributedTransactionContext(DTX_CONTEXT_QD_DISTRIBUTED_CAPABLE); - } + break; } @@ -1635,7 +1601,7 @@ setupQEDtxContext(DtxContextInfo *dtxContextInfo) { DistributedSnapshot *distributedSnapshot; int txnOptions; - bool needTwoPhase; + bool needDtx; bool explicitBegin; bool haveDistributedSnapshot; bool isEntryDbSingleton = false; @@ -1651,7 +1617,7 @@ setupQEDtxContext(DtxContextInfo *dtxContextInfo) distributedSnapshot = &dtxContextInfo->distributedSnapshot; txnOptions = dtxContextInfo->distributedTxnOptions; - needTwoPhase = isMppTxOptions_NeedTwoPhase(txnOptions); + needDtx = isMppTxOptions_NeedDtx(txnOptions); explicitBegin = isMppTxOptions_ExplicitBegin(txnOptions); haveDistributedSnapshot = dtxContextInfo->haveDistributedSnapshot; 
@@ -1661,9 +1627,9 @@ setupQEDtxContext(DtxContextInfo *dtxContextInfo) { elog(DTM_DEBUG5, "setupQEDtxContext inputs (part 1): Gp_role = %s, Gp_is_writer = %s, " - "txnOptions = 0x%x, needTwoPhase = %s, explicitBegin = %s, isoLevel = %s, readOnly = %s, haveDistributedSnapshot = %s.", + "txnOptions = 0x%x, needDtx = %s, explicitBegin = %s, isoLevel = %s, readOnly = %s, haveDistributedSnapshot = %s.", role_to_string(Gp_role), (Gp_is_writer ? "true" : "false"), txnOptions, - (needTwoPhase ? "true" : "false"), (explicitBegin ? "true" : "false"), + (needDtx ? "true" : "false"), (explicitBegin ? "true" : "false"), IsoLevelAsUpperString(mppTxOptions_IsoLevel(txnOptions)), (isMppTxOptions_ReadOnly(txnOptions) ? "true" : "false"), (haveDistributedSnapshot ? "true" : "false")); elog(DTM_DEBUG5, @@ -1774,7 +1740,7 @@ setupQEDtxContext(DtxContextInfo *dtxContextInfo) setDistributedTransactionContext(DTX_CONTEXT_QE_READER); } - else if (isWriterQE && (explicitBegin || needTwoPhase)) + else if (isWriterQE && (explicitBegin || needDtx)) { if (!haveDistributedSnapshot) { @@ -1799,10 +1765,7 @@ setupQEDtxContext(DtxContextInfo *dtxContextInfo) doQEDistributedExplicitBegin(); } else - { - Assert(needTwoPhase); setDistributedTransactionContext(DTX_CONTEXT_QE_TWO_PHASE_IMPLICIT_WRITER); - } } else if (haveDistributedSnapshot) { @@ -1902,7 +1865,7 @@ finishDistributedTransactionContext(char *debugCaller, bool aborted) * We let the 2 retry states go up to PostgresMain.c, otherwise everything * MUST be complete. */ - if (isCurrentDtxTwoPhaseActivated() && + if (isCurrentDtxActivated() && (MyTmGxactLocal->state != DTX_STATE_RETRY_COMMIT_PREPARED && MyTmGxactLocal->state != DTX_STATE_RETRY_ABORT_PREPARED)) { @@ -1928,7 +1891,7 @@ finishDistributedTransactionContext(char *debugCaller, bool aborted) static void rememberDtxExplicitBegin(void) { - Assert (isCurrentDtxTwoPhaseActivated()); + Assert (isCurrentDtxActivated()); if (!MyTmGxactLocal->explicitBeginRemembered) { @@ -1948,7 +1911,7 @@ rememberDtxExplicitBegin(void) bool isDtxExplicitBegin(void) { - return (isCurrentDtxTwoPhaseActivated() && MyTmGxactLocal->explicitBeginRemembered); + return (isCurrentDtxActivated() && MyTmGxactLocal->explicitBeginRemembered); } /* @@ -1961,7 +1924,7 @@ sendDtxExplicitBegin(void) if (Gp_role != GP_ROLE_DISPATCH) return; - setupTwoPhaseTransaction(); + setupDtxTransaction(); rememberDtxExplicitBegin(); } @@ -2312,18 +2275,18 @@ currentGxactWriterGangLost(void) * Record which segment involved in the two phase commit. 
*/ void -addToGxactTwophaseSegments(Gang *gang) +addToGxactDtxSegments(Gang *gang) { SegmentDatabaseDescriptor *segdbDesc; MemoryContext oldContext; int segindex; int i; - if (!isCurrentDtxTwoPhaseActivated()) + if (!isCurrentDtxActivated()) return; /* skip if all segdbs are in the list */ - if (list_length(MyTmGxactLocal->twophaseSegments) >= getgpsegmentCount()) + if (list_length(MyTmGxactLocal->dtxSegments) >= getgpsegmentCount()) return; oldContext = MemoryContextSwitchTo(TopTransactionContext); @@ -2338,14 +2301,14 @@ addToGxactTwophaseSegments(Gang *gang) continue; /* skip if record already */ - if (bms_is_member(segindex, MyTmGxactLocal->twophaseSegmentsMap)) + if (bms_is_member(segindex, MyTmGxactLocal->dtxSegmentsMap)) continue; - MyTmGxactLocal->twophaseSegmentsMap = - bms_add_member(MyTmGxactLocal->twophaseSegmentsMap, segindex); + MyTmGxactLocal->dtxSegmentsMap = + bms_add_member(MyTmGxactLocal->dtxSegmentsMap, segindex); - MyTmGxactLocal->twophaseSegments = - lappend_int(MyTmGxactLocal->twophaseSegments, segindex); + MyTmGxactLocal->dtxSegments = + lappend_int(MyTmGxactLocal->dtxSegments, segindex); } MemoryContextSwitchTo(oldContext); } diff --git a/src/backend/cdb/cdbtmutils.c b/src/backend/cdb/cdbtmutils.c index 3eb074d9ba28..6fcb18816168 100644 --- a/src/backend/cdb/cdbtmutils.c +++ b/src/backend/cdb/cdbtmutils.c @@ -64,8 +64,6 @@ DtxStateToString(DtxState state) return "Inserting Committed"; case DTX_STATE_INSERTED_COMMITTED: return "Inserted Committed"; - case DTX_STATE_FORCED_COMMITTED: - return "Forced Committed"; case DTX_STATE_NOTIFYING_COMMIT_PREPARED: return "Notifying Commit Prepared"; case DTX_STATE_INSERTING_FORGET_COMMITTED: diff --git a/src/backend/cdb/dispatcher/cdbdisp_dtx.c b/src/backend/cdb/dispatcher/cdbdisp_dtx.c index 048996df430d..c9f6c8336ac0 100644 --- a/src/backend/cdb/dispatcher/cdbdisp_dtx.c +++ b/src/backend/cdb/dispatcher/cdbdisp_dtx.c @@ -71,7 +71,7 @@ CdbDispatchDtxProtocolCommand(DtxProtocolCommand dtxProtocolCommand, ErrorData **qeError, int *numresults, bool *badGangs, - List *twophaseSegments, + List *dtxSegments, char *serializedDtxContextInfo, int serializedDtxContextInfoLen) { @@ -102,7 +102,7 @@ CdbDispatchDtxProtocolCommand(DtxProtocolCommand dtxProtocolCommand, queryText = buildGpDtxProtocolCommand(&dtxProtocolParms, &queryTextLen); - primaryGang = AllocateGang(ds, GANGTYPE_PRIMARY_WRITER, twophaseSegments); + primaryGang = AllocateGang(ds, GANGTYPE_PRIMARY_WRITER, dtxSegments); Assert(primaryGang); @@ -110,7 +110,7 @@ CdbDispatchDtxProtocolCommand(DtxProtocolCommand dtxProtocolCommand, cdbdisp_makeDispatchParams(ds, 1, queryText, queryTextLen); cdbdisp_dispatchToGang(ds, primaryGang, -1); - addToGxactTwophaseSegments(primaryGang); + addToGxactDtxSegments(primaryGang); cdbdisp_waitDispatchFinish(ds); diff --git a/src/backend/cdb/dispatcher/cdbdisp_query.c b/src/backend/cdb/dispatcher/cdbdisp_query.c index f87baa226519..9490c764c91b 100644 --- a/src/backend/cdb/dispatcher/cdbdisp_query.c +++ b/src/backend/cdb/dispatcher/cdbdisp_query.c @@ -295,11 +295,11 @@ CdbDispatchSetCommand(const char *strCommand, bool cancelOnError) cdbdisp_dispatchToGang(ds, rg, -1); } - addToGxactTwophaseSegments(primaryGang); + addToGxactDtxSegments(primaryGang); /* * No need for two-phase commit, so no need to call - * addToGxactTwophaseSegments. + * addToGxactDtxSegments. 
*/ cdbdisp_waitDispatchFinish(ds); @@ -359,7 +359,7 @@ CdbDispatchCommandToSegments(const char *strCommand, bool needTwoPhase = flags & DF_NEED_TWO_PHASE; if (needTwoPhase) - setupTwoPhaseTransaction(); + setupDtxTransaction(); elogif((Debug_print_full_dtm || log_min_messages <= DEBUG5), LOG, "CdbDispatchCommand: %s (needTwoPhase = %s)", @@ -397,7 +397,7 @@ CdbDispatchUtilityStatement(struct Node *stmt, bool needTwoPhase = flags & DF_NEED_TWO_PHASE; if (needTwoPhase) - setupTwoPhaseTransaction(); + setupDtxTransaction(); elogif((Debug_print_full_dtm || log_min_messages <= DEBUG5), LOG, "CdbDispatchUtilityStatement: %s (needTwoPhase = %s)", @@ -443,7 +443,7 @@ cdbdisp_dispatchCommandInternal(DispatchCommandQueryParms *pQueryParms, cdbdisp_dispatchToGang(ds, primaryGang, -1); if ((flags & DF_NEED_TWO_PHASE) != 0 || isDtxExplicitBegin()) - addToGxactTwophaseSegments(primaryGang); + addToGxactDtxSegments(primaryGang); cdbdisp_waitDispatchFinish(ds); @@ -1157,7 +1157,7 @@ cdbdisp_dispatchX(QueryDesc* queryDesc, cdbdisp_dispatchToGang(ds, primaryGang, si); if (planRequiresTxn || isDtxExplicitBegin()) - addToGxactTwophaseSegments(primaryGang); + addToGxactDtxSegments(primaryGang); SIMPLE_FAULT_INJECTOR("after_one_slice_dispatched"); } @@ -1417,7 +1417,7 @@ CdbDispatchCopyStart(struct CdbCopy *cdbCopy, Node *stmt, int flags) bool needTwoPhase = flags & DF_NEED_TWO_PHASE; if (needTwoPhase) - setupTwoPhaseTransaction(); + setupDtxTransaction(); elogif((Debug_print_full_dtm || log_min_messages <= DEBUG5), LOG, "CdbDispatchCopyStart: %s (needTwoPhase = %s)", @@ -1443,7 +1443,7 @@ CdbDispatchCopyStart(struct CdbCopy *cdbCopy, Node *stmt, int flags) cdbdisp_dispatchToGang(ds, primaryGang, -1); if ((flags & DF_NEED_TWO_PHASE) != 0 || isDtxExplicitBegin()) - addToGxactTwophaseSegments(primaryGang); + addToGxactDtxSegments(primaryGang); cdbdisp_waitDispatchFinish(ds); diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index 4a68832f20b0..c3537bb15ac1 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -269,7 +269,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags) MemoryContext oldcontext; GpExecIdentity exec_identity; bool shouldDispatch; - bool needDtxTwoPhase; + bool needDtx; /* sanity checks: queryDesc must not be started already */ Assert(queryDesc != NULL); @@ -632,9 +632,9 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags) * ExecutorSaysTransactionDoesWrites() before any dispatch * work for this query. */ - needDtxTwoPhase = ExecutorSaysTransactionDoesWrites(); - if (needDtxTwoPhase) - setupTwoPhaseTransaction(); + needDtx = ExecutorSaysTransactionDoesWrites(); + if (needDtx) + setupDtxTransaction(); if (queryDesc->ddesc != NULL) { @@ -687,7 +687,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags) * Main plan is parallel, send plan to it. 
*/ if (queryDesc->plannedstmt->planTree->dispatch == DISPATCH_PARALLEL) - CdbDispatchPlan(queryDesc, needDtxTwoPhase, true); + CdbDispatchPlan(queryDesc, needDtx, true); } /* diff --git a/src/backend/executor/nodeSubplan.c b/src/backend/executor/nodeSubplan.c index 500f704dc317..eef1076a6150 100644 --- a/src/backend/executor/nodeSubplan.c +++ b/src/backend/executor/nodeSubplan.c @@ -971,7 +971,7 @@ ExecSetParamPlan(SubPlanState *node, ExprContext *econtext, QueryDesc *queryDesc ArrayBuildState *astate = NULL; Size savepeakspace = MemoryContextGetPeakSpace(planstate->state->es_query_cxt); - bool needDtxTwoPhase; + bool needDtx; bool shouldDispatch = false; volatile bool explainRecvStats = false; @@ -999,14 +999,14 @@ PG_TRY(); { if (shouldDispatch) { - needDtxTwoPhase = isCurrentDtxTwoPhaseActivated(); + needDtx = isCurrentDtxActivated(); /* * This call returns after launching the threads that send the * command to the appropriate segdbs. It does not wait for them * to finish unless an error is detected before all are dispatched. */ - CdbDispatchPlan(queryDesc, needDtxTwoPhase, true); + CdbDispatchPlan(queryDesc, needDtx, true); /* * Set up the interconnect for execution of the initplan root slice. diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c index 9ec61bab8279..148313c92aa9 100644 --- a/src/backend/utils/error/elog.c +++ b/src/backend/utils/error/elog.c @@ -396,7 +396,6 @@ errstart(int elevel, const char *filename, int lineno, case DTX_STATE_PREPARED: case DTX_STATE_INSERTING_COMMITTED: case DTX_STATE_INSERTED_COMMITTED: - case DTX_STATE_FORCED_COMMITTED: case DTX_STATE_NOTIFYING_COMMIT_PREPARED: case DTX_STATE_NOTIFYING_ABORT_SOME_PREPARED: case DTX_STATE_NOTIFYING_ABORT_PREPARED: @@ -408,7 +407,7 @@ errstart(int elevel, const char *filename, int lineno, case DTX_STATE_NONE: case DTX_STATE_ACTIVE_DISTRIBUTED: case DTX_STATE_ONE_PHASE_COMMIT: - case DTX_STATE_PERFORMING_ONE_PHASE_COMMIT: + case DTX_STATE_NOTIFYING_ONE_PHASE_COMMIT: case DTX_STATE_INSERTING_FORGET_COMMITTED: case DTX_STATE_INSERTED_FORGET_COMMITTED: case DTX_STATE_NOTIFYING_ABORT_NO_PREPARED: diff --git a/src/backend/utils/gpmon/gpmon.c b/src/backend/utils/gpmon/gpmon.c index f31899c5433c..7c7bb4dca39e 100644 --- a/src/backend/utils/gpmon/gpmon.c +++ b/src/backend/utils/gpmon/gpmon.c @@ -164,7 +164,7 @@ void gpmon_gettmid(int32* tmid) *tmid = (int32)QEDtxContextInfo.distributedSnapshot.distribTransactionTimeStamp; else /* On QD */ - *tmid = (int32)getDtxStartTime(); + *tmid = (int32)getDtmStartTime(); } diff --git a/src/include/cdb/cdbdisp_dtx.h b/src/include/cdb/cdbdisp_dtx.h index e922e97491ce..b433eaa78a67 100644 --- a/src/include/cdb/cdbdisp_dtx.h +++ b/src/include/cdb/cdbdisp_dtx.h @@ -37,7 +37,7 @@ CdbDispatchDtxProtocolCommand(DtxProtocolCommand dtxProtocolCommand, ErrorData **qeError, int *resultCount, bool* badGangs, - List *twophaseSegments, + List *dtxSegments, char *serializedDtxContextInfo, int serializedDtxContextInfoLen); diff --git a/src/include/cdb/cdblocaldistribxact.h b/src/include/cdb/cdblocaldistribxact.h index 6599f45f0d19..04fe8a2e29ef 100644 --- a/src/include/cdb/cdblocaldistribxact.h +++ b/src/include/cdb/cdblocaldistribxact.h @@ -51,12 +51,10 @@ extern char* LocalDistribXact_DisplayString(int pgprocno); extern bool LocalDistribXactCache_CommittedFind( TransactionId localXid, - DistributedTransactionTimeStamp distribTransactionTimeStamp, DistributedTransactionId *distribXid); extern void LocalDistribXactCache_AddCommitted( TransactionId localXid, - 
DistributedTransactionTimeStamp distribTransactionTimeStamp, DistributedTransactionId distribXid); extern void LocalDistribXactCache_ShowStats(char *nameStr); diff --git a/src/include/cdb/cdbtm.h b/src/include/cdb/cdbtm.h index 1a3173b06196..7a2a52d5dc95 100644 --- a/src/include/cdb/cdbtm.h +++ b/src/include/cdb/cdbtm.h @@ -40,7 +40,7 @@ typedef enum * For one-phase optimization commit, we haven't run the commit yet */ DTX_STATE_ONE_PHASE_COMMIT, - DTX_STATE_PERFORMING_ONE_PHASE_COMMIT, + DTX_STATE_NOTIFYING_ONE_PHASE_COMMIT, /** * For two-phase commit, the first phase is about to run @@ -53,7 +53,6 @@ typedef enum DTX_STATE_PREPARED, DTX_STATE_INSERTING_COMMITTED, DTX_STATE_INSERTED_COMMITTED, - DTX_STATE_FORCED_COMMITTED, DTX_STATE_NOTIFYING_COMMIT_PREPARED, DTX_STATE_INSERTING_FORGET_COMMITTED, DTX_STATE_INSERTED_FORGET_COMMITTED, @@ -236,8 +235,8 @@ typedef struct TMGXACTLOCAL bool writerGangLost; - Bitmapset *twophaseSegmentsMap; - List *twophaseSegments; + Bitmapset *dtxSegmentsMap; + List *dtxSegments; } TMGXACTLOCAL; typedef struct TMGXACTSTATUS @@ -276,7 +275,7 @@ extern volatile int *shmNumCommittedGxacts; extern char *DtxStateToString(DtxState state); extern char *DtxProtocolCommandToString(DtxProtocolCommand command); extern char *DtxContextToString(DtxContext context); -extern DistributedTransactionTimeStamp getDtxStartTime(void); +extern DistributedTransactionTimeStamp getDtmStartTime(void); extern void dtxCrackOpenGid(const char *gid, DistributedTransactionTimeStamp *distribTimeStamp, DistributedTransactionId *distribXid); @@ -301,10 +300,9 @@ extern void redoDtxCheckPoint(TMGXACT_CHECKPOINT *gxact_checkpoint); extern void redoDistributedCommitRecord(TMGXACT_LOG *gxact_log); extern void redoDistributedForgetCommitRecord(TMGXACT_LOG *gxact_log); -extern void setupTwoPhaseTransaction(void); -extern bool isCurrentDtxTwoPhase(void); +extern void setupDtxTransaction(void); extern DtxState getCurrentDtxState(void); -extern bool isCurrentDtxTwoPhaseActivated(void); +extern bool isCurrentDtxActivated(void); extern void sendDtxExplicitBegin(void); extern bool isDtxExplicitBegin(void); @@ -316,10 +314,10 @@ extern int tmShmemSize(void); extern void verify_shared_snapshot_ready(int cid); -int mppTxnOptions(bool needTwoPhase); +int mppTxnOptions(bool needDtx); int mppTxOptions_IsoLevel(int txnOptions); bool isMppTxOptions_ReadOnly(int txnOptions); -bool isMppTxOptions_NeedTwoPhase(int txnOptions); +bool isMppTxOptions_NeedDtx(int txnOptions); bool isMppTxOptions_ExplicitBegin(int txnOptions); extern void getAllDistributedXactStatus(TMGALLXACTSTATUS **allDistributedXactStatus); @@ -336,14 +334,14 @@ extern void UtilityModeCloseDtmRedoFile(void); extern bool currentDtxDispatchProtocolCommand(DtxProtocolCommand dtxProtocolCommand, bool raiseError); extern bool doDispatchSubtransactionInternalCmd(DtxProtocolCommand cmdType); extern bool doDispatchDtxProtocolCommand(DtxProtocolCommand dtxProtocolCommand, char *gid, - bool *badGangs, bool raiseError, List *twophaseSegments, + bool *badGangs, bool raiseError, List *dtxSegments, char *serializedDtxContextInfo, int serializedDtxContextInfoLen); extern void markCurrentGxactWriterGangLost(void); extern bool currentGxactWriterGangLost(void); -extern void addToGxactTwophaseSegments(struct Gang* gp); +extern void addToGxactDtxSegments(struct Gang* gp); extern void ClearTransactionState(TransactionId latestXid); From 50d2c793a537836b69aa2e0efc68bac982f9cbf9 Mon Sep 17 00:00:00 2001 From: Paul Guo Date: Thu, 13 Feb 2020 18:25:32 +0800 Subject: [PATCH 
016/102] Revert "Various dtm related code cleanup. (#9543)" This reverts commit 1c46340a22d8aa3a476c251ac5dc40de8abf8bc3. unit test fails (I did not test this locally). --- src/backend/cdb/cdbdistributedsnapshot.c | 6 +- src/backend/cdb/cdbdtxcontextinfo.c | 6 +- src/backend/cdb/cdblocaldistribxact.c | 8 +- src/backend/cdb/cdbtm.c | 167 +++++++++++++-------- src/backend/cdb/cdbtmutils.c | 2 + src/backend/cdb/dispatcher/cdbdisp_dtx.c | 6 +- src/backend/cdb/dispatcher/cdbdisp_query.c | 16 +- src/backend/executor/execMain.c | 10 +- src/backend/executor/nodeSubplan.c | 6 +- src/backend/utils/error/elog.c | 3 +- src/backend/utils/gpmon/gpmon.c | 2 +- src/include/cdb/cdbdisp_dtx.h | 2 +- src/include/cdb/cdblocaldistribxact.h | 2 + src/include/cdb/cdbtm.h | 22 +-- 14 files changed, 155 insertions(+), 103 deletions(-) diff --git a/src/backend/cdb/cdbdistributedsnapshot.c b/src/backend/cdb/cdbdistributedsnapshot.c index ff2b0471189a..49f31da824f7 100644 --- a/src/backend/cdb/cdbdistributedsnapshot.c +++ b/src/backend/cdb/cdbdistributedsnapshot.c @@ -84,6 +84,7 @@ DistributedSnapshotWithLocalMapping_CommittedTest( * Is this local xid in a process-local cache we maintain? */ if (LocalDistribXactCache_CommittedFind(localXid, + ds->distribTransactionTimeStamp, &distribXid)) { /* @@ -131,7 +132,9 @@ DistributedSnapshotWithLocalMapping_CommittedTest( /* * Since we did not find it in our process local cache, add it. */ - LocalDistribXactCache_AddCommitted(localXid, + LocalDistribXactCache_AddCommitted( + localXid, + ds->distribTransactionTimeStamp, distribXid); } else @@ -142,6 +145,7 @@ DistributedSnapshotWithLocalMapping_CommittedTest( * transaction, it must be local-only. */ LocalDistribXactCache_AddCommitted(localXid, + ds->distribTransactionTimeStamp, /* distribXid */ InvalidDistributedTransactionId); return DISTRIBUTEDSNAPSHOT_COMMITTED_IGNORE; diff --git a/src/backend/cdb/cdbdtxcontextinfo.c b/src/backend/cdb/cdbdtxcontextinfo.c index 4a38616929c7..5e38155c924a 100644 --- a/src/backend/cdb/cdbdtxcontextinfo.c +++ b/src/backend/cdb/cdbdtxcontextinfo.c @@ -49,7 +49,7 @@ DtxContextInfo_CreateOnMaster(DtxContextInfo *dtxContextInfo, bool inCursor, dtxContextInfo->distributedXid = getDistributedTransactionId(); if (dtxContextInfo->distributedXid != InvalidDistributedTransactionId) { - dtxContextInfo->distributedTimeStamp = getDtmStartTime(); + dtxContextInfo->distributedTimeStamp = getDtxStartTime(); getDistributedTransactionIdentifier(dtxContextInfo->distributedId); dtxContextInfo->curcid = curcid; @@ -121,9 +121,9 @@ DtxContextInfo_CreateOnMaster(DtxContextInfo *dtxContextInfo, bool inCursor, dtxContextInfo->curcid); elog((Debug_print_full_dtm ? LOG : DEBUG5), - "DtxContextInfo_CreateOnMaster txnOptions = 0x%x, needDtx = %s, explicitBegin = %s, isoLevel = %s, readOnly = %s.", + "DtxContextInfo_CreateOnMaster txnOptions = 0x%x, needTwoPhase = %s, explicitBegin = %s, isoLevel = %s, readOnly = %s.", txnOptions, - (isMppTxOptions_NeedDtx(txnOptions) ? "true" : "false"), + (isMppTxOptions_NeedTwoPhase(txnOptions) ? "true" : "false"), (isMppTxOptions_ExplicitBegin(txnOptions) ? "true" : "false"), IsoLevelAsUpperString(mppTxOptions_IsoLevel(txnOptions)), (isMppTxOptions_ReadOnly(txnOptions) ? 
"true" : "false")); diff --git a/src/backend/cdb/cdblocaldistribxact.c b/src/backend/cdb/cdblocaldistribxact.c index a44c6057d08a..462f82d5c7de 100644 --- a/src/backend/cdb/cdblocaldistribxact.c +++ b/src/backend/cdb/cdblocaldistribxact.c @@ -197,7 +197,9 @@ static struct LocalDistribXactCache bool -LocalDistribXactCache_CommittedFind(TransactionId localXid, +LocalDistribXactCache_CommittedFind( + TransactionId localXid, + DistributedTransactionTimeStamp distribTransactionTimeStamp, DistributedTransactionId *distribXid) { LocalDistribXactCacheEntry *entry; @@ -263,7 +265,9 @@ LocalDistribXactCache_CommittedFind(TransactionId localXid, } void -LocalDistribXactCache_AddCommitted(TransactionId localXid, +LocalDistribXactCache_AddCommitted( + TransactionId localXid, + DistributedTransactionTimeStamp distribTransactionTimeStamp, DistributedTransactionId distribXid) { LocalDistribXactCacheEntry *entry; diff --git a/src/backend/cdb/cdbtm.c b/src/backend/cdb/cdbtm.c index 85d502834240..74dbe746c70c 100644 --- a/src/backend/cdb/cdbtm.c +++ b/src/backend/cdb/cdbtm.c @@ -96,7 +96,7 @@ int max_tm_gxacts = 100; * bits 2-4 for iso level * bit 5 is for read-only */ -#define GP_OPT_NEED_DTX 0x0001 +#define GP_OPT_NEED_TWO_PHASE 0x0001 #define GP_OPT_ISOLATION_LEVEL_MASK 0x000E #define GP_OPT_READ_UNCOMMITTED (1 << 1) @@ -119,7 +119,7 @@ static void doNotifyingCommitPrepared(void); static void doNotifyingAbort(void); static void retryAbortPrepared(void); static void doQEDistributedExplicitBegin(); -static void currentDtxActivate(void); +static void currentDtxActivateTwoPhase(void); static void setCurrentDtxState(DtxState state); static bool isDtxQueryDispatcher(void); @@ -155,10 +155,39 @@ requireDistributedTransactionContext(DtxContext requiredCurrentContext) } } +/** + * Does DistributedTransactionContext indicate that this is acting as a QD? + */ static bool -isDtxContext(void) +isQDContext(void) { - return DistributedTransactionContext != DTX_CONTEXT_LOCAL_ONLY; + switch (DistributedTransactionContext) + { + case DTX_CONTEXT_QD_DISTRIBUTED_CAPABLE: + case DTX_CONTEXT_QD_RETRY_PHASE_2: + return true; + default: + return false; + } +} + +/** + * Does DistributedTransactionContext indicate that this is acting as a QE? 
+ */ +static bool +isQEContext() +{ + switch (DistributedTransactionContext) + { + case DTX_CONTEXT_QE_ENTRY_DB_SINGLETON: + case DTX_CONTEXT_QE_AUTO_COMMIT_IMPLICIT: + case DTX_CONTEXT_QE_TWO_PHASE_EXPLICIT_WRITER: + case DTX_CONTEXT_QE_TWO_PHASE_IMPLICIT_WRITER: + case DTX_CONTEXT_QE_READER: + return true; + default: + return false; + } } /*========================================================================= @@ -166,7 +195,7 @@ isDtxContext(void) */ DistributedTransactionTimeStamp -getDtmStartTime(void) +getDtxStartTime(void) { if (shmDistribTimeStamp != NULL) return *shmDistribTimeStamp; @@ -177,7 +206,7 @@ getDtmStartTime(void) DistributedTransactionId getDistributedTransactionId(void) { - if (isDtxContext()) + if (isQDContext() || isQEContext()) return MyTmGxact->gxid; else return InvalidDistributedTransactionId; @@ -186,7 +215,7 @@ getDistributedTransactionId(void) DistributedTransactionTimeStamp getDistributedTransactionTimestamp(void) { - if (isDtxContext()) + if (isQDContext() || isQEContext()) return MyTmGxact->distribTimeStamp; else return 0; @@ -197,7 +226,8 @@ getDistributedTransactionIdentifier(char *id) { Assert(MyTmGxactLocal != NULL); - if (isDtxContext() && MyTmGxact->gxid != InvalidDistributedTransactionId) + if ((isQDContext() || isQEContext()) && + MyTmGxact->gxid != InvalidDistributedTransactionId) { /* * The length check here requires the identifer have a trailing @@ -226,13 +256,13 @@ isPreparedDtxTransaction(void) * the current dtx is clean and we aren't in a user-started global transaction. */ bool -isCurrentDtxActivated(void) +isCurrentDtxTwoPhaseActivated(void) { return MyTmGxactLocal->state != DTX_STATE_NONE; } static void -currentDtxActivate(void) +currentDtxActivateTwoPhase(void) { /* * Bump 'shmGIDSeq' and assign it to 'MyTmGxact->gxid', this needs to be atomic. 
@@ -265,7 +295,7 @@ currentDtxActivate(void) (errmsg("reached the limit of %u global transactions per start", LastDistributedTransactionId))); - MyTmGxact->distribTimeStamp = getDtmStartTime(); + MyTmGxact->distribTimeStamp = getDtxStartTime(); MyTmGxact->sessionId = gp_session_id; setCurrentDtxState(DTX_STATE_ACTIVE_DISTRIBUTED); } @@ -292,7 +322,7 @@ notifyCommittedDtxTransactionIsNeeded(void) return false; } - if (!isCurrentDtxActivated()) + if (!isCurrentDtxTwoPhaseActivated()) { elog(DTM_DEBUG5, "notifyCommittedDtxTransaction nothing to do (two phase not activated)"); return false; @@ -302,7 +332,7 @@ notifyCommittedDtxTransactionIsNeeded(void) } /* - * Notify committed a global transaction, called by user commit + * Notify commited a global transaction, called by user commit * or by CommitTransaction */ void @@ -310,10 +340,11 @@ notifyCommittedDtxTransaction(void) { Assert(Gp_role == GP_ROLE_DISPATCH); Assert(DistributedTransactionContext == DTX_CONTEXT_QD_DISTRIBUTED_CAPABLE); - Assert(isCurrentDtxActivated()); + Assert(isCurrentDtxTwoPhaseActivated()); switch(MyTmGxactLocal->state) { + case DTX_STATE_PREPARED: case DTX_STATE_INSERTED_COMMITTED: doNotifyingCommitPrepared(); break; @@ -328,13 +359,13 @@ notifyCommittedDtxTransaction(void) } void -setupDtxTransaction(void) +setupTwoPhaseTransaction(void) { if (!IsTransactionState()) elog(ERROR, "DTM transaction is not active"); - if (!isCurrentDtxActivated()) - currentDtxActivate(); + if (!isCurrentDtxTwoPhaseActivated()) + currentDtxActivateTwoPhase(); if (MyTmGxactLocal->state != DTX_STATE_ACTIVE_DISTRIBUTED) elog(ERROR, "DTM transaction state (%s) is invalid", DtxStateToString(MyTmGxactLocal->state)); @@ -362,9 +393,9 @@ doDispatchSubtransactionInternalCmd(DtxProtocolCommand cmdType) } if (cmdType == DTX_PROTOCOL_COMMAND_SUBTRANSACTION_BEGIN_INTERNAL && - !isCurrentDtxActivated()) + !isCurrentDtxTwoPhaseActivated()) { - currentDtxActivate(); + currentDtxActivateTwoPhase(); } serializedDtxContextInfo = qdSerializeDtxContextInfo(&serializedDtxContextInfoLen, @@ -411,7 +442,7 @@ doPrepareTransaction(void) elog(DTM_DEBUG5, "doPrepareTransaction moved to state = %s", DtxStateToString(MyTmGxactLocal->state)); - Assert(MyTmGxactLocal->dtxSegments != NIL); + Assert(MyTmGxactLocal->twophaseSegments != NIL); succeeded = currentDtxDispatchProtocolCommand(DTX_PROTOCOL_COMMAND_PREPARE, true); /* @@ -507,20 +538,22 @@ doNotifyingOnePhaseCommit(void) bool succeeded; volatile int savedInterruptHoldoffCount; - if (MyTmGxactLocal->dtxSegments == NIL) + if (MyTmGxactLocal->twophaseSegments == NULL) return; elog(DTM_DEBUG5, "doNotifyingOnePhaseCommit entering in state = %s", DtxStateToString(MyTmGxactLocal->state)); Assert(MyTmGxactLocal->state == DTX_STATE_ONE_PHASE_COMMIT); - setCurrentDtxState(DTX_STATE_NOTIFYING_ONE_PHASE_COMMIT); + setCurrentDtxState(DTX_STATE_PERFORMING_ONE_PHASE_COMMIT); savedInterruptHoldoffCount = InterruptHoldoffCount; + Assert(MyTmGxactLocal->twophaseSegments != NIL); + succeeded = currentDtxDispatchProtocolCommand(DTX_PROTOCOL_COMMAND_COMMIT_ONEPHASE, true); if (!succeeded) { - Assert(MyTmGxactLocal->state == DTX_STATE_NOTIFYING_ONE_PHASE_COMMIT); + Assert(MyTmGxactLocal->state == DTX_STATE_PERFORMING_ONE_PHASE_COMMIT); elog(ERROR, "one phase commit failed"); } } @@ -541,7 +574,7 @@ doNotifyingCommitPrepared(void) SIMPLE_FAULT_INJECTOR("dtm_broadcast_commit_prepared"); savedInterruptHoldoffCount = InterruptHoldoffCount; - Assert(MyTmGxactLocal->dtxSegments != NIL); + Assert(MyTmGxactLocal->twophaseSegments != NIL); 
PG_TRY(); { succeeded = currentDtxDispatchProtocolCommand(DTX_PROTOCOL_COMMAND_COMMIT_PREPARED, true); @@ -666,7 +699,7 @@ retryAbortPrepared(void) PG_TRY(); { - MyTmGxactLocal->dtxSegments = cdbcomponent_getCdbComponentsList(); + MyTmGxactLocal->twophaseSegments = cdbcomponent_getCdbComponentsList(); succeeded = currentDtxDispatchProtocolCommand(DTX_PROTOCOL_COMMAND_RETRY_ABORT_PREPARED, true); if (!succeeded) ereport(WARNING, @@ -718,7 +751,7 @@ doNotifyingAbort(void) * occur before the command is actually dispatched, no need to dispatch DTX for * such cases. */ - if (!MyTmGxactLocal->writerGangLost && MyTmGxactLocal->dtxSegments) + if (!MyTmGxactLocal->writerGangLost && MyTmGxactLocal->twophaseSegments) { succeeded = currentDtxDispatchProtocolCommand(DTX_PROTOCOL_COMMAND_ABORT_NO_PREPARED, false); @@ -831,7 +864,7 @@ prepareDtxTransaction(void) return; } - if (!isCurrentDtxActivated()) + if (!isCurrentDtxTwoPhaseActivated()) { Assert(MyTmGxactLocal->state == DTX_STATE_NONE); Assert(Gp_role != GP_ROLE_DISPATCH || MyTmGxact->gxid == InvalidDistributedTransactionId); @@ -846,7 +879,7 @@ prepareDtxTransaction(void) * segments. */ if (!ExecutorDidWriteXLog() || - (!markXidCommitted && list_length(MyTmGxactLocal->dtxSegments) < 2)) + (!markXidCommitted && list_length(MyTmGxactLocal->twophaseSegments) < 2)) { setCurrentDtxState(DTX_STATE_ONE_PHASE_COMMIT); return; @@ -875,7 +908,7 @@ rollbackDtxTransaction(void) DtxContextToString(DistributedTransactionContext)); return; } - if (!isCurrentDtxActivated()) + if (!isCurrentDtxTwoPhaseActivated()) { elog(DTM_DEBUG5, "rollbackDtxTransaction nothing to do (two phase not activate)"); return; @@ -912,7 +945,7 @@ rollbackDtxTransaction(void) break; case DTX_STATE_ONE_PHASE_COMMIT: - case DTX_STATE_NOTIFYING_ONE_PHASE_COMMIT: + case DTX_STATE_PERFORMING_ONE_PHASE_COMMIT: setCurrentDtxState(DTX_STATE_NOTIFYING_ABORT_NO_PREPARED); break; @@ -1087,7 +1120,7 @@ tmShmemInit(void) * after the statement. */ int -mppTxnOptions(bool needDtx) +mppTxnOptions(bool needTwoPhase) { int options = 0; @@ -1096,8 +1129,8 @@ mppTxnOptions(bool needDtx) IsoLevelAsUpperString(DefaultXactIsoLevel), (DefaultXactReadOnly ? "true" : "false"), IsoLevelAsUpperString(XactIsoLevel), (XactReadOnly ? "true" : "false")); - if (needDtx) - options |= GP_OPT_NEED_DTX; + if (needTwoPhase) + options |= GP_OPT_NEED_TWO_PHASE; if (XactIsoLevel == XACT_READ_COMMITTED) options |= GP_OPT_READ_COMMITTED; @@ -1111,13 +1144,13 @@ mppTxnOptions(bool needDtx) if (XactReadOnly) options |= GP_OPT_READ_ONLY; - if (isCurrentDtxActivated() && MyTmGxactLocal->explicitBeginRemembered) + if (isCurrentDtxTwoPhaseActivated() && MyTmGxactLocal->explicitBeginRemembered) options |= GP_OPT_EXPLICT_BEGIN; elog(DTM_DEBUG5, - "mppTxnOptions txnOptions = 0x%x, needDtx = %s, explicitBegin = %s, isoLevel = %s, readOnly = %s.", + "mppTxnOptions txnOptions = 0x%x, needTwoPhase = %s, explicitBegin = %s, isoLevel = %s, readOnly = %s.", options, - (isMppTxOptions_NeedDtx(options) ? "true" : "false"), (isMppTxOptions_ExplicitBegin(options) ? "true" : "false"), + (isMppTxOptions_NeedTwoPhase(options) ? "true" : "false"), (isMppTxOptions_ExplicitBegin(options) ? "true" : "false"), IsoLevelAsUpperString(mppTxOptions_IsoLevel(options)), (isMppTxOptions_ReadOnly(options) ? 
"true" : "false")); return options; @@ -1146,9 +1179,9 @@ isMppTxOptions_ReadOnly(int txnOptions) } bool -isMppTxOptions_NeedDtx(int txnOptions) +isMppTxOptions_NeedTwoPhase(int txnOptions) { - return ((txnOptions & GP_OPT_NEED_DTX) != 0); + return ((txnOptions & GP_OPT_NEED_TWO_PHASE) != 0); } /* isMppTxOptions_ExplicitBegin: @@ -1171,14 +1204,14 @@ currentDtxDispatchProtocolCommand(DtxProtocolCommand dtxProtocolCommand, bool ra dtxFormGID(gid, getDistributedTransactionTimestamp(), getDistributedTransactionId()); return doDispatchDtxProtocolCommand(dtxProtocolCommand, gid, badgang, raiseError, - MyTmGxactLocal->dtxSegments, NULL, 0); + MyTmGxactLocal->twophaseSegments, NULL, 0); } bool doDispatchDtxProtocolCommand(DtxProtocolCommand dtxProtocolCommand, char *gid, bool *badGangs, bool raiseError, - List *dtxSegments, + List *twophaseSegments, char *serializedDtxContextInfo, int serializedDtxContextInfoLen) { @@ -1190,7 +1223,7 @@ doDispatchDtxProtocolCommand(DtxProtocolCommand dtxProtocolCommand, struct pg_result **results; - if (!dtxSegments) + if (!twophaseSegments) return true; dtxProtocolCommandStr = DtxProtocolCommandToString(dtxProtocolCommand); @@ -1198,18 +1231,18 @@ doDispatchDtxProtocolCommand(DtxProtocolCommand dtxProtocolCommand, if (Test_print_direct_dispatch_info) elog(INFO, "Distributed transaction command '%s' to %s", dtxProtocolCommandStr, - segmentsToContentStr(dtxSegments)); + segmentsToContentStr(twophaseSegments)); ereport(DTM_DEBUG5, (errmsg("dispatchDtxProtocolCommand: %d ('%s'), direct content #: %s", dtxProtocolCommand, dtxProtocolCommandStr, - segmentsToContentStr(dtxSegments)))); + segmentsToContentStr(twophaseSegments)))); ErrorData *qeError; results = CdbDispatchDtxProtocolCommand(dtxProtocolCommand, dtxProtocolCommandStr, gid, - &qeError, &resultCount, badGangs, dtxSegments, + &qeError, &resultCount, badGangs, twophaseSegments, serializedDtxContextInfo, serializedDtxContextInfoLen); if (qeError) @@ -1359,8 +1392,8 @@ resetGxact() MyTmGxactLocal->explicitBeginRemembered = false; MyTmGxactLocal->badPrepareGangs = false; MyTmGxactLocal->writerGangLost = false; - MyTmGxactLocal->dtxSegmentsMap = NULL; - MyTmGxactLocal->dtxSegments = NIL; + MyTmGxactLocal->twophaseSegmentsMap = NULL; + MyTmGxactLocal->twophaseSegments = NIL; MyTmGxactLocal->isOnePhaseCommit = false; setCurrentDtxState(DTX_STATE_NONE); } @@ -1382,7 +1415,7 @@ getNextDistributedXactStatus(TMGALLXACTSTATUS *allDistributedXactStatus, TMGXACT static void clearAndResetGxact(void) { - Assert(isCurrentDtxActivated()); + Assert(isCurrentDtxTwoPhaseActivated()); LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); ProcArrayEndGxact(); @@ -1582,8 +1615,9 @@ setupRegularDtxContext(void) Assert(DistributedTransactionContext == DTX_CONTEXT_LOCAL_ONLY); if (isDtxQueryDispatcher()) + { setDistributedTransactionContext(DTX_CONTEXT_QD_DISTRIBUTED_CAPABLE); - + } break; } @@ -1601,7 +1635,7 @@ setupQEDtxContext(DtxContextInfo *dtxContextInfo) { DistributedSnapshot *distributedSnapshot; int txnOptions; - bool needDtx; + bool needTwoPhase; bool explicitBegin; bool haveDistributedSnapshot; bool isEntryDbSingleton = false; @@ -1617,7 +1651,7 @@ setupQEDtxContext(DtxContextInfo *dtxContextInfo) distributedSnapshot = &dtxContextInfo->distributedSnapshot; txnOptions = dtxContextInfo->distributedTxnOptions; - needDtx = isMppTxOptions_NeedDtx(txnOptions); + needTwoPhase = isMppTxOptions_NeedTwoPhase(txnOptions); explicitBegin = isMppTxOptions_ExplicitBegin(txnOptions); haveDistributedSnapshot = dtxContextInfo->haveDistributedSnapshot; 
@@ -1627,9 +1661,9 @@ setupQEDtxContext(DtxContextInfo *dtxContextInfo) { elog(DTM_DEBUG5, "setupQEDtxContext inputs (part 1): Gp_role = %s, Gp_is_writer = %s, " - "txnOptions = 0x%x, needDtx = %s, explicitBegin = %s, isoLevel = %s, readOnly = %s, haveDistributedSnapshot = %s.", + "txnOptions = 0x%x, needTwoPhase = %s, explicitBegin = %s, isoLevel = %s, readOnly = %s, haveDistributedSnapshot = %s.", role_to_string(Gp_role), (Gp_is_writer ? "true" : "false"), txnOptions, - (needDtx ? "true" : "false"), (explicitBegin ? "true" : "false"), + (needTwoPhase ? "true" : "false"), (explicitBegin ? "true" : "false"), IsoLevelAsUpperString(mppTxOptions_IsoLevel(txnOptions)), (isMppTxOptions_ReadOnly(txnOptions) ? "true" : "false"), (haveDistributedSnapshot ? "true" : "false")); elog(DTM_DEBUG5, @@ -1740,7 +1774,7 @@ setupQEDtxContext(DtxContextInfo *dtxContextInfo) setDistributedTransactionContext(DTX_CONTEXT_QE_READER); } - else if (isWriterQE && (explicitBegin || needDtx)) + else if (isWriterQE && (explicitBegin || needTwoPhase)) { if (!haveDistributedSnapshot) { @@ -1765,7 +1799,10 @@ setupQEDtxContext(DtxContextInfo *dtxContextInfo) doQEDistributedExplicitBegin(); } else + { + Assert(needTwoPhase); setDistributedTransactionContext(DTX_CONTEXT_QE_TWO_PHASE_IMPLICIT_WRITER); + } } else if (haveDistributedSnapshot) { @@ -1865,7 +1902,7 @@ finishDistributedTransactionContext(char *debugCaller, bool aborted) * We let the 2 retry states go up to PostgresMain.c, otherwise everything * MUST be complete. */ - if (isCurrentDtxActivated() && + if (isCurrentDtxTwoPhaseActivated() && (MyTmGxactLocal->state != DTX_STATE_RETRY_COMMIT_PREPARED && MyTmGxactLocal->state != DTX_STATE_RETRY_ABORT_PREPARED)) { @@ -1891,7 +1928,7 @@ finishDistributedTransactionContext(char *debugCaller, bool aborted) static void rememberDtxExplicitBegin(void) { - Assert (isCurrentDtxActivated()); + Assert (isCurrentDtxTwoPhaseActivated()); if (!MyTmGxactLocal->explicitBeginRemembered) { @@ -1911,7 +1948,7 @@ rememberDtxExplicitBegin(void) bool isDtxExplicitBegin(void) { - return (isCurrentDtxActivated() && MyTmGxactLocal->explicitBeginRemembered); + return (isCurrentDtxTwoPhaseActivated() && MyTmGxactLocal->explicitBeginRemembered); } /* @@ -1924,7 +1961,7 @@ sendDtxExplicitBegin(void) if (Gp_role != GP_ROLE_DISPATCH) return; - setupDtxTransaction(); + setupTwoPhaseTransaction(); rememberDtxExplicitBegin(); } @@ -2275,18 +2312,18 @@ currentGxactWriterGangLost(void) * Record which segment involved in the two phase commit. 
*/ void -addToGxactDtxSegments(Gang *gang) +addToGxactTwophaseSegments(Gang *gang) { SegmentDatabaseDescriptor *segdbDesc; MemoryContext oldContext; int segindex; int i; - if (!isCurrentDtxActivated()) + if (!isCurrentDtxTwoPhaseActivated()) return; /* skip if all segdbs are in the list */ - if (list_length(MyTmGxactLocal->dtxSegments) >= getgpsegmentCount()) + if (list_length(MyTmGxactLocal->twophaseSegments) >= getgpsegmentCount()) return; oldContext = MemoryContextSwitchTo(TopTransactionContext); @@ -2301,14 +2338,14 @@ addToGxactDtxSegments(Gang *gang) continue; /* skip if record already */ - if (bms_is_member(segindex, MyTmGxactLocal->dtxSegmentsMap)) + if (bms_is_member(segindex, MyTmGxactLocal->twophaseSegmentsMap)) continue; - MyTmGxactLocal->dtxSegmentsMap = - bms_add_member(MyTmGxactLocal->dtxSegmentsMap, segindex); + MyTmGxactLocal->twophaseSegmentsMap = + bms_add_member(MyTmGxactLocal->twophaseSegmentsMap, segindex); - MyTmGxactLocal->dtxSegments = - lappend_int(MyTmGxactLocal->dtxSegments, segindex); + MyTmGxactLocal->twophaseSegments = + lappend_int(MyTmGxactLocal->twophaseSegments, segindex); } MemoryContextSwitchTo(oldContext); } diff --git a/src/backend/cdb/cdbtmutils.c b/src/backend/cdb/cdbtmutils.c index 6fcb18816168..3eb074d9ba28 100644 --- a/src/backend/cdb/cdbtmutils.c +++ b/src/backend/cdb/cdbtmutils.c @@ -64,6 +64,8 @@ DtxStateToString(DtxState state) return "Inserting Committed"; case DTX_STATE_INSERTED_COMMITTED: return "Inserted Committed"; + case DTX_STATE_FORCED_COMMITTED: + return "Forced Committed"; case DTX_STATE_NOTIFYING_COMMIT_PREPARED: return "Notifying Commit Prepared"; case DTX_STATE_INSERTING_FORGET_COMMITTED: diff --git a/src/backend/cdb/dispatcher/cdbdisp_dtx.c b/src/backend/cdb/dispatcher/cdbdisp_dtx.c index c9f6c8336ac0..048996df430d 100644 --- a/src/backend/cdb/dispatcher/cdbdisp_dtx.c +++ b/src/backend/cdb/dispatcher/cdbdisp_dtx.c @@ -71,7 +71,7 @@ CdbDispatchDtxProtocolCommand(DtxProtocolCommand dtxProtocolCommand, ErrorData **qeError, int *numresults, bool *badGangs, - List *dtxSegments, + List *twophaseSegments, char *serializedDtxContextInfo, int serializedDtxContextInfoLen) { @@ -102,7 +102,7 @@ CdbDispatchDtxProtocolCommand(DtxProtocolCommand dtxProtocolCommand, queryText = buildGpDtxProtocolCommand(&dtxProtocolParms, &queryTextLen); - primaryGang = AllocateGang(ds, GANGTYPE_PRIMARY_WRITER, dtxSegments); + primaryGang = AllocateGang(ds, GANGTYPE_PRIMARY_WRITER, twophaseSegments); Assert(primaryGang); @@ -110,7 +110,7 @@ CdbDispatchDtxProtocolCommand(DtxProtocolCommand dtxProtocolCommand, cdbdisp_makeDispatchParams(ds, 1, queryText, queryTextLen); cdbdisp_dispatchToGang(ds, primaryGang, -1); - addToGxactDtxSegments(primaryGang); + addToGxactTwophaseSegments(primaryGang); cdbdisp_waitDispatchFinish(ds); diff --git a/src/backend/cdb/dispatcher/cdbdisp_query.c b/src/backend/cdb/dispatcher/cdbdisp_query.c index 9490c764c91b..f87baa226519 100644 --- a/src/backend/cdb/dispatcher/cdbdisp_query.c +++ b/src/backend/cdb/dispatcher/cdbdisp_query.c @@ -295,11 +295,11 @@ CdbDispatchSetCommand(const char *strCommand, bool cancelOnError) cdbdisp_dispatchToGang(ds, rg, -1); } - addToGxactDtxSegments(primaryGang); + addToGxactTwophaseSegments(primaryGang); /* * No need for two-phase commit, so no need to call - * addToGxactDtxSegments. + * addToGxactTwophaseSegments. 
*/ cdbdisp_waitDispatchFinish(ds); @@ -359,7 +359,7 @@ CdbDispatchCommandToSegments(const char *strCommand, bool needTwoPhase = flags & DF_NEED_TWO_PHASE; if (needTwoPhase) - setupDtxTransaction(); + setupTwoPhaseTransaction(); elogif((Debug_print_full_dtm || log_min_messages <= DEBUG5), LOG, "CdbDispatchCommand: %s (needTwoPhase = %s)", @@ -397,7 +397,7 @@ CdbDispatchUtilityStatement(struct Node *stmt, bool needTwoPhase = flags & DF_NEED_TWO_PHASE; if (needTwoPhase) - setupDtxTransaction(); + setupTwoPhaseTransaction(); elogif((Debug_print_full_dtm || log_min_messages <= DEBUG5), LOG, "CdbDispatchUtilityStatement: %s (needTwoPhase = %s)", @@ -443,7 +443,7 @@ cdbdisp_dispatchCommandInternal(DispatchCommandQueryParms *pQueryParms, cdbdisp_dispatchToGang(ds, primaryGang, -1); if ((flags & DF_NEED_TWO_PHASE) != 0 || isDtxExplicitBegin()) - addToGxactDtxSegments(primaryGang); + addToGxactTwophaseSegments(primaryGang); cdbdisp_waitDispatchFinish(ds); @@ -1157,7 +1157,7 @@ cdbdisp_dispatchX(QueryDesc* queryDesc, cdbdisp_dispatchToGang(ds, primaryGang, si); if (planRequiresTxn || isDtxExplicitBegin()) - addToGxactDtxSegments(primaryGang); + addToGxactTwophaseSegments(primaryGang); SIMPLE_FAULT_INJECTOR("after_one_slice_dispatched"); } @@ -1417,7 +1417,7 @@ CdbDispatchCopyStart(struct CdbCopy *cdbCopy, Node *stmt, int flags) bool needTwoPhase = flags & DF_NEED_TWO_PHASE; if (needTwoPhase) - setupDtxTransaction(); + setupTwoPhaseTransaction(); elogif((Debug_print_full_dtm || log_min_messages <= DEBUG5), LOG, "CdbDispatchCopyStart: %s (needTwoPhase = %s)", @@ -1443,7 +1443,7 @@ CdbDispatchCopyStart(struct CdbCopy *cdbCopy, Node *stmt, int flags) cdbdisp_dispatchToGang(ds, primaryGang, -1); if ((flags & DF_NEED_TWO_PHASE) != 0 || isDtxExplicitBegin()) - addToGxactDtxSegments(primaryGang); + addToGxactTwophaseSegments(primaryGang); cdbdisp_waitDispatchFinish(ds); diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index c3537bb15ac1..4a68832f20b0 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -269,7 +269,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags) MemoryContext oldcontext; GpExecIdentity exec_identity; bool shouldDispatch; - bool needDtx; + bool needDtxTwoPhase; /* sanity checks: queryDesc must not be started already */ Assert(queryDesc != NULL); @@ -632,9 +632,9 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags) * ExecutorSaysTransactionDoesWrites() before any dispatch * work for this query. */ - needDtx = ExecutorSaysTransactionDoesWrites(); - if (needDtx) - setupDtxTransaction(); + needDtxTwoPhase = ExecutorSaysTransactionDoesWrites(); + if (needDtxTwoPhase) + setupTwoPhaseTransaction(); if (queryDesc->ddesc != NULL) { @@ -687,7 +687,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags) * Main plan is parallel, send plan to it. 
*/ if (queryDesc->plannedstmt->planTree->dispatch == DISPATCH_PARALLEL) - CdbDispatchPlan(queryDesc, needDtx, true); + CdbDispatchPlan(queryDesc, needDtxTwoPhase, true); } /* diff --git a/src/backend/executor/nodeSubplan.c b/src/backend/executor/nodeSubplan.c index eef1076a6150..500f704dc317 100644 --- a/src/backend/executor/nodeSubplan.c +++ b/src/backend/executor/nodeSubplan.c @@ -971,7 +971,7 @@ ExecSetParamPlan(SubPlanState *node, ExprContext *econtext, QueryDesc *queryDesc ArrayBuildState *astate = NULL; Size savepeakspace = MemoryContextGetPeakSpace(planstate->state->es_query_cxt); - bool needDtx; + bool needDtxTwoPhase; bool shouldDispatch = false; volatile bool explainRecvStats = false; @@ -999,14 +999,14 @@ PG_TRY(); { if (shouldDispatch) { - needDtx = isCurrentDtxActivated(); + needDtxTwoPhase = isCurrentDtxTwoPhaseActivated(); /* * This call returns after launching the threads that send the * command to the appropriate segdbs. It does not wait for them * to finish unless an error is detected before all are dispatched. */ - CdbDispatchPlan(queryDesc, needDtx, true); + CdbDispatchPlan(queryDesc, needDtxTwoPhase, true); /* * Set up the interconnect for execution of the initplan root slice. diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c index 148313c92aa9..9ec61bab8279 100644 --- a/src/backend/utils/error/elog.c +++ b/src/backend/utils/error/elog.c @@ -396,6 +396,7 @@ errstart(int elevel, const char *filename, int lineno, case DTX_STATE_PREPARED: case DTX_STATE_INSERTING_COMMITTED: case DTX_STATE_INSERTED_COMMITTED: + case DTX_STATE_FORCED_COMMITTED: case DTX_STATE_NOTIFYING_COMMIT_PREPARED: case DTX_STATE_NOTIFYING_ABORT_SOME_PREPARED: case DTX_STATE_NOTIFYING_ABORT_PREPARED: @@ -407,7 +408,7 @@ errstart(int elevel, const char *filename, int lineno, case DTX_STATE_NONE: case DTX_STATE_ACTIVE_DISTRIBUTED: case DTX_STATE_ONE_PHASE_COMMIT: - case DTX_STATE_NOTIFYING_ONE_PHASE_COMMIT: + case DTX_STATE_PERFORMING_ONE_PHASE_COMMIT: case DTX_STATE_INSERTING_FORGET_COMMITTED: case DTX_STATE_INSERTED_FORGET_COMMITTED: case DTX_STATE_NOTIFYING_ABORT_NO_PREPARED: diff --git a/src/backend/utils/gpmon/gpmon.c b/src/backend/utils/gpmon/gpmon.c index 7c7bb4dca39e..f31899c5433c 100644 --- a/src/backend/utils/gpmon/gpmon.c +++ b/src/backend/utils/gpmon/gpmon.c @@ -164,7 +164,7 @@ void gpmon_gettmid(int32* tmid) *tmid = (int32)QEDtxContextInfo.distributedSnapshot.distribTransactionTimeStamp; else /* On QD */ - *tmid = (int32)getDtmStartTime(); + *tmid = (int32)getDtxStartTime(); } diff --git a/src/include/cdb/cdbdisp_dtx.h b/src/include/cdb/cdbdisp_dtx.h index b433eaa78a67..e922e97491ce 100644 --- a/src/include/cdb/cdbdisp_dtx.h +++ b/src/include/cdb/cdbdisp_dtx.h @@ -37,7 +37,7 @@ CdbDispatchDtxProtocolCommand(DtxProtocolCommand dtxProtocolCommand, ErrorData **qeError, int *resultCount, bool* badGangs, - List *dtxSegments, + List *twophaseSegments, char *serializedDtxContextInfo, int serializedDtxContextInfoLen); diff --git a/src/include/cdb/cdblocaldistribxact.h b/src/include/cdb/cdblocaldistribxact.h index 04fe8a2e29ef..6599f45f0d19 100644 --- a/src/include/cdb/cdblocaldistribxact.h +++ b/src/include/cdb/cdblocaldistribxact.h @@ -51,10 +51,12 @@ extern char* LocalDistribXact_DisplayString(int pgprocno); extern bool LocalDistribXactCache_CommittedFind( TransactionId localXid, + DistributedTransactionTimeStamp distribTransactionTimeStamp, DistributedTransactionId *distribXid); extern void LocalDistribXactCache_AddCommitted( TransactionId localXid, + 
DistributedTransactionTimeStamp distribTransactionTimeStamp, DistributedTransactionId distribXid); extern void LocalDistribXactCache_ShowStats(char *nameStr); diff --git a/src/include/cdb/cdbtm.h b/src/include/cdb/cdbtm.h index 7a2a52d5dc95..1a3173b06196 100644 --- a/src/include/cdb/cdbtm.h +++ b/src/include/cdb/cdbtm.h @@ -40,7 +40,7 @@ typedef enum * For one-phase optimization commit, we haven't run the commit yet */ DTX_STATE_ONE_PHASE_COMMIT, - DTX_STATE_NOTIFYING_ONE_PHASE_COMMIT, + DTX_STATE_PERFORMING_ONE_PHASE_COMMIT, /** * For two-phase commit, the first phase is about to run @@ -53,6 +53,7 @@ typedef enum DTX_STATE_PREPARED, DTX_STATE_INSERTING_COMMITTED, DTX_STATE_INSERTED_COMMITTED, + DTX_STATE_FORCED_COMMITTED, DTX_STATE_NOTIFYING_COMMIT_PREPARED, DTX_STATE_INSERTING_FORGET_COMMITTED, DTX_STATE_INSERTED_FORGET_COMMITTED, @@ -235,8 +236,8 @@ typedef struct TMGXACTLOCAL bool writerGangLost; - Bitmapset *dtxSegmentsMap; - List *dtxSegments; + Bitmapset *twophaseSegmentsMap; + List *twophaseSegments; } TMGXACTLOCAL; typedef struct TMGXACTSTATUS @@ -275,7 +276,7 @@ extern volatile int *shmNumCommittedGxacts; extern char *DtxStateToString(DtxState state); extern char *DtxProtocolCommandToString(DtxProtocolCommand command); extern char *DtxContextToString(DtxContext context); -extern DistributedTransactionTimeStamp getDtmStartTime(void); +extern DistributedTransactionTimeStamp getDtxStartTime(void); extern void dtxCrackOpenGid(const char *gid, DistributedTransactionTimeStamp *distribTimeStamp, DistributedTransactionId *distribXid); @@ -300,9 +301,10 @@ extern void redoDtxCheckPoint(TMGXACT_CHECKPOINT *gxact_checkpoint); extern void redoDistributedCommitRecord(TMGXACT_LOG *gxact_log); extern void redoDistributedForgetCommitRecord(TMGXACT_LOG *gxact_log); -extern void setupDtxTransaction(void); +extern void setupTwoPhaseTransaction(void); +extern bool isCurrentDtxTwoPhase(void); extern DtxState getCurrentDtxState(void); -extern bool isCurrentDtxActivated(void); +extern bool isCurrentDtxTwoPhaseActivated(void); extern void sendDtxExplicitBegin(void); extern bool isDtxExplicitBegin(void); @@ -314,10 +316,10 @@ extern int tmShmemSize(void); extern void verify_shared_snapshot_ready(int cid); -int mppTxnOptions(bool needDtx); +int mppTxnOptions(bool needTwoPhase); int mppTxOptions_IsoLevel(int txnOptions); bool isMppTxOptions_ReadOnly(int txnOptions); -bool isMppTxOptions_NeedDtx(int txnOptions); +bool isMppTxOptions_NeedTwoPhase(int txnOptions); bool isMppTxOptions_ExplicitBegin(int txnOptions); extern void getAllDistributedXactStatus(TMGALLXACTSTATUS **allDistributedXactStatus); @@ -334,14 +336,14 @@ extern void UtilityModeCloseDtmRedoFile(void); extern bool currentDtxDispatchProtocolCommand(DtxProtocolCommand dtxProtocolCommand, bool raiseError); extern bool doDispatchSubtransactionInternalCmd(DtxProtocolCommand cmdType); extern bool doDispatchDtxProtocolCommand(DtxProtocolCommand dtxProtocolCommand, char *gid, - bool *badGangs, bool raiseError, List *dtxSegments, + bool *badGangs, bool raiseError, List *twophaseSegments, char *serializedDtxContextInfo, int serializedDtxContextInfoLen); extern void markCurrentGxactWriterGangLost(void); extern bool currentGxactWriterGangLost(void); -extern void addToGxactDtxSegments(struct Gang* gp); +extern void addToGxactTwophaseSegments(struct Gang* gp); extern void ClearTransactionState(TransactionId latestXid); From cd43e75eadaeb6ee0db5a1c911fada253196ff79 Mon Sep 17 00:00:00 2001 From: Shreedhar Hardikar Date: Thu, 13 Feb 2020 12:15:16 -0600 
Subject: [PATCH 017/102] Bump ORCA version to v3.91.0 Includes ORCA-side refactors & renames in preparation for supporting opfamilies in ORCA. --- concourse/tasks/compile_gpdb.yml | 2 +- config/orca.m4 | 4 ++-- configure | 4 ++-- depends/conanfile_orca.txt | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/concourse/tasks/compile_gpdb.yml b/concourse/tasks/compile_gpdb.yml index 686178ddff10..ce0820051db9 100644 --- a/concourse/tasks/compile_gpdb.yml +++ b/concourse/tasks/compile_gpdb.yml @@ -19,5 +19,5 @@ params: BLD_TARGETS: OUTPUT_ARTIFACT_DIR: gpdb_artifacts CONFIGURE_FLAGS: - ORCA_TAG: v3.90.0 + ORCA_TAG: v3.91.0 RC_BUILD_TYPE_GCS: diff --git a/config/orca.m4 b/config/orca.m4 index 2eb2195e674d..3db91ead13f5 100644 --- a/config/orca.m4 +++ b/config/orca.m4 @@ -40,10 +40,10 @@ AC_RUN_IFELSE([AC_LANG_PROGRAM([[ #include ]], [ -return strncmp("3.90.", GPORCA_VERSION_STRING, 5); +return strncmp("3.91.", GPORCA_VERSION_STRING, 5); ])], [AC_MSG_RESULT([[ok]])], -[AC_MSG_ERROR([Your ORCA version is expected to be 3.90.XXX])] +[AC_MSG_ERROR([Your ORCA version is expected to be 3.91.XXX])] ) AC_LANG_POP([C++]) ])# PGAC_CHECK_ORCA_VERSION diff --git a/configure b/configure index 0324cee4fea6..1e36a0f11c31 100755 --- a/configure +++ b/configure @@ -14948,7 +14948,7 @@ int main () { -return strncmp("3.90.", GPORCA_VERSION_STRING, 5); +return strncmp("3.91.", GPORCA_VERSION_STRING, 5); ; return 0; @@ -14958,7 +14958,7 @@ if ac_fn_cxx_try_run "$LINENO"; then : { $as_echo "$as_me:${as_lineno-$LINENO}: result: ok" >&5 $as_echo "ok" >&6; } else - as_fn_error $? "Your ORCA version is expected to be 3.90.XXX" "$LINENO" 5 + as_fn_error $? "Your ORCA version is expected to be 3.91.XXX" "$LINENO" 5 fi rm -f core *.core core.conftest.* gmon.out bb.out conftest$ac_exeext \ diff --git a/depends/conanfile_orca.txt b/depends/conanfile_orca.txt index 46371b2a1212..2e73d47f0c89 100644 --- a/depends/conanfile_orca.txt +++ b/depends/conanfile_orca.txt @@ -1,5 +1,5 @@ [requires] -orca/v3.90.0@gpdb/stable +orca/v3.91.0@gpdb/stable [imports] include, * -> build/include From d2f1b48d76483d473c968d65b319e6a7914567fb Mon Sep 17 00:00:00 2001 From: Paul Guo Date: Thu, 13 Feb 2020 15:56:52 +0800 Subject: [PATCH 018/102] Various dtm related code cleanup. (#9543) Main changes are: - Merge isQDContext() and isQEContext() since the later is a bit buggy and there is no need to separate them in gpdb master now. - Remove an incorrect or unnecessary switch in notifyCommittedDtxTransaction(). - Rename some two phase variables or functions since they could be used in one phase also. - Remove some unnecessary Assert code (some are because previous code logic has judged; some are due to obvious reasons). - Rename DTX_STATE_PERFORMING_ONE_PHASE_COMMIT to DTX_STATE_NOTIFYING_ONE_PHASE_COMMIT to make code more align with 2PC code. - Remove useless state DTX_STATE_FORCED_COMMITTED. 
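For readers skimming the large diff that follows, here is a minimal, self-contained sketch of the first bullet (the merged context check). It is illustrative only and not part of the patch: the abbreviated enum and global below are stand-ins for the real Greenplum definitions, and only the body of isDtxContext() mirrors what the cdbtm.c hunk actually introduces.

    /*
     * Sketch: after the cleanup, the separate isQDContext()/isQEContext()
     * checks collapse into a single test against DTX_CONTEXT_LOCAL_ONLY.
     * The enum values and the global variable are abbreviated stand-ins.
     */
    #include <stdbool.h>
    #include <stdio.h>

    typedef enum
    {
        DTX_CONTEXT_LOCAL_ONLY,
        DTX_CONTEXT_QD_DISTRIBUTED_CAPABLE,
        DTX_CONTEXT_QE_READER
    } DtxContext;

    static DtxContext DistributedTransactionContext = DTX_CONTEXT_LOCAL_ONLY;

    /* "Am I in any distributed transaction context?" becomes one check. */
    static bool
    isDtxContext(void)
    {
        return DistributedTransactionContext != DTX_CONTEXT_LOCAL_ONLY;
    }

    int
    main(void)
    {
        printf("local only: %d\n", isDtxContext());   /* prints 0 */
        DistributedTransactionContext = DTX_CONTEXT_QE_READER;
        printf("as a QE:    %d\n", isDtxContext());   /* prints 1 */
        return 0;
    }

The same simplification is what lets the renamed helpers (setupDtxTransaction, isCurrentDtxActivated, dtxSegments) serve both the one-phase and two-phase paths in the hunks below.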
Reviewed-by: Hubert Zhang Reviewed-by: Gang Xiong Cherry-picked from 83da7ddfcfc9d9e1c5a918260c863231c9f84f53 --- src/backend/cdb/cdbdistributedsnapshot.c | 6 +- src/backend/cdb/cdbdtxcontextinfo.c | 6 +- src/backend/cdb/cdblocaldistribxact.c | 8 +- src/backend/cdb/cdbtm.c | 167 ++++++++----------- src/backend/cdb/cdbtmutils.c | 2 - src/backend/cdb/dispatcher/cdbdisp_dtx.c | 6 +- src/backend/cdb/dispatcher/cdbdisp_query.c | 16 +- src/backend/executor/execMain.c | 10 +- src/backend/executor/nodeSubplan.c | 6 +- src/backend/executor/test/nodeSubplan_test.c | 2 +- src/backend/utils/error/elog.c | 3 +- src/backend/utils/gpmon/gpmon.c | 2 +- src/include/cdb/cdbdisp_dtx.h | 2 +- src/include/cdb/cdblocaldistribxact.h | 2 - src/include/cdb/cdbtm.h | 22 ++- 15 files changed, 104 insertions(+), 156 deletions(-) diff --git a/src/backend/cdb/cdbdistributedsnapshot.c b/src/backend/cdb/cdbdistributedsnapshot.c index 49f31da824f7..ff2b0471189a 100644 --- a/src/backend/cdb/cdbdistributedsnapshot.c +++ b/src/backend/cdb/cdbdistributedsnapshot.c @@ -84,7 +84,6 @@ DistributedSnapshotWithLocalMapping_CommittedTest( * Is this local xid in a process-local cache we maintain? */ if (LocalDistribXactCache_CommittedFind(localXid, - ds->distribTransactionTimeStamp, &distribXid)) { /* @@ -132,9 +131,7 @@ DistributedSnapshotWithLocalMapping_CommittedTest( /* * Since we did not find it in our process local cache, add it. */ - LocalDistribXactCache_AddCommitted( - localXid, - ds->distribTransactionTimeStamp, + LocalDistribXactCache_AddCommitted(localXid, distribXid); } else @@ -145,7 +142,6 @@ DistributedSnapshotWithLocalMapping_CommittedTest( * transaction, it must be local-only. */ LocalDistribXactCache_AddCommitted(localXid, - ds->distribTransactionTimeStamp, /* distribXid */ InvalidDistributedTransactionId); return DISTRIBUTEDSNAPSHOT_COMMITTED_IGNORE; diff --git a/src/backend/cdb/cdbdtxcontextinfo.c b/src/backend/cdb/cdbdtxcontextinfo.c index 5e38155c924a..4a38616929c7 100644 --- a/src/backend/cdb/cdbdtxcontextinfo.c +++ b/src/backend/cdb/cdbdtxcontextinfo.c @@ -49,7 +49,7 @@ DtxContextInfo_CreateOnMaster(DtxContextInfo *dtxContextInfo, bool inCursor, dtxContextInfo->distributedXid = getDistributedTransactionId(); if (dtxContextInfo->distributedXid != InvalidDistributedTransactionId) { - dtxContextInfo->distributedTimeStamp = getDtxStartTime(); + dtxContextInfo->distributedTimeStamp = getDtmStartTime(); getDistributedTransactionIdentifier(dtxContextInfo->distributedId); dtxContextInfo->curcid = curcid; @@ -121,9 +121,9 @@ DtxContextInfo_CreateOnMaster(DtxContextInfo *dtxContextInfo, bool inCursor, dtxContextInfo->curcid); elog((Debug_print_full_dtm ? LOG : DEBUG5), - "DtxContextInfo_CreateOnMaster txnOptions = 0x%x, needTwoPhase = %s, explicitBegin = %s, isoLevel = %s, readOnly = %s.", + "DtxContextInfo_CreateOnMaster txnOptions = 0x%x, needDtx = %s, explicitBegin = %s, isoLevel = %s, readOnly = %s.", txnOptions, - (isMppTxOptions_NeedTwoPhase(txnOptions) ? "true" : "false"), + (isMppTxOptions_NeedDtx(txnOptions) ? "true" : "false"), (isMppTxOptions_ExplicitBegin(txnOptions) ? "true" : "false"), IsoLevelAsUpperString(mppTxOptions_IsoLevel(txnOptions)), (isMppTxOptions_ReadOnly(txnOptions) ? 
"true" : "false")); diff --git a/src/backend/cdb/cdblocaldistribxact.c b/src/backend/cdb/cdblocaldistribxact.c index 462f82d5c7de..a44c6057d08a 100644 --- a/src/backend/cdb/cdblocaldistribxact.c +++ b/src/backend/cdb/cdblocaldistribxact.c @@ -197,9 +197,7 @@ static struct LocalDistribXactCache bool -LocalDistribXactCache_CommittedFind( - TransactionId localXid, - DistributedTransactionTimeStamp distribTransactionTimeStamp, +LocalDistribXactCache_CommittedFind(TransactionId localXid, DistributedTransactionId *distribXid) { LocalDistribXactCacheEntry *entry; @@ -265,9 +263,7 @@ LocalDistribXactCache_CommittedFind( } void -LocalDistribXactCache_AddCommitted( - TransactionId localXid, - DistributedTransactionTimeStamp distribTransactionTimeStamp, +LocalDistribXactCache_AddCommitted(TransactionId localXid, DistributedTransactionId distribXid) { LocalDistribXactCacheEntry *entry; diff --git a/src/backend/cdb/cdbtm.c b/src/backend/cdb/cdbtm.c index 74dbe746c70c..85d502834240 100644 --- a/src/backend/cdb/cdbtm.c +++ b/src/backend/cdb/cdbtm.c @@ -96,7 +96,7 @@ int max_tm_gxacts = 100; * bits 2-4 for iso level * bit 5 is for read-only */ -#define GP_OPT_NEED_TWO_PHASE 0x0001 +#define GP_OPT_NEED_DTX 0x0001 #define GP_OPT_ISOLATION_LEVEL_MASK 0x000E #define GP_OPT_READ_UNCOMMITTED (1 << 1) @@ -119,7 +119,7 @@ static void doNotifyingCommitPrepared(void); static void doNotifyingAbort(void); static void retryAbortPrepared(void); static void doQEDistributedExplicitBegin(); -static void currentDtxActivateTwoPhase(void); +static void currentDtxActivate(void); static void setCurrentDtxState(DtxState state); static bool isDtxQueryDispatcher(void); @@ -155,39 +155,10 @@ requireDistributedTransactionContext(DtxContext requiredCurrentContext) } } -/** - * Does DistributedTransactionContext indicate that this is acting as a QD? - */ static bool -isQDContext(void) +isDtxContext(void) { - switch (DistributedTransactionContext) - { - case DTX_CONTEXT_QD_DISTRIBUTED_CAPABLE: - case DTX_CONTEXT_QD_RETRY_PHASE_2: - return true; - default: - return false; - } -} - -/** - * Does DistributedTransactionContext indicate that this is acting as a QE? 
- */ -static bool -isQEContext() -{ - switch (DistributedTransactionContext) - { - case DTX_CONTEXT_QE_ENTRY_DB_SINGLETON: - case DTX_CONTEXT_QE_AUTO_COMMIT_IMPLICIT: - case DTX_CONTEXT_QE_TWO_PHASE_EXPLICIT_WRITER: - case DTX_CONTEXT_QE_TWO_PHASE_IMPLICIT_WRITER: - case DTX_CONTEXT_QE_READER: - return true; - default: - return false; - } + return DistributedTransactionContext != DTX_CONTEXT_LOCAL_ONLY; } /*========================================================================= @@ -195,7 +166,7 @@ isQEContext() */ DistributedTransactionTimeStamp -getDtxStartTime(void) +getDtmStartTime(void) { if (shmDistribTimeStamp != NULL) return *shmDistribTimeStamp; @@ -206,7 +177,7 @@ getDtxStartTime(void) DistributedTransactionId getDistributedTransactionId(void) { - if (isQDContext() || isQEContext()) + if (isDtxContext()) return MyTmGxact->gxid; else return InvalidDistributedTransactionId; @@ -215,7 +186,7 @@ getDistributedTransactionId(void) DistributedTransactionTimeStamp getDistributedTransactionTimestamp(void) { - if (isQDContext() || isQEContext()) + if (isDtxContext()) return MyTmGxact->distribTimeStamp; else return 0; @@ -226,8 +197,7 @@ getDistributedTransactionIdentifier(char *id) { Assert(MyTmGxactLocal != NULL); - if ((isQDContext() || isQEContext()) && - MyTmGxact->gxid != InvalidDistributedTransactionId) + if (isDtxContext() && MyTmGxact->gxid != InvalidDistributedTransactionId) { /* * The length check here requires the identifer have a trailing @@ -256,13 +226,13 @@ isPreparedDtxTransaction(void) * the current dtx is clean and we aren't in a user-started global transaction. */ bool -isCurrentDtxTwoPhaseActivated(void) +isCurrentDtxActivated(void) { return MyTmGxactLocal->state != DTX_STATE_NONE; } static void -currentDtxActivateTwoPhase(void) +currentDtxActivate(void) { /* * Bump 'shmGIDSeq' and assign it to 'MyTmGxact->gxid', this needs to be atomic. 
@@ -295,7 +265,7 @@ currentDtxActivateTwoPhase(void) (errmsg("reached the limit of %u global transactions per start", LastDistributedTransactionId))); - MyTmGxact->distribTimeStamp = getDtxStartTime(); + MyTmGxact->distribTimeStamp = getDtmStartTime(); MyTmGxact->sessionId = gp_session_id; setCurrentDtxState(DTX_STATE_ACTIVE_DISTRIBUTED); } @@ -322,7 +292,7 @@ notifyCommittedDtxTransactionIsNeeded(void) return false; } - if (!isCurrentDtxTwoPhaseActivated()) + if (!isCurrentDtxActivated()) { elog(DTM_DEBUG5, "notifyCommittedDtxTransaction nothing to do (two phase not activated)"); return false; @@ -332,7 +302,7 @@ notifyCommittedDtxTransactionIsNeeded(void) } /* - * Notify commited a global transaction, called by user commit + * Notify committed a global transaction, called by user commit * or by CommitTransaction */ void @@ -340,11 +310,10 @@ notifyCommittedDtxTransaction(void) { Assert(Gp_role == GP_ROLE_DISPATCH); Assert(DistributedTransactionContext == DTX_CONTEXT_QD_DISTRIBUTED_CAPABLE); - Assert(isCurrentDtxTwoPhaseActivated()); + Assert(isCurrentDtxActivated()); switch(MyTmGxactLocal->state) { - case DTX_STATE_PREPARED: case DTX_STATE_INSERTED_COMMITTED: doNotifyingCommitPrepared(); break; @@ -359,13 +328,13 @@ notifyCommittedDtxTransaction(void) } void -setupTwoPhaseTransaction(void) +setupDtxTransaction(void) { if (!IsTransactionState()) elog(ERROR, "DTM transaction is not active"); - if (!isCurrentDtxTwoPhaseActivated()) - currentDtxActivateTwoPhase(); + if (!isCurrentDtxActivated()) + currentDtxActivate(); if (MyTmGxactLocal->state != DTX_STATE_ACTIVE_DISTRIBUTED) elog(ERROR, "DTM transaction state (%s) is invalid", DtxStateToString(MyTmGxactLocal->state)); @@ -393,9 +362,9 @@ doDispatchSubtransactionInternalCmd(DtxProtocolCommand cmdType) } if (cmdType == DTX_PROTOCOL_COMMAND_SUBTRANSACTION_BEGIN_INTERNAL && - !isCurrentDtxTwoPhaseActivated()) + !isCurrentDtxActivated()) { - currentDtxActivateTwoPhase(); + currentDtxActivate(); } serializedDtxContextInfo = qdSerializeDtxContextInfo(&serializedDtxContextInfoLen, @@ -442,7 +411,7 @@ doPrepareTransaction(void) elog(DTM_DEBUG5, "doPrepareTransaction moved to state = %s", DtxStateToString(MyTmGxactLocal->state)); - Assert(MyTmGxactLocal->twophaseSegments != NIL); + Assert(MyTmGxactLocal->dtxSegments != NIL); succeeded = currentDtxDispatchProtocolCommand(DTX_PROTOCOL_COMMAND_PREPARE, true); /* @@ -538,22 +507,20 @@ doNotifyingOnePhaseCommit(void) bool succeeded; volatile int savedInterruptHoldoffCount; - if (MyTmGxactLocal->twophaseSegments == NULL) + if (MyTmGxactLocal->dtxSegments == NIL) return; elog(DTM_DEBUG5, "doNotifyingOnePhaseCommit entering in state = %s", DtxStateToString(MyTmGxactLocal->state)); Assert(MyTmGxactLocal->state == DTX_STATE_ONE_PHASE_COMMIT); - setCurrentDtxState(DTX_STATE_PERFORMING_ONE_PHASE_COMMIT); + setCurrentDtxState(DTX_STATE_NOTIFYING_ONE_PHASE_COMMIT); savedInterruptHoldoffCount = InterruptHoldoffCount; - Assert(MyTmGxactLocal->twophaseSegments != NIL); - succeeded = currentDtxDispatchProtocolCommand(DTX_PROTOCOL_COMMAND_COMMIT_ONEPHASE, true); if (!succeeded) { - Assert(MyTmGxactLocal->state == DTX_STATE_PERFORMING_ONE_PHASE_COMMIT); + Assert(MyTmGxactLocal->state == DTX_STATE_NOTIFYING_ONE_PHASE_COMMIT); elog(ERROR, "one phase commit failed"); } } @@ -574,7 +541,7 @@ doNotifyingCommitPrepared(void) SIMPLE_FAULT_INJECTOR("dtm_broadcast_commit_prepared"); savedInterruptHoldoffCount = InterruptHoldoffCount; - Assert(MyTmGxactLocal->twophaseSegments != NIL); + Assert(MyTmGxactLocal->dtxSegments != 
NIL); PG_TRY(); { succeeded = currentDtxDispatchProtocolCommand(DTX_PROTOCOL_COMMAND_COMMIT_PREPARED, true); @@ -699,7 +666,7 @@ retryAbortPrepared(void) PG_TRY(); { - MyTmGxactLocal->twophaseSegments = cdbcomponent_getCdbComponentsList(); + MyTmGxactLocal->dtxSegments = cdbcomponent_getCdbComponentsList(); succeeded = currentDtxDispatchProtocolCommand(DTX_PROTOCOL_COMMAND_RETRY_ABORT_PREPARED, true); if (!succeeded) ereport(WARNING, @@ -751,7 +718,7 @@ doNotifyingAbort(void) * occur before the command is actually dispatched, no need to dispatch DTX for * such cases. */ - if (!MyTmGxactLocal->writerGangLost && MyTmGxactLocal->twophaseSegments) + if (!MyTmGxactLocal->writerGangLost && MyTmGxactLocal->dtxSegments) { succeeded = currentDtxDispatchProtocolCommand(DTX_PROTOCOL_COMMAND_ABORT_NO_PREPARED, false); @@ -864,7 +831,7 @@ prepareDtxTransaction(void) return; } - if (!isCurrentDtxTwoPhaseActivated()) + if (!isCurrentDtxActivated()) { Assert(MyTmGxactLocal->state == DTX_STATE_NONE); Assert(Gp_role != GP_ROLE_DISPATCH || MyTmGxact->gxid == InvalidDistributedTransactionId); @@ -879,7 +846,7 @@ prepareDtxTransaction(void) * segments. */ if (!ExecutorDidWriteXLog() || - (!markXidCommitted && list_length(MyTmGxactLocal->twophaseSegments) < 2)) + (!markXidCommitted && list_length(MyTmGxactLocal->dtxSegments) < 2)) { setCurrentDtxState(DTX_STATE_ONE_PHASE_COMMIT); return; @@ -908,7 +875,7 @@ rollbackDtxTransaction(void) DtxContextToString(DistributedTransactionContext)); return; } - if (!isCurrentDtxTwoPhaseActivated()) + if (!isCurrentDtxActivated()) { elog(DTM_DEBUG5, "rollbackDtxTransaction nothing to do (two phase not activate)"); return; @@ -945,7 +912,7 @@ rollbackDtxTransaction(void) break; case DTX_STATE_ONE_PHASE_COMMIT: - case DTX_STATE_PERFORMING_ONE_PHASE_COMMIT: + case DTX_STATE_NOTIFYING_ONE_PHASE_COMMIT: setCurrentDtxState(DTX_STATE_NOTIFYING_ABORT_NO_PREPARED); break; @@ -1120,7 +1087,7 @@ tmShmemInit(void) * after the statement. */ int -mppTxnOptions(bool needTwoPhase) +mppTxnOptions(bool needDtx) { int options = 0; @@ -1129,8 +1096,8 @@ mppTxnOptions(bool needTwoPhase) IsoLevelAsUpperString(DefaultXactIsoLevel), (DefaultXactReadOnly ? "true" : "false"), IsoLevelAsUpperString(XactIsoLevel), (XactReadOnly ? "true" : "false")); - if (needTwoPhase) - options |= GP_OPT_NEED_TWO_PHASE; + if (needDtx) + options |= GP_OPT_NEED_DTX; if (XactIsoLevel == XACT_READ_COMMITTED) options |= GP_OPT_READ_COMMITTED; @@ -1144,13 +1111,13 @@ mppTxnOptions(bool needTwoPhase) if (XactReadOnly) options |= GP_OPT_READ_ONLY; - if (isCurrentDtxTwoPhaseActivated() && MyTmGxactLocal->explicitBeginRemembered) + if (isCurrentDtxActivated() && MyTmGxactLocal->explicitBeginRemembered) options |= GP_OPT_EXPLICT_BEGIN; elog(DTM_DEBUG5, - "mppTxnOptions txnOptions = 0x%x, needTwoPhase = %s, explicitBegin = %s, isoLevel = %s, readOnly = %s.", + "mppTxnOptions txnOptions = 0x%x, needDtx = %s, explicitBegin = %s, isoLevel = %s, readOnly = %s.", options, - (isMppTxOptions_NeedTwoPhase(options) ? "true" : "false"), (isMppTxOptions_ExplicitBegin(options) ? "true" : "false"), + (isMppTxOptions_NeedDtx(options) ? "true" : "false"), (isMppTxOptions_ExplicitBegin(options) ? "true" : "false"), IsoLevelAsUpperString(mppTxOptions_IsoLevel(options)), (isMppTxOptions_ReadOnly(options) ? 
"true" : "false")); return options; @@ -1179,9 +1146,9 @@ isMppTxOptions_ReadOnly(int txnOptions) } bool -isMppTxOptions_NeedTwoPhase(int txnOptions) +isMppTxOptions_NeedDtx(int txnOptions) { - return ((txnOptions & GP_OPT_NEED_TWO_PHASE) != 0); + return ((txnOptions & GP_OPT_NEED_DTX) != 0); } /* isMppTxOptions_ExplicitBegin: @@ -1204,14 +1171,14 @@ currentDtxDispatchProtocolCommand(DtxProtocolCommand dtxProtocolCommand, bool ra dtxFormGID(gid, getDistributedTransactionTimestamp(), getDistributedTransactionId()); return doDispatchDtxProtocolCommand(dtxProtocolCommand, gid, badgang, raiseError, - MyTmGxactLocal->twophaseSegments, NULL, 0); + MyTmGxactLocal->dtxSegments, NULL, 0); } bool doDispatchDtxProtocolCommand(DtxProtocolCommand dtxProtocolCommand, char *gid, bool *badGangs, bool raiseError, - List *twophaseSegments, + List *dtxSegments, char *serializedDtxContextInfo, int serializedDtxContextInfoLen) { @@ -1223,7 +1190,7 @@ doDispatchDtxProtocolCommand(DtxProtocolCommand dtxProtocolCommand, struct pg_result **results; - if (!twophaseSegments) + if (!dtxSegments) return true; dtxProtocolCommandStr = DtxProtocolCommandToString(dtxProtocolCommand); @@ -1231,18 +1198,18 @@ doDispatchDtxProtocolCommand(DtxProtocolCommand dtxProtocolCommand, if (Test_print_direct_dispatch_info) elog(INFO, "Distributed transaction command '%s' to %s", dtxProtocolCommandStr, - segmentsToContentStr(twophaseSegments)); + segmentsToContentStr(dtxSegments)); ereport(DTM_DEBUG5, (errmsg("dispatchDtxProtocolCommand: %d ('%s'), direct content #: %s", dtxProtocolCommand, dtxProtocolCommandStr, - segmentsToContentStr(twophaseSegments)))); + segmentsToContentStr(dtxSegments)))); ErrorData *qeError; results = CdbDispatchDtxProtocolCommand(dtxProtocolCommand, dtxProtocolCommandStr, gid, - &qeError, &resultCount, badGangs, twophaseSegments, + &qeError, &resultCount, badGangs, dtxSegments, serializedDtxContextInfo, serializedDtxContextInfoLen); if (qeError) @@ -1392,8 +1359,8 @@ resetGxact() MyTmGxactLocal->explicitBeginRemembered = false; MyTmGxactLocal->badPrepareGangs = false; MyTmGxactLocal->writerGangLost = false; - MyTmGxactLocal->twophaseSegmentsMap = NULL; - MyTmGxactLocal->twophaseSegments = NIL; + MyTmGxactLocal->dtxSegmentsMap = NULL; + MyTmGxactLocal->dtxSegments = NIL; MyTmGxactLocal->isOnePhaseCommit = false; setCurrentDtxState(DTX_STATE_NONE); } @@ -1415,7 +1382,7 @@ getNextDistributedXactStatus(TMGALLXACTSTATUS *allDistributedXactStatus, TMGXACT static void clearAndResetGxact(void) { - Assert(isCurrentDtxTwoPhaseActivated()); + Assert(isCurrentDtxActivated()); LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); ProcArrayEndGxact(); @@ -1615,9 +1582,8 @@ setupRegularDtxContext(void) Assert(DistributedTransactionContext == DTX_CONTEXT_LOCAL_ONLY); if (isDtxQueryDispatcher()) - { setDistributedTransactionContext(DTX_CONTEXT_QD_DISTRIBUTED_CAPABLE); - } + break; } @@ -1635,7 +1601,7 @@ setupQEDtxContext(DtxContextInfo *dtxContextInfo) { DistributedSnapshot *distributedSnapshot; int txnOptions; - bool needTwoPhase; + bool needDtx; bool explicitBegin; bool haveDistributedSnapshot; bool isEntryDbSingleton = false; @@ -1651,7 +1617,7 @@ setupQEDtxContext(DtxContextInfo *dtxContextInfo) distributedSnapshot = &dtxContextInfo->distributedSnapshot; txnOptions = dtxContextInfo->distributedTxnOptions; - needTwoPhase = isMppTxOptions_NeedTwoPhase(txnOptions); + needDtx = isMppTxOptions_NeedDtx(txnOptions); explicitBegin = isMppTxOptions_ExplicitBegin(txnOptions); haveDistributedSnapshot = dtxContextInfo->haveDistributedSnapshot; 
@@ -1661,9 +1627,9 @@ setupQEDtxContext(DtxContextInfo *dtxContextInfo) { elog(DTM_DEBUG5, "setupQEDtxContext inputs (part 1): Gp_role = %s, Gp_is_writer = %s, " - "txnOptions = 0x%x, needTwoPhase = %s, explicitBegin = %s, isoLevel = %s, readOnly = %s, haveDistributedSnapshot = %s.", + "txnOptions = 0x%x, needDtx = %s, explicitBegin = %s, isoLevel = %s, readOnly = %s, haveDistributedSnapshot = %s.", role_to_string(Gp_role), (Gp_is_writer ? "true" : "false"), txnOptions, - (needTwoPhase ? "true" : "false"), (explicitBegin ? "true" : "false"), + (needDtx ? "true" : "false"), (explicitBegin ? "true" : "false"), IsoLevelAsUpperString(mppTxOptions_IsoLevel(txnOptions)), (isMppTxOptions_ReadOnly(txnOptions) ? "true" : "false"), (haveDistributedSnapshot ? "true" : "false")); elog(DTM_DEBUG5, @@ -1774,7 +1740,7 @@ setupQEDtxContext(DtxContextInfo *dtxContextInfo) setDistributedTransactionContext(DTX_CONTEXT_QE_READER); } - else if (isWriterQE && (explicitBegin || needTwoPhase)) + else if (isWriterQE && (explicitBegin || needDtx)) { if (!haveDistributedSnapshot) { @@ -1799,10 +1765,7 @@ setupQEDtxContext(DtxContextInfo *dtxContextInfo) doQEDistributedExplicitBegin(); } else - { - Assert(needTwoPhase); setDistributedTransactionContext(DTX_CONTEXT_QE_TWO_PHASE_IMPLICIT_WRITER); - } } else if (haveDistributedSnapshot) { @@ -1902,7 +1865,7 @@ finishDistributedTransactionContext(char *debugCaller, bool aborted) * We let the 2 retry states go up to PostgresMain.c, otherwise everything * MUST be complete. */ - if (isCurrentDtxTwoPhaseActivated() && + if (isCurrentDtxActivated() && (MyTmGxactLocal->state != DTX_STATE_RETRY_COMMIT_PREPARED && MyTmGxactLocal->state != DTX_STATE_RETRY_ABORT_PREPARED)) { @@ -1928,7 +1891,7 @@ finishDistributedTransactionContext(char *debugCaller, bool aborted) static void rememberDtxExplicitBegin(void) { - Assert (isCurrentDtxTwoPhaseActivated()); + Assert (isCurrentDtxActivated()); if (!MyTmGxactLocal->explicitBeginRemembered) { @@ -1948,7 +1911,7 @@ rememberDtxExplicitBegin(void) bool isDtxExplicitBegin(void) { - return (isCurrentDtxTwoPhaseActivated() && MyTmGxactLocal->explicitBeginRemembered); + return (isCurrentDtxActivated() && MyTmGxactLocal->explicitBeginRemembered); } /* @@ -1961,7 +1924,7 @@ sendDtxExplicitBegin(void) if (Gp_role != GP_ROLE_DISPATCH) return; - setupTwoPhaseTransaction(); + setupDtxTransaction(); rememberDtxExplicitBegin(); } @@ -2312,18 +2275,18 @@ currentGxactWriterGangLost(void) * Record which segment involved in the two phase commit. 
*/ void -addToGxactTwophaseSegments(Gang *gang) +addToGxactDtxSegments(Gang *gang) { SegmentDatabaseDescriptor *segdbDesc; MemoryContext oldContext; int segindex; int i; - if (!isCurrentDtxTwoPhaseActivated()) + if (!isCurrentDtxActivated()) return; /* skip if all segdbs are in the list */ - if (list_length(MyTmGxactLocal->twophaseSegments) >= getgpsegmentCount()) + if (list_length(MyTmGxactLocal->dtxSegments) >= getgpsegmentCount()) return; oldContext = MemoryContextSwitchTo(TopTransactionContext); @@ -2338,14 +2301,14 @@ addToGxactTwophaseSegments(Gang *gang) continue; /* skip if record already */ - if (bms_is_member(segindex, MyTmGxactLocal->twophaseSegmentsMap)) + if (bms_is_member(segindex, MyTmGxactLocal->dtxSegmentsMap)) continue; - MyTmGxactLocal->twophaseSegmentsMap = - bms_add_member(MyTmGxactLocal->twophaseSegmentsMap, segindex); + MyTmGxactLocal->dtxSegmentsMap = + bms_add_member(MyTmGxactLocal->dtxSegmentsMap, segindex); - MyTmGxactLocal->twophaseSegments = - lappend_int(MyTmGxactLocal->twophaseSegments, segindex); + MyTmGxactLocal->dtxSegments = + lappend_int(MyTmGxactLocal->dtxSegments, segindex); } MemoryContextSwitchTo(oldContext); } diff --git a/src/backend/cdb/cdbtmutils.c b/src/backend/cdb/cdbtmutils.c index 3eb074d9ba28..6fcb18816168 100644 --- a/src/backend/cdb/cdbtmutils.c +++ b/src/backend/cdb/cdbtmutils.c @@ -64,8 +64,6 @@ DtxStateToString(DtxState state) return "Inserting Committed"; case DTX_STATE_INSERTED_COMMITTED: return "Inserted Committed"; - case DTX_STATE_FORCED_COMMITTED: - return "Forced Committed"; case DTX_STATE_NOTIFYING_COMMIT_PREPARED: return "Notifying Commit Prepared"; case DTX_STATE_INSERTING_FORGET_COMMITTED: diff --git a/src/backend/cdb/dispatcher/cdbdisp_dtx.c b/src/backend/cdb/dispatcher/cdbdisp_dtx.c index 048996df430d..c9f6c8336ac0 100644 --- a/src/backend/cdb/dispatcher/cdbdisp_dtx.c +++ b/src/backend/cdb/dispatcher/cdbdisp_dtx.c @@ -71,7 +71,7 @@ CdbDispatchDtxProtocolCommand(DtxProtocolCommand dtxProtocolCommand, ErrorData **qeError, int *numresults, bool *badGangs, - List *twophaseSegments, + List *dtxSegments, char *serializedDtxContextInfo, int serializedDtxContextInfoLen) { @@ -102,7 +102,7 @@ CdbDispatchDtxProtocolCommand(DtxProtocolCommand dtxProtocolCommand, queryText = buildGpDtxProtocolCommand(&dtxProtocolParms, &queryTextLen); - primaryGang = AllocateGang(ds, GANGTYPE_PRIMARY_WRITER, twophaseSegments); + primaryGang = AllocateGang(ds, GANGTYPE_PRIMARY_WRITER, dtxSegments); Assert(primaryGang); @@ -110,7 +110,7 @@ CdbDispatchDtxProtocolCommand(DtxProtocolCommand dtxProtocolCommand, cdbdisp_makeDispatchParams(ds, 1, queryText, queryTextLen); cdbdisp_dispatchToGang(ds, primaryGang, -1); - addToGxactTwophaseSegments(primaryGang); + addToGxactDtxSegments(primaryGang); cdbdisp_waitDispatchFinish(ds); diff --git a/src/backend/cdb/dispatcher/cdbdisp_query.c b/src/backend/cdb/dispatcher/cdbdisp_query.c index f87baa226519..9490c764c91b 100644 --- a/src/backend/cdb/dispatcher/cdbdisp_query.c +++ b/src/backend/cdb/dispatcher/cdbdisp_query.c @@ -295,11 +295,11 @@ CdbDispatchSetCommand(const char *strCommand, bool cancelOnError) cdbdisp_dispatchToGang(ds, rg, -1); } - addToGxactTwophaseSegments(primaryGang); + addToGxactDtxSegments(primaryGang); /* * No need for two-phase commit, so no need to call - * addToGxactTwophaseSegments. + * addToGxactDtxSegments. 
*/ cdbdisp_waitDispatchFinish(ds); @@ -359,7 +359,7 @@ CdbDispatchCommandToSegments(const char *strCommand, bool needTwoPhase = flags & DF_NEED_TWO_PHASE; if (needTwoPhase) - setupTwoPhaseTransaction(); + setupDtxTransaction(); elogif((Debug_print_full_dtm || log_min_messages <= DEBUG5), LOG, "CdbDispatchCommand: %s (needTwoPhase = %s)", @@ -397,7 +397,7 @@ CdbDispatchUtilityStatement(struct Node *stmt, bool needTwoPhase = flags & DF_NEED_TWO_PHASE; if (needTwoPhase) - setupTwoPhaseTransaction(); + setupDtxTransaction(); elogif((Debug_print_full_dtm || log_min_messages <= DEBUG5), LOG, "CdbDispatchUtilityStatement: %s (needTwoPhase = %s)", @@ -443,7 +443,7 @@ cdbdisp_dispatchCommandInternal(DispatchCommandQueryParms *pQueryParms, cdbdisp_dispatchToGang(ds, primaryGang, -1); if ((flags & DF_NEED_TWO_PHASE) != 0 || isDtxExplicitBegin()) - addToGxactTwophaseSegments(primaryGang); + addToGxactDtxSegments(primaryGang); cdbdisp_waitDispatchFinish(ds); @@ -1157,7 +1157,7 @@ cdbdisp_dispatchX(QueryDesc* queryDesc, cdbdisp_dispatchToGang(ds, primaryGang, si); if (planRequiresTxn || isDtxExplicitBegin()) - addToGxactTwophaseSegments(primaryGang); + addToGxactDtxSegments(primaryGang); SIMPLE_FAULT_INJECTOR("after_one_slice_dispatched"); } @@ -1417,7 +1417,7 @@ CdbDispatchCopyStart(struct CdbCopy *cdbCopy, Node *stmt, int flags) bool needTwoPhase = flags & DF_NEED_TWO_PHASE; if (needTwoPhase) - setupTwoPhaseTransaction(); + setupDtxTransaction(); elogif((Debug_print_full_dtm || log_min_messages <= DEBUG5), LOG, "CdbDispatchCopyStart: %s (needTwoPhase = %s)", @@ -1443,7 +1443,7 @@ CdbDispatchCopyStart(struct CdbCopy *cdbCopy, Node *stmt, int flags) cdbdisp_dispatchToGang(ds, primaryGang, -1); if ((flags & DF_NEED_TWO_PHASE) != 0 || isDtxExplicitBegin()) - addToGxactTwophaseSegments(primaryGang); + addToGxactDtxSegments(primaryGang); cdbdisp_waitDispatchFinish(ds); diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index 4a68832f20b0..c3537bb15ac1 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -269,7 +269,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags) MemoryContext oldcontext; GpExecIdentity exec_identity; bool shouldDispatch; - bool needDtxTwoPhase; + bool needDtx; /* sanity checks: queryDesc must not be started already */ Assert(queryDesc != NULL); @@ -632,9 +632,9 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags) * ExecutorSaysTransactionDoesWrites() before any dispatch * work for this query. */ - needDtxTwoPhase = ExecutorSaysTransactionDoesWrites(); - if (needDtxTwoPhase) - setupTwoPhaseTransaction(); + needDtx = ExecutorSaysTransactionDoesWrites(); + if (needDtx) + setupDtxTransaction(); if (queryDesc->ddesc != NULL) { @@ -687,7 +687,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags) * Main plan is parallel, send plan to it. 
*/ if (queryDesc->plannedstmt->planTree->dispatch == DISPATCH_PARALLEL) - CdbDispatchPlan(queryDesc, needDtxTwoPhase, true); + CdbDispatchPlan(queryDesc, needDtx, true); } /* diff --git a/src/backend/executor/nodeSubplan.c b/src/backend/executor/nodeSubplan.c index 500f704dc317..eef1076a6150 100644 --- a/src/backend/executor/nodeSubplan.c +++ b/src/backend/executor/nodeSubplan.c @@ -971,7 +971,7 @@ ExecSetParamPlan(SubPlanState *node, ExprContext *econtext, QueryDesc *queryDesc ArrayBuildState *astate = NULL; Size savepeakspace = MemoryContextGetPeakSpace(planstate->state->es_query_cxt); - bool needDtxTwoPhase; + bool needDtx; bool shouldDispatch = false; volatile bool explainRecvStats = false; @@ -999,14 +999,14 @@ PG_TRY(); { if (shouldDispatch) { - needDtxTwoPhase = isCurrentDtxTwoPhaseActivated(); + needDtx = isCurrentDtxActivated(); /* * This call returns after launching the threads that send the * command to the appropriate segdbs. It does not wait for them * to finish unless an error is detected before all are dispatched. */ - CdbDispatchPlan(queryDesc, needDtxTwoPhase, true); + CdbDispatchPlan(queryDesc, needDtx, true); /* * Set up the interconnect for execution of the initplan root slice. diff --git a/src/backend/executor/test/nodeSubplan_test.c b/src/backend/executor/test/nodeSubplan_test.c index 41bfcee53a27..94fca67e7c10 100644 --- a/src/backend/executor/test/nodeSubplan_test.c +++ b/src/backend/executor/test/nodeSubplan_test.c @@ -90,7 +90,7 @@ test__ExecSetParamPlan__Check_Dispatch_Results(void **state) Gp_role = GP_ROLE_DISPATCH; ((SubPlan*)(plan->xprstate.expr))->initPlanParallel = true; - will_be_called(isCurrentDtxTwoPhaseActivated); + will_be_called(isCurrentDtxActivated); expect_any(CdbDispatchPlan,queryDesc); expect_any(CdbDispatchPlan,planRequiresTxn); diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c index 9ec61bab8279..148313c92aa9 100644 --- a/src/backend/utils/error/elog.c +++ b/src/backend/utils/error/elog.c @@ -396,7 +396,6 @@ errstart(int elevel, const char *filename, int lineno, case DTX_STATE_PREPARED: case DTX_STATE_INSERTING_COMMITTED: case DTX_STATE_INSERTED_COMMITTED: - case DTX_STATE_FORCED_COMMITTED: case DTX_STATE_NOTIFYING_COMMIT_PREPARED: case DTX_STATE_NOTIFYING_ABORT_SOME_PREPARED: case DTX_STATE_NOTIFYING_ABORT_PREPARED: @@ -408,7 +407,7 @@ errstart(int elevel, const char *filename, int lineno, case DTX_STATE_NONE: case DTX_STATE_ACTIVE_DISTRIBUTED: case DTX_STATE_ONE_PHASE_COMMIT: - case DTX_STATE_PERFORMING_ONE_PHASE_COMMIT: + case DTX_STATE_NOTIFYING_ONE_PHASE_COMMIT: case DTX_STATE_INSERTING_FORGET_COMMITTED: case DTX_STATE_INSERTED_FORGET_COMMITTED: case DTX_STATE_NOTIFYING_ABORT_NO_PREPARED: diff --git a/src/backend/utils/gpmon/gpmon.c b/src/backend/utils/gpmon/gpmon.c index f31899c5433c..7c7bb4dca39e 100644 --- a/src/backend/utils/gpmon/gpmon.c +++ b/src/backend/utils/gpmon/gpmon.c @@ -164,7 +164,7 @@ void gpmon_gettmid(int32* tmid) *tmid = (int32)QEDtxContextInfo.distributedSnapshot.distribTransactionTimeStamp; else /* On QD */ - *tmid = (int32)getDtxStartTime(); + *tmid = (int32)getDtmStartTime(); } diff --git a/src/include/cdb/cdbdisp_dtx.h b/src/include/cdb/cdbdisp_dtx.h index e922e97491ce..b433eaa78a67 100644 --- a/src/include/cdb/cdbdisp_dtx.h +++ b/src/include/cdb/cdbdisp_dtx.h @@ -37,7 +37,7 @@ CdbDispatchDtxProtocolCommand(DtxProtocolCommand dtxProtocolCommand, ErrorData **qeError, int *resultCount, bool* badGangs, - List *twophaseSegments, + List *dtxSegments, char *serializedDtxContextInfo, int 
serializedDtxContextInfoLen); diff --git a/src/include/cdb/cdblocaldistribxact.h b/src/include/cdb/cdblocaldistribxact.h index 6599f45f0d19..04fe8a2e29ef 100644 --- a/src/include/cdb/cdblocaldistribxact.h +++ b/src/include/cdb/cdblocaldistribxact.h @@ -51,12 +51,10 @@ extern char* LocalDistribXact_DisplayString(int pgprocno); extern bool LocalDistribXactCache_CommittedFind( TransactionId localXid, - DistributedTransactionTimeStamp distribTransactionTimeStamp, DistributedTransactionId *distribXid); extern void LocalDistribXactCache_AddCommitted( TransactionId localXid, - DistributedTransactionTimeStamp distribTransactionTimeStamp, DistributedTransactionId distribXid); extern void LocalDistribXactCache_ShowStats(char *nameStr); diff --git a/src/include/cdb/cdbtm.h b/src/include/cdb/cdbtm.h index 1a3173b06196..7a2a52d5dc95 100644 --- a/src/include/cdb/cdbtm.h +++ b/src/include/cdb/cdbtm.h @@ -40,7 +40,7 @@ typedef enum * For one-phase optimization commit, we haven't run the commit yet */ DTX_STATE_ONE_PHASE_COMMIT, - DTX_STATE_PERFORMING_ONE_PHASE_COMMIT, + DTX_STATE_NOTIFYING_ONE_PHASE_COMMIT, /** * For two-phase commit, the first phase is about to run @@ -53,7 +53,6 @@ typedef enum DTX_STATE_PREPARED, DTX_STATE_INSERTING_COMMITTED, DTX_STATE_INSERTED_COMMITTED, - DTX_STATE_FORCED_COMMITTED, DTX_STATE_NOTIFYING_COMMIT_PREPARED, DTX_STATE_INSERTING_FORGET_COMMITTED, DTX_STATE_INSERTED_FORGET_COMMITTED, @@ -236,8 +235,8 @@ typedef struct TMGXACTLOCAL bool writerGangLost; - Bitmapset *twophaseSegmentsMap; - List *twophaseSegments; + Bitmapset *dtxSegmentsMap; + List *dtxSegments; } TMGXACTLOCAL; typedef struct TMGXACTSTATUS @@ -276,7 +275,7 @@ extern volatile int *shmNumCommittedGxacts; extern char *DtxStateToString(DtxState state); extern char *DtxProtocolCommandToString(DtxProtocolCommand command); extern char *DtxContextToString(DtxContext context); -extern DistributedTransactionTimeStamp getDtxStartTime(void); +extern DistributedTransactionTimeStamp getDtmStartTime(void); extern void dtxCrackOpenGid(const char *gid, DistributedTransactionTimeStamp *distribTimeStamp, DistributedTransactionId *distribXid); @@ -301,10 +300,9 @@ extern void redoDtxCheckPoint(TMGXACT_CHECKPOINT *gxact_checkpoint); extern void redoDistributedCommitRecord(TMGXACT_LOG *gxact_log); extern void redoDistributedForgetCommitRecord(TMGXACT_LOG *gxact_log); -extern void setupTwoPhaseTransaction(void); -extern bool isCurrentDtxTwoPhase(void); +extern void setupDtxTransaction(void); extern DtxState getCurrentDtxState(void); -extern bool isCurrentDtxTwoPhaseActivated(void); +extern bool isCurrentDtxActivated(void); extern void sendDtxExplicitBegin(void); extern bool isDtxExplicitBegin(void); @@ -316,10 +314,10 @@ extern int tmShmemSize(void); extern void verify_shared_snapshot_ready(int cid); -int mppTxnOptions(bool needTwoPhase); +int mppTxnOptions(bool needDtx); int mppTxOptions_IsoLevel(int txnOptions); bool isMppTxOptions_ReadOnly(int txnOptions); -bool isMppTxOptions_NeedTwoPhase(int txnOptions); +bool isMppTxOptions_NeedDtx(int txnOptions); bool isMppTxOptions_ExplicitBegin(int txnOptions); extern void getAllDistributedXactStatus(TMGALLXACTSTATUS **allDistributedXactStatus); @@ -336,14 +334,14 @@ extern void UtilityModeCloseDtmRedoFile(void); extern bool currentDtxDispatchProtocolCommand(DtxProtocolCommand dtxProtocolCommand, bool raiseError); extern bool doDispatchSubtransactionInternalCmd(DtxProtocolCommand cmdType); extern bool doDispatchDtxProtocolCommand(DtxProtocolCommand dtxProtocolCommand, char *gid, - bool 
*badGangs, bool raiseError, List *twophaseSegments, + bool *badGangs, bool raiseError, List *dtxSegments, char *serializedDtxContextInfo, int serializedDtxContextInfoLen); extern void markCurrentGxactWriterGangLost(void); extern bool currentGxactWriterGangLost(void); -extern void addToGxactTwophaseSegments(struct Gang* gp); +extern void addToGxactDtxSegments(struct Gang* gp); extern void ClearTransactionState(TransactionId latestXid); From 05add7f47c1bb6be7fef676a8fc470cfcdf257e6 Mon Sep 17 00:00:00 2001 From: Ashwin Agrawal Date: Thu, 19 Dec 2019 17:59:51 -0800 Subject: [PATCH 019/102] Revert "Revert "Only parse enough of each COPY row, to compute which segment to send it to."" This reverts commit d90ac1a1b983b913b3950430d4d9e47ee8827fd4. --- src/backend/commands/copy.c | 739 ++++++++++++++---- src/backend/utils/misc/guc_gp.c | 12 + src/include/commands/copy.h | 4 + src/include/utils/sync_guc_name.h | 1 + src/test/regress/expected/gpcopy_dispatch.out | 82 ++ src/test/regress/greenplum_schedule | 2 +- src/test/regress/output/sreh.source | 4 +- src/test/regress/sql/gpcopy_dispatch.sql | 99 +++ 8 files changed, 785 insertions(+), 158 deletions(-) create mode 100644 src/test/regress/expected/gpcopy_dispatch.out create mode 100644 src/test/regress/sql/gpcopy_dispatch.sql diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index c77517f575bc..59aa5e93b922 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -169,8 +169,8 @@ static void CopyFromInsertBatch(CopyState cstate, EState *estate, uint64 firstBufferedLineNo); static bool CopyReadLine(CopyState cstate); static bool CopyReadLineText(CopyState cstate); -static int CopyReadAttributesText(CopyState cstate); -static int CopyReadAttributesCSV(CopyState cstate); +static int CopyReadAttributesText(CopyState cstate, int stop_processing_at_field); +static int CopyReadAttributesCSV(CopyState cstate, int stop_processing_at_field); static Datum CopyReadBinaryAttribute(CopyState cstate, int column_no, FmgrInfo *flinfo, Oid typioparam, int32 typmod, @@ -205,17 +205,18 @@ static void SendCopyFromForwardedTuple(CopyState cstate, Oid tuple_oid, Datum *values, bool *nulls); -static void SendCopyFromForwardedHeader(CopyState cstate, CdbCopy *cdbCopy, bool file_has_oids); +static void SendCopyFromForwardedHeader(CopyState cstate, CdbCopy *cdbCopy); static void SendCopyFromForwardedError(CopyState cstate, CdbCopy *cdbCopy, char *errmsg); static bool NextCopyFromDispatch(CopyState cstate, ExprContext *econtext, Datum *values, bool *nulls, Oid *tupleOid); static TupleTableSlot *NextCopyFromExecute(CopyState cstate, ExprContext *econtext, EState *estate, Oid *tupleOid); +static bool NextCopyFromRawFieldsX(CopyState cstate, char ***fields, int *nfields, int stop_processing_at_field); static bool NextCopyFromX(CopyState cstate, ExprContext *econtext, - Datum *values, bool *nulls, Oid *tupleOid); + Datum *values, bool *nulls, Oid *tupleOid); static void HandleCopyError(CopyState cstate); -static void HandleQDErrorFrame(CopyState cstate); +static void HandleQDErrorFrame(CopyState cstate, char *p, int len); static void CopyInitDataParser(CopyState cstate); static void setEncodingConversionProc(CopyState cstate, int encoding, bool iswritable); @@ -223,6 +224,8 @@ static void CopyEolStrToType(CopyState cstate); static GpDistributionData *InitDistributionData(CopyState cstate, EState *estate); static void FreeDistributionData(GpDistributionData *distData); +static void InitCopyFromDispatchSplit(CopyState cstate, 
GpDistributionData *distData, EState *estate); +static Bitmapset *GetTargetKeyCols(Oid relid, PartitionNode *children, Bitmapset *needed_cols, bool distkeys, EState *estate); static GpDistributionData *GetDistributionPolicyForPartition(CopyState cstate, EState *estate, GpDistributionData *mainDistData, @@ -274,44 +277,40 @@ static volatile CopyState glob_cstate = NULL; /* GPDB_91_MERGE_FIXME: passing through a global variable like this is ugly */ static CopyStmt *glob_copystmt = NULL; +/* + * Testing GUC: When enabled, COPY FROM prints an INFO line to indicate which + * fields are processed in the QD, and which in the QE. + */ +extern bool Test_copy_qd_qe_split; /* * When doing a COPY FROM through the dispatcher, the QD reads the input from * the input file (or stdin or program), and forwards the data to the QE nodes, - * where they will actually be inserted - * - * - Ideally, the QD would just pass through each line to the QE as is, and let - * the QEs to do all the processing. Because the more processing the QD has - * to do, the more likely it is to become a bottleneck. + * where they will actually be inserted. * - * - However, the QD needs to figure out which QE to send each row to. For that, - * it needs to at least parse the distribution key. The distribution key might - * also be a DEFAULTed column, in which case the DEFAULT value needs to be - * evaluated in the QD. In that case, the QD must send the computed value - * to the QE - we cannot assume that the QE can re-evaluate the expression and - * arrive at the same value, at least not if the DEFAULT expression is volatile. + * Ideally, the QD would just pass through each line to the QE as is, and let + * the QEs do all the processing. Because the more processing the QD has + * to do, the more likely it is to become a bottleneck. * - * - Therefore, we need a flexible format between the QD and QE, where the QD - * processes just enough of each input line to figure out where to send it. - * It must send the values it had to parse and evaluate to the QE, as well - * as the rest of the original input line, so that the QE can parse the rest - * of it. + * However, the QD needs to figure out which QE to send each row to. For that, + * it needs to at least parse the distribution key. The distribution key might + * also be a DEFAULTed column, in which case the DEFAULT value needs to be + * evaluated in the QD. In that case, the QD must send the computed value + * to the QE - we cannot assume that the QE can re-evaluate the expression and + * arrive at the same value, at least not if the DEFAULT expression is volatile. * - * GPDB_91_MERGE_FIXME: that's a nice theory, but the current implementation - * is a lot more dumb: The QD parses every row fully, and sends all - * precomputed values to each QE. Therefore, with the current implementation, - * the QD will easily become a bottleneck, if the input functions are - * expensive. Before the refactoring during the 9.1 merge, there was no - * special QD->QE protocol. Instead, the QD reconstructed each line in the - * same format as the original file had, interjecting any DEFAULT values into - * it. That was fast when only a few columns needed to be evaluated in the QD, - * but it was not optimal, but it was pretty complicated, and required some - * majore surgery to the upstream NextCopyFrom and other functions. + * Therefore, we need a flexible format between the QD and QE, where the QD + * processes just enough of each input line to figure out where to send it. 
+ * It must send the values it had to parse and evaluate to the QE, as well + * as the rest of the original input line, so that the QE can parse the rest + * of it. * - * The 'copy_from_dispatch_frame' struct is used in the QD->QE stream. For each - * input line, the QD constructs a 'copy_from_dispatch_frame' struct, and sends + * The 'copy_from_dispatch_*' structs are used in the QD->QE stream. For each + * input line, the QD constructs a 'copy_from_dispatch_row' struct, and sends it to the QE. Before any rows, a QDtoQESignature is sent first, followed by - * a 'copy_from_dispatch_header'. + * a 'copy_from_dispatch_header'. When QD encounters a recoverable error that + * needs to be logged in the error log (LOG ERRORS SEGMENT REJECT LIMIT), it + * sends the erroneous row to a QE, in a 'copy_from_dispatch_error' struct. * * * COPY TO is simpler: The QEs form the output rows in the final form, and the QD @@ -320,49 +319,83 @@ static CopyStmt *glob_copystmt = NULL; */ static const char QDtoQESignature[] = "PGCOPY-QD-TO-QE\n\377\r\n"; +/* Header contains information that applies to all the rows that follow. */ typedef struct { + /* + * First field that should be processed in the QE. Any fields before + * this will be included as Datums in the rows that follow. + */ + int16 first_qe_processed_field; bool file_has_oids; } copy_from_dispatch_header; typedef struct { /* - * target relation OID. Normally, the same as cstate->relid, but for - * a partitioned relation, it indicate the target partition. 
*/ + int64 error_marker; /* constant -1, to mark that this is an error + * frame rather than 'copy_from_dispatch_row' */ int64 lineno; - bool line_buf_converted; uint32 errmsg_len; uint32 line_len; + bool line_buf_converted; /* 'errmsg' follows */ /* 'line' follows */ } copy_from_dispatch_error; - +/* Size of the struct, without padding at the end. */ +#define SizeOfCopyFromDispatchError (offsetof(copy_from_dispatch_error, line_buf_converted) + sizeof(bool)) /* @@ -3689,6 +3722,41 @@ CopyFrom(CopyState cstate) is_check_distkey = false; } + /* Determine which fields we need to parse in the QD. */ + if (cstate->dispatch_mode == COPY_DISPATCH) + InitCopyFromDispatchSplit(cstate, distData, estate); + + if (cstate->dispatch_mode == COPY_DISPATCH || + cstate->dispatch_mode == COPY_EXECUTOR) + { + /* + * Now split the attnumlist into the parts that are parsed in the QD, and + * in QE. + */ + ListCell *lc; + int i = 0; + List *qd_attnumlist = NIL; + List *qe_attnumlist = NIL; + int first_qe_processed_field; + + first_qe_processed_field = cstate->first_qe_processed_field; + if (cstate->file_has_oids) + first_qe_processed_field--; + + foreach(lc, cstate->attnumlist) + { + int attnum = lfirst_int(lc); + + if (i < first_qe_processed_field) + qd_attnumlist = lappend_int(qd_attnumlist, attnum); + else + qe_attnumlist = lappend_int(qe_attnumlist, attnum); + i++; + } + cstate->qd_attnumlist = qd_attnumlist; + cstate->qe_attnumlist = qe_attnumlist; + } + if (cstate->dispatch_mode == COPY_DISPATCH) { /* @@ -3702,7 +3770,7 @@ CopyFrom(CopyState cstate) * pre-allocate buffer for constructing a message. */ cstate->dispatch_msgbuf = makeStringInfo(); - enlargeStringInfo(cstate->dispatch_msgbuf, sizeof(copy_from_dispatch_row)); + enlargeStringInfo(cstate->dispatch_msgbuf, SizeOfCopyFromDispatchRow); /* * prepare to COPY data into segDBs: @@ -3746,9 +3814,7 @@ CopyFrom(CopyState cstate) * dummy file on master for COPY FROM ON SEGMENT */ if (!cstate->on_segment) - { - SendCopyFromForwardedHeader(cstate, cdbCopy, cstate->file_has_oids); - } + SendCopyFromForwardedHeader(cstate, cdbCopy); } } @@ -4421,10 +4487,12 @@ BeginCopyFrom(Relation rel, /* * Determine the mode */ - if (Gp_role == GP_ROLE_DISPATCH && !cstate->on_segment && - cstate->rel && cstate->rel->rd_cdbpolicy) + if (cstate->on_segment || data_source_cb) + cstate->dispatch_mode = COPY_DIRECT; + else if (Gp_role == GP_ROLE_DISPATCH && + cstate->rel && cstate->rel->rd_cdbpolicy) cstate->dispatch_mode = COPY_DISPATCH; - else if (Gp_role == GP_ROLE_EXECUTE && !cstate->on_segment) + else if (Gp_role == GP_ROLE_EXECUTE) cstate->dispatch_mode = COPY_EXECUTOR; else cstate->dispatch_mode = COPY_DIRECT; @@ -4667,6 +4735,7 @@ BeginCopyFrom(Relation rel, errmsg("invalid QD->QD COPY communication header"))); cstate->file_has_oids = header_frame.file_has_oids; + cstate->first_qe_processed_field = header_frame.first_qe_processed_field; } else if (!cstate->binary) { @@ -4747,6 +4816,13 @@ BeginCopyFrom(Relation rel, */ bool NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields) +{ + return NextCopyFromRawFieldsX(cstate, fields, nfields, -1); +} + +static bool +NextCopyFromRawFieldsX(CopyState cstate, char ***fields, int *nfields, + int stop_processing_at_field) { int fldct; bool done; @@ -4777,9 +4853,9 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields) /* Parse the line into de-escaped field values */ if (cstate->csv_mode) - fldct = CopyReadAttributesCSV(cstate); + fldct = CopyReadAttributesCSV(cstate, stop_processing_at_field); else 
- fldct = CopyReadAttributesText(cstate); + fldct = CopyReadAttributesText(cstate, stop_processing_at_field); *fields = cstate->raw_fields; *nfields = fldct; @@ -4866,7 +4942,7 @@ HandleCopyError(CopyState cstate) * ErrorIfRejectLimit() below will use this information in the error message, * if the error count is reached. */ - cdbsreh->rawdata = cstate->line_buf.data + cstate->line_buf.cursor; + cdbsreh->rawdata = cstate->line_buf.data; cdbsreh->is_server_enc = cstate->line_buf_converted; cdbsreh->linenumber = cstate->cur_lineno; @@ -4889,7 +4965,7 @@ HandleCopyError(CopyState cstate) SendCopyFromForwardedError(cstate, cstate->cdbCopy, errormsg); } - else + else { /* after all the prep work let cdbsreh do the real work */ if (Gp_role == GP_ROLE_DISPATCH) @@ -4924,7 +5000,7 @@ HandleCopyError(CopyState cstate) * relation passed to BeginCopyFrom. This function fills the arrays. * Oid of the tuple is returned with 'tupleOid' separately. */ -bool +static bool NextCopyFromX(CopyState cstate, ExprContext *econtext, Datum *values, bool *nulls, Oid *tupleOid) { @@ -4938,14 +5014,46 @@ NextCopyFromX(CopyState cstate, ExprContext *econtext, int i; int nfields; bool isnull; - bool file_has_oids = cstate->file_has_oids; int *defmap = cstate->defmap; ExprState **defexprs = cstate->defexprs; + List *attnumlist; + bool file_has_oids; + int stop_processing_at_field; + + /* + * Figure out what fields we're going to process in this process. + * + * In the QD, set 'stop_processing_at_field' so that we only process those + * fields that are needed in the QD. + */ + switch (cstate->dispatch_mode) + { + case COPY_DIRECT: + stop_processing_at_field = -1; + attnumlist = cstate->attnumlist; + file_has_oids = cstate->file_has_oids; + break; + + case COPY_DISPATCH: + stop_processing_at_field = cstate->first_qe_processed_field; + attnumlist = cstate->qd_attnumlist; + file_has_oids = cstate->file_has_oids; + break; + + case COPY_EXECUTOR: + stop_processing_at_field = -1; + attnumlist = cstate->qe_attnumlist; + file_has_oids = false; /* already handled in QD. */ + break; + + default: + elog(ERROR, "unexpected COPY dispatch mode %d", cstate->dispatch_mode); + } tupDesc = RelationGetDescr(cstate->rel); attr = tupDesc->attrs; num_phys_attrs = tupDesc->natts; - attr_count = list_length(cstate->attnumlist); + attr_count = list_length(attnumlist); nfields = file_has_oids ? (attr_count + 1) : attr_count; /* Initialize all values for row to NULL */ @@ -4961,8 +5069,30 @@ NextCopyFromX(CopyState cstate, ExprContext *econtext, char *string; /* read raw fields in the next line */ - if (!NextCopyFromRawFields(cstate, &field_strings, &fldct)) - return false; + if (cstate->dispatch_mode != COPY_EXECUTOR) + { + if (!NextCopyFromRawFieldsX(cstate, &field_strings, &fldct, + stop_processing_at_field)) + return false; + } + else + { + /* + * We have received the raw line from the QD, and we just + * need to split it into raw fields. + */ + if (cstate->stopped_processing_at_delim && + cstate->line_buf.cursor <= cstate->line_buf.len) + { + if (cstate->csv_mode) + fldct = CopyReadAttributesCSV(cstate, -1); + else + fldct = CopyReadAttributesText(cstate, -1); + } + else + fldct = 0; + field_strings = cstate->raw_fields; + } /* check for overflowing fields */ if (nfields > 0 && fldct > nfields) @@ -5017,7 +5147,7 @@ NextCopyFromX(CopyState cstate, ExprContext *econtext, } /* Loop to read the user attributes on the line. 
*/ - foreach(cur, cstate->attnumlist) + foreach(cur, attnumlist) { int attnum = lfirst_int(cur); int m = attnum - 1; @@ -5084,7 +5214,7 @@ NextCopyFromX(CopyState cstate, ExprContext *econtext, Assert(fieldno == nfields); } - else + else if (attr_count) { /* binary */ int16 fld_count; @@ -5151,7 +5281,7 @@ NextCopyFromX(CopyState cstate, ExprContext *econtext, } i = 0; - foreach(cur, cstate->attnumlist) + foreach(cur, attnumlist) { int attnum = lfirst_int(cur); int m = attnum - 1; @@ -5173,7 +5303,15 @@ NextCopyFromX(CopyState cstate, ExprContext *econtext, * Now compute and insert any defaults available for the columns not * provided by the input data. Anything not processed here or above will * remain NULL. + * + * GPDB: The defaults are always computed in the QD, and are included + * in the QD->QE stream as pre-computed Datums. Funny indentation, to + * keep the indentation of the code inside the same as in upstream. + * (We could improve this, and compute immutable defaults that don't + * affect which segment the row belongs to, in the QE.) */ + if (cstate->dispatch_mode != COPY_EXECUTOR) + { for (i = 0; i < num_defaults; i++) { /* @@ -5186,6 +5324,7 @@ NextCopyFromX(CopyState cstate, ExprContext *econtext, values[defmap[i]] = ExecEvalExpr(defexprs[i], econtext, &nulls[defmap[i]], NULL); } + } return true; } @@ -5195,21 +5334,14 @@ NextCopyFromX(CopyState cstate, ExprContext *econtext, * Like NextCopyFrom(), but used in the QD, when we want to parse the * input line only partially. We only want to parse enough fields needed * to determine which target segment to forward the row to. + * + * (The logic is actually within NextCopyFrom(). This is a separate + * function just for symmetry with NextCopyFromExecute()). */ static bool NextCopyFromDispatch(CopyState cstate, ExprContext *econtext, Datum *values, bool *nulls, Oid *tupleOid) { - /* - * GPDB_91_MERGE_FIXME: The idea here would be to only call the - * input function for the fields we need in the QD. But for now, - * screw performance. - * - * Note: There used to be code in InitDistributionData(), to compute - * the last field number that's needed for to determine which partition - * a row belongs to. If you resurrect this optimization, you'll probably - * need to resurrect that, too. - */ return NextCopyFrom(cstate, econtext, values, nulls, tupleOid); } @@ -5228,61 +5360,58 @@ NextCopyFromExecute(CopyState cstate, ExprContext *econtext, AttrNumber num_phys_attrs; copy_from_dispatch_row frame; int r; - Oid header; ResultRelInfo *resultRelInfo; + TupleTableSlot *baseSlot; TupleTableSlot *slot; + Datum *baseValues; + bool *baseNulls; Datum *values; bool *nulls; - MemoryContext oldcxt; + bool got_error; + + /* + * The code below reads the 'copy_from_dispatch_row' struct, and only + * then checks if it was actually a 'copy_from_dispatch_error' struct. + * That only works when 'copy_from_dispatch_error' is larger than + *'copy_from_dispatch_row'. + */ + StaticAssertStmt(SizeOfCopyFromDispatchError >= SizeOfCopyFromDispatchRow, + "copy_from_dispatch_error must be larger than copy_from_dispatch_row"); + /* + * If we encounter an error while parsing the row (or we receive a row from + * the QD that was already marked as an erroneous row), we loop back here + * until we get a good row. 
+ */ retry: - /* sneak peek at the first Oid field to see if it's a row or an error */ - r = CopyGetData(cstate, &header, sizeof(Oid)); + got_error = false; + + r = CopyGetData(cstate, (char *) &frame, SizeOfCopyFromDispatchRow); if (r == 0) return NULL; - if (r != sizeof(Oid)) + if (r != SizeOfCopyFromDispatchRow) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), errmsg("unexpected EOF in COPY data"))); - - if (header == InvalidOid) + if (frame.lineno == -1) { - HandleQDErrorFrame(cstate); + HandleQDErrorFrame(cstate, (char *) &frame, SizeOfCopyFromDispatchRow); goto retry; } - frame.relid = header; - r = CopyGetData(cstate, ((char *) &frame) + sizeof(Oid), sizeof(frame) - sizeof(Oid)); - if (r != sizeof(frame) - sizeof(Oid)) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("unexpected EOF in COPY data"))); - - if (!OidIsValid(frame.relid)) - elog(ERROR, "invalid target relation id in tuple frame received from QD"); - - /* - * Look up the correct partition - */ - oldcxt = MemoryContextSwitchTo(estate->es_query_cxt); - + /* Prepare for parsing the input line */ resultRelInfo = estate->es_result_relation_info; - if (frame.relid != RelationGetRelid(resultRelInfo->ri_RelationDesc)) - { - resultRelInfo = targetid_get_partition(frame.relid, estate, true); - estate->es_result_relation_info = resultRelInfo; - } - - if (!resultRelInfo->ri_resultSlot) - resultRelInfo->ri_resultSlot = - MakeSingleTupleTableSlot(resultRelInfo->ri_RelationDesc->rd_att); - slot = resultRelInfo->ri_resultSlot; - + baseSlot = resultRelInfo->ri_resultSlot; tupDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc); attr = tupDesc->attrs; num_phys_attrs = tupDesc->natts; - MemoryContextSwitchTo(oldcxt); + /* Initialize all values for row to NULL */ + ExecClearTuple(baseSlot); + baseValues = slot_get_values(baseSlot); + baseNulls = slot_get_isnull(baseSlot); + MemSet(baseValues, 0, num_phys_attrs * sizeof(Datum)); + MemSet(baseNulls, true, num_phys_attrs * sizeof(bool)); /* check for overflowing fields */ if (frame.fld_count < 0 || frame.fld_count > num_phys_attrs) @@ -5290,17 +5419,15 @@ NextCopyFromExecute(CopyState cstate, ExprContext *econtext, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), errmsg("extra data after last expected column"))); - /* Initialize all values for row to NULL */ - ExecClearTuple(slot); - values = slot_get_values(resultRelInfo->ri_resultSlot); - nulls = slot_get_isnull(resultRelInfo->ri_resultSlot); - MemSet(values, 0, num_phys_attrs * sizeof(Datum)); - MemSet(nulls, true, num_phys_attrs * sizeof(bool)); - - /* Read the OID field if present */ + /* Read the OID field, if present */ if (file_has_oids) { - Oid loaded_oid = frame.loaded_oid; + Oid loaded_oid; + + if (CopyGetData(cstate, &loaded_oid, sizeof(Oid)) != sizeof(Oid)) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("unexpected EOF in COPY data"))); if (loaded_oid == InvalidOid) { @@ -5311,13 +5438,92 @@ NextCopyFromExecute(CopyState cstate, ExprContext *econtext, } *tupleOid = loaded_oid; } - else if (frame.loaded_oid != InvalidOid) + + /* + * Read the input line into 'line_buf'. 
+ */ + resetStringInfo(&cstate->line_buf); + enlargeStringInfo(&cstate->line_buf, frame.line_len); + if (CopyGetData(cstate, cstate->line_buf.data, frame.line_len) != frame.line_len) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("unexpected EOF in COPY data"))); + cstate->line_buf.data[frame.line_len] = '\0'; + cstate->line_buf.len = frame.line_len; + cstate->line_buf.cursor = frame.residual_off; + cstate->line_buf_valid = true; + cstate->line_buf_converted = true; + cstate->cur_lineno = frame.lineno; + cstate->stopped_processing_at_delim = frame.delim_seen_at_end; + + /* + * Parse any fields from the input line that were not processed in the + * QD already. + */ + if (!cstate->cdbsreh) + { + if (!NextCopyFromX(cstate, econtext, baseValues, baseNulls, NULL)) + { ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("unexpected OID received in COPY data"))); + errmsg("unexpected EOF in COPY data"))); + } + } + else + { + MemoryContext oldcontext = CurrentMemoryContext; + bool result; - cstate->cur_lineno = frame.lineno; + PG_TRY(); + { + result = NextCopyFromX(cstate, econtext, + baseValues, baseNulls, tupleOid); + if (!result) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("unexpected EOF in COPY data"))); + } + PG_CATCH(); + { + HandleCopyError(cstate); + got_error = true; + MemoryContextSwitchTo(oldcontext); + } + PG_END_TRY(); + } + + ExecStoreVirtualTuple(baseSlot); + + /* + * Remap the values to the form expected by the target partition. + */ + if (frame.relid != RelationGetRelid(resultRelInfo->ri_RelationDesc)) + { + MemoryContext oldcontext = MemoryContextSwitchTo(estate->es_query_cxt); + + resultRelInfo = targetid_get_partition(frame.relid, estate, true); + estate->es_result_relation_info = resultRelInfo; + + MemoryContextSwitchTo(oldcontext); + + slot = reconstructMatchingTupleSlot(baseSlot, resultRelInfo); + + /* since resultRelInfo has changed, refresh these values */ + tupDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc); + attr = tupDesc->attrs; + num_phys_attrs = tupDesc->natts; + } + else + slot = baseSlot; + + /* + * Read any attributes that were processed in the QD already. The attribute + * numbers in the message are already in terms of the target partition, so + * we do this after remapping and switching to the partition slot. + */ + values = slot_get_values(slot); + nulls = slot_get_isnull(slot); for (i = 0; i < frame.fld_count; i++) { int16 attnum; @@ -5329,8 +5535,10 @@ NextCopyFromExecute(CopyState cstate, ExprContext *econtext, ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), errmsg("unexpected EOF in COPY data"))); + if (attnum < 1 || attnum > num_phys_attrs) - elog(ERROR, "invalid attnum received from QD: %d", attnum); + elog(ERROR, "invalid attnum received from QD: %d (num physical attributes: %d)", + attnum, num_phys_attrs); m = attnum - 1; cstate->cur_attname = NameStr(attr[m]->attname); @@ -5401,23 +5609,31 @@ NextCopyFromExecute(CopyState cstate, ExprContext *econtext, cstate->cur_attname = NULL; values[m] = value; - /* NULLs are currently not transmitted */ nulls[m] = false; } + if (got_error) + goto retry; + + ExecStoreVirtualTuple(slot); + /* * Here we should compute defaults for any columns for which we didn't * get a default from the QD. But at the moment, all defaults are evaluated * in the QD. */ - ExecStoreVirtualTuple(slot); - return slot; } +/* + * Parse and handle an "error frame" from QD. 
+ * + * The caller has already read part of the frame; 'p' points to that part, + * of length 'len'. + */ static void -HandleQDErrorFrame(CopyState cstate) +HandleQDErrorFrame(CopyState cstate, char *p, int len) { CdbSreh *cdbsreh = cstate->cdbsreh; MemoryContext oldcontext; @@ -5426,12 +5642,20 @@ HandleQDErrorFrame(CopyState cstate) char *line; int r; + Assert(len <= SizeOfCopyFromDispatchError); + Assert(Gp_role == GP_ROLE_EXECUTE); oldcontext = MemoryContextSwitchTo(cdbsreh->badrowcontext); - r = CopyGetData(cstate, ((char *) &errframe) + sizeof(Oid), sizeof(errframe) - sizeof(Oid)); - if (r != sizeof(errframe) - sizeof(Oid)) + /* + * Copy the part of the struct that the caller had already read, and + * read the rest. + */ + memcpy(&errframe, p, len); + + r = CopyGetData(cstate, ((char *) &errframe) + len, SizeOfCopyFromDispatchError - len); + if (r != SizeOfCopyFromDispatchError - len) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), errmsg("unexpected EOF in COPY data"))); @@ -5455,8 +5679,8 @@ HandleQDErrorFrame(CopyState cstate) cdbsreh->linenumber = errframe.lineno; cdbsreh->rawdata = line; - cdbsreh->is_server_enc = errframe.line_buf_converted; cdbsreh->errmsg = errormsg; + cdbsreh->is_server_enc = errframe.line_buf_converted; HandleSingleRowError(cdbsreh); @@ -5524,11 +5748,25 @@ SendCopyFromForwardedTuple(CopyState cstate, num_phys_attrs = tupDesc->natts; /* - * Reset the message buffer, and reserve space for the frame header. + * Reset the message buffer, and reserve enough space for the header, + * the OID if any, and the residual line. */ msgbuf = cstate->dispatch_msgbuf; - Assert(msgbuf->maxlen >= sizeof(copy_from_dispatch_row)); - msgbuf->len = sizeof(copy_from_dispatch_row); + ENLARGE_MSGBUF(msgbuf, SizeOfCopyFromDispatchRow + sizeof(Oid) + cstate->line_buf.len); + + /* the header goes to the beginning of the struct, but it will be filled in later. */ + msgbuf->len = SizeOfCopyFromDispatchRow; + + /* + * After the header, the OID from the input row, if any. + */ + if (cstate->file_has_oids) + APPEND_MSGBUF_NOCHECK(msgbuf, &tuple_oid, sizeof(Oid)); + + /* + * Next, any residual text that we didn't process in the QD. + */ + APPEND_MSGBUF_NOCHECK(msgbuf, cstate->line_buf.data, cstate->line_buf.len); /* * Append attributes to the buffer. @@ -5618,10 +5856,12 @@ SendCopyFromForwardedTuple(CopyState cstate, * buffer. 
*/ frame = (copy_from_dispatch_row *) msgbuf->data; - frame->relid = RelationGetRelid(rel); - frame->loaded_oid = tuple_oid; frame->lineno = lineno; + frame->relid = RelationGetRelid(rel); + frame->line_len = cstate->line_buf.len; + frame->residual_off = cstate->line_buf.cursor; frame->fld_count = num_sent_fields; + frame->delim_seen_at_end = cstate->stopped_processing_at_delim; if (toAll) cdbCopySendDataToAll(cdbCopy, msgbuf->data, msgbuf->len); @@ -5630,14 +5870,15 @@ SendCopyFromForwardedTuple(CopyState cstate, } static void -SendCopyFromForwardedHeader(CopyState cstate, CdbCopy *cdbCopy, bool file_has_oids) +SendCopyFromForwardedHeader(CopyState cstate, CdbCopy *cdbCopy) { copy_from_dispatch_header header_frame; cdbCopySendDataToAll(cdbCopy, QDtoQESignature, sizeof(QDtoQESignature)); memset(&header_frame, 0, sizeof(header_frame)); - header_frame.file_has_oids = file_has_oids; + header_frame.file_has_oids = cstate->file_has_oids; + header_frame.first_qe_processed_field = cstate->first_qe_processed_field; cdbCopySendDataToAll(cdbCopy, (char *) &header_frame, sizeof(header_frame)); } @@ -5652,27 +5893,27 @@ SendCopyFromForwardedError(CopyState cstate, CdbCopy *cdbCopy, char *errormsg) msgbuf = cstate->dispatch_msgbuf; resetStringInfo(msgbuf); - enlargeStringInfo(msgbuf, sizeof(copy_from_dispatch_error)); + enlargeStringInfo(msgbuf, SizeOfCopyFromDispatchError); /* allocate space for the header (we'll fill it in last). */ - msgbuf->len = sizeof(copy_from_dispatch_error); + msgbuf->len = SizeOfCopyFromDispatchError; appendBinaryStringInfo(msgbuf, errormsg, errormsg_len); appendBinaryStringInfo(msgbuf, cstate->line_buf.data, cstate->line_buf.len); errframe = (copy_from_dispatch_error *) msgbuf->data; - errframe->error_marker = InvalidOid; + errframe->error_marker = -1; errframe->lineno = cstate->cur_lineno; - errframe->line_buf_converted = cstate->line_buf_converted; errframe->line_len = cstate->line_buf.len; errframe->errmsg_len = errormsg_len; + errframe->line_buf_converted = cstate->line_buf_converted; /* send the bad data row to a random QE (via roundrobin) */ if (cstate->lastsegid == cdbCopy->total_segs) cstate->lastsegid = 0; /* start over from first segid */ target_seg = (cstate->lastsegid++ % cdbCopy->total_segs); - + cdbCopySendData(cdbCopy, target_seg, msgbuf->data, msgbuf->len); } @@ -6219,7 +6460,7 @@ GetDecimalFromHex(char hex) * The return value is the number of fields actually read. */ static int -CopyReadAttributesText(CopyState cstate) +CopyReadAttributesText(CopyState cstate, int stop_processing_at_field) { char delimc = cstate->delim[0]; char escapec = cstate->escape_off ? delimc : cstate->escape[0]; @@ -6255,7 +6496,7 @@ CopyReadAttributesText(CopyState cstate) output_ptr = cstate->attribute_buf.data; /* set pointer variables for loop */ - cur_ptr = cstate->line_buf.data; + cur_ptr = cstate->line_buf.data + cstate->line_buf.cursor; line_end_ptr = cstate->line_buf.data + cstate->line_buf.len; /* Outer loop iterates over fields */ @@ -6268,6 +6509,15 @@ CopyReadAttributesText(CopyState cstate) int input_len; bool saw_non_ascii = false; + /* + * In QD, stop once we have processed the last field we need in the QD. 
+ */ + if (fieldno == stop_processing_at_field) + { + cstate->stopped_processing_at_delim = true; + break; + } + /* Make sure there is enough space for the next value */ if (fieldno >= cstate->max_fields) { @@ -6430,9 +6680,18 @@ CopyReadAttributesText(CopyState cstate) fieldno++; /* Done if we hit EOL instead of a delim */ if (!found_delim) + { + cstate->stopped_processing_at_delim = false; break; + } } + /* + * Make note of the stopping point in 'line_buf.cursor', so that we + * can send the rest to the QE later. + */ + cstate->line_buf.cursor = cur_ptr - cstate->line_buf.data; + /* Clean up state of attribute_buf */ output_ptr--; Assert(*output_ptr == '\0'); @@ -6448,7 +6707,7 @@ CopyReadAttributesText(CopyState cstate) * "standard" (i.e. common) CSV usage. */ static int -CopyReadAttributesCSV(CopyState cstate) +CopyReadAttributesCSV(CopyState cstate, int stop_processing_at_field) { char delimc = cstate->delim[0]; char quotec = cstate->quote[0]; @@ -6485,7 +6744,7 @@ CopyReadAttributesCSV(CopyState cstate) output_ptr = cstate->attribute_buf.data; /* set pointer variables for loop */ - cur_ptr = cstate->line_buf.data; + cur_ptr = cstate->line_buf.data + cstate->line_buf.cursor; line_end_ptr = cstate->line_buf.data + cstate->line_buf.len; /* Outer loop iterates over fields */ @@ -6498,6 +6757,15 @@ CopyReadAttributesCSV(CopyState cstate) char *end_ptr; int input_len; + /* + * In QD, stop once we have processed the last field we need in the QD. + */ + if (fieldno == stop_processing_at_field) + { + cstate->stopped_processing_at_delim = true; + break; + } + /* Make sure there is enough space for the next value */ if (fieldno >= cstate->max_fields) { @@ -6601,9 +6869,18 @@ CopyReadAttributesCSV(CopyState cstate) fieldno++; /* Done if we hit EOL instead of a delim */ if (!found_delim) + { + cstate->stopped_processing_at_delim = false; break; + } } + /* + * Make note of the stopping point in 'line_buf.cursor', so that we + * can send the rest to the QE later. + */ + cstate->line_buf.cursor = cur_ptr - cstate->line_buf.data; + /* Clean up state of attribute_buf */ output_ptr--; Assert(*output_ptr == '\0'); @@ -7286,6 +7563,162 @@ FreeDistributionData(GpDistributionData *distData) } } +/* + * Compute which fields need to be processed in the QD, and which ones can + * be delayed to the QE. + */ +static void +InitCopyFromDispatchSplit(CopyState cstate, GpDistributionData *distData, + EState *estate) +{ + int first_qe_processed_field = 0; + Bitmapset *needed_cols = NULL; + ListCell *lc; + + if (cstate->binary) + { + foreach(lc, cstate->attnumlist) + { + AttrNumber attnum = lfirst_int(lc); + needed_cols = bms_add_member(needed_cols, attnum); + first_qe_processed_field++; + } + } + else + { + int fieldno; + /* + * We need all the columns that form the distribution key. + */ + if (distData->policy) + { + for (int i = 0; i < distData->policy->nattrs; i++) + needed_cols = bms_add_member(needed_cols, distData->policy->attrs[i]); + } + + /* + * If the target is partitioned, get the columns needed for partitioning + * keys, and for distribution keys of each partition. + */ + if (estate->es_result_partitions) + needed_cols = GetTargetKeyCols(RelationGetRelid(estate->es_result_relation_info->ri_RelationDesc), + estate->es_result_partitions, needed_cols, + distData->policy == NULL, estate); + + /* Get the max fieldno that contains one of the needed attributes. 
*/ + fieldno = 0; + foreach(lc, cstate->attnumlist) + { + AttrNumber attnum = lfirst_int(lc); + + if (bms_is_member(attnum, needed_cols)) + first_qe_processed_field = fieldno + 1; + fieldno++; + } + } + + /* If the file contains OIDs, it's the first field. */ + if (cstate->file_has_oids) + first_qe_processed_field++; + + cstate->first_qe_processed_field = first_qe_processed_field; + + if (Test_copy_qd_qe_split) + { + if (first_qe_processed_field == + list_length(cstate->attnumlist) + (cstate->file_has_oids ? 1 : 0)) + elog(INFO, "all fields will be processed in the QD"); + else + elog(INFO, "first field processed in the QE: %d", first_qe_processed_field); + } +} + +/* + * Recursive helper function for InitCopyFromDispatchSplit(), to get + * the columns needed in QD for a partition and its subpartitions. + */ +static Bitmapset * +GetTargetKeyCols(Oid relid, PartitionNode *children, Bitmapset *needed_cols, + bool distkeys, EState *estate) +{ + int i; + ListCell *lc; + + /* + * Partition key columns. + * + * Note: paratts[] stores the attribute numbers, in terms of the root + * partition. That's exactly what we want. + */ + if (children) + { + for (i = 0; i < children->part->parnatts; i++) + needed_cols = bms_add_member(needed_cols, children->part->paratts[i]); + } + + /* + * Distribution key columns + * + * These are in terms of the partition itself! We need to map them to + * the root partition's attribute numbers. + * + * (At the moment, this is more complicated than necessary, because GPDB + * doesn't support partitions that have differing distribution policies, + * except that child partitions can be randomly distributed, even though + * the parent is hash distributed.) + */ + if (distkeys) + { + ResultRelInfo *partrr; + GpPolicy *partPolicy; + AttrMap *map; + + partrr = targetid_get_partition(relid, estate, false); + map = partrr->ri_partInsertMap; + partPolicy = partrr->ri_RelationDesc->rd_cdbpolicy; + + if (partPolicy) + { + for (i = 0; i < partPolicy->nattrs; i++) + { + AttrNumber partAttNum = partPolicy->attrs[i]; + AttrNumber parentAttNum; + + /* Map this partition's attribute number to the parent's. */ + if (map) + { + for (parentAttNum = 1; parentAttNum <= map->attr_count; parentAttNum++) + { + if (map->attr_map[parentAttNum] == partAttNum) + break; + } + if (parentAttNum > map->attr_count) + elog(ERROR, "could not find mapping partition distribution key column %d in parent relation", + partAttNum); + } + else + parentAttNum = partAttNum; + + needed_cols = bms_add_member(needed_cols, parentAttNum); + } + } + } + + /* Recurse to subpartitions */ + if (children) + { + foreach(lc, children->rules) + { + PartitionRule *pr = (PartitionRule *) lfirst(lc); + + needed_cols = GetTargetKeyCols(pr->parchildrelid, pr->children, + needed_cols, distkeys, estate); + } + } + + return needed_cols; +} + /* Get distribution policy for specific part */ static GpDistributionData * GetDistributionPolicyForPartition(CopyState cstate, EState *estate, @@ -7316,11 +7749,9 @@ GetDistributionPolicyForPartition(CopyState cstate, EState *estate, d = hash_search(mainDistData->hashmap, &(relid), HASH_ENTER, &found); if (!found) { - Relation rel; + Relation rel = resultRelInfo->ri_RelationDesc; MemoryContext oldcontext; - rel = heap_open(relid, NoLock); - /* * Make sure this all persists the current iteration. 
*/ @@ -7330,8 +7761,6 @@ GetDistributionPolicyForPartition(CopyState cstate, EState *estate, d->policy = GpPolicyCopy(rel->rd_cdbpolicy); MemoryContextSwitchTo(oldcontext); - - heap_close(rel, NoLock); } return d; diff --git a/src/backend/utils/misc/guc_gp.c b/src/backend/utils/misc/guc_gp.c index 09f7d788d723..c5db4724e192 100644 --- a/src/backend/utils/misc/guc_gp.c +++ b/src/backend/utils/misc/guc_gp.c @@ -132,6 +132,7 @@ bool Debug_appendonly_print_compaction = false; bool Debug_resource_group = false; bool Debug_bitmap_print_insert = false; bool Test_print_direct_dispatch_info = false; +bool Test_copy_qd_qe_split = false; bool gp_permit_relation_node_change = false; int gp_max_local_distributed_cache = 1024; bool gp_appendonly_verify_block_checksums = true; @@ -1545,6 +1546,17 @@ struct config_bool ConfigureNamesBool_gp[] = NULL, NULL, NULL }, + { + {"test_copy_qd_qe_split", PGC_SUSET, DEVELOPER_OPTIONS, + gettext_noop("For testing purposes, print information about which columns are parsed in QD and which in QE."), + NULL, + GUC_SUPERUSER_ONLY | GUC_NO_SHOW_ALL | GUC_NOT_IN_SAMPLE + }, + &Test_copy_qd_qe_split, + false, + NULL, NULL, NULL + }, + { {"debug_bitmap_print_insert", PGC_SUSET, DEVELOPER_OPTIONS, gettext_noop("Print log messages for bitmap index insert routines (caution-- generate a lot of logs!)"), diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index 5e8631b601be..958495195312 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -238,6 +238,10 @@ typedef struct CopyStateData /* Greenplum Database specific variables */ bool escape_off; /* treat backslashes as non-special? */ + int first_qe_processed_field; + List *qd_attnumlist; + List *qe_attnumlist; + bool stopped_processing_at_delim; PartitionNode *partitions; /* partitioning meta data from dispatcher */ List *ao_segnos; /* AO table meta data from dispatcher */ diff --git a/src/include/utils/sync_guc_name.h b/src/include/utils/sync_guc_name.h index ba3619f53fbe..d78cb1d10837 100644 --- a/src/include/utils/sync_guc_name.h +++ b/src/include/utils/sync_guc_name.h @@ -102,6 +102,7 @@ "statement_mem", "statement_timeout", "temp_buffers", + "test_copy_qd_qe_split", "TimeZone", "verify_gpfdists_cert", "vmem_process_interrupt", diff --git a/src/test/regress/expected/gpcopy_dispatch.out b/src/test/regress/expected/gpcopy_dispatch.out new file mode 100644 index 000000000000..5a55c370a37c --- /dev/null +++ b/src/test/regress/expected/gpcopy_dispatch.out @@ -0,0 +1,82 @@ +SET test_copy_qd_qe_split = on; +-- Distributed randomly. QD doesn't need any of the cols. 
+CREATE TABLE disttest (a int, b int, c int) DISTRIBUTED RANDOMLY; +COPY disttest FROM stdin; +INFO: first field processed in the QE: 0 +CONTEXT: COPY disttest, line 0 +DROP TABLE disttest; +CREATE TABLE disttest (a int, b int, c int) DISTRIBUTED BY (b); +COPY disttest FROM stdin; +INFO: first field processed in the QE: 2 +CONTEXT: COPY disttest, line 0 +DROP TABLE disttest; +CREATE TABLE disttest (a int, b int, c int) DISTRIBUTED BY (c); +COPY disttest FROM stdin; +INFO: all fields will be processed in the QD +CONTEXT: COPY disttest, line 0 +DROP TABLE disttest; +CREATE TABLE disttest (a int, b int, c int) DISTRIBUTED BY (c, a); +COPY disttest FROM stdin; +INFO: all fields will be processed in the QD +CONTEXT: COPY disttest, line 0 +DROP TABLE disttest; +-- With column list +CREATE TABLE disttest (a int, b int, c int) DISTRIBUTED BY (c, b); +COPY disttest (c, b, a) FROM stdin; +INFO: first field processed in the QE: 2 +CONTEXT: COPY disttest, line 0 +DROP TABLE disttest; +-- +-- Partitioned scenarios. +-- +-- Distributed randomly, but QD needs the partitioning key. +CREATE TABLE partdisttest (a int, b int, c int) DISTRIBUTED RANDOMLY PARTITION BY RANGE (b) (START (1) END (10) EVERY (5)); +NOTICE: CREATE TABLE will create partition "partdisttest_1_prt_1" for table "partdisttest" +NOTICE: CREATE TABLE will create partition "partdisttest_1_prt_2" for table "partdisttest" +COPY partdisttest FROM stdin; +INFO: first field processed in the QE: 2 +CONTEXT: COPY partdisttest, line 0 +DROP TABLE partdisttest; +-- With a dropped column +CREATE TABLE partdisttest (a int, dropped int, b int, c int) DISTRIBUTED RANDOMLY PARTITION BY RANGE (b) (START (1) END (10) EVERY (5)); +ALTER TABLE partdisttest DROP COLUMN dropped; +COPY partdisttest FROM stdin; +INFO: first field processed in the QE: 2 +CONTEXT: COPY partdisttest, line 0 +DROP TABLE partdisttest; +-- Hash distributed, with a dropped column +CREATE TABLE partdisttest (a int, dropped int, b int, c int) + DISTRIBUTED BY (b) + PARTITION BY RANGE (a) (START (0) END (100) EVERY (50)); +ALTER TABLE partdisttest DROP COLUMN dropped; +ALTER TABLE partdisttest ADD PARTITION neg start (-10) end (0); +COPY partdisttest FROM stdin; +INFO: first field processed in the QE: 2 +CONTEXT: COPY partdisttest, line 0 +DROP TABLE partdisttest; +-- Subpartitions +CREATE TABLE partdisttest (a int, dropped int, b int, c int, d int) + DISTRIBUTED RANDOMLY + PARTITION BY RANGE (b) + SUBPARTITION BY RANGE (c) + ( + PARTITION b_low start (1) + ( + SUBPARTITION c_low start (1), + SUBPARTITION c_hi start (5) + ), + PARTITION b_hi start (5) + ( + SUBPARTITION c_low start (1), + SUBPARTITION c_hi start (5) + ) + ); +ALTER TABLE partdisttest DROP COLUMN dropped; +COPY partdisttest FROM stdin; +INFO: first field processed in the QE: 3 +CONTEXT: COPY partdisttest, line 0 +ALTER TABLE partdisttest ADD PARTITION b_negative start (-10) end (0) (subpartition c_negative start (-10) end (0)); +COPY partdisttest FROM stdin; +INFO: first field processed in the QE: 3 +CONTEXT: COPY partdisttest, line 0 +DROP TABLE partdisttest; diff --git a/src/test/regress/greenplum_schedule b/src/test/regress/greenplum_schedule index d1f3602536ba..1e49f8e791e7 100755 --- a/src/test/regress/greenplum_schedule +++ b/src/test/regress/greenplum_schedule @@ -37,7 +37,7 @@ test: gp_tablespace test: temp_tablespaces test: default_tablespace -test: leastsquares opr_sanity_gp decode_expr bitmapscan bitmapscan_ao case_gp limit_gp notin percentile join_gp union_gp gpcopy_encoding gp_create_table gp_create_view 
window_views create_table_like_gp matview_ao prepare_lockmode +test: leastsquares opr_sanity_gp decode_expr bitmapscan bitmapscan_ao case_gp limit_gp notin percentile join_gp union_gp gpcopy_encoding gp_create_table gp_create_view window_views create_table_like_gp matview_ao prepare_lockmode gpcopy_dispatch # below test(s) inject faults so each of them need to be in a separate group test: gpcopy diff --git a/src/test/regress/output/sreh.source b/src/test/regress/output/sreh.source index 5d5525a054c1..41076e65924f 100755 --- a/src/test/regress/output/sreh.source +++ b/src/test/regress/output/sreh.source @@ -37,8 +37,8 @@ SELECT * FROM sreh_copy ORDER BY a,b,c; -- COPY sreh_copy FROM '@abs_srcdir@/data/bad_data1.data' DELIMITER '|' SEGMENT REJECT LIMIT 2; ERROR: segment reject limit reached, aborting operation -DETAIL: Last error was: invalid input syntax for integer: "", column b -CONTEXT: COPY sreh_copy, line 8, column a +DETAIL: Last error was: invalid input syntax for integer: "eleven", column a +CONTEXT: COPY sreh_copy, line 11, column a: "eleven" SELECT * FROM sreh_copy ORDER BY a,b,c; a | b | c ----+----+---- diff --git a/src/test/regress/sql/gpcopy_dispatch.sql b/src/test/regress/sql/gpcopy_dispatch.sql new file mode 100644 index 000000000000..b8b13263a09e --- /dev/null +++ b/src/test/regress/sql/gpcopy_dispatch.sql @@ -0,0 +1,99 @@ + +SET test_copy_qd_qe_split = on; + +-- Distributed randomly. QD doesn't need any of the cols. +CREATE TABLE disttest (a int, b int, c int) DISTRIBUTED RANDOMLY; +COPY disttest FROM stdin; +1 2 3 +\. +DROP TABLE disttest; + +CREATE TABLE disttest (a int, b int, c int) DISTRIBUTED BY (b); +COPY disttest FROM stdin; +1 2 3 +\. +DROP TABLE disttest; + +CREATE TABLE disttest (a int, b int, c int) DISTRIBUTED BY (c); +COPY disttest FROM stdin; +1 2 3 +\. +DROP TABLE disttest; + +CREATE TABLE disttest (a int, b int, c int) DISTRIBUTED BY (c, a); +COPY disttest FROM stdin; +1 2 3 +\. +DROP TABLE disttest; + +-- With column list +CREATE TABLE disttest (a int, b int, c int) DISTRIBUTED BY (c, b); +COPY disttest (c, b, a) FROM stdin; +3 2 1 +\. +DROP TABLE disttest; + + +-- +-- Partitioned scenarios. +-- + +-- Distributed randomly, but QD needs the partitioning key. +CREATE TABLE partdisttest (a int, b int, c int) DISTRIBUTED RANDOMLY PARTITION BY RANGE (b) (START (1) END (10) EVERY (5)); +COPY partdisttest FROM stdin; +1 2 3 +\. +DROP TABLE partdisttest; + +-- With a dropped column +CREATE TABLE partdisttest (a int, dropped int, b int, c int) DISTRIBUTED RANDOMLY PARTITION BY RANGE (b) (START (1) END (10) EVERY (5)); +ALTER TABLE partdisttest DROP COLUMN dropped; +COPY partdisttest FROM stdin; +1 2 3 +\. +DROP TABLE partdisttest; + +-- Hash distributed, with a dropped column +CREATE TABLE partdisttest (a int, dropped int, b int, c int) + DISTRIBUTED BY (b) + PARTITION BY RANGE (a) (START (0) END (100) EVERY (50)); +ALTER TABLE partdisttest DROP COLUMN dropped; + +ALTER TABLE partdisttest ADD PARTITION neg start (-10) end (0); + +COPY partdisttest FROM stdin; +-1 2 3 +\. +DROP TABLE partdisttest; + + + +-- Subpartitions +CREATE TABLE partdisttest (a int, dropped int, b int, c int, d int) + DISTRIBUTED RANDOMLY + PARTITION BY RANGE (b) + SUBPARTITION BY RANGE (c) + ( + PARTITION b_low start (1) + ( + SUBPARTITION c_low start (1), + SUBPARTITION c_hi start (5) + ), + PARTITION b_hi start (5) + ( + SUBPARTITION c_low start (1), + SUBPARTITION c_hi start (5) + ) + ); +ALTER TABLE partdisttest DROP COLUMN dropped; +COPY partdisttest FROM stdin; +1 2 3 4 +\. 
+ +ALTER TABLE partdisttest ADD PARTITION b_negative start (-10) end (0) (subpartition c_negative start (-10) end (0)); + +COPY partdisttest FROM stdin; +100 -1 -1 1 +\. + +DROP TABLE partdisttest; From 0490cebea221509f71e1643fd57aa27b360eb3c1 Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Wed, 18 Dec 2019 10:53:47 -0800 Subject: [PATCH 020/102] COPY: Simplify partition distribution policy lookup Commit a8aa1c4acc95ea42 introduced a bug when a partition has a different distribution policy (in terms of column numbers) from the base table: we would perform a partition lookup -- using a tuple table slot for the partition but (incorrectly) with a tuple descriptor for the base table. This would sometimes lead to an error of "no partition for partitioning key". Upon closer inspection, we didn't even need to look up the partition because the caller already knew! This commit fixes that, and simplifies the logic in GetDistributionPolicyForPartition to just take a resultRelInfo instead of the tuple values. Co-authored-by: Jesse Zhang Co-authored-by: Ashwin Agrawal Reviewed-by: Heikki Linnakangas --- src/backend/commands/copy.c | 44 ++++++++----------- src/test/regress/expected/gpcopy_dispatch.out | 22 ++++++++-- src/test/regress/sql/gpcopy_dispatch.sql | 23 +++++++--- 3 files changed, 54 insertions(+), 35 deletions(-) diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 59aa5e93b922..2a1727a30203 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -226,11 +226,9 @@ static GpDistributionData *InitDistributionData(CopyState cstate, EState *estate static void FreeDistributionData(GpDistributionData *distData); static void InitCopyFromDispatchSplit(CopyState cstate, GpDistributionData *distData, EState *estate); static Bitmapset *GetTargetKeyCols(Oid relid, PartitionNode *children, Bitmapset *needed_cols, bool distkeys, EState *estate); -static GpDistributionData *GetDistributionPolicyForPartition(CopyState cstate, - EState *estate, - GpDistributionData *mainDistData, - TupleDesc tupDesc, - Datum *values, bool *nulls); +static GpDistributionData *GetDistributionPolicyForPartition(GpDistributionData *mainDistData, + ResultRelInfo *resultRelInfo, + MemoryContext context); static unsigned int GetTargetSeg(GpDistributionData *distData, Datum *baseValues, bool *baseNulls); static ProgramPipes *open_program_pipes(char *command, bool forwrite); @@ -3950,21 +3948,22 @@ CopyFrom(CopyState cstate) { /* In QD, compute the target segment to send this row to. */ part_distData = GetDistributionPolicyForPartition( - cstate, estate, - distData, - tupDesc, - slot_get_values(slot), slot_get_isnull(slot)); + distData, + resultRelInfo, + cstate->copycontext); target_seg = GetTargetSeg(part_distData, slot_get_values(slot), slot_get_isnull(slot)); } else if (is_check_distkey) { - /* In COPY FROM ON SEGMENT, check the distribution key in the QE. */ + /* + * In COPY FROM ON SEGMENT, check the distribution key in the + * QE. 
+ */ part_distData = GetDistributionPolicyForPartition( - cstate, estate, - distData, - tupDesc, - slot_get_values(slot), slot_get_isnull(slot)); + distData, + resultRelInfo, + cstate->copycontext); if (part_distData->policy->nattrs != 0) { @@ -7721,10 +7720,9 @@ GetTargetKeyCols(Oid relid, PartitionNode *children, Bitmapset *needed_cols, /* Get distribution policy for specific part */ static GpDistributionData * -GetDistributionPolicyForPartition(CopyState cstate, EState *estate, - GpDistributionData *mainDistData, - TupleDesc tupDesc, - Datum *values, bool *nulls) +GetDistributionPolicyForPartition(GpDistributionData *mainDistData, + ResultRelInfo *resultRelInfo, + MemoryContext context) { /* @@ -7735,27 +7733,21 @@ GetDistributionPolicyForPartition(CopyState cstate, EState *estate, if (mainDistData->hashmap) { Oid relid; - ResultRelInfo *resultRelInfo; GpDistributionData *d; bool found; - resultRelInfo = values_get_partition(values, - nulls, - tupDesc, - estate, - false); /* don't need indices in QD */ relid = resultRelInfo->ri_RelationDesc->rd_id; d = hash_search(mainDistData->hashmap, &(relid), HASH_ENTER, &found); if (!found) { - Relation rel = resultRelInfo->ri_RelationDesc; + Relation rel = resultRelInfo->ri_RelationDesc; MemoryContext oldcontext; /* * Make sure this all persists the current iteration. */ - oldcontext = MemoryContextSwitchTo(cstate->copycontext); + oldcontext = MemoryContextSwitchTo(context); d->cdbHash = makeCdbHashForRelation(rel); d->policy = GpPolicyCopy(rel->rd_cdbpolicy); diff --git a/src/test/regress/expected/gpcopy_dispatch.out b/src/test/regress/expected/gpcopy_dispatch.out index 5a55c370a37c..1d2a7607a9e6 100644 --- a/src/test/regress/expected/gpcopy_dispatch.out +++ b/src/test/regress/expected/gpcopy_dispatch.out @@ -45,14 +45,28 @@ INFO: first field processed in the QE: 2 CONTEXT: COPY partdisttest, line 0 DROP TABLE partdisttest; -- Hash distributed, with a dropped column -CREATE TABLE partdisttest (a int, dropped int, b int, c int) - DISTRIBUTED BY (b) - PARTITION BY RANGE (a) (START (0) END (100) EVERY (50)); -ALTER TABLE partdisttest DROP COLUMN dropped; +-- We used to have a bug where QD would pick the wrong partition and/or the +-- wrong segment due to difference between the base table and a partition: the +-- partitioing attribute(s) for the root table is column 3, but it is column 1 +-- in the leaf partition "neg". The QD would then mistakenly pick a partition +-- for the NULL value, and error out that no such a partition exists. +-- Note if the dropped columns are in a different position, a different (but +-- really similar) symptom will appear: the QD will pick another partition, +-- which potentially results in the wrong segment receiving the line / tuple. 
+CREATE TABLE partdisttest (dropped1 int, dropped2 int, a int, b int, c int) + DISTRIBUTED BY (a) + PARTITION BY RANGE (b) (START (0) END (100) EVERY (50)); +ALTER TABLE partdisttest DROP COLUMN dropped1, DROP COLUMN dropped2; ALTER TABLE partdisttest ADD PARTITION neg start (-10) end (0); COPY partdisttest FROM stdin; INFO: first field processed in the QE: 2 CONTEXT: COPY partdisttest, line 0 +SELECT tableoid::regclass, * FROM partdisttest; + tableoid | a | b | c +------------------------+---+----+--- + partdisttest_1_prt_neg | 2 | -1 | 3 +(1 row) + DROP TABLE partdisttest; -- Subpartitions CREATE TABLE partdisttest (a int, dropped int, b int, c int, d int) diff --git a/src/test/regress/sql/gpcopy_dispatch.sql b/src/test/regress/sql/gpcopy_dispatch.sql index b8b13263a09e..a4323c0d36e1 100644 --- a/src/test/regress/sql/gpcopy_dispatch.sql +++ b/src/test/regress/sql/gpcopy_dispatch.sql @@ -54,16 +54,29 @@ COPY partdisttest FROM stdin; DROP TABLE partdisttest; -- Hash distributed, with a dropped column -CREATE TABLE partdisttest (a int, dropped int, b int, c int) - DISTRIBUTED BY (b) - PARTITION BY RANGE (a) (START (0) END (100) EVERY (50)); -ALTER TABLE partdisttest DROP COLUMN dropped; + +-- We used to have a bug where QD would pick the wrong partition and/or the +-- wrong segment due to difference between the base table and a partition: the +-- partitioing attribute(s) for the root table is column 3, but it is column 1 +-- in the leaf partition "neg". The QD would then mistakenly pick a partition +-- for the NULL value, and error out that no such a partition exists. + +-- Note if the dropped columns are in a different position, a different (but +-- really similar) symptom will appear: the QD will pick another partition, +-- which potentially results in the wrong segment receiving the line / tuple. + +CREATE TABLE partdisttest (dropped1 int, dropped2 int, a int, b int, c int) + DISTRIBUTED BY (a) + PARTITION BY RANGE (b) (START (0) END (100) EVERY (50)); +ALTER TABLE partdisttest DROP COLUMN dropped1, DROP COLUMN dropped2; ALTER TABLE partdisttest ADD PARTITION neg start (-10) end (0); COPY partdisttest FROM stdin; --1 2 3 +2 -1 3 \. + +SELECT tableoid::regclass, * FROM partdisttest; DROP TABLE partdisttest; From 6a9371d34761fd107e8d4b158a44ad13f6b92a79 Mon Sep 17 00:00:00 2001 From: Melanie Plageman Date: Wed, 18 Dec 2019 14:45:59 -0800 Subject: [PATCH 021/102] Allocate tuple slot in the per query context Commit a8aa1c4acc95ea42 introduced a subtle bug: when a partition has a different distribution column number than the base table, we'd lazily construct a tuple table slot for that partition -- but in a (mistakenly) short-lived memory context. This is exposed when COPY handles more than MAX_BUFFERED_TUPLES lines, resetting the per-tuple memory context in between buffers. Resolves #9170 GitHub issue. 
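To make the failure mode concrete, here is a minimal SQL sketch of the scenario, modelled on the regression test added below; the table name, file path, and row values are illustrative only, not part of the fix itself.

```sql
-- The root table carries a dropped column, so a partition added after the
-- DROP stores the distribution key "a" at a different attribute number.
CREATE TABLE bufdemo (dropped bool, a smallint, b smallint)
  DISTRIBUTED BY (a)
  PARTITION BY RANGE(a) (PARTITION segundo START(5));
ALTER TABLE bufdemo DROP dropped;
ALTER TABLE bufdemo ADD PARTITION primero START(0) END(5);

-- Loading more than MAX_BUFFERED_TUPLES (10000) rows makes COPY reset the
-- short-lived context between buffers; before this fix the lazily built
-- slot for the partition was freed along with it and the segment crashed.
COPY (SELECT 2, 1 FROM generate_series(1, 10001)) TO '/tmp/bufdemo.txt';
COPY bufdemo FROM '/tmp/bufdemo.txt';
```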
Co-authored-by: Jesse Zhang Co-authored-by: Ashwin Agrawal Reviewed-by: Heikki Linnakangas --- src/backend/commands/copy.c | 3 +- src/test/regress/expected/gpcopy_dispatch.out | 30 +++++++++++++++++++ src/test/regress/sql/gpcopy_dispatch.sql | 25 ++++++++++++++++ 3 files changed, 56 insertions(+), 2 deletions(-) diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 2a1727a30203..51f52af3d869 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -5503,11 +5503,10 @@ NextCopyFromExecute(CopyState cstate, ExprContext *econtext, resultRelInfo = targetid_get_partition(frame.relid, estate, true); estate->es_result_relation_info = resultRelInfo; + slot = reconstructMatchingTupleSlot(baseSlot, resultRelInfo); MemoryContextSwitchTo(oldcontext); - slot = reconstructMatchingTupleSlot(baseSlot, resultRelInfo); - /* since resultRelInfo has changed, refresh these values */ tupDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc); attr = tupDesc->attrs; diff --git a/src/test/regress/expected/gpcopy_dispatch.out b/src/test/regress/expected/gpcopy_dispatch.out index 1d2a7607a9e6..6ac29ca21235 100644 --- a/src/test/regress/expected/gpcopy_dispatch.out +++ b/src/test/regress/expected/gpcopy_dispatch.out @@ -94,3 +94,33 @@ COPY partdisttest FROM stdin; INFO: first field processed in the QE: 3 CONTEXT: COPY partdisttest, line 0 DROP TABLE partdisttest; +CREATE TABLE partdisttest (dropped bool, a smallint, b smallint) + DISTRIBUTED BY (a) + PARTITION BY RANGE(a) + (PARTITION segundo START(5)); +NOTICE: CREATE TABLE will create partition "partdisttest_1_prt_segundo" for table "partdisttest" +ALTER TABLE partdisttest DROP dropped; +ALTER TABLE partdisttest ADD PARTITION primero START(0) END(5); +NOTICE: CREATE TABLE will create partition "partdisttest_1_prt_primero" for table "partdisttest" +-- We used to have bug, when a partition has a different distribution +-- column number than the base table, we'd lazily construct a tuple +-- table slot for that partition, but in a (mistakenly) short-lived +-- per query memory context. COPY FROM uses batch size of +-- MAX_BUFFERED_TUPLES tuples, and after that it resets the per query +-- context. As a result while processing second batch, segment used to +-- crash. This test exposed the bug and now validates the fix. +COPY ( + SELECT 2,1 + FROM ( + SELECT generate_series(1, MAX_BUFFERED_TUPLES + 1) + FROM (VALUES (10000)) t(MAX_BUFFERED_TUPLES) + ) t + ) TO '/tmp/ten-thousand-and-one-lines.txt'; +COPY partdisttest FROM '/tmp/ten-thousand-and-one-lines.txt'; +INFO: first field processed in the QE: 1 +SELECT tableoid::regclass, count(*) FROM partdisttest GROUP BY 1; + tableoid | count +----------------------------+------- + partdisttest_1_prt_primero | 10001 +(1 row) + diff --git a/src/test/regress/sql/gpcopy_dispatch.sql b/src/test/regress/sql/gpcopy_dispatch.sql index a4323c0d36e1..8ff20fb14bf7 100644 --- a/src/test/regress/sql/gpcopy_dispatch.sql +++ b/src/test/regress/sql/gpcopy_dispatch.sql @@ -110,3 +110,28 @@ COPY partdisttest FROM stdin; \. 
DROP TABLE partdisttest; + +CREATE TABLE partdisttest (dropped bool, a smallint, b smallint) + DISTRIBUTED BY (a) + PARTITION BY RANGE(a) + (PARTITION segundo START(5)); +ALTER TABLE partdisttest DROP dropped; +ALTER TABLE partdisttest ADD PARTITION primero START(0) END(5); + +-- We used to have bug, when a partition has a different distribution +-- column number than the base table, we'd lazily construct a tuple +-- table slot for that partition, but in a (mistakenly) short-lived +-- per query memory context. COPY FROM uses batch size of +-- MAX_BUFFERED_TUPLES tuples, and after that it resets the per query +-- context. As a result while processing second batch, segment used to +-- crash. This test exposed the bug and now validates the fix. +COPY ( + SELECT 2,1 + FROM ( + SELECT generate_series(1, MAX_BUFFERED_TUPLES + 1) + FROM (VALUES (10000)) t(MAX_BUFFERED_TUPLES) + ) t + ) TO '/tmp/ten-thousand-and-one-lines.txt'; +COPY partdisttest FROM '/tmp/ten-thousand-and-one-lines.txt'; + +SELECT tableoid::regclass, count(*) FROM partdisttest GROUP BY 1; From 08fa167f3e992612c5e5f089520c40ccff2b9f19 Mon Sep 17 00:00:00 2001 From: Ashwin Agrawal Date: Fri, 20 Dec 2019 15:27:28 -0800 Subject: [PATCH 022/102] Simplify a query in gpcopy_dispatch test Currently, ORCA is producing wrong result for the query used ``` SELECT 2,1 FROM ( SELECT generate_series(1, MAX_BUFFERED_TUPLES + 1) FROM (VALUES (5)) t(MAX_BUFFERED_TUPLES) ) t ; ?column? | ?column? | generate_series ----------+----------+----------------- 2 | 1 | 1 2 | 1 | 2 2 | 1 | 3 2 | 1 | 4 2 | 1 | 5 2 | 1 | 6 (6 rows) ``` Hence, avoid using that query and instead use simpler version to pass the test for ORCA enabled builds. --- src/test/regress/expected/gpcopy_dispatch.out | 5 +---- src/test/regress/sql/gpcopy_dispatch.sql | 5 +---- 2 files changed, 2 insertions(+), 8 deletions(-) diff --git a/src/test/regress/expected/gpcopy_dispatch.out b/src/test/regress/expected/gpcopy_dispatch.out index 6ac29ca21235..ca80fc2e52cd 100644 --- a/src/test/regress/expected/gpcopy_dispatch.out +++ b/src/test/regress/expected/gpcopy_dispatch.out @@ -111,10 +111,7 @@ NOTICE: CREATE TABLE will create partition "partdisttest_1_prt_primero" for tab -- crash. This test exposed the bug and now validates the fix. COPY ( SELECT 2,1 - FROM ( - SELECT generate_series(1, MAX_BUFFERED_TUPLES + 1) - FROM (VALUES (10000)) t(MAX_BUFFERED_TUPLES) - ) t + FROM generate_series(1, 10001) ) TO '/tmp/ten-thousand-and-one-lines.txt'; COPY partdisttest FROM '/tmp/ten-thousand-and-one-lines.txt'; INFO: first field processed in the QE: 1 diff --git a/src/test/regress/sql/gpcopy_dispatch.sql b/src/test/regress/sql/gpcopy_dispatch.sql index 8ff20fb14bf7..e7eca7ee7f89 100644 --- a/src/test/regress/sql/gpcopy_dispatch.sql +++ b/src/test/regress/sql/gpcopy_dispatch.sql @@ -127,10 +127,7 @@ ALTER TABLE partdisttest ADD PARTITION primero START(0) END(5); -- crash. This test exposed the bug and now validates the fix. COPY ( SELECT 2,1 - FROM ( - SELECT generate_series(1, MAX_BUFFERED_TUPLES + 1) - FROM (VALUES (10000)) t(MAX_BUFFERED_TUPLES) - ) t + FROM generate_series(1, 10001) ) TO '/tmp/ten-thousand-and-one-lines.txt'; COPY partdisttest FROM '/tmp/ten-thousand-and-one-lines.txt'; From e5d2ad402313e025bb7d2db2602ea5e6a7333c78 Mon Sep 17 00:00:00 2001 From: Ashuka Xue Date: Thu, 6 Feb 2020 14:39:45 -0800 Subject: [PATCH 023/102] Bump ORCA version to v3.92.0 This commit adds a new optimizer cost model value to use for experimental features and developer testing. 
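As an illustration only, a session can opt in and inspect the resulting plan as sketched below, using the same bfv_tab2 join that the regression tests added in this commit run under the new GUC value.

```sql
-- Illustrative only: switch the cost model for the current session,
-- check the plan, then return to the default.
SET optimizer_cost_model = experimental;
EXPLAIN SELECT count(*)
FROM bfv_tab2_facttable1 ft, bfv_tab2_dimdate dt, bfv_tab2_dimtabl1 dt1
WHERE ft.wk_id = dt.wk_id
AND ft.id = dt1.id;
RESET optimizer_cost_model;
```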
Setting `optimizer_cost_model=experimental` will use the new costing formula. Currently it is only used for a bitmap costing change. Co-authored-by: Chris Hajas Co-authored-by: Ashuka Xue --- concourse/tasks/compile_gpdb.yml | 2 +- config/orca.m4 | 4 +- configure | 4 +- depends/conanfile_orca.txt | 2 +- .../gpopt/config/CConfigParamMapping.cpp | 5 ++ src/backend/gpopt/utils/COptTasks.cpp | 6 +-- src/backend/utils/misc/guc_gp.c | 3 +- src/include/utils/guc.h | 2 + src/test/regress/expected/bfv_index.out | 47 +++++++++++++++++++ .../regress/expected/bfv_index_optimizer.out | 31 ++++++++++++ src/test/regress/sql/bfv_index.sql | 8 ++++ 11 files changed, 104 insertions(+), 10 deletions(-) diff --git a/concourse/tasks/compile_gpdb.yml b/concourse/tasks/compile_gpdb.yml index ce0820051db9..a3eeddbb9e0c 100644 --- a/concourse/tasks/compile_gpdb.yml +++ b/concourse/tasks/compile_gpdb.yml @@ -19,5 +19,5 @@ params: BLD_TARGETS: OUTPUT_ARTIFACT_DIR: gpdb_artifacts CONFIGURE_FLAGS: - ORCA_TAG: v3.91.0 + ORCA_TAG: v3.92.0 RC_BUILD_TYPE_GCS: diff --git a/config/orca.m4 b/config/orca.m4 index 3db91ead13f5..041b65f4c752 100644 --- a/config/orca.m4 +++ b/config/orca.m4 @@ -40,10 +40,10 @@ AC_RUN_IFELSE([AC_LANG_PROGRAM([[ #include ]], [ -return strncmp("3.91.", GPORCA_VERSION_STRING, 5); +return strncmp("3.92.", GPORCA_VERSION_STRING, 5); ])], [AC_MSG_RESULT([[ok]])], -[AC_MSG_ERROR([Your ORCA version is expected to be 3.91.XXX])] +[AC_MSG_ERROR([Your ORCA version is expected to be 3.92.XXX])] ) AC_LANG_POP([C++]) ])# PGAC_CHECK_ORCA_VERSION diff --git a/configure b/configure index 1e36a0f11c31..0975b877be31 100755 --- a/configure +++ b/configure @@ -14948,7 +14948,7 @@ int main () { -return strncmp("3.91.", GPORCA_VERSION_STRING, 5); +return strncmp("3.92.", GPORCA_VERSION_STRING, 5); ; return 0; @@ -14958,7 +14958,7 @@ if ac_fn_cxx_try_run "$LINENO"; then : { $as_echo "$as_me:${as_lineno-$LINENO}: result: ok" >&5 $as_echo "ok" >&6; } else - as_fn_error $? "Your ORCA version is expected to be 3.91.XXX" "$LINENO" 5 + as_fn_error $? 
"Your ORCA version is expected to be 3.92.XXX" "$LINENO" 5 fi rm -f core *.core core.conftest.* gmon.out bb.out conftest$ac_exeext \ diff --git a/depends/conanfile_orca.txt b/depends/conanfile_orca.txt index 2e73d47f0c89..1fe3e26e6211 100644 --- a/depends/conanfile_orca.txt +++ b/depends/conanfile_orca.txt @@ -1,5 +1,5 @@ [requires] -orca/v3.91.0@gpdb/stable +orca/v3.92.0@gpdb/stable [imports] include, * -> build/include diff --git a/src/backend/gpopt/config/CConfigParamMapping.cpp b/src/backend/gpopt/config/CConfigParamMapping.cpp index d2eda7ce5062..3203f4e5b687 100644 --- a/src/backend/gpopt/config/CConfigParamMapping.cpp +++ b/src/backend/gpopt/config/CConfigParamMapping.cpp @@ -570,6 +570,11 @@ CConfigParamMapping::PackConfigParamInBitset traceflag_bitset->ExchangeSet(GPOPT_DISABLE_XFORM_TF(CXform::ExfJoinAssociativity)); } + if (OPTIMIZER_GPDB_EXPERIMENTAL == optimizer_cost_model) + { + traceflag_bitset->ExchangeSet(EopttraceCalibratedBitmapIndexCostModel); + } + // enable nested loop index plans using nest params // instead of outer reference as in the case with GPDB 4/5 traceflag_bitset->ExchangeSet(EopttraceIndexedNLJOuterRefAsParams); diff --git a/src/backend/gpopt/utils/COptTasks.cpp b/src/backend/gpopt/utils/COptTasks.cpp index 0f77e5dd0050..29766dfe0841 100644 --- a/src/backend/gpopt/utils/COptTasks.cpp +++ b/src/backend/gpopt/utils/COptTasks.cpp @@ -468,7 +468,7 @@ COptTasks::SetCostModelParams { // change NLJ cost factor ICostModelParams::SCostParam *cost_param = NULL; - if (OPTIMIZER_GPDB_CALIBRATED == optimizer_cost_model) + if (OPTIMIZER_GPDB_CALIBRATED >= optimizer_cost_model) { cost_param = cost_model->GetCostModelParams()->PcpLookup(CCostModelParamsGPDB::EcpNLJFactor); } @@ -484,7 +484,7 @@ COptTasks::SetCostModelParams { // change sort cost factor ICostModelParams::SCostParam *cost_param = NULL; - if (OPTIMIZER_GPDB_CALIBRATED == optimizer_cost_model) + if (OPTIMIZER_GPDB_CALIBRATED >= optimizer_cost_model) { cost_param = cost_model->GetCostModelParams()->PcpLookup(CCostModelParamsGPDB::EcpSortTupWidthCostUnit); @@ -511,7 +511,7 @@ COptTasks::GetCostModel ) { ICostModel *cost_model = NULL; - if (OPTIMIZER_GPDB_CALIBRATED == optimizer_cost_model) + if (OPTIMIZER_GPDB_CALIBRATED >= optimizer_cost_model) { cost_model = GPOS_NEW(mp) CCostModelGPDB(mp, num_segments); } diff --git a/src/backend/utils/misc/guc_gp.c b/src/backend/utils/misc/guc_gp.c index c5db4724e192..8529a074e2b8 100644 --- a/src/backend/utils/misc/guc_gp.c +++ b/src/backend/utils/misc/guc_gp.c @@ -489,6 +489,7 @@ static const struct config_enum_entry optimizer_minidump_options[] = { static const struct config_enum_entry optimizer_cost_model_options[] = { {"legacy", OPTIMIZER_GPDB_LEGACY}, {"calibrated", OPTIMIZER_GPDB_CALIBRATED}, + {"experimental", OPTIMIZER_GPDB_EXPERIMENTAL}, {NULL, 0} }; @@ -4608,7 +4609,7 @@ struct config_enum ConfigureNamesEnum_gp[] = { {"optimizer_cost_model", PGC_USERSET, DEVELOPER_OPTIONS, gettext_noop("Set optimizer cost model."), - gettext_noop("Valid values are legacy, calibrated"), + gettext_noop("Valid values are legacy, calibrated, experimental"), GUC_NO_SHOW_ALL | GUC_NOT_IN_SAMPLE }, &optimizer_cost_model, diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h index 89f6021bd5a5..dccf714e5eb7 100644 --- a/src/include/utils/guc.h +++ b/src/include/utils/guc.h @@ -414,6 +414,8 @@ extern char *data_directory; /* optimizer cost model */ #define OPTIMIZER_GPDB_LEGACY 0 /* GPDB's legacy cost model */ #define OPTIMIZER_GPDB_CALIBRATED 1 /* GPDB's calibrated cost model */ 
+#define OPTIMIZER_GPDB_EXPERIMENTAL 2 /* GPDB's experimental cost model */ + /* Optimizer related gucs */ extern bool optimizer; diff --git a/src/test/regress/expected/bfv_index.out b/src/test/regress/expected/bfv_index.out index 4c0d6f38e129..cef29d5d70bf 100644 --- a/src/test/regress/expected/bfv_index.out +++ b/src/test/regress/expected/bfv_index.out @@ -187,6 +187,53 @@ AND ft.id = dt1.id; Optimizer status: Postgres query optimizer (36 rows) +-- experimental cost model guc generates bitmap scan +set optimizer_cost_model=experimental; +explain SELECT count(*) +FROM bfv_tab2_facttable1 ft, bfv_tab2_dimdate dt, bfv_tab2_dimtabl1 dt1 +WHERE ft.wk_id = dt.wk_id +AND ft.id = dt1.id; + QUERY PLAN +---------------------------------------------------------------------------------------------------------------------------- + Aggregate (cost=27.51..27.52 rows=1 width=8) + -> Gather Motion 3:1 (slice3; segments: 3) (cost=27.45..27.50 rows=1 width=8) + -> Aggregate (cost=27.45..27.46 rows=1 width=8) + -> Hash Join (cost=6.84..27.44 rows=2 width=0) + Hash Cond: (ft.wk_id = dt.wk_id) + -> Redistribute Motion 3:3 (slice2; segments: 3) (cost=3.61..24.15 rows=3 width=2) + Hash Key: ft.wk_id + -> Hash Join (cost=3.61..24.01 rows=3 width=2) + Hash Cond: (ft.id = dt1.id) + -> Append (cost=0.00..20.20 rows=7 width=6) + -> Seq Scan on bfv_tab2_facttable1_1_prt_dflt ft (cost=0.00..1.00 rows=1 width=6) + -> Seq Scan on bfv_tab2_facttable1_1_prt_2 ft_1 (cost=0.00..1.01 rows=1 width=6) + -> Seq Scan on bfv_tab2_facttable1_1_prt_3 ft_2 (cost=0.00..1.01 rows=1 width=6) + -> Seq Scan on bfv_tab2_facttable1_1_prt_4 ft_3 (cost=0.00..1.01 rows=1 width=6) + -> Seq Scan on bfv_tab2_facttable1_1_prt_5 ft_4 (cost=0.00..1.01 rows=1 width=6) + -> Seq Scan on bfv_tab2_facttable1_1_prt_6 ft_5 (cost=0.00..1.01 rows=1 width=6) + -> Seq Scan on bfv_tab2_facttable1_1_prt_7 ft_6 (cost=0.00..1.01 rows=1 width=6) + -> Seq Scan on bfv_tab2_facttable1_1_prt_8 ft_7 (cost=0.00..1.01 rows=1 width=6) + -> Seq Scan on bfv_tab2_facttable1_1_prt_9 ft_8 (cost=0.00..1.01 rows=1 width=6) + -> Seq Scan on bfv_tab2_facttable1_1_prt_10 ft_9 (cost=0.00..1.01 rows=1 width=6) + -> Seq Scan on bfv_tab2_facttable1_1_prt_11 ft_10 (cost=0.00..1.01 rows=1 width=6) + -> Seq Scan on bfv_tab2_facttable1_1_prt_12 ft_11 (cost=0.00..1.01 rows=1 width=6) + -> Seq Scan on bfv_tab2_facttable1_1_prt_13 ft_12 (cost=0.00..1.01 rows=1 width=6) + -> Seq Scan on bfv_tab2_facttable1_1_prt_14 ft_13 (cost=0.00..1.01 rows=1 width=6) + -> Seq Scan on bfv_tab2_facttable1_1_prt_15 ft_14 (cost=0.00..1.01 rows=1 width=6) + -> Seq Scan on bfv_tab2_facttable1_1_prt_16 ft_15 (cost=0.00..1.01 rows=1 width=6) + -> Seq Scan on bfv_tab2_facttable1_1_prt_17 ft_16 (cost=0.00..1.01 rows=1 width=6) + -> Seq Scan on bfv_tab2_facttable1_1_prt_18 ft_17 (cost=0.00..1.01 rows=1 width=6) + -> Seq Scan on bfv_tab2_facttable1_1_prt_19 ft_18 (cost=0.00..1.01 rows=1 width=6) + -> Seq Scan on bfv_tab2_facttable1_1_prt_20 ft_19 (cost=0.00..1.02 rows=1 width=6) + -> Hash (cost=3.35..3.35 rows=7 width=4) + -> Broadcast Motion 3:3 (slice1; segments: 3) (cost=0.00..3.35 rows=7 width=4) + -> Seq Scan on bfv_tab2_dimtabl1 dt1 (cost=0.00..3.07 rows=3 width=4) + -> Hash (cost=3.10..3.10 rows=4 width=2) + -> Seq Scan on bfv_tab2_dimdate dt (cost=0.00..3.10 rows=4 width=2) + Optimizer: Postgres query optimizer +(36 rows) + +reset optimizer_cost_model; -- start_ignore create language plpythonu; ERROR: language "plpythonu" already exists diff --git a/src/test/regress/expected/bfv_index_optimizer.out 
b/src/test/regress/expected/bfv_index_optimizer.out index 200870d8c134..744c7cd4238a 100644 --- a/src/test/regress/expected/bfv_index_optimizer.out +++ b/src/test/regress/expected/bfv_index_optimizer.out @@ -171,6 +171,37 @@ AND ft.id = dt1.id; Optimizer: Pivotal Optimizer (GPORCA) version 3.64.0 (17 rows) +-- experimental cost model guc generates bitmap scan +set optimizer_cost_model=experimental; +explain SELECT count(*) +FROM bfv_tab2_facttable1 ft, bfv_tab2_dimdate dt, bfv_tab2_dimtabl1 dt1 +WHERE ft.wk_id = dt.wk_id +AND ft.id = dt1.id; + QUERY PLAN +-------------------------------------------------------------------------------------------------------------------------------------------------- + Aggregate (cost=0.00..7.15 rows=1 width=8) + -> Gather Motion 3:1 (slice3; segments: 3) (cost=0.00..6.13 rows=4 width=1) + -> Hash Join (cost=0.00..5.13 rows=2 width=1) + Hash Cond: (bfv_tab2_dimdate.wk_id = bfv_tab2_facttable1.wk_id) + -> Seq Scan on bfv_tab2_dimdate (cost=0.00..0.01 rows=4 width=2) + -> Hash (cost=4.06..4.06 rows=3 width=2) + -> Redistribute Motion 3:3 (slice2; segments: 3) (cost=0.00..4.06 rows=3 width=2) + Hash Key: bfv_tab2_facttable1.wk_id + -> Nested Loop (cost=0.00..3.06 rows=3 width=2) + Join Filter: true + -> Broadcast Motion 3:3 (slice1; segments: 3) (cost=0.00..1.04 rows=7 width=4) + -> Seq Scan on bfv_tab2_dimtabl1 (cost=0.00..0.01 rows=3 width=4) + -> Sequence (cost=0.00..0.00 rows=1 width=2) + -> Partition Selector for bfv_tab2_facttable1 (dynamic scan id: 1) (cost=10.00..100.00 rows=34 width=4) + Partitions selected: 20 (out of 20) + -> Dynamic Bitmap Heap Scan on bfv_tab2_facttable1 (dynamic scan id: 1) (cost=0.00..0.00 rows=1 width=2) + Recheck Cond: (id = bfv_tab2_dimtabl1.id) + -> Dynamic Bitmap Index Scan on idx_bfv_tab2_facttable1 (cost=0.00..0.00 rows=0 width=0) + Index Cond: (id = bfv_tab2_dimtabl1.id) + Optimizer: Pivotal Optimizer (GPORCA) version 3.90.0 +(20 rows) + +reset optimizer_cost_model; -- start_ignore create language plpythonu; -- end_ignore diff --git a/src/test/regress/sql/bfv_index.sql b/src/test/regress/sql/bfv_index.sql index b1bebe835de0..1a17341bdbb7 100644 --- a/src/test/regress/sql/bfv_index.sql +++ b/src/test/regress/sql/bfv_index.sql @@ -88,6 +88,14 @@ FROM bfv_tab2_facttable1 ft, bfv_tab2_dimdate dt, bfv_tab2_dimtabl1 dt1 WHERE ft.wk_id = dt.wk_id AND ft.id = dt1.id; +-- experimental cost model guc generates bitmap scan +set optimizer_cost_model=experimental; +explain SELECT count(*) +FROM bfv_tab2_facttable1 ft, bfv_tab2_dimdate dt, bfv_tab2_dimtabl1 dt1 +WHERE ft.wk_id = dt.wk_id +AND ft.id = dt1.id; + +reset optimizer_cost_model; -- start_ignore create language plpythonu; -- end_ignore From 6fc901387ff8a61e1dbf2644132ee451a1473d58 Mon Sep 17 00:00:00 2001 From: Ashwin Agrawal Date: Fri, 14 Feb 2020 19:24:44 -0800 Subject: [PATCH 024/102] Drop partdisttest table as can't upgrade heterogeneous partitions --- contrib/pg_upgrade/test_gpdb_pre.sql | 1 + 1 file changed, 1 insertion(+) diff --git a/contrib/pg_upgrade/test_gpdb_pre.sql b/contrib/pg_upgrade/test_gpdb_pre.sql index abba850ebbea..1b8cb32665d0 100644 --- a/contrib/pg_upgrade/test_gpdb_pre.sql +++ b/contrib/pg_upgrade/test_gpdb_pre.sql @@ -48,3 +48,4 @@ DROP TABLE IF EXISTS public.returning_parttab; DROP TABLE IF EXISTS public.parttest_t; DROP TABLE IF EXISTS public.pt_dropped_col_distkey; DROP TABLE IF EXISTS partition_pruning.sales; +DROP TABLE IF EXISTS public.partdisttest; From bc4d33a2b516741bf0bc25155d46f1ebfb0dd763 Mon Sep 17 00:00:00 2001 From: Weinan WANG Date: 
Mon, 17 Feb 2020 04:53:10 +0800 Subject: [PATCH 025/102] [6X backport] Revert create pathkey in `convert_subquery_pathkeys` In upstream, it does not create a new pathkey in convert_subquery_pathkeys function. It also raises an issue in gpdb, so revert it. cherry-pick from: 0138eed43680ea8c5c8d1529e29063bb17a4c63e --- src/backend/optimizer/path/pathkeys.c | 23 +------------------ src/test/regress/expected/gp_create_view.out | 4 ++-- .../regress/expected/qp_union_intersect.out | 15 ++++++++++++ .../expected/qp_union_intersect_optimizer.out | 15 ++++++++++++ src/test/regress/sql/gp_create_view.sql | 4 ++-- src/test/regress/sql/qp_union_intersect.sql | 13 +++++++++++ 6 files changed, 48 insertions(+), 26 deletions(-) diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c index fde9b5067357..f1c63497cb76 100644 --- a/src/backend/optimizer/path/pathkeys.c +++ b/src/backend/optimizer/path/pathkeys.c @@ -1068,27 +1068,6 @@ convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel, tle); /* See if we have a matching EC for that */ - /* - * In GPDB, we pass create_it = 'true', because even if the - * sub-pathkey doesn't seem interesting to the parent, we - * want to preserve the ordering if the result is gathered - * to a single node later on. This case comes up, if you - * e.g. create a view with an ORDER BY: - * - * CREATE VIEW v AS SELECT * FROM sourcetable ORDER BY vn; - * - * and query it: - * - * SELECT row_number() OVER(), vn FROM v_sourcetable; - * - * Although it's not required by the SQL standard, we try - * to preserve the PostgreSQL behaviour, and honor the - * ORDER BY. The parent query doesn't have an equivalence - * class for the path key (vn), but if we don't pass it - * up to the parent, it will not preserve the order when - * it adds the Gather Motion to pull together the rows, - * underneath the WindowAgg. - */ outer_ec = get_eclass_for_sort_expr(root, outer_expr, NULL, @@ -1097,7 +1076,7 @@ convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel, sub_expr_coll, 0, rel->relids, - true); /* create_it */ + false); /* create_it */ /* * If we don't find a matching EC, this sub-pathkey isn't diff --git a/src/test/regress/expected/gp_create_view.out b/src/test/regress/expected/gp_create_view.out index 5dd7fd72dce0..cf58ebe245d8 100644 --- a/src/test/regress/expected/gp_create_view.out +++ b/src/test/regress/expected/gp_create_view.out @@ -37,7 +37,7 @@ insert into sourcetable values -- Check that the rows come out in order, if there's an ORDER BY in -- the view definition. 
create view v_sourcetable as select * from sourcetable order by vn; -select row_number() over(), * from v_sourcetable; +select row_number() over(), * from v_sourcetable order by vn; row_number | cn | vn | pn | dt | qty | prc ------------+----+----+-----+------------+------+------ 1 | 1 | 10 | 200 | 03-01-1401 | 10 | 0 @@ -55,7 +55,7 @@ select row_number() over(), * from v_sourcetable; (12 rows) create view v_sourcetable1 as SELECT sourcetable.qty, vn, pn FROM sourcetable union select sourcetable.qty, sourcetable.vn, sourcetable.pn from sourcetable order by qty; -select row_number() over(), * from v_sourcetable1; +select row_number() over(), * from v_sourcetable1 order by qty; row_number | qty | vn | pn ------------+------+----+----- 1 | 1 | 50 | 400 diff --git a/src/test/regress/expected/qp_union_intersect.out b/src/test/regress/expected/qp_union_intersect.out index 353c01be45b9..a7a7b6adc740 100644 --- a/src/test/regress/expected/qp_union_intersect.out +++ b/src/test/regress/expected/qp_union_intersect.out @@ -1844,3 +1844,18 @@ order by 1,2; Execution time: 2.496 ms (23 rows) +CREATE TABLE t1(c1 int, c2 int, c3 int); +CREATE TABLE t2(c1 int, c2 int, c3 int); +INSERT INTO t1 SELECT i, i ,i + 1 FROM generate_series(1,10) i; +INSERT INTO t2 SELECT i, i ,i + 1 FROM generate_series(1,10) i; +SET enable_hashagg = off; +with tcte(c1, c2, c3) as ( + SELECT c1, sum(c2) as c2, c3 FROM t1 WHERE c3 > 0 GROUP BY c1, c3 + UNION ALL + SELECT c1, sum(c2) as c2, c3 FROM t2 WHERE c3 < 0 GROUP BY c1, c3 +) +SELECT * FROM tcte WHERE c3 = 1; + c1 | c2 | c3 +----+----+---- +(0 rows) + diff --git a/src/test/regress/expected/qp_union_intersect_optimizer.out b/src/test/regress/expected/qp_union_intersect_optimizer.out index f7e7356a5a26..9e9a77a2ef92 100644 --- a/src/test/regress/expected/qp_union_intersect_optimizer.out +++ b/src/test/regress/expected/qp_union_intersect_optimizer.out @@ -1869,3 +1869,18 @@ order by 1,2; Execution time: 3.142 ms (26 rows) +CREATE TABLE t1(c1 int, c2 int, c3 int); +CREATE TABLE t2(c1 int, c2 int, c3 int); +INSERT INTO t1 SELECT i, i ,i + 1 FROM generate_series(1,10) i; +INSERT INTO t2 SELECT i, i ,i + 1 FROM generate_series(1,10) i; +SET enable_hashagg = off; +with tcte(c1, c2, c3) as ( + SELECT c1, sum(c2) as c2, c3 FROM t1 WHERE c3 > 0 GROUP BY c1, c3 + UNION ALL + SELECT c1, sum(c2) as c2, c3 FROM t2 WHERE c3 < 0 GROUP BY c1, c3 +) +SELECT * FROM tcte WHERE c3 = 1; + c1 | c2 | c3 +----+----+---- +(0 rows) + diff --git a/src/test/regress/sql/gp_create_view.sql b/src/test/regress/sql/gp_create_view.sql index 29b705377f17..c297bddf410e 100644 --- a/src/test/regress/sql/gp_create_view.sql +++ b/src/test/regress/sql/gp_create_view.sql @@ -37,10 +37,10 @@ insert into sourcetable values -- Check that the rows come out in order, if there's an ORDER BY in -- the view definition. 
create view v_sourcetable as select * from sourcetable order by vn; -select row_number() over(), * from v_sourcetable; +select row_number() over(), * from v_sourcetable order by vn; create view v_sourcetable1 as SELECT sourcetable.qty, vn, pn FROM sourcetable union select sourcetable.qty, sourcetable.vn, sourcetable.pn from sourcetable order by qty; -select row_number() over(), * from v_sourcetable1; +select row_number() over(), * from v_sourcetable1 order by qty; -- Check that the row-comparison operator is serialized and deserialized diff --git a/src/test/regress/sql/qp_union_intersect.sql b/src/test/regress/sql/qp_union_intersect.sql index a0fef943ebed..850b0d8eb8ee 100644 --- a/src/test/regress/sql/qp_union_intersect.sql +++ b/src/test/regress/sql/qp_union_intersect.sql @@ -690,3 +690,16 @@ explain analyze select a, b, array_dims(array_agg(x)) from mergeappend_test r gr union all select null, null, array_dims(array_agg(x)) FROM mergeappend_test r order by 1,2; + +CREATE TABLE t1(c1 int, c2 int, c3 int); +CREATE TABLE t2(c1 int, c2 int, c3 int); +INSERT INTO t1 SELECT i, i ,i + 1 FROM generate_series(1,10) i; +INSERT INTO t2 SELECT i, i ,i + 1 FROM generate_series(1,10) i; +SET enable_hashagg = off; +with tcte(c1, c2, c3) as ( + SELECT c1, sum(c2) as c2, c3 FROM t1 WHERE c3 > 0 GROUP BY c1, c3 + UNION ALL + SELECT c1, sum(c2) as c2, c3 FROM t2 WHERE c3 < 0 GROUP BY c1, c3 +) +SELECT * FROM tcte WHERE c3 = 1; + From f84dfa7f20a4b5e9a29ea43db3be8ec64b921f94 Mon Sep 17 00:00:00 2001 From: Wang Hao Date: Tue, 18 Feb 2020 14:24:18 +0800 Subject: [PATCH 026/102] Revert function name of getDtxStartTime (#9585) Commit d2f1b48d76483d473c968d65b319e6a7914567fb renamed the public function getDtxStartTime() to getDtmStartTime(). This change breaks ABI compatiblity of metrics_collector on 6X_STABLE. There is no problem to keep that change on master. 
--- src/backend/cdb/cdbdtxcontextinfo.c | 2 +- src/backend/cdb/cdbtm.c | 4 ++-- src/backend/utils/gpmon/gpmon.c | 2 +- src/include/cdb/cdbtm.h | 2 +- 4 files changed, 5 insertions(+), 5 deletions(-) diff --git a/src/backend/cdb/cdbdtxcontextinfo.c b/src/backend/cdb/cdbdtxcontextinfo.c index 4a38616929c7..3e0dda08c240 100644 --- a/src/backend/cdb/cdbdtxcontextinfo.c +++ b/src/backend/cdb/cdbdtxcontextinfo.c @@ -49,7 +49,7 @@ DtxContextInfo_CreateOnMaster(DtxContextInfo *dtxContextInfo, bool inCursor, dtxContextInfo->distributedXid = getDistributedTransactionId(); if (dtxContextInfo->distributedXid != InvalidDistributedTransactionId) { - dtxContextInfo->distributedTimeStamp = getDtmStartTime(); + dtxContextInfo->distributedTimeStamp = getDtxStartTime(); getDistributedTransactionIdentifier(dtxContextInfo->distributedId); dtxContextInfo->curcid = curcid; diff --git a/src/backend/cdb/cdbtm.c b/src/backend/cdb/cdbtm.c index 85d502834240..8eb0a2328ba4 100644 --- a/src/backend/cdb/cdbtm.c +++ b/src/backend/cdb/cdbtm.c @@ -166,7 +166,7 @@ isDtxContext(void) */ DistributedTransactionTimeStamp -getDtmStartTime(void) +getDtxStartTime(void) { if (shmDistribTimeStamp != NULL) return *shmDistribTimeStamp; @@ -265,7 +265,7 @@ currentDtxActivate(void) (errmsg("reached the limit of %u global transactions per start", LastDistributedTransactionId))); - MyTmGxact->distribTimeStamp = getDtmStartTime(); + MyTmGxact->distribTimeStamp = getDtxStartTime(); MyTmGxact->sessionId = gp_session_id; setCurrentDtxState(DTX_STATE_ACTIVE_DISTRIBUTED); } diff --git a/src/backend/utils/gpmon/gpmon.c b/src/backend/utils/gpmon/gpmon.c index 7c7bb4dca39e..f31899c5433c 100644 --- a/src/backend/utils/gpmon/gpmon.c +++ b/src/backend/utils/gpmon/gpmon.c @@ -164,7 +164,7 @@ void gpmon_gettmid(int32* tmid) *tmid = (int32)QEDtxContextInfo.distributedSnapshot.distribTransactionTimeStamp; else /* On QD */ - *tmid = (int32)getDtmStartTime(); + *tmid = (int32)getDtxStartTime(); } diff --git a/src/include/cdb/cdbtm.h b/src/include/cdb/cdbtm.h index 7a2a52d5dc95..d5345e7b9c7d 100644 --- a/src/include/cdb/cdbtm.h +++ b/src/include/cdb/cdbtm.h @@ -275,7 +275,7 @@ extern volatile int *shmNumCommittedGxacts; extern char *DtxStateToString(DtxState state); extern char *DtxProtocolCommandToString(DtxProtocolCommand command); extern char *DtxContextToString(DtxContext context); -extern DistributedTransactionTimeStamp getDtmStartTime(void); +extern DistributedTransactionTimeStamp getDtxStartTime(void); extern void dtxCrackOpenGid(const char *gid, DistributedTransactionTimeStamp *distribTimeStamp, DistributedTransactionId *distribXid); From 78d4e6bb82cde6ca66cdbc544d859d99224924be Mon Sep 17 00:00:00 2001 From: Ashwin Agrawal Date: Tue, 18 Feb 2020 10:56:00 -0800 Subject: [PATCH 027/102] Avoid ifaddrs utility crash `ifa_addr` may be null for interface returned by getifaddrs(). Hence, checking for the same should be perfomed, else ifaddrs crashes. As side effect to this crashing, on my ubuntu laptop gpinitstandby always fails. 
Interface for which `getifaddrs()` returned null for me is: gpd0: flags=4240 mtu 1500 unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 500 (UNSPEC) RX packets 0 bytes 0 (0.0 B) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 0 bytes 0 (0.0 B) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 (gdb) p *list $5 = {ifa_next = 0x5555555586a8, ifa_name = 0x555555558694 "gpd0", ifa_flags = 4240, ifa_addr = 0x0, ifa_netmask = 0x0, ifa_ifu = {ifu_broadaddr = 0x0, ifu_dstaddr = 0x0}, ifa_data = 0x555555558bb8} Reviewed-by: Jacob Champion Reviewed-by: Mark Sliva --- gpMgmt/bin/ifaddrs/main.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/gpMgmt/bin/ifaddrs/main.c b/gpMgmt/bin/ifaddrs/main.c index 7fe7f19062ec..287dcc924245 100644 --- a/gpMgmt/bin/ifaddrs/main.c +++ b/gpMgmt/bin/ifaddrs/main.c @@ -66,6 +66,9 @@ int main(int argc, char *argv[]) continue; } + if (addr == NULL) + continue; + switch (addr->sa_family) { case AF_INET: From bf7bc9e938c8db76d4e4e3b83852059f21aec0a2 Mon Sep 17 00:00:00 2001 From: Haozhou Wang Date: Wed, 19 Feb 2020 10:02:20 +0800 Subject: [PATCH 028/102] Fix dependencies issue in GPPKG utility (6X_STABLE) (#9494) 1. When two gppkg packages have the same dependencies, gppkg utility will refuse to install the second gppkg package and throw an error. This patch fixes this issue and the second gppkg package can install successfully. 2. Fix install/uninstall issue if the master and standby master use the same node address. PS: This patch is backported from the master branch --- gpMgmt/bin/gppylib/operations/package.py | 22 +++++++++++++++++----- 1 file changed, 17 insertions(+), 5 deletions(-) diff --git a/gpMgmt/bin/gppylib/operations/package.py b/gpMgmt/bin/gppylib/operations/package.py index c504be019998..39d790e493ae 100644 --- a/gpMgmt/bin/gppylib/operations/package.py +++ b/gpMgmt/bin/gppylib/operations/package.py @@ -593,6 +593,7 @@ def execute(self): try: cmd.run(validateAfter=True) except ExecutionError, e: + already_install = False lines = e.cmd.get_results().stderr.splitlines() # Forking between code paths 2 and 3 depends on some meaningful stderr @@ -619,12 +620,22 @@ def execute(self): # package postgis-1.0-1.x86_64 is already installed for line in lines: if 'already installed' in line.lower(): - package_name = line.split()[1] + # if installed version is newer than currently, we use old version name + if 'newer than' in line.lower(): + # example: package json-c-0.12-1.x86_64 (which is newer than json-c-0.11-1.x86_64) is already installed + package_name = line.split()[6].replace(')','') + else: + package_name = line.split()[1] rpm_name = "%s.rpm" % package_name rpm_set.remove(rpm_name) + already_install = True + elif 'conflicts with file' in line.lower(): + # if the library file(s) is(are) the same as installed dependencies, we skip it and use the installed dependencies + already_install = True else: # This is unexpected, so bubble up the ExecutionError. - raise + if already_install is not True: + raise # MPP-14359 - installation and uninstallation prechecks must also consider # the archive. 
That is, if a partial installation had added all rpms @@ -780,7 +791,8 @@ def resolve_shared_dependencies(self, rpm_set, dependency_lines): cmd = Command('Discerning culprit rpms for %s' % violated_capability, 'rpm -q --whatprovides %s --dbpath %s' % (violated_capability, RPM_DATABASE)) cmd.run(validateAfter=True) - culprit_rpms = set(cmd.get_results().stdout.splitlines()) + # remove the .x86_64 suffix for each rpm package to match the name in rpm_set + culprit_rpms = set(dep.replace('.x86_64', '') for dep in cmd.get_results().stdout.splitlines()) rpm_set -= culprit_rpms @@ -898,7 +910,7 @@ def execute(self): if self.is_update: rpm_install_command = 'rpm -U --force %s --dbpath %s --prefix=%s' else: - rpm_install_command = 'rpm -i %s --dbpath %s --prefix=%s' + rpm_install_command = 'rpm -i --force %s --dbpath %s --prefix=%s' rpm_install_command = rpm_install_command % \ (" ".join([os.path.join(TEMP_EXTRACTION_PATH, rpm) for rpm in rpm_set]), RPM_DATABASE, @@ -1054,7 +1066,7 @@ def __init__(self, gppkg, master_host, standby_host, segment_host_list): if master_host != standby_host: self.standby_host = standby_host else: - self.standby_host = [] + self.standby_host = None self.segment_host_list = segment_host_list def execute(self): From d34adf5e3e900de08790fa468c451dc73bab483f Mon Sep 17 00:00:00 2001 From: "Huiliang.liu" Date: Wed, 19 Feb 2020 17:29:02 +0800 Subject: [PATCH 029/102] Enhance gpfdist error log (#9230) (#9587) Print errno and message if local_send() fails. Print detail information on session end --- src/bin/gpfdist/gpfdist.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/src/bin/gpfdist/gpfdist.c b/src/bin/gpfdist/gpfdist.c index 836ec6d0e489..a208eafe54ac 100644 --- a/src/bin/gpfdist/gpfdist.c +++ b/src/bin/gpfdist/gpfdist.c @@ -1178,7 +1178,11 @@ static int local_send(request_t *r, const char* buf, int buflen) if (r->session) session_end(r->session, 0); } else { - gdebug(r, "gpfdist_send failed - due to (%d: %s)", e, strerror(e)); + if (!ok) { + gwarning(r, "gpfdist_send failed - due to (%d: %s)", e, strerror(e)); + } else { + gdebug(r, "gpfdist_send failed - due to (%d: %s), should try again", e, strerror(e)); + } } return ok ? 0 : -1; } @@ -1344,13 +1348,14 @@ session_get_block(const request_t* r, block_t* retblock, char* line_delim_str, i /* finish the session - close the file */ static void session_end(session_t* session, int error) { - gprintln(NULL, "session end."); + gprintln(NULL, "session end. id = %ld, is_error = %d, error = %d", session->id, session->is_error, error); if (error) session->is_error = error; if (session->fstream) { + gprintln(NULL, "close fstream"); fstream_close(session->fstream); session->fstream = 0; } @@ -1772,6 +1777,10 @@ static void do_write(int fd, short event, void* arg) n = local_send(r, datablock->hdr.hbyte + datablock->hdr.hbot, n); if (n < 0) { + /* + * TODO: It is not safe to check errno here, should check and + * return special value in local_send() + */ if (errno == EPIPE || errno == ECONNRESET) r->outblock.bot = r->outblock.top; request_end(r, 1, "gpfdist send block header failure"); From c79eefcebe8231f8717bd252cdbc83037b4e42ca Mon Sep 17 00:00:00 2001 From: Ashwin Agrawal Date: Thu, 20 Feb 2020 09:46:16 -0800 Subject: [PATCH 030/102] Bugfix: rows might be split into wrong partitions split_rows() scans tuples from T and route them to new parts (A, B) based on A's or B's constraints. 
If T has one or more dropped columns before its partition key, T's partition key would have a different attribute number from its new parts. In this case, the constraints choose a wrong column which can cause bad behaviors. To fix it, each tuple iteration should reconstruct the partition tuple slot and assign it to econtext before ExecQual calls. The reconstruction process can happen once or twice because we assume A, B might have two different tupdescs. One bad behavior, rows are split into wrong partitions. Reproduce: ```sql DROP TABLE IF EXISTS users_test; CREATE TABLE users_test ( id INT, dd TEXT, user_name VARCHAR(40), user_email VARCHAR(60), born_time TIMESTAMP, create_time TIMESTAMP ) DISTRIBUTED BY (id) PARTITION BY RANGE (create_time) ( PARTITION p2019 START ('2019-01-01'::TIMESTAMP) END ('2020-01-01'::TIMESTAMP), DEFAULT PARTITION extra ); /* Drop useless column dd for some reason */ ALTER TABLE users_test DROP COLUMN dd; /* Forgot/Failed to split out new partitions beforehand */ INSERT INTO users_test VALUES(1, 'A', 'A@abc.com', '1970-01-01', '2020-01-01 12:00:00'); INSERT INTO users_test VALUES(2, 'B', 'B@abc.com', '1980-01-01', '2020-01-02 18:00:00'); INSERT INTO users_test VALUES(3, 'C', 'C@abc.com', '1990-01-01', '2020-01-03 08:00:00'); /* New partition arrives late */ ALTER TABLE users_test SPLIT DEFAULT PARTITION START ('2020-01-01'::TIMESTAMP) END ('2021-01-01'::TIMESTAMP) INTO (PARTITION p2020, DEFAULT PARTITION); /* * - How many new users already in 2020? * - Wow, no one. */ SELECT count(1) FROM users_test_1_prt_p2020; ``` Reviewed-by: Georgios Kokolatos Reviewed-by: Heikki Linnakangas (cherry picked from commit 101922f1540bc670ee1756e26612bfe0f4edb299) Co-authored-by: Ashwin Agrawal --- contrib/pg_upgrade/test_gpdb_pre.sql | 1 + src/backend/commands/tablecmds.c | 26 ++++++----- src/test/regress/expected/partition1.out | 56 ++++++++++++++++++++++++ src/test/regress/sql/partition1.sql | 40 +++++++++++++++++ 4 files changed, 113 insertions(+), 10 deletions(-) diff --git a/contrib/pg_upgrade/test_gpdb_pre.sql b/contrib/pg_upgrade/test_gpdb_pre.sql index 1b8cb32665d0..3c2e5701f8c3 100644 --- a/contrib/pg_upgrade/test_gpdb_pre.sql +++ b/contrib/pg_upgrade/test_gpdb_pre.sql @@ -49,3 +49,4 @@ DROP TABLE IF EXISTS public.parttest_t; DROP TABLE IF EXISTS public.pt_dropped_col_distkey; DROP TABLE IF EXISTS partition_pruning.sales; DROP TABLE IF EXISTS public.partdisttest; +DROP TABLE IF EXISTS public.users_test; diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 287c3be495e4..7164ff437bd9 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -17458,19 +17458,31 @@ split_rows(Relation intoa, Relation intob, Relation temprel) break; } - /* prepare for ExecQual */ - econtext->ecxt_scantuple = slotT; + /* + * Map attributes from origin to target. We should consider dropped + * columns in the origin. + * + * ExecQual should use targetSlot rather than slotT in case possible + * partition key mapping. + */ + AssertImply(!PointerIsValid(achk), PointerIsValid(bchk)); + targetSlot = reconstructMatchingTupleSlot(slotT, achk ? 
rria : rrib); + econtext->ecxt_scantuple = targetSlot; /* determine if we are inserting into a or b */ if (achk) { targetIsA = ExecQual((List *)achk, econtext, false); + + if (!targetIsA) + targetSlot = reconstructMatchingTupleSlot(slotT, rrib); } else { - Assert(PointerIsValid(bchk)); - targetIsA = !ExecQual((List *)bchk, econtext, false); + + if (targetIsA) + targetSlot = reconstructMatchingTupleSlot(slotT, rria); } /* load variables for the specific target */ @@ -17489,12 +17501,6 @@ split_rows(Relation intoa, Relation intob, Relation temprel) targetRelInfo = rrib; } - /* - * Map attributes from origin to target. We should consider dropped - * columns in the origin. - */ - targetSlot = reconstructMatchingTupleSlot(slotT, targetRelInfo); - /* insert into the target table */ if (RelationIsHeap(targetRelation)) { diff --git a/src/test/regress/expected/partition1.out b/src/test/regress/expected/partition1.out index 7f0e5d8f9a44..f2b0bd83772f 100644 --- a/src/test/regress/expected/partition1.out +++ b/src/test/regress/expected/partition1.out @@ -2840,3 +2840,59 @@ reset session authorization; DROP TABLE part_expr_test_range; DROP TABLE part_expr_test_list; DROP ROLE part_expr_role; +-- +-- Test handling of dropped columns in SPLIT PARTITION. (PR #9386) +-- +DROP TABLE IF EXISTS users_test; +NOTICE: table "users_test" does not exist, skipping +CREATE TABLE users_test +( + id INT, + dd TEXT, + user_name VARCHAR(40), + user_email VARCHAR(60), + born_time TIMESTAMP, + create_time TIMESTAMP +) +DISTRIBUTED BY (id) +PARTITION BY RANGE (create_time) +( + PARTITION p2019 START ('2019-01-01'::TIMESTAMP) END ('2020-01-01'::TIMESTAMP), + DEFAULT PARTITION extra +); +NOTICE: CREATE TABLE will create partition "users_test_1_prt_extra" for table "users_test" +NOTICE: CREATE TABLE will create partition "users_test_1_prt_p2019" for table "users_test" +-- Drop useless column dd for some reason +ALTER TABLE users_test DROP COLUMN dd; +-- Assume we forgot/failed to split out new partitions beforehand +INSERT INTO users_test VALUES(1, 'A', 'A@abc.com', '1970-01-01', '2019-01-01 12:00:00'); +INSERT INTO users_test VALUES(2, 'B', 'B@abc.com', '1980-01-01', '2020-01-01 12:00:00'); +INSERT INTO users_test VALUES(3, 'C', 'C@abc.com', '1990-01-01', '2021-01-01 12:00:00'); +-- New partition arrives late +ALTER TABLE users_test SPLIT DEFAULT PARTITION START ('2020-01-01'::TIMESTAMP) END ('2021-01-01'::TIMESTAMP) + INTO (PARTITION p2020, DEFAULT PARTITION); +NOTICE: exchanged partition "extra" of relation "users_test" with relation "pg_temp_114588" +NOTICE: dropped partition "extra" for relation "users_test" +NOTICE: CREATE TABLE will create partition "users_test_1_prt_p2020" for table "users_test" +NOTICE: CREATE TABLE will create partition "users_test_1_prt_extra" for table "users_test" +-- Expect A +SELECT user_name FROM users_test_1_prt_p2019; + user_name +----------- + A +(1 row) + +-- Expect B +SELECT user_name FROM users_test_1_prt_p2020; + user_name +----------- + B +(1 row) + +-- Expect C +SELECT user_name FROM users_test_1_prt_extra; + user_name +----------- + C +(1 row) + diff --git a/src/test/regress/sql/partition1.sql b/src/test/regress/sql/partition1.sql index 2ce2ea95814c..3776d1f8b0f3 100644 --- a/src/test/regress/sql/partition1.sql +++ b/src/test/regress/sql/partition1.sql @@ -1447,3 +1447,43 @@ reset session authorization; DROP TABLE part_expr_test_range; DROP TABLE part_expr_test_list; DROP ROLE part_expr_role; + +-- +-- Test handling of dropped columns in SPLIT PARTITION. 
(PR #9386) +-- +DROP TABLE IF EXISTS users_test; + +CREATE TABLE users_test +( + id INT, + dd TEXT, + user_name VARCHAR(40), + user_email VARCHAR(60), + born_time TIMESTAMP, + create_time TIMESTAMP +) +DISTRIBUTED BY (id) +PARTITION BY RANGE (create_time) +( + PARTITION p2019 START ('2019-01-01'::TIMESTAMP) END ('2020-01-01'::TIMESTAMP), + DEFAULT PARTITION extra +); + +-- Drop useless column dd for some reason +ALTER TABLE users_test DROP COLUMN dd; + +-- Assume we forgot/failed to split out new partitions beforehand +INSERT INTO users_test VALUES(1, 'A', 'A@abc.com', '1970-01-01', '2019-01-01 12:00:00'); +INSERT INTO users_test VALUES(2, 'B', 'B@abc.com', '1980-01-01', '2020-01-01 12:00:00'); +INSERT INTO users_test VALUES(3, 'C', 'C@abc.com', '1990-01-01', '2021-01-01 12:00:00'); + +-- New partition arrives late +ALTER TABLE users_test SPLIT DEFAULT PARTITION START ('2020-01-01'::TIMESTAMP) END ('2021-01-01'::TIMESTAMP) + INTO (PARTITION p2020, DEFAULT PARTITION); + +-- Expect A +SELECT user_name FROM users_test_1_prt_p2019; +-- Expect B +SELECT user_name FROM users_test_1_prt_p2020; +-- Expect C +SELECT user_name FROM users_test_1_prt_extra; From 311e320ef108fa2c53d94fcb6ceb074086135440 Mon Sep 17 00:00:00 2001 From: Mel Kiyama Date: Fri, 21 Feb 2020 16:07:49 -0800 Subject: [PATCH 031/102] docs - update backup/restore docs (#9602) Synch. docs with backup/restore 1.17 --- .../managing/backup-gpbackup-incremental.xml | 7 +- .../admin_guide/managing/backup-gpbackup.xml | 40 +++-- gpdb-doc/dita/utility_guide/ref/gpbackup.xml | 82 +++++++--- gpdb-doc/dita/utility_guide/ref/gprestore.xml | 150 ++++++++++++++---- 4 files changed, 210 insertions(+), 69 deletions(-) diff --git a/gpdb-doc/dita/admin_guide/managing/backup-gpbackup-incremental.xml b/gpdb-doc/dita/admin_guide/managing/backup-gpbackup-incremental.xml index 07ca343f0531..14a28cf9f2f3 100644 --- a/gpdb-doc/dita/admin_guide/managing/backup-gpbackup-incremental.xml +++ b/gpdb-doc/dita/admin_guide/managing/backup-gpbackup-incremental.xml @@ -1,9 +1,8 @@ - - Creating Incremental Backups with gpbackup and gprestore + Creating and Using Incremental Backups with gpbackup and gprestore

    The gpbackup and gprestore utilities support creating incremental backups of append-optimized tables and restoring from incremental backups. An @@ -108,9 +107,7 @@

    If you try to add an incremental backup to a backup set, the backup operation fails if the gpbackup options are not consistent.
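    For example, a full backup followed by an incremental backup that reuses the same options might look like this (the database name and backup directory here are illustrative):
    $ gpbackup --dbname demo --backup-dir /home/gpadmin/backups --leaf-partition-data
    $ gpbackup --dbname demo --backup-dir /home/gpadmin/backups --leaf-partition-data --incremental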

    For information about the gpbackup and gprestore utility - options, see gpbackup - and gprestore in the Greenplum Database Utility Guide.

    + options, see the gpbackup and gprestore reference documentation.

    Example Using Incremental Backup Sets

    Each backup has a timestamp taken when the backup is created. For example, if you create diff --git a/gpdb-doc/dita/admin_guide/managing/backup-gpbackup.xml b/gpdb-doc/dita/admin_guide/managing/backup-gpbackup.xml index 9f1c2e9fbc2d..6e44f338f976 100644 --- a/gpdb-doc/dita/admin_guide/managing/backup-gpbackup.xml +++ b/gpdb-doc/dita/admin_guide/managing/backup-gpbackup.xml @@ -52,6 +52,20 @@

  • You can execute multiple instances of gpbackup, but each execution requires a distinct timestamp.
  • Database object filtering is currently limited to schemas and tables.
  • +
  • When backing up a partitioned table where some or all leaf partitions are in different + schemas from the root partition, the leaf partition table definitions, including the + schemas, are backed up as metadata. This occurs even if the backup operation specifies + that schemas that contain the leaf partitions should be excluded. To control data being + backed up for this type of partitioned table in this situation, use the + --leaf-partition-data option.
      +
    • If the --leaf-partition-data option is not specified, the leaf + partition data is also backed up even if the backup operation specifies that the + leaf partition schemas should be excluded.
    • +
    • If the --leaf-partition-data option is specified, the leaf + partition data is not backed up if the backup operation specifies that the leaf + partition schemas should be excluded. Only the metadata for leaf partition tables is + backed up.
    • +
  • If you use the gpbackup --single-data-file option to combine table backups into a single file per segment, you cannot perform a parallel restore operation with gprestore (cannot set --jobs to a value higher @@ -347,16 +361,24 @@ Restore Status: Success
  • gpbackup backs up all schemas and tables in the specified database, unless you exclude or include individual schema or table objects with schema level or table level filter options.

    -

    The schema level options are --include-schema or - --exclude-schema command-line options to gpbackup. For - example, if the "demo" database includes only two schemas, "wikipedia" and "twitter," both - of the following commands back up only the "wikipedia" +

    The schema level options are --include-schema, + --include-schema-file, or --exclude-schema, + --exclude-schema-file command-line options to gpbackup. + For example, if the "demo" database includes only two schemas, "wikipedia" and "twitter," + both of the following commands back up only the "wikipedia" schema:$ gpbackup --dbname demo --include-schema wikipedia $ gpbackup --dbname demo --exclude-schema twitter

    You can include multiple --include-schema options in a gpbackup or multiple --exclude-schema options. For example:$ gpbackup --dbname demo --include-schema wikipedia --include-schema twitter

    +

    If you have a large number of schemas, you can list the schemas in a text file and specify + the file with the --include-schema-file or + --exclude-schema-file options in a gpbackup command. + Each line in the file must define a single schema, and the file cannot contain trailing + lines. For example, this command uses a file in the gpadmin home directory + to include a set of + schemas.gpbackup --dbname demo --include-schema-file /users/home/gpadmin/backup-schemas
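    The schema file itself is plain text, one schema per line with no trailing blank lines; for the example above it might contain:
    wikipedia
    twitter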

    To filter the individual tables that are included in a backup set, or excluded from a backup set, specify individual tables with the --include-table option or the --exclude-table option. The table must be schema qualified, @@ -484,8 +506,8 @@ public.sales_1_prt_dec17

    Then a back up or restore operation completes.

    To have gpbackup or gprestore send out status email notifications, you must place a file named gp_email_contacts.yaml in the - home directory of the user running gpbackup or gprestore in - the same directory as the utilities ($GPHOME/bin). A utility issues a + home directory of the user running gpbackup or gprestore + in the same directory as the utilities ($GPHOME/bin). A utility issues a message if it cannot locate a gp_email_contacts.yaml file in either location. If both locations contain a .yaml file, the utility uses the file in user $HOME.

    @@ -499,9 +521,9 @@ public.sales_1_prt_dec17

    The number of objects backed up or restored. For information about the contents of a notification email, see .

    The UNIX mail utility must be running on the Greenplum Database host and must be - configured to allow the Greenplum superuser (gpadmin) to send email. - Also ensure that the mail program executable is locatable via the - gpadmin user's $PATH. + configured to allow the Greenplum superuser (gpadmin) to send email. Also + ensure that the mail program executable is locatable via the gpadmin user's + $PATH. gpbackup and gprestore Email File Format diff --git a/gpdb-doc/dita/utility_guide/ref/gpbackup.xml b/gpdb-doc/dita/utility_guide/ref/gpbackup.xml index 5110fb5f63dc..73763af57814 100644 --- a/gpdb-doc/dita/utility_guide/ref/gpbackup.xml +++ b/gpdb-doc/dita/utility_guide/ref/gpbackup.xml @@ -11,11 +11,13 @@ [--compression-level level] [--data-only] [--debug] - [--exclude-schema schema_name] - [--exclude-table schema.table] + [--exclude-schema schema_name [--exclude-schema schema_name ...]] + [--exclude-table schema.table [--exclude-table schema.table ...]] + [--exclude-schema-file file_name] [--exclude-table-file file_name] - [--include-schema schema_name] - [--include-table schema.table] + [--include-schema schema_name [--include-schema schema_name ...]] + [--include-table schema.table [--include-table schema.table ...]] + [--include-schema-file file_name] [--include-table-file file_name] [--incremental [--from-timestamp backup-timestamp]] [--jobs int] @@ -44,6 +46,8 @@ --with-globals option with gprestore to restore global objects. See for additional information.

    +

    For materialized views, data is not backed up, only the materialized view definition is + backed up.

    gpbackup stores the object metadata files and DDL files for a backup in the Greenplum Database master data directory by default. Greenplum Database segments use the COPY ... ON SEGMENT command to store their data for backed-up tables in @@ -127,10 +131,29 @@ schema_name Optional. Specifies a database schema to exclude from the backup. You can specify this option multiple times to exclude multiple schemas. You cannot combine this option with - the option --include-schema, or a table filtering option such as - --include-table. See for more - information. + the option --include-schema, --include-schema-file, or + a table filtering option such as --include-table. + See + for more information. + See + for limitations when leaf partitions of a partitioned table are in different schemas + from the root partition. + + + --exclude-schema-file + file_name + Optional. Specifies a text file containing a list of schemas to exclude from the + backup. Each line in the text file must define a single schema. The file must not + include trailing lines. If a schema name uses any character other than a lowercase + letter, number, or an underscore character, then you must include that name in double + quotes. You cannot combine this option with the option --include-schema + or --include-schema-file, or a table filtering option such as + --include-table. + See + for more information. + See + for limitations when leaf partitions of a partitioned table are in different schemas + from the root partition. --exclude-table @@ -139,8 +162,9 @@ format <schema-name>.<table-name>. If a table or schema name uses any character other than a lowercase letter, number, or an underscore character, then you must include that name in double quotes. You can specify this option multiple times. - You cannot combine this option with the option --exclude-schema, or - another a table filtering option such as --include-table. + You cannot combine this option with the option --exclude-schema, + --exclude-schema-file, or another a table filtering option such as + --include-table. You cannot use this option in combination with --leaf-partition-data. Although you can specify leaf partition names, gpbackup ignores the partition names. @@ -155,8 +179,9 @@ <schema-name>.<table-name>. The file must not include trailing lines. If a table or schema name uses any character other than a lowercase letter, number, or an underscore character, then you must include that name in double quotes. - You cannot combine this option with the option --exclude-schema, or - another a table filtering option such as --include-table. + You cannot combine this option with the option --exclude-schema, + --exclude-schema-file, or another a table filtering option such as + --include-table. You cannot use this option in combination with --leaf-partition-data. Although you can specify leaf partition names in a file specified with --exclude-table-file, gpbackup ignores the partition @@ -171,28 +196,40 @@ option multiple times to include multiple schemas. If you specify this option, any schemas that are not included in subsequent --include-schema options are omitted from the backup set. You cannot combine this option with the options - --exclude-schema, --include-table, or + --exclude-schema, --exclude-schema-file, + --exclude-schema-file, --include-table, or --include-table-file. See for more information. + + --include-schema-file + file_name + Optional. Specifies a text file containing a list of schemas to back up. 
Each line in + the text file must define a single schema. The file must not include trailing lines. If + a schema name uses any character other than a lowercase letter, number, or an underscore + character, then you must include that name in double quotes. See for more + information. + --include-table schema.table Optional. Specifies a table to include in the backup. The table must be in the format <schema-name>.<table-name>. If a table or schema name uses any character other than a lowercase letter, number, or an underscore character, then you - must include that name in single quotes. See Schema and Table Names for information about characters that are supported in - schema and table names. + must include that name in single quotes. See Schema and Table Names for information about characters that are + supported in schema and table names. You can specify this option multiple times. You cannot combine this option with a schema filtering option such as --include-schema, or another table filtering option such as --exclude-table-file. - You can also specify the qualified name of a sequence or a view. + You can also specify the qualified name of a sequence, a view, or a materialized view. If you specify this option, the utility does not automatically back up dependent objects. You must also explicitly specify dependent objects that are required. For - example if you back up a view, you must also back up the tables that the view uses. If - you back up a table that uses a sequence, you must also back up the sequence. + example if you back up a view or a materialized view, you must also back up the tables + that the view or materialized view uses. If you back up a table that uses a sequence, + you must also back up the sequence. You can optionally specify a table leaf partition name in place of the table name, to include only specific leaf partitions in a backup with the --leaf-partition-data option. When a leaf partition is backed up, the @@ -211,11 +248,12 @@ Any tables not listed in this file are omitted from the backup set. You cannot combine this option with a schema filtering option such as --include-schema, or another table filtering option such as --exclude-table-file. - You can also specify the qualified name of a sequence or a view. + You can also specify the qualified name of a sequence, a view, or a materialized view. If you specify this option, the utility does not automatically back up dependent objects. You must also explicitly specify dependent objects that are required. For - example if you back up a view, you must also specify the tables that the view uses. If - you specify a table that uses a sequence, you must also specify the sequence. + example if you back up a view or a materialized view, you must also specify the tables + that the view or the materialized view uses. If you specify a table that uses a + sequence, you must also specify the sequence. You can optionally specify a table leaf partition name in place of the table name, to include only specific leaf partitions in a backup with the --leaf-partition-data option. 
When a leaf partition is backed up, the diff --git a/gpdb-doc/dita/utility_guide/ref/gprestore.xml b/gpdb-doc/dita/utility_guide/ref/gprestore.xml index e63bf039cb6b..4a00c3e159d1 100644 --- a/gpdb-doc/dita/utility_guide/ref/gprestore.xml +++ b/gpdb-doc/dita/utility_guide/ref/gprestore.xml @@ -13,13 +13,17 @@ [--backup-dir directory] [--create-db] [--debug] - [--exclude-schema schema_name] - [--exclude-table schema.table] + [--exclude-schema schema_name [--exclude-schema schema_name ...]] + [--exclude-table schema.table [--exclude-table schema.table ...]] [--exclude-table-file file_name] - [--include-schema schema_name] - [--include-table schema.table] + [--exclude-schema-file file_name] + [--include-schema schema_name [--include-schema schema_name ...]] + [--include-table schema.table [--include-table schema.table ...]] + [--include-schema-file file_name] [--include-table-file file_name] + [--redirect-schema schema_name] [--data-only | --metadata-only] + [--incremental] [--jobs int] [--on-error-continue] [--plugin-config config_file_location] @@ -66,6 +70,12 @@ restore those statistics by providing --with-stats to gprestore. By default, only database objects in the backup set are restored.

    +

    When a materialized view is restored, the data is not restored. To populate the + materialized view with data, use REFRESH MATERIALIZED VIEW. The tables that + are referenced by the materialized view definition must be available. The + gprestore log file lists the materialized views that were restored and + the REFRESH MATERIALIZED VIEW commands that are used to populate the + materialized views with data.
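    For example, after the restore completes, each restored materialized view can be populated with a statement of this form (the view name here is illustrative):
    REFRESH MATERIALIZED VIEW sales.monthly_rollup;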

    Performance of restore operations can be improved by creating multiple parallel connections to restore table data and metadata. By default gprestore uses 1 connection, but you can increase this number with the --jobs option for large restore @@ -142,21 +152,34 @@ --exclude-schema schema_name Optional. Specifies a database schema to exclude from the restore operation. You can - specify this option multiple times to exclude multiple schemas. You cannot combine this - option with the option --include-schema, or a table filtering option - such as --include-table. + specify this option multiple times. You cannot combine this option with the option + --include-schema, --include-schema-file, or a table + filtering option such as --include-table. + + + --exclude-schema-file + file_name + Optional. Specifies a text file containing a list of schemas to exclude from the + backup. Each line in the text file must define a single schema. The file must not + include trailing lines. If a schema name uses any character other than a lowercase + letter, number, or an underscore character, then you must include that name in double + quotes. You cannot combine this option with the option --include-schema + or --include-schema-file, or a table filtering option such as + --include-table. --exclude-table schema.table - Optional. Specifies a table to exclude from the restore operation. The table must be - in the format <schema-name>.<table-name>. If a table or schema - name uses any character other than a lowercase letter, number, or an underscore - character, then you must include that name in double quotes. You can specify this option - multiple times. If the table is not in the backup set, the restore operation fails. You - cannot specify a leaf partition of a partitioned table. - You cannot combine this option with the option --exclude-schema, or - another a table filtering option such as --include-table. + Optional. Specifies a table to exclude from the restore operation. You can specify + this option multiple times. The table must be in the format + <schema-name>.<table-name>. If a table or schema name uses any + character other than a lowercase letter, number, or an underscore character, then you + must include that name in double quotes. You can specify this option multiple times. If + the table is not in the backup set, the restore operation fails. You cannot specify a + leaf partition of a partitioned table. + You cannot combine this option with the option --exclude-schema, + --exclude-schema-file, or another a table filtering option such as + --include-table. --exclude-table-file @@ -168,17 +191,17 @@ letter, number, or an underscore character, then you must include that name in double quotes. If a table is not in the backup set, the restore operation fails. You cannot specify a leaf partition of a partitioned table. - You cannot combine this option with the option --exclude-schema, or - another a table filtering option such as --include-table. + You cannot combine this option with the option --exclude-schema, + --exclude-schema-file, or another a table filtering option such as + --include-table. --include-schema schema_name Optional. Specifies a database schema to restore. You can specify this option multiple - times to include multiple schemas. If you specify this option, any schemas that you - specify must be available in the backup set. Any schemas that are not included in - subsequent --include-schema options are omitted from the restore - operation. + times. 
If you specify this option, any schemas that you specify must be available in the + backup set. Any schemas that are not included in subsequent + --include-schema options are omitted from the restore operation. If a schema that you specify for inclusion exists in the database, the utility issues an error and continues the operation. The utility fails if a table being restored exists in the database. @@ -187,6 +210,18 @@ See Filtering the Contents of a Backup or Restore for more information. + + --include-schema-file + file_name + Optional. Specifies a text file containing a list of schemas to restore. Each line in + the text file must define a single schema. The file must not include trailing lines. If + a schema name uses any character other than a lowercase letter, number, or an underscore + character, then you must include that name in double quotes. + The schemas must exist in the backup set. Any schemas not listed in this file are + omitted from the restore operation. + You cannot use this option if objects in the backup set have dependencies on multiple + schemas. + --include-table schema.table @@ -195,12 +230,13 @@ character other than a lowercase letter, number, or an underscore character, then you must include that name in double quotes. You can specify this option multiple times. You cannot specify a leaf partition of a partitioned table. - You can also specify the qualified name of a sequence or a view. + You can also specify the qualified name of a sequence, a view, or a materialized view. If you specify this option, the utility does not automatically restore dependent objects. You must also explicitly specify the dependent objects that are required. For - example if you restore a view, you must also restore the tables that the view uses. If - you restore a table that uses a sequence, you must also restore the sequence. The - dependent objects must exist in the backup set. + example if you restore a view or a materialized view, you must also restore the tables + that the view or the materialized view uses. If you restore a table that uses a + sequence, you must also restore the sequence. The dependent objects must exist in the + backup set. You cannot combine this option with a schema filtering option such as --include-schema, or another table filtering option such as --exclude-table-file. @@ -215,12 +251,15 @@ number, or an underscore character, then you must include that name in double quotes. Any tables not listed in this file are omitted from the restore operation. You cannot specify a leaf partition of a partitioned table. - You can also specify the qualified name of a sequence or a view. + You can also specify the qualified name of a sequence, a view, or a materialized view. If you specify this option, the utility does not automatically restore dependent objects. You must also explicitly specify dependent objects that are required. For - example if you restore a view, you must also specify the tables that the view uses. If - you specify a table that uses a sequence, you must also specify the sequence. The - dependent objects must exist in the backup set. + example if you restore a view or a materialized view, you must also specify the tables + that the view or the materialized uses. If you specify a table that uses a sequence, you + must also specify the sequence. The dependent objects must exist in the backup set. + For a materialized view, the data is not restored. 
To populate the materialized view + with data, you must use REFRESH MATERIALIZED VIEW and the tables that + are referenced by the materialized view definition must be available. If you use the --include-table-file option, gprestore does not create roles or set the owner of the tables. The utility restores table indexes and rules. Triggers are also restored but are not @@ -228,6 +267,44 @@ See for more information. + + --incremental (Beta) + Restores only the table data in the incremental backup specified by the + --timestamp option. Table data is not restored from previous + incremental backups in the backup set. For information about incremental backups, see + . + This is a Beta feature and is not + supported in a production environment. + An incremental backup contains the following table data that can be restored. + +

      +
    • Data from all heap tables.
    • +
    • Data from append-optimized tables that have been modified since the previous + backup.
    • +
    • Data from leaf partitions that have been modified since the previous backup.
    • +
    + + When this option is specified, gprestore restores table data by + truncating the table and reloading data into the table. + + When this option is specified, gpbackup assumes + that no changes have been made to the table definitions of the tables being restored, + such as adding or removing columns. + + + + --redirect-schema + schema_name + Optional. Restore data in the specified schema instead of the original schemas. The + specified schema must already exist. If the data being restored is in multiple schemas, + all the data is redirected into the specified schema. + This option must be used with an option that includes tables, + --inlcude-table or --include table-file. + You cannot use this option with an option that excludes schemas or tables such as + --exclude-schema or --exclude-table. + You can use this option with the --metadata-only or + --data-only options. + --jobs int @@ -257,10 +334,17 @@ --on-error-continue Optional. Specify this option to continue the restore operation if an SQL error occurs when creating database metadata (such as tables, roles, or functions) or restoring data. - If another type of error occurs, the utility exits. The utility displays an error - summary and writes error information to the gprestore log file and - continues the restore operation. - The default is to exit on the first error. + If another type of error occurs, the utility exits. The default is to exit on the first + error. + When this option is included, the utility displays an error summary and writes error + information to the gprestore log file and continues the restore + operation. The utility also creates text files in the backup directory that contain the + list of tables that generated SQL errors.
      +
    • Tables with metadata errors - + gprestore_<backup-timestamp>_<restore-time>_error_tables_metadata
    • +
    • Tables with data errors - + gprestore_<backup-timestamp>_<restore-time>_error_tables_data
    • +
    --plugin-config From 48d9db0ce6dcc2cb897a19e13d4a28945a65904e Mon Sep 17 00:00:00 2001 From: ppggff Date: Fri, 21 Feb 2020 16:15:54 -0800 Subject: [PATCH 032/102] Fix missing initialization in ResLockAcquire Missing initialization of 'holdsStrongLockCount' may cause RemoveLocalLock() fail to operate fastpath array. --- src/backend/utils/resscheduler/resqueue.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/backend/utils/resscheduler/resqueue.c b/src/backend/utils/resscheduler/resqueue.c index 8e2d11db8b7d..ee3bfc941747 100644 --- a/src/backend/utils/resscheduler/resqueue.c +++ b/src/backend/utils/resscheduler/resqueue.c @@ -165,9 +165,12 @@ ResLockAcquire(LOCKTAG *locktag, ResPortalIncrement *incrementSet) locallock->lock = NULL; locallock->proclock = NULL; locallock->hashcode = LockTagHashCode(&(localtag.lock)); + locallock->istemptable = false; locallock->nLocks = 0; locallock->numLockOwners = 0; locallock->maxLockOwners = 8; + locallock->holdsStrongLockCount = FALSE; + locallock->lockCleared = false; locallock->lockOwners = NULL; locallock->lockOwners = (LOCALLOCKOWNER *) MemoryContextAlloc(TopMemoryContext, locallock->maxLockOwners * sizeof(LOCALLOCKOWNER)); From 70b35f2efa8b55e74c44f348ad075479e45b6156 Mon Sep 17 00:00:00 2001 From: Ashwin Agrawal Date: Fri, 21 Feb 2020 11:47:25 -0800 Subject: [PATCH 033/102] Avoid function bodies check on QE Since QD performs the function bodies check and then only dispatches to QE, we can avoid performing the checks again on QE. Hence, setting `check_function_bodies=false;` for QE process. Without this GUC `check_function_bodies` required to be in sync between QD and QE since if `check_function_bodies=false` on QD, QE must also not perform the check. Disabling it always on QE eliminates the need. The issue was reported from field where below function was being created. ``` set check_function_bodies = false; -- wait for gp_vmem_idle_resource_timeout time and then run CREATE FUNCTION public.f1() RETURNS smallint AS $$ SELECT f2() $$ LANGUAGE sql; ``` Reviewed-by: Asim R P --- src/backend/tcop/postgres.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index 8f9dba86f99b..b15b7c425026 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -5293,6 +5293,13 @@ PostgresMain(int argc, char *argv[], ereport(ERROR, (errcode(ERRCODE_PROTOCOL_VIOLATION), errmsg("MPP protocol messages are only supported in QD - QE connections"))); + /* + * QD performs the function body check, hence QE doesn't + * need to do the check again. Turn off the check in QE + * process as an optimization. Also, helps eliminate the + * need for having this GUC in-sync between QD and QE. + */ + check_function_bodies=false; /* Set statement_timestamp() */ SetCurrentStatementStartTimestamp(); From 2d2b22bc3da03215531aeb8ffea492dfa1094416 Mon Sep 17 00:00:00 2001 From: Ashwin Agrawal Date: Fri, 21 Feb 2020 11:47:35 -0800 Subject: [PATCH 034/102] Make bgwriter_checkpoint test stable by adding vacuum pg_proc It's best to vacuum pg_proc so that buffers are not marked dirty later for pg_proc when executing the newly created functions. 
Reviewed-by: Asim R P --- src/test/fsync/expected/bgwriter_checkpoint.out | 5 +++-- src/test/fsync/sql/bgwriter_checkpoint.sql | 5 +++-- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/src/test/fsync/expected/bgwriter_checkpoint.out b/src/test/fsync/expected/bgwriter_checkpoint.out index 6077c49bb9b5..c154ad105084 100644 --- a/src/test/fsync/expected/bgwriter_checkpoint.out +++ b/src/test/fsync/expected/bgwriter_checkpoint.out @@ -15,8 +15,8 @@ -- the hit times of fsync_counter is undetermined, both 5, 6 or 7 are -- correct, so mark them out to make case stable. -- start_matchsubs --- m/num times hit:\'[5-7]\'/ --- s/num times hit:\'[5-7]\'/num times hit:\'greater_than_two\'/ +-- m/num times hit:\'[4-7]\'/ +-- s/num times hit:\'[4-7]\'/num times hit:\'greater_than_two\'/ -- end_matchsubs begin; create function num_dirty(relid oid) returns bigint as @@ -57,6 +57,7 @@ create table fsync_test2(a int, b int) distributed by (a); insert into fsync_test1 select i, i from generate_series(1,100)i; insert into fsync_test2 select -i, i from generate_series(1,100)i; end; +vacuum pg_proc; -- Reset all faults. -- -- NOTICE: important. diff --git a/src/test/fsync/sql/bgwriter_checkpoint.sql b/src/test/fsync/sql/bgwriter_checkpoint.sql index db2e61a5caee..6415b7580b83 100644 --- a/src/test/fsync/sql/bgwriter_checkpoint.sql +++ b/src/test/fsync/sql/bgwriter_checkpoint.sql @@ -15,8 +15,8 @@ -- the hit times of fsync_counter is undetermined, both 5, 6 or 7 are -- correct, so mark them out to make case stable. -- start_matchsubs --- m/num times hit:\'[5-7]\'/ --- s/num times hit:\'[5-7]\'/num times hit:\'greater_than_two\'/ +-- m/num times hit:\'[4-7]\'/ +-- s/num times hit:\'[4-7]\'/num times hit:\'greater_than_two\'/ -- end_matchsubs begin; create function num_dirty(relid oid) returns bigint as @@ -60,6 +60,7 @@ insert into fsync_test1 select i, i from generate_series(1,100)i; insert into fsync_test2 select -i, i from generate_series(1,100)i; end; +vacuum pg_proc; -- Reset all faults. -- -- NOTICE: important. From 0b99aeaa6aa85b1be51923348df1eafb10ea0158 Mon Sep 17 00:00:00 2001 From: Ashwin Agrawal Date: Fri, 21 Feb 2020 21:00:24 -0800 Subject: [PATCH 035/102] Revert "Avoid function bodies check on QE" This reverts commit 70b35f2efa8b55e74c44f348ad075479e45b6156. Test plpython_returns is failing in CI. Will look into the failure and bring in the change again after fixing the same. --- src/backend/tcop/postgres.c | 7 ------- 1 file changed, 7 deletions(-) diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index b15b7c425026..8f9dba86f99b 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -5293,13 +5293,6 @@ PostgresMain(int argc, char *argv[], ereport(ERROR, (errcode(ERRCODE_PROTOCOL_VIOLATION), errmsg("MPP protocol messages are only supported in QD - QE connections"))); - /* - * QD performs the function body check, hence QE doesn't - * need to do the check again. Turn off the check in QE - * process as an optimization. Also, helps eliminate the - * need for having this GUC in-sync between QD and QE. - */ - check_function_bodies=false; /* Set statement_timestamp() */ SetCurrentStatementStartTimestamp(); From 34a03e5fbb42d94cd4b1ba7da4df6ca543a02995 Mon Sep 17 00:00:00 2001 From: Paul Guo Date: Tue, 25 Feb 2020 11:56:21 +0800 Subject: [PATCH 036/102] Check fts probe request before WaitLatch() in fts loop for timely fts probe response. 
(#9478) fts probe trigger via query gp_request_fts_probe_scan() or internal function FtsNotifyProber() may wait ~60 additionally seconds (i.e. guc gp_fts_probe_interval) because FtsLoop()->WaitLatch() blocks until timeout. The root cause is that it is possible that the latch in below stack is waken up at first. WaitLatch() SyncRepWaitForLSN() RecordTransactionCommit() CommitTransaction() CommitTransactionCommand() updateConfiguration() processResponse() FtsWalRepMessageSegments() FtsLoop() I found this issue when testing test fts_unblock_primary. The test sometimes run for 60 more seconds than usual. Fix this by rechecking the probe request before FtsLoop()->WaitLatch(). Reviewed-by: Ashwin Agrawal Cherry-picked from 545f4466244d3c7840df50d2076986ecbf6d284c --- src/backend/fts/fts.c | 12 ++++++++++++ src/test/walrep/sql/missing_xlog.sql | 2 +- 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/src/backend/fts/fts.c b/src/backend/fts/fts.c index 65a65c4f7123..332995666e40 100644 --- a/src/backend/fts/fts.c +++ b/src/backend/fts/fts.c @@ -404,6 +404,18 @@ void FtsLoop() timeout = elapsed >= gp_fts_probe_interval ? 0 : gp_fts_probe_interval - elapsed; + /* + * In above code we might update gp_segment_configuration and then wal + * is generated. While synchronizing wal to standby, we need to wait on + * MyLatch also in SyncRepWaitForLSN(). The set latch introduced by + * outside fts probe trigger (e.g. gp_request_fts_probe_scan() or + * FtsNotifyProber()) might be consumed by it so we do not WaitLatch() + * here with a long timout here else we may block for that long + * timeout, so we recheck probe_requested here before waitLatch(). + */ + if (probe_requested) + timeout = 0; + rc = WaitLatch(&MyProc->procLatch, WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, timeout * 1000L); diff --git a/src/test/walrep/sql/missing_xlog.sql b/src/test/walrep/sql/missing_xlog.sql index b53152263450..57f35ef754a0 100644 --- a/src/test/walrep/sql/missing_xlog.sql +++ b/src/test/walrep/sql/missing_xlog.sql @@ -155,7 +155,7 @@ select count(*) = 2 as mirror_up from gp_segment_configuration select wait_for_mirror_sync(0::smallint); select role, preferred_role, content, mode, status from gp_segment_configuration; -- start_ignore -\! gpconfig -c gp_fts_mark_mirror_down_grace_period -v 30 +\! gpconfig -r gp_fts_mark_mirror_down_grace_period \! gpconfig -r wal_keep_segments \! gpstop -u -- end_ignore From b387ef35ec78a381eda93a5bf6efaacbeccaf837 Mon Sep 17 00:00:00 2001 From: Hubert Zhang Date: Tue, 25 Feb 2020 16:23:20 +0800 Subject: [PATCH 037/102] Introduce execute on initplan option for function For query like 'create table t as select * from f()', if f() needs to do dispatch, then it must be run on QD. Currently, function could be specified to execute on master, but the above CTAS query will run the function on EntryDB. In fact, QD needs to do the CTAS work and cannot run function at all. To overcome this problem, we introduce a new location option for function: EXECUTE ON INITPLAN and run the f() on initplan before the CTAS work and store function results into tuplestore. Then when the real function running on EntryDB, it skip the function logic, but fetch tuples from the tuplestore instead. New plan is like: Redistribute Motion 1:3 (slice1) Hash Key: f.i -> Function Scan on f InitPlan 1 (returns $0) (slice2) -> Function Scan on f f_1 Note that this commit only has basic support for this feature, Only one function is allowed in CTAS query. 
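A minimal usage sketch of the syntax this commit enables (table and function names are illustrative, not taken from the commit; EXECUTE ON INITPLAN is only accepted for set-returning functions):
```sql
CREATE TABLE src (i int, v text) DISTRIBUTED BY (i);

-- Declare the set-returning function to run in an initplan.
CREATE FUNCTION f() RETURNS SETOF src AS $$
    SELECT * FROM src;
$$ LANGUAGE SQL EXECUTE ON INITPLAN;

-- f() runs once in the initplan and writes its rows to a tuplestore;
-- the entry-db Function Scan in the main plan only reads that tuplestore,
-- so the CTAS itself is driven from the QD.
CREATE TABLE t AS SELECT * FROM f() DISTRIBUTED BY (i);
```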
(cherry picked from commit a21ff23b615e5f9a7dd3b02d82fad86175b188ec) --- src/backend/commands/functioncmds.c | 9 + src/backend/executor/nodeFunctionscan.c | 65 +++++- src/backend/executor/nodeMaterial.c | 2 +- src/backend/executor/nodeShareInputScan.c | 2 +- src/backend/executor/nodeSubplan.c | 45 ++++ src/backend/nodes/copyfuncs.c | 2 + src/backend/nodes/outfuncs.c | 2 + src/backend/nodes/readfast.c | 2 + src/backend/optimizer/plan/createplan.c | 110 ++++++++++ src/backend/optimizer/plan/planagg.c | 3 +- src/backend/optimizer/plan/planmain.c | 2 +- src/backend/optimizer/plan/subselect.c | 7 +- src/backend/optimizer/util/pathnode.c | 15 ++ src/backend/optimizer/util/walkers.c | 2 + src/backend/parser/gram.y | 7 +- src/backend/parser/parse_target.c | 1 + src/backend/storage/file/buffile.c | 11 + src/backend/storage/file/fd.c | 12 ++ src/backend/utils/sort/tuplestorenew.c | 17 +- src/bin/pg_dump/pg_dump.c | 2 + src/include/catalog/pg_proc.h | 1 + src/include/nodes/execnodes.h | 33 +-- src/include/nodes/plannodes.h | 2 + src/include/nodes/primnodes.h | 8 + src/include/optimizer/subselect.h | 5 +- src/include/parser/kwlist.h | 1 + src/include/storage/buffile.h | 1 + src/include/storage/fd.h | 1 + src/include/utils/tuplestorenew.h | 3 +- .../regress/expected/function_extensions.out | 196 ++++++++++++++++++ .../function_extensions_optimizer.out | 196 ++++++++++++++++++ src/test/regress/parallel_schedule | 1 - src/test/regress/sql/function_extensions.sql | 84 ++++++++ 33 files changed, 824 insertions(+), 26 deletions(-) diff --git a/src/backend/commands/functioncmds.c b/src/backend/commands/functioncmds.c index 174637d91186..cb8a8cbc7146 100644 --- a/src/backend/commands/functioncmds.c +++ b/src/backend/commands/functioncmds.c @@ -666,6 +666,8 @@ interpret_exec_location(DefElem *defel) exec_location = PROEXECLOCATION_ANY; else if (strcmp(str, "master") == 0) exec_location = PROEXECLOCATION_MASTER; + else if (strcmp(str, "initplan") == 0) + exec_location = PROEXECLOCATION_INITPLAN; else if (strcmp(str, "all_segments") == 0) exec_location = PROEXECLOCATION_ALL_SEGMENTS; else @@ -695,6 +697,13 @@ validate_sql_exec_location(char exec_location, bool proretset) errmsg("EXECUTE ON MASTER is only supported for set-returning functions"))); break; + case PROEXECLOCATION_INITPLAN: + if (!proretset) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("EXECUTE ON INITPLAN is only supported for set-returning functions"))); + break; + case PROEXECLOCATION_ALL_SEGMENTS: if (!proretset) ereport(ERROR, diff --git a/src/backend/executor/nodeFunctionscan.c b/src/backend/executor/nodeFunctionscan.c index f05fda995d3a..b7ef12854859 100644 --- a/src/backend/executor/nodeFunctionscan.c +++ b/src/backend/executor/nodeFunctionscan.c @@ -24,12 +24,15 @@ */ #include "postgres.h" +#include "catalog/pg_proc.h" #include "catalog/pg_type.h" #include "executor/nodeFunctionscan.h" #include "funcapi.h" #include "nodes/nodeFuncs.h" #include "utils/builtins.h" +#include "utils/lsyscache.h" #include "utils/memutils.h" +#include "utils/tuplestorenew.h" #include "cdb/cdbvars.h" #include "cdb/memquota.h" @@ -83,6 +86,48 @@ FunctionNext_guts(FunctionScanState *node) direction = estate->es_direction; scanslot = node->ss.ss_ScanTupleSlot; + /* + * FunctionNext read tuple from tuplestore instead + * of executing the real function. + * Tuplestore is filled by the FunctionScan's initplan. 
+ */ + if(node->resultInTupleStore && Gp_role != GP_ROLE_DISPATCH) + { + bool gotOK = false; + bool forward = true; + + /* + * setup tuplestore reader for the firstly time + */ + if (!node->ts_state->matstore) + { + + char rwfile_prefix[100]; + function_scan_create_bufname_prefix(rwfile_prefix, sizeof(rwfile_prefix)); + + node->ts_state->matstore = ntuplestore_create_readerwriter(rwfile_prefix, 0, false, false); + /* + * delete file when close tuplestore reader + * tuplestore writer is created in initplan, so it needs to keep + * the file even if initplan ended. + * we should let the reader to delete it when reader's job finished. + */ + ntuplestore_set_is_temp_file(node->ts_state->matstore, true); + + node->ts_pos = (NTupleStoreAccessor *) ntuplestore_create_accessor(node->ts_state->matstore, false); + ntuplestore_acc_seek_bof((NTupleStoreAccessor *) node->ts_pos); + } + + ntuplestore_acc_advance((NTupleStoreAccessor *) node->ts_pos, forward ? 1 : -1); + gotOK = ntuplestore_acc_current_tupleslot((NTupleStoreAccessor *) node->ts_pos, scanslot); + + if(!gotOK) + { + return NULL; + } + return scanslot; + } + if (node->simple) { /* @@ -358,7 +403,9 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags) scanstate->ss.ps.plan = (Plan *) node; scanstate->ss.ps.state = estate; scanstate->eflags = eflags; - + scanstate->resultInTupleStore = node->resultInTupleStore; + scanstate->ts_state = palloc0(sizeof(GenericTupStore)); + scanstate->ts_pos = NULL; /* * are we adding an ordinality column? */ @@ -641,6 +688,15 @@ ExecEndFunctionScan(FunctionScanState *node) ExecEagerFreeFunctionScan(node); EndPlanStateGpmonPkt(&node->ss.ps); + + /* + * destroy tuplestore reader if exists + */ + if (node->ts_state->matstore != NULL) + { + ntuplestore_destroy_accessor((NTupleStoreAccessor *) node->ts_pos); + ntuplestore_destroy(node->ts_state->matstore); + } } /* ---------------------------------------------------------------- @@ -738,3 +794,10 @@ ExecSquelchFunctionScan(FunctionScanState *node) { ExecEagerFreeFunctionScan(node); } + +void +function_scan_create_bufname_prefix(char* p, int size) +{ + snprintf(p, size, "FUNCTION_SCAN_%d", + gp_session_id); +} diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c index 044d1b210fb8..bd5e8c25923f 100644 --- a/src/backend/executor/nodeMaterial.c +++ b/src/backend/executor/nodeMaterial.c @@ -96,7 +96,7 @@ ExecMaterial(MaterialState *node) shareinput_create_bufname_prefix(rwfile_prefix, sizeof(rwfile_prefix), ma->share_id); elog(DEBUG1, "Material node creates shareinput rwfile %s", rwfile_prefix); - ts = ntuplestore_create_readerwriter(rwfile_prefix, PlanStateOperatorMemKB((PlanState *)node) * 1024, true); + ts = ntuplestore_create_readerwriter(rwfile_prefix, PlanStateOperatorMemKB((PlanState *)node) * 1024, true, true); tsa = ntuplestore_create_accessor(ts, true); } else diff --git a/src/backend/executor/nodeShareInputScan.c b/src/backend/executor/nodeShareInputScan.c index 27a394022be3..c85d06aff500 100644 --- a/src/backend/executor/nodeShareInputScan.c +++ b/src/backend/executor/nodeShareInputScan.c @@ -89,7 +89,7 @@ init_tuplestore_state(ShareInputScanState *node) node->ts_state = palloc0(sizeof(GenericTupStore)); - node->ts_state->matstore = ntuplestore_create_readerwriter(rwfile_prefix, 0, false); + node->ts_state->matstore = ntuplestore_create_readerwriter(rwfile_prefix, 0, false, false); node->ts_pos = (void *) ntuplestore_create_accessor(node->ts_state->matstore, false); 
ntuplestore_acc_seek_bof((NTupleStoreAccessor *)node->ts_pos); } diff --git a/src/backend/executor/nodeSubplan.c b/src/backend/executor/nodeSubplan.c index eef1076a6150..83adf3427e69 100644 --- a/src/backend/executor/nodeSubplan.c +++ b/src/backend/executor/nodeSubplan.c @@ -32,6 +32,7 @@ #include "utils/lsyscache.h" #include "utils/memutils.h" #include "access/heapam.h" +#include "utils/tuplestorenew.h" #include "cdb/cdbexplain.h" /* cdbexplain_recvExecStats */ #include "cdb/cdbvars.h" #include "cdb/cdbdisp.h" @@ -716,6 +717,8 @@ ExecInitSubPlan(SubPlan *subplan, PlanState *parent) sstate->tab_eq_funcs = NULL; sstate->lhs_hash_funcs = NULL; sstate->cur_eq_funcs = NULL; + sstate->ts_state = palloc0(sizeof(GenericTupStore)); + sstate->ts_pos = NULL; /* * If this plan is un-correlated or undirect correlated one and want to @@ -1037,6 +1040,25 @@ PG_TRY(); */ oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory); + /* + * Setup the tuplestore writer for functionscan initplan + * + * Note that the file of tuplestore should not be deleted when + * closing file. This is due to the tuplestore reader is outside + * initplan, and reader will delete the file when it finished. + */ + if (subLinkType == INITPLAN_FUNC_SUBLINK && !node->ts_state->matstore) + { + char rwfile_prefix[100]; + + function_scan_create_bufname_prefix(rwfile_prefix, sizeof(rwfile_prefix)); + + node->ts_state->matstore = ntuplestore_create_readerwriter(rwfile_prefix, PlanStateOperatorMemKB((PlanState *)(node->planstate)) * 1024, true, false); + ntuplestore_set_is_temp_file(node->ts_state->matstore, false); + + node->ts_pos = (void *)ntuplestore_create_accessor(node->ts_state->matstore, true); + } + /* * Run the plan. (If it needs to be rescanned, the first ExecProcNode * call will take care of that.) @@ -1047,6 +1069,12 @@ PG_TRY(); { int i = 1; + if (subLinkType == INITPLAN_FUNC_SUBLINK) + { + ntuplestore_acc_put_tupleslot((NTupleStoreAccessor *) node->ts_pos, slot); + continue; + } + if (subLinkType == EXISTS_SUBLINK || subLinkType == NOT_EXISTS_SUBLINK) { /* There can be only one setParam... */ @@ -1112,6 +1140,23 @@ PG_TRY(); } } + /* + * Flush and cleanup the tuplestore writer + * + * Note that the file of tuplestore will not be deleted at here. + * This is due to the tuplestore reader is outside initplan, and + * reader will delete the file when it finished. 
+ * + */ + if (subLinkType == INITPLAN_FUNC_SUBLINK && node->ts_state->matstore) + { + ntuplestore_acc_seek_bof((NTupleStoreAccessor *) node->ts_pos); + ntuplestore_flush(node->ts_state->matstore); + + ntuplestore_destroy_accessor((NTupleStoreAccessor *) node->ts_pos); + ntuplestore_destroy(node->ts_state->matstore); + } + if (!found) { if (subLinkType == EXISTS_SUBLINK || subLinkType == NOT_EXISTS_SUBLINK) diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index 2681e41cd713..b2e1346bcaac 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -755,6 +755,8 @@ _copyFunctionScan(const FunctionScan *from) */ COPY_NODE_FIELD(functions); COPY_SCALAR_FIELD(funcordinality); + COPY_NODE_FIELD(param); + COPY_SCALAR_FIELD(resultInTupleStore); return newnode; } diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index 8e61ae6a6703..4bad87cb2f35 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -768,6 +768,8 @@ _outFunctionScan(StringInfo str, const FunctionScan *node) WRITE_NODE_FIELD(functions); WRITE_BOOL_FIELD(funcordinality); + WRITE_NODE_FIELD(param); + WRITE_BOOL_FIELD(resultInTupleStore); } static void diff --git a/src/backend/nodes/readfast.c b/src/backend/nodes/readfast.c index 01f46b8a3399..0210b0c39d08 100644 --- a/src/backend/nodes/readfast.c +++ b/src/backend/nodes/readfast.c @@ -1822,6 +1822,8 @@ _readFunctionScan(void) READ_NODE_FIELD(functions); READ_BOOL_FIELD(funcordinality); + READ_NODE_FIELD(param); + READ_BOOL_FIELD(resultInTupleStore); READ_DONE(); } diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index 701c3dfa51ae..f98d44765b63 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -25,6 +25,8 @@ #include "access/skey.h" #include "access/sysattr.h" #include "catalog/pg_class.h" +#include "catalog/pg_exttable.h" +#include "catalog/pg_proc.h" #include "foreign/fdwapi.h" #include "miscadmin.h" #include "nodes/makefuncs.h" @@ -48,6 +50,7 @@ #include "parser/parse_clause.h" #include "parser/parsetree.h" #include "parser/parse_oper.h" /* ordering_oper_opid */ +#include "rewrite/rewriteManip.h" #include "utils/guc.h" #include "utils/lsyscache.h" #include "utils/uri.h" @@ -183,6 +186,7 @@ static EquivalenceMember *find_ec_member_for_tle(EquivalenceClass *ec, static Motion *cdbpathtoplan_create_motion_plan(PlannerInfo *root, CdbMotionPath *path, Plan *subplan); +static void append_initplan_for_function_scan(PlannerInfo *root, Path *best_path, Plan *plan); /* * GPDB_92_MERGE_FIXME: The following functions have been removed in PG 9.2 @@ -452,6 +456,7 @@ create_scan_plan(PlannerInfo *root, Path *best_path) best_path, tlist, scan_clauses); + append_initplan_for_function_scan(root, best_path, plan); break; case T_TableFunctionScan: @@ -4668,6 +4673,7 @@ make_functionscan(List *qptlist, node->scan.scanrelid = scanrelid; node->functions = functions; node->funcordinality = funcordinality; + node->resultInTupleStore = false; return node; } @@ -7103,3 +7109,107 @@ cdbpathtoplan_create_motion_plan(PlannerInfo *root, return motion; } /* cdbpathtoplan_create_motion_plan */ + +/* + * append_initplan_for_function_scan + * + * CDB: gpdb specific function to append an initplan node for function scan. + * + * Note that append initplan for function scan node only takes effect when + * the function location is PROEXECLOCATION_INITPLAN and optimizer is off. 
+ *
+ * Functions that include DDL cannot run on QEs.  But for a query like
+ * 'create table t as select * from f();', the QD does the CTAS work while
+ * function f() is run on the entry DB, which is also a QE.  To support this
+ * kind of query in GPDB, we first run the function scan in an initplan and
+ * store its results in a tuplestore; the function scan on the entry DB then
+ * fetches tuples from the tuplestore instead of executing the real function.
+ */
+static void
+append_initplan_for_function_scan(PlannerInfo *root, Path *best_path, Plan *plan)
+{
+	FunctionScan *fsplan = (FunctionScan *)plan;
+	char exec_location;
+	Param *prm;
+	RangeTblFunction *rtfunc;
+	FuncExpr *funcexpr;
+
+	/* Currently we only handle a single function in the function list */
+	if (list_length(fsplan->functions) != 1)
+		return;
+
+	rtfunc = (RangeTblFunction *) linitial(fsplan->functions);
+
+	if (!IsA(rtfunc->funcexpr, FuncExpr))
+		return;
+
+	/* the function must be declared EXECUTE ON INITPLAN */
+	funcexpr = (FuncExpr *) rtfunc->funcexpr;
+	exec_location = func_exec_location(funcexpr->funcid);
+	if (exec_location != PROEXECLOCATION_INITPLAN)
+		return;
+
+	/*
+	 * Create a copy of the FunctionScan plan to use as an initplan.  The
+	 * initplan is responsible for running the real function and storing the
+	 * result in a tuplestore.  The original FunctionScan just reads the
+	 * tuplestore (indicated by resultInTupleStore) and returns the result to
+	 * the upper plan node.
+	 *
+	 * We are going to construct what is effectively a sub-SELECT query, so
+	 * clone the current query level's state and adjust it to make it look
+	 * like a subquery.  Any outer references will now be one level higher
+	 * than before.  (This means that when we are done, there will be no Vars
+	 * of level 1, which is why the subquery can become an initplan.)
+	 */
+	PlannerInfo *subroot;
+	Query *parse;
+	subroot = (PlannerInfo *) palloc(sizeof(PlannerInfo));
+	memcpy(subroot, root, sizeof(PlannerInfo));
+	subroot->query_level++;
+	subroot->parent_root = root;
+	/* reset subplan-related stuff */
+	subroot->plan_params = NIL;
+	subroot->init_plans = NIL;
+	subroot->cte_plan_ids = NIL;
+
+	subroot->parse = parse = (Query *) copyObject(root->parse);
+	IncrementVarSublevelsUp((Node *) parse, 1, 1);
+
+	/* append_rel_list might contain outer Vars? */
+	subroot->append_rel_list = (List *) copyObject(root->append_rel_list);
+	IncrementVarSublevelsUp((Node *) subroot->append_rel_list, 1, 1);
+
+	/* create the initplan for this FunctionScan plan */
+	FunctionScan *initplan = (FunctionScan *) copyObject(plan);
+
+	/*
+	 * The following param of the initplan is a dummy param.  It is not used
+	 * by the main plan: when the function scan runs in the initplan, it
+	 * stores the result rows in the tuplestore rather than in a scalar param.
+	 */
+	prm = SS_make_initplan_from_plan(subroot, (Plan *)initplan, InvalidOid, -1, InvalidOid, true);
+
+	fsplan->param = prm;
+	fsplan->resultInTupleStore = true;
+
+	/*
+	 * Make sure the initplan gets into the outer PlannerInfo, along with any
+	 * other initplans generated by the sub-planning run.  We had to include
+	 * the outer PlannerInfo's pre-existing initplans into the inner one's
+	 * init_plans list earlier, so make sure we don't put back any duplicate
+	 * entries.
+	 */
+	root->init_plans = list_concat_unique_ptr(root->init_plans,
+											  subroot->init_plans);
+
+	/* Decorate the top node of the plan with a Flow node. */
+	initplan->scan.plan.flow = cdbpathtoplan_create_flow(root,
+														 best_path->locus,
+														 best_path->parent ?
best_path->parent->relids + : NULL, + &initplan->scan.plan); +} diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c index c81107639724..b11b63b425f0 100644 --- a/src/backend/optimizer/plan/planagg.c +++ b/src/backend/optimizer/plan/planagg.c @@ -595,7 +595,8 @@ make_agg_subplan(PlannerInfo *root, MinMaxAggInfo *mminfo) SS_make_initplan_from_plan(subroot, plan, exprType((Node *) mminfo->target), -1, - exprCollation((Node *) mminfo->target)); + exprCollation((Node *) mminfo->target), + false); /* * Make sure the initplan gets into the outer PlannerInfo, along with any diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c index a6d9227897bf..c93431670f60 100644 --- a/src/backend/optimizer/plan/planmain.c +++ b/src/backend/optimizer/plan/planmain.c @@ -100,7 +100,7 @@ query_planner(PlannerInfo *root, List *tlist, exec_location = check_execute_on_functions((Node *) parse->targetList); - if (exec_location == PROEXECLOCATION_MASTER) + if (exec_location == PROEXECLOCATION_MASTER || exec_location == PROEXECLOCATION_INITPLAN) CdbPathLocus_MakeEntry(&result_path->locus); else if (exec_location == PROEXECLOCATION_ALL_SEGMENTS) CdbPathLocus_MakeStrewn(&result_path->locus, diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index a95fa4ef4d4e..c441241cfc71 100644 --- a/src/backend/optimizer/plan/subselect.c +++ b/src/backend/optimizer/plan/subselect.c @@ -3011,7 +3011,7 @@ finalize_agg_primnode(Node *node, finalize_primnode_context *context) Param * SS_make_initplan_from_plan(PlannerInfo *root, Plan *plan, Oid resulttype, int32 resulttypmod, - Oid resultcollation) + Oid resultcollation, bool is_initplan_func_sublink) { SubPlan *node; Param *prm; @@ -3042,7 +3042,10 @@ SS_make_initplan_from_plan(PlannerInfo *root, Plan *plan, * comments in ExecReScan). */ node = makeNode(SubPlan); - node->subLinkType = EXPR_SUBLINK; + if (is_initplan_func_sublink) + node->subLinkType = INITPLAN_FUNC_SUBLINK; + else + node->subLinkType = EXPR_SUBLINK; get_first_col_type(plan, &node->firstColType, &node->firstColTypmod, &node->firstColCollation); node->qDispSliceId = 0; /*CDB*/ diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index 844b39e0b7b2..62edfa23c79f 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/backend/optimizer/util/pathnode.c @@ -2661,6 +2661,18 @@ create_functionscan_path(PlannerInfo *root, RelOptInfo *rel, } exec_location = PROEXECLOCATION_MASTER; break; + case PROEXECLOCATION_INITPLAN: + /* + * This function forces the execution to master. + */ + if (exec_location == PROEXECLOCATION_ALL_SEGMENTS) + { + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + (errmsg("cannot mix EXECUTE ON INITPLAN and ALL SEGMENTS functions in same function scan")))); + } + exec_location = PROEXECLOCATION_INITPLAN; + break; case PROEXECLOCATION_ALL_SEGMENTS: /* * This function forces the execution to segments. 
@@ -2704,6 +2716,9 @@ create_functionscan_path(PlannerInfo *root, RelOptInfo *rel, case PROEXECLOCATION_MASTER: CdbPathLocus_MakeEntry(&pathnode->locus); break; + case PROEXECLOCATION_INITPLAN: + CdbPathLocus_MakeEntry(&pathnode->locus); + break; case PROEXECLOCATION_ALL_SEGMENTS: CdbPathLocus_MakeStrewn(&pathnode->locus, getgpsegmentCount()); diff --git a/src/backend/optimizer/util/walkers.c b/src/backend/optimizer/util/walkers.c index 75dcd48a67df..6110e9cdfe56 100644 --- a/src/backend/optimizer/util/walkers.c +++ b/src/backend/optimizer/util/walkers.c @@ -271,6 +271,8 @@ plan_tree_walker(Node *node, case T_FunctionScan: if (walker((Node *) ((FunctionScan *) node)->functions, context)) return true; + if (walker((Node *) ((FunctionScan *) node)->param, context)) + return true; if (walk_scan_node_fields((Scan *) node, walker, context)) return true; break; diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 2b768fbb9301..9d28716b769b 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -720,7 +720,7 @@ static Node *makeIsNotDistinctFromNode(Node *expr, int position); HASH HOST - IGNORE_P INCLUSIVE + IGNORE_P INCLUSIVE INITPLAN LIST LOG_P @@ -9098,6 +9098,10 @@ common_func_opt_item: { $$ = makeDefElem("exec_location", (Node *)makeString("master")); } + | EXECUTE ON INITPLAN + { + $$ = makeDefElem("exec_location", (Node *)makeString("initplan")); + } | EXECUTE ON ALL SEGMENTS { $$ = makeDefElem("exec_location", (Node *)makeString("all_segments")); @@ -15671,6 +15675,7 @@ unreserved_keyword: | INDEXES | INHERIT | INHERITS + | INITPLAN | INLINE_P | INPUT_P | INSENSITIVE diff --git a/src/backend/parser/parse_target.c b/src/backend/parser/parse_target.c index 0d1f7a32debe..ac3b24741780 100644 --- a/src/backend/parser/parse_target.c +++ b/src/backend/parser/parse_target.c @@ -1716,6 +1716,7 @@ FigureColnameInternal(Node *node, char **name) case ANY_SUBLINK: case ROWCOMPARE_SUBLINK: case CTE_SUBLINK: + case INITPLAN_FUNC_SUBLINK: case NOT_EXISTS_SUBLINK: break; } diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c index 391211a73f42..eb223eb0b8c6 100644 --- a/src/backend/storage/file/buffile.c +++ b/src/backend/storage/file/buffile.c @@ -320,6 +320,17 @@ BufFileClose(BufFile *file) pfree(file); } +/* + * BufFileSetIsTempFile + * + * Set the file of BufFile is temp file or not + */ +void +BufFileSetIsTempFile(BufFile *file, bool isTempFile) +{ + FileSetIsTempFile(file->file, isTempFile); +} + /* * BufFileLoadBuffer * diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c index b52f15d62399..3b63a84baf39 100644 --- a/src/backend/storage/file/fd.c +++ b/src/backend/storage/file/fd.c @@ -3366,3 +3366,15 @@ data_sync_elevel(int elevel) { return data_sync_retry ? elevel : PANIC; } + +/* + * Set file is temp file or not + */ +void +FileSetIsTempFile(File file, bool isTempFile) +{ + if (isTempFile) + VfdCache[file].fdstate |= FD_TEMPORARY; + else + VfdCache[file].fdstate &= ~FD_TEMPORARY; +} diff --git a/src/backend/utils/sort/tuplestorenew.c b/src/backend/utils/sort/tuplestorenew.c index 20cfadc1a75e..c18ad08d61e2 100644 --- a/src/backend/utils/sort/tuplestorenew.c +++ b/src/backend/utils/sort/tuplestorenew.c @@ -746,9 +746,10 @@ ntuplestore_create_common(int64 maxBytes, char *operation_name) * * filename must be a unique name that identifies the share. 
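+ * (For the EXECUTE ON INITPLAN case, the initplan writer in nodeSubplan.c
+ *  builds this name with function_scan_create_bufname_prefix(); the reading
+ *  FunctionScan is expected to derive the same name so that both sides
+ *  attach to the same shared file.)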
* filename does not include the pgsql_tmp/ prefix + * useWorkFile specify whether to use workfile for tuplestore */ NTupleStore * -ntuplestore_create_readerwriter(const char *filename, int64 maxBytes, bool isWriter) +ntuplestore_create_readerwriter(const char *filename, int64 maxBytes, bool isWriter, bool useWorkFile) { NTupleStore* store = NULL; char filenamelob[MAXPGPATH]; @@ -760,7 +761,9 @@ ntuplestore_create_readerwriter(const char *filename, int64 maxBytes, bool isWri store = ntuplestore_create_common(maxBytes, "SharedTupleStore"); store->rwflag = NTS_IS_WRITER; store->lobbytes = 0; - store->work_set = workfile_mgr_create_set(store->operation_name, filename); + store->work_set = NULL; + if (useWorkFile) + store->work_set = workfile_mgr_create_set(store->operation_name, filename); store->pfile = BufFileCreateNamedTemp(filename, false /* interXact */, store->work_set); @@ -1396,4 +1399,14 @@ ntuplestore_create_spill_files(NTupleStore *nts) nts->instrument->workfileCreated = true; } +/* + * Specify the BufFiles used by tuplestore are temp files or not + */ +void +ntuplestore_set_is_temp_file(NTupleStore *ts, bool isTempFile) +{ + BufFileSetIsTempFile(ts->pfile, isTempFile); + BufFileSetIsTempFile(ts->plobfile, isTempFile); +} + /* EOF */ diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 771aca5c17ba..9cb0722f26de 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -10593,6 +10593,8 @@ dumpFunc(Archive *fout, FuncInfo *finfo) appendPQExpBuffer(q, " EXECUTE ON MASTER"); else if (proexeclocation[0] == PROEXECLOCATION_ALL_SEGMENTS) appendPQExpBuffer(q, " EXECUTE ON ALL SEGMENTS"); + else if (proexeclocation[0] == PROEXECLOCATION_INITPLAN) + appendPQExpBuffer(q, " EXECUTE ON INITPLAN"); else { write_msg(NULL, "unrecognized proexeclocation value: %c\n", proexeclocation[0]); diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h index 8f8ea98a1cd7..49a977958b49 100644 --- a/src/include/catalog/pg_proc.h +++ b/src/include/catalog/pg_proc.h @@ -5249,6 +5249,7 @@ DESCR("import collations from operating system"); #define PROEXECLOCATION_ANY 'a' #define PROEXECLOCATION_MASTER 'm' +#define PROEXECLOCATION_INITPLAN 'i' #define PROEXECLOCATION_ALL_SEGMENTS 's' #endif /* PG_PROC_H */ diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h index a245ab0b61ea..f17d9c88354f 100644 --- a/src/include/nodes/execnodes.h +++ b/src/include/nodes/execnodes.h @@ -845,6 +845,19 @@ typedef struct GenericExprState ExprState *arg; /* state of my child node */ } GenericExprState; +/* ---------------- + * Generic tuplestore structure + * used to communicate between ShareInputScan nodes, + * Materialize and Sort + * + * ---------------- + */ +typedef union GenericTupStore +{ + struct NTupleStore *matstore; /* Used by Materialize */ + void *sortstore; /* Used by Sort */ +} GenericTupStore; + /* ---------------- * WholeRowVarExprState node * ---------------- @@ -1127,6 +1140,8 @@ typedef struct SubPlanState FmgrInfo *tab_eq_funcs; /* equality functions for table datatype(s) */ FmgrInfo *lhs_hash_funcs; /* hash functions for lefthand datatype(s) */ FmgrInfo *cur_eq_funcs; /* equality functions for LHS vs. table */ + void *ts_pos; + GenericTupStore *ts_state; } SubPlanState; /* ---------------- @@ -2051,8 +2066,14 @@ typedef struct FunctionScanState bool delayEagerFree; /* is is safe to free memory used by this node, * when this node has outputted its last row? 
*/ + + /* tuplestore info when function scan run as initplan */ + bool resultInTupleStore; /* function result stored in tuplestore */ + void *ts_pos; /* accessor to the tuplestore */ + GenericTupStore *ts_state; /* tuple store state */ } FunctionScanState; +extern void function_scan_create_bufname_prefix(char *p, int size); /* ---------------- * TableFunctionState information @@ -2393,18 +2414,6 @@ typedef struct HashJoinState * ---------------------------------------------------------------- */ -/* ---------------- - * Generic tuplestore structure - * used to communicate between ShareInputScan nodes, - * Materialize and Sort - * - * ---------------- - */ -typedef union GenericTupStore -{ - struct NTupleStore *matstore; /* Used by Materialize */ - void *sortstore; /* Used by Sort */ -} GenericTupStore; /* ---------------- * MaterialState information diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h index d50ebd828b09..c45e12534346 100644 --- a/src/include/nodes/plannodes.h +++ b/src/include/nodes/plannodes.h @@ -745,6 +745,8 @@ typedef struct FunctionScan Scan scan; List *functions; /* list of RangeTblFunction nodes */ bool funcordinality; /* WITH ORDINALITY */ + Param *param; /* used when funtionscan run as initplan */ + bool resultInTupleStore; /* function result stored in tuplestore */ } FunctionScan; /* ---------------- diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h index 1a608ca85920..caed77f1c29d 100644 --- a/src/include/nodes/primnodes.h +++ b/src/include/nodes/primnodes.h @@ -612,6 +612,13 @@ typedef struct TableValueExpr * EXPR_SUBLINK (SELECT with single targetlist item ...) * ARRAY_SUBLINK ARRAY(SELECT with single targetlist item ...) * CTE_SUBLINK WITH query (never actually part of an expression) + * INITPLAN_FUNC_SUBLINK for function run as initplan. + * For query like (create table t as select * from f()), QD is used to run + * CTAS, hence f() could only be run on entryDB(or QEs). But entryDB could not + * do the dispatch work. So if f() contains DDLs, the above query would fail. + * We introduce this new INITPLAN_FUNC_SUBLINK to make f() run as initplan + * and store intermidiate result into tuplestore. CTAS will fetch tuples from + * this tuplestore. * For ALL, ANY, and ROWCOMPARE, the lefthand is a list of expressions of the * same length as the subselect's targetlist. ROWCOMPARE will *always* have * a list with more than one entry; if the subselect has just one target @@ -651,6 +658,7 @@ typedef enum SubLinkType EXPR_SUBLINK, ARRAY_SUBLINK, CTE_SUBLINK, /* for SubPlans only */ + INITPLAN_FUNC_SUBLINK, /* for function run as initplan */ NOT_EXISTS_SUBLINK /* GPORCA uses NOT_EXIST_SUBLINK to implement correlated left anti semijoin. 
*/ } SubLinkType; diff --git a/src/include/optimizer/subselect.h b/src/include/optimizer/subselect.h index 8e7cc28cb3f1..6f9547608787 100644 --- a/src/include/optimizer/subselect.h +++ b/src/include/optimizer/subselect.h @@ -34,9 +34,10 @@ extern Node *convert_EXISTS_sublink_to_join(PlannerInfo *root, extern Node *SS_replace_correlation_vars(PlannerInfo *root, Node *expr); extern Node *SS_process_sublinks(PlannerInfo *root, Node *expr, bool isQual); extern void SS_finalize_plan(PlannerInfo *root, Plan *plan, - bool attach_initplans); + bool attach_initplans); extern Param *SS_make_initplan_from_plan(PlannerInfo *root, Plan *plan, - Oid resulttype, int32 resulttypmod, Oid resultcollation); + Oid resulttype, int32 resulttypmod, Oid resultcollation, + bool attach_initplans); extern Param *assign_nestloop_param_var(PlannerInfo *root, Var *var); extern Param *assign_nestloop_param_placeholdervar(PlannerInfo *root, PlaceHolderVar *phv); diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h index f6bad5a1baa1..c80576463e9d 100644 --- a/src/include/parser/kwlist.h +++ b/src/include/parser/kwlist.h @@ -217,6 +217,7 @@ PG_KEYWORD("indexes", INDEXES, UNRESERVED_KEYWORD) PG_KEYWORD("inherit", INHERIT, UNRESERVED_KEYWORD) PG_KEYWORD("inherits", INHERITS, UNRESERVED_KEYWORD) PG_KEYWORD("initially", INITIALLY, RESERVED_KEYWORD) +PG_KEYWORD("initplan", INITPLAN, UNRESERVED_KEYWORD) /* GPDB */ PG_KEYWORD("inline", INLINE_P, UNRESERVED_KEYWORD) PG_KEYWORD("inner", INNER_P, TYPE_FUNC_NAME_KEYWORD) PG_KEYWORD("inout", INOUT, COL_NAME_KEYWORD) diff --git a/src/include/storage/buffile.h b/src/include/storage/buffile.h index 1ded6c26f295..2f4528be8a6e 100644 --- a/src/include/storage/buffile.h +++ b/src/include/storage/buffile.h @@ -62,5 +62,6 @@ extern void BufFileResume(BufFile *buffile); extern bool gp_workfile_compression; extern void BufFilePledgeSequential(BufFile *buffile); +extern void BufFileSetIsTempFile(BufFile *file, bool isTempFile); #endif /* BUFFILE_H */ diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h index cabfae7646d0..4c87be6d2a74 100644 --- a/src/include/storage/fd.h +++ b/src/include/storage/fd.h @@ -142,5 +142,6 @@ extern char *GetTempFilePath(const char *filename, bool createdir); extern const char *FileGetFilename(File file); extern void FileSetIsWorkfile(File file); +extern void FileSetIsTempFile(File file, bool isTempFile); #endif /* FD_H */ diff --git a/src/include/utils/tuplestorenew.h b/src/include/utils/tuplestorenew.h index 3e8862bf39de..316530ba1e18 100644 --- a/src/include/utils/tuplestorenew.h +++ b/src/include/utils/tuplestorenew.h @@ -26,7 +26,7 @@ void ntuplestore_setinstrument(NTupleStore* ts, struct Instrumentation *ins); /* Tuple store method */ extern NTupleStore *ntuplestore_create(int64 maxBytes, char *operation_name); -extern NTupleStore *ntuplestore_create_readerwriter(const char* filename, int64 maxBytes, bool isWriter); +extern NTupleStore *ntuplestore_create_readerwriter(const char* filename, int64 maxBytes, bool isWriter, bool useWorkFile); extern bool ntuplestore_is_readerwriter_reader(NTupleStore* nts); extern void ntuplestore_flush(NTupleStore *ts); extern void ntuplestore_destroy(NTupleStore *ts); @@ -66,5 +66,6 @@ extern bool ntuplestore_acc_seek_first(NTupleStoreAccessor *tsa); extern bool ntuplestore_acc_seek_last(NTupleStoreAccessor *tsa); extern void ntuplestore_acc_seek_bof(NTupleStoreAccessor *tsa); extern void ntuplestore_acc_seek_eof(NTupleStoreAccessor *tsa); +extern void ntuplestore_set_is_temp_file(NTupleStore *ts, 
bool isTempFile); #endif /* TUPSTORE_NEW_H */ diff --git a/src/test/regress/expected/function_extensions.out b/src/test/regress/expected/function_extensions.out index 38efcbbd8a8e..1ac540953426 100644 --- a/src/test/regress/expected/function_extensions.out +++ b/src/test/regress/expected/function_extensions.out @@ -402,3 +402,199 @@ NOTICE: unique_violation (1 row) +-- Test CTAS select * from f() +-- Above query will fail in past in f() contains DDLs. +-- Since CTAS is write gang and f() could only be run at EntryDB(QE) +-- But EntryDB and QEs cannot run DDLs which needs to do dispatch. +-- We introduce new function location 'EXECUTE ON INITPLAN' to run +-- the function on initplan to overcome the above issue. +CREATE OR REPLACE FUNCTION get_country() + RETURNS TABLE ( + country_id integer, + country character varying(50) + ) +AS $$ + begin + drop table if exists public.country; + create table public.country( country_id integer, + country character varying(50)); + insert into public.country + (country_id, country) + select 111,'INDIA' + union all select 222,'CANADA' + union all select 333,'USA' ; + RETURN QUERY + SELECT + c.country_id, + c.country + FROM + public.country c order by country_id; + end; $$ +LANGUAGE 'plpgsql' EXECUTE ON INITPLAN; +SELECT * FROM get_country(); +NOTICE: table "country" does not exist, skipping +CONTEXT: SQL statement "drop table if exists public.country" +PL/pgSQL function get_country() line 3 at SQL statement +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +CONTEXT: SQL statement "create table public.country( country_id integer, + country character varying(50))" +PL/pgSQL function get_country() line 4 at SQL statement + country_id | country +------------+--------- + 111 | INDIA + 222 | CANADA + 333 | USA +(3 rows) + +SELECT get_country(); +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +CONTEXT: SQL statement "create table public.country( country_id integer, + country character varying(50))" +PL/pgSQL function get_country() line 4 at SQL statement + get_country +-------------- + (111,INDIA) + (222,CANADA) + (333,USA) +(3 rows) + +DROP TABLE IF EXISTS t1_function_scan; +NOTICE: table "t1_function_scan" does not exist, skipping +EXPLAIN CREATE TABLE t1_function_scan AS SELECT * FROM get_country(); +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. 
+ QUERY PLAN +------------------------------------------------------------------------------------------------- + Redistribute Motion 1:3 (slice1) (cost=0.25..30.25 rows=1000 width=36) + Hash Key: get_country.country_id + -> Function Scan on get_country (cost=0.25..10.25 rows=1000 width=36) + InitPlan 1 (returns $0) (slice2) + -> Function Scan on get_country get_country_1 (cost=0.25..10.25 rows=1000 width=36) + Optimizer: Postgres query optimizer +(6 rows) + +CREATE TABLE t1_function_scan AS SELECT * FROM get_country(); +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +NOTICE: table "country" does not exist, skipping +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +CONTEXT: SQL statement "create table public.country( country_id integer, + country character varying(50))" +PL/pgSQL function get_country() line 4 at SQL statement +INSERT INTO t1_function_scan SELECT * FROM get_country(); +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +CONTEXT: SQL statement "create table public.country( country_id integer, + country character varying(50))" +PL/pgSQL function get_country() line 4 at SQL statement +INSERT INTO t1_function_scan SELECT * FROM get_country(); +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +CONTEXT: SQL statement "create table public.country( country_id integer, + country character varying(50))" +PL/pgSQL function get_country() line 4 at SQL statement +SELECT count(*) FROM t1_function_scan; + count +------- + 9 +(1 row) + +-- test with limit clause +DROP TABLE IF EXISTS t1_function_scan_limit; +NOTICE: table "t1_function_scan_limit" does not exist, skipping +CREATE TABLE t1_function_scan_limit AS SELECT * FROM get_country() limit 2; +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. 
+CONTEXT: SQL statement "create table public.country( country_id integer, + country character varying(50))" +PL/pgSQL function get_country() line 4 at SQL statement +SELECT count(*) FROM t1_function_scan_limit; + count +------- + 2 +(1 row) + +-- test with order by clause +DROP TABLE IF EXISTS t1_function_scan_order_by; +NOTICE: table "t1_function_scan_order_by" does not exist, skipping +CREATE TABLE t1_function_scan_order_by AS SELECT * FROM get_country() f1 ORDER BY f1.country_id DESC limit 1; +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +CONTEXT: SQL statement "create table public.country( country_id integer, + country character varying(50))" +PL/pgSQL function get_country() line 4 at SQL statement +SELECT * FROM t1_function_scan_order_by; + country_id | country +------------+--------- + 333 | USA +(1 row) + +-- test with group by clause +DROP TABLE IF EXISTS t1_function_scan_group_by; +NOTICE: table "t1_function_scan_group_by" does not exist, skipping +CREATE TABLE t1_function_scan_group_by AS SELECT f1.country_id, count(*) FROM get_country() f1 GROUP BY f1.country_id; +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +CONTEXT: SQL statement "create table public.country( country_id integer, + country character varying(50))" +PL/pgSQL function get_country() line 4 at SQL statement +SELECT count(*) FROM t1_function_scan_group_by; + count +------- + 3 +(1 row) + +-- test join table +DROP TABLE IF EXISTS t1_function_scan_join; +NOTICE: table "t1_function_scan_join" does not exist, skipping +CREATE TABLE t1_function_scan_join AS SELECT f1.country_id, f1.country FROM get_country() f1, t1_function_scan_limit; +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. 
+CONTEXT: SQL statement "create table public.country( country_id integer, + country character varying(50))" +PL/pgSQL function get_country() line 4 at SQL statement +SELECT count(*) FROM t1_function_scan_join; + count +------- + 6 +(1 row) + +DROP TABLE IF EXISTS t2_function_scan; +NOTICE: table "t2_function_scan" does not exist, skipping +CREATE TABLE t2_function_scan (id int, val int); +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +INSERT INTO t2_function_scan SELECT k, k+1 FROM generate_series(1,100000) AS k; +CREATE OR REPLACE FUNCTION get_id() + RETURNS TABLE ( + id integer, + val integer + ) +AS $$ + begin + RETURN QUERY + SELECT * FROM t2_function_scan; + END; $$ +LANGUAGE 'plpgsql' EXECUTE ON INITPLAN; +DROP TABLE IF EXISTS t3_function_scan; +NOTICE: table "t3_function_scan" does not exist, skipping +CREATE TABLE t3_function_scan AS SELECT * FROM get_id(); +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +SELECT count(*) FROM t3_function_scan; + count +-------- + 100000 +(1 row) + diff --git a/src/test/regress/expected/function_extensions_optimizer.out b/src/test/regress/expected/function_extensions_optimizer.out index b944481324cd..000280982d27 100644 --- a/src/test/regress/expected/function_extensions_optimizer.out +++ b/src/test/regress/expected/function_extensions_optimizer.out @@ -406,3 +406,199 @@ NOTICE: unique_violation (1 row) +-- Test CTAS select * from f() +-- Above query will fail in past in f() contains DDLs. +-- Since CTAS is write gang and f() could only be run at EntryDB(QE) +-- But EntryDB and QEs cannot run DDLs which needs to do dispatch. +-- We introduce new function location 'EXECUTE ON INITPLAN' to run +-- the function on initplan to overcome the above issue. +CREATE OR REPLACE FUNCTION get_country() + RETURNS TABLE ( + country_id integer, + country character varying(50) + ) +AS $$ + begin + drop table if exists public.country; + create table public.country( country_id integer, + country character varying(50)); + insert into public.country + (country_id, country) + select 111,'INDIA' + union all select 222,'CANADA' + union all select 333,'USA' ; + RETURN QUERY + SELECT + c.country_id, + c.country + FROM + public.country c order by country_id; + end; $$ +LANGUAGE 'plpgsql' EXECUTE ON INITPLAN; +SELECT * FROM get_country(); +NOTICE: table "country" does not exist, skipping +CONTEXT: SQL statement "drop table if exists public.country" +PL/pgSQL function get_country() line 3 at SQL statement +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. 
+CONTEXT: SQL statement "create table public.country( country_id integer, + country character varying(50))" +PL/pgSQL function get_country() line 4 at SQL statement + country_id | country +------------+--------- + 111 | INDIA + 222 | CANADA + 333 | USA +(3 rows) + +SELECT get_country(); +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +CONTEXT: SQL statement "create table public.country( country_id integer, + country character varying(50))" +PL/pgSQL function get_country() line 4 at SQL statement + get_country +-------------- + (111,INDIA) + (222,CANADA) + (333,USA) +(3 rows) + +DROP TABLE IF EXISTS t1_function_scan; +NOTICE: table "t1_function_scan" does not exist, skipping +EXPLAIN CREATE TABLE t1_function_scan AS SELECT * FROM get_country(); +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. + QUERY PLAN +------------------------------------------------------------------------------------------------- + Redistribute Motion 1:3 (slice1) (cost=0.25..30.25 rows=1000 width=36) + Hash Key: get_country.country_id + -> Function Scan on get_country (cost=0.25..10.25 rows=1000 width=36) + InitPlan 1 (returns $0) (slice2) + -> Function Scan on get_country get_country_1 (cost=0.25..10.25 rows=1000 width=36) + Optimizer: Postgres query optimizer +(6 rows) + +CREATE TABLE t1_function_scan AS SELECT * FROM get_country(); +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +NOTICE: table "country" does not exist, skipping +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +CONTEXT: SQL statement "create table public.country( country_id integer, + country character varying(50))" +PL/pgSQL function get_country() line 4 at SQL statement +INSERT INTO t1_function_scan SELECT * FROM get_country(); +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +CONTEXT: SQL statement "create table public.country( country_id integer, + country character varying(50))" +PL/pgSQL function get_country() line 4 at SQL statement +INSERT INTO t1_function_scan SELECT * FROM get_country(); +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. 
Make sure column(s) chosen are the optimal data distribution key to minimize skew. +CONTEXT: SQL statement "create table public.country( country_id integer, + country character varying(50))" +PL/pgSQL function get_country() line 4 at SQL statement +SELECT count(*) FROM t1_function_scan; + count +------- + 9 +(1 row) + +-- test with limit clause +DROP TABLE IF EXISTS t1_function_scan_limit; +NOTICE: table "t1_function_scan_limit" does not exist, skipping +CREATE TABLE t1_function_scan_limit AS SELECT * FROM get_country() limit 2; +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +CONTEXT: SQL statement "create table public.country( country_id integer, + country character varying(50))" +PL/pgSQL function get_country() line 4 at SQL statement +SELECT count(*) FROM t1_function_scan_limit; + count +------- + 2 +(1 row) + +-- test with order by clause +DROP TABLE IF EXISTS t1_function_scan_order_by; +NOTICE: table "t1_function_scan_order_by" does not exist, skipping +CREATE TABLE t1_function_scan_order_by AS SELECT * FROM get_country() f1 ORDER BY f1.country_id DESC limit 1; +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +CONTEXT: SQL statement "create table public.country( country_id integer, + country character varying(50))" +PL/pgSQL function get_country() line 4 at SQL statement +SELECT * FROM t1_function_scan_order_by; + country_id | country +------------+--------- + 333 | USA +(1 row) + +-- test with group by clause +DROP TABLE IF EXISTS t1_function_scan_group_by; +NOTICE: table "t1_function_scan_group_by" does not exist, skipping +CREATE TABLE t1_function_scan_group_by AS SELECT f1.country_id, count(*) FROM get_country() f1 GROUP BY f1.country_id; +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. 
+CONTEXT: SQL statement "create table public.country( country_id integer, + country character varying(50))" +PL/pgSQL function get_country() line 4 at SQL statement +SELECT count(*) FROM t1_function_scan_group_by; + count +------- + 3 +(1 row) + +-- test join table +DROP TABLE IF EXISTS t1_function_scan_join; +NOTICE: table "t1_function_scan_join" does not exist, skipping +CREATE TABLE t1_function_scan_join AS SELECT f1.country_id, f1.country FROM get_country() f1, t1_function_scan_limit; +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'country_id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +CONTEXT: SQL statement "create table public.country( country_id integer, + country character varying(50))" +PL/pgSQL function get_country() line 4 at SQL statement +SELECT count(*) FROM t1_function_scan_join; + count +------- + 6 +(1 row) + +DROP TABLE IF EXISTS t2_function_scan; +NOTICE: table "t2_function_scan" does not exist, skipping +CREATE TABLE t2_function_scan (id int, val int); +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +INSERT INTO t2_function_scan SELECT k, k+1 FROM generate_series(1,100000) AS k; +CREATE OR REPLACE FUNCTION get_id() + RETURNS TABLE ( + id integer, + val integer + ) +AS $$ + begin + RETURN QUERY + SELECT * FROM t2_function_scan; + END; $$ +LANGUAGE 'plpgsql' EXECUTE ON INITPLAN; +DROP TABLE IF EXISTS t3_function_scan; +NOTICE: table "t3_function_scan" does not exist, skipping +CREATE TABLE t3_function_scan AS SELECT * FROM get_id(); +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'id' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +SELECT count(*) FROM t3_function_scan; + count +-------- + 100000 +(1 row) + diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule index 3bb08c3baab7..32b4aac4866d 100755 --- a/src/test/regress/parallel_schedule +++ b/src/test/regress/parallel_schedule @@ -9,7 +9,6 @@ # we'd prefer not to have checkpoints later in the tests because that # interferes with crash-recovery testing. test: tablespace - # ---------- # The first group of parallel tests # ---------- diff --git a/src/test/regress/sql/function_extensions.sql b/src/test/regress/sql/function_extensions.sql index 36e2063a8075..99dace9d7d44 100644 --- a/src/test/regress/sql/function_extensions.sql +++ b/src/test/regress/sql/function_extensions.sql @@ -232,3 +232,87 @@ BEGIN END; $$ LANGUAGE plpgsql volatile; SELECT trigger_unique(); + +-- Test CTAS select * from f() +-- Above query will fail in past in f() contains DDLs. 
+-- Since CTAS is write gang and f() could only be run at EntryDB(QE) +-- But EntryDB and QEs cannot run DDLs which needs to do dispatch. +-- We introduce new function location 'EXECUTE ON INITPLAN' to run +-- the function on initplan to overcome the above issue. + +CREATE OR REPLACE FUNCTION get_country() + RETURNS TABLE ( + country_id integer, + country character varying(50) + ) + +AS $$ + begin + drop table if exists public.country; + create table public.country( country_id integer, + country character varying(50)); + insert into public.country + (country_id, country) + select 111,'INDIA' + union all select 222,'CANADA' + union all select 333,'USA' ; + RETURN QUERY + SELECT + c.country_id, + c.country + FROM + public.country c order by country_id; + end; $$ +LANGUAGE 'plpgsql' EXECUTE ON INITPLAN; + +SELECT * FROM get_country(); +SELECT get_country(); + +DROP TABLE IF EXISTS t1_function_scan; +EXPLAIN CREATE TABLE t1_function_scan AS SELECT * FROM get_country(); +CREATE TABLE t1_function_scan AS SELECT * FROM get_country(); +INSERT INTO t1_function_scan SELECT * FROM get_country(); +INSERT INTO t1_function_scan SELECT * FROM get_country(); +SELECT count(*) FROM t1_function_scan; + + +-- test with limit clause +DROP TABLE IF EXISTS t1_function_scan_limit; +CREATE TABLE t1_function_scan_limit AS SELECT * FROM get_country() limit 2; +SELECT count(*) FROM t1_function_scan_limit; + +-- test with order by clause +DROP TABLE IF EXISTS t1_function_scan_order_by; +CREATE TABLE t1_function_scan_order_by AS SELECT * FROM get_country() f1 ORDER BY f1.country_id DESC limit 1; +SELECT * FROM t1_function_scan_order_by; + +-- test with group by clause +DROP TABLE IF EXISTS t1_function_scan_group_by; +CREATE TABLE t1_function_scan_group_by AS SELECT f1.country_id, count(*) FROM get_country() f1 GROUP BY f1.country_id; +SELECT count(*) FROM t1_function_scan_group_by; + +-- test join table +DROP TABLE IF EXISTS t1_function_scan_join; +CREATE TABLE t1_function_scan_join AS SELECT f1.country_id, f1.country FROM get_country() f1, t1_function_scan_limit; +SELECT count(*) FROM t1_function_scan_join; + +DROP TABLE IF EXISTS t2_function_scan; +CREATE TABLE t2_function_scan (id int, val int); +INSERT INTO t2_function_scan SELECT k, k+1 FROM generate_series(1,100000) AS k; + +CREATE OR REPLACE FUNCTION get_id() + RETURNS TABLE ( + id integer, + val integer + ) +AS $$ + begin + RETURN QUERY + SELECT * FROM t2_function_scan; + END; $$ +LANGUAGE 'plpgsql' EXECUTE ON INITPLAN; + +DROP TABLE IF EXISTS t3_function_scan; +CREATE TABLE t3_function_scan AS SELECT * FROM get_id(); +SELECT count(*) FROM t3_function_scan; + From 875b88345ff1c5765140f0b9424324df1b48cdfd Mon Sep 17 00:00:00 2001 From: Lisa Owen Date: Tue, 25 Feb 2020 10:33:58 -0800 Subject: [PATCH 038/102] docs - make pxf overview page more friendly (#9610) * docs - make pxf overview page more friendly * address comments from david * include db2 and msoft sql server in list of sql dbs supported --- gpdb-doc/markdown/pxf/graphics/datatemp.png | Bin 0 -> 73753 bytes gpdb-doc/markdown/pxf/jdbc_cfg.html.md.erb | 2 +- gpdb-doc/markdown/pxf/jdbc_pxf.html.md.erb | 2 +- .../markdown/pxf/overview_pxf.html.md.erb | 57 ++++++++++++------ 4 files changed, 40 insertions(+), 21 deletions(-) create mode 100644 gpdb-doc/markdown/pxf/graphics/datatemp.png diff --git a/gpdb-doc/markdown/pxf/graphics/datatemp.png b/gpdb-doc/markdown/pxf/graphics/datatemp.png new file mode 100644 index 0000000000000000000000000000000000000000..6b2ba474671e4e2e47861a4a467cbd126cd3abde GIT binary 
patch literal 73753 zcmd43V_;>=7C+cYC+VPLCmnZe+qP}ncE`3mwr$(CofF&6r0>1&&3o@ZUuHhd$2wLsvf(y`T z`Zon6Wz@lsTQYn}PyrzKZeLA(>gorT*d+wU0qh9E@oUo@S`78MCCRxZci#R;&1#+g zjf9fh0M{DhFY+Hs|4m059?uKL zE6iqp%eXS_!=P1>e7V8~!_P1^MKUx7{i|_lOj`5f!b_XR$sBMgqyKxWe=UI92<{&^ zs5j@&naoi`sm5HAtUh0&8D;A>JUH1)Yqo+imNK1~5O!{~zU= zBkY0lhmez-qhn-7)YdeD9Fwh_-cfR_{tTSEDX4p1>5WUs3R)TKXMqd z1XgFZ=zi$!m~C1a|JIPJw5}%Cal|dzJJOjbu6rGx!hk|B7^M^(2TPIifVwY{8F5S9 zbjxS1K7^9r#yMSK|Lyj8RbAAl$^Ha}jlql*rHm?aXfqZB1s8i?IU)!eVgXR8w&F?;)UF2v_R6?R}#n8OU;ayK4 zP?#idTKxXfHg~E7tJj37s9#;Me$gsM!3NtHUA_5Y6VulsMxo+4|=u z91rbmG_k_vuhWA5!l7CTi(Bm7W~8X`s#t{nd(W`gzarP!AnqhnS@HWS zXA|O~5L&QZpf~R^%NX1Vw7(=c>pXCgl0DvQYBVE}1Sq=3BEO+Z} zX&iQZDv5C&S~x6QxWQnjp9FF-{8~ExRvp z0f33|)ur1>Z!`vvDw&;*boXlS1Fddp@tlIO-(Z=?ilF^Rd(((2%`VMV24aGDa&tbK zFVV@km`{cNSeT93ZsQoS;SC&Af|@iB7f)nix#)yu?9HdQCmLO-R5#prer4%QgBeZ7=n=L&k)vdrtw6C^f_+?^8DUP}QTcteg4Q~?UBiR}L>o9CNxDtg@e6WNH>Wt2=0Q!Rf z7qviS9Gs}m18Vq0_35`^vx`7EAGOe_I_=N%ru%5O&|h+=7Io4mARr)7t*8Gl-wufa zwBaXPb2E7oS|kej$f*^iL+x0}(~1f*cB}IMn$xnwA=wtTwl!5o*w@^X21`pfJA<9$K-5dJx21vYqA^m57(9WBTr zRJLSmP+SU^6ffE{X59;wxs1eS0M>Rm@>9C|ovYk>4mX}MvkYJR@%t+|IXUQ48b~A# zTWBEHT|5_iwh7JFWIlD#C+{LR7`S!L>scHFYXPG@43m0sQ&09m{+d=Ngm;iwYFnW@Xw zY4gm%J@nmJgOMWX8L{g7CRyDD89SyQlj&?Tx?~CS*zIWumg#|(4EpsGyVLAC9jP2X z%0x!xZ3_2^baW3vOWLhCg{#h!|V2Pow-E!DAvqGVv`f;kRbCn4m5{Gi5!VFtOrjYn*Tt zd5T?vH1S8zd{_XgnVQ?(<;K@Ctrp11OfJN}Lz#B%jt@~6poaQ-$xOvvo3{bztKK@C zglpHtd%4h2Q;-zzd{|P@&UZRZ7K5mfwHhz_lErJm3op++GO0|Ch-hX!z9#<0NV&Gd z?C}i;Ic6-OCi794?^v7?5+l>|`N>cV7Ej8#bPPvHLZG$(1bif)fe}9hgzz7|+yp;e ztrp%!Iwwc{I$af-)2#l)HR~1nwe=xad2{1>8qx~`;OuCRymOcasb)7{BPHJ_P?&*_ z1&){mViZ|YciE9iV`${j3w8+?$rdznUX4h*d2qxG#J{PdCOBQvXp@|B%BXL(41K%xP5Z#Yvrq)m5^i7{U-=6wHM%Oh5huZT{H zW{KGlyQkp#42up^2IjROO9t;)MlI8;n7UBe)kiKtxXLaU;p_A4=Hy3C)2KBi-g z8q6yMMpK?r@NBt3e1&0?i;}9n&7(>GVZzs!*Or~xQdy3shrjo7JzDl!^{00swkC~r zio{W_{}glf;pQ2XC{erda-ia6`B|xL=S)(s?mtc+8)#!Mozp5U>{BG1cv~dQ;yf?p zd$7$=p6OAX(Au_~(6K0n$ z_vyF2AN#M6W?_e93KihaQ3!f8z2>rudVIfIu794M=4@ccgp)biICQ~$a2_s|BU&gV z&rvE@b0ogEY8=L?2u;o7c8ed#27f~dH7%(qU#SnY%vP{(=T)u{#m@sV0F+sl5yCs} zi6*Zkq$ds`?5>#{a5mH2J)KZe4y2|g8@@);1R ztgeup31`83VWc*{7JJ(wVRSV&M<9AX!z2$%cL!^t&jn<4e%|K&^C32cr}^*P!eVB&-u zW>XtTVF!;SXPfZ;`tuH4MO9U_KwOxmBltYX{w587$8P_|X2Py%Nm9|0e#wAU=`JXF{t?Q*>GAmi7aS@D(%yR#CoeY;+UF)I zBijp_dL1caT<+5yz%qYBs;=#x&r)8dh^+nJ^gkLC#A?I!tGGSAi#yAKkfYgRc9xDz zMRz-ngi0cRM8J|k@|M0+Pjo@;-SGY{Y9p2BK5d__AhPW_KhoS5!;(5NEM*p5R zFRv;#Tbem+wdcPXwBh|2oN4?+D8cUBM6I%H14vemrYIoy8-t%Pt?(^o%cq`o&bJEL z`~CN}czAza?kFM|O>hEPJEKZqu4_@lqj|nQUg--_uGVnFl{RJw=0gg~*0x}VB9ns* z;x9Z#+a`vG?zsGyLwc}^m!G4xbqW>RCCvcyo6CEpL=J@~S^da}3(#Z$5E!5wk z;=gNo-T!}_K*N}vXvhB_ltzz&#Ky|Qx$L4>Y`XdsQuka@v0;mfl0GJno(-KHOGt+6 z8kz>`O$5je)RLGf-2VzPk6E(9P2T(v{(ie)Qwtr#;~)XNgAjZ9&j0@2 zNC)iyElLxQ91T;Jd28t2UP#lVWX<4nO@253!rv45Bd}eg`Ui?Gz z)JU5m8lbHByD^B0>c#H8_VCcBCWwB2JP4V z&!j{B14~1?boc zpKg*u-}csAyt99D<^QE+^znSaBBj|Bge}>AX{zbGw48h(sefCxVE)isy*>4{EmETI zfnP1hC?S^36z^^S2Ztk+&muO#qbQomkI{i1q!!PX!#r{fWHG-_QXtz6 z1*;#;5>2P;$;uE40902=A%(-@aP6G;GP_*U0b6hO) z5;q~&PR6ObU&w-Ibc7117=&M5CtWmSVnV^X1@$akQfz}yB1kp&ye&2Bl&v{{g$!Rz zS1G9~P@RA+>Iv#tDOD+9&6IVYOgT4sUJy=bbwmw0S3lN>TRNC3DxTz?CRuaz*FYD- zp})4=0YP&`RQRhyzOG4isDL|p@ue`m3 zazl1+VoL37S+*Xh|o_&cs!M zG4n%#n0(RSDZ?t#?1uMaiw_ED4nYwc_ZHM>K@4TZ^qa~SRQ!G_t8I>z=RDZCnAzv~ zJYKx@eKXdH-M4PH63qjy>O!r#2LZZnOqA(5?2(OuA7vK#kp zdo4kD$#aj53Y1gO_MpdxQVoP@0mXMg5VqgrEId3qIXOJ~l7zz>+2`ScJ4?{-`f#R& zOblu9?00`wD|OY)lw(?p3r?h<#kNuz_^Cy?KkCKn#QM+>IE3ghcvN_I>E{86j7S|$ zWpR6k zd9Ek*Yq;ZZ;G)lWq+z4Dv~)mV`q2@kudJnex=1;lXN8WnFd%q<;!_ojc8gsPx%qI# 
zR@+$DZ%_}O=fL(|ocB0G%k_P`p8NAB#-WdJ+xJeg+1du#W&Nw!!{%?t>+8u;)vQh2 zEKr$4B!EnGX;)DL6?eTJWw!MRE@u0NEKhMem@q`elxhE5+48cU`Dm|Ub8>P%Y)%GN zB^S3{XZ8QuIMZ+_ytj}47&MG9)`@5uLiS`SvWJwlRMr?xKev&H{2gN>2jK_Ut6@r)3>UwD_svmO$*`F zY&uL)Idt6ai}8j;(pe^laGIt?V>G|)upjQ87RG?2kDpUq+8Giw zGA*f2?>Ax)X>(oktvZN|VZOj1alK~=q}MDlM3m+|okKAsKog8f3M4F8n6G`oI@gE= z-rc_L^QyAvS^)o}V3x-s?i01uZnqil!noVlbd#xRF|n=+Tn_Zm?5u^frqvGVu+@@; z?B3$eyGz=vg6(MGbM6C>{;@pBgVN4elH$XJ?8*XqAW_ipc8W!Fg`&KrqV!X+rlOCu z?MJ7|YtjoykM|rn+zCo(D8#Bg1X6LT0YFaSD;W?xWB7vYsNBd}FA`K=ly+t1A&-zD zn4dQe{Yyw)hns{F!reJx-pGInLX3jeAyK9G6ZdFH1y+5$3!tqQd24mJ+L{GRm`*f^?MlB+2#9}xkHc%Yur{YN)XCbTVw~*o^ z`oH=WK#;ISU*|7o8R)Ae|gc(FAQ064gW`m1&i^{;10NXVQ^Ob8$2 zC)pHn2bLjwnkN~Pw$Z)B3ln`Z-$;#q{7Y|J=gQIk=G1t!^5EcL1!-(Z+i@+Qxf{QE zSZ?=9*7~j|OK%LlerDL8gE3k67x$<4X(&t3rqFVcoDZ=0cT;WWUQT4nkGM*~=AGI? zUkfC^)5<8iZIh4w{#5n!R7m!eXk&k4siUZ{z(v*;pigy8S7|-lFo(`dZ+aY}X|i== zB2I(;TBG2aYuM%kwNIJxYW|C)D&j;cV4B)RUF>rR%MGw$s8ct}jwx&SUgtdY*iYR$luP^$lsGcx$P7q`$KpsD3tU!6t=5sMji|ABFR4t7v z8t#gI5meQ%I_TB7!lYfTm&Tu{CpHO>UfJaS`NyqZ=)o$DW2|u2?|r@d`+$~KG53NC zrz4icLqbF8b>vNP9}E9v)m(GK){U7rdL7T@Nf2>yFl*9J(^P|k(X0=qnl1-Et3j8U zVeG~8tUczHa)&AshFOe<+e?(E9q9%;u`whq>hDqD#w`MSd2i{I$<|vF!Ht)1={fZQ70$Be z-Dx3^lVFeJRYc!Vst0rt)W6RW4=aU78!3p`f;MX|Z7yR-M2tH%V#9ANR8SMmgYS+d z!az-RRs5l!i>rZG`FF|~S`nrI!y9fYvMP6d63K4CUemT&y3b3$B0pB9C(93x!xaby zRWn@WZI@@0qc#QMk#I+bN|a1AZoQFl;SW7nwskL3MLVxJ{l_6%b=j$Q{@segOF1@K zQb3TAqpF;7jMfXs0uo-qwXC4sb5A+fG54eXx1k{sj|yKiTFSL|=5q3(E3v8`gtD>M z?;9w#f9ZvC=4CTln7dZ*@cGxFj`^y7kJ}C?rl!VhzW{r@z>^%D+#zE~*0_KAT`6Aj z{C`B4a^D@(w!{DZ`*%X1`y0CfD2+_kO&^->>N3n38GrJ2aRze-o2-!S`P=D+sV6NW zS4LEq85fKN=YUxLMvqZ9>+>1KQ*NDN_bpvJw3Cpkahvf6JF8Eked_#sKB!wDszbWH z)VjUuJABkS^$pt8*6uGmUXyR5BiQ1sftsaor_-VxlmhH>8%^>La-1>Xqf+w4KC#aK zb1G`27VSMAJ^f8onJM{&BFuv^{?$CW6Tx;c5;CS;bl{A?zyI%%P9pE9i;diI|9s>Ja3eLk5z7;t1kAFF)1V|V~N6XA#`;Jmg(YLN{Xb9c` zeo4d)hjQrPKwE5h)aXU=B{Q#YGB7LYBjcP3j|ZOD-Croz)fPVFxhj*3$f!tBwAd?t zbBT|!1-+mPQnCxDTe&TwdCk?h~h-Cl0Pq(@HLkA4c$ zFflQ7_9#Yc%oEg}grPHvv;#j^OdrvVIqx*8d$`)J@98OYxL?l%e=;FhyY+m!qM~AS z#%@A>-&JTVo_wTPSyh#*;p|&T8=Uz)6I}8s%X^dd=+bdfzpY|_AHgwv)6?amjuqM7h-|T#DO!@nT4tm_<;W*~A;@Iz2Iik|ug5*}n7)4X z!w{4171^tf=Zz z99PNI?6R)s=_7J1i`Nvr&0>*q9i|k%1Z`8leofB47X=mW2X{aB8%}z=^3xWGf5A}pJMB?M)RdRxbZ~ zv6Ykn(4;g6n9OH3*2SYZU)ja3w5l97^?0r+y9nicQNpg>7>O+rvekveX@gn#mHH7* zes(W3Ow~EXZTna6?C0s6ELJvWId;BMo9*1r_}1ZfHRmjhRhJEB!ioYX<60msmyBUAwF%J6&HRE63Zl-_0p@vN&;=nq16N^qR!(mXEP?-b;OD#Gt=Eg}(VfeOL`TOQc6LyZd) zS?BmUi&Hb@%oJza@|Do~sc^8xdO6cp`Vz1*r3_Q4BMdRUdvyO&_qE zcSA~*rFS{c$R_@Y7QkFfe2@!EsGdnV`AMGQbPId)2Hb%F#q$1-r`idEcChxT&dRvu ztR#C@LGFLf@<{h=r3Kk*P&2(f70+1?E7~Ha%l`yTXKfReFDr!?+)y4m|LG%Bm+q37 zDyTqos-Q*qekB&%IxZOSJHhf1ajn(_$}b$ZYXVqUKOqCgnIx2gNR~`8C62%5q|hI? zO9MYL2VRmbM@6yIECqG{<>q>t9xo0%2>v)5Za3qZ8qghkoaiG=0Ml!e{5aCI2atwH zy#4pXW}V}ivH&cTElPmzLC>8kZd_)2u(^rQ48sxAKL35T-9lB|imIn7H=HTibN&-5 zzrg%Z8VhdT -Some of your data may already reside in an external SQL database. PXF provides access to this data via the PXF JDBC connector. The JDBC connector is a JDBC client. It can read data from and write data to SQL databases including MySQL, ORACLE, PostgreSQL, Hive, and Apache Ignite. +Some of your data may already reside in an external SQL database. PXF provides access to this data via the PXF JDBC connector. The JDBC connector is a JDBC client. It can read data from and write data to SQL databases including MySQL, ORACLE, Microsoft SQL Server, DB2, PostgreSQL, Hive, and Apache Ignite. 
This section describes how to use the PXF JDBC connector to access data in an external SQL database, including how to create and query or insert data into a PXF external table that references a table in an external database. diff --git a/gpdb-doc/markdown/pxf/overview_pxf.html.md.erb b/gpdb-doc/markdown/pxf/overview_pxf.html.md.erb index 73022cf80cfd..95dd118c18f1 100644 --- a/gpdb-doc/markdown/pxf/overview_pxf.html.md.erb +++ b/gpdb-doc/markdown/pxf/overview_pxf.html.md.erb @@ -21,39 +21,58 @@ specific language governing permissions and limitations under the License. --> -The Greenplum Platform Extension Framework (PXF) provides parallel, high throughput data access and federated queries across heterogeneous data sources via built-in connectors that map a Greenplum Database external table definition to an external data source. PXF has its roots in the Apache HAWQ project. +With the explosion of data stores and cloud services, data now resides across many disparate systems and in a variety of formats. Often, data is classified both by its location and the operations performed on the data, as well as how often the data is accessed: real-time or transactional (hot), less frequent (warm), or archival (cold). -- [Introduction to PXF](intro_pxf.html) +The diagram below describes a data source that tracks monthly sales across many years. Real-time operational data is stored in MySQL. Data subject to analytic and business intelligence operations is stored in Greenplum Database. The rarely accessed, archival data resides in AWS S3. - This topic introduces PXF concepts and usage. +centered image -- [Administering PXF](about_pxf_dir.html) +When multiple, related data sets exist in external systems, it is often more efficient to join data sets remotely and return only the results, rather than negotiate the time and storage requirements of performing a rather expensive full data load operation. The *Greenplum Platform Extension Framework (PXF)*, a Greenplum extension that provides parallel, high throughput data access and federated query processing, provides this capability. - This set of topics details the administration of PXF including installation, configuration, initialization, upgrade, and management procedures. +With PXF, you can use Greenplum and SQL to query these heterogeneous data sources: -- [Accessing Hadoop with PXF](access_hdfs.html) +- Hadoop, Hive, and HBase +- Azure Blob Storage and Azure Data Lake +- AWS S3 +- Minio +- Google Cloud Storage +- SQL databases including Apache Ignite, Hive, MySQL, ORACLE, Microsoft SQL Server, DB2, and PostgreSQL (via JDBC) - This set of topics describe the PXF Hadoop connectors, the data types they support, and the profiles that you can use to read from and write to HDFS. +And these data formats: -- [Accessing Azure, Google Cloud Storage, Minio, and S3 Object Stores with PXF](access_objstore.html) +- Avro, AvroSequenceFile +- JSON +- ORC +- Parquet +- RCFile +- SequenceFile +- Text (plain, delimited, embedded line feeds) - This set of topics describe the PXF object storage connectors, the data types they support, and the profiles that you can use to read data from and write data to the object stores. +## Basic Usage -- [Accessing an SQL Database with PXF (JDBC)](jdbc_pxf.html) +You use PXF to map data from an external source to a Greenplum Database *external table* definition. 
You can then use the PXF external table and SQL to: - This topic describes how to use the PXF JDBC connector to read from and write to an external SQL database such as Postgres or MySQL. +- Perform queries on the external data, leaving the referenced data in place on the remote system. +- Load a subset of the external data into Greenplum Database. +- Run complex queries on local data residing in Greenplum tables and remote data referenced via PXF external tables. +- Write data to the external data source. -- [Troubleshooting PXF](troubleshooting_pxf.html) +Check out the [PXF introduction](intro_pxf.html) for a high level overview important PXF concepts. - This topic details the service- and database- level logging configuration procedures for PXF. It also identifies some common PXF errors and describes how to address PXF memory issues. +## Get Started Configuring PXF -- [PXF Utility Reference](ref/pxf-ref.html) +The Greenplum Database administrator manages PXF, Greenplum Database user privileges, and external data source configuration. Tasks include: - The PXF utility reference. +- [Installing](about_pxf_dir.html), [configuring](instcfg_pxf.html), [starting](cfginitstart_pxf.html), [monitoring](monitor_pxf.html), and [troubleshooting](troubleshooting_pxf.html) the PXF service. +- Managing PXF [upgrade](upgrade_pxf_6x.html) and [migration](migrate_5to6.html). +- [Configuring](cfg_server.html) and publishing one or more server definitions for each external data source. This definition specifies the location of, and access credentials to, the external data source. +- [Granting](using_pxf.html) Greenplum user access to PXF and PXF external tables. - +A Greenplum Database user [creates](intro_pxf.html#create_external_table) a PXF external table that references a file or other data in the external data source, and uses the external table to query or load the external data in Greenplum. Tasks are external data store-dependent: + +- See [Accessing Hadoop with PXF](access_hdfs.html) when the data resides in Hadoop. +- See [Accessing Azure, Google Cloud Storage, Minio, and S3 Object Stores with PXF](access_objstore.html) when the data resides in an object store. +- See [Accessing an SQL Database with PXF](jdbc_pxf.html) when the data resides in an external SQL database. 
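The revised overview and the JDBC topic above describe creating a PXF external table and then querying or writing through it. A minimal sketch of that flow for the JDBC connector follows; the table and column names, the sample data, and the `mydb` server configuration are illustrative assumptions, not part of this patch, and the DDL shape follows the general PXF external-table syntax (readable tables use the `pxfwritable_import` formatter, writable tables use `pxfwritable_export`):

```sql
-- Assumes the pxf extension is available in the database and that the admin has
-- published a JDBC server configuration named "mydb" (hypothetical name).
CREATE EXTENSION IF NOT EXISTS pxf;

-- Readable external table mapped to public.sales_2019 in the external database.
CREATE EXTERNAL TABLE sales_2019_ext (id int, amount numeric, sold_on date)
    LOCATION ('pxf://public.sales_2019?PROFILE=Jdbc&SERVER=mydb')
    FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');

-- Query the remote data in place, or load a subset into a local Greenplum table.
SELECT sold_on, sum(amount) FROM sales_2019_ext GROUP BY sold_on;
CREATE TABLE sales_2019_local AS SELECT * FROM sales_2019_ext WHERE amount > 100;

-- Writable external table for pushing rows back to the external database.
CREATE WRITABLE EXTERNAL TABLE sales_archive_ext (id int, amount numeric, sold_on date)
    LOCATION ('pxf://public.sales_archive?PROFILE=Jdbc&SERVER=mydb')
    FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export');
INSERT INTO sales_archive_ext SELECT * FROM sales_2019_ext WHERE sold_on < '2019-07-01';
```

In this setup the server definition published by the administrator supplies the JDBC driver, URL, and credentials, so the table DDL itself carries no connection secrets.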
From a478ed38434372b31998b3cb5aecf55c795215b7 Mon Sep 17 00:00:00 2001 From: ppggff Date: Tue, 25 Feb 2020 11:09:17 -0800 Subject: [PATCH 039/102] Fix AO insert, init memoryContext for executorReadBlock --- src/backend/access/appendonly/appendonlyam.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/backend/access/appendonly/appendonlyam.c b/src/backend/access/appendonly/appendonlyam.c index 588e46fcca84..e4d9b0580a0b 100755 --- a/src/backend/access/appendonly/appendonlyam.c +++ b/src/backend/access/appendonly/appendonlyam.c @@ -826,10 +826,13 @@ AppendOnlyExecutorReadBlock_Init(AppendOnlyExecutorReadBlock *executorReadBlock, { MemoryContext oldcontext; + AssertArg(MemoryContextIsValid(memoryContext)); + oldcontext = MemoryContextSwitchTo(memoryContext); executorReadBlock->uncompressedBuffer = (uint8 *) palloc(usableBlockSize * sizeof(uint8)); executorReadBlock->storageRead = storageRead; + executorReadBlock->memoryContext = memoryContext; MemoryContextSwitchTo(oldcontext); } From 301a50cf0d706a79a97c0bd71e5723459f4201c3 Mon Sep 17 00:00:00 2001 From: ppggff Date: Tue, 25 Feb 2020 11:09:33 -0800 Subject: [PATCH 040/102] Fix assert for AO insert where tup may change if relation need oids --- src/backend/access/appendonly/appendonlyam.c | 2 +- src/test/regress/expected/qp_dml_oids.out | 13 +++++++++++++ src/test/regress/expected/qp_dml_oids_optimizer.out | 13 +++++++++++++ src/test/regress/sql/qp_dml_oids.sql | 12 ++++++++++++ 4 files changed, 39 insertions(+), 1 deletion(-) diff --git a/src/backend/access/appendonly/appendonlyam.c b/src/backend/access/appendonly/appendonlyam.c index e4d9b0580a0b..79afe1b7ae15 100755 --- a/src/backend/access/appendonly/appendonlyam.c +++ b/src/backend/access/appendonly/appendonlyam.c @@ -2992,7 +2992,7 @@ appendonly_insert(AppendOnlyInsertDesc aoInsertDesc, */ Assert(itemPtr == NULL); Assert(!need_toast); - Assert(instup == tup); + Assert(relation->rd_rel->relhasoids || instup == tup); /* * "Cancel" the last block allocation, if one. diff --git a/src/test/regress/expected/qp_dml_oids.out b/src/test/regress/expected/qp_dml_oids.out index 148f43a3fb02..c7d9db753e82 100644 --- a/src/test/regress/expected/qp_dml_oids.out +++ b/src/test/regress/expected/qp_dml_oids.out @@ -344,6 +344,19 @@ SELECT COUNT(distinct oid) FROM dml_ao where a = 10; 2 (1 row) +-- +-- Check that 'toast' is disabled by GUC. +-- +set debug_appendonly_use_no_toast to on; +INSERT INTO dml_ao (a, b, c) VALUES (10, 3, repeat('x', 50000)); +INSERT INTO dml_ao (a, b, c) VALUES (10, 4, repeat('x', 50000)); +SELECT COUNT(distinct oid) FROM dml_ao where a = 10; + count +------- + 4 +(1 row) + +reset debug_appendonly_use_no_toast; -- -- Check that new OIDs are generated even if the tuple being inserted came from -- the same relation and segment. diff --git a/src/test/regress/expected/qp_dml_oids_optimizer.out b/src/test/regress/expected/qp_dml_oids_optimizer.out index a11dc6786ee7..0706dc007e6c 100644 --- a/src/test/regress/expected/qp_dml_oids_optimizer.out +++ b/src/test/regress/expected/qp_dml_oids_optimizer.out @@ -346,6 +346,19 @@ SELECT COUNT(distinct oid) FROM dml_ao where a = 10; 2 (1 row) +-- +-- Check that 'toast' is disabled by GUC. 
+-- +set debug_appendonly_use_no_toast to on; +INSERT INTO dml_ao (a, b, c) VALUES (10, 3, repeat('x', 50000)); +INSERT INTO dml_ao (a, b, c) VALUES (10, 4, repeat('x', 50000)); +SELECT COUNT(distinct oid) FROM dml_ao where a = 10; + count +------- + 4 +(1 row) + +reset debug_appendonly_use_no_toast; -- -- Check that new OIDs are generated even if the tuple being inserted came from -- the same relation and segment. diff --git a/src/test/regress/sql/qp_dml_oids.sql b/src/test/regress/sql/qp_dml_oids.sql index 24a58ffdf95d..600d3cb4c55a 100644 --- a/src/test/regress/sql/qp_dml_oids.sql +++ b/src/test/regress/sql/qp_dml_oids.sql @@ -193,6 +193,18 @@ INSERT INTO dml_ao (a, b, c) VALUES (10, 2, repeat('x', 50000)); SELECT COUNT(distinct oid) FROM dml_ao where a = 10; +-- +-- Check that 'toast' is disabled by GUC. +-- +set debug_appendonly_use_no_toast to on; + +INSERT INTO dml_ao (a, b, c) VALUES (10, 3, repeat('x', 50000)); +INSERT INTO dml_ao (a, b, c) VALUES (10, 4, repeat('x', 50000)); + +SELECT COUNT(distinct oid) FROM dml_ao where a = 10; + +reset debug_appendonly_use_no_toast; + -- -- Check that new OIDs are generated even if the tuple being inserted came from -- the same relation and segment. From f848a75bf9806ed6c0abbdbb9cea233ec7f6d808 Mon Sep 17 00:00:00 2001 From: Alexandra Wang Date: Tue, 25 Feb 2020 17:15:20 -0800 Subject: [PATCH 041/102] Sync check_function_bodies guc value to segments The issue was reported from field where below function was being created. ``` set check_function_bodies = false; -- wait for gp_vmem_idle_resource_timeout time and then run CREATE FUNCTION public.f1() RETURNS smallint AS $$ SELECT f2() $$ LANGUAGE sql; ``` Ideally, we don't need to check function bodies on QE as QD already does it. But GPDB6 and below we can't perform that optimization because of github issue #9620. Co-authored-by: Ashwin Agrawal --- src/include/utils/sync_guc_name.h | 1 + src/include/utils/unsync_guc_name.h | 1 - 2 files changed, 1 insertion(+), 1 deletion(-) diff --git a/src/include/utils/sync_guc_name.h b/src/include/utils/sync_guc_name.h index d78cb1d10837..def32ea01a42 100644 --- a/src/include/utils/sync_guc_name.h +++ b/src/include/utils/sync_guc_name.h @@ -1,3 +1,4 @@ + "check_function_bodies", "client_min_messages", "commit_delay", "commit_siblings", diff --git a/src/include/utils/unsync_guc_name.h b/src/include/utils/unsync_guc_name.h index b47d31f4ebd5..09419a8ab720 100644 --- a/src/include/utils/unsync_guc_name.h +++ b/src/include/utils/unsync_guc_name.h @@ -28,7 +28,6 @@ "bonjour", "bonjour_name", "bytea_output", - "check_function_bodies", "checkpoint_completion_target", "checkpoint_segments", "checkpoint_timeout", From 34f42c0864d9a5de911ee4e7738c187deb9a25f2 Mon Sep 17 00:00:00 2001 From: Alexandra Wang Date: Tue, 25 Feb 2020 16:15:18 -0800 Subject: [PATCH 042/102] Reserve the fts connection only on segments Previously, the fts connection is reserved as super user connection on both master and primaries, however, fts does not need a connection to master, hence remove the reservation on master. 
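For context on the behavior this change (and the later bump of `superuser_reserved_connections` in this series) affects, a small hedged sketch of how the reserved slots can be observed from SQL; the error text shown is the stock PostgreSQL wording and may differ slightly by version:

```sql
-- Current connection-slot budget on the cluster (values are installation-specific).
SHOW max_connections;
SHOW superuser_reserved_connections;

-- With this patch, one of the reserved superuser slots is held back for the FTS
-- probe only on primaries, not on the master. Once only reserved slots remain,
-- a non-superuser login is rejected with an error along the lines of:
--   FATAL: remaining connection slots are reserved for non-replication superuser connections
```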
Co-authored-by: Ashwin Agrawal Reviewed-by: Paul Guo Reviewed-by: Hubert Zhang --- src/backend/utils/init/postinit.c | 1 + src/backend/utils/misc/guc.c | 2 +- src/include/utils/guc.h | 5 +++-- 3 files changed, 5 insertions(+), 3 deletions(-) diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c index 13bc508dca03..0dc96c7a61dd 100644 --- a/src/backend/utils/init/postinit.c +++ b/src/backend/utils/init/postinit.c @@ -583,6 +583,7 @@ BaseInit(void) static void check_superuser_connection_limit() { if (!am_ftshandler && + !IS_QUERY_DISPATCHER() && !HaveNFreeProcs(RESERVED_FTS_CONNECTIONS)) ereport(FATAL, (errcode(ERRCODE_TOO_MANY_CONNECTIONS), diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 6320a8255eac..cfa53ce45d75 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -1751,7 +1751,7 @@ static struct config_int ConfigureNamesInt[] = { {"superuser_reserved_connections", PGC_POSTMASTER, CONN_AUTH_SETTINGS, gettext_noop("Sets the number of connection slots reserved for " - "superusers (including reserved FTS connections)."), + "superusers (including reserved FTS connection for primaries)."), NULL }, &ReservedBackends, diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h index dccf714e5eb7..f1de9465c437 100644 --- a/src/include/utils/guc.h +++ b/src/include/utils/guc.h @@ -23,8 +23,9 @@ #define MAX_PRE_AUTH_DELAY (60) /* * One connection must be reserved for FTS to always able to probe - * primary. So, this acts as lower limit on reserved superuser connections. -*/ + * primary. So, this acts as lower limit on reserved superuser connections on + * primaries. + */ #define RESERVED_FTS_CONNECTIONS (1) From 02102566e01aa3e75f61e0244920d8dfaa2c2920 Mon Sep 17 00:00:00 2001 From: Alexandra Wang Date: Tue, 25 Feb 2020 16:15:27 -0800 Subject: [PATCH 043/102] Bump superuser_reserved_connections to 10 As requested from field, 3 superuser connections is not enough for gpdb when customers run superuser maintenance scripts. 10 is the same value as the resource group admin_group's concurrency default limit. Co-authored-by: Ashwin Agrawal Reviewed-by: Paul Guo Reviewed-by: Hubert Zhang --- src/backend/utils/misc/guc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index cfa53ce45d75..9e88f8ed7b77 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -1755,7 +1755,7 @@ static struct config_int ConfigureNamesInt[] = NULL }, &ReservedBackends, - 3, RESERVED_FTS_CONNECTIONS, MAX_BACKENDS, + 10, RESERVED_FTS_CONNECTIONS, MAX_BACKENDS, NULL, NULL, NULL }, From 0935df61505180b0c88f29c010d0fe7516c5f64d Mon Sep 17 00:00:00 2001 From: Tyler Ramer Date: Mon, 7 Oct 2019 14:25:32 -0400 Subject: [PATCH 044/102] Revert c93bb171637769d3ce64cd5f1367d831e0469924 The logic used in the initial commit is faulty and fragile. In the event that we want to force the master postmaster process to listen on a subset of addresses available on the system, it is most likely that we don't want to use the address(es) used by the interconnect. In the event of an external network and internal interconnect, the binding of "backend" listenning sockets to the "external" network would break the interconnect. 
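The practical effect of the revert is that the interconnect's UDP listener goes back to passing a NULL node to getaddrinfo(), which for a passive listening socket resolves to the wildcard address, so the socket is bound on every local interface rather than only on an address taken from listen_addresses. A minimal standalone sketch of that socket pattern (not GPDB code; the "0" service string and single-address bind are simplifications):

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int
main(void)
{
	struct addrinfo hints, *addrs;
	int fd, s;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;     /* IPv4 or IPv6 */
	hints.ai_socktype = SOCK_DGRAM;  /* UDP, as in the interconnect */
	hints.ai_flags = AI_PASSIVE;     /* NULL node => wildcard address */

	s = getaddrinfo(NULL, "0", &hints, &addrs);  /* "0": let the OS pick a port */
	if (s != 0)
	{
		fprintf(stderr, "getaddrinfo says %s\n", gai_strerror(s));
		return 1;
	}
	fd = socket(addrs->ai_family, addrs->ai_socktype, addrs->ai_protocol);
	if (fd < 0 || bind(fd, addrs->ai_addr, addrs->ai_addrlen) < 0)
	{
		perror("socket/bind");
		freeaddrinfo(addrs);
		return 1;
	}
	printf("UDP listener bound on all local addresses\n");
	close(fd);
	freeaddrinfo(addrs);
	return 0;
}
```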
Authored-by: Tyler Ramer --- src/backend/cdb/motion/ic_udpifc.c | 3 +-- src/backend/postmaster/postmaster.c | 7 ------- src/include/postmaster/postmaster.h | 1 - 3 files changed, 1 insertion(+), 10 deletions(-) diff --git a/src/backend/cdb/motion/ic_udpifc.c b/src/backend/cdb/motion/ic_udpifc.c index 705222c7e6fb..72b7ec880668 100644 --- a/src/backend/cdb/motion/ic_udpifc.c +++ b/src/backend/cdb/motion/ic_udpifc.c @@ -38,7 +38,6 @@ #include "port/pg_crc32c.h" #include "storage/latch.h" #include "storage/pmsignal.h" -#include "postmaster/postmaster.h" #include "utils/builtins.h" #include "utils/guc.h" #include "utils/memutils.h" @@ -1193,7 +1192,7 @@ setupUDPListeningSocket(int *listenerSocketFd, uint16 *listenerPort, int *txFami #endif fun = "getaddrinfo"; - s = getaddrinfo(BackendListenAddress, service, &hints, &addrs); + s = getaddrinfo(NULL, service, &hints, &addrs); if (s != 0) elog(ERROR, "getaddrinfo says %s", gai_strerror(s)); diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 11403fd2a0d5..1ae386fd2044 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -216,7 +216,6 @@ char *Unix_socket_directories; /* The TCP listen address(es) */ char *ListenAddresses; -char *BackendListenAddress; /* * ReservedBackends is the number of backends reserved for superuser use. @@ -1172,12 +1171,6 @@ PostmasterMain(int argc, char *argv[]) "listen_addresses"))); } - /* If there are more than one listen address, backend bind on all addresses*/ - if (list_length(elemlist) > 1 || strcmp(ListenAddresses, "*") == 0) - BackendListenAddress = NULL; - else - BackendListenAddress = ListenAddresses; - foreach(l, elemlist) { char *curhost = (char *) lfirst(l); diff --git a/src/include/postmaster/postmaster.h b/src/include/postmaster/postmaster.h index c4185add2eec..4cb5646bf8dd 100644 --- a/src/include/postmaster/postmaster.h +++ b/src/include/postmaster/postmaster.h @@ -21,7 +21,6 @@ extern int Unix_socket_permissions; extern char *Unix_socket_group; extern char *Unix_socket_directories; extern char *ListenAddresses; -extern char *BackendListenAddress; extern bool ClientAuthInProgress; extern int PreAuthDelay; extern int AuthenticationTimeout; From 5483f9876f1fffa17bb764a9503ac4eb9a6c52df Mon Sep 17 00:00:00 2001 From: Sambitesh Dash Date: Wed, 26 Feb 2020 12:18:08 -0800 Subject: [PATCH 045/102] Bump ORCA to v3.93.0 --- concourse/tasks/compile_gpdb.yml | 2 +- config/orca.m4 | 4 ++-- configure | 4 ++-- depends/conanfile_orca.txt | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/concourse/tasks/compile_gpdb.yml b/concourse/tasks/compile_gpdb.yml index a3eeddbb9e0c..dc5d1a3d3a5e 100644 --- a/concourse/tasks/compile_gpdb.yml +++ b/concourse/tasks/compile_gpdb.yml @@ -19,5 +19,5 @@ params: BLD_TARGETS: OUTPUT_ARTIFACT_DIR: gpdb_artifacts CONFIGURE_FLAGS: - ORCA_TAG: v3.92.0 + ORCA_TAG: v3.93.0 RC_BUILD_TYPE_GCS: diff --git a/config/orca.m4 b/config/orca.m4 index 041b65f4c752..ed9328808193 100644 --- a/config/orca.m4 +++ b/config/orca.m4 @@ -40,10 +40,10 @@ AC_RUN_IFELSE([AC_LANG_PROGRAM([[ #include ]], [ -return strncmp("3.92.", GPORCA_VERSION_STRING, 5); +return strncmp("3.93.", GPORCA_VERSION_STRING, 5); ])], [AC_MSG_RESULT([[ok]])], -[AC_MSG_ERROR([Your ORCA version is expected to be 3.92.XXX])] +[AC_MSG_ERROR([Your ORCA version is expected to be 3.93.XXX])] ) AC_LANG_POP([C++]) ])# PGAC_CHECK_ORCA_VERSION diff --git a/configure b/configure index 0975b877be31..e908f34d4af8 100755 --- a/configure +++ b/configure 
@@ -14948,7 +14948,7 @@ int main () { -return strncmp("3.92.", GPORCA_VERSION_STRING, 5); +return strncmp("3.93.", GPORCA_VERSION_STRING, 5); ; return 0; @@ -14958,7 +14958,7 @@ if ac_fn_cxx_try_run "$LINENO"; then : { $as_echo "$as_me:${as_lineno-$LINENO}: result: ok" >&5 $as_echo "ok" >&6; } else - as_fn_error $? "Your ORCA version is expected to be 3.92.XXX" "$LINENO" 5 + as_fn_error $? "Your ORCA version is expected to be 3.93.XXX" "$LINENO" 5 fi rm -f core *.core core.conftest.* gmon.out bb.out conftest$ac_exeext \ diff --git a/depends/conanfile_orca.txt b/depends/conanfile_orca.txt index 1fe3e26e6211..e144cc0c49a0 100644 --- a/depends/conanfile_orca.txt +++ b/depends/conanfile_orca.txt @@ -1,5 +1,5 @@ [requires] -orca/v3.92.0@gpdb/stable +orca/v3.93.0@gpdb/stable [imports] include, * -> build/include From 74e6151e1c865c8773813c0c59e308d681a247a9 Mon Sep 17 00:00:00 2001 From: David Yozie Date: Wed, 26 Feb 2020 15:50:06 -0800 Subject: [PATCH 046/102] Docs: 'V2' updates to cloud best practices --- .../cloud/gpdb-cloud-tech-rec.html.md.erb | 131 +++++++----------- 1 file changed, 51 insertions(+), 80 deletions(-) diff --git a/gpdb-doc/markdown/cloud/gpdb-cloud-tech-rec.html.md.erb b/gpdb-doc/markdown/cloud/gpdb-cloud-tech-rec.html.md.erb index e6bb5f050ace..73bcbe320c20 100644 --- a/gpdb-doc/markdown/cloud/gpdb-cloud-tech-rec.html.md.erb +++ b/gpdb-doc/markdown/cloud/gpdb-cloud-tech-rec.html.md.erb @@ -1,6 +1,6 @@ --- title: "Greenplum Database Cloud Technical Recommendations" -date: January 17, 2019 +date: February 18, 2020 author: Jon Roberts --- @@ -14,7 +14,7 @@ Add the following line to `sysctl.conf`: net.ipv4.ip_local_reserved_ports=65330 ``` -AWS requires loading network drivers and also altering the AMI to use the faster networking capabilities. More information on this is provided in the AWS documentation. +AWS requires loading network drivers and also altering the Amazon Machine Image (AMI) to use the faster networking capabilities. More information on this is provided in the AWS documentation. ## Storage @@ -26,19 +26,18 @@ The disk settings for cloud deployments are the same as on-premise with a few mo ```
    **Note:** The `nobarrier` option is not supported on Ubuntu nodes. - Use mq-deadline instead of the deadline scheduler for the R5 series instance type in AWS -- For clusters requiring software RAID, use level 0 and chunk size of 256 - Use a swap disk per VM (32GB size works well) ## Security -It is highly encouraged to disable password authentication to the virtual machines in the cloud and use SSH keys instead. Using MD5-encrypted passwords for Greenplum Database is also a good practice. +It is highly encouraged to disable SSH password authentication to the virtual machines in the cloud and use SSH keys instead. Using MD5-encrypted passwords for Greenplum Database is also a good practice. ## Amazon Web Services (AWS) ### Virtual Machine Type -AWS provides a wide variety of virtual machine types and sizes to address virtually every use case. Testing in AWS has found that the optimal instance types for Greenplum are "Memory Optimized" and "Storage Optimized". These provide the ideal balance of Memory, Network, and Storage throughput, and Compute capabilities. +AWS provides a wide variety of virtual machine types and sizes to address virtually every use case. Testing in AWS has found that the optimal instance types for Greenplum are "Memory Optimized". These provide the ideal balance of Price, Memory, Network, and Storage throughput, and Compute capabilities. -Virtual Machine network and disk throughput limits increase as CPU and memory sizes increase. This means the larger instance types are recommended for Greenplum so that it can provide 10Gbit or better network performance. +Price, Memory, and number of cores typically increase in a linear fashion, but the network speed and disk throughput limits do not. You may be tempted to use the largest instance type to get the highest network and disk speed possible per VM, but better overall performance for the same spend on compute resources can be obtained by using more VMs that are smaller in size. #### Compute AWS uses Hyperthreading when reporting the number of vCPUs, therefore 2 vCPUs equates to 1 Core. The processor types are frequently getting faster so using the latest instance type will be not only faster, but usually less expensive. For example, the R5 series provides faster cores at a lower cost compared to R4. @@ -47,76 +46,50 @@ AWS uses Hyperthreading when reporting the number of vCPUs, therefore 2 vCPUs eq This variable is pretty simple. Greenplum needs at least 8GB of RAM per segment process to work optimally. More RAM per segment helps with concurrency and also helps hide disk performance deficiencies. #### Network -AWS provides 25Gbit network performance on the largest instance types. This alleviates any network bottleneck concerns for large Greenplum clusters in AWS. Additionally, instance types provide 10Gbit as well as "up to 10Gbit" network performance. For production workloads, Pivotal requires 10Gbit or better network performance. +AWS provides 25Gbit network performance on the largest instance types, but the network is typically not the bottleneck in AWS. The "up to 10Gbit" network is sufficient in AWS. -Loading network drivers is also required in AWS and depends on the instance type. Some instance types use an Intel driver while others use an Amazon ENA driver. Loading the driver requires modifying the machine image (AMI). +Installing network drivers in the VM is also required in AWS, and depends on the instance type. Some instance types use an Intel driver while others use an Amazon ENA driver. 
Loading the driver requires modifying the machine image (AMI) to take advantage of the driver. ### Storage -#### EBS -The AWS default disk type is General Performance (GP2) which is ideal for IOP dependent applications. It has relatively poor performance for throughput. The operating system and swap volumes are ideal for GP2. GP2 uses SSD disks and, relative to other disk types in AWS, is expensive. +#### Elastic Block Storage (EBS) +The AWS default disk type is General Performance (GP2) which is ideal for IOP dependent applications. GP2 uses SSD disks and relative to other disk types in AWS, is expensive. The operating system and swap volumes are ideal for GP2 disks because of the size and higher random I/O needs. -Throughput Optimized Disks (ST1) is ideal for throughput and thus, ideal for Greenplum. These disks are based on HDD rather than SSD and are less expensive than GP2. Performance of ST1 disks is influenced by the disk size and peaks at 12.5TB. However, the larger instance types have throughput limits that are larger than what a single ST1 disk can provide. Therefore, up to 4 ST1 disks are needed to reach the throughput limit of a given virtual machine. +Throughput Optimized Disks (ST1) are a disk type designed for high throughput needs such as Greenplum. These disks are based on HDD rather than SSD, and are less expensive than GP2. Use this disk type for the optimal performance of loading and querying data in AWS. -EBS storage is durable so data is not lost when a virtual machine is stopped. EBS also provides infrastructure snapshot capabilities that can be used to create volume backups. These snapshots can be copied to different regions to provide a disaster recovery solution. The Greenplum Cloud utility `gpsnap`, available in the AWS Cloud Marketplace, automates backup, restore, delete, and copy functions using EBS snapshots. - -#### Ephemeral - -Ephemeral storage is the last storage option and is available on the Storage Optimized instance types. These disks are directly attached and have up to 24 2TB disks per virtual machine. These are the instance types that Redshift uses. - -The main problem with Ephemeral storage is the durability. If you stop a VM with Ephemeral storage, all data is lost. The second problem is the number of disks. There are more disks (24) than Greenplum can have segments per host. Therefore, software RAID is needed to match the number of mounts to the number of segments. Note that mount options for software RAID can greatly impact performance. - -Instance types that have Ephemeral storage also are more expensive than Instance types that do not. You have to balance the cost of compute plus storage to determine which is the best for a particular use case. - -#### EBS vs Ephemeral +Cold Storage (SC1) provides the best value for EBS storage in AWS. Using multiple 2TB or larger disks provides enough disk throughput to reach the throughput limit of many different instance types. Therefore, it is possible to reach the throughput limit of a VM by using SC1 disks. -Performance testing has found that there is virtually no performance difference for Greenplum when comparing EBS vs Ephemeral storage. Therefore, Pivotal recommends using EBS storage so that the disks are durable and disk snapshots can be used for backup and disaster recovery. +EBS storage is durable so data is not lost when a virtual machine is stopped. EBS also provides infrastructure snapshot capabilities that can be used to create volume backups. 
These snapshots can be copied to different regions to provide a disaster recovery solution. The Greenplum Cloud utility `gpsnap`, available in the AWS Cloud Marketplace, automates backup, restore, delete, and copy functions using EBS snapshots. -### AWS Recommendations -#### R5 Series +Storage can be grown in AWS with "gpgrow". This tool is included with the Greenplum on AWS deployment and allows you to grow the storage independently of compute. This is an online operation in AWS too. -| Instance Type | Storage Type | Storage Size | Memory | vCPUs | Network Speed | Use | -|---------------|---------------|------------|--------|------|---------------|-------------------------------| -| r5.xlarge | EBS | 6TB | 32 | 4 | Up to 10GBit | Dev/Test | -| r5.xlarge | EBS Encrypted | 6TB | 32 | 4 | Up to 10GBit | Dev/Test | -| r5.2xlarge | EBS | 12TB | 64 | 8 | Up to 10GBit | Dev/Test | -| r5.2xlarge | EBS Encrypted | 12TB | 64 | 8 | Up to 10GBit | Dev/Test | -| r5.4xlarge | EBS | 24TB | 128 | 16 | Up to 10GBit | Dev/Test | -| r5.4xlarge | EBS Encrypted | 24TB | 128 | 16 | Up to 10GBit | Dev/Test | -| r5.12xlarge | EBS | 48TB | 384 | 48 | 10GBit | Production | -| r5.12xlarge | EBS Encrypted | 48TB | 384 | 48 | 10GBit | Production | -| r5.24xlarge | EBS | 48TB | 768 | 96 | 25GBit | Production - High Concurrency | -| r5.24xlarge | EBS Encrypted | 48TB | 768 | 96 | 25GBit | Production - High Concurrency | +#### Ephemeral -#### R4 Series +Ephemeral Storage is directly attached to VMs, but has many drawbacks: +- Data loss when stopping a VM with ephemeral storage +- Encryption is not supported +- No Snapshots +- Same speed can be achieved with EBS storage +- Not recommended -| Instance Type | Storage Type | Storage Size | Memory | vCPUs | Network Speed | Use | -|---------------|---------------|--------------|--------|-------|---------------|-------------------------------| -| r4.xlarge | EBS | 6TB | 30.5 | 4 | Up to 10GBit | Dev/Test | -| r4.xlarge | EBS Encrypted | 6TB | 30.5 | 4 | Up to 10GBit | Dev/Test | -| r4.2xlarge | EBS | 12TB | 61 | 8 | Up to 10GBit | Dev/Test | -| r4.2xlarge | EBS Encrypted | 12TB | 61 | 8 | Up to 10GBit | Dev/Test | -| r4.4xlarge | EBS | 24TB | 122 | 16 | Up to 10GBit | Dev/Test | -| r4.4xlarge | EBS Encrypted | 24TB | 122 | 16 | Up to 10GBit | Dev Test | -| r4.8xlarge | EBS | 48TB | 244 | 32 | 10GBit | Production | -| r4.8xlarge | EBS Encrypted | 48TB | 244 | 32 | 10GBit | Production | -| r4.16xlarge | EBS | 48TB | 488 | 64 | 25GBit | Production - High Concurrency | -| r4.16xlarge | EBS Encrypted | 48TB | 488 | 64 | 25GBit | Production - High Concurrency | +#### AWS Recommendations -#### D2 Series +##### Master +| Instance Type | Memory | vCPUs | Data Disks | +| ------------- | ------ | ----- | ---------- | +| r5.xlarge | 32 | 4 | 1 | +| r5.2xlarge | 64 | 8 | 1 | +| r5.4xlarge | 128 | 16 | 1 | -Storage Optimized instances with local HDD ephemeral storage that is optimized for throughput. Ephemeral storage is lost if the nodes are stopped. +##### Segments +| Instance Type | Memory | vCPUs | Data Disks | +| ------------- | ------ | ----- | ---------- | +| r5.4xlarge | 128 | 16 | 3 | -- Does not support snapshot backups using the Greenplum Cloud `gpsnap` utility -- Data loss when nodes are stopped +Performance testing has indicated that the Master node can be deployed on the smallest r5.xlarge instance type to save money without a measurable difference in performance. Testing was performed using the TPC-DS benchmark. 
-| Instance Type | Storage Type | Storage Size | Memory | vCPUs | Network Speed | Use | -|---------------|--------------|--------------|--------|-------|---------------|------------| -| d2.xlarge | Ephemeral | 6TB | 30.5 | 4 | Moderate | Dev/Test | -| d2.2xlarge | Ephemeral | 12TB | 61 | 8 | High | Dev/Test | -| d2.4xlarge | Ephemeral | 24TB | 122 | 16 | High | Dev/Test | -| d2.8xlarge | Ephemeral | 48TB | 244 | 36 | 10GBit | Production | +The Segment instances run optimally on the r5.4xlarge instance type. This provides the highest performance given the cost of the AWS resources. ## Google Compute Platform (GCP) @@ -142,12 +115,11 @@ Testing has revealed that _while using the same number of vCPUs_, a cluster usin The HighMem instance type is slightly faster for higher concurrency. Furthermore, SSD disks are slightly faster also but come at a cost. -| Instance Type | Storage Type | Storage Size | Memory | vCPUs | Network Speed | Use | -|---------------|--------------|--------------|--------|-------|---------------|----------------------------------------| -| n1-standard-8 | HDD | 6TB or 3TB | 30 | 8 | 10Gbit | Dev/Test - Production | -| n1-standard-8 | SSD | 1.4TB | 30 | 8 | 10Gbit | Dev/Test - Production | -| n1-highmem-8 | HDD | 6TB or 3TB | 52 | 8 | 10Gbit | Dev/Test - Production | -| n1-highmem-8 | SSD | 1.4TB | 52 | 8 | 10Gbit | Dev/Test - Production High Concurrency | +##### Master and Segment Instances +| Instance Type | Memory | vCPUs | Data Disks | +| ------------- | ------ | ----- | ---------- | +| n1-standard-8 | 30 | 8 | 1 | +| n1-highmem-8 | 52 | 8 | 1 | ## Azure @@ -183,18 +155,17 @@ Software RAID not only is a little bit slower, but it also requires `umount` to Disks use the same network as the VMs so you start running into the Azure limits in bigger clusters when using big virtual machines with 32 disks on each one. The overall throughput drops as you hit this limit and is most noticeable during concurrency testing. ### Azure Recommendations -The best instance type to use in Azure is "Standard_H16" which is one of their High Performance Compute instance types. This instance series is the only one utilizing InfiniBand, but this does not include IP traffic. - - -| Instance Type | Storage Type | Storage Size | Memory | vCPUs3 | Use | Notes | -|---------------------------|--------------|-------------------|--------|-------------------|-----------------------|--------------------------| -| Standard_D14_v2 | HDD | 8x2TB2 | 112 | 16 | Dev/Test - Production | Use if HPC not available | -| Standard_H81 | HDD | 4x2TB2 | 56 | 8 | Dev/Test - Production | | -| Standard_H161 | HDD | 8x2TB2 | 112 | 16 | Dev/Test - Production | Fastest | - -1 Not all regions have HPC instance types. - -2 Use 2 disks in each RAID 0 volume. - -3 Some, but not all, Azure VMs have hyperthreading. If hyperthreading is not enabled, 1 vCPU = 1 Core. +The best instance type to use in Azure is "Standard_H8" which is one of the High Performance Compute instance types. This instance series is the only one utilizing InfiniBand, but this does not include IP traffic. Because this instance type is n0t available in all regions, the "Standard_D13_v2" is also available. 
+ +##### Master +| Instance Type | Memory | vCPUs | Data Disks | +| ------------- | ------ | ----- | ---------- | +| D13_v2 | 56 | 8 | 1 | +| H8 | 56 | 8 | 1 | + +##### Segments +| Instance Type | Memory | vCPUs | Data Disks | +| ------------- | ------ | ----- | ---------- | +| D13_v2 | 56 | 8 | 2 | +| H8 | 56 | 8 | 2 | From 39a2262c949aa23dbd45010807711dded98a1da2 Mon Sep 17 00:00:00 2001 From: "Huiliang.liu" Date: Thu, 27 Feb 2020 15:28:26 +0800 Subject: [PATCH 047/102] Don't report internal error if gpfdist session is closed (#9592) In the case of something like "select ... limit 1;", there may be attach request after session is closed. Then we should ignore the error code of session and just return a empty HTTP OK. --- src/bin/gpfdist/gpfdist.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/src/bin/gpfdist/gpfdist.c b/src/bin/gpfdist/gpfdist.c index a208eafe54ac..6489716e2f2d 100644 --- a/src/bin/gpfdist/gpfdist.c +++ b/src/bin/gpfdist/gpfdist.c @@ -1638,14 +1638,6 @@ static int session_attach(request_t* r) /* found a session in hashtable*/ - /* if error, send an error and close */ - if (session->is_error) - { - http_error(r, FDIST_INTERNAL_ERROR, "session error"); - request_end(r, 1, 0); - return -1; - } - /* session already ended. send an empty response, and close. */ if (NULL == session->fstream) { @@ -1656,6 +1648,14 @@ static int session_attach(request_t* r) return -1; } + /* if error, send an error and close */ + if (session->is_error) + { + http_error(r, FDIST_INTERNAL_ERROR, "session error"); + request_end(r, 1, 0); + return -1; + } + /* * disallow mixing GET and POST requests in one session. * this will protect us from an infinitely running From 26211fe29815587864616aea6702ec6ce2c6c4b3 Mon Sep 17 00:00:00 2001 From: Daniel Gustafsson Date: Thu, 27 Feb 2020 12:11:14 +0100 Subject: [PATCH 048/102] Fix incorrect spelling in docs/comments This fixes multiple occurrences of duplicated words in sentences, like "the the" and "is is" etc. 
Backported from master 1d44a0c5ac078b41137ddb6fd99282ad6913159a Reviewed-by: Mel Kiyama Reviewed-by: Heikki Linnakangas --- .../scripts/configurations/pg_upgrade_gpinitsystem_config | 2 +- doc/src/sgml/ref/create_role.sgml | 2 +- gpMgmt/bin/gpexpand | 2 +- gpMgmt/bin/gppylib/gparray.py | 2 +- gpMgmt/bin/gppylib/system/info.py | 2 +- gpMgmt/bin/gpssh-exkeys | 2 +- gpMgmt/doc/gpconfigs/gpinitsystem_config | 2 +- gpMgmt/doc/gpconfigs/gpinitsystem_test | 2 +- gpMgmt/test/behave/mgmt_utils/gprecoverseg.feature | 2 +- gpdb-doc/dita/admin_guide/kerberos-lin-client.xml | 2 +- gpdb-doc/dita/admin_guide/managing/maintain.xml | 2 +- gpdb-doc/dita/best_practices/encryption.xml | 2 +- gpdb-doc/dita/install_guide/prep_os.xml | 2 +- gpdb-doc/dita/ref_guide/sql_commands/CREATE_ROLE.xml | 2 +- gpdb-doc/dita/security-guide/topics/Authorization.xml | 2 +- gpdb-doc/dita/utility_guide/ref/gpmapreduce.xml | 2 +- gpdb-doc/markdown/pxf/cfg_server.html.md.erb | 2 +- gpdb-doc/markdown/pxf/hdfs_json.html.md.erb | 2 +- src/backend/access/transam/xlogreader.c | 2 +- src/backend/cdb/cdbappendonlystorageread.c | 2 +- src/backend/cdb/cdbsreh.c | 2 +- src/backend/cdb/cdbvarblock.c | 2 +- src/backend/cdb/dispatcher/cdbconn.c | 2 +- src/backend/cdb/dispatcher/cdbdisp_query.c | 2 +- src/backend/cdb/motion/ic_udpifc.c | 2 +- src/backend/commands/explain.c | 4 ++-- src/backend/commands/tablecmds.c | 2 +- src/backend/executor/execMain.c | 2 +- src/backend/executor/nodeShareInputScan.c | 2 +- src/backend/fts/README | 4 ++-- src/backend/gpopt/translate/CTranslatorQueryToDXL.cpp | 2 +- src/backend/replication/logical/reorderbuffer.c | 2 +- src/backend/storage/ipc/procarray.c | 2 +- src/backend/utils/adt/complex_type.c | 2 +- src/backend/utils/adt/datetime.c | 2 +- src/backend/utils/datumstream/test/datumstreamblock_test.c | 4 ++-- src/backend/utils/hyperloglog/gp_hyperloglog.c | 4 ++-- src/backend/utils/misc/guc_gp.c | 2 +- src/backend/utils/resscheduler/resscheduler.c | 2 +- src/include/executor/execdesc.h | 2 +- src/include/gppc/gppc_config.h | 2 +- src/test/regress/expected/bfv_aggregate.out | 2 +- src/test/regress/expected/bfv_aggregate_optimizer.out | 2 +- src/test/regress/expected/gp_constraints.out | 2 +- src/test/regress/explain.pl | 4 ++-- src/test/regress/gpsourcify.pl | 2 +- src/test/regress/sql/bfv_aggregate.sql | 2 +- src/test/regress/sql/gp_constraints.sql | 2 +- 48 files changed, 53 insertions(+), 53 deletions(-) diff --git a/concourse/scripts/configurations/pg_upgrade_gpinitsystem_config b/concourse/scripts/configurations/pg_upgrade_gpinitsystem_config index 49cb72756311..56e2da0ef0de 100644 --- a/concourse/scripts/configurations/pg_upgrade_gpinitsystem_config +++ b/concourse/scripts/configurations/pg_upgrade_gpinitsystem_config @@ -66,5 +66,5 @@ ENCODING=UNICODE #DATABASE_NAME=name_of_database #### Specify the location of the host address file here instead of -#### with the the -h option of gpinitsystem. +#### with the -h option of gpinitsystem. #MACHINE_LIST_FILE=/home/gpadmin/gpconfigs/hostfile_gpinitsystem diff --git a/doc/src/sgml/ref/create_role.sgml b/doc/src/sgml/ref/create_role.sgml index 0843ad6f22e3..941e96695777 100755 --- a/doc/src/sgml/ref/create_role.sgml +++ b/doc/src/sgml/ref/create_role.sgml @@ -320,7 +320,7 @@ CREATE ROLE name [ [ WITH ] RESOURCE GROUP group_name - The name of the resource group to assign to the the new role. The + The name of the resource group to assign to the new role. 
The role will be subject to the concurrent transaction, memory, and CPU limits configured for the resource group. You can assign a single resource group to one or more roles. diff --git a/gpMgmt/bin/gpexpand b/gpMgmt/bin/gpexpand index a99a4caeca34..ac818d3103bf 100755 --- a/gpMgmt/bin/gpexpand +++ b/gpMgmt/bin/gpexpand @@ -1574,7 +1574,7 @@ class gpexpand: self.statusLogger.set_status('PREPARE_EXPANSION_SCHEMA_DONE') self.statusLogger.set_status('EXPANSION_PREPARE_DONE') - # At this point, no rollback is possible and the the system + # At this point, no rollback is possible and the system # including new segments has been started once before so finalize self.finalize_prepare() diff --git a/gpMgmt/bin/gppylib/gparray.py b/gpMgmt/bin/gppylib/gparray.py index 1aa8b8d942a9..50171736a3c7 100755 --- a/gpMgmt/bin/gppylib/gparray.py +++ b/gpMgmt/bin/gppylib/gparray.py @@ -1499,7 +1499,7 @@ def addExpansionSeg(self, content, preferred_role, dbid, role, def reOrderExpansionSegs(self): """ The expansion segments content ID may have changed during the expansion. - This method will re-order the the segments into their proper positions. + This method will re-order the segments into their proper positions. Since there can be no gaps in the content id (see validateExpansionSegs), the self.expansionSegmentPairs list is the same length. """ diff --git a/gpMgmt/bin/gppylib/system/info.py b/gpMgmt/bin/gppylib/system/info.py index a542d9e92419..24d2212d6971 100644 --- a/gpMgmt/bin/gppylib/system/info.py +++ b/gpMgmt/bin/gppylib/system/info.py @@ -29,7 +29,7 @@ def get_max_available_thread_count(): # assuming a generous 10K bytes per line of error output, # 20 MB allows 2000 errors in a single run; if user has more, # we will explain in the manual - # the the user can always set batch (number of threads) manually + # the user can always set batch (number of threads) manually thread_size = 20 * MB + stack_size mem = psutil.virtual_memory() diff --git a/gpMgmt/bin/gpssh-exkeys b/gpMgmt/bin/gpssh-exkeys index 3a7af059e6aa..f864f5f9bde4 100755 --- a/gpMgmt/bin/gpssh-exkeys +++ b/gpMgmt/bin/gpssh-exkeys @@ -541,7 +541,7 @@ try: ###################### # step 1 # - # Creates an SSH id_rsa key pair for for the current user if not already available + # Creates an SSH id_rsa key pair for the current user if not already available # and appends the id_rsa.pub key to the local authorized_keys file. # print '[STEP 1 of 5] create local ID and authorize on local host' diff --git a/gpMgmt/doc/gpconfigs/gpinitsystem_config b/gpMgmt/doc/gpconfigs/gpinitsystem_config index 19c7f748ea2a..d343a3ad87a1 100644 --- a/gpMgmt/doc/gpconfigs/gpinitsystem_config +++ b/gpMgmt/doc/gpconfigs/gpinitsystem_config @@ -66,5 +66,5 @@ ENCODING=UNICODE #DATABASE_NAME=name_of_database #### Specify the location of the host address file here instead of -#### with the the -h option of gpinitsystem. +#### with the -h option of gpinitsystem. #MACHINE_LIST_FILE=/home/gpadmin/gpconfigs/hostfile_gpinitsystem diff --git a/gpMgmt/doc/gpconfigs/gpinitsystem_test b/gpMgmt/doc/gpconfigs/gpinitsystem_test index 9a1fa79dff19..7436ae743a98 100644 --- a/gpMgmt/doc/gpconfigs/gpinitsystem_test +++ b/gpMgmt/doc/gpconfigs/gpinitsystem_test @@ -66,5 +66,5 @@ declare -a MIRROR_DATA_DIRECTORY=(/media/ephemeral1/mirror /media/ephemeral1/mir #DATABASE_NAME=name_of_database #### Specify the location of the host address file here instead of -#### with the the -h option of gpinitsystem. +#### with the -h option of gpinitsystem. 
MACHINE_LIST_FILE=hostfile_gpinitsystem diff --git a/gpMgmt/test/behave/mgmt_utils/gprecoverseg.feature b/gpMgmt/test/behave/mgmt_utils/gprecoverseg.feature index 0dd154db173b..6768ad1cdf0d 100644 --- a/gpMgmt/test/behave/mgmt_utils/gprecoverseg.feature +++ b/gpMgmt/test/behave/mgmt_utils/gprecoverseg.feature @@ -241,7 +241,7 @@ Feature: gprecoverseg tests And gprecoverseg should print "Heap checksum setting is consistent between master and the segments that are candidates for recoverseg" to stdout And all the segments are running And the segments are synchronized - # validate the the new segment has the correct setting by getting admin connection to that segment + # validate the new segment has the correct setting by getting admin connection to that segment Then the saved primary segment reports the same value for sql "show data_checksums" db "template1" as was saved @concourse_cluster diff --git a/gpdb-doc/dita/admin_guide/kerberos-lin-client.xml b/gpdb-doc/dita/admin_guide/kerberos-lin-client.xml index 9e84a5230482..34eb9ed41d4a 100644 --- a/gpdb-doc/dita/admin_guide/kerberos-lin-client.xml +++ b/gpdb-doc/dita/admin_guide/kerberos-lin-client.xml @@ -103,7 +103,7 @@ for the Greenplum Database user.
  • Run kinit specifying the keytab file to create a ticket on the client machine. For this example, the keytab file - gpdb-kerberos.keytab is in the the current directory. The ticket cache + gpdb-kerberos.keytab is in the current directory. The ticket cache file is in the gpadmin user home directory. > kinit -k -t gpdb-kerberos.keytab -c /home/gpadmin/cache.txt    gpadmin/kerberos-gpdb@KRB.EXAMPLE.COM
  • diff --git a/gpdb-doc/dita/admin_guide/managing/maintain.xml b/gpdb-doc/dita/admin_guide/managing/maintain.xml index efc4f5c8d71f..161530143d4d 100644 --- a/gpdb-doc/dita/admin_guide/managing/maintain.xml +++ b/gpdb-doc/dita/admin_guide/managing/maintain.xml @@ -35,7 +35,7 @@ VACUUM FULL ignores the value of gp_appendonly_compaction_threshold and rewrites the segment file regardless of the ratio.

    -

    You can use the __gp_aovisimap_compaction_info() function in the the +

    You can use the __gp_aovisimap_compaction_info() function in the gp_toolkit schema to investigate the effectiveness of a VACUUM operation on append-optimized tables.

    For information about the __gp_aovisimap_compaction_info() function see, diff --git a/gpdb-doc/dita/best_practices/encryption.xml b/gpdb-doc/dita/best_practices/encryption.xml index 4ca0590dbf74..e0b8e734dd25 100644 --- a/gpdb-doc/dita/best_practices/encryption.xml +++ b/gpdb-doc/dita/best_practices/encryption.xml @@ -277,7 +277,7 @@ ssb 2048R/4FD2EFBB 2015-01-13 # gpg -a --export-secret-keys 2027CC30 > secret.key

    See the pgcrypto documentation for for more information about PGP + scope="external">pgcrypto documentation for more information about PGP encryption functions.

    diff --git a/gpdb-doc/dita/install_guide/prep_os.xml b/gpdb-doc/dita/install_guide/prep_os.xml index 688801844fa7..ac67750ac57b 100644 --- a/gpdb-doc/dita/install_guide/prep_os.xml +++ b/gpdb-doc/dita/install_guide/prep_os.xml @@ -243,7 +243,7 @@ kernel.shmmax = 810810728448 65535, set the Greenplum Database base port numbers to these values.

    PORT_BASE = 6000 MIRROR_PORT_BASE = 7000 -

    For information about the the gpinitsystem cluster configuration file, +

    For information about the gpinitsystem cluster configuration file, see Initializing a Greenplum Database System.

    For Azure deployments with Greenplum Database avoid using port 65330; add the following diff --git a/gpdb-doc/dita/ref_guide/sql_commands/CREATE_ROLE.xml b/gpdb-doc/dita/ref_guide/sql_commands/CREATE_ROLE.xml index 584cbd0df124..920589dbfaaf 100644 --- a/gpdb-doc/dita/ref_guide/sql_commands/CREATE_ROLE.xml +++ b/gpdb-doc/dita/ref_guide/sql_commands/CREATE_ROLE.xml @@ -185,7 +185,7 @@ RESOURCE GROUP group_name - The name of the resource group to assign to the the new role. The role will + The name of the resource group to assign to the new role. The role will be subject to the concurrent transaction, memory, and CPU limits configured for the resource group. You can assign a single resource group to one or more roles. diff --git a/gpdb-doc/dita/security-guide/topics/Authorization.xml b/gpdb-doc/dita/security-guide/topics/Authorization.xml index f91c92ac7ce1..6e46da70a24f 100644 --- a/gpdb-doc/dita/security-guide/topics/Authorization.xml +++ b/gpdb-doc/dita/security-guide/topics/Authorization.xml @@ -343,7 +343,7 @@ DENY DAY 'Sunday'

    The following examples demonstrate creating a role with time-based constraints and modifying a role to add time-based constraints. Only the statements needed for time-based constraints are shown. For more details on creating and altering roles see the descriptions - of CREATE ROLE and ALTER ROLE in in the Greenplum + of CREATE ROLE and ALTER ROLE in the Greenplum Database Reference Guide.

    Example 1 – Create a New Role with Time-based Constraints diff --git a/gpdb-doc/dita/utility_guide/ref/gpmapreduce.xml b/gpdb-doc/dita/utility_guide/ref/gpmapreduce.xml index 705dc544609b..4b6c78dd76ac 100644 --- a/gpdb-doc/dita/utility_guide/ref/gpmapreduce.xml +++ b/gpdb-doc/dita/utility_guide/ref/gpmapreduce.xml @@ -34,7 +34,7 @@
  • You must be a Greenplum Database superuser to run MapReduce jobs with EXEC and FILE inputs.
  • You must be a Greenplum Database superuser to run MapReduce jobs with - GPFDIST input unless the the user has the appropriate rights granted. + GPFDIST input unless the user has the appropriate rights granted.
  • diff --git a/gpdb-doc/markdown/pxf/cfg_server.html.md.erb b/gpdb-doc/markdown/pxf/cfg_server.html.md.erb index 57fdd31d0771..9e0c13c59e04 100644 --- a/gpdb-doc/markdown/pxf/cfg_server.html.md.erb +++ b/gpdb-doc/markdown/pxf/cfg_server.html.md.erb @@ -94,7 +94,7 @@ PXF includes a template file named `pxf-site.xml`. You use the `pxf-site.xml` te You configure properties in the `pxf-site.xml` file for a PXF server when one or more of the following conditions hold: - The remote Hadoop system utilizes Kerberos authentication. -- You want to enable/disable user impersonation on the the remote Hadoop or external database system. +- You want to enable/disable user impersonation on the remote Hadoop or external database system. `pxf-site.xml` includes the following properties: diff --git a/gpdb-doc/markdown/pxf/hdfs_json.html.md.erb b/gpdb-doc/markdown/pxf/hdfs_json.html.md.erb index e99ec0a2509c..5dee2dbced05 100644 --- a/gpdb-doc/markdown/pxf/hdfs_json.html.md.erb +++ b/gpdb-doc/markdown/pxf/hdfs_json.html.md.erb @@ -115,7 +115,7 @@ The single-JSON-record-per-line data set follows: "id":287819058,"location":""}, "coordinates": null} ``` -This is the data set for for the multi-line JSON record data set: +This is the data set for the multi-line JSON record data set: ``` json { diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c index 04520c91f318..52ebd28701b6 100644 --- a/src/backend/access/transam/xlogreader.c +++ b/src/backend/access/transam/xlogreader.c @@ -899,7 +899,7 @@ XLogReaderValidatePageHeader(XLogReaderState *state, XLogRecPtr recptr, } /* - * In GPDB, this is used in in the test in src/test/walrep, so we need it in the + * In GPDB, this is used in the test in src/test/walrep, so we need it in the * backend, too. */ /* #ifdef FRONTEND */ diff --git a/src/backend/cdb/cdbappendonlystorageread.c b/src/backend/cdb/cdbappendonlystorageread.c index 457c8a65107b..1acbd892fcf1 100755 --- a/src/backend/cdb/cdbappendonlystorageread.c +++ b/src/backend/cdb/cdbappendonlystorageread.c @@ -336,7 +336,7 @@ AppendOnlyStorageRead_OpenFile(AppendOnlyStorageRead *storageRead, Assert(filePathName != NULL); /* - * The EOF must be be greater than 0, otherwise we risk transactionally + * The EOF must be greater than 0, otherwise we risk transactionally * created segment files from disappearing if a concurrent write * transaction aborts. */ diff --git a/src/backend/cdb/cdbsreh.c b/src/backend/cdb/cdbsreh.c index af6d451114a9..141d7699fd62 100644 --- a/src/backend/cdb/cdbsreh.c +++ b/src/backend/cdb/cdbsreh.c @@ -478,7 +478,7 @@ PreprocessByteaData(char *src) /* * IsRejectLimitValid * - * verify that the the reject limit specified by the user is within the + * verify that the reject limit specified by the user is within the * allowed values for ROWS or PERCENT. */ void diff --git a/src/backend/cdb/cdbvarblock.c b/src/backend/cdb/cdbvarblock.c index 550b3c093f90..df98bce36bed 100644 --- a/src/backend/cdb/cdbvarblock.c +++ b/src/backend/cdb/cdbvarblock.c @@ -524,7 +524,7 @@ VarBlockIsValid( } /* - * Verify the data security zero pad between the last item and the the + * Verify the data security zero pad between the last item and the * offset array. 
*/ for (z = VARBLOCK_HEADER_LEN + itemLenSum; z < offsetToOffsetArray; z++) diff --git a/src/backend/cdb/dispatcher/cdbconn.c b/src/backend/cdb/dispatcher/cdbconn.c index 25b7598427a9..74bab1bf477a 100644 --- a/src/backend/cdb/dispatcher/cdbconn.c +++ b/src/backend/cdb/dispatcher/cdbconn.c @@ -463,7 +463,7 @@ cdbconn_get_motion_listener_port(PGconn *conn) * * The callback is very limited in what it can do, so it cannot directly * forward the Notice to the user->QD connection. Instead, it queues the - * Notices as a list of QENotice structs. Later, when we are out of of the + * Notices as a list of QENotice structs. Later, when we are out of the * callback, forwardQENotices() sends the queued Notices to the client. *------------------------------------------------------------------------- */ diff --git a/src/backend/cdb/dispatcher/cdbdisp_query.c b/src/backend/cdb/dispatcher/cdbdisp_query.c index 9490c764c91b..fe92c4833263 100644 --- a/src/backend/cdb/dispatcher/cdbdisp_query.c +++ b/src/backend/cdb/dispatcher/cdbdisp_query.c @@ -1063,7 +1063,7 @@ cdbdisp_dispatchX(QueryDesc* queryDesc, * allocate gangs, and associate them with slices. * * On return, gangs have been allocated and CDBProcess lists have - * been filled in in the slice table.) + * been filled in the slice table.) * * Notice: This must be done before cdbdisp_buildPlanQueryParms */ diff --git a/src/backend/cdb/motion/ic_udpifc.c b/src/backend/cdb/motion/ic_udpifc.c index 72b7ec880668..8cae59d360af 100644 --- a/src/backend/cdb/motion/ic_udpifc.c +++ b/src/backend/cdb/motion/ic_udpifc.c @@ -919,7 +919,7 @@ initCursorICHistoryTable(CursorICHistoryTable *t) /* * addCursorIcEntry - * Add an entry to the the cursor ic table. + * Add an entry to the cursor ic table. */ static void addCursorIcEntry(CursorICHistoryTable *t, uint32 icId, uint32 cid) diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index ad45f95c2d68..7c5334f87fb3 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -767,7 +767,7 @@ ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc) /* * GPDB_91_MERGE_FIXME: If the target is a partitioned table, we * should also report information on the triggers in the partitions. - * I.e. we should scan the the 'ri_partition_hash' of each + * I.e. we should scan the 'ri_partition_hash' of each * ResultRelInfo as well. This is somewhat academic, though, as long * as we don't support triggers in GPDB in general.. */ @@ -2976,7 +2976,7 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es) /* * Unlike in a FunctionScan, in a TableFunctionScan the call - * should always be a a function call of a single function. + * should always be a function call of a single function. * Get the real name of the function. */ { diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 7164ff437bd9..c1483a295136 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -15370,7 +15370,7 @@ ATExecSetDistributedBy(Relation rel, Node *node, AlterTableCmd *cmd) ldistro->numsegments); /* - * See if the the old policy is the same as the new one but + * See if the old policy is the same as the new one but * remember, we still might have to rebuild if there are new * storage options. 
*/ diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index c3537bb15ac1..3f6054421c56 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -1697,7 +1697,7 @@ InitPlan(QueryDesc *queryDesc, int eflags) { /* * On QD, the lock on the table has already been taken during parsing, so if it's a child - * partition, we don't need to take a lock. If there a a deadlock GDD will come in place + * partition, we don't need to take a lock. If there a deadlock GDD will come in place * and resolve the deadlock. ORCA Update / Delete plans only contains the root relation, so * no locks on leaf partition are taken here. The below changes makes planner as well to not * take locks on leaf partitions with GDD on. diff --git a/src/backend/executor/nodeShareInputScan.c b/src/backend/executor/nodeShareInputScan.c index c85d06aff500..d8d622d69b57 100644 --- a/src/backend/executor/nodeShareInputScan.c +++ b/src/backend/executor/nodeShareInputScan.c @@ -904,7 +904,7 @@ ExecEagerFreeShareInputScan(ShareInputScanState *node) } /* - * Reset our copy of the pointer to the the ts_state. The tuplestore can still be accessed by + * Reset our copy of the pointer to the ts_state. The tuplestore can still be accessed by * the other consumers, but we don't have a pointer to it anymore */ node->ts_state = NULL; diff --git a/src/backend/fts/README b/src/backend/fts/README index 72e7c3da8b5b..c2ba7ca76896 100644 --- a/src/backend/fts/README +++ b/src/backend/fts/README @@ -15,7 +15,7 @@ Fault Tolerance Service (FTS): Two functions pointers are important members of the BackgroundWorker structure. One points to main entry function of - the GP background process. The other points to the the function + the GP background process. The other points to the function that determine if the process should be started or not. For FTS, these two functions are FtsProbeMain() and FtsProbeStartRule(), respectively. This is hard-coded in postmaster.c. @@ -154,7 +154,7 @@ requests 1, 2, and 3 which should share the same results since they request before the start of a new fts loop, and after the results of the previous probe - that is in the lower portion. -2) Ensuring fresh results from an external probe. This is depicted as as request +2) Ensuring fresh results from an external probe. This is depicted as request 4 incoming during a current probe in progress. This request should get fresh results rather than using the current results (ie: "piggybacking"). 
diff --git a/src/backend/gpopt/translate/CTranslatorQueryToDXL.cpp b/src/backend/gpopt/translate/CTranslatorQueryToDXL.cpp index 70be42d05a52..c50bd8b5799d 100644 --- a/src/backend/gpopt/translate/CTranslatorQueryToDXL.cpp +++ b/src/backend/gpopt/translate/CTranslatorQueryToDXL.cpp @@ -4208,7 +4208,7 @@ CTranslatorQueryToDXL::TranslateExprToDXLProject if (IsA(expr, Var) && !insist_new_colids) { - // project elem is a a reference to a column - use the colref id + // project elem is a reference to a column - use the colref id GPOS_ASSERT(EdxlopScalarIdent == child_dxlnode->GetOperator()->GetDXLOperator()); CDXLScalarIdent *dxl_ident = (CDXLScalarIdent *) child_dxlnode->GetOperator(); project_elem_id = dxl_ident->GetDXLColRef()->Id(); diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c index 063899febf1c..4de52218ad5e 100644 --- a/src/backend/replication/logical/reorderbuffer.c +++ b/src/backend/replication/logical/reorderbuffer.c @@ -1515,7 +1515,7 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid, /* * Mapped catalog tuple without data, emitted while * catalog table was in the process of being rewritten. We - * can fail to look up the relfilenode, because the the + * can fail to look up the relfilenode, because the * relmapper has no "historic" view, in contrast to normal * the normal catalog during decoding. Thus repeated * rewrites can cause a lookup failure. That's OK because diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c index 56aef6ebd16a..1bfcda1037fb 100644 --- a/src/backend/storage/ipc/procarray.c +++ b/src/backend/storage/ipc/procarray.c @@ -1281,7 +1281,7 @@ GetOldestXmin(Relation rel, bool ignoreVacuum) * In QD node, all distributed transactions have an entry in the proc array, * so we're done. * - * During binary upgrade and in in maintenance mode, we don't have + * During binary upgrade and in maintenance mode, we don't have * distributed transactions, so we're done there too. This ensures correct * operation of VACUUM FREEZE during pg_upgrade and maintenance mode. * diff --git a/src/backend/utils/adt/complex_type.c b/src/backend/utils/adt/complex_type.c index ac45b6419fde..0d0541692a96 100644 --- a/src/backend/utils/adt/complex_type.c +++ b/src/backend/utils/adt/complex_type.c @@ -777,7 +777,7 @@ pg_cpow_n(Complex x, int k) * Loop invariant: r = z*x^k * x is the base * k is the power - * z is the the remaining which makes the loop invariant valid + * z is the remaining which makes the loop invariant valid * End condition: k == 0, r = z*x^0, so r = z * * while k > 1: diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c index 4791b49d30e0..20f66a4f6fd3 100644 --- a/src/backend/utils/adt/datetime.c +++ b/src/backend/utils/adt/datetime.c @@ -3936,7 +3936,7 @@ EncodeTimezone(char *str, int tz, int style) /* * Convenience routine for encoding dates faster than sprintf does. - * tm is the the timestamp structure, str is the string, pos is position in + * tm is the timestamp structure, str is the string, pos is position in * the string which we are at. Upon returning, it is set to the offset of the * last character we set in str. 
*/ diff --git a/src/backend/utils/datumstream/test/datumstreamblock_test.c b/src/backend/utils/datumstream/test/datumstreamblock_test.c index 437d2a12cd30..99c40a54335e 100644 --- a/src/backend/utils/datumstream/test/datumstreamblock_test.c +++ b/src/backend/utils/datumstream/test/datumstreamblock_test.c @@ -58,7 +58,7 @@ test__DeltaCompression__Core(void **state) assert_int_equal(dsw->delta_bitmap.bitOnCount, 0); assert_int_equal(dsw->delta_bitmap.bitCount, 0); - /* Since physical datum, test the the routines for processing the same */ + /* Since physical datum, test the routines for processing the same */ DatumStreamBlockWrite_DenseIncrItem(dsw, 0, 4); DatumStreamBlockWrite_DeltaMaintain(dsw, UInt32GetDatum(32)); assert_int_equal(dsw->physical_datum_count, 1); @@ -131,7 +131,7 @@ test__DeltaCompression__Core(void **state) assert_true(DatumStreamBitMapWrite_CurrentIsOn(&dsw->delta_bitmap)); assert_true(dsw->not_first_datum); - /* Again since physical datum, test the the routines for processing the same */ + /* Again since physical datum, test the routines for processing the same */ DatumStreamBlockWrite_DenseIncrItem(dsw, 0, 4); DatumStreamBlockWrite_DeltaMaintain(dsw, UInt32GetDatum(23 + MAX_DELTA_SUPPORTED_DELTA_COMPRESSION + 1)); assert_int_equal(dsw->physical_datum_count, 2); diff --git a/src/backend/utils/hyperloglog/gp_hyperloglog.c b/src/backend/utils/hyperloglog/gp_hyperloglog.c index 7986379fb5c6..c5d9c2d7a254 100644 --- a/src/backend/utils/hyperloglog/gp_hyperloglog.c +++ b/src/backend/utils/hyperloglog/gp_hyperloglog.c @@ -818,7 +818,7 @@ gp_hyperloglog_merge_counters(GpHLLCounter counter1, GpHLLCounter counter2) return gp_hll_copy(counter2); } else if (counter2 == NULL) { - /* if second counter is null just return the the first estimator */ + /* if second counter is null just return the first estimator */ return gp_hll_copy(counter1); } else @@ -1166,7 +1166,7 @@ gp_hyperloglog_merge(PG_FUNCTION_ARGS) counter1_merged = PG_GETARG_HLL_P_COPY(1); } else if (PG_ARGISNULL(1)) { - /* if second counter is null just return the the first estimator */ + /* if second counter is null just return the first estimator */ counter1_merged = PG_GETARG_HLL_P_COPY(0); } else { diff --git a/src/backend/utils/misc/guc_gp.c b/src/backend/utils/misc/guc_gp.c index 8529a074e2b8..0d7c28737567 100644 --- a/src/backend/utils/misc/guc_gp.c +++ b/src/backend/utils/misc/guc_gp.c @@ -4464,7 +4464,7 @@ struct config_string ConfigureNamesString_gp[] = }, { {"pljava_classpath", PGC_SUSET, CUSTOM_OPTIONS, - gettext_noop("classpath used by the the JVM"), + gettext_noop("classpath used by the JVM"), NULL, GUC_NOT_IN_SAMPLE }, diff --git a/src/backend/utils/resscheduler/resscheduler.c b/src/backend/utils/resscheduler/resscheduler.c index df22dacf25a5..a0715c54eb28 100644 --- a/src/backend/utils/resscheduler/resscheduler.c +++ b/src/backend/utils/resscheduler/resscheduler.c @@ -419,7 +419,7 @@ ResAlterQueue(Oid queueid, Cost limits[NUM_RES_LIMIT_TYPES], bool overcommit, } /* - * If threshold and overcommit alterations are all ok, do the the changes. + * If threshold and overcommit alterations are all ok, do the changes. */ if (result == ALTERQUEUE_OK) { diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h index 468b973d12a5..c82153845440 100644 --- a/src/include/executor/execdesc.h +++ b/src/include/executor/execdesc.h @@ -98,7 +98,7 @@ typedef struct Slice * A list of CDBProcess nodes corresponding to the worker processes * allocated to implement this plan slice. 
* - * The number of processes must agree with the the plan slice to be + * The number of processes must agree with the plan slice to be * implemented. */ List *primaryProcesses; diff --git a/src/include/gppc/gppc_config.h b/src/include/gppc/gppc_config.h index ee7db6ae47f6..74e944a9bb26 100644 --- a/src/include/gppc/gppc_config.h +++ b/src/include/gppc/gppc_config.h @@ -15,7 +15,7 @@ * * The number consists of the first number as ten thousands, * the second number as hundreds and the third number. For example, - * if the the backend is 4.2.1, the number is 40201 + * if the backend is 4.2.1, the number is 40201 */ #define GP_VERSION_NUM 40305 #endif diff --git a/src/test/regress/expected/bfv_aggregate.out b/src/test/regress/expected/bfv_aggregate.out index afd10136bba7..561a8b336da2 100644 --- a/src/test/regress/expected/bfv_aggregate.out +++ b/src/test/regress/expected/bfv_aggregate.out @@ -1615,7 +1615,7 @@ select avg('1000000000000000000'::int8) from generate_series(1, 100000); -- Test cases where the planner would like to distribute on a column, to implement -- grouping or distinct, but can't because the datatype isn't GPDB-hashable. --- These are all variants of the the same issue; all of these used to miss the +-- These are all variants of the same issue; all of these used to miss the -- check on whether the column is GPDB_hashble, producing an assertion failure. create table int2vectortab (distkey int, t int2vector,t2 int2vector) distributed by (distkey); insert into int2vectortab values diff --git a/src/test/regress/expected/bfv_aggregate_optimizer.out b/src/test/regress/expected/bfv_aggregate_optimizer.out index a7081eae388d..161f25d57aae 100644 --- a/src/test/regress/expected/bfv_aggregate_optimizer.out +++ b/src/test/regress/expected/bfv_aggregate_optimizer.out @@ -1615,7 +1615,7 @@ select avg('1000000000000000000'::int8) from generate_series(1, 100000); -- Test cases where the planner would like to distribute on a column, to implement -- grouping or distinct, but can't because the datatype isn't GPDB-hashable. --- These are all variants of the the same issue; all of these used to miss the +-- These are all variants of the same issue; all of these used to miss the -- check on whether the column is GPDB_hashble, producing an assertion failure. create table int2vectortab (distkey int, t int2vector,t2 int2vector) distributed by (distkey); insert into int2vectortab values diff --git a/src/test/regress/expected/gp_constraints.out b/src/test/regress/expected/gp_constraints.out index e726c0c46d85..06c3c89f145e 100644 --- a/src/test/regress/expected/gp_constraints.out +++ b/src/test/regress/expected/gp_constraints.out @@ -3,7 +3,7 @@ -- As of Postgres 9.2, the executor provides details in errors for offending -- tuples when constraints are violated during an INSERT / UPDATE. However, we -- are generally masking out these details (using matchsubs) in upstream tests --- because failing tuples might land on multiple segments, and the the precise +-- because failing tuples might land on multiple segments, and the precise -- error becomes time-sensitive and less predictable. -- To preserve coverage, we test those error details here (with greater care). -- diff --git a/src/test/regress/explain.pl b/src/test/regress/explain.pl index 7e480e13013d..e4f39b0f2546 100755 --- a/src/test/regress/explain.pl +++ b/src/test/regress/explain.pl @@ -155,8 +155,8 @@ =head1 DESCRIPTION explain.pl reads EXPLAIN output from a text file (or standard input) and formats it in several ways. 
The text file must contain -output in one of the the following formats. The first is a regular -EXPLAIN format, starting the the QUERY PLAN header and ending with the +output in one of the following formats. The first is a regular +EXPLAIN format, starting the QUERY PLAN header and ending with the number of rows in parentheses. Indenting must be on: QUERY PLAN diff --git a/src/test/regress/gpsourcify.pl b/src/test/regress/gpsourcify.pl index 8706de9a9e3d..072ea54192d6 100755 --- a/src/test/regress/gpsourcify.pl +++ b/src/test/regress/gpsourcify.pl @@ -168,7 +168,7 @@ sub LoadTokens exit(0) unless (defined($bigh)); # make an array of the token names (keys), sorted descending by - # the length of of the replacement value. + # the length of the replacement value. my @sortlen; @sortlen = sort {length($bigh->{$b}) <=> length($bigh->{$a})} keys %{$bigh}; diff --git a/src/test/regress/sql/bfv_aggregate.sql b/src/test/regress/sql/bfv_aggregate.sql index d59a9d311efa..54477c9110b7 100644 --- a/src/test/regress/sql/bfv_aggregate.sql +++ b/src/test/regress/sql/bfv_aggregate.sql @@ -1397,7 +1397,7 @@ select avg('1000000000000000000'::int8) from generate_series(1, 100000); -- Test cases where the planner would like to distribute on a column, to implement -- grouping or distinct, but can't because the datatype isn't GPDB-hashable. --- These are all variants of the the same issue; all of these used to miss the +-- These are all variants of the same issue; all of these used to miss the -- check on whether the column is GPDB_hashble, producing an assertion failure. create table int2vectortab (distkey int, t int2vector,t2 int2vector) distributed by (distkey); insert into int2vectortab values diff --git a/src/test/regress/sql/gp_constraints.sql b/src/test/regress/sql/gp_constraints.sql index f91ee1072aa3..0cb69e2c2afb 100644 --- a/src/test/regress/sql/gp_constraints.sql +++ b/src/test/regress/sql/gp_constraints.sql @@ -3,7 +3,7 @@ -- As of Postgres 9.2, the executor provides details in errors for offending -- tuples when constraints are violated during an INSERT / UPDATE. However, we -- are generally masking out these details (using matchsubs) in upstream tests --- because failing tuples might land on multiple segments, and the the precise +-- because failing tuples might land on multiple segments, and the precise -- error becomes time-sensitive and less predictable. -- To preserve coverage, we test those error details here (with greater care). -- From afb6455bf3879a0ed39f9bcd5a60589c061d9beb Mon Sep 17 00:00:00 2001 From: Daniel Gustafsson Date: Thu, 27 Feb 2020 12:12:07 +0100 Subject: [PATCH 049/102] docs: Fix typos Backported from master 9aa9dc0ab1b4bb62ddf9f1b5c40b7cb9f3702542 pickReviewed-by: Mel Kiyama Reviewed-by: Heikki Linnakangas --- gpdb-doc/dita/install_guide/ansible-example.xml | 2 +- gpdb-doc/markdown/pxf/gphdfs-pxf-migrate.html.md.erb | 2 +- gpdb-doc/markdown/pxf/jdbc_cfg.html.md.erb | 2 +- gpdb-doc/markdown/pxf/pxfuserimpers.html.md.erb | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/gpdb-doc/dita/install_guide/ansible-example.xml b/gpdb-doc/dita/install_guide/ansible-example.xml index 1dd57407e5b0..65d36c650850 100644 --- a/gpdb-doc/dita/install_guide/ansible-example.xml +++ b/gpdb-doc/dita/install_guide/ansible-example.xml @@ -21,7 +21,7 @@

    Following are steps to use this Ansible playbook.

    1. Install Ansible on the control node using your package manager. See the Ansible documention + href="https://docs.ansible.com" format="html" scope="external">Ansible documentation for help with installation.
    2. Set up passwordless SSH from the control node to all hosts that will be a part of the Greenplum Database cluster. You can use the ssh-copy-id command to install diff --git a/gpdb-doc/markdown/pxf/gphdfs-pxf-migrate.html.md.erb b/gpdb-doc/markdown/pxf/gphdfs-pxf-migrate.html.md.erb index bee994c2844e..bcaf0556fd8d 100644 --- a/gpdb-doc/markdown/pxf/gphdfs-pxf-migrate.html.md.erb +++ b/gpdb-doc/markdown/pxf/gphdfs-pxf-migrate.html.md.erb @@ -218,7 +218,7 @@ Ensure that you can read from, or write to, each `pxf` external table that you h ## Removing the gphdfs External Tables -You must remove all `gphdfs` external tables before you can successfuly migrate a Greenplum Database 4 or 5 database to Greenplum 6. +You must remove all `gphdfs` external tables before you can successfully migrate a Greenplum Database 4 or 5 database to Greenplum 6. Drop an external table as follows: diff --git a/gpdb-doc/markdown/pxf/jdbc_cfg.html.md.erb b/gpdb-doc/markdown/pxf/jdbc_cfg.html.md.erb index 057bff1406bf..504ea31e4c1f 100644 --- a/gpdb-doc/markdown/pxf/jdbc_cfg.html.md.erb +++ b/gpdb-doc/markdown/pxf/jdbc_cfg.html.md.erb @@ -218,7 +218,7 @@ By default, PXF JDBC user impersonation is disabled. Perform the following proc pxf.service.user.impersonation true - + ``` 7. Save the `jdbc-site.xml` file and exit the editor. diff --git a/gpdb-doc/markdown/pxf/pxfuserimpers.html.md.erb b/gpdb-doc/markdown/pxf/pxfuserimpers.html.md.erb index 247653a5be85..fedb737c5292 100644 --- a/gpdb-doc/markdown/pxf/pxfuserimpers.html.md.erb +++ b/gpdb-doc/markdown/pxf/pxfuserimpers.html.md.erb @@ -96,7 +96,7 @@ PXF user impersonation is enabled by default for Hadoop servers. You can configu pxf.service.user.impersonation true - + ``` 3. If you enabled user impersonation, you must configure Hadoop proxying as described in [Configure Hadoop Proxying](#hadoop). You must also configure [Hive User Impersonation](#hive) and [HBase User Impersonation](#hbase) if you plan to use those services. From 8e51649a201ae83a9a32767b920b824b07db7633 Mon Sep 17 00:00:00 2001 From: Ashwin Agrawal Date: Thu, 27 Feb 2020 13:47:57 -0800 Subject: [PATCH 050/102] Use separate table in qp_dml_oids forappendonly_use_no_toast test Binary swap test fails as lower version doesn't have fix from commit a478ed38434372b31998b3cb5aecf55c795215b7. Hence, creating separate table for the test and dropping the same at end, to avoid the failure. --- src/test/regress/expected/qp_dml_oids.out | 10 +++++++--- src/test/regress/expected/qp_dml_oids_optimizer.out | 10 +++++++--- src/test/regress/sql/qp_dml_oids.sql | 8 ++++---- 3 files changed, 18 insertions(+), 10 deletions(-) diff --git a/src/test/regress/expected/qp_dml_oids.out b/src/test/regress/expected/qp_dml_oids.out index c7d9db753e82..86484264f828 100644 --- a/src/test/regress/expected/qp_dml_oids.out +++ b/src/test/regress/expected/qp_dml_oids.out @@ -348,14 +348,18 @@ SELECT COUNT(distinct oid) FROM dml_ao where a = 10; -- Check that 'toast' is disabled by GUC. -- set debug_appendonly_use_no_toast to on; -INSERT INTO dml_ao (a, b, c) VALUES (10, 3, repeat('x', 50000)); -INSERT INTO dml_ao (a, b, c) VALUES (10, 4, repeat('x', 50000)); +CREATE TABLE dml_ao1 (a int , b int default -1, c text) WITH (appendonly = true, oids = true) DISTRIBUTED BY (a); +NOTICE: OIDS=TRUE is not recommended for user-created tables +HINT: Use OIDS=FALSE to prevent wrap-around of the OID counter. 
+INSERT INTO dml_ao1 (a, b, c) VALUES (10, 3, repeat('x', 50000)); +INSERT INTO dml_ao1 (a, b, c) VALUES (10, 4, repeat('x', 50000)); SELECT COUNT(distinct oid) FROM dml_ao where a = 10; count ------- - 4 + 2 (1 row) +DROP TABLE dml_ao1; reset debug_appendonly_use_no_toast; -- -- Check that new OIDs are generated even if the tuple being inserted came from diff --git a/src/test/regress/expected/qp_dml_oids_optimizer.out b/src/test/regress/expected/qp_dml_oids_optimizer.out index 0706dc007e6c..21be2a2ffb74 100644 --- a/src/test/regress/expected/qp_dml_oids_optimizer.out +++ b/src/test/regress/expected/qp_dml_oids_optimizer.out @@ -350,14 +350,18 @@ SELECT COUNT(distinct oid) FROM dml_ao where a = 10; -- Check that 'toast' is disabled by GUC. -- set debug_appendonly_use_no_toast to on; -INSERT INTO dml_ao (a, b, c) VALUES (10, 3, repeat('x', 50000)); -INSERT INTO dml_ao (a, b, c) VALUES (10, 4, repeat('x', 50000)); +CREATE TABLE dml_ao1 (a int , b int default -1, c text) WITH (appendonly = true, oids = true) DISTRIBUTED BY (a); +NOTICE: OIDS=TRUE is not recommended for user-created tables +HINT: Use OIDS=FALSE to prevent wrap-around of the OID counter. +INSERT INTO dml_ao1 (a, b, c) VALUES (10, 3, repeat('x', 50000)); +INSERT INTO dml_ao1 (a, b, c) VALUES (10, 4, repeat('x', 50000)); SELECT COUNT(distinct oid) FROM dml_ao where a = 10; count ------- - 4 + 2 (1 row) +DROP TABLE dml_ao1; reset debug_appendonly_use_no_toast; -- -- Check that new OIDs are generated even if the tuple being inserted came from diff --git a/src/test/regress/sql/qp_dml_oids.sql b/src/test/regress/sql/qp_dml_oids.sql index 600d3cb4c55a..d4ce0b9bba28 100644 --- a/src/test/regress/sql/qp_dml_oids.sql +++ b/src/test/regress/sql/qp_dml_oids.sql @@ -197,12 +197,12 @@ SELECT COUNT(distinct oid) FROM dml_ao where a = 10; -- Check that 'toast' is disabled by GUC. -- set debug_appendonly_use_no_toast to on; - -INSERT INTO dml_ao (a, b, c) VALUES (10, 3, repeat('x', 50000)); -INSERT INTO dml_ao (a, b, c) VALUES (10, 4, repeat('x', 50000)); +CREATE TABLE dml_ao1 (a int , b int default -1, c text) WITH (appendonly = true, oids = true) DISTRIBUTED BY (a); +INSERT INTO dml_ao1 (a, b, c) VALUES (10, 3, repeat('x', 50000)); +INSERT INTO dml_ao1 (a, b, c) VALUES (10, 4, repeat('x', 50000)); SELECT COUNT(distinct oid) FROM dml_ao where a = 10; - +DROP TABLE dml_ao1; reset debug_appendonly_use_no_toast; -- From 41704899d655d27ebc4325ecfdb004f71d3c056c Mon Sep 17 00:00:00 2001 From: Lisa Owen Date: Thu, 27 Feb 2020 11:26:23 -0800 Subject: [PATCH 051/102] docs - fix some xrefs and conditionalize (#9626) * docs - fix some xrefs and conditionalize * update a link --- gpdb-doc/dita/admin_guide/load/load.ditamap | 28 ++++++++++++++++++ .../topics/g-loading-and-unloading-data.xml | 10 +++++-- gpdb-doc/dita/analytics/overview.xml | 2 +- .../install_guide/platform-requirements.xml | 18 ++++++------ .../dita/utility_guide/utility-programs.xml | 29 ++++++++++--------- .../dita/utility_guide/utility_guide.ditamap | 14 +++------ 6 files changed, 66 insertions(+), 35 deletions(-) diff --git a/gpdb-doc/dita/admin_guide/load/load.ditamap b/gpdb-doc/dita/admin_guide/load/load.ditamap index c82d07b53023..787f467b1f6d 100644 --- a/gpdb-doc/dita/admin_guide/load/load.ditamap +++ b/gpdb-doc/dita/admin_guide/load/load.ditamap @@ -32,9 +32,37 @@ +<<<<<<< HEAD + + + + + + + + + + + + + + + + + + >>>>>> 86c1ce66f8... 
docs - fix some xrefs and conditionalize (#9626) type="topic"/> diff --git a/gpdb-doc/dita/admin_guide/load/topics/g-loading-and-unloading-data.xml b/gpdb-doc/dita/admin_guide/load/topics/g-loading-and-unloading-data.xml index bcb80d8b82b2..8ecd27c67d9a 100644 --- a/gpdb-doc/dita/admin_guide/load/topics/g-loading-and-unloading-data.xml +++ b/gpdb-doc/dita/admin_guide/load/topics/g-loading-and-unloading-data.xml @@ -35,8 +35,14 @@ PXF, refer to Accessing External Data with PXF.
    3. The Greenplum-Kafka Integration provides high-speed, parallel data transfer from Kafka to Greenplum Database. For information about using - these tools, refer to the documentation at Pivotal Greenplum-Kafka Integration.
    4. + these tools, refer to the Greenplum-Kafka Integration + documentation. +
    5. The Greenplum Streaming Server is an ETL tool and API that you + can use to load data into Greenplum Database. For information about using this tool, + refer to the Greenplum Streaming Server + documentation.
    6. The Greenplum-Spark Connector provides high speed, parallel data transfer between Pivotal Greenplum Database and Apache Spark. For information about using the Greenplum-Spark Connector, refer to the documentation at
    7. Use a multitude of data extensions. Greenplum supports Apache Kafka integration, extensions for HDFS, Hive, and HBase as well as reading/writing data from/to cloud storage, including Amazon S3 objects. Review the capabilities of Greenplum

        -
      • +
      • @@ -290,7 +290,7 @@

      - + Client Tools

      Greenplum Database 6 releases a Clients tool package on various platforms that can be @@ -310,7 +310,7 @@

      The Greenplum 6 Clients package includes the client and loader programs provided in the Greenplum 5 packages plus the addition of database/role/language commands and the - Greenplum-Kafka Integration and Greenplum Stream Server command utilities. Refer to Greenplum Client and Loader Tools Package for installation and usage details of the Greenplum 6 Client tools.

      @@ -404,14 +404,14 @@ speed, parallel data transfer from a Kafka cluster to a Pivotal Greenplum Database cluster for batch and streaming ETL operations. It requires Kafka version 0.11 or newer for exactly-once delivery assurance. Refer to the Pivotal - Greenplum-Kafka Integration Documentation for more information about this - feature.
    8. -
    9. Greenplum Stream Server v1.3.1 - The Pivotal Greenplum Stream Server is an ETL tool + href="https://greenplum.docs.pivotal.io/streaming-server/1-3-1/kafka/intro.html" scope="external" format="html"> + Pivotal Greenplum-Kafka Integration Documentation for more information about this + feature.
    10. +
    11. Greenplum Streaming Server v1.3.1 - The Pivotal Greenplum Streaming Server is an ETL tool that provides high speed, parallel data transfer from Informatica, Kafka, and custom client data sources to a Pivotal Greenplum Database cluster. Refer to the - Performing ETL Operations with the Pivotal Greenplum Stream Server + href="https://greenplum.docs.pivotal.io/streaming-server/1-3-1/intro.html" scope="external" format="html"> + Pivotal Greenplum Streaming Server Documentation for more information about this feature.
    12. Greenplum Informatica Connector v1.0.5 - The Pivotal Greenplum Informatica Connector supports high speed data transfer from an Informatica PowerCenter cluster to a Pivotal diff --git a/gpdb-doc/dita/utility_guide/utility-programs.xml b/gpdb-doc/dita/utility_guide/utility-programs.xml index 6f9464b9aae3..fd51ff79a99b 100644 --- a/gpdb-doc/dita/utility_guide/utility-programs.xml +++ b/gpdb-doc/dita/utility_guide/utility-programs.xml @@ -16,8 +16,14 @@
    13. The Pivotal Greenplum Backup and Restore utilities.
    14. -
    15. The Pivotal Greenplum gpcopy utility.
    16. +
    17. The Pivotal Greenplum-Kafka Integration + utilities.
    18. +
    19. The Pivotal Greenplum Streaming Server + utilities.
    20. Additionally, the Pivotal

      - gpkafka4

      -

      - gpkafka-v2.yaml

      3

      @@ -76,12 +79,8 @@

      1

      -

      gpss4

      -

      gpss.json

      -

      gpsscli4

      +

      gpss4

      @@ -119,8 +118,12 @@ from the Greenplum Database Greenplum Clients filegroup on Pivotal Network.

      -

      4 The utility program is also installed with the Greenplum Client - and Loader Tools Package for Linux.

      +

      4 The utility program is also installed with the Greenplum Client + and Loader Tools Package for Linux. You can obtain the most up-to-date + version of the Greenplum Streaming Server and + Greenplum-Kafka Integration from + Pivotal Network.

      diff --git a/gpdb-doc/dita/utility_guide/utility_guide.ditamap b/gpdb-doc/dita/utility_guide/utility_guide.ditamap index 16eaab4bda9e..bed878c79db6 100644 --- a/gpdb-doc/dita/utility_guide/utility_guide.ditamap +++ b/gpdb-doc/dita/utility_guide/utility_guide.ditamap @@ -29,10 +29,8 @@ - - + @@ -47,12 +45,8 @@ - - - + From b7be5e40692109505a07311ef736482e58407f64 Mon Sep 17 00:00:00 2001 From: David Yozie Date: Thu, 27 Feb 2020 14:28:40 -0800 Subject: [PATCH 052/102] Docs - conflict resolution from previous merge --- gpdb-doc/dita/admin_guide/load/load.ditamap | 6 ------ 1 file changed, 6 deletions(-) diff --git a/gpdb-doc/dita/admin_guide/load/load.ditamap b/gpdb-doc/dita/admin_guide/load/load.ditamap index 787f467b1f6d..4c0ce6013db6 100644 --- a/gpdb-doc/dita/admin_guide/load/load.ditamap +++ b/gpdb-doc/dita/admin_guide/load/load.ditamap @@ -32,11 +32,6 @@ -<<<<<<< HEAD - - >>>>>> 86c1ce66f8... docs - fix some xrefs and conditionalize (#9626) type="topic"/> From c48004251fe9b0b9ed9b690ecd42d40cd386f39f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=D0=A0=D0=BE=D0=BC=D0=B0=D0=BD=20=D0=97=D0=BE=D1=82=D0=BE?= =?UTF-8?q?=D0=B2?= Date: Wed, 26 Feb 2020 14:16:02 -0800 Subject: [PATCH 053/102] Fix mis-merge of skew hashtable reset before advancing batch Resetting the skew hashtable-related variables should always happen after batch 0. This was a mis-merge through which the skew hashtable reset code was added as the "else" condition in an unrelated conditional. In rare cases the "if" condition would be hit instead, causing a segfault upon retrieving the next outer tuple when checking to see if the outer tuple is in the skew hashtable in ExecHashGetSkewBucket(). This would only happen, however, for plans in which a fast path rescan hint has been explicitly set in a sub-tree containing a hashjoin, and, specifically, a hashjoin which spills. This condition will only trigger when the skew hashtable is in use, which will only happen on segments with local statistics for the table in question. (cherry picked from commit d3ccff0434e4d005aefcaa8c6b9f7b0f770718c2) --- src/backend/executor/nodeHashjoin.c | 29 ++++++++++++++--------------- 1 file changed, 14 insertions(+), 15 deletions(-) diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c index ce56d7dc785b..1c9b6cc7c6a9 100644 --- a/src/backend/executor/nodeHashjoin.c +++ b/src/backend/executor/nodeHashjoin.c @@ -922,6 +922,20 @@ ExecHashJoinNewBatch(HashJoinState *hjstate) BufFileClose(hashtable->outerBatchFile[curbatch]); hashtable->outerBatchFile[curbatch] = NULL; } + else /* we just finished the first batch */ + { + /* + * Reset some of the skew optimization state variables, since we no + * longer need to consider skew tuples after the first batch. The + * memory context reset we are about to do will release the skew + * hashtable itself. + */ + hashtable->skewEnabled = false; + hashtable->skewBucket = NULL; + hashtable->skewBucketNums = NULL; + hashtable->nSkewBuckets = 0; + hashtable->spaceUsedSkew = 0; + } /* * If we want to keep the hash table around, for re-scan, then write @@ -941,21 +955,6 @@ ExecHashJoinNewBatch(HashJoinState *hjstate) { SpillCurrentBatch(hjstate); } - else /* we just finished the first batch */ - { - /* - * Reset some of the skew optimization state variables, since we no - * longer need to consider skew tuples after the first batch. The - * memory context reset we are about to do will release the skew - * hashtable itself. 
- */ - hashtable->skewEnabled = false; - hashtable->skewBucket = NULL; - hashtable->skewBucketNums = NULL; - hashtable->nSkewBuckets = 0; - hashtable->spaceUsedSkew = 0; - } - /* * We can always skip over any batches that are completely empty on both * sides. We can sometimes skip over batches that are empty on only one From f5eec626d5f9ec01b855e8854af64ee27a70214e Mon Sep 17 00:00:00 2001 From: David Yozie Date: Thu, 27 Feb 2020 16:21:09 -0800 Subject: [PATCH 054/102] Docs - correcting load .ditamap gpss/gpkafka refs --- gpdb-doc/dita/admin_guide/load/load.ditamap | 33 +++++---------------- 1 file changed, 7 insertions(+), 26 deletions(-) diff --git a/gpdb-doc/dita/admin_guide/load/load.ditamap b/gpdb-doc/dita/admin_guide/load/load.ditamap index 4c0ce6013db6..dcc7b816195f 100644 --- a/gpdb-doc/dita/admin_guide/load/load.ditamap +++ b/gpdb-doc/dita/admin_guide/load/load.ditamap @@ -20,7 +20,10 @@ - + + @@ -32,31 +35,9 @@ - - - - - - - - - - - - - - - - - - - + From 51b10c959f10ceef964a62b7128b4263d5e46951 Mon Sep 17 00:00:00 2001 From: Peifeng Qiu Date: Fri, 28 Feb 2020 10:23:24 +0900 Subject: [PATCH 055/102] gpcloud: support raw deflate compressed format (#9624) (#9632) Raw deflate file uses the same algorithm as gz file without the gz magic header. Zlib can handle this automatically. Make gpcloud to recognize all files with extension name ".deflate" as raw deflate file. --- gpcontrib/gpcloud/include/s3interface.h | 1 + gpcontrib/gpcloud/include/s3url.h | 2 ++ .../regress/input/1_20_deflate_data.source | 12 ++++++++ .../regress/output/1_20_deflate_data.source | 18 ++++++++++++ gpcontrib/gpcloud/regress/regress_schedule | 2 +- gpcontrib/gpcloud/src/s3common_reader.cpp | 1 + gpcontrib/gpcloud/src/s3interface.cpp | 5 ++++ gpcontrib/gpcloud/src/s3url.cpp | 12 ++++++++ gpcontrib/gpcloud/test/s3interface_test.cpp | 5 ++++ gpcontrib/gpcloud/test/s3url_test.cpp | 28 ++++++++++++++++++- 10 files changed, 84 insertions(+), 2 deletions(-) create mode 100644 gpcontrib/gpcloud/regress/input/1_20_deflate_data.source create mode 100644 gpcontrib/gpcloud/regress/output/1_20_deflate_data.source diff --git a/gpcontrib/gpcloud/include/s3interface.h b/gpcontrib/gpcloud/include/s3interface.h index 79764bb73d2e..7ff1576ba231 100644 --- a/gpcontrib/gpcloud/include/s3interface.h +++ b/gpcontrib/gpcloud/include/s3interface.h @@ -18,6 +18,7 @@ enum S3CompressionType { S3_COMPRESSION_GZIP, S3_COMPRESSION_PLAIN, + S3_COMPRESSION_DEFLATE, }; struct BucketContent { diff --git a/gpcontrib/gpcloud/include/s3url.h b/gpcontrib/gpcloud/include/s3url.h index 5fd0154e6c58..163e65140e75 100644 --- a/gpcontrib/gpcloud/include/s3url.h +++ b/gpcontrib/gpcloud/include/s3url.h @@ -49,6 +49,8 @@ class S3Url { bool isValidUrl() const; + string getExtension() const; + private: string extractField(const struct http_parser_url *urlParser, http_parser_url_fields i); diff --git a/gpcontrib/gpcloud/regress/input/1_20_deflate_data.source b/gpcontrib/gpcloud/regress/input/1_20_deflate_data.source new file mode 100644 index 000000000000..cb25a2566455 --- /dev/null +++ b/gpcontrib/gpcloud/regress/input/1_20_deflate_data.source @@ -0,0 +1,12 @@ +CREATE READABLE EXTERNAL TABLE s3regress_deflate (date text, time text, open float, high float, + low float, volume int) LOCATION('s3://s3-us-west-2.amazonaws.com/@read_prefix@/deflate_normal1/ config=@config_file@') FORMAT 'csv'; + +SELECT count(*) count, round(sum(open)) sum, round(avg(open)) avg FROM s3regress_deflate; + +DROP EXTERNAL TABLE s3regress_deflate; + +CREATE READABLE EXTERNAL TABLE 
s3regress_deflate (Year text, Month text, DayofMonth text, DayOfWeek text, DepTime text, CRSDepTime text, ArrTime text,CRSArrTime text, UniqueCarrier text, FlightNum text,TailNum text, ActualElapsedTime text, CRSElapsedTime text, AirTime text, ArrDelay text, DepDelay text, Origin text, Dest text, Distance text, TaxiIn text, TaxiOut text, Cancelled text, CancellationCode text, Diverted text, CarrierDelay text, WeatherDelay text, NASDelay text, SecurityDelay text, LateAircraftDelay text) LOCATION('s3://s3-us-west-2.amazonaws.com/@read_prefix@/deflate_2002and2003/ config=@config_file@') format 'csv' SEGMENT REJECT LIMIT 100 PERCENT; + +SELECT count(*) FROM s3regress_deflate; + +DROP EXTERNAL TABLE s3regress_deflate; diff --git a/gpcontrib/gpcloud/regress/output/1_20_deflate_data.source b/gpcontrib/gpcloud/regress/output/1_20_deflate_data.source new file mode 100644 index 000000000000..faee116476a1 --- /dev/null +++ b/gpcontrib/gpcloud/regress/output/1_20_deflate_data.source @@ -0,0 +1,18 @@ +CREATE READABLE EXTERNAL TABLE s3regress_deflate (date text, time text, open float, high float, + low float, volume int) LOCATION('s3://s3-us-west-2.amazonaws.com/@read_prefix@/deflate_normal1/ config=@config_file@') FORMAT 'csv'; +SELECT count(*) count, round(sum(open)) sum, round(avg(open)) avg FROM s3regress_deflate; + count | sum | avg +----------+------------+----- + 31033039 | 1490754474 | 48 +(1 row) + +DROP EXTERNAL TABLE s3regress_deflate; +CREATE READABLE EXTERNAL TABLE s3regress_deflate (Year text, Month text, DayofMonth text, DayOfWeek text, DepTime text, CRSDepTime text, ArrTime text,CRSArrTime text, UniqueCarrier text, FlightNum text,TailNum text, ActualElapsedTime text, CRSElapsedTime text, AirTime text, ArrDelay text, DepDelay text, Origin text, Dest text, Distance text, TaxiIn text, TaxiOut text, Cancelled text, CancellationCode text, Diverted text, CarrierDelay text, WeatherDelay text, NASDelay text, SecurityDelay text, LateAircraftDelay text) LOCATION('s3://s3-us-west-2.amazonaws.com/@read_prefix@/deflate_2002and2003/ config=@config_file@') format 'csv' SEGMENT REJECT LIMIT 100 PERCENT; +SELECT count(*) FROM s3regress_deflate; +NOTICE: found 335925 data formatting errors (335925 or more input rows), rejected related input data + count +---------- + 11423976 +(1 row) + +DROP EXTERNAL TABLE s3regress_deflate; diff --git a/gpcontrib/gpcloud/regress/regress_schedule b/gpcontrib/gpcloud/regress/regress_schedule index 01419d35e214..0d3cf6ee3d1d 100644 --- a/gpcontrib/gpcloud/regress/regress_schedule +++ b/gpcontrib/gpcloud/regress/regress_schedule @@ -4,7 +4,7 @@ test: 0_00_prepare_protocols test: 1_03_bad_data 1_04_empty_prefix 1_05_one_line 1_06_1correct_1wrong 2_02_invalid_region 2_03_invalid_config 2_04_invalid_header 2_05_limit_zero 3_01_create_wet 3_02_quick_shoot_wet 3_11_write_with_encryption 4_01_create_invalid_wet 2_06_invalid_sub_query 1_17_no_eol_at_eof 2_07_wrong_proxy # tens of seconds -test: 1_01_normal 1_02_log_error 1_10_all_regions 1_11_gzipped_data 1_12_no_prefix 1_13_parallel1 1_13_parallel2 1_09_partition 3_09_write_big_row 3_10_write_mixed_length_rows 1_15_normal_sub_query 1_16_multiple_files_with_header_line 1_18_all_regions_version2 +test: 1_01_normal 1_02_log_error 1_10_all_regions 1_11_gzipped_data 1_12_no_prefix 1_13_parallel1 1_13_parallel2 1_09_partition 3_09_write_big_row 3_10_write_mixed_length_rows 1_15_normal_sub_query 1_16_multiple_files_with_header_line 1_18_all_regions_version2 1_20_deflate_data # heavy loads, > 100s test: 1_07_huge_bad_data 
1_08_huge_correct_data 1_14_thousands_of_files 3_03_insert_lots_of_rows 3_07_write_lots_of_files 3_04_insert_mixed_workload 3_05_insert_to_wet_from_ret 3_06_special_characters 3_08_join_query_wet_local_tbl 4_02_wet_with_mixed_format diff --git a/gpcontrib/gpcloud/src/s3common_reader.cpp b/gpcontrib/gpcloud/src/s3common_reader.cpp index 6da57f91cf7a..b6f1e1baae36 100644 --- a/gpcontrib/gpcloud/src/s3common_reader.cpp +++ b/gpcontrib/gpcloud/src/s3common_reader.cpp @@ -6,6 +6,7 @@ void S3CommonReader::open(const S3Params ¶ms) { S3CompressionType compressionType = s3InterfaceService->checkCompressionType(params.getS3Url()); switch (compressionType) { + case S3_COMPRESSION_DEFLATE: case S3_COMPRESSION_GZIP: this->upstreamReader = &this->decompressReader; this->decompressReader.setReader(&this->keyReader); diff --git a/gpcontrib/gpcloud/src/s3interface.cpp b/gpcontrib/gpcloud/src/s3interface.cpp index 12dff7b2805a..98f9cdbaa3cf 100644 --- a/gpcontrib/gpcloud/src/s3interface.cpp +++ b/gpcontrib/gpcloud/src/s3interface.cpp @@ -353,6 +353,11 @@ uint64_t S3InterfaceService::fetchData(uint64_t offset, S3VectorUInt8 &data, uin } S3CompressionType S3InterfaceService::checkCompressionType(const S3Url &s3Url) { + string ext = s3Url.getExtension(); + if (ext == ".deflate") { + return S3_COMPRESSION_DEFLATE; + } + HTTPHeaders headers; char rangeBuf[S3_RANGE_HEADER_STRING_LEN] = {0}; diff --git a/gpcontrib/gpcloud/src/s3url.cpp b/gpcontrib/gpcloud/src/s3url.cpp index 0f1d0da28570..3dd416cca512 100644 --- a/gpcontrib/gpcloud/src/s3url.cpp +++ b/gpcontrib/gpcloud/src/s3url.cpp @@ -140,3 +140,15 @@ string S3Url::extractField(const struct http_parser_url *urlParser, http_parser_ return this->sourceUrl.substr(urlParser->field_data[i].off, urlParser->field_data[i].len); } + +string S3Url::getExtension() const { + const string& path = this->prefix; + std::string::size_type pos = path.find_last_of('/'); + string filename = (pos == path.npos) ? 
path : path.substr(pos + 1); + + pos = filename.find_last_of('.'); + if (pos == filename.npos) { + return ""; + } + return filename.substr(pos); +} \ No newline at end of file diff --git a/gpcontrib/gpcloud/test/s3interface_test.cpp b/gpcontrib/gpcloud/test/s3interface_test.cpp index 71e9a942ffe7..cfcc24d56c19 100644 --- a/gpcontrib/gpcloud/test/s3interface_test.cpp +++ b/gpcontrib/gpcloud/test/s3interface_test.cpp @@ -397,6 +397,11 @@ TEST_F(S3InterfaceServiceTest, checkSmallFile) { EXPECT_EQ(S3_COMPRESSION_PLAIN, this->checkCompressionType(s3Url.getFullUrlForCurl())); } +TEST_F(S3InterfaceServiceTest, checkItsDeflateCompressed) { + S3Url s3Url("https://s3-us-west-2.amazonaws.com/s3test.pivotal.io/whatever.deflate"); + EXPECT_EQ(S3_COMPRESSION_DEFLATE, this->checkCompressionType(s3Url)); +} + TEST_F(S3InterfaceServiceTest, checkItsGzipCompressed) { vector raw; raw.resize(4); diff --git a/gpcontrib/gpcloud/test/s3url_test.cpp b/gpcontrib/gpcloud/test/s3url_test.cpp index 079278d17b31..8d9a59ae7916 100644 --- a/gpcontrib/gpcloud/test/s3url_test.cpp +++ b/gpcontrib/gpcloud/test/s3url_test.cpp @@ -92,4 +92,30 @@ TEST(S3UrlTest, Region_apnortheast2) { EXPECT_EQ("ap-northeast-2", s3Url.getRegion()); EXPECT_EQ("s3test.pivotal.io", s3Url.getBucket()); EXPECT_EQ("dataset1/normal", s3Url.getPrefix()); -} \ No newline at end of file +} + +TEST(S3UrlTest, Extension) { + S3Url s3Url("http://s3-us-west-2.amazonaws.com/bucket/"); + EXPECT_EQ("", s3Url.getPrefix()); + EXPECT_EQ("", s3Url.getExtension()); + s3Url.setPrefix("abc"); + EXPECT_EQ("", s3Url.getExtension()); + s3Url.setPrefix("a.b.c"); + EXPECT_EQ(".c", s3Url.getExtension()); + s3Url.setPrefix("/"); + EXPECT_EQ("", s3Url.getExtension()); + s3Url.setPrefix("//"); + EXPECT_EQ("", s3Url.getExtension()); + s3Url.setPrefix("./"); + EXPECT_EQ("", s3Url.getExtension()); + s3Url.setPrefix("/a.b/"); + EXPECT_EQ("", s3Url.getExtension()); + s3Url.setPrefix("/."); + EXPECT_EQ(".", s3Url.getExtension()); + s3Url.setPrefix("a.b"); + EXPECT_EQ(".b", s3Url.getExtension()); + s3Url.setPrefix("/a.b"); + EXPECT_EQ(".b", s3Url.getExtension()); + s3Url.setPrefix("ab/a.b"); + EXPECT_EQ(".b", s3Url.getExtension()); +} From cd87c8e76bdce6b7eb8ec530da583f9d3f3d578d Mon Sep 17 00:00:00 2001 From: Huiliang Liu Date: Mon, 24 Feb 2020 17:08:01 +0800 Subject: [PATCH 056/102] Change WARNING log to normal log on sending request to gpfdist Warning log will print in client/frontend, and some ODBC API returns "SQL success with info" value. It may be considered as exception in some client. So we change it to normal log and report error if it really fails. 
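For illustration, the retry and logging behavior described above is exercised when Greenplum scans a gpfdist external table; a minimal sketch of such a table, with hypothetical host, port, and file names, looks like this:

    -- hypothetical gpfdist external table whose scans issue the HTTP requests
    -- that url_curl.c retries and now reports at LOG rather than WARNING level
    CREATE READABLE EXTERNAL TABLE ext_expenses (name text, amount numeric)
    LOCATION ('gpfdist://etlhost:8081/expenses*.csv')
    FORMAT 'CSV';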
--- src/backend/access/external/url_curl.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/src/backend/access/external/url_curl.c b/src/backend/access/external/url_curl.c index edff5a9111e0..b7c80c3e127c 100644 --- a/src/backend/access/external/url_curl.c +++ b/src/backend/access/external/url_curl.c @@ -217,7 +217,7 @@ destroy_curlhandle(curlhandle_t *h) CURLMcode e = curl_multi_remove_handle(multi_handle, h->handle); if (CURLM_OK != e) - elog(WARNING, "internal error curl_multi_remove_handle (%d - %s)", e, curl_easy_strerror(e)); + elog(LOG, "internal error curl_multi_remove_handle (%d - %s)", e, curl_easy_strerror(e)); h->in_multi_handle = false; } @@ -257,7 +257,7 @@ url_curl_abort_callback(ResourceReleasePhase phase, if (curr->owner == CurrentResourceOwner) { if (isCommit) - elog(WARNING, "url_curl reference leak: %p still referenced", curr); + elog(LOG, "url_curl reference leak: %p still referenced", curr); destroy_curlhandle(curr); } @@ -569,6 +569,7 @@ gp_curl_easy_perform_backoff_and_check_response(URL_CURL_FILE *file) CURLcode e = curl_easy_perform(file->curl->handle); if (CURLE_OK != e) { + /* For curl timeout, retry 2 times before reporting error */ if (CURLE_OPERATION_TIMEDOUT == e) { timeout_count++; @@ -584,7 +585,7 @@ gp_curl_easy_perform_backoff_and_check_response(URL_CURL_FILE *file) } else { - elog(WARNING, "%s error (%d - %s)", file->curl_url, e, curl_easy_strerror(e)); + elog(LOG, "%s error (%d - %s)", file->curl_url, e, curl_easy_strerror(e)); } } else @@ -612,6 +613,10 @@ gp_curl_easy_perform_backoff_and_check_response(URL_CURL_FILE *file) response_string = NULL; } + /* + * For FDIST_TIMEOUT and curl errors except CURLE_OPERATION_TIMEDOUT + * Retry until MAX_TRY_WAIT_TIME + */ if (wait_time > MAX_TRY_WAIT_TIME) { ereport(ERROR, @@ -621,7 +626,7 @@ gp_curl_easy_perform_backoff_and_check_response(URL_CURL_FILE *file) } else { - elog(WARNING, "failed to send request to gpfdist (%s), will retry after %d seconds", file->curl_url, wait_time); + elog(LOG, "failed to send request to gpfdist (%s), will retry after %d seconds", file->curl_url, wait_time); unsigned int for_wait = 0; while (for_wait++ < wait_time) { From b6b9e032513f4166ec5a8cab770e0d344b94556f Mon Sep 17 00:00:00 2001 From: Ning Yu Date: Fri, 28 Feb 2020 14:05:28 +0800 Subject: [PATCH 057/102] Fix data reorganization when gp_use_legacy_hashops is true The ALTER TABLE command reorganizes the data by using a temporary table, if a "distributed by" clause is specified without the opclasses, the default opclasses will be chosen. There was a bug that the non-legacy opclasses are always chosen, regarding the setting of gp_use_legacy_hashops. However the table's new opclasses are determined with gp_use_legacy_hashops, so when gp_use_legacy_hashops is true the data will be incorrectly redistributed. The issue also exists in CTAS. Fixed by choosing the default opclasses for the temporary table according to gp_use_legacy_hashops. 
Reviewed-by: Heikki Linnakangas Reviewed-by: Ashwin Agrawal Reviewed-by: Shreedhar Hardikar (cherry picked from commit aa52bd5099b512c23e46746c9b9284bf70397116) --- src/backend/gpopt/gpdbwrappers.cpp | 16 ++++ .../gpopt/translate/CContextDXLToPlStmt.cpp | 21 +++- src/backend/parser/analyze.c | 3 +- src/include/gpopt/gpdbwrappers.h | 3 + .../expected/gpdist_legacy_opclasses.out | 96 +++++++++++++++++++ .../gpdist_legacy_opclasses_optimizer.out | 96 +++++++++++++++++++ .../regress/sql/gpdist_legacy_opclasses.sql | 63 ++++++++++++ 7 files changed, 295 insertions(+), 3 deletions(-) diff --git a/src/backend/gpopt/gpdbwrappers.cpp b/src/backend/gpopt/gpdbwrappers.cpp index bee161f468a5..e30ce4ef1751 100644 --- a/src/backend/gpopt/gpdbwrappers.cpp +++ b/src/backend/gpopt/gpdbwrappers.cpp @@ -1455,6 +1455,22 @@ gpdb::GetDefaultDistributionOpclassForType return false; } +Oid +gpdb::GetColumnDefOpclassForType + ( + List *opclassName, + Oid typid + ) +{ + GP_WRAP_START; + { + /* catalog tables: pg_type, pg_opclass */ + return cdb_get_opclass_for_column_def(opclassName, typid); + } + GP_WRAP_END; + return false; +} + Oid gpdb::GetHashProcInOpfamily ( diff --git a/src/backend/gpopt/translate/CContextDXLToPlStmt.cpp b/src/backend/gpopt/translate/CContextDXLToPlStmt.cpp index 3500a078c4dc..0da43bb92d70 100644 --- a/src/backend/gpopt/translate/CContextDXLToPlStmt.cpp +++ b/src/backend/gpopt/translate/CContextDXLToPlStmt.cpp @@ -421,8 +421,25 @@ CContextDXLToPlStmt::GetDistributionHashOpclassForType(Oid typid) // added to the range table only later. If it uses // legacy ops, we have already decided to use default // ops here, and we fall back unnecessarily. - m_distribution_hashops = DistrUseDefaultHashOps; - opclass = gpdb::GetDefaultDistributionOpclassForType(typid); + // + // On the other hand, when the opclass is not specified in the + // distributed-by clause one should be decided according to the + // gp_use_legacy_hashops setting. 
+ opclass = gpdb::GetColumnDefOpclassForType(NIL, typid); + // update m_distribution_hashops accordingly + if (opclass == gpdb::GetDefaultDistributionOpclassForType(typid)) + { + m_distribution_hashops = DistrUseDefaultHashOps; + } + else if (opclass == gpdb::GetLegacyCdbHashOpclassForBaseType(typid)) + { + m_distribution_hashops = DistrUseLegacyHashOps; + } + else + { + GPOS_RAISE(gpdxl::ExmaMD, gpdxl::ExmiMDObjUnsupported, + GPOS_WSZ_LIT("Unsupported distribution hashops policy")); + } break; } diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c index d884a2d346ce..8cf030bbb36c 100644 --- a/src/backend/parser/analyze.c +++ b/src/backend/parser/analyze.c @@ -3766,7 +3766,8 @@ setQryDistributionPolicy(IntoClause *into, Query *qry) tle = list_nth(qry->targetList, keyindex - 1); keytype = exprType((Node *) tle->expr); - keyopclass = GetIndexOpClass(ielem->opclass, keytype, "hash", HASH_AM_OID); + keyopclass = cdb_get_opclass_for_column_def(ielem->opclass, + keytype); policykeys = lappend_int(policykeys, keyindex); policyopclasses = lappend_oid(policyopclasses, keyopclass); diff --git a/src/include/gpopt/gpdbwrappers.h b/src/include/gpopt/gpdbwrappers.h index eb6a64889bc7..540f2a6af5c5 100644 --- a/src/include/gpopt/gpdbwrappers.h +++ b/src/include/gpopt/gpdbwrappers.h @@ -329,6 +329,9 @@ namespace gpdb { // get the default hash opclass for type Oid GetDefaultDistributionOpclassForType(Oid typid); + // get the column-definition hash opclass for type + Oid GetColumnDefOpclassForType(List *opclassName, Oid typid); + // get the hash function in an opfamily for given datatype Oid GetHashProcInOpfamily(Oid opfamily, Oid typid); diff --git a/src/test/regress/expected/gpdist_legacy_opclasses.out b/src/test/regress/expected/gpdist_legacy_opclasses.out index 5bfdf6f74650..b094bd09f220 100644 --- a/src/test/regress/expected/gpdist_legacy_opclasses.out +++ b/src/test/regress/expected/gpdist_legacy_opclasses.out @@ -1,6 +1,9 @@ -- -- Tests for legacy cdbhash opclasses -- +drop schema if exists gpdist_legacy_opclasses; +create schema gpdist_legacy_opclasses; +set search_path to gpdist_legacy_opclasses; -- Basic sanity check of all the legacy hash opclasses. Create a table that -- uses all of them in the distribution key, and insert a value. set gp_use_legacy_hashops=on; @@ -287,3 +290,96 @@ select * from legacy_enum a inner join legacy_enum b on a.color = b.color; green | green (3 rows) +-- +-- A regression issue that the data is reorganized incorrectly when +-- gp_use_legacy_hashops has non-default value. +-- +-- The ALTER TABLE command reorganizes the data by using a temporary table, if +-- a "distributed by" clause is specified without the opclasses, the default +-- opclasses will be chosen. There was a bug that the non-legacy opclasses are +-- always chosen, regarding the setting of gp_use_legacy_hashops. However the +-- table's new opclasses are determined with gp_use_legacy_hashops, so when +-- gp_use_legacy_hashops is true the data will be incorrectly redistributed. 
+-- +-- set the guc to the non-default value +set gp_use_legacy_hashops to on; +create table legacy_data_reorg (c1 int) distributed by (c1); +insert into legacy_data_reorg select i from generate_series(1, 10) i; +-- verify the opclass and data distribution +select gp_segment_id, c1 from legacy_data_reorg order by 1, 2; + gp_segment_id | c1 +---------------+---- + 0 | 1 + 0 | 2 + 1 | 3 + 1 | 4 + 1 | 5 + 1 | 6 + 1 | 7 + 2 | 8 + 2 | 9 + 2 | 10 +(10 rows) + +select dp.localoid::regclass::name as name, oc.opcname + from gp_distribution_policy dp + join pg_opclass oc + on oc.oid::text = dp.distclass::text + where dp.localoid = 'legacy_data_reorg'::regclass::oid; + name | opcname +-------------------+------------------ + legacy_data_reorg | cdbhash_int4_ops +(1 row) + +-- when reorganizing the table we set the distributed-by without an explicit +-- opclass, so the default one should be chosen according to +-- gp_use_legacy_hashops. +alter table legacy_data_reorg set with (reorganize) distributed by (c1); +-- double-check the opclass and data distribution +select gp_segment_id, c1 from legacy_data_reorg order by 1, 2; + gp_segment_id | c1 +---------------+---- + 0 | 1 + 0 | 2 + 1 | 3 + 1 | 4 + 1 | 5 + 1 | 6 + 1 | 7 + 2 | 8 + 2 | 9 + 2 | 10 +(10 rows) + +select dp.localoid::regclass::name as name, oc.opcname + from gp_distribution_policy dp + join pg_opclass oc + on oc.oid::text = dp.distclass::text + where dp.localoid = 'legacy_data_reorg'::regclass::oid; + name | opcname +-------------------+------------------ + legacy_data_reorg | cdbhash_int4_ops +(1 row) + +-- +-- A regression issue similar to previous one, with CTAS. +-- +-- The default opclasses in CTAS should also be determined with +-- gp_use_legacy_hashops. +-- +set gp_use_legacy_hashops=off; +create table ctastest_off as select 123 as col distributed by (col); +set gp_use_legacy_hashops=on; +create table ctastest_on as select 123 as col distributed by (col); +select dp.localoid::regclass::name as name, oc.opcname + from gp_distribution_policy dp + join pg_opclass oc + on oc.oid::text = dp.distclass::text + where dp.localoid in ('ctastest_on'::regclass::oid, + 'ctastest_off'::regclass::oid); + name | opcname +--------------+------------------ + ctastest_off | int4_ops + ctastest_on | cdbhash_int4_ops +(2 rows) + diff --git a/src/test/regress/expected/gpdist_legacy_opclasses_optimizer.out b/src/test/regress/expected/gpdist_legacy_opclasses_optimizer.out index b9155a532985..16e9cdcca355 100644 --- a/src/test/regress/expected/gpdist_legacy_opclasses_optimizer.out +++ b/src/test/regress/expected/gpdist_legacy_opclasses_optimizer.out @@ -1,6 +1,9 @@ -- -- Tests for legacy cdbhash opclasses -- +drop schema if exists gpdist_legacy_opclasses; +create schema gpdist_legacy_opclasses; +set search_path to gpdist_legacy_opclasses; -- Basic sanity check of all the legacy hash opclasses. Create a table that -- uses all of them in the distribution key, and insert a value. set gp_use_legacy_hashops=on; @@ -286,3 +289,96 @@ select * from legacy_enum a inner join legacy_enum b on a.color = b.color; green | green (3 rows) +-- +-- A regression issue that the data is reorganized incorrectly when +-- gp_use_legacy_hashops has non-default value. +-- +-- The ALTER TABLE command reorganizes the data by using a temporary table, if +-- a "distributed by" clause is specified without the opclasses, the default +-- opclasses will be chosen. There was a bug that the non-legacy opclasses are +-- always chosen, regarding the setting of gp_use_legacy_hashops. 
However the +-- table's new opclasses are determined with gp_use_legacy_hashops, so when +-- gp_use_legacy_hashops is true the data will be incorrectly redistributed. +-- +-- set the guc to the non-default value +set gp_use_legacy_hashops to on; +create table legacy_data_reorg (c1 int) distributed by (c1); +insert into legacy_data_reorg select i from generate_series(1, 10) i; +-- verify the opclass and data distribution +select gp_segment_id, c1 from legacy_data_reorg order by 1, 2; + gp_segment_id | c1 +---------------+---- + 0 | 1 + 0 | 2 + 1 | 3 + 1 | 4 + 1 | 5 + 1 | 6 + 1 | 7 + 2 | 8 + 2 | 9 + 2 | 10 +(10 rows) + +select dp.localoid::regclass::name as name, oc.opcname + from gp_distribution_policy dp + join pg_opclass oc + on oc.oid::text = dp.distclass::text + where dp.localoid = 'legacy_data_reorg'::regclass::oid; + name | opcname +-------------------+------------------ + legacy_data_reorg | cdbhash_int4_ops +(1 row) + +-- when reorganizing the table we set the distributed-by without an explicit +-- opclass, so the default one should be chosen according to +-- gp_use_legacy_hashops. +alter table legacy_data_reorg set with (reorganize) distributed by (c1); +-- double-check the opclass and data distribution +select gp_segment_id, c1 from legacy_data_reorg order by 1, 2; + gp_segment_id | c1 +---------------+---- + 0 | 1 + 0 | 2 + 1 | 3 + 1 | 4 + 1 | 5 + 1 | 6 + 1 | 7 + 2 | 8 + 2 | 9 + 2 | 10 +(10 rows) + +select dp.localoid::regclass::name as name, oc.opcname + from gp_distribution_policy dp + join pg_opclass oc + on oc.oid::text = dp.distclass::text + where dp.localoid = 'legacy_data_reorg'::regclass::oid; + name | opcname +-------------------+------------------ + legacy_data_reorg | cdbhash_int4_ops +(1 row) + +-- +-- A regression issue similar to previous one, with CTAS. +-- +-- The default opclasses in CTAS should also be determined with +-- gp_use_legacy_hashops. +-- +set gp_use_legacy_hashops=off; +create table ctastest_off as select 123 as col distributed by (col); +set gp_use_legacy_hashops=on; +create table ctastest_on as select 123 as col distributed by (col); +select dp.localoid::regclass::name as name, oc.opcname + from gp_distribution_policy dp + join pg_opclass oc + on oc.oid::text = dp.distclass::text + where dp.localoid in ('ctastest_on'::regclass::oid, + 'ctastest_off'::regclass::oid); + name | opcname +--------------+------------------ + ctastest_off | int4_ops + ctastest_on | cdbhash_int4_ops +(2 rows) + diff --git a/src/test/regress/sql/gpdist_legacy_opclasses.sql b/src/test/regress/sql/gpdist_legacy_opclasses.sql index 793cfdc898ce..fb9309879eb4 100644 --- a/src/test/regress/sql/gpdist_legacy_opclasses.sql +++ b/src/test/regress/sql/gpdist_legacy_opclasses.sql @@ -2,6 +2,10 @@ -- Tests for legacy cdbhash opclasses -- +drop schema if exists gpdist_legacy_opclasses; +create schema gpdist_legacy_opclasses; +set search_path to gpdist_legacy_opclasses; + -- Basic sanity check of all the legacy hash opclasses. Create a table that -- uses all of them in the distribution key, and insert a value. set gp_use_legacy_hashops=on; @@ -169,3 +173,62 @@ insert into legacy_enum values ('red'), ('green'), ('blue'); explain (costs off) select * from legacy_enum a inner join legacy_enum b on a.color = b.color; select * from legacy_enum a inner join legacy_enum b on a.color = b.color; + +-- +-- A regression issue that the data is reorganized incorrectly when +-- gp_use_legacy_hashops has non-default value. 
+-- +-- The ALTER TABLE command reorganizes the data by using a temporary table, if +-- a "distributed by" clause is specified without the opclasses, the default +-- opclasses will be chosen. There was a bug that the non-legacy opclasses are +-- always chosen, regarding the setting of gp_use_legacy_hashops. However the +-- table's new opclasses are determined with gp_use_legacy_hashops, so when +-- gp_use_legacy_hashops is true the data will be incorrectly redistributed. +-- + +-- set the guc to the non-default value +set gp_use_legacy_hashops to on; + +create table legacy_data_reorg (c1 int) distributed by (c1); +insert into legacy_data_reorg select i from generate_series(1, 10) i; + +-- verify the opclass and data distribution +select gp_segment_id, c1 from legacy_data_reorg order by 1, 2; +select dp.localoid::regclass::name as name, oc.opcname + from gp_distribution_policy dp + join pg_opclass oc + on oc.oid::text = dp.distclass::text + where dp.localoid = 'legacy_data_reorg'::regclass::oid; + +-- when reorganizing the table we set the distributed-by without an explicit +-- opclass, so the default one should be chosen according to +-- gp_use_legacy_hashops. +alter table legacy_data_reorg set with (reorganize) distributed by (c1); + +-- double-check the opclass and data distribution +select gp_segment_id, c1 from legacy_data_reorg order by 1, 2; +select dp.localoid::regclass::name as name, oc.opcname + from gp_distribution_policy dp + join pg_opclass oc + on oc.oid::text = dp.distclass::text + where dp.localoid = 'legacy_data_reorg'::regclass::oid; + +-- +-- A regression issue similar to previous one, with CTAS. +-- +-- The default opclasses in CTAS should also be determined with +-- gp_use_legacy_hashops. +-- + +set gp_use_legacy_hashops=off; +create table ctastest_off as select 123 as col distributed by (col); + +set gp_use_legacy_hashops=on; +create table ctastest_on as select 123 as col distributed by (col); + +select dp.localoid::regclass::name as name, oc.opcname + from gp_distribution_policy dp + join pg_opclass oc + on oc.oid::text = dp.distclass::text + where dp.localoid in ('ctastest_on'::regclass::oid, + 'ctastest_off'::regclass::oid); From a436c5a11a91f27e13f09260439f3bc8ad7f777f Mon Sep 17 00:00:00 2001 From: Ning Yu Date: Fri, 28 Feb 2020 13:26:38 +0800 Subject: [PATCH 058/102] tuplestore: support backward scanning of spill files When tuplestore cannot store all the data in memory it will spill some of the data to temporary files. In gpdb we used to disable the backward scanning from these spill files because we could not determine the tuple type, memtup or heaptup, correctly. The tuple type is stored in the tuple header in the tuple length field as the leading bit. After the tuple there is also a tuple length, called trailing length, for backward scanning. The problem is that the trailing length was checked for the leading bit, which does not contain this bit at all. Fixed by reading the tuple length inside the tuple, so the tuple type can be determined correctly. 
Reviewed-by: Ashwin Agrawal Reviewed-by: Heikki Linnakangas Reviewed-by: Alexandra Wang Reviewed-by: Hubert Zhang (cherry picked from commit 70298b393bac7c931435321a9ec76da297f8d291) --- src/backend/utils/sort/tuplestore.c | 24 +++++++++++++--- src/test/regress/expected/misc_jiras.out | 36 ++++++++++++++++++++++++ src/test/regress/greenplum_schedule | 2 +- src/test/regress/sql/misc_jiras.sql | 34 ++++++++++++++++++++++ 4 files changed, 91 insertions(+), 5 deletions(-) create mode 100644 src/test/regress/expected/misc_jiras.out create mode 100644 src/test/regress/sql/misc_jiras.sql diff --git a/src/backend/utils/sort/tuplestore.c b/src/backend/utils/sort/tuplestore.c index f74cda56f840..7e27ac49e64b 100644 --- a/src/backend/utils/sort/tuplestore.c +++ b/src/backend/utils/sort/tuplestore.c @@ -1054,9 +1054,6 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward, * word. If seek fails, assume we are at start of file. */ - ereport(ERROR, (errmsg("Backward scanning of tuplestores are not supported at this time"))); - return NULL; -#if 0 if (BufFileSeek(state->myfile, readptr->file, -(long) sizeof(unsigned int), SEEK_CUR) != 0) { @@ -1112,7 +1109,6 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward, errmsg("could not seek in tuplestore temporary file: %m"))); tup = READTUP(state, tuplen); return tup; -#endif default: elog(ERROR, "invalid tuplestore state"); return NULL; /* keep compiler quiet */ @@ -1596,6 +1592,26 @@ readtup_heap(Tuplestorestate *state, unsigned int len) void *tup = NULL; uint32 tuplen = 0; + /* + * CDB: in backward mode the passed-in len is the trailing length, it does + * not contain the leading bit as the leading length used in forward mode. + * The leading bit is necessary to determine the tuple type, a memory tuple + * or a heap tuple, so we must re-read the leading length to make this + * decision. + */ + if (state->backward) + { + TSReadPointer *readptr = &state->readptrs[state->activeptr]; + + if (BufFileSeek(state->myfile, readptr->file, + -(long) sizeof(unsigned int), SEEK_CUR) != 0) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not seek in tuplestore temporary file: %m"))); + + len = getlen(state, false); + } + if (is_len_memtuplen(len)) { tuplen = memtuple_size_from_uint32(len); diff --git a/src/test/regress/expected/misc_jiras.out b/src/test/regress/expected/misc_jiras.out new file mode 100644 index 000000000000..0921a9fd5f4a --- /dev/null +++ b/src/test/regress/expected/misc_jiras.out @@ -0,0 +1,36 @@ +drop schema if exists misc_jiras; +create schema misc_jiras; +-- +-- Test backward scanning of tuplestore spill files. +-- +-- When tuplestore cannot store all the data in memory it will spill some of +-- the data to temporary files. In gpdb we used to disable the backward +-- scanning from these spill files because we could not determine the tuple +-- type, memtup or heaptup, correctly. The issue is fixed, the backward +-- scanning should be supported now. +-- +create table misc_jiras.t1 (c1 int, c2 text, c3 smallint) distributed by (c1); +insert into misc_jiras.t1 select i % 13, md5(i::text), i % 3 + from generate_series(1, 20000) i; +-- tuplestore uses work_mem to control the in-memory data size, set a small +-- value to trigger the spilling. +set work_mem to '64kB'; +WARNING: "work_mem": setting is deprecated, and may be removed in a future release. 
+select sum(cc) from ( + select c1 + , c2 + , case when count(c3) = 0 then -1.0 + else cume_dist() over (partition by c1, + case when count(c3) > 0 then 1 else 0 end + order by count(c3), c2) + end as cc + from misc_jiras.t1 + group by 1, 2 +) tt; + sum +--------- + 10006.5 +(1 row) + +reset work_mem; +WARNING: "work_mem": setting is deprecated, and may be removed in a future release. diff --git a/src/test/regress/greenplum_schedule b/src/test/regress/greenplum_schedule index 1e49f8e791e7..f8403e0743ef 100755 --- a/src/test/regress/greenplum_schedule +++ b/src/test/regress/greenplum_schedule @@ -41,7 +41,7 @@ test: leastsquares opr_sanity_gp decode_expr bitmapscan bitmapscan_ao case_gp li # below test(s) inject faults so each of them need to be in a separate group test: gpcopy -test: filter gpctas gpdist gpdist_opclasses gpdist_legacy_opclasses matrix toast sublink table_functions olap_setup complex opclass_ddl information_schema guc_env_var gp_explain distributed_transactions explain_format +test: filter gpctas gpdist gpdist_opclasses gpdist_legacy_opclasses matrix toast sublink table_functions olap_setup complex opclass_ddl information_schema guc_env_var gp_explain distributed_transactions explain_format misc_jiras test: guc_gp # namespace_gp test will show diff if concurrent tests use temporary tables. # So run it separately. diff --git a/src/test/regress/sql/misc_jiras.sql b/src/test/regress/sql/misc_jiras.sql new file mode 100644 index 000000000000..d4de7418a84a --- /dev/null +++ b/src/test/regress/sql/misc_jiras.sql @@ -0,0 +1,34 @@ +drop schema if exists misc_jiras; +create schema misc_jiras; + +-- +-- Test backward scanning of tuplestore spill files. +-- +-- When tuplestore cannot store all the data in memory it will spill some of +-- the data to temporary files. In gpdb we used to disable the backward +-- scanning from these spill files because we could not determine the tuple +-- type, memtup or heaptup, correctly. The issue is fixed, the backward +-- scanning should be supported now. +-- + +create table misc_jiras.t1 (c1 int, c2 text, c3 smallint) distributed by (c1); +insert into misc_jiras.t1 select i % 13, md5(i::text), i % 3 + from generate_series(1, 20000) i; + +-- tuplestore uses work_mem to control the in-memory data size, set a small +-- value to trigger the spilling. +set work_mem to '64kB'; + +select sum(cc) from ( + select c1 + , c2 + , case when count(c3) = 0 then -1.0 + else cume_dist() over (partition by c1, + case when count(c3) > 0 then 1 else 0 end + order by count(c3), c2) + end as cc + from misc_jiras.t1 + group by 1, 2 +) tt; + +reset work_mem; From 655eaa049f3354fd8005f33b75479e7137362bf8 Mon Sep 17 00:00:00 2001 From: Mel Kiyama Date: Fri, 28 Feb 2020 16:04:11 -0800 Subject: [PATCH 059/102] docs - fixes for gpmovemirrors, gpaddmirrors (#9646) * docs - fixes for gpmovemirrors, gpaddmirrors -fix incorrect syntax -fix examples * docs - fix typo. --- .../dita/utility_guide/ref/gpaddmirrors.xml | 45 +++++++++---------- .../dita/utility_guide/ref/gpmovemirrors.xml | 16 +++---- 2 files changed, 30 insertions(+), 31 deletions(-) diff --git a/gpdb-doc/dita/utility_guide/ref/gpaddmirrors.xml b/gpdb-doc/dita/utility_guide/ref/gpaddmirrors.xml index da65b0803ce6..5e5099628fb9 100644 --- a/gpdb-doc/dita/utility_guide/ref/gpaddmirrors.xml +++ b/gpdb-doc/dita/utility_guide/ref/gpaddmirrors.xml @@ -2,7 +2,7 @@ - + gpaddmirrors

      Adds mirror segments to a Greenplum Database system that was initially configured without @@ -53,23 +53,23 @@ Enter mirror segment data directory location 2 of 2 > /gpdb/m2 detailed configuration file using the -i option. This is useful if you want your mirror segments on a completely different set of hosts than your primary segments. The format of the mirror configuration file is:

      - <row_id>=<contentID>|<address>|<port>|<data_dir> -

      Where row_id is the row in the file, contentID is - the segment instance content ID, address is the host name or IP - address of the segment host, port is the communication port, and - data_dir is the segment instance data directory.

      + <contentID>|<address>|<port>|<data_dir> +

      Where <contentID> is the segment instance content ID, + <address> is the host name or IP address of the segment + host, <port> is the communication port, and + <data_dir> is the segment instance data directory.

      For - example:0=0|sdw1-1|60000|/gpdata/mir1/gp0 -1=1|sdw1-1|60001|/gpdata/mir2/gp1

      + example:0|sdw1-1|60000|/gpdata/m1/gp0 +1|sdw1-1|60001|/gpdata/m2/gp1

      The gp_segment_configuration system catalog table can help you determine your current primary segment configuration so that you can plan your mirror segment configuration. For example, run the following query:

      =# SELECT dbid, content, address as host_address, port, datadir    FROM gp_segment_configuration    ORDER BY dbid; -
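For illustration, a variation of that query narrows the listing to primary segments only when planning mirror placement; this sketch assumes the standard Greenplum 6 gp_segment_configuration columns, where role is 'p' for primaries and the master has content -1:

    -- list only the primary segment instances (exclude mirrors and the master)
    SELECT content, address AS host_address, port, datadir
      FROM gp_segment_configuration
     WHERE role = 'p' AND content >= 0
     ORDER BY content;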

      If creating your mirrors on alternate mirror hosts, the new mirror segment hosts must - be pre-installed with the Greenplum Database software and configured exactly the - same as the existing primary segment hosts.

      +

      If you are creating mirrors on alternate mirror hosts, the new mirror segment hosts + must be pre-installed with the Greenplum Database software and configured exactly + the same as the existing primary segment hosts.

      You must make sure that the user who runs gpaddmirrors (the gpadmin user) has permissions to write to the data directory locations specified. You may want to create these directories on the segment hosts @@ -110,14 +110,13 @@ Enter mirror segment data directory location 2 of 2 > /gpdb/m2 the system. The format of this file is as follows (as per attributes in the gp_segment_configuration catalog table): - <row_id>=<contentID>|<address>|<port>|<data_dir> + <contentID>|<address>|<port>|<data_dir> -

      Where row_id is the row in the file, - contentID is the segment instance content ID, - address is the host name or IP address of the - segment host, port is the communication port, and - data_dir is the segment instance data +

      Where <contentID> is the segment instance content ID, + <address> is the host name or IP address of the + segment host, <port> is the communication port, and + <data_dir> is the segment instance data directory.

      @@ -153,8 +152,8 @@ Enter mirror segment data directory location 2 of 2 > /gpdb/m2 -p port_offset Optional. This number is used to calculate the database ports used for mirror segments. The default offset is 1000. Mirror port assignments are - calculated as follows: - primary port + offset = mirror database port + calculated as + follows:primary_port + offset = mirror_database_port For example, if a primary segment has port 50001, then its mirror will use a database port of 51001, by default. @@ -191,10 +190,10 @@ Enter mirror segment data directory location 2 of 2 > /gpdb/m2 from your primary data:

      $ gpaddmirrors -i mirror_config_file

      Where mirror_config_file looks something like this:

      - 0=0|sdw1-1|52001|/gpdata/mir1/gp0 -1=1|sdw1-2|52002|/gpdata/mir2/gp1 -2=2|sdw2-1|52001|/gpdata/mir1/gp2 -3=3|sdw2-2|52002|/gpdata/mir2/gp3 + 0|sdw1-1|52001|/gpdata/m1/gp0 +1|sdw1-2|52002|/gpdata/m2/gp1 +2|sdw2-1|52001|/gpdata/m1/gp2 +3|sdw2-2|52002|/gpdata/m2/gp3

      Generate a sample mirror configuration file with the -o option to use with gpaddmirrors -i:

      $ gpaddmirrors -o /home/gpadmin/sample_mirror_config diff --git a/gpdb-doc/dita/utility_guide/ref/gpmovemirrors.xml b/gpdb-doc/dita/utility_guide/ref/gpmovemirrors.xml index a7ddcf2a4e3c..b8d13fa7458b 100644 --- a/gpdb-doc/dita/utility_guide/ref/gpmovemirrors.xml +++ b/gpdb-doc/dita/utility_guide/ref/gpmovemirrors.xml @@ -51,12 +51,12 @@ line inside the configuration file has the following format (as per attributes in the gp_segment_configuration catalog table): - contentID|address|port|data_dir new_address|port|data_dir + <old_address>|<port>|<data_dir> <new_address>|<port>|<data_dir> - Where contentID is the segment instance content ID, - address is the host name or IP address of the segment host, - port is the communication port, and data_dir is - the segment instance data directory. + Where <old_address> and <new_address> are the + host names or IP addresses of the segment hosts, <port> is the + communication port, and <data_dir> is the segment instance data + directory. -l logfile_directory @@ -81,9 +81,9 @@

      Moves mirrors from an existing Greenplum Database system to a different set of hosts:

      $ gpmovemirrors -i move_config_file

      Where the move_config_file looks something like this:

- 1|sdw2|50001|/data2/mirror/gpseg1 sdw3|50001|/data/mirror/gpseg1
-2|sdw2|50001|/data2/mirror/gpseg2 sdw4|50001|/data/mirror/gpseg2
-3|sdw3|50001|/data2/mirror/gpseg3 sdw1|50001|/data/mirror/gpseg3
+ sdw2|50000|/data2/mirror/gpseg0 sdw3|50000|/data/mirror/gpseg0
+sdw2|50001|/data2/mirror/gpseg1 sdw4|50001|/data/mirror/gpseg1
+sdw3|50002|/data2/mirror/gpseg2 sdw1|50002|/data/mirror/gpseg2
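Since the configuration file format follows the gp_segment_configuration catalog, the current mirror locations can be listed with a query along the lines of the sketch below (a hypothetical helper, not part of gpaddmirrors or gpmovemirrors); each row gives the <old_address>|<port>|<data_dir> half of a move_config_file line, assuming a Greenplum 6 catalog where gp_segment_configuration carries the datadir column:
```
-- Sketch: list existing mirror segments as <address>|<port>|<data_dir>,
-- one per line, to use as the "old location" half of each entry.
SELECT address || '|' || port || '|' || datadir
FROM   gp_segment_configuration
WHERE  role = 'm'
ORDER  BY content;
```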
From 1bb52ea807dbbd2f457009a08a710d0c38feca93 Mon Sep 17 00:00:00 2001 From: "Huiliang.liu" Date: Fri, 28 Feb 2020 17:13:33 +0800 Subject: [PATCH 060/102] Add max_retries flag for gpload (#9606)
Add the max_retries flag to gpload. It sets the maximum number of retries when a connection to GPDB times out. The default value of max_retries is 0, which means no retry. If max_retries is -1 or another negative value, gpload retries forever. Testing was done manually. (cherry picked from master commit: b891b85ba075c0be22c6215f82923e6d950c062c)
--- gpMgmt/bin/gpload.py | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)
diff --git a/gpMgmt/bin/gpload.py b/gpMgmt/bin/gpload.py index 7f18cd01aa1b..e88e1fc6a75c 100755 --- a/gpMgmt/bin/gpload.py +++ b/gpMgmt/bin/gpload.py
@@ -18,6 +18,7 @@ -l logfile: log output to logfile --no_auto_trans: do not wrap gpload in transaction --gpfdist_timeout timeout: gpfdist timeout value + --max_retries retry_times: max retry times on gpdb connection timed out. 0 means disabled, -1 means forever --version: print version number and exit -?: help '''
@@ -1157,6 +1158,7 @@ def __init__(self,argv): self.startTimestamp = time.time() self.error_table = False self.gpdb_version = "" + self.options.max_retries = 0 seenv = False seenq = False
@@ -1217,6 +1219,9 @@ def __init__(self,argv): elif argv[0]=='-f': configFilename = argv[1] argv = argv[2:] + elif argv[0]=='--max_retries': + self.options.max_retries = int(argv[1]) + argv = argv[2:] elif argv[0]=='--no_auto_trans': self.options.no_auto_trans = True argv = argv[1:]
@@ -1841,6 +1846,18 @@ def setup_connection(self, recurse = 0): if recurse > 10: self.log(self.ERROR, "too many login attempt failures") self.setup_connection(recurse) + elif errorMessage.find("Connection timed out") != -1 and self.options.max_retries != 0: + recurse += 1 + if self.options.max_retries > 0: + if recurse > self.options.max_retries: # retry failed + self.log(self.ERROR, "could not connect to database after retry %d times, " \ + "error message:\n %s" % (recurse-1, errorMessage)) + else: + self.log(self.INFO, "retry to connect to database, %d of %d times" % (recurse, + self.options.max_retries)) + else: # max_retries < 0, retry forever + self.log(self.INFO, "retry to connect to database.") + self.setup_connection(recurse) else: self.log(self.ERROR, "could not connect to database: %s. Is " \ "the Greenplum Database running on port %i?" % (errorMessage,
From 4507c6f8eefd14a6b08ae2def95b4ed312f7ffc5 Mon Sep 17 00:00:00 2001 From: xiong-gang Date: Mon, 2 Mar 2020 16:19:59 +0800 Subject: [PATCH 061/102] Include dtx bits in HEAP2_XACT_MASK
The TRUNCATE TABLE command updates the pg_class entry, and the new tuple is a copy of the old one, so we need to take care of the header information introduced by Greenplum.
--- src/include/access/htup_details.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/src/include/access/htup_details.h b/src/include/access/htup_details.h index fae76ed94aae..448521fc780e 100644 --- a/src/include/access/htup_details.h +++ b/src/include/access/htup_details.h
@@ -262,7 +262,10 @@ struct HeapTupleHeaderData #define HEAP_HOT_UPDATED 0x4000 /* tuple was HOT-updated */ #define HEAP_ONLY_TUPLE 0x8000 /* this is heap-only tuple */ -#define HEAP2_XACT_MASK 0xE000 /* visibility-related bits */ +#define HEAP2_XACT_MASK 0xF800 /* visibility-related bits + * GPDB: include HEAP_XMIN_DISTRIBUTED_SNAPSHOT_IGNORE + * and HEAP_XMAX_DISTRIBUTED_SNAPSHOT_IGNORE + */ /* * HEAP_TUPLE_HAS_MATCH is a temporary flag used during hash joins.
It is From 6e1342d2ccfaf150af375ccfe3077ea49f5f57b7 Mon Sep 17 00:00:00 2001 From: David Yozie Date: Mon, 2 Mar 2020 14:21:22 -0800 Subject: [PATCH 062/102] Docs - update for 6.5 versioning --- gpdb-doc/book/config.yml | 20 ++-- .../source/subnavs/cloud-subnav.erb | 4 +- .../source/subnavs/gpdb-landing-subnav.erb | 18 +-- .../source/subnavs/pxf-subnav.erb | 108 +++++++++--------- gpdb-doc/book/redirects.rb | 8 +- gpdb-doc/dita/admin_guide/admin_guide.ditamap | 8 +- .../best_practices/best-practices.ditamap | 8 +- gpdb-doc/dita/gpdb-webhelp.ditamap | 4 +- .../dita/install_guide/install_guide.ditamap | 4 +- gpdb-doc/dita/ref_guide/ref_guide.ditamap | 8 +- .../security-guide/security-guide.ditamap | 4 +- .../dita/utility_guide/utility_guide.ditamap | 8 +- 12 files changed, 101 insertions(+), 101 deletions(-) diff --git a/gpdb-doc/book/config.yml b/gpdb-doc/book/config.yml index 66637d999909..351f8c556483 100644 --- a/gpdb-doc/book/config.yml +++ b/gpdb-doc/book/config.yml @@ -6,19 +6,19 @@ sections: - repository: name: markdown at_path: common - directory: 6-4/common + directory: 6-5/common subnav_template: gpdb-landing-subnav - repository: name: markdown at_path: pxf - directory: 6-4/pxf + directory: 6-5/pxf subnav_template: pxf-subnav - repository: name: markdown at_path: cloud - directory: 6-4/cloud + directory: 6-5/cloud subnav_template: cloud-subnav @@ -26,7 +26,7 @@ dita_sections: - repository: name: dita at_path: install_guide - directory: 6-4/install_guide + directory: 6-5/install_guide ditamap_location: install_guide.ditamap ditaval_location: ../gpdb-oss-webhelp.ditaval @@ -34,7 +34,7 @@ dita_sections: - repository: name: dita at_path: analytics - directory: 6-4/analytics + directory: 6-5/analytics ditamap_location: analytics.ditamap ditaval_location: ../gpdb-oss-webhelp.ditaval @@ -42,35 +42,35 @@ dita_sections: - repository: name: dita at_path: admin_guide - directory: 6-4/admin_guide + directory: 6-5/admin_guide ditamap_location: admin_guide.ditamap ditaval_location: ../gpdb-oss-webhelp.ditaval - repository: name: dita at_path: security-guide - directory: 6-4/security-guide + directory: 6-5/security-guide ditamap_location: security-guide.ditamap ditaval_location: ../gpdb-oss-webhelp.ditaval - repository: name: dita at_path: best_practices - directory: 6-4/best_practices + directory: 6-5/best_practices ditamap_location: best-practices.ditamap ditaval_location: ../gpdb-oss-webhelp.ditaval - repository: name: dita at_path: utility_guide - directory: 6-4/utility_guide + directory: 6-5/utility_guide ditamap_location: utility_guide.ditamap ditaval_location: ../gpdb-oss-webhelp.ditaval - repository: name: dita at_path: ref_guide - directory: 6-4/ref_guide + directory: 6-5/ref_guide ditamap_location: ref_guide.ditamap ditaval_location: ../gpdb-oss-webhelp.ditaval diff --git a/gpdb-doc/book/master_middleman/source/subnavs/cloud-subnav.erb b/gpdb-doc/book/master_middleman/source/subnavs/cloud-subnav.erb index 972d56ddab12..a0d782e53abe 100644 --- a/gpdb-doc/book/master_middleman/source/subnavs/cloud-subnav.erb +++ b/gpdb-doc/book/master_middleman/source/subnavs/cloud-subnav.erb @@ -3,10 +3,10 @@ diff --git a/gpdb-doc/book/master_middleman/source/subnavs/gpdb-landing-subnav.erb b/gpdb-doc/book/master_middleman/source/subnavs/gpdb-landing-subnav.erb index f00f749ed9e3..807aeaca2602 100644 --- a/gpdb-doc/book/master_middleman/source/subnavs/gpdb-landing-subnav.erb +++ b/gpdb-doc/book/master_middleman/source/subnavs/gpdb-landing-subnav.erb @@ -3,32 +3,32 @@ diff --git 
a/gpdb-doc/book/master_middleman/source/subnavs/pxf-subnav.erb b/gpdb-doc/book/master_middleman/source/subnavs/pxf-subnav.erb index 85dee6aa8b7b..33ce9fd9c852 100644 --- a/gpdb-doc/book/master_middleman/source/subnavs/pxf-subnav.erb +++ b/gpdb-doc/book/master_middleman/source/subnavs/pxf-subnav.erb @@ -3,114 +3,114 @@ diff --git a/gpdb-doc/book/redirects.rb b/gpdb-doc/book/redirects.rb index a88f8df2f326..acec9fd5cf40 100644 --- a/gpdb-doc/book/redirects.rb +++ b/gpdb-doc/book/redirects.rb @@ -1,4 +1,4 @@ -r301 '/', '/6-4/common/gpdb-features.html' -r301 '/index.html', '/6-4/common/gpdb-features.html' -r301 '/6-4/index.html', '/6-4/common/gpdb-features.html' -r301 %r{(.*)/homenav.html}, '/6-4/common/gpdb-features.html' \ No newline at end of file +r301 '/', '/6-5/common/gpdb-features.html' +r301 '/index.html', '/6-5/common/gpdb-features.html' +r301 '/6-5/index.html', '/6-5/common/gpdb-features.html' +r301 %r{(.*)/homenav.html}, '/6-5/common/gpdb-features.html' \ No newline at end of file diff --git a/gpdb-doc/dita/admin_guide/admin_guide.ditamap b/gpdb-doc/dita/admin_guide/admin_guide.ditamap index 5529e9793dfa..0c92207bc4c8 100644 --- a/gpdb-doc/dita/admin_guide/admin_guide.ditamap +++ b/gpdb-doc/dita/admin_guide/admin_guide.ditamap @@ -1,10 +1,10 @@ - - + + diff --git a/gpdb-doc/dita/best_practices/best-practices.ditamap b/gpdb-doc/dita/best_practices/best-practices.ditamap index 20b8a607b51e..77d758d99199 100644 --- a/gpdb-doc/dita/best_practices/best-practices.ditamap +++ b/gpdb-doc/dita/best_practices/best-practices.ditamap @@ -2,10 +2,10 @@ Greenplum Database Best Practices - - + + diff --git a/gpdb-doc/dita/gpdb-webhelp.ditamap b/gpdb-doc/dita/gpdb-webhelp.ditamap index 1af349ce6a50..dd4019f37298 100644 --- a/gpdb-doc/dita/gpdb-webhelp.ditamap +++ b/gpdb-doc/dita/gpdb-webhelp.ditamap @@ -2,8 +2,8 @@ Greenplum Database - + diff --git a/gpdb-doc/dita/install_guide/install_guide.ditamap b/gpdb-doc/dita/install_guide/install_guide.ditamap index bbd46c7aa36e..bd97764c2192 100644 --- a/gpdb-doc/dita/install_guide/install_guide.ditamap +++ b/gpdb-doc/dita/install_guide/install_guide.ditamap @@ -1,8 +1,8 @@ - + diff --git a/gpdb-doc/dita/ref_guide/ref_guide.ditamap b/gpdb-doc/dita/ref_guide/ref_guide.ditamap index 6a8278a14ca8..b1804eb1055a 100644 --- a/gpdb-doc/dita/ref_guide/ref_guide.ditamap +++ b/gpdb-doc/dita/ref_guide/ref_guide.ditamap @@ -1,10 +1,10 @@ - - + + diff --git a/gpdb-doc/dita/security-guide/security-guide.ditamap b/gpdb-doc/dita/security-guide/security-guide.ditamap index 03046c731792..5e7a31daaffc 100644 --- a/gpdb-doc/dita/security-guide/security-guide.ditamap +++ b/gpdb-doc/dita/security-guide/security-guide.ditamap @@ -1,9 +1,9 @@ Security Configuration Guide - - diff --git a/gpdb-doc/dita/utility_guide/utility_guide.ditamap b/gpdb-doc/dita/utility_guide/utility_guide.ditamap index bed878c79db6..47b4aaf1b03c 100644 --- a/gpdb-doc/dita/utility_guide/utility_guide.ditamap +++ b/gpdb-doc/dita/utility_guide/utility_guide.ditamap @@ -1,10 +1,10 @@ - - + + From b583950a855c6d1fa37a79600233c6b65df648a8 Mon Sep 17 00:00:00 2001 From: Zhenghua Lyu Date: Tue, 3 Mar 2020 10:40:37 +0800 Subject: [PATCH 063/102] Fix plan when join lateral inner plan contains limit clause. Fix plan when join lateral inner plan contains limit clause. Previously, when join lateral inner plan contains limit clause and the exec params are in targetlist of the query, for the inner plan it may put a gather motion and then do limit. This is not correct since it leads to passing params across motion nodes. 
A typical case is shown below: ``` create table t1(a int, b int, c int) distributed by (a); create table t2(a int, b int, c int) distributed by (a); explain verbose select * from t1 join lateral (select t1.b + t2.a from t2 limit 1)x on true; QUERY PLAN -------------------------------------------------------- Nested Loop Output: t1.a, t1.b, t1.c, ((t1.b + t2.a)) -> Gather Motion 3:1 Output: t1.a, t1.b, t1.c -> Seq Scan on public.t1 Output: t1.a, t1.b, t1.c -> Materialize Output: ((t1.b + t2.a)) -> Limit Output: ((t1.b + t2.a)) -> Gather Motion 3:1 Output: ((t1.b + t2.a)) -> Limit Output: ((t1.b + t2.a)) -> Seq Scan on public.t2 Output: (t1.b + t2.a) ``` The above plan is invalid because NestLoop has to pass params down to the scan of t2. Greenplum does not support this yet. This commit fixes the bug by gathering the table firstly. When the subquery contains outer params and it has limit clause, planner will try this. After this commit, the above plan becomes: ``` explain verbose select * from t1 join lateral (select t1.b + t2.a from t2 limit 1)x on true; QUERY PLAN ------------------------------------------------------------ Nested Loop Output: t1.a, t1.b, t1.c, ((t1.b + t2.a)) -> Materialize Output: t1.a, t1.b, t1.c -> Gather Motion 3:1 Output: t1.a, t1.b, t1.c -> Seq Scan on public.t1 Output: t1.a, t1.b, t1.c -> Materialize Output: ((t1.b + t2.a)) -> Limit Output: ((t1.b + t2.a)) -> Result Output: (t1.b + t2.a) -> Materialize Output: t2.a -> Gather Motion 3:1 Output: t2.a -> Seq Scan on public.t2 Output: t2.a ``` --- src/backend/cdb/cdbmutate.c | 39 ++++++ src/backend/nodes/copyfuncs.c | 1 + src/backend/nodes/outfuncs.c | 17 +++ src/backend/optimizer/path/allpaths.c | 90 ++++++++++++++ src/backend/optimizer/plan/createplan.c | 112 +++++++++++++++++- src/backend/optimizer/plan/initsplan.c | 74 +++++++++++- src/backend/optimizer/plan/planmain.c | 2 + src/backend/optimizer/util/pathnode.c | 77 ++++++++++++ src/include/cdb/cdbmutate.h | 2 +- src/include/nodes/nodes.h | 1 + src/include/nodes/plannerconfig.h | 1 + src/include/nodes/relation.h | 38 ++++++ src/include/optimizer/pathnode.h | 5 + src/include/optimizer/planmain.h | 2 + src/test/regress/expected/join_gp.out | 83 +++++++++++++ .../regress/expected/join_gp_optimizer.out | 83 +++++++++++++ src/test/regress/sql/join_gp.sql | 44 +++++++ 17 files changed, 663 insertions(+), 8 deletions(-) diff --git a/src/backend/cdb/cdbmutate.c b/src/backend/cdb/cdbmutate.c index 662ec9cfb0aa..7be96aec1d03 100644 --- a/src/backend/cdb/cdbmutate.c +++ b/src/backend/cdb/cdbmutate.c @@ -3577,3 +3577,42 @@ sri_optimize_for_result(PlannerInfo *root, Plan *plan, RangeTblEntry *rte, } } } + +/* + * Does the given expression contain Params that are passed down from + * outer query? + */ +bool +contains_outer_params(Node *node, void *context) +{ + PlannerInfo *root = (PlannerInfo *) context; + + if (node == NULL) + return false; + if (IsA(node, Param)) + { + Param *param = (Param *) node; + + if (param->paramkind == PARAM_EXEC) + { + /* Does this Param refer to a value that an outer query provides? 
*/ + PlannerInfo *parent = root->parent_root; + + while (parent) + { + ListCell *lc; + + foreach (lc, parent->plan_params) + { + PlannerParamItem *ppi = (PlannerParamItem *) lfirst(lc); + + if (ppi->paramId == param->paramid) + return true; /* abort the tree traversal and return true */ + } + + parent = parent->parent_root; + } + } + } + return expression_tree_walker(node, contains_outer_params, context); +} diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index b2e1346bcaac..2d7801cd0d90 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -2392,6 +2392,7 @@ _copyRestrictInfo(const RestrictInfo *from) COPY_SCALAR_FIELD(outerjoin_delayed); COPY_SCALAR_FIELD(can_join); COPY_SCALAR_FIELD(pseudoconstant); + COPY_SCALAR_FIELD(contain_outer_query_references); COPY_BITMAPSET_FIELD(clause_relids); COPY_BITMAPSET_FIELD(required_relids); COPY_BITMAPSET_FIELD(outer_relids); diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index 4bad87cb2f35..afc61bf50fd3 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -2205,6 +2205,19 @@ _outHashPath(StringInfo str, const HashPath *node) WRITE_INT_FIELD(num_batches); } +#ifndef COMPILING_BINARY_FUNCS +static void +_outProjectionPath(StringInfo str, const ProjectionPath *node) +{ + WRITE_NODE_TYPE("PROJECTIONPATH"); + + _outPathInfo(str, (const Path *) node); + + WRITE_NODE_FIELD(subpath); + WRITE_BOOL_FIELD(dummypp); +} +#endif /* COMPILING_BINARY_FUNCS */ + static void _outCdbMotionPath(StringInfo str, const CdbMotionPath *node) { @@ -2436,6 +2449,7 @@ _outRestrictInfo(StringInfo str, const RestrictInfo *node) WRITE_BOOL_FIELD(outerjoin_delayed); WRITE_BOOL_FIELD(can_join); WRITE_BOOL_FIELD(pseudoconstant); + WRITE_BOOL_FIELD(contain_outer_query_references); WRITE_BITMAPSET_FIELD(clause_relids); WRITE_BITMAPSET_FIELD(required_relids); WRITE_BITMAPSET_FIELD(outer_relids); @@ -5051,6 +5065,9 @@ _outNode(StringInfo str, const void *obj) case T_HashPath: _outHashPath(str, obj); break; + case T_ProjectionPath: + _outProjectionPath(str, obj); + break; case T_CdbMotionPath: _outCdbMotionPath(str, obj); break; diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index af27d02ee221..a48456714c2c 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -120,6 +120,7 @@ static void subquery_push_qual(Query *subquery, RangeTblEntry *rte, Index rti, Node *qual); static void recurse_push_qual(Node *setOp, Query *topquery, RangeTblEntry *rte, Index rti, Node *qual); +static void bring_to_singleQE(PlannerInfo *root, RelOptInfo *rel, List *outer_quals); /* @@ -368,6 +369,73 @@ set_rel_size(PlannerInfo *root, RelOptInfo *rel, Assert(rel->rows > 0 || IS_DUMMY_REL(rel)); } +/* + * Decorate the Paths of 'rel' with Motions to bring the relation's + * result to SingleQE locus. 
The final plan will look something like + * this: + * + * Result (with quals from 'outer_quals') + * \ + * \_Material + * \ + * \_ Gather + * \ + * \_SeqScan (with quals from 'baserestrictinfo') + */ +static void +bring_to_singleQE(PlannerInfo *root, RelOptInfo *rel, List *outer_quals) +{ + List *origpathlist; + ListCell *lc; + + origpathlist = rel->pathlist; + rel->cheapest_startup_path = NULL; + rel->cheapest_total_path = NULL; + rel->cheapest_unique_path = NULL; + rel->cheapest_parameterized_paths = NIL; + rel->pathlist = NIL; + + foreach(lc, origpathlist) + { + Path *origpath = (Path *) lfirst(lc); + Path *path; + CdbPathLocus target_locus; + + if (CdbPathLocus_IsGeneral(origpath->locus) || + CdbPathLocus_IsSingleQE(origpath->locus)) + path = origpath; + else + { + /* + * Cannot pass a param through motion, so if this is a parameterized + * path, we can't use it. + */ + if (origpath->param_info) + continue; + + CdbPathLocus_MakeSingleQE(&target_locus, + origpath->locus.numsegments); + + path = cdbpath_create_motion_path(root, + origpath, + NIL, // DESTROY pathkeys + false, + target_locus); + + path = (Path *) create_material_path(root, rel, path); + + if (outer_quals) + path = (Path *) create_projection_path_with_quals(root, + rel, + path, + outer_quals); + } + + add_path(rel, path); + } + set_cheapest(rel); +} + /* * set_rel_pathlist * Build access paths for a base relation @@ -424,6 +492,18 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, } } + if (root->config->force_singleQE) + { + /* + * CDB: we cannot pass parameters across motion, + * if this is the inner plan of a lateral join and + * it contains limit clause, we will reach here. + * Planner will gather all the data into singleQE + * and materialize it. + */ + bring_to_singleQE(root, rel, rel->upperrestrictinfo); + } + #ifdef OPTIMIZER_DEBUG debug_print_rel(root, rel); #endif @@ -1371,6 +1451,16 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel, /* Generate the plan for the subquery */ config = CopyPlannerConfig(root->config); config->honor_order_by = false; /* partial order is enough */ + /* + * CDB: if this subquery is the inner plan of a lateral + * join and if it contains a limit, we can only gather + * it to singleQE and materialize the data because we + * cannot pass params across motion. 
+ */ + config->force_singleQE = false; + if ((!bms_is_empty(required_outer)) && + (subquery->limitCount || subquery->limitOffset)) + config->force_singleQE = true; rel->subplan = subquery_planner(root->glob, subquery, root, diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index f98d44765b63..cc9a4eb9e063 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -76,6 +76,7 @@ static Plan *create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_p static Result *create_result_plan(PlannerInfo *root, ResultPath *best_path); static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path); static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path); +static Plan *create_projection_plan(PlannerInfo *root, ProjectionPath *best_path); static Plan *create_motion_plan(PlannerInfo *root, CdbMotionPath *path); static SeqScan *create_seqscan_plan(PlannerInfo *root, Path *best_path, List *tlist, List *scan_clauses); @@ -309,8 +310,16 @@ create_plan_recurse(PlannerInfo *root, Path *best_path) (MergeAppendPath *) best_path); break; case T_Result: - plan = (Plan *) create_result_plan(root, - (ResultPath *) best_path); + if (IsA(best_path, ProjectionPath)) + { + plan = create_projection_plan(root, + (ProjectionPath *) best_path); + } + else + { + plan = (Plan *) create_result_plan(root, + (ResultPath *) best_path); + } break; case T_Material: plan = (Plan *) create_material_plan(root, @@ -793,6 +802,16 @@ create_join_plan(PlannerInfo *root, JoinPath *best_path) best_path->outerjoinpath->motionHazard) ((Join *) plan)->prefetch_joinqual = true; + /* CDB: if the join's locus is bottleneck which means the + * join gang only contains one process, so there is no + * risk for motion deadlock. + */ + if (CdbPathLocus_IsBottleneck(best_path->path.locus)) + { + ((Join *) plan)->prefetch_inner = false; + ((Join *) plan)->prefetch_joinqual = false; + } + plan->flow = cdbpathtoplan_create_flow(root, best_path->path.locus, best_path->path.parent ? best_path->path.parent->relids @@ -1244,6 +1263,95 @@ create_unique_plan(PlannerInfo *root, UniquePath *best_path) return plan; } +/* + * create_projection_plan + * + * Create a plan tree to do a projection step and (recursively) plans + * for its subpaths. We may need a Result node for the projection, + * but sometimes we can just let the subplan do the work. + */ +static Plan * +create_projection_plan(PlannerInfo *root, ProjectionPath *best_path) +{ + Plan *plan; + Plan *subplan; + List *tlist; + + /* Since we intend to project, we don't need to constrain child tlist */ + subplan = create_plan_recurse(root, best_path->subpath); + + tlist = build_path_tlist(root, &best_path->path); + + /* + * We might not really need a Result node here, either because the subplan + * can project or because it's returning the right list of expressions + * anyway. Usually create_projection_path will have detected that and set + * dummypp if we don't need a Result; but its decision can't be final, + * because some createplan.c routines change the tlists of their nodes. + * (An example is that create_merge_append_plan might add resjunk sort + * columns to a MergeAppend.) So we have to recheck here. If we do + * arrive at a different answer than create_projection_path did, we'll + * have made slightly wrong cost estimates; but label the plan with the + * cost estimates we actually used, not "corrected" ones. 
(XXX this could + * be cleaned up if we moved more of the sortcolumn setup logic into Path + * creation, but that would add expense to creating Paths we might end up + * not using.) + */ + if (!best_path->cdb_restrict_clauses && + (is_projection_capable_plan(subplan) || + tlist_same_exprs(tlist, subplan->targetlist))) + { + /* Don't need a separate Result, just assign tlist to subplan */ + plan = subplan; + plan->targetlist = tlist; + + /* Label plan with the estimated costs we actually used */ + plan->startup_cost = best_path->path.startup_cost; + plan->total_cost = best_path->path.total_cost; + plan->plan_rows = best_path->path.rows; + plan->plan_width = subplan->plan_width; + /* ... but be careful not to munge subplan's parallel-aware flag */ + } + else + { + List *scan_clauses = NIL; + List *pseudoconstants = NIL; + + if (best_path->cdb_restrict_clauses) + { + List *all_clauses = best_path->cdb_restrict_clauses; + + /* Replace any outer-relation variables with nestloop params */ + if (best_path->path.param_info) + { + all_clauses = (List *) + replace_nestloop_params(root, (Node *) all_clauses); + } + + /* Sort clauses into best execution order */ + all_clauses = order_qual_clauses(root, all_clauses); + + /* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */ + scan_clauses = extract_actual_clauses(all_clauses, false); + + /* but we actually also want the pseudoconstants */ + pseudoconstants = extract_actual_clauses(all_clauses, true); + } + + /* We need a Result node */ + plan = (Plan *) make_result(root, tlist, (Node *) pseudoconstants, subplan); + + plan->qual = scan_clauses; + + copy_path_costsize(root, plan, (Path *) best_path); + plan->flow = cdbpathtoplan_create_flow(root, + best_path->path.locus, + best_path->path.parent ? best_path->path.parent->relids : NULL, + plan); + } + + return plan; +} /* * create_motion_plan diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c index e15561079973..27e54e651682 100644 --- a/src/backend/optimizer/plan/initsplan.c +++ b/src/backend/optimizer/plan/initsplan.c @@ -17,6 +17,7 @@ #include "postgres.h" #include "catalog/pg_type.h" +#include "cdb/cdbmutate.h" #include "nodes/nodeFuncs.h" #include "optimizer/clauses.h" #include "optimizer/joininfo.h" @@ -174,8 +175,8 @@ build_base_rel_tlists(PlannerInfo *root, List *final_tlist) * be true before deconstruct_jointree begins, and false after that.) 
*/ void -add_vars_to_targetlist(PlannerInfo *root, List *vars, - Relids where_needed, bool create_new_ph) +add_vars_to_targetlist_x(PlannerInfo *root, List *vars, + Relids where_needed, bool create_new_ph, bool force) { ListCell *temp; @@ -210,7 +211,7 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars, } /* System-defined attribute, whole row, or user-defined attribute */ - if (bms_is_subset(where_needed, rel->relids)) + if (bms_is_subset(where_needed, rel->relids) && !force) continue; Assert(attno >= rel->min_attr && attno <= rel->max_attr); attno -= rel->min_attr; @@ -238,6 +239,12 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars, } } +void +add_vars_to_targetlist(PlannerInfo *root, List *vars, Bitmapset *where_needed, + bool create_new_ph) +{ + add_vars_to_targetlist_x(root, vars, where_needed, create_new_ph, false); +} /***************************************************************************** * @@ -2032,6 +2039,43 @@ check_redundant_nullability_qual(PlannerInfo *root, Node *clause) return false; } +static bool +rel_need_upper(PlannerInfo *root, RelOptInfo *rel) +{ + switch (rel->rtekind) + { + case RTE_RELATION: + return GpPolicyIsPartitioned(rel->cdbpolicy) || + GpPolicyIsReplicated(rel->cdbpolicy); + + case RTE_SUBQUERY: + /* play it safe */ + return true; + + case RTE_FUNCTION: + /* XXX: depends on EXECUTE ON directive */ + return false; + + case RTE_VALUES: + return false; + + case RTE_TABLEFUNCTION: + /* no correlated subqueries are allowed in a tablefunctions. So not sure + * if this can happen */ + return true; + + case RTE_CTE: + /* play it safe */ + return true; + + case RTE_VOID: + case RTE_JOIN: + default: + /* shouldn't happen */ + elog(ERROR, "unexpected RTE kind %d", rel->rtekind); + } +} + /* * distribute_restrictinfo_to_rels * Push a completed RestrictInfo into the proper restriction or join @@ -2048,6 +2092,9 @@ distribute_restrictinfo_to_rels(PlannerInfo *root, Relids relids = restrictinfo->required_relids; RelOptInfo *rel; + if (contains_outer_params((Node *) restrictinfo->clause, root)) + restrictinfo->contain_outer_query_references = true; + switch (bms_membership(relids)) { case BMS_SINGLETON: @@ -2059,8 +2106,25 @@ distribute_restrictinfo_to_rels(PlannerInfo *root, rel = find_base_rel(root, bms_singleton_member(relids)); /* Add clause to rel's restriction list */ - rel->baserestrictinfo = lappend(rel->baserestrictinfo, - restrictinfo); + if (restrictinfo->contain_outer_query_references && + rel_need_upper(root, rel) && + root->config->force_singleQE) + { + List *vars = pull_var_clause((Node *) restrictinfo->clause, + PVC_RECURSE_AGGREGATES, + PVC_RECURSE_PLACEHOLDERS); + + add_vars_to_targetlist_x(root, vars, relids, + false, /* create_new_ph */ + true /* force */); + list_free(vars); + + rel->upperrestrictinfo = lappend(rel->upperrestrictinfo, + restrictinfo); + } + else + rel->baserestrictinfo = lappend(rel->baserestrictinfo, + restrictinfo); break; case BMS_MULTIPLE: diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c index c93431670f60..fe1c7b4aad7c 100644 --- a/src/backend/optimizer/plan/planmain.c +++ b/src/backend/optimizer/plan/planmain.c @@ -378,6 +378,8 @@ PlannerConfig *DefaultPlannerConfig(void) c1->is_under_subplan = false; + c1->force_singleQE = false; + return c1; } diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index 62edfa23c79f..a11f92989ea2 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/backend/optimizer/util/pathnode.c @@ -3608,3 
+3608,80 @@ reparameterize_path(PlannerInfo *root, Path *path, } return NULL; } + +/* + * create_projection_path_with_quals + * Creates a pathnode that represents performing a projection and filter. + * + * 'rel' is the parent relation associated with the result + * 'subpath' is the path representing the source of data + */ +ProjectionPath * +create_projection_path_with_quals(PlannerInfo *root, + RelOptInfo *rel, + Path *subpath, + List *restrict_clauses) +{ + ProjectionPath *pathnode = makeNode(ProjectionPath); + + pathnode->path.pathtype = T_Result; + pathnode->path.parent = rel; + /* For now, assume we are above any joins, so no parameterization */ + pathnode->path.param_info = NULL; + /* Projection does not change the sort order */ + pathnode->path.pathkeys = subpath->pathkeys; + + pathnode->subpath = subpath; + + /* + * We might not need a separate Result node. If the input plan node type + * can project, we can just tell it to project something else. Or, if it + * can't project but the desired target has the same expression list as + * what the input will produce anyway, we can still give it the desired + * tlist (possibly changing its ressortgroupref labels, but nothing else). + * Note: in the latter case, create_projection_plan has to recheck our + * conclusion; see comments therein. + * + * GPDB: The 'restrict_clauses' is a GPDB addition. If the subpath supports + * Filters, we could push them down too. But currently this is only used on + * top of Material paths, which don't support it, so it doesn't matter. + * + * GPDB_96_MERGE_FIXME: Until 9.6, this isn't used in any situation where + * we wouldn't need a Result. And we don't have is_projection_capable_path() + * yet. + */ +#if 0 + if (!restrict_clauses && + (is_projection_capable_path(subpath) || + equal(oldtarget->exprs, target->exprs))) + { + /* No separate Result node needed */ + pathnode->dummypp = true; + + /* + * Set cost of plan as subpath's cost, adjusted for tlist replacement. 
+ */ + pathnode->path.rows = subpath->rows; + pathnode->path.startup_cost = subpath->startup_cost + + (target->cost.startup - oldtarget->cost.startup); + pathnode->path.total_cost = subpath->total_cost + + (target->cost.startup - oldtarget->cost.startup) + + (target->cost.per_tuple - oldtarget->cost.per_tuple) * subpath->rows; + } + else +#endif + { + /* We really do need the Result node */ + pathnode->dummypp = false; + + pathnode->path.rows = subpath->rows; + pathnode->path.startup_cost = subpath->startup_cost; + pathnode->path.total_cost = subpath->total_cost; + + pathnode->cdb_restrict_clauses = restrict_clauses; + } + + pathnode->path.locus = subpath->locus; + + return pathnode; +} diff --git a/src/include/cdb/cdbmutate.h b/src/include/cdb/cdbmutate.h index afd075fb5651..bc3fbdd4d8e6 100644 --- a/src/include/cdb/cdbmutate.h +++ b/src/include/cdb/cdbmutate.h @@ -65,6 +65,6 @@ extern void sri_optimize_for_result(PlannerInfo *root, Plan *plan, RangeTblEntry GpPolicy **targetPolicy, List **hashExprs_p, List **hashOpclasses_p); extern SplitUpdate *make_splitupdate(PlannerInfo *root, ModifyTable *mt, Plan *subplan, RangeTblEntry *rte); - +extern bool contains_outer_params(Node *node, void *context); #endif /* CDBMUTATE_H */ diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h index 892ffca3ac03..1e353fb1618e 100644 --- a/src/include/nodes/nodes.h +++ b/src/include/nodes/nodes.h @@ -311,6 +311,7 @@ typedef enum NodeTag T_ResultPath, T_MaterialPath, T_UniquePath, + T_ProjectionPath, T_EquivalenceClass, T_EquivalenceMember, T_PathKey, diff --git a/src/include/nodes/plannerconfig.h b/src/include/nodes/plannerconfig.h index 479509d72788..710d778fa1ab 100644 --- a/src/include/nodes/plannerconfig.h +++ b/src/include/nodes/plannerconfig.h @@ -48,6 +48,7 @@ typedef struct PlannerConfig /* These ones are tricky */ //GpRoleValue Gp_role; // TODO: this one is tricky + bool force_singleQE; /* True means force gather base rel to singleQE */ } PlannerConfig; extern PlannerConfig *DefaultPlannerConfig(void); diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h index 0c8d42a05780..792a2a18cf3f 100644 --- a/src/include/nodes/relation.h +++ b/src/include/nodes/relation.h @@ -605,6 +605,15 @@ typedef struct RelOptInfo * involving this rel */ bool has_eclass_joins; /* T means joininfo is incomplete */ + /* + * In a subquery, if this base relation contains quals that must + * be evaluated at "outerquery" locus, and the base relation has a + * different locus, they are kept here in 'upperrestrictinfo', instead of + * 'baserestrictinfo'. + */ + List *upperrestrictinfo; /* RestrictInfo structures (if base + * rel) */ + /* used by foreign scan */ ForeignTable *ftEntry; } RelOptInfo; @@ -1353,6 +1362,29 @@ typedef struct HashPath int num_batches; /* number of batches expected */ } HashPath; +/* + * ProjectionPath represents a projection (that is, targetlist computation) + * + * Nominally, this path node represents using a Result plan node to do a + * projection step. However, if the input plan node supports projection, + * we can just modify its output targetlist to do the required calculations + * directly, and not need a Result. In some places in the planner we can just + * jam the desired PathTarget into the input path node (and adjust its cost + * accordingly), so we don't need a ProjectionPath. 
But in other places + * it's necessary to not modify the input path node, so we need a separate + * ProjectionPath node, which is marked dummy to indicate that we intend to + * assign the work to the input plan node. The estimated cost for the + * ProjectionPath node will account for whether a Result will be used or not. + */ +typedef struct ProjectionPath +{ + Path path; + Path *subpath; /* path representing input source */ + bool dummypp; /* true if no separate Result is needed */ + + List *cdb_restrict_clauses; +} ProjectionPath; + /* * Restriction clause info. * @@ -1498,6 +1530,12 @@ typedef struct RestrictInfo bool pseudoconstant; /* see comment above */ + /* + * GPDB: does the clause refer to outer query levels? (Which implies that + * it must be evaluted in the same slice as the parent query) + */ + bool contain_outer_query_references; + /* The set of relids (varnos) actually referenced in the clause: */ Relids clause_relids; diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h index f8f3d7231157..45fe629f7378 100644 --- a/src/include/optimizer/pathnode.h +++ b/src/include/optimizer/pathnode.h @@ -161,6 +161,11 @@ extern Path *reparameterize_path(PlannerInfo *root, Path *path, Relids required_outer, double loop_count); +extern ProjectionPath *create_projection_path_with_quals(PlannerInfo *root, + RelOptInfo *rel, + Path *subpath, + List *restrict_clauses); + /* * prototypes for relnode.c */ diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h index 79643175fd36..b37269d0ff34 100644 --- a/src/include/optimizer/planmain.h +++ b/src/include/optimizer/planmain.h @@ -235,6 +235,8 @@ extern int join_collapse_limit; extern void add_base_rels_to_query(PlannerInfo *root, Node *jtnode); extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist); +extern void add_vars_to_targetlist_x(PlannerInfo *root, List *vars, + Relids where_needed, bool create_new_ph, bool force); extern void add_vars_to_targetlist(PlannerInfo *root, List *vars, Relids where_needed, bool create_new_ph); extern void find_lateral_references(PlannerInfo *root); diff --git a/src/test/regress/expected/join_gp.out b/src/test/regress/expected/join_gp.out index 2cadaebdea9e..8f5a7a1a9d1b 100644 --- a/src/test/regress/expected/join_gp.out +++ b/src/test/regress/expected/join_gp.out @@ -966,3 +966,86 @@ join t_randomly_dist_table on t_subquery_general.a = t_randomly_dist_table.c; reset enable_hashjoin; reset enable_mergejoin; reset enable_nestloop; +-- test lateral join inner plan contains limit +-- we cannot pass params across motion so we +-- can only generate a plan to gather all the +-- data to singleQE. Here we create a compound +-- data type as params to pass into inner plan. +-- By doing so, if we fail to pass correct params +-- into innerplan, it will throw error because +-- of nullpointer reference. If we only use int +-- type as params, the nullpointer reference error +-- may not happen because we parse null to integer 0. 
+create type mytype_for_lateral_test as (x int, y int); +create table t1_lateral_limit(a int, b int, c mytype_for_lateral_test); +create table t2_lateral_limit(a int, b int); +insert into t1_lateral_limit values (1, 1, '(1,1)'); +insert into t1_lateral_limit values (1, 2, '(2,2)'); +insert into t2_lateral_limit values (2, 2); +insert into t2_lateral_limit values (3, 3); +explain select * from t1_lateral_limit as t1 cross join lateral +(select ((c).x+t2.b) as n from t2_lateral_limit as t2 order by n limit 1)s; + QUERY PLAN +---------------------------------------------------------------------------------------------------------------- + Nested Loop (cost=10000000001.05..10000000002.11 rows=4 width=41) + -> Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..1.03 rows=1 width=37) + -> Seq Scan on t1_lateral_limit t1 (cost=0.00..1.01 rows=1 width=37) + -> Materialize (cost=1.05..1.07 rows=1 width=4) + -> Limit (cost=1.05..1.05 rows=1 width=4) + -> Sort (cost=1.05..1.05 rows=1 width=4) + Sort Key: (((t1.c).x + t2.b)) + -> Result (cost=0.00..1.04 rows=1 width=4) + -> Materialize (cost=0.00..1.03 rows=1 width=4) + -> Gather Motion 3:1 (slice2; segments: 3) (cost=0.00..1.03 rows=1 width=4) + -> Seq Scan on t2_lateral_limit t2 (cost=0.00..1.01 rows=1 width=4) + Optimizer: Postgres query optimizer +(12 rows) + +select * from t1_lateral_limit as t1 cross join lateral +(select ((c).x+t2.b) as n from t2_lateral_limit as t2 order by n limit 1)s; + a | b | c | n +---+---+-------+--- + 1 | 1 | (1,1) | 3 + 1 | 2 | (2,2) | 4 +(2 rows) + +-- The following case is from Github Issue +-- https://github.com/greenplum-db/gpdb/issues/8860 +-- It is the same issue as the above test suite. +create table t_mylog_issue_8860 (myid int, log_date timestamptz ); +insert into t_mylog_issue_8860 values (1,timestamptz '2000-01-02 03:04'),(1,timestamptz '2000-01-02 03:04'-'1 hour'::interval); +insert into t_mylog_issue_8860 values (2,timestamptz '2000-01-02 03:04'),(2,timestamptz '2000-01-02 03:04'-'2 hour'::interval); +explain select ml1.myid, log_date as first_date, ml2.next_date from t_mylog_issue_8860 ml1 +inner join lateral +(select myid, log_date as next_date + from t_mylog_issue_8860 where myid = ml1.myid and log_date > ml1.log_date order by log_date asc limit 1) ml2 +on true; + QUERY PLAN +--------------------------------------------------------------------------------------------------------------------------------- + Nested Loop (cost=10000000001.08..10000000002.18 rows=4 width=20) + -> Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..1.06 rows=2 width=12) + -> Seq Scan on t_mylog_issue_8860 ml1 (cost=0.00..1.02 rows=1 width=12) + -> Materialize (cost=1.08..1.10 rows=1 width=8) + -> Subquery Scan on ml2 (cost=1.08..1.09 rows=1 width=8) + -> Limit (cost=1.08..1.08 rows=1 width=12) + -> Sort (cost=1.08..1.08 rows=2 width=12) + Sort Key: t_mylog_issue_8860.log_date + -> Result (cost=0.00..1.07 rows=2 width=12) + Filter: ((t_mylog_issue_8860.log_date > ml1.log_date) AND (t_mylog_issue_8860.myid = ml1.myid)) + -> Materialize (cost=0.00..1.07 rows=1 width=12) + -> Gather Motion 3:1 (slice2; segments: 3) (cost=0.00..1.06 rows=2 width=12) + -> Seq Scan on t_mylog_issue_8860 (cost=0.00..1.02 rows=1 width=12) + Optimizer: Postgres query optimizer +(14 rows) + +select ml1.myid, log_date as first_date, ml2.next_date from t_mylog_issue_8860 ml1 +inner join lateral +(select myid, log_date as next_date + from t_mylog_issue_8860 where myid = ml1.myid and log_date > ml1.log_date order by log_date asc limit 1) ml2 +on true; + 
myid | first_date | next_date +------+------------------------------+------------------------------ + 2 | Sun Jan 02 01:04:00 2000 PST | Sun Jan 02 03:04:00 2000 PST + 1 | Sun Jan 02 02:04:00 2000 PST | Sun Jan 02 03:04:00 2000 PST +(2 rows) + diff --git a/src/test/regress/expected/join_gp_optimizer.out b/src/test/regress/expected/join_gp_optimizer.out index 0bd50105e854..93100a33d915 100644 --- a/src/test/regress/expected/join_gp_optimizer.out +++ b/src/test/regress/expected/join_gp_optimizer.out @@ -982,3 +982,86 @@ join t_randomly_dist_table on t_subquery_general.a = t_randomly_dist_table.c; reset enable_hashjoin; reset enable_mergejoin; reset enable_nestloop; +-- test lateral join inner plan contains limit +-- we cannot pass params across motion so we +-- can only generate a plan to gather all the +-- data to singleQE. Here we create a compound +-- data type as params to pass into inner plan. +-- By doing so, if we fail to pass correct params +-- into innerplan, it will throw error because +-- of nullpointer reference. If we only use int +-- type as params, the nullpointer reference error +-- may not happen because we parse null to integer 0. +create type mytype_for_lateral_test as (x int, y int); +create table t1_lateral_limit(a int, b int, c mytype_for_lateral_test); +create table t2_lateral_limit(a int, b int); +insert into t1_lateral_limit values (1, 1, '(1,1)'); +insert into t1_lateral_limit values (1, 2, '(2,2)'); +insert into t2_lateral_limit values (2, 2); +insert into t2_lateral_limit values (3, 3); +explain select * from t1_lateral_limit as t1 cross join lateral +(select ((c).x+t2.b) as n from t2_lateral_limit as t2 order by n limit 1)s; + QUERY PLAN +---------------------------------------------------------------------------------------------------------------- + Nested Loop (cost=10000000001.05..10000000002.11 rows=4 width=41) + -> Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..1.03 rows=1 width=37) + -> Seq Scan on t1_lateral_limit t1 (cost=0.00..1.01 rows=1 width=37) + -> Materialize (cost=1.05..1.07 rows=1 width=4) + -> Limit (cost=1.05..1.05 rows=1 width=4) + -> Sort (cost=1.05..1.05 rows=1 width=4) + Sort Key: (((t1.c).x + t2.b)) + -> Result (cost=0.00..1.04 rows=1 width=4) + -> Materialize (cost=0.00..1.03 rows=1 width=4) + -> Gather Motion 3:1 (slice2; segments: 3) (cost=0.00..1.03 rows=1 width=4) + -> Seq Scan on t2_lateral_limit t2 (cost=0.00..1.01 rows=1 width=4) + Optimizer: Postgres query optimizer +(12 rows) + +select * from t1_lateral_limit as t1 cross join lateral +(select ((c).x+t2.b) as n from t2_lateral_limit as t2 order by n limit 1)s; + a | b | c | n +---+---+-------+--- + 1 | 1 | (1,1) | 3 + 1 | 2 | (2,2) | 4 +(2 rows) + +-- The following case is from Github Issue +-- https://github.com/greenplum-db/gpdb/issues/8860 +-- It is the same issue as the above test suite. 
+create table t_mylog_issue_8860 (myid int, log_date timestamptz ); +insert into t_mylog_issue_8860 values (1,timestamptz '2000-01-02 03:04'),(1,timestamptz '2000-01-02 03:04'-'1 hour'::interval); +insert into t_mylog_issue_8860 values (2,timestamptz '2000-01-02 03:04'),(2,timestamptz '2000-01-02 03:04'-'2 hour'::interval); +explain select ml1.myid, log_date as first_date, ml2.next_date from t_mylog_issue_8860 ml1 +inner join lateral +(select myid, log_date as next_date + from t_mylog_issue_8860 where myid = ml1.myid and log_date > ml1.log_date order by log_date asc limit 1) ml2 +on true; + QUERY PLAN +--------------------------------------------------------------------------------------------------------------------------------- + Nested Loop (cost=10000000001.08..10000000002.18 rows=4 width=20) + -> Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..1.06 rows=2 width=12) + -> Seq Scan on t_mylog_issue_8860 ml1 (cost=0.00..1.02 rows=1 width=12) + -> Materialize (cost=1.08..1.10 rows=1 width=8) + -> Subquery Scan on ml2 (cost=1.08..1.09 rows=1 width=8) + -> Limit (cost=1.08..1.08 rows=1 width=12) + -> Sort (cost=1.08..1.08 rows=2 width=12) + Sort Key: t_mylog_issue_8860.log_date + -> Result (cost=0.00..1.07 rows=2 width=12) + Filter: ((t_mylog_issue_8860.log_date > ml1.log_date) AND (t_mylog_issue_8860.myid = ml1.myid)) + -> Materialize (cost=0.00..1.07 rows=1 width=12) + -> Gather Motion 3:1 (slice2; segments: 3) (cost=0.00..1.06 rows=2 width=12) + -> Seq Scan on t_mylog_issue_8860 (cost=0.00..1.02 rows=1 width=12) + Optimizer: Postgres query optimizer +(14 rows) + +select ml1.myid, log_date as first_date, ml2.next_date from t_mylog_issue_8860 ml1 +inner join lateral +(select myid, log_date as next_date + from t_mylog_issue_8860 where myid = ml1.myid and log_date > ml1.log_date order by log_date asc limit 1) ml2 +on true; + myid | first_date | next_date +------+------------------------------+------------------------------ + 2 | Sun Jan 02 01:04:00 2000 PST | Sun Jan 02 03:04:00 2000 PST + 1 | Sun Jan 02 02:04:00 2000 PST | Sun Jan 02 03:04:00 2000 PST +(2 rows) + diff --git a/src/test/regress/sql/join_gp.sql b/src/test/regress/sql/join_gp.sql index 364d3834b708..f13cc14c24f9 100644 --- a/src/test/regress/sql/join_gp.sql +++ b/src/test/regress/sql/join_gp.sql @@ -471,3 +471,47 @@ join t_randomly_dist_table on t_subquery_general.a = t_randomly_dist_table.c; reset enable_hashjoin; reset enable_mergejoin; reset enable_nestloop; + +-- test lateral join inner plan contains limit +-- we cannot pass params across motion so we +-- can only generate a plan to gather all the +-- data to singleQE. Here we create a compound +-- data type as params to pass into inner plan. +-- By doing so, if we fail to pass correct params +-- into innerplan, it will throw error because +-- of nullpointer reference. If we only use int +-- type as params, the nullpointer reference error +-- may not happen because we parse null to integer 0. 
+ +create type mytype_for_lateral_test as (x int, y int); +create table t1_lateral_limit(a int, b int, c mytype_for_lateral_test); +create table t2_lateral_limit(a int, b int); +insert into t1_lateral_limit values (1, 1, '(1,1)'); +insert into t1_lateral_limit values (1, 2, '(2,2)'); +insert into t2_lateral_limit values (2, 2); +insert into t2_lateral_limit values (3, 3); + +explain select * from t1_lateral_limit as t1 cross join lateral +(select ((c).x+t2.b) as n from t2_lateral_limit as t2 order by n limit 1)s; + +select * from t1_lateral_limit as t1 cross join lateral +(select ((c).x+t2.b) as n from t2_lateral_limit as t2 order by n limit 1)s; + +-- The following case is from Github Issue +-- https://github.com/greenplum-db/gpdb/issues/8860 +-- It is the same issue as the above test suite. +create table t_mylog_issue_8860 (myid int, log_date timestamptz ); +insert into t_mylog_issue_8860 values (1,timestamptz '2000-01-02 03:04'),(1,timestamptz '2000-01-02 03:04'-'1 hour'::interval); +insert into t_mylog_issue_8860 values (2,timestamptz '2000-01-02 03:04'),(2,timestamptz '2000-01-02 03:04'-'2 hour'::interval); + +explain select ml1.myid, log_date as first_date, ml2.next_date from t_mylog_issue_8860 ml1 +inner join lateral +(select myid, log_date as next_date + from t_mylog_issue_8860 where myid = ml1.myid and log_date > ml1.log_date order by log_date asc limit 1) ml2 +on true; + +select ml1.myid, log_date as first_date, ml2.next_date from t_mylog_issue_8860 ml1 +inner join lateral +(select myid, log_date as next_date + from t_mylog_issue_8860 where myid = ml1.myid and log_date > ml1.log_date order by log_date asc limit 1) ml2 +on true; From 7d272d75a4403b867c5d1b6153a5388ee1f9e489 Mon Sep 17 00:00:00 2001 From: Abhijit Subramanya Date: Fri, 28 Feb 2020 15:17:58 -0800 Subject: [PATCH 064/102] Fix the way we check if a partition table is empty in Relcache translator The relation statistics object stores a flag which indicates if a relation is empty or not. However this was being set based on the value of stats_empty flag. This flag was being set in the function `cdb_estimate_partitioned_numtuples`. It would be set to true if any of the child partitions had their relpages set to 0. But if other partitions had data in them, then the relation should not be considered empty. We should actually be checking for the total number of rows to determine if a partition table is empty or not. Also the stats_empty flag was not being used for anything else so this commit removes it. 
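To make the scenario concrete, here is a minimal sketch (hypothetical table and data, not taken from this patch) of a partitioned table in which one leaf partition stays empty while another holds rows; as described above, the relation-level statistics should report the table as empty only when its total row count is zero:
```
-- Hypothetical example: only the first leaf (b in [0,10)) receives rows;
-- the second leaf (b in [10,20)) stays empty, yet the table as a whole is not empty.
CREATE TABLE part_stats_demo (a int, b int)
DISTRIBUTED BY (a)
PARTITION BY RANGE (b) (START (0) END (20) EVERY (10));

INSERT INTO part_stats_demo SELECT i, i % 10 FROM generate_series(1, 100) i;
ANALYZE part_stats_demo;
```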
--- src/backend/gpopt/gpdbwrappers.cpp | 5 +- .../translate/CTranslatorRelcacheToDXL.cpp | 15 +++--- src/backend/optimizer/util/plancat.c | 19 +------ src/include/gpopt/gpdbwrappers.h | 2 +- src/include/optimizer/plancat.h | 2 +- .../regress/expected/AOCO_Compression.out | 4 ++ src/test/regress/expected/alter_table_ao.out | 1 + src/test/regress/expected/gporca.out | 51 +++++++++++++++++++ .../regress/expected/gporca_optimizer.out | 44 ++++++++++++++++ src/test/regress/expected/oid_consistency.out | 2 + src/test/regress/expected/partition.out | 12 +++++ .../regress/expected/partition_optimizer.out | 12 +++++ .../regress/expected/partition_pruning.out | 27 ++++------ .../expected/partition_pruning_optimizer.out | 1 + .../regress/expected/portals_updatable.out | 1 + .../expected/portals_updatable_optimizer.out | 1 + src/test/regress/expected/qp_dropped_cols.out | 18 +++++++ .../expected/qp_dropped_cols_optimizer.out | 18 +++++++ .../input/uao_ddl/alter_ao_part_exch.source | 2 + .../output/uao_ddl/alter_ao_part_exch.source | 1 + src/test/regress/sql/AOCO_Compression.sql | 8 +-- src/test/regress/sql/alter_table_ao.sql | 2 + src/test/regress/sql/gporca.sql | 15 ++++++ src/test/regress/sql/oid_consistency.sql | 4 +- src/test/regress/sql/partition.sql | 12 +++++ src/test/regress/sql/partition_pruning.sql | 1 + src/test/regress/sql/portals_updatable.sql | 1 + src/test/regress/sql/qp_dropped_cols.sql | 18 +++++++ 28 files changed, 249 insertions(+), 50 deletions(-) diff --git a/src/backend/gpopt/gpdbwrappers.cpp b/src/backend/gpopt/gpdbwrappers.cpp index e30ce4ef1751..adda5aad9f60 100644 --- a/src/backend/gpopt/gpdbwrappers.cpp +++ b/src/backend/gpopt/gpdbwrappers.cpp @@ -2452,13 +2452,12 @@ gpdb::CdbEstimateRelationSize double gpdb::CdbEstimatePartitionedNumTuples ( - Relation rel, - bool *stats_missing_p + Relation rel ) { GP_WRAP_START; { - return cdb_estimate_partitioned_numtuples(rel, stats_missing_p); + return cdb_estimate_partitioned_numtuples(rel); } GP_WRAP_END; } diff --git a/src/backend/gpopt/translate/CTranslatorRelcacheToDXL.cpp b/src/backend/gpopt/translate/CTranslatorRelcacheToDXL.cpp index dea900a8c7fa..759961a12449 100644 --- a/src/backend/gpopt/translate/CTranslatorRelcacheToDXL.cpp +++ b/src/backend/gpopt/translate/CTranslatorRelcacheToDXL.cpp @@ -2276,7 +2276,6 @@ CTranslatorRelcacheToDXL::RetrieveRelStats double num_rows = 0.0; CMDName *mdname = NULL; - BOOL stats_empty = false; GPOS_TRY { @@ -2287,7 +2286,7 @@ CTranslatorRelcacheToDXL::RetrieveRelStats // CMDName ctor created a copy of the string GPOS_DELETE(relname_str); - num_rows = gpdb::CdbEstimatePartitionedNumTuples(rel, &stats_empty); + num_rows = gpdb::CdbEstimatePartitionedNumTuples(rel); m_rel_stats_mdid->AddRef(); gpdb::CloseRelation(rel); @@ -2299,9 +2298,14 @@ CTranslatorRelcacheToDXL::RetrieveRelStats } GPOS_CATCH_END; + /* + * relation_empty should be set to true only if the total row + * count of the partition table is 0. 
+ */ + BOOL relation_empty = false; if (num_rows == 0.0) { - stats_empty = true; + relation_empty = true; } CDXLRelStats *dxl_rel_stats = GPOS_NEW(mp) CDXLRelStats @@ -2310,7 +2314,7 @@ CTranslatorRelcacheToDXL::RetrieveRelStats m_rel_stats_mdid, mdname, CDouble(num_rows), - stats_empty + relation_empty ); @@ -2347,9 +2351,8 @@ CTranslatorRelcacheToDXL::RetrieveColStats // number of rows from pg_class double num_rows; - bool stats_empty; - num_rows = gpdb::CdbEstimatePartitionedNumTuples(rel, &stats_empty); + num_rows = gpdb::CdbEstimatePartitionedNumTuples(rel); // extract column name and type CMDName *md_colname = GPOS_NEW(mp) CMDName(mp, md_col->Mdname().GetMDName()); diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index 4865039712cc..06b99ce68d39 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -578,13 +578,12 @@ cdb_estimate_rel_size(RelOptInfo *relOptInfo, * analyzed, this returns 0 rather than the default constant estimate. */ double -cdb_estimate_partitioned_numtuples(Relation rel, bool *stats_missing) +cdb_estimate_partitioned_numtuples(Relation rel) { List *inheritors; ListCell *lc; double totaltuples; - *stats_missing = false; if (rel->rd_rel->reltuples > 0) return rel->rd_rel->reltuples; @@ -606,22 +605,6 @@ cdb_estimate_partitioned_numtuples(Relation rel, bool *stats_missing) childtuples = childrel->rd_rel->reltuples; - /* - * relpages == 0 means stats are missing. There's a special - * case in ANALYZE/VACUUM, to set relpages to 1 even if the - * table is completely empty, so relpages is zero only if the - * table hasn't been analyzed yet. - */ - if (childrel->rd_rel->relpages == 0) - { - /* - * In the root partition of a partitioned table, though, - * it's expected. 
- */ - if (childrel != rel) - *stats_missing = true; - } - if (gp_enable_relsize_collection && childtuples == 0) { RelOptInfo *dummy_reloptinfo; diff --git a/src/include/gpopt/gpdbwrappers.h b/src/include/gpopt/gpdbwrappers.h index 540f2a6af5c5..62ef8efcf711 100644 --- a/src/include/gpopt/gpdbwrappers.h +++ b/src/include/gpopt/gpdbwrappers.h @@ -526,7 +526,7 @@ namespace gpdb { // estimate the relation size using the real number of blocks and tuple density void CdbEstimateRelationSize (RelOptInfo *relOptInfo, Relation rel, int32 *attr_widths, BlockNumber *pages, double *tuples, double *allvisfrac); - double CdbEstimatePartitionedNumTuples (Relation rel, bool *stats_missing); + double CdbEstimatePartitionedNumTuples (Relation rel); // close the given relation void CloseRelation(Relation rel); diff --git a/src/include/optimizer/plancat.h b/src/include/optimizer/plancat.h index 396f099d3b84..691735af2644 100644 --- a/src/include/optimizer/plancat.h +++ b/src/include/optimizer/plancat.h @@ -39,7 +39,7 @@ extern void cdb_estimate_rel_size(RelOptInfo *relOptInfo, BlockNumber *pages, double *tuples, double *allvisfrac); -extern double cdb_estimate_partitioned_numtuples(Relation rel, bool *stats_missing); +extern double cdb_estimate_partitioned_numtuples(Relation rel); extern int32 get_relation_data_width(Oid relid, int32 *attr_widths); diff --git a/src/test/regress/expected/AOCO_Compression.out b/src/test/regress/expected/AOCO_Compression.out index 3bfeee785815..4446c92f03c6 100644 --- a/src/test/regress/expected/AOCO_Compression.out +++ b/src/test/regress/expected/AOCO_Compression.out @@ -216,6 +216,7 @@ TRUNCATE table co_crtb_with_strg_dir_and_col_ref_1; -- Insert data again -- insert into co_crtb_with_strg_dir_and_col_ref_1 select * from co_crtb_with_strg_dir_and_col_ref_1_uncompr order by a1; +analyze co_crtb_with_strg_dir_and_col_ref_1; -- -- Select the data: Using the JOIN as mentioned above -- @@ -351,6 +352,7 @@ NOTICE: building index for child partition "co_cr_sub_partzlib8192_1_1_prt_5_2_ -- INSERT INTO co_cr_sub_partzlib8192_1(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21,a22,a23,a24,a25,a26,a27,a28,a29,a30,a31,a32,a33,a34,a35,a36,a37,a38,a39,a40,a41,a42) values(generate_series(1,20),'M',2011,'t','a','This is news of today: Deadlock between Republicans and Democrats over how best to reduce the U.S. deficit, and over what period, has blocked an agreement to allow the raising of the $14.3 trillion debt ceiling','2001-12-24 02:26:11','U.S. House of Representatives Speaker John Boehner, the top Republican in Congress who has put forward a deficit reduction plan to be voted on later on Thursday said he had no control over whether his bill would avert a credit downgrade.',generate_series(2490,2505),'2011-10-11','The Republican-controlled House is tentatively scheduled to vote on Boehner proposal this afternoon at around 6 p.m. EDT (2200 GMT). The main Republican vote counter in the House, Kevin McCarthy, would not say if there were enough votes to pass the bill.','WASHINGTON:House Speaker John Boehner says his plan mixing spending cuts in exchange for raising the nations $14.3 trillion debt limit is not perfect but is as large a step that a divided government can take that is doable and signable by President Barack Obama.The Ohio Republican says the measure is an honest and sincere attempt at compromise and was negotiated with Democrats last weekend and that passing it would end the ongoing debt crisis. 
The plan blends $900 billion-plus in spending cuts with a companion increase in the nations borrowing cap.','1234.56',323453,generate_series(3452,3462),7845,'0011','2005-07-16 01:51:15+1359','2001-12-13 01:51:15','((1,2),(0,3),(2,1))','((2,3)(4,5))','08:00:2b:01:02:03','1-2','Republicans had been working throughout the day Thursday to lock down support for their plan to raise the nations debt ceiling, even as Senate Democrats vowed to swiftly kill it if passed.','((2,3)(4,5))','(6,7)',11.222,'((4,5),7)',32,3214,'(1,0,2,3)','2010-02-21',43564,'$1,000.00','192.168.1','126.1.3.4','12:30:45','Johnson & Johnsons McNeil Consumer Healthcare announced the voluntary dosage reduction today. Labels will carry new dosing instructions this fall.The company says it will cut the maximum dosage of Regular Strength Tylenol and other acetaminophen-containing products in 2012.Acetaminophen is safe when used as directed, says Edwin Kuffner, MD, McNeil vice president of over-the-counter medical affairs. But, when too much is taken, it can cause liver damage.The action is intended to cut the risk of such accidental overdoses, the company says in a news release.','1','0',12,23); INSERT INTO co_cr_sub_partzlib8192_1(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21,a22,a23,a24,a25,a26,a27,a28,a29,a30,a31,a32,a33,a34,a35,a36,a37,a38,a39,a40,a41,a42) values(generate_series(500,510),'F',2010,'f','b','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. Do not plead with your child','2001-12-25 02:22:11','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. Do not plead with your child',generate_series(2500,2516),'2011-10-12','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. Do not plead with your child','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. Do not plead with your child The type integer is the usual choice, as it offers the best balance between range, storage size, and performance The type integer is the usual choice, as it offers the best balance between range, storage size, and performanceThe type integer is the usual choice, as it offers the best balance between range, storage size, and performanceThe type integer is the usual choice, as it offers the best balance between range, storage size, and performanceThe type integer ','1134.26',311353,generate_series(3982,3992),7885,'0101','2002-02-12 01:31:14+1344','2003-11-14 01:41:15','((1,1),(0,1),(1,1))','((2,1)(1,5))','08:00:2b:01:01:03','1-3','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. Do not plead with your child The types smallint, integer, and bigint store whole numbers, that is, numbers without fractional components, of various ranges. The types smallint, integer, and bigint store whole numbers, that is, numbers without fractional components, of various ranges. 
Attempts to store values outside of the allowed range will result in an errorThe types smallint, integer, and bigint store whole numbers, that is, numbers without fractional components, of various ranges.','((6,5)(4,2))','(3,6)',12.233,'((5,4),2)',12,3114,'(1,1,0,3)','2010-03-21',43164,'$1,500.00','192.167.2','126.1.1.1','10:30:55','Parents and other family members are always welcome at Stratford. After the first two weeks ofschool','0','1',33,44); +ANALYZE co_cr_sub_partzlib8192_1; --Create Uncompressed table of same schema definition CREATE TABLE co_cr_sub_partzlib8192_1_uncompr(id SERIAL,a1 int,a2 char(5),a3 numeric,a4 boolean DEFAULT false ,a5 char DEFAULT 'd',a6 text,a7 timestamp,a8 character varying(705),a9 bigint,a10 date,a11 varchar(600),a12 text,a13 decimal,a14 real,a15 bigint,a16 int4 ,a17 bytea,a18 timestamp with time zone,a19 timetz,a20 path,a21 box,a22 macaddr,a23 interval,a24 character varying(800),a25 lseg,a26 point,a27 double precision,a28 circle,a29 int4,a30 numeric(8),a31 polygon,a32 date,a33 real,a34 money,a35 cidr,a36 inet,a37 time,a38 text,a39 bit,a40 bit varying(5),a41 smallint,a42 int) WITH (appendonly=true, orientation=column) distributed randomly Partition by range(a1) Subpartition by list(a2) subpartition template ( subpartition sp1 values('M') , subpartition sp2 values('F') ) (start(1) end(5000) every(1000)) ; NOTICE: CREATE TABLE will create partition "co_cr_sub_partzlib8192_1_uncompr_1_prt_1" for table "co_cr_sub_partzlib8192_1_uncompr" @@ -1364,6 +1366,7 @@ NOTICE: building index for child partition "co_wt_sub_partrle_type8192_1_1_prt_ -- INSERT INTO co_wt_sub_partrle_type8192_1(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21,a22,a23,a24,a25,a26,a27,a28,a29,a30,a31,a32,a33,a34,a35,a36,a37,a38,a39,a40,a41,a42) values(generate_series(1,20),'M',2011,'t','a','This is news of today: Deadlock between Republicans and Democrats over how best to reduce the U.S. deficit, and over what period, has blocked an agreement to allow the raising of the $14.3 trillion debt ceiling','2001-12-24 02:26:11','U.S. House of Representatives Speaker John Boehner, the top Republican in Congress who has put forward a deficit reduction plan to be voted on later on Thursday said he had no control over whether his bill would avert a credit downgrade.',generate_series(2490,2505),'2011-10-11','The Republican-controlled House is tentatively scheduled to vote on Boehner proposal this afternoon at around 6 p.m. EDT (2200 GMT). The main Republican vote counter in the House, Kevin McCarthy, would not say if there were enough votes to pass the bill.','WASHINGTON:House Speaker John Boehner says his plan mixing spending cuts in exchange for raising the nations $14.3 trillion debt limit is not perfect but is as large a step that a divided government can take that is doable and signable by President Barack Obama.The Ohio Republican says the measure is an honest and sincere attempt at compromise and was negotiated with Democrats last weekend and that passing it would end the ongoing debt crisis. 
The plan blends $900 billion-plus in spending cuts with a companion increase in the nations borrowing cap.','1234.56',323453,generate_series(3452,3462),7845,'0011','2005-07-16 01:51:15+1359','2001-12-13 01:51:15','((1,2),(0,3),(2,1))','((2,3)(4,5))','08:00:2b:01:02:03','1-2','Republicans had been working throughout the day Thursday to lock down support for their plan to raise the nations debt ceiling, even as Senate Democrats vowed to swiftly kill it if passed.','((2,3)(4,5))','(6,7)',11.222,'((4,5),7)',32,3214,'(1,0,2,3)','2010-02-21',43564,'$1,000.00','192.168.1','126.1.3.4','12:30:45','Johnson & Johnsons McNeil Consumer Healthcare announced the voluntary dosage reduction today. Labels will carry new dosing instructions this fall.The company says it will cut the maximum dosage of Regular Strength Tylenol and other acetaminophen-containing products in 2012.Acetaminophen is safe when used as directed, says Edwin Kuffner, MD, McNeil vice president of over-the-counter medical affairs. But, when too much is taken, it can cause liver damage.The action is intended to cut the risk of such accidental overdoses, the company says in a news release.','1','0',12,23); INSERT INTO co_wt_sub_partrle_type8192_1(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21,a22,a23,a24,a25,a26,a27,a28,a29,a30,a31,a32,a33,a34,a35,a36,a37,a38,a39,a40,a41,a42) values(generate_series(500,510),'F',2010,'f','b','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. Do not plead with your child','2001-12-25 02:22:11','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. Do not plead with your child',generate_series(2500,2516),'2011-10-12','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. Do not plead with your child','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. Do not plead with your child The type integer is the usual choice, as it offers the best balance between range, storage size, and performance The type integer is the usual choice, as it offers the best balance between range, storage size, and performanceThe type integer is the usual choice, as it offers the best balance between range, storage size, and performanceThe type integer is the usual choice, as it offers the best balance between range, storage size, and performanceThe type integer ','1134.26',311353,generate_series(3982,3992),7885,'0101','2002-02-12 01:31:14+1344','2003-11-14 01:41:15','((1,1),(0,1),(1,1))','((2,1)(1,5))','08:00:2b:01:01:03','1-3','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. Do not plead with your child The types smallint, integer, and bigint store whole numbers, that is, numbers without fractional components, of various ranges. The types smallint, integer, and bigint store whole numbers, that is, numbers without fractional components, of various ranges. 
Attempts to store values outside of the allowed range will result in an errorThe types smallint, integer, and bigint store whole numbers, that is, numbers without fractional components, of various ranges.','((6,5)(4,2))','(3,6)',12.233,'((5,4),2)',12,3114,'(1,1,0,3)','2010-03-21',43164,'$1,500.00','192.167.2','126.1.1.1','10:30:55','Parents and other family members are always welcome at Stratford. After the first two weeks ofschool','0','1',33,44); +ANALYZE co_wt_sub_partrle_type8192_1; --Create Uncompressed table of same schema definition CREATE TABLE co_wt_sub_partrle_type8192_1_uncompr(id SERIAL,a1 int,a2 char(5),a3 numeric,a4 boolean DEFAULT false ,a5 char DEFAULT 'd',a6 text,a7 timestamp,a8 character varying(705),a9 bigint,a10 date,a11 varchar(600),a12 text,a13 decimal,a14 real,a15 bigint,a16 int4 ,a17 bytea,a18 timestamp with time zone,a19 timetz,a20 path,a21 box,a22 macaddr,a23 interval,a24 character varying(800),a25 lseg,a26 point,a27 double precision,a28 circle,a29 int4,a30 numeric(8),a31 polygon,a32 date,a33 real,a34 money,a35 cidr,a36 inet,a37 time,a38 text,a39 bit,a40 bit varying(5),a41 smallint,a42 int) WITH (appendonly=true, orientation=column) distributed randomly Partition by range(a1) Subpartition by list(a2) subpartition template ( subpartition sp1 values('M') , subpartition sp2 values('F') ) (start(1) end(5000) every(1000)) ; NOTICE: CREATE TABLE will create partition "co_wt_sub_partrle_type8192_1_uncompr_1_prt_1" for table "co_wt_sub_partrle_type8192_1_uncompr" @@ -2368,6 +2371,7 @@ NOTICE: building index for child partition "ao_wt_sub_partzlib8192_5_1_prt_5_2_ -- INSERT INTO ao_wt_sub_partzlib8192_5(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21,a22,a23,a24,a25,a26,a27,a28,a29,a30,a31,a32,a33,a34,a35,a36,a37,a38,a39,a40,a41,a42) values(generate_series(1,20),'M',2011,'t','a','This is news of today: Deadlock between Republicans and Democrats over how best to reduce the U.S. deficit, and over what period, has blocked an agreement to allow the raising of the $14.3 trillion debt ceiling','2001-12-24 02:26:11','U.S. House of Representatives Speaker John Boehner, the top Republican in Congress who has put forward a deficit reduction plan to be voted on later on Thursday said he had no control over whether his bill would avert a credit downgrade.',generate_series(2490,2505),'2011-10-11','The Republican-controlled House is tentatively scheduled to vote on Boehner proposal this afternoon at around 6 p.m. EDT (2200 GMT). The main Republican vote counter in the House, Kevin McCarthy, would not say if there were enough votes to pass the bill.','WASHINGTON:House Speaker John Boehner says his plan mixing spending cuts in exchange for raising the nations $14.3 trillion debt limit is not perfect but is as large a step that a divided government can take that is doable and signable by President Barack Obama.The Ohio Republican says the measure is an honest and sincere attempt at compromise and was negotiated with Democrats last weekend and that passing it would end the ongoing debt crisis. 
The plan blends $900 billion-plus in spending cuts with a companion increase in the nations borrowing cap.','1234.56',323453,generate_series(3452,3462),7845,'0011','2005-07-16 01:51:15+1359','2001-12-13 01:51:15','((1,2),(0,3),(2,1))','((2,3)(4,5))','08:00:2b:01:02:03','1-2','Republicans had been working throughout the day Thursday to lock down support for their plan to raise the nations debt ceiling, even as Senate Democrats vowed to swiftly kill it if passed.','((2,3)(4,5))','(6,7)',11.222,'((4,5),7)',32,3214,'(1,0,2,3)','2010-02-21',43564,'$1,000.00','192.168.1','126.1.3.4','12:30:45','Johnson & Johnsons McNeil Consumer Healthcare announced the voluntary dosage reduction today. Labels will carry new dosing instructions this fall.The company says it will cut the maximum dosage of Regular Strength Tylenol and other acetaminophen-containing products in 2012.Acetaminophen is safe when used as directed, says Edwin Kuffner, MD, McNeil vice president of over-the-counter medical affairs. But, when too much is taken, it can cause liver damage.The action is intended to cut the risk of such accidental overdoses, the company says in a news release.','1','0',12,23); INSERT INTO ao_wt_sub_partzlib8192_5(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21,a22,a23,a24,a25,a26,a27,a28,a29,a30,a31,a32,a33,a34,a35,a36,a37,a38,a39,a40,a41,a42) values(generate_series(500,510),'F',2010,'f','b','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. Do not plead with your child','2001-12-25 02:22:11','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. Do not plead with your child',generate_series(2500,2516),'2011-10-12','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. Do not plead with your child','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. Do not plead with your child The type integer is the usual choice, as it offers the best balance between range, storage size, and performance The type integer is the usual choice, as it offers the best balance between range, storage size, and performanceThe type integer is the usual choice, as it offers the best balance between range, storage size, and performanceThe type integer is the usual choice, as it offers the best balance between range, storage size, and performanceThe type integer ','1134.26',311353,generate_series(3982,3992),7885,'0101','2002-02-12 01:31:14+1344','2003-11-14 01:41:15','((1,1),(0,1),(1,1))','((2,1)(1,5))','08:00:2b:01:01:03','1-3','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. Do not plead with your child The types smallint, integer, and bigint store whole numbers, that is, numbers without fractional components, of various ranges. The types smallint, integer, and bigint store whole numbers, that is, numbers without fractional components, of various ranges. 
Attempts to store values outside of the allowed range will result in an errorThe types smallint, integer, and bigint store whole numbers, that is, numbers without fractional components, of various ranges.','((6,5)(4,2))','(3,6)',12.233,'((5,4),2)',12,3114,'(1,1,0,3)','2010-03-21',43164,'$1,500.00','192.167.2','126.1.1.1','10:30:55','Parents and other family members are always welcome at Stratford. After the first two weeks ofschool','0','1',33,44); +ANALYZE ao_wt_sub_partzlib8192_5; --Create Uncompressed table of same schema definition CREATE TABLE ao_wt_sub_partzlib8192_5_uncompr(id SERIAL,a1 int,a2 char(5),a3 numeric,a4 boolean DEFAULT false ,a5 char DEFAULT 'd',a6 text,a7 timestamp,a8 character varying(705),a9 bigint,a10 date,a11 varchar(600),a12 text,a13 decimal,a14 real,a15 bigint,a16 int4 ,a17 bytea,a18 timestamp with time zone,a19 timetz,a20 path,a21 box,a22 macaddr,a23 interval,a24 character varying(800),a25 lseg,a26 point,a27 double precision,a28 circle,a29 int4,a30 numeric(8),a31 polygon,a32 date,a33 real,a34 money,a35 cidr,a36 inet,a37 time,a38 text,a39 bit,a40 bit varying(5),a41 smallint,a42 int) WITH (appendonly=true, orientation=row) distributed randomly Partition by range(a1) Subpartition by list(a2) subpartition template ( subpartition sp1 values('M') , subpartition sp2 values('F') ) (start(1) end(5000) every(1000)) ; NOTICE: CREATE TABLE will create partition "ao_wt_sub_partzlib8192_5_uncompr_1_prt_1" for table "ao_wt_sub_partzlib8192_5_uncompr" diff --git a/src/test/regress/expected/alter_table_ao.out b/src/test/regress/expected/alter_table_ao.out index a733968fb1ff..d58c333d350f 100644 --- a/src/test/regress/expected/alter_table_ao.out +++ b/src/test/regress/expected/alter_table_ao.out @@ -382,6 +382,7 @@ NOTICE: building index for child partition "testbug_char5_1_prt_part201205" insert into testbug_char5 (timest,user_id,to_be_drop) select '201203',1111,'10000'; insert into testbug_char5 (timest,user_id,to_be_drop) select '201204',1111,'10000'; insert into testbug_char5 (timest,user_id,to_be_drop) select '201205',1111,'10000'; +analyze testbug_char5; select * from testbug_char5 order by 1,2; timest | user_id | to_be_drop | tag1 | tag2 --------+---------+------------+------+------ diff --git a/src/test/regress/expected/gporca.out b/src/test/regress/expected/gporca.out index b4dcc51ab35b..e6b94fe41c71 100644 --- a/src/test/regress/expected/gporca.out +++ b/src/test/regress/expected/gporca.out @@ -11898,3 +11898,54 @@ select count(*) as expect_20 from noexp_hash h join gpexp_repl r on h.a=r.a; 20 (1 row) +create table part1(a int, b int) partition by range(b) (start(1) end(5) every(1)); +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +NOTICE: CREATE TABLE will create partition "part1_1_prt_1" for table "part1" +NOTICE: CREATE TABLE will create partition "part1_1_prt_2" for table "part1" +NOTICE: CREATE TABLE will create partition "part1_1_prt_3" for table "part1" +NOTICE: CREATE TABLE will create partition "part1_1_prt_4" for table "part1" +create table part2(a int, b int) partition by range(b) (start(1) end(5) every(1)); +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Greenplum Database data distribution key for this table. 
+HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +NOTICE: CREATE TABLE will create partition "part2_1_prt_1" for table "part2" +NOTICE: CREATE TABLE will create partition "part2_1_prt_2" for table "part2" +NOTICE: CREATE TABLE will create partition "part2_1_prt_3" for table "part2" +NOTICE: CREATE TABLE will create partition "part2_1_prt_4" for table "part2" +insert into part1 select i, (i % 2) + 1 from generate_series(1, 1000) i; +insert into part2 select i, (i % 2) + 1 from generate_series(1, 100) i; +-- make sure some child partitions have not been analyzed. This just means that +-- stats are missing for some child partition but not necessarily that the relation +-- is empty. So we should not flag this as an empty relation +analyze part1_1_prt_1; +analyze part1_1_prt_2; +analyze part2_1_prt_1; +analyze part2_1_prt_2; +-- the plan should contain a 2 stage limit. If we incorrectly estimate that the +-- relation is empty, we would end up choosing a single stage limit. +explain select * from part1, part2 where part1.b = part2.b limit 5; + QUERY PLAN +----------------------------------------------------------------------------------------------------------------------- + Limit (cost=410984.11..410984.89 rows=5 width=16) + -> Gather Motion 3:1 (slice1; segments: 3) (cost=410984.11..410984.89 rows=5 width=16) + -> Limit (cost=410984.11..410984.79 rows=2 width=16) + -> Hash Join (cost=7528.75..4042082.35 rows=9947454 width=16) + Hash Cond: (part1_1_prt_1.b = part2_1_prt_1.b) + -> Redistribute Motion 3:3 (slice2; segments: 3) (cost=0.00..5402.00 rows=57734 width=8) + Hash Key: part1_1_prt_1.b + -> Append (cost=0.00..1938.00 rows=57734 width=8) + -> Seq Scan on part1_1_prt_1 (cost=0.00..8.00 rows=167 width=8) + -> Seq Scan on part1_1_prt_2 (cost=0.00..8.00 rows=167 width=8) + -> Seq Scan on part1_1_prt_3 (cost=0.00..961.00 rows=28700 width=8) + -> Seq Scan on part1_1_prt_4 (cost=0.00..961.00 rows=28700 width=8) + -> Hash (cost=5375.00..5375.00 rows=57434 width=8) + -> Redistribute Motion 3:3 (slice3; segments: 3) (cost=0.00..5375.00 rows=57434 width=8) + Hash Key: part2_1_prt_1.b + -> Append (cost=0.00..1929.00 rows=57434 width=8) + -> Seq Scan on part2_1_prt_1 (cost=0.00..3.50 rows=17 width=8) + -> Seq Scan on part2_1_prt_2 (cost=0.00..3.50 rows=17 width=8) + -> Seq Scan on part2_1_prt_3 (cost=0.00..961.00 rows=28700 width=8) + -> Seq Scan on part2_1_prt_4 (cost=0.00..961.00 rows=28700 width=8) + Optimizer: Postgres query optimizer +(21 rows) + diff --git a/src/test/regress/expected/gporca_optimizer.out b/src/test/regress/expected/gporca_optimizer.out index a1dc34c00025..b39b54bced1c 100644 --- a/src/test/regress/expected/gporca_optimizer.out +++ b/src/test/regress/expected/gporca_optimizer.out @@ -12035,3 +12035,47 @@ DETAIL: Unknown error: Partially Distributed Data 20 (1 row) +create table part1(a int, b int) partition by range(b) (start(1) end(5) every(1)); +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. 
+NOTICE: CREATE TABLE will create partition "part1_1_prt_1" for table "part1" +NOTICE: CREATE TABLE will create partition "part1_1_prt_2" for table "part1" +NOTICE: CREATE TABLE will create partition "part1_1_prt_3" for table "part1" +NOTICE: CREATE TABLE will create partition "part1_1_prt_4" for table "part1" +create table part2(a int, b int) partition by range(b) (start(1) end(5) every(1)); +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +NOTICE: CREATE TABLE will create partition "part2_1_prt_1" for table "part2" +NOTICE: CREATE TABLE will create partition "part2_1_prt_2" for table "part2" +NOTICE: CREATE TABLE will create partition "part2_1_prt_3" for table "part2" +NOTICE: CREATE TABLE will create partition "part2_1_prt_4" for table "part2" +insert into part1 select i, (i % 2) + 1 from generate_series(1, 1000) i; +insert into part2 select i, (i % 2) + 1 from generate_series(1, 100) i; +-- make sure some child partitions have not been analyzed. This just means that +-- stats are missing for some child partition but not necessarily that the relation +-- is empty. So we should not flag this as an empty relation +analyze part1_1_prt_1; +analyze part1_1_prt_2; +analyze part2_1_prt_1; +analyze part2_1_prt_2; +-- the plan should contain a 2 stage limit. If we incorrectly estimate that the +-- relation is empty, we would end up choosing a single stage limit. +explain select * from part1, part2 where part1.b = part2.b limit 5; + QUERY PLAN +----------------------------------------------------------------------------------------------------------------------------------------- + Limit (cost=0.00..862.14 rows=5 width=16) + -> Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..862.14 rows=5 width=16) + -> Limit (cost=0.00..862.13 rows=2 width=16) + -> Hash Join (cost=0.00..862.13 rows=334 width=16) + Hash Cond: (part1.b = part2.b) + -> Dynamic Seq Scan on part1 (dynamic scan id: 1) (cost=0.00..431.01 rows=334 width=8) + -> Hash (cost=100.00..100.00 rows=34 width=4) + -> Partition Selector for part1 (dynamic scan id: 1) (cost=10.00..100.00 rows=34 width=4) + -> Broadcast Motion 3:3 (slice2; segments: 3) (cost=0.00..431.02 rows=100 width=8) + -> Sequence (cost=0.00..431.00 rows=34 width=8) + -> Partition Selector for part2 (dynamic scan id: 2) (cost=10.00..100.00 rows=34 width=4) + Partitions selected: 4 (out of 4) + -> Dynamic Seq Scan on part2 (dynamic scan id: 2) (cost=0.00..431.00 rows=34 width=8) + Optimizer: Pivotal Optimizer (GPORCA) version 3.93.0 +(13 rows) + diff --git a/src/test/regress/expected/oid_consistency.out b/src/test/regress/expected/oid_consistency.out index 043195ca2e10..067aef7b60ea 100644 --- a/src/test/regress/expected/oid_consistency.out +++ b/src/test/regress/expected/oid_consistency.out @@ -259,6 +259,7 @@ INSERT INTO constraint_pt1 SELECT i, '2008-01-13', i FROM generate_series(1,5)i; INSERT INTO constraint_pt1 SELECT i, '2008-02-13', i FROM generate_series(1,5)i; INSERT INTO constraint_pt1 SELECT i, '2008-03-13', i FROM generate_series(1,5)i; INSERT INTO constraint_t1 SELECT i, '2008-02-02', i FROM generate_series(11,15)i; +ANALYZE constraint_pt1; ALTER TABLE constraint_pt1 EXCHANGE PARTITION Feb08 WITH TABLE constraint_t1; select verify('constraint_pt1_1_prt_feb08'); verify @@ -356,6 +357,7 @@ NOTICE: exchanged partition 
"jan08" of relation "constraint_pt2" with relation NOTICE: dropped partition "jan08" for relation "constraint_pt2" NOTICE: CREATE TABLE will create partition "constraint_pt2_1_prt_jan08_15" for table "constraint_pt2" NOTICE: CREATE TABLE will create partition "constraint_pt2_1_prt_jan08_31" for table "constraint_pt2" +ANALYZE constraint_pt2; select verify('constraint_pt2_1_prt_feb08'); verify -------- diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out index f3123aa492d3..235f1c027a85 100755 --- a/src/test/regress/expected/partition.out +++ b/src/test/regress/expected/partition.out @@ -447,6 +447,7 @@ alter table foo_p exchange partition for(rank(6)) with table bar_p; ERROR: exchange table contains a row which violates the partitioning specification of "foo_p" (seg1 slarimac:40001 pid=97876) alter table foo_p exchange partition for(rank(6)) with table bar_p without validation; +analyze foo_p; select * from foo_p; i ----- @@ -471,6 +472,7 @@ NOTICE: CREATE TABLE will create partition "foo_p_1_prt_9" for table "foo_p" create table bar_p(i int, j int) distributed by (i); insert into bar_p values(6); alter table foo_p exchange partition for(rank(6)) with table bar_p; +analyze foo_p; select * from foo_p; i | j ---+--- @@ -544,6 +546,7 @@ create table bar_p(i int, j int) distributed by (i); insert into foo_p values(1, 1), (2, 1), (3, 1); insert into bar_p values(6, 6); alter table foo_p exchange partition for(rank(6)) with table bar_p; +analyze foo_p; select * from foo_p; i | j ---+--- @@ -572,6 +575,7 @@ create table bar_p(i int, j int) with(appendonly = true) distributed by (i); insert into foo_p values(1, 1), (2, 1), (3, 2); insert into bar_p values(6, 6); alter table foo_p exchange partition for(rank(6)) with table bar_p; +analyze foo_p; select * from foo_p; i | j ---+--- @@ -600,6 +604,7 @@ create table bar_p(i int, j int) with(appendonly = true) distributed by (i); insert into foo_p values(1, 2), (2, 3), (3, 4); insert into bar_p values(6, 6); alter table foo_p exchange partition for(rank(6)) with table bar_p; +analyze foo_p; select * from foo_p; i | j ---+--- @@ -627,6 +632,7 @@ NOTICE: CREATE TABLE will create partition "foo_p_1_prt_9" for table "foo_p" create table bar_p(i int, j int) distributed by (i); insert into bar_p values(6, 6); alter table foo_p exchange partition for(rank(6)) with table bar_p; +analyze foo_p; select * from foo_p; i | j ---+--- @@ -7234,6 +7240,7 @@ select * from part_tab_1_prt_2; -- Right part insert into part_tab_1_prt_3 values(5,5); +analyze part_tab; select * from part_tab; i | j ---+--- @@ -7271,6 +7278,7 @@ insert into input2 select i, i from (select generate_series(1,10) as i) as t; insert into part_tab_1_prt_1 select i1.x, i2.y from input1 as i1 join input2 as i2 on i1.x = i2.x where i2.y = 5; ERROR: trying to insert row into wrong partition (seg0 slarimac:40000 pid=97899) DETAIL: Expected partition: part_tab_1_prt_3, provided partition: part_tab_1_prt_1. 
+analyze part_tab; select * from part_tab; i | j ---+--- @@ -7435,6 +7443,7 @@ select * from deep_part; -- Correct leaf part insert into deep_part_1_prt_male_2_prt_1_3_prt_1 values (1, 1, 1, 'M'); +analyze deep_part; select * from deep_part; i | j | k | s ---+---+---+------- @@ -7535,6 +7544,7 @@ NOTICE: CREATE TABLE will create partition "part_tab_1_prt_4" for table "part_t NOTICE: CREATE TABLE will create partition "part_tab_1_prt_5" for table "part_tab" -- Wrong part insert into part_tab_1_prt_1 values(5,5); +analyze part_tab; select * from part_tab; i | j ---+--- @@ -7600,6 +7610,7 @@ HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sur insert into input2 select i, i from (select generate_series(1,10) as i) as t; -- Multiple range table entries in the plan insert into part_tab_1_prt_1 select i1.x, i2.y from input1 as i1 join input2 as i2 on i1.x = i2.x where i2.y = 5; +analyze part_tab; select * from part_tab; i | j ---+--- @@ -7785,6 +7796,7 @@ select * from deep_part; -- Correct leaf part insert into deep_part_1_prt_male_2_prt_1_3_prt_1 values (1, 1, 1, 'M'); +analyze deep_part; select * from deep_part; i | j | k | s ---+---+---+------- diff --git a/src/test/regress/expected/partition_optimizer.out b/src/test/regress/expected/partition_optimizer.out index de723ba0510f..58bc2acf3829 100755 --- a/src/test/regress/expected/partition_optimizer.out +++ b/src/test/regress/expected/partition_optimizer.out @@ -451,6 +451,7 @@ alter table foo_p exchange partition for(rank(6)) with table bar_p; ERROR: exchange table contains a row which violates the partitioning specification of "foo_p" (seg1 slarimac:40001 pid=97876) alter table foo_p exchange partition for(rank(6)) with table bar_p without validation; +analyze foo_p; select * from foo_p; i ----- @@ -475,6 +476,7 @@ NOTICE: CREATE TABLE will create partition "foo_p_1_prt_9" for table "foo_p" create table bar_p(i int, j int) distributed by (i); insert into bar_p values(6); alter table foo_p exchange partition for(rank(6)) with table bar_p; +analyze foo_p; select * from foo_p; i | j ---+--- @@ -548,6 +550,7 @@ create table bar_p(i int, j int) distributed by (i); insert into foo_p values(1, 1), (2, 1), (3, 1); insert into bar_p values(6, 6); alter table foo_p exchange partition for(rank(6)) with table bar_p; +analyze foo_p; select * from foo_p; i | j ---+--- @@ -576,6 +579,7 @@ create table bar_p(i int, j int) with(appendonly = true) distributed by (i); insert into foo_p values(1, 1), (2, 1), (3, 2); insert into bar_p values(6, 6); alter table foo_p exchange partition for(rank(6)) with table bar_p; +analyze foo_p; select * from foo_p; i | j ---+--- @@ -604,6 +608,7 @@ create table bar_p(i int, j int) with(appendonly = true) distributed by (i); insert into foo_p values(1, 2), (2, 3), (3, 4); insert into bar_p values(6, 6); alter table foo_p exchange partition for(rank(6)) with table bar_p; +analyze foo_p; select * from foo_p; i | j ---+--- @@ -631,6 +636,7 @@ NOTICE: CREATE TABLE will create partition "foo_p_1_prt_9" for table "foo_p" create table bar_p(i int, j int) distributed by (i); insert into bar_p values(6, 6); alter table foo_p exchange partition for(rank(6)) with table bar_p; +analyze foo_p; select * from foo_p; i | j ---+--- @@ -7242,6 +7248,7 @@ select * from part_tab_1_prt_2; -- Right part insert into part_tab_1_prt_3 values(5,5); +analyze part_tab; select * from part_tab; i | j ---+--- @@ -7279,6 +7286,7 @@ insert into input2 select i, i from (select generate_series(1,10) as i) as t; insert into 
part_tab_1_prt_1 select i1.x, i2.y from input1 as i1 join input2 as i2 on i1.x = i2.x where i2.y = 5; ERROR: trying to insert row into wrong partition (seg0 slarimac:40000 pid=97899) DETAIL: Expected partition: part_tab_1_prt_3, provided partition: part_tab_1_prt_1. +analyze part_tab; select * from part_tab; i | j ---+--- @@ -7443,6 +7451,7 @@ select * from deep_part; -- Correct leaf part insert into deep_part_1_prt_male_2_prt_1_3_prt_1 values (1, 1, 1, 'M'); +analyze deep_part; select * from deep_part; i | j | k | s ---+---+---+------- @@ -7543,6 +7552,7 @@ NOTICE: CREATE TABLE will create partition "part_tab_1_prt_4" for table "part_t NOTICE: CREATE TABLE will create partition "part_tab_1_prt_5" for table "part_tab" -- Wrong part insert into part_tab_1_prt_1 values(5,5); +analyze part_tab; select * from part_tab; i | j ---+--- @@ -7608,6 +7618,7 @@ HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sur insert into input2 select i, i from (select generate_series(1,10) as i) as t; -- Multiple range table entries in the plan insert into part_tab_1_prt_1 select i1.x, i2.y from input1 as i1 join input2 as i2 on i1.x = i2.x where i2.y = 5; +analyze part_tab; select * from part_tab; i | j ---+--- @@ -7793,6 +7804,7 @@ select * from deep_part; -- Correct leaf part insert into deep_part_1_prt_male_2_prt_1_3_prt_1 values (1, 1, 1, 'M'); +analyze deep_part; select * from deep_part; i | j | k | s ---+---+---+------- diff --git a/src/test/regress/expected/partition_pruning.out b/src/test/regress/expected/partition_pruning.out index 73cb66482973..2f52540897ab 100644 --- a/src/test/regress/expected/partition_pruning.out +++ b/src/test/regress/expected/partition_pruning.out @@ -3178,6 +3178,7 @@ ALTER TABLE sales ALTER PARTITION FOR (RANK(1)) EXCHANGE PARTITION FOR ('usa') WITH TABLE sales_exchange_part ; NOTICE: exchanged partition "usa" of partition for rank 1 of relation "sales" with relation "sales_exchange_part" +ANALYZE sales; -- TODO: #141973839. Expected 10 parts, currently selecting 15 parts. First level: 4 parts + 1 default. Second level 2 parts. Total 10 parts. 
select get_selected_parts('explain analyze select * from sales where region = ''usa'' or region = ''asia'';'); get_selected_parts @@ -3219,25 +3220,19 @@ NOTICE: building index for child partition "sales_1_prt_5_2_prt_asia" NOTICE: building index for child partition "sales_1_prt_5_2_prt_europe" NOTICE: building index for child partition "sales_1_prt_5_2_prt_other_regions" explain select * from sales where date = '2011-01-01' and region = 'usa'; - QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- - Gather Motion 3:1 (slice1; segments: 3) (cost=100.46..4773.02 rows=5 width=47) - -> Append (cost=100.46..4773.02 rows=2 width=47) - -> Bitmap Heap Scan on sales_1_prt_outlying_dates_2_prt_usa (cost=100.46..1524.29 rows=1 width=54) - Recheck Cond: (date = '01-01-2011'::date) + QUERY PLAN +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + Gather Motion 3:1 (slice1; segments: 3) (cost=0.12..800.58 rows=4 width=45) + -> Append (cost=0.12..800.58 rows=2 width=45) + -> Index Scan using sales_1_prt_outlying_dates_2_prt_usa_date_idx on sales_1_prt_outlying_dates_2_prt_usa (cost=0.12..200.14 rows=1 width=54) + Index Cond: (date = '01-01-2011'::date) Filter: (region = 'usa'::text) - -> Bitmap Index Scan on sales_1_prt_outlying_dates_2_prt_usa_date_idx (cost=0.00..100.46 rows=13 width=0) - Index Cond: (date = '01-01-2011'::date) - -> Bitmap Heap Scan on sales_1_prt_outlying_dates_2_prt_other_regions (cost=100.46..1524.29 rows=1 width=54) - Recheck Cond: (date = '01-01-2011'::date) + -> Index Scan using sales_1_prt_outlying_dates_2_prt_other_regions_date_idx on sales_1_prt_outlying_dates_2_prt_other_regions (cost=0.12..200.14 rows=1 width=54) + Index Cond: (date = '01-01-2011'::date) Filter: (region = 'usa'::text) - -> Bitmap Index Scan on sales_1_prt_outlying_dates_2_prt_other_regions_date_idx (cost=0.00..100.46 rows=13 width=0) - Index Cond: (date = '01-01-2011'::date) - -> Bitmap Heap Scan on sales_1_prt_2_2_prt_other_regions (cost=100.46..1524.29 rows=1 width=54) - Recheck Cond: (date = '01-01-2011'::date) + -> Index Scan using sales_1_prt_2_2_prt_other_regions_date_idx on sales_1_prt_2_2_prt_other_regions (cost=0.12..200.14 rows=1 width=54) + Index Cond: (date = '01-01-2011'::date) Filter: (region = 'usa'::text) - -> Bitmap Index Scan on sales_1_prt_2_2_prt_other_regions_date_idx (cost=0.00..100.46 rows=13 width=0) - Index Cond: (date = '01-01-2011'::date) -> Index Scan using sales_1_prt_2_2_prt_usa_date_idx on sales_1_prt_2_2_prt_usa (cost=0.12..200.14 rows=1 width=19) Index Cond: (date = '01-01-2011'::date) Filter: (region = 'usa'::text) diff --git a/src/test/regress/expected/partition_pruning_optimizer.out b/src/test/regress/expected/partition_pruning_optimizer.out index 2e61a47be484..0feb4f5b04e9 100644 --- a/src/test/regress/expected/partition_pruning_optimizer.out +++ b/src/test/regress/expected/partition_pruning_optimizer.out @@ -2787,6 +2787,7 @@ ALTER TABLE sales ALTER PARTITION FOR (RANK(1)) EXCHANGE PARTITION FOR ('usa') WITH TABLE sales_exchange_part ; NOTICE: exchanged partition "usa" of partition for rank 1 of relation "sales" with relation "sales_exchange_part" +ANALYZE sales; -- TODO: #141973839. Expected 10 parts, currently selecting 15 parts. First level: 4 parts + 1 default. Second level 2 parts. Total 10 parts. 
select get_selected_parts('explain analyze select * from sales where region = ''usa'' or region = ''asia'';'); get_selected_parts diff --git a/src/test/regress/expected/portals_updatable.out b/src/test/regress/expected/portals_updatable.out index 27a9729cbabe..bcb829e06521 100644 --- a/src/test/regress/expected/portals_updatable.out +++ b/src/test/regress/expected/portals_updatable.out @@ -475,6 +475,7 @@ DROP TABLE aopart; CREATE TABLE aopart (LIKE portals_updatable_rank) WITH (appendonly=true) DISTRIBUTED BY (id); INSERT INTO aopart SELECT * FROM portals_updatable_rank_1_prt_11; ALTER TABLE portals_updatable_rank EXCHANGE PARTITION FOR (9) WITH TABLE aopart; +ANALYZE portals_updatable_rank; BEGIN; DECLARE c CURSOR FOR SELECT * FROM portals_updatable_rank WHERE rank = 10; -- isolate the remaining heap part FETCH 1 FROM c; diff --git a/src/test/regress/expected/portals_updatable_optimizer.out b/src/test/regress/expected/portals_updatable_optimizer.out index 9f8bbe52029d..ed501d20fc99 100644 --- a/src/test/regress/expected/portals_updatable_optimizer.out +++ b/src/test/regress/expected/portals_updatable_optimizer.out @@ -475,6 +475,7 @@ DROP TABLE aopart; CREATE TABLE aopart (LIKE portals_updatable_rank) WITH (appendonly=true) DISTRIBUTED BY (id); INSERT INTO aopart SELECT * FROM portals_updatable_rank_1_prt_11; ALTER TABLE portals_updatable_rank EXCHANGE PARTITION FOR (9) WITH TABLE aopart; +ANALYZE portals_updatable_rank; BEGIN; DECLARE c CURSOR FOR SELECT * FROM portals_updatable_rank WHERE rank = 10; -- isolate the remaining heap part FETCH 1 FROM c; diff --git a/src/test/regress/expected/qp_dropped_cols.out b/src/test/regress/expected/qp_dropped_cols.out index 5c474566b374..7ff47894c126 100644 --- a/src/test/regress/expected/qp_dropped_cols.out +++ b/src/test/regress/expected/qp_dropped_cols.out @@ -13838,6 +13838,7 @@ SELECT * FROM mpp21090_xchange_pttab_dropcol_addcol_dml_char_candidate ORDER BY -- DML on partition table INSERT INTO mpp21090_xchange_pttab_dropcol_addcol_dml_char SELECT 'a','b', 1, 'a', 'a'; +ANALYZE mpp21090_xchange_pttab_dropcol_addcol_dml_char; SELECT * FROM mpp21090_xchange_pttab_dropcol_addcol_dml_char ORDER BY 1,2,3; col2 | col3 | col4 | col5 | col1 ------+------+------+------+------ @@ -13909,6 +13910,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addc NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addcol_dml_decimal_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_addcol_dml_decimal" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addcol_dml_decim_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_addcol_dml_decimal" INSERT INTO mpp21090_xchange_pttab_dropcol_addcol_dml_decimal VALUES(2.00,2.00,'a',0, 2.00); +ANALYZE mpp21090_xchange_pttab_dropcol_addcol_dml_decimal; SELECT * FROM mpp21090_xchange_pttab_dropcol_addcol_dml_decimal ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 ------+------+------+------+------ @@ -14014,6 +14016,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addc NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addcol_dml_int4_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_addcol_dml_int4" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addcol_dml_int4_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_addcol_dml_int4" INSERT INTO mpp21090_xchange_pttab_dropcol_addcol_dml_int4 VALUES(20000000,20000000,'a',0, 20000000); +ANALYZE 
mpp21090_xchange_pttab_dropcol_addcol_dml_int4; SELECT * FROM mpp21090_xchange_pttab_dropcol_addcol_dml_int4 ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 ----------+----------+------+------+---------- @@ -14119,6 +14122,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addc NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addcol_dml_int8_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_addcol_dml_int8" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addcol_dml_int8_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_addcol_dml_int8" INSERT INTO mpp21090_xchange_pttab_dropcol_addcol_dml_int8 VALUES(200000000000000000,200000000000000000,'a',0, 200000000000000000); +ANALYZE mpp21090_xchange_pttab_dropcol_addcol_dml_int8; SELECT * FROM mpp21090_xchange_pttab_dropcol_addcol_dml_int8 ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 --------------------+--------------------+------+------+-------------------- @@ -14224,6 +14228,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addc NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addcol_dml_interva_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_addcol_dml_interval" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addcol_dml_inter_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_addcol_dml_interval" INSERT INTO mpp21090_xchange_pttab_dropcol_addcol_dml_interval VALUES('10 secs','10 secs','a',0, '10 secs'); +ANALYZE mpp21090_xchange_pttab_dropcol_addcol_dml_interval; SELECT * FROM mpp21090_xchange_pttab_dropcol_addcol_dml_interval ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 ----------+----------+------+------+---------- @@ -14329,6 +14334,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addc NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addcol_dml_numeric_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_addcol_dml_numeric" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addcol_dml_numer_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_addcol_dml_numeric" INSERT INTO mpp21090_xchange_pttab_dropcol_addcol_dml_numeric VALUES(2.000000,2.000000,'a',0, 2.000000); +ANALYZE mpp21090_xchange_pttab_dropcol_addcol_dml_numeric; SELECT * FROM mpp21090_xchange_pttab_dropcol_addcol_dml_numeric ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 ----------+----------+------+------+---------- @@ -14434,6 +14440,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_char_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_dml_char" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_char_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_dml_char" INSERT INTO mpp21090_xchange_pttab_dropcol_dml_char VALUES('g','g','a',0, 'g'); +ANALYZE mpp21090_xchange_pttab_dropcol_dml_char; SELECT * FROM mpp21090_xchange_pttab_dropcol_dml_char ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 ------+------+------+------+------ @@ -14538,6 +14545,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_decimal_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_dml_decimal" NOTICE: CREATE TABLE will create 
partition "mpp21090_xchange_pttab_dropcol_dml_decimal_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_dml_decimal" INSERT INTO mpp21090_xchange_pttab_dropcol_dml_decimal VALUES(2.00,2.00,'a',0, 2.00); +ANALYZE mpp21090_xchange_pttab_dropcol_dml_decimal; SELECT * FROM mpp21090_xchange_pttab_dropcol_dml_decimal ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 ------+------+------+------+------ @@ -14642,6 +14650,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_int4_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_dml_int4" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_int4_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_dml_int4" INSERT INTO mpp21090_xchange_pttab_dropcol_dml_int4 VALUES(20000000,20000000,'a',0, 20000000); +ANALYZE mpp21090_xchange_pttab_dropcol_dml_int4; SELECT * FROM mpp21090_xchange_pttab_dropcol_dml_int4 ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 ----------+----------+------+------+---------- @@ -14746,6 +14755,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_int8_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_dml_int8" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_int8_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_dml_int8" INSERT INTO mpp21090_xchange_pttab_dropcol_dml_int8 VALUES(200000000000000000,200000000000000000,'a',0, 200000000000000000); +ANALYZE mpp21090_xchange_pttab_dropcol_dml_int8; SELECT * FROM mpp21090_xchange_pttab_dropcol_dml_int8 ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 --------------------+--------------------+------+------+-------------------- @@ -14850,6 +14860,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_interval_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_dml_interval" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_interval_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_dml_interval" INSERT INTO mpp21090_xchange_pttab_dropcol_dml_interval VALUES('10 secs','10 secs','a',0, '10 secs'); +ANALYZE mpp21090_xchange_pttab_dropcol_dml_interval; SELECT * FROM mpp21090_xchange_pttab_dropcol_dml_interval ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 ----------+----------+------+------+---------- @@ -14954,6 +14965,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_numeric_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_dml_numeric" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_numeric_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_dml_numeric" INSERT INTO mpp21090_xchange_pttab_dropcol_dml_numeric VALUES(2.000000,2.000000,'a',0, 2.000000); +ANALYZE mpp21090_xchange_pttab_dropcol_dml_numeric; SELECT * FROM mpp21090_xchange_pttab_dropcol_dml_numeric ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 ----------+----------+------+------+---------- @@ -15058,6 +15070,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_dml_char_1_prt_parttwo" for 
table "mpp21090_xchange_pttab_dropcol_idx_dml_char" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_dml_char_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_idx_dml_char" INSERT INTO mpp21090_xchange_pttab_dropcol_idx_dml_char VALUES('g','g','a',0, 'g'); +ANALYZE mpp21090_xchange_pttab_dropcol_idx_dml_char; SELECT * FROM mpp21090_xchange_pttab_dropcol_idx_dml_char ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 ------+------+------+------+------ @@ -15168,6 +15181,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_dml_decimal_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_idx_dml_decimal" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_dml_decimal_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_idx_dml_decimal" INSERT INTO mpp21090_xchange_pttab_dropcol_idx_dml_decimal VALUES(2.00,2.00,'a',0, 2.00); +ANALYZE mpp21090_xchange_pttab_dropcol_idx_dml_decimal; SELECT * FROM mpp21090_xchange_pttab_dropcol_idx_dml_decimal ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 ------+------+------+------+------ @@ -15278,6 +15292,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_dml_int4_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_idx_dml_int4" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_dml_int4_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_idx_dml_int4" INSERT INTO mpp21090_xchange_pttab_dropcol_idx_dml_int4 VALUES(20000000,20000000,'a',0, 20000000); +ANALYZE mpp21090_xchange_pttab_dropcol_idx_dml_int4; SELECT * FROM mpp21090_xchange_pttab_dropcol_idx_dml_int4 ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 ----------+----------+------+------+---------- @@ -15388,6 +15403,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_dml_int8_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_idx_dml_int8" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_dml_int8_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_idx_dml_int8" INSERT INTO mpp21090_xchange_pttab_dropcol_idx_dml_int8 VALUES(200000000000000000,200000000000000000,'a',0, 200000000000000000); +ANALYZE mpp21090_xchange_pttab_dropcol_idx_dml_int8; SELECT * FROM mpp21090_xchange_pttab_dropcol_idx_dml_int8 ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 --------------------+--------------------+------+------+-------------------- @@ -15498,6 +15514,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_dml_interval_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_idx_dml_interval" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_dml_interval_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_idx_dml_interval" INSERT INTO mpp21090_xchange_pttab_dropcol_idx_dml_interval VALUES('10 secs','10 secs','a',0, '10 secs'); +ANALYZE mpp21090_xchange_pttab_dropcol_idx_dml_interval; SELECT * FROM mpp21090_xchange_pttab_dropcol_idx_dml_interval ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 ----------+----------+------+------+---------- @@ -15608,6 +15625,7 @@ NOTICE: CREATE 
TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_dml_numeric_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_idx_dml_numeric" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_dml_numeric_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_idx_dml_numeric" INSERT INTO mpp21090_xchange_pttab_dropcol_idx_dml_numeric VALUES(2.000000,2.000000,'a',0, 2.000000); +ANALYZE mpp21090_xchange_pttab_dropcol_idx_dml_numeric; SELECT * FROM mpp21090_xchange_pttab_dropcol_idx_dml_numeric ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 ----------+----------+------+------+---------- diff --git a/src/test/regress/expected/qp_dropped_cols_optimizer.out b/src/test/regress/expected/qp_dropped_cols_optimizer.out index a300b4ff1820..2ae7c0f9d17a 100644 --- a/src/test/regress/expected/qp_dropped_cols_optimizer.out +++ b/src/test/regress/expected/qp_dropped_cols_optimizer.out @@ -13742,6 +13742,7 @@ SELECT * FROM mpp21090_xchange_pttab_dropcol_addcol_dml_char_candidate ORDER BY -- DML on partition table INSERT INTO mpp21090_xchange_pttab_dropcol_addcol_dml_char SELECT 'a','b', 1, 'a', 'a'; +ANALYZE mpp21090_xchange_pttab_dropcol_addcol_dml_char; SELECT * FROM mpp21090_xchange_pttab_dropcol_addcol_dml_char ORDER BY 1,2,3; col2 | col3 | col4 | col5 | col1 ------+------+------+------+------ @@ -13813,6 +13814,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addc NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addcol_dml_decimal_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_addcol_dml_decimal" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addcol_dml_decim_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_addcol_dml_decimal" INSERT INTO mpp21090_xchange_pttab_dropcol_addcol_dml_decimal VALUES(2.00,2.00,'a',0, 2.00); +ANALYZE mpp21090_xchange_pttab_dropcol_addcol_dml_decimal; SELECT * FROM mpp21090_xchange_pttab_dropcol_addcol_dml_decimal ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 ------+------+------+------+------ @@ -13918,6 +13920,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addc NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addcol_dml_int4_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_addcol_dml_int4" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addcol_dml_int4_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_addcol_dml_int4" INSERT INTO mpp21090_xchange_pttab_dropcol_addcol_dml_int4 VALUES(20000000,20000000,'a',0, 20000000); +ANALYZE mpp21090_xchange_pttab_dropcol_addcol_dml_int4; SELECT * FROM mpp21090_xchange_pttab_dropcol_addcol_dml_int4 ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 ----------+----------+------+------+---------- @@ -14023,6 +14026,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addc NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addcol_dml_int8_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_addcol_dml_int8" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addcol_dml_int8_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_addcol_dml_int8" INSERT INTO mpp21090_xchange_pttab_dropcol_addcol_dml_int8 VALUES(200000000000000000,200000000000000000,'a',0, 200000000000000000); +ANALYZE mpp21090_xchange_pttab_dropcol_addcol_dml_int8; 
SELECT * FROM mpp21090_xchange_pttab_dropcol_addcol_dml_int8 ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 --------------------+--------------------+------+------+-------------------- @@ -14128,6 +14132,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addc NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addcol_dml_interva_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_addcol_dml_interval" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addcol_dml_inter_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_addcol_dml_interval" INSERT INTO mpp21090_xchange_pttab_dropcol_addcol_dml_interval VALUES('10 secs','10 secs','a',0, '10 secs'); +ANALYZE mpp21090_xchange_pttab_dropcol_addcol_dml_interval; SELECT * FROM mpp21090_xchange_pttab_dropcol_addcol_dml_interval ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 ----------+----------+------+------+---------- @@ -14233,6 +14238,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addc NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addcol_dml_numeric_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_addcol_dml_numeric" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_addcol_dml_numer_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_addcol_dml_numeric" INSERT INTO mpp21090_xchange_pttab_dropcol_addcol_dml_numeric VALUES(2.000000,2.000000,'a',0, 2.000000); +ANALYZE mpp21090_xchange_pttab_dropcol_addcol_dml_numeric; SELECT * FROM mpp21090_xchange_pttab_dropcol_addcol_dml_numeric ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 ----------+----------+------+------+---------- @@ -14338,6 +14344,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_char_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_dml_char" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_char_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_dml_char" INSERT INTO mpp21090_xchange_pttab_dropcol_dml_char VALUES('g','g','a',0, 'g'); +ANALYZE mpp21090_xchange_pttab_dropcol_dml_char; SELECT * FROM mpp21090_xchange_pttab_dropcol_dml_char ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 ------+------+------+------+------ @@ -14442,6 +14449,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_decimal_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_dml_decimal" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_decimal_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_dml_decimal" INSERT INTO mpp21090_xchange_pttab_dropcol_dml_decimal VALUES(2.00,2.00,'a',0, 2.00); +ANALYZE mpp21090_xchange_pttab_dropcol_dml_decimal; SELECT * FROM mpp21090_xchange_pttab_dropcol_dml_decimal ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 ------+------+------+------+------ @@ -14546,6 +14554,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_int4_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_dml_int4" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_int4_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_dml_int4" INSERT INTO 
mpp21090_xchange_pttab_dropcol_dml_int4 VALUES(20000000,20000000,'a',0, 20000000); +ANALYZE mpp21090_xchange_pttab_dropcol_dml_int4; SELECT * FROM mpp21090_xchange_pttab_dropcol_dml_int4 ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 ----------+----------+------+------+---------- @@ -14650,6 +14659,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_int8_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_dml_int8" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_int8_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_dml_int8" INSERT INTO mpp21090_xchange_pttab_dropcol_dml_int8 VALUES(200000000000000000,200000000000000000,'a',0, 200000000000000000); +ANALYZE mpp21090_xchange_pttab_dropcol_dml_int8; SELECT * FROM mpp21090_xchange_pttab_dropcol_dml_int8 ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 --------------------+--------------------+------+------+-------------------- @@ -14754,6 +14764,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_interval_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_dml_interval" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_interval_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_dml_interval" INSERT INTO mpp21090_xchange_pttab_dropcol_dml_interval VALUES('10 secs','10 secs','a',0, '10 secs'); +ANALYZE mpp21090_xchange_pttab_dropcol_dml_interval; SELECT * FROM mpp21090_xchange_pttab_dropcol_dml_interval ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 ----------+----------+------+------+---------- @@ -14858,6 +14869,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_numeric_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_dml_numeric" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_dml_numeric_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_dml_numeric" INSERT INTO mpp21090_xchange_pttab_dropcol_dml_numeric VALUES(2.000000,2.000000,'a',0, 2.000000); +ANALYZE mpp21090_xchange_pttab_dropcol_dml_numeric; SELECT * FROM mpp21090_xchange_pttab_dropcol_dml_numeric ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 ----------+----------+------+------+---------- @@ -14962,6 +14974,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_dml_char_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_idx_dml_char" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_dml_char_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_idx_dml_char" INSERT INTO mpp21090_xchange_pttab_dropcol_idx_dml_char VALUES('g','g','a',0, 'g'); +ANALYZE mpp21090_xchange_pttab_dropcol_idx_dml_char; SELECT * FROM mpp21090_xchange_pttab_dropcol_idx_dml_char ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 ------+------+------+------+------ @@ -15072,6 +15085,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_dml_decimal_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_idx_dml_decimal" NOTICE: CREATE TABLE will create partition 
"mpp21090_xchange_pttab_dropcol_idx_dml_decimal_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_idx_dml_decimal" INSERT INTO mpp21090_xchange_pttab_dropcol_idx_dml_decimal VALUES(2.00,2.00,'a',0, 2.00); +ANALYZE mpp21090_xchange_pttab_dropcol_idx_dml_decimal; SELECT * FROM mpp21090_xchange_pttab_dropcol_idx_dml_decimal ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 ------+------+------+------+------ @@ -15182,6 +15196,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_dml_int4_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_idx_dml_int4" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_dml_int4_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_idx_dml_int4" INSERT INTO mpp21090_xchange_pttab_dropcol_idx_dml_int4 VALUES(20000000,20000000,'a',0, 20000000); +ANALYZE mpp21090_xchange_pttab_dropcol_idx_dml_int4; SELECT * FROM mpp21090_xchange_pttab_dropcol_idx_dml_int4 ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 ----------+----------+------+------+---------- @@ -15292,6 +15307,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_dml_int8_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_idx_dml_int8" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_dml_int8_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_idx_dml_int8" INSERT INTO mpp21090_xchange_pttab_dropcol_idx_dml_int8 VALUES(200000000000000000,200000000000000000,'a',0, 200000000000000000); +ANALYZE mpp21090_xchange_pttab_dropcol_idx_dml_int8; SELECT * FROM mpp21090_xchange_pttab_dropcol_idx_dml_int8 ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 --------------------+--------------------+------+------+-------------------- @@ -15402,6 +15418,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_dml_interval_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_idx_dml_interval" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_dml_interval_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_idx_dml_interval" INSERT INTO mpp21090_xchange_pttab_dropcol_idx_dml_interval VALUES('10 secs','10 secs','a',0, '10 secs'); +ANALYZE mpp21090_xchange_pttab_dropcol_idx_dml_interval; SELECT * FROM mpp21090_xchange_pttab_dropcol_idx_dml_interval ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 ----------+----------+------+------+---------- @@ -15512,6 +15529,7 @@ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_ NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_dml_numeric_1_prt_parttwo" for table "mpp21090_xchange_pttab_dropcol_idx_dml_numeric" NOTICE: CREATE TABLE will create partition "mpp21090_xchange_pttab_dropcol_idx_dml_numeric_1_prt_partthree" for table "mpp21090_xchange_pttab_dropcol_idx_dml_numeric" INSERT INTO mpp21090_xchange_pttab_dropcol_idx_dml_numeric VALUES(2.000000,2.000000,'a',0, 2.000000); +ANALYZE mpp21090_xchange_pttab_dropcol_idx_dml_numeric; SELECT * FROM mpp21090_xchange_pttab_dropcol_idx_dml_numeric ORDER BY 1,2,3,4; col1 | col2 | col3 | col4 | col5 ----------+----------+------+------+---------- diff --git a/src/test/regress/input/uao_ddl/alter_ao_part_exch.source 
b/src/test/regress/input/uao_ddl/alter_ao_part_exch.source index 8e58b15878a5..1a555589e5b7 100644 --- a/src/test/regress/input/uao_ddl/alter_ao_part_exch.source +++ b/src/test/regress/input/uao_ddl/alter_ao_part_exch.source @@ -24,6 +24,8 @@ insert into ao_part(col1, col2, col3) values (1, '2008-04-15', 'ao_row'), (1, '2008-04-15', 'heap'), (1, '2008-04-05', 'ao_col'), (1, '2008-05-06', 'ao_row'); +analyze ao_part; + select count(*) FROM pg_appendonly WHERE visimapidxid is not NULL AND visimapidxid is not NULL AND relid in (SELECT c.oid FROM pg_class c inner join pg_namespace n ON c.relnamespace = n.oid and c.relname like diff --git a/src/test/regress/output/uao_ddl/alter_ao_part_exch.source b/src/test/regress/output/uao_ddl/alter_ao_part_exch.source index e3feb1a5684c..e60d4650ed73 100644 --- a/src/test/regress/output/uao_ddl/alter_ao_part_exch.source +++ b/src/test/regress/output/uao_ddl/alter_ao_part_exch.source @@ -47,6 +47,7 @@ insert into ao_part(col1, col2, col3) values (1, '2008-03-04', 'ao_row'), (1, '2008-03-04', 'heap'), (1, '2008-03-04', 'ao_col'), (1, '2008-04-15', 'ao_row'), (1, '2008-04-15', 'heap'), (1, '2008-04-05', 'ao_col'), (1, '2008-05-06', 'ao_row'); +analyze ao_part; select count(*) FROM pg_appendonly WHERE visimapidxid is not NULL AND visimapidxid is not NULL AND relid in (SELECT c.oid FROM pg_class c inner join pg_namespace n ON c.relnamespace = n.oid and c.relname like diff --git a/src/test/regress/sql/AOCO_Compression.sql b/src/test/regress/sql/AOCO_Compression.sql index 4fa6c49f0520..2bdb2e54dd6a 100644 --- a/src/test/regress/sql/AOCO_Compression.sql +++ b/src/test/regress/sql/AOCO_Compression.sql @@ -157,7 +157,7 @@ TRUNCATE table co_crtb_with_strg_dir_and_col_ref_1; -- Insert data again -- insert into co_crtb_with_strg_dir_and_col_ref_1 select * from co_crtb_with_strg_dir_and_col_ref_1_uncompr order by a1; - +analyze co_crtb_with_strg_dir_and_col_ref_1; -- -- Select the data: Using the JOIN as mentioned above -- @@ -216,7 +216,7 @@ CREATE INDEX co_cr_sub_partzlib8192_1_idx_btree ON co_cr_sub_partzlib8192_1(a9); INSERT INTO co_cr_sub_partzlib8192_1(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21,a22,a23,a24,a25,a26,a27,a28,a29,a30,a31,a32,a33,a34,a35,a36,a37,a38,a39,a40,a41,a42) values(generate_series(1,20),'M',2011,'t','a','This is news of today: Deadlock between Republicans and Democrats over how best to reduce the U.S. deficit, and over what period, has blocked an agreement to allow the raising of the $14.3 trillion debt ceiling','2001-12-24 02:26:11','U.S. House of Representatives Speaker John Boehner, the top Republican in Congress who has put forward a deficit reduction plan to be voted on later on Thursday said he had no control over whether his bill would avert a credit downgrade.',generate_series(2490,2505),'2011-10-11','The Republican-controlled House is tentatively scheduled to vote on Boehner proposal this afternoon at around 6 p.m. EDT (2200 GMT). The main Republican vote counter in the House, Kevin McCarthy, would not say if there were enough votes to pass the bill.','WASHINGTON:House Speaker John Boehner says his plan mixing spending cuts in exchange for raising the nations $14.3 trillion debt limit is not perfect but is as large a step that a divided government can take that is doable and signable by President Barack Obama.The Ohio Republican says the measure is an honest and sincere attempt at compromise and was negotiated with Democrats last weekend and that passing it would end the ongoing debt crisis. 
The plan blends $900 billion-plus in spending cuts with a companion increase in the nations borrowing cap.','1234.56',323453,generate_series(3452,3462),7845,'0011','2005-07-16 01:51:15+1359','2001-12-13 01:51:15','((1,2),(0,3),(2,1))','((2,3)(4,5))','08:00:2b:01:02:03','1-2','Republicans had been working throughout the day Thursday to lock down support for their plan to raise the nations debt ceiling, even as Senate Democrats vowed to swiftly kill it if passed.','((2,3)(4,5))','(6,7)',11.222,'((4,5),7)',32,3214,'(1,0,2,3)','2010-02-21',43564,'$1,000.00','192.168.1','126.1.3.4','12:30:45','Johnson & Johnsons McNeil Consumer Healthcare announced the voluntary dosage reduction today. Labels will carry new dosing instructions this fall.The company says it will cut the maximum dosage of Regular Strength Tylenol and other acetaminophen-containing products in 2012.Acetaminophen is safe when used as directed, says Edwin Kuffner, MD, McNeil vice president of over-the-counter medical affairs. But, when too much is taken, it can cause liver damage.The action is intended to cut the risk of such accidental overdoses, the company says in a news release.','1','0',12,23); INSERT INTO co_cr_sub_partzlib8192_1(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21,a22,a23,a24,a25,a26,a27,a28,a29,a30,a31,a32,a33,a34,a35,a36,a37,a38,a39,a40,a41,a42) values(generate_series(500,510),'F',2010,'f','b','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. Do not plead with your child','2001-12-25 02:22:11','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. Do not plead with your child',generate_series(2500,2516),'2011-10-12','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. Do not plead with your child','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. Do not plead with your child The type integer is the usual choice, as it offers the best balance between range, storage size, and performance The type integer is the usual choice, as it offers the best balance between range, storage size, and performanceThe type integer is the usual choice, as it offers the best balance between range, storage size, and performanceThe type integer is the usual choice, as it offers the best balance between range, storage size, and performanceThe type integer ','1134.26',311353,generate_series(3982,3992),7885,'0101','2002-02-12 01:31:14+1344','2003-11-14 01:41:15','((1,1),(0,1),(1,1))','((2,1)(1,5))','08:00:2b:01:01:03','1-3','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. Do not plead with your child The types smallint, integer, and bigint store whole numbers, that is, numbers without fractional components, of various ranges. The types smallint, integer, and bigint store whole numbers, that is, numbers without fractional components, of various ranges. 
Attempts to store values outside of the allowed range will result in an errorThe types smallint, integer, and bigint store whole numbers, that is, numbers without fractional components, of various ranges.','((6,5)(4,2))','(3,6)',12.233,'((5,4),2)',12,3114,'(1,1,0,3)','2010-03-21',43164,'$1,500.00','192.167.2','126.1.1.1','10:30:55','Parents and other family members are always welcome at Stratford. After the first two weeks ofschool','0','1',33,44); - +ANALYZE co_cr_sub_partzlib8192_1; --Create Uncompressed table of same schema definition @@ -521,7 +521,7 @@ CREATE INDEX co_wt_sub_partrle_type8192_1_idx_btree ON co_wt_sub_partrle_type819 INSERT INTO co_wt_sub_partrle_type8192_1(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21,a22,a23,a24,a25,a26,a27,a28,a29,a30,a31,a32,a33,a34,a35,a36,a37,a38,a39,a40,a41,a42) values(generate_series(1,20),'M',2011,'t','a','This is news of today: Deadlock between Republicans and Democrats over how best to reduce the U.S. deficit, and over what period, has blocked an agreement to allow the raising of the $14.3 trillion debt ceiling','2001-12-24 02:26:11','U.S. House of Representatives Speaker John Boehner, the top Republican in Congress who has put forward a deficit reduction plan to be voted on later on Thursday said he had no control over whether his bill would avert a credit downgrade.',generate_series(2490,2505),'2011-10-11','The Republican-controlled House is tentatively scheduled to vote on Boehner proposal this afternoon at around 6 p.m. EDT (2200 GMT). The main Republican vote counter in the House, Kevin McCarthy, would not say if there were enough votes to pass the bill.','WASHINGTON:House Speaker John Boehner says his plan mixing spending cuts in exchange for raising the nations $14.3 trillion debt limit is not perfect but is as large a step that a divided government can take that is doable and signable by President Barack Obama.The Ohio Republican says the measure is an honest and sincere attempt at compromise and was negotiated with Democrats last weekend and that passing it would end the ongoing debt crisis. The plan blends $900 billion-plus in spending cuts with a companion increase in the nations borrowing cap.','1234.56',323453,generate_series(3452,3462),7845,'0011','2005-07-16 01:51:15+1359','2001-12-13 01:51:15','((1,2),(0,3),(2,1))','((2,3)(4,5))','08:00:2b:01:02:03','1-2','Republicans had been working throughout the day Thursday to lock down support for their plan to raise the nations debt ceiling, even as Senate Democrats vowed to swiftly kill it if passed.','((2,3)(4,5))','(6,7)',11.222,'((4,5),7)',32,3214,'(1,0,2,3)','2010-02-21',43564,'$1,000.00','192.168.1','126.1.3.4','12:30:45','Johnson & Johnsons McNeil Consumer Healthcare announced the voluntary dosage reduction today. Labels will carry new dosing instructions this fall.The company says it will cut the maximum dosage of Regular Strength Tylenol and other acetaminophen-containing products in 2012.Acetaminophen is safe when used as directed, says Edwin Kuffner, MD, McNeil vice president of over-the-counter medical affairs. 
But, when too much is taken, it can cause liver damage.The action is intended to cut the risk of such accidental overdoses, the company says in a news release.','1','0',12,23); INSERT INTO co_wt_sub_partrle_type8192_1(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21,a22,a23,a24,a25,a26,a27,a28,a29,a30,a31,a32,a33,a34,a35,a36,a37,a38,a39,a40,a41,a42) values(generate_series(500,510),'F',2010,'f','b','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. Do not plead with your child','2001-12-25 02:22:11','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. Do not plead with your child',generate_series(2500,2516),'2011-10-12','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. Do not plead with your child','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. Do not plead with your child The type integer is the usual choice, as it offers the best balance between range, storage size, and performance The type integer is the usual choice, as it offers the best balance between range, storage size, and performanceThe type integer is the usual choice, as it offers the best balance between range, storage size, and performanceThe type integer is the usual choice, as it offers the best balance between range, storage size, and performanceThe type integer ','1134.26',311353,generate_series(3982,3992),7885,'0101','2002-02-12 01:31:14+1344','2003-11-14 01:41:15','((1,1),(0,1),(1,1))','((2,1)(1,5))','08:00:2b:01:01:03','1-3','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. Do not plead with your child The types smallint, integer, and bigint store whole numbers, that is, numbers without fractional components, of various ranges. The types smallint, integer, and bigint store whole numbers, that is, numbers without fractional components, of various ranges. Attempts to store values outside of the allowed range will result in an errorThe types smallint, integer, and bigint store whole numbers, that is, numbers without fractional components, of various ranges.','((6,5)(4,2))','(3,6)',12.233,'((5,4),2)',12,3114,'(1,1,0,3)','2010-03-21',43164,'$1,500.00','192.167.2','126.1.1.1','10:30:55','Parents and other family members are always welcome at Stratford. After the first two weeks ofschool','0','1',33,44); - +ANALYZE co_wt_sub_partrle_type8192_1; @@ -827,7 +827,7 @@ CREATE INDEX ao_wt_sub_partzlib8192_5_idx_btree ON ao_wt_sub_partzlib8192_5(a9); INSERT INTO ao_wt_sub_partzlib8192_5(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21,a22,a23,a24,a25,a26,a27,a28,a29,a30,a31,a32,a33,a34,a35,a36,a37,a38,a39,a40,a41,a42) values(generate_series(1,20),'M',2011,'t','a','This is news of today: Deadlock between Republicans and Democrats over how best to reduce the U.S. deficit, and over what period, has blocked an agreement to allow the raising of the $14.3 trillion debt ceiling','2001-12-24 02:26:11','U.S. 
House of Representatives Speaker John Boehner, the top Republican in Congress who has put forward a deficit reduction plan to be voted on later on Thursday said he had no control over whether his bill would avert a credit downgrade.',generate_series(2490,2505),'2011-10-11','The Republican-controlled House is tentatively scheduled to vote on Boehner proposal this afternoon at around 6 p.m. EDT (2200 GMT). The main Republican vote counter in the House, Kevin McCarthy, would not say if there were enough votes to pass the bill.','WASHINGTON:House Speaker John Boehner says his plan mixing spending cuts in exchange for raising the nations $14.3 trillion debt limit is not perfect but is as large a step that a divided government can take that is doable and signable by President Barack Obama.The Ohio Republican says the measure is an honest and sincere attempt at compromise and was negotiated with Democrats last weekend and that passing it would end the ongoing debt crisis. The plan blends $900 billion-plus in spending cuts with a companion increase in the nations borrowing cap.','1234.56',323453,generate_series(3452,3462),7845,'0011','2005-07-16 01:51:15+1359','2001-12-13 01:51:15','((1,2),(0,3),(2,1))','((2,3)(4,5))','08:00:2b:01:02:03','1-2','Republicans had been working throughout the day Thursday to lock down support for their plan to raise the nations debt ceiling, even as Senate Democrats vowed to swiftly kill it if passed.','((2,3)(4,5))','(6,7)',11.222,'((4,5),7)',32,3214,'(1,0,2,3)','2010-02-21',43564,'$1,000.00','192.168.1','126.1.3.4','12:30:45','Johnson & Johnsons McNeil Consumer Healthcare announced the voluntary dosage reduction today. Labels will carry new dosing instructions this fall.The company says it will cut the maximum dosage of Regular Strength Tylenol and other acetaminophen-containing products in 2012.Acetaminophen is safe when used as directed, says Edwin Kuffner, MD, McNeil vice president of over-the-counter medical affairs. But, when too much is taken, it can cause liver damage.The action is intended to cut the risk of such accidental overdoses, the company says in a news release.','1','0',12,23); INSERT INTO ao_wt_sub_partzlib8192_5(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21,a22,a23,a24,a25,a26,a27,a28,a29,a30,a31,a32,a33,a34,a35,a36,a37,a38,a39,a40,a41,a42) values(generate_series(500,510),'F',2010,'f','b','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. Do not plead with your child','2001-12-25 02:22:11','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. Do not plead with your child',generate_series(2500,2516),'2011-10-12','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. Do not plead with your child','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. 
Do not plead with your child The type integer is the usual choice, as it offers the best balance between range, storage size, and performance The type integer is the usual choice, as it offers the best balance between range, storage size, and performanceThe type integer is the usual choice, as it offers the best balance between range, storage size, and performanceThe type integer is the usual choice, as it offers the best balance between range, storage size, and performanceThe type integer ','1134.26',311353,generate_series(3982,3992),7885,'0101','2002-02-12 01:31:14+1344','2003-11-14 01:41:15','((1,1),(0,1),(1,1))','((2,1)(1,5))','08:00:2b:01:01:03','1-3','Some students may need time to adjust to school.For most children, the adjustment is quick. Tears will usually disappear after Mommy and Daddy leave the classroom. Do not plead with your child The types smallint, integer, and bigint store whole numbers, that is, numbers without fractional components, of various ranges. The types smallint, integer, and bigint store whole numbers, that is, numbers without fractional components, of various ranges. Attempts to store values outside of the allowed range will result in an errorThe types smallint, integer, and bigint store whole numbers, that is, numbers without fractional components, of various ranges.','((6,5)(4,2))','(3,6)',12.233,'((5,4),2)',12,3114,'(1,1,0,3)','2010-03-21',43164,'$1,500.00','192.167.2','126.1.1.1','10:30:55','Parents and other family members are always welcome at Stratford. After the first two weeks ofschool','0','1',33,44); - +ANALYZE ao_wt_sub_partzlib8192_5; --Create Uncompressed table of same schema definition diff --git a/src/test/regress/sql/alter_table_ao.sql b/src/test/regress/sql/alter_table_ao.sql index e7fc600531e8..39f3620ee629 100644 --- a/src/test/regress/sql/alter_table_ao.sql +++ b/src/test/regress/sql/alter_table_ao.sql @@ -262,6 +262,8 @@ insert into testbug_char5 (timest,user_id,to_be_drop) select '201203',1111,'1000 insert into testbug_char5 (timest,user_id,to_be_drop) select '201204',1111,'10000'; insert into testbug_char5 (timest,user_id,to_be_drop) select '201205',1111,'10000'; +analyze testbug_char5; + select * from testbug_char5 order by 1,2; ALTER TABLE testbug_char5 drop column to_be_drop; diff --git a/src/test/regress/sql/gporca.sql b/src/test/regress/sql/gporca.sql index 858bd2ecaa83..501303374538 100644 --- a/src/test/regress/sql/gporca.sql +++ b/src/test/regress/sql/gporca.sql @@ -2354,6 +2354,21 @@ select count(*) as expect_20 from gpexp_hash h join gpexp_repl r on h.a=r.a; explain select count(*) as expect_20 from noexp_hash h join gpexp_repl r on h.a=r.a; select count(*) as expect_20 from noexp_hash h join gpexp_repl r on h.a=r.a; +create table part1(a int, b int) partition by range(b) (start(1) end(5) every(1)); +create table part2(a int, b int) partition by range(b) (start(1) end(5) every(1)); +insert into part1 select i, (i % 2) + 1 from generate_series(1, 1000) i; +insert into part2 select i, (i % 2) + 1 from generate_series(1, 100) i; +-- make sure some child partitions have not been analyzed. This just means that +-- stats are missing for some child partition but not necessarily that the relation +-- is empty. So we should not flag this as an empty relation +analyze part1_1_prt_1; +analyze part1_1_prt_2; +analyze part2_1_prt_1; +analyze part2_1_prt_2; +-- the plan should contain a 2 stage limit. If we incorrectly estimate that the +-- relation is empty, we would end up choosing a single stage limit. 
+explain select * from part1, part2 where part1.b = part2.b limit 5; + -- start_ignore DROP SCHEMA orca CASCADE; -- end_ignore diff --git a/src/test/regress/sql/oid_consistency.sql b/src/test/regress/sql/oid_consistency.sql index ae48e129a0d2..935b35cb899b 100644 --- a/src/test/regress/sql/oid_consistency.sql +++ b/src/test/regress/sql/oid_consistency.sql @@ -141,7 +141,7 @@ INSERT INTO constraint_pt1 SELECT i, '2008-01-13', i FROM generate_series(1,5)i; INSERT INTO constraint_pt1 SELECT i, '2008-02-13', i FROM generate_series(1,5)i; INSERT INTO constraint_pt1 SELECT i, '2008-03-13', i FROM generate_series(1,5)i; INSERT INTO constraint_t1 SELECT i, '2008-02-02', i FROM generate_series(11,15)i; - +ANALYZE constraint_pt1; ALTER TABLE constraint_pt1 EXCHANGE PARTITION Feb08 WITH TABLE constraint_t1; select verify('constraint_pt1_1_prt_feb08'); @@ -180,6 +180,8 @@ ALTER TABLE constraint_pt2 EXCHANGE PARTITION Feb08 WITH TABLE constraint_t2, SPLIT PARTITION FOR ('2008-01-01') AT ('2008-01-16') INTO (PARTITION jan08_15, PARTITION jan08_31); +ANALYZE constraint_pt2; + select verify('constraint_pt2_1_prt_feb08'); select verify('constraint_t2'); diff --git a/src/test/regress/sql/partition.sql b/src/test/regress/sql/partition.sql index 6e01a4a3a77c..a4468ec4bd89 100644 --- a/src/test/regress/sql/partition.sql +++ b/src/test/regress/sql/partition.sql @@ -251,6 +251,7 @@ insert into bar_p values(100); alter table foo_p exchange partition for(rank(6)) with table bar_p; alter table foo_p exchange partition for(rank(6)) with table bar_p without validation; +analyze foo_p; select * from foo_p; drop table foo_p, bar_p; @@ -262,6 +263,7 @@ create table bar_p(i int, j int) distributed by (i); insert into bar_p values(6); alter table foo_p exchange partition for(rank(6)) with table bar_p; +analyze foo_p; select * from foo_p; select * from bar_p; -- test that we got the dependencies right @@ -295,6 +297,7 @@ create table bar_p(i int, j int) distributed by (i); insert into foo_p values(1, 1), (2, 1), (3, 1); insert into bar_p values(6, 6); alter table foo_p exchange partition for(rank(6)) with table bar_p; +analyze foo_p; select * from foo_p; drop table bar_p; drop table foo_p; @@ -308,6 +311,7 @@ create table bar_p(i int, j int) with(appendonly = true) distributed by (i); insert into foo_p values(1, 1), (2, 1), (3, 2); insert into bar_p values(6, 6); alter table foo_p exchange partition for(rank(6)) with table bar_p; +analyze foo_p; select * from foo_p; drop table bar_p; drop table foo_p; @@ -321,6 +325,7 @@ create table bar_p(i int, j int) with(appendonly = true) distributed by (i); insert into foo_p values(1, 2), (2, 3), (3, 4); insert into bar_p values(6, 6); alter table foo_p exchange partition for(rank(6)) with table bar_p; +analyze foo_p; select * from foo_p; drop table bar_p; drop table foo_p; @@ -333,6 +338,7 @@ create table bar_p(i int, j int) distributed by (i); insert into bar_p values(6, 6); alter table foo_p exchange partition for(rank(6)) with table bar_p; +analyze foo_p; select * from foo_p; select * from bar_p; @@ -3589,6 +3595,7 @@ select * from part_tab_1_prt_2; -- Right part insert into part_tab_1_prt_3 values(5,5); +analyze part_tab; select * from part_tab; select * from part_tab_1_prt_3; @@ -3605,6 +3612,7 @@ insert into input2 select i, i from (select generate_series(1,10) as i) as t; -- Multiple range table entries in the plan insert into part_tab_1_prt_1 select i1.x, i2.y from input1 as i1 join input2 as i2 on i1.x = i2.x where i2.y = 5; +analyze part_tab; select * from part_tab; select 
* from part_tab_1_prt_1; @@ -3643,6 +3651,7 @@ select * from deep_part; -- Correct leaf part insert into deep_part_1_prt_male_2_prt_1_3_prt_1 values (1, 1, 1, 'M'); +analyze deep_part; select * from deep_part; select * from deep_part_1_prt_male_2_prt_1_3_prt_1; @@ -3676,6 +3685,7 @@ drop table if exists part_tab; create table part_tab ( i int, j int) distributed by (i) partition by range(j) (start(0) end(10) every(2)); -- Wrong part insert into part_tab_1_prt_1 values(5,5); +analyze part_tab; select * from part_tab; select * from part_tab_1_prt_1; @@ -3701,6 +3711,7 @@ insert into input2 select i, i from (select generate_series(1,10) as i) as t; -- Multiple range table entries in the plan insert into part_tab_1_prt_1 select i1.x, i2.y from input1 as i1 join input2 as i2 on i1.x = i2.x where i2.y = 5; +analyze part_tab; select * from part_tab; select * from part_tab_1_prt_1; @@ -3740,6 +3751,7 @@ select * from deep_part; -- Correct leaf part insert into deep_part_1_prt_male_2_prt_1_3_prt_1 values (1, 1, 1, 'M'); +analyze deep_part; select * from deep_part; select * from deep_part_1_prt_male_2_prt_1_3_prt_1; diff --git a/src/test/regress/sql/partition_pruning.sql b/src/test/regress/sql/partition_pruning.sql index 397af63453c2..1bb2f2010c58 100644 --- a/src/test/regress/sql/partition_pruning.sql +++ b/src/test/regress/sql/partition_pruning.sql @@ -825,6 +825,7 @@ insert into sales_exchange_part values(1, '2011-01-01', 10.1, 'usa'); ALTER TABLE sales ALTER PARTITION FOR (RANK(1)) EXCHANGE PARTITION FOR ('usa') WITH TABLE sales_exchange_part ; +ANALYZE sales; -- TODO: #141973839. Expected 10 parts, currently selecting 15 parts. First level: 4 parts + 1 default. Second level 2 parts. Total 10 parts. select get_selected_parts('explain analyze select * from sales where region = ''usa'' or region = ''asia'';'); diff --git a/src/test/regress/sql/portals_updatable.sql b/src/test/regress/sql/portals_updatable.sql index d7cd1b525801..7849900de4e5 100644 --- a/src/test/regress/sql/portals_updatable.sql +++ b/src/test/regress/sql/portals_updatable.sql @@ -243,6 +243,7 @@ DROP TABLE aopart; CREATE TABLE aopart (LIKE portals_updatable_rank) WITH (appendonly=true) DISTRIBUTED BY (id); INSERT INTO aopart SELECT * FROM portals_updatable_rank_1_prt_11; ALTER TABLE portals_updatable_rank EXCHANGE PARTITION FOR (9) WITH TABLE aopart; +ANALYZE portals_updatable_rank; BEGIN; DECLARE c CURSOR FOR SELECT * FROM portals_updatable_rank WHERE rank = 10; -- isolate the remaining heap part FETCH 1 FROM c; diff --git a/src/test/regress/sql/qp_dropped_cols.sql b/src/test/regress/sql/qp_dropped_cols.sql index 8610a107afe3..7a8b93193703 100644 --- a/src/test/regress/sql/qp_dropped_cols.sql +++ b/src/test/regress/sql/qp_dropped_cols.sql @@ -7448,6 +7448,7 @@ SELECT * FROM mpp21090_xchange_pttab_dropcol_addcol_dml_char_candidate ORDER BY -- DML on partition table INSERT INTO mpp21090_xchange_pttab_dropcol_addcol_dml_char SELECT 'a','b', 1, 'a', 'a'; +ANALYZE mpp21090_xchange_pttab_dropcol_addcol_dml_char; SELECT * FROM mpp21090_xchange_pttab_dropcol_addcol_dml_char ORDER BY 1,2,3; UPDATE mpp21090_xchange_pttab_dropcol_addcol_dml_char SET col5 = 'z' WHERE col2 = 'a' AND col5 = 'a'; @@ -7483,6 +7484,7 @@ DISTRIBUTED by (col1) PARTITION BY RANGE(col2)(partition partone start(1.00) end(10.00) WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=column),partition parttwo start(10.00) end(20.00) WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=row),partition partthree start(20.00) end(30.00)); INSERT INTO 
mpp21090_xchange_pttab_dropcol_addcol_dml_decimal VALUES(2.00,2.00,'a',0, 2.00); +ANALYZE mpp21090_xchange_pttab_dropcol_addcol_dml_decimal; SELECT * FROM mpp21090_xchange_pttab_dropcol_addcol_dml_decimal ORDER BY 1,2,3,4; ALTER TABLE mpp21090_xchange_pttab_dropcol_addcol_dml_decimal DROP COLUMN col1; @@ -7536,6 +7538,7 @@ DISTRIBUTED by (col1) PARTITION BY RANGE(col2)(partition partone start(1) end(100000001) WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=column),partition parttwo start(100000001) end(200000001) WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=row),partition partthree start(200000001) end(300000001)); INSERT INTO mpp21090_xchange_pttab_dropcol_addcol_dml_int4 VALUES(20000000,20000000,'a',0, 20000000); +ANALYZE mpp21090_xchange_pttab_dropcol_addcol_dml_int4; SELECT * FROM mpp21090_xchange_pttab_dropcol_addcol_dml_int4 ORDER BY 1,2,3,4; ALTER TABLE mpp21090_xchange_pttab_dropcol_addcol_dml_int4 DROP COLUMN col1; @@ -7589,6 +7592,7 @@ DISTRIBUTED by (col1) PARTITION BY RANGE(col2)(partition partone start(1) end(1000000000000000001) WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=column),partition parttwo start(1000000000000000001) end(2000000000000000001) WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=row),partition partthree start(2000000000000000001) end(3000000000000000001)); INSERT INTO mpp21090_xchange_pttab_dropcol_addcol_dml_int8 VALUES(200000000000000000,200000000000000000,'a',0, 200000000000000000); +ANALYZE mpp21090_xchange_pttab_dropcol_addcol_dml_int8; SELECT * FROM mpp21090_xchange_pttab_dropcol_addcol_dml_int8 ORDER BY 1,2,3,4; ALTER TABLE mpp21090_xchange_pttab_dropcol_addcol_dml_int8 DROP COLUMN col1; @@ -7642,6 +7646,7 @@ DISTRIBUTED by (col1) PARTITION BY RANGE(col2)(partition partone start('1 sec') end('1 min') WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=column),partition parttwo start('1 min') end('1 hour') WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=row),partition partthree start('1 hour') end('12 hours')); INSERT INTO mpp21090_xchange_pttab_dropcol_addcol_dml_interval VALUES('10 secs','10 secs','a',0, '10 secs'); +ANALYZE mpp21090_xchange_pttab_dropcol_addcol_dml_interval; SELECT * FROM mpp21090_xchange_pttab_dropcol_addcol_dml_interval ORDER BY 1,2,3,4; ALTER TABLE mpp21090_xchange_pttab_dropcol_addcol_dml_interval DROP COLUMN col1; @@ -7695,6 +7700,7 @@ DISTRIBUTED by (col1) PARTITION BY RANGE(col2)(partition partone start(1.000000) end(10.000000) WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=column),partition parttwo start(10.000000) end(20.000000) WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=row),partition partthree start(20.000000) end(30.000000)); INSERT INTO mpp21090_xchange_pttab_dropcol_addcol_dml_numeric VALUES(2.000000,2.000000,'a',0, 2.000000); +ANALYZE mpp21090_xchange_pttab_dropcol_addcol_dml_numeric; SELECT * FROM mpp21090_xchange_pttab_dropcol_addcol_dml_numeric ORDER BY 1,2,3,4; ALTER TABLE mpp21090_xchange_pttab_dropcol_addcol_dml_numeric DROP COLUMN col1; @@ -7748,6 +7754,7 @@ DISTRIBUTED by (col1) PARTITION BY LIST(col2)(partition partone VALUES('a','b','c','d','e','f','g','h') WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=column),partition parttwo VALUES('i','j','k','l','m','n','o','p') WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=row),partition partthree VALUES('q','r','s','t','u','v','w','x')); INSERT INTO mpp21090_xchange_pttab_dropcol_dml_char VALUES('g','g','a',0, 'g'); +ANALYZE mpp21090_xchange_pttab_dropcol_dml_char; SELECT * FROM 
mpp21090_xchange_pttab_dropcol_dml_char ORDER BY 1,2,3,4; ALTER TABLE mpp21090_xchange_pttab_dropcol_dml_char DROP COLUMN col1; @@ -7800,6 +7807,7 @@ DISTRIBUTED by (col1) PARTITION BY RANGE(col2)(partition partone start(1.00) end(10.00) WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=column),partition parttwo start(10.00) end(20.00) WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=row),partition partthree start(20.00) end(30.00)); INSERT INTO mpp21090_xchange_pttab_dropcol_dml_decimal VALUES(2.00,2.00,'a',0, 2.00); +ANALYZE mpp21090_xchange_pttab_dropcol_dml_decimal; SELECT * FROM mpp21090_xchange_pttab_dropcol_dml_decimal ORDER BY 1,2,3,4; ALTER TABLE mpp21090_xchange_pttab_dropcol_dml_decimal DROP COLUMN col1; @@ -7852,6 +7860,7 @@ DISTRIBUTED by (col1) PARTITION BY RANGE(col2)(partition partone start(1) end(100000001) WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=column),partition parttwo start(100000001) end(200000001) WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=row),partition partthree start(200000001) end(300000001)); INSERT INTO mpp21090_xchange_pttab_dropcol_dml_int4 VALUES(20000000,20000000,'a',0, 20000000); +ANALYZE mpp21090_xchange_pttab_dropcol_dml_int4; SELECT * FROM mpp21090_xchange_pttab_dropcol_dml_int4 ORDER BY 1,2,3,4; ALTER TABLE mpp21090_xchange_pttab_dropcol_dml_int4 DROP COLUMN col1; @@ -7904,6 +7913,7 @@ DISTRIBUTED by (col1) PARTITION BY RANGE(col2)(partition partone start(1) end(1000000000000000001) WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=column),partition parttwo start(1000000000000000001) end(2000000000000000001) WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=row),partition partthree start(2000000000000000001) end(3000000000000000001)); INSERT INTO mpp21090_xchange_pttab_dropcol_dml_int8 VALUES(200000000000000000,200000000000000000,'a',0, 200000000000000000); +ANALYZE mpp21090_xchange_pttab_dropcol_dml_int8; SELECT * FROM mpp21090_xchange_pttab_dropcol_dml_int8 ORDER BY 1,2,3,4; ALTER TABLE mpp21090_xchange_pttab_dropcol_dml_int8 DROP COLUMN col1; @@ -7956,6 +7966,7 @@ DISTRIBUTED by (col1) PARTITION BY RANGE(col2)(partition partone start('1 sec') end('1 min') WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=column),partition parttwo start('1 min') end('1 hour') WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=row),partition partthree start('1 hour') end('12 hours')); INSERT INTO mpp21090_xchange_pttab_dropcol_dml_interval VALUES('10 secs','10 secs','a',0, '10 secs'); +ANALYZE mpp21090_xchange_pttab_dropcol_dml_interval; SELECT * FROM mpp21090_xchange_pttab_dropcol_dml_interval ORDER BY 1,2,3,4; ALTER TABLE mpp21090_xchange_pttab_dropcol_dml_interval DROP COLUMN col1; @@ -8008,6 +8019,7 @@ DISTRIBUTED by (col1) PARTITION BY RANGE(col2)(partition partone start(1.000000) end(10.000000) WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=column),partition parttwo start(10.000000) end(20.000000) WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=row),partition partthree start(20.000000) end(30.000000)); INSERT INTO mpp21090_xchange_pttab_dropcol_dml_numeric VALUES(2.000000,2.000000,'a',0, 2.000000); +ANALYZE mpp21090_xchange_pttab_dropcol_dml_numeric; SELECT * FROM mpp21090_xchange_pttab_dropcol_dml_numeric ORDER BY 1,2,3,4; ALTER TABLE mpp21090_xchange_pttab_dropcol_dml_numeric DROP COLUMN col1; @@ -8060,6 +8072,7 @@ DISTRIBUTED by (col1) PARTITION BY LIST(col2)(partition partone VALUES('a','b','c','d','e','f','g','h') WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=column),partition parttwo 
VALUES('i','j','k','l','m','n','o','p') WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=row),partition partthree VALUES('q','r','s','t','u','v','w','x')); INSERT INTO mpp21090_xchange_pttab_dropcol_idx_dml_char VALUES('g','g','a',0, 'g'); +ANALYZE mpp21090_xchange_pttab_dropcol_idx_dml_char; SELECT * FROM mpp21090_xchange_pttab_dropcol_idx_dml_char ORDER BY 1,2,3,4; DROP INDEX IF EXISTS mpp21090_xchange_pttab_dropcol_idx_dml_idx_char; @@ -8115,6 +8128,7 @@ DISTRIBUTED by (col1) PARTITION BY RANGE(col2)(partition partone start(1.00) end(10.00) WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=column),partition parttwo start(10.00) end(20.00) WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=row),partition partthree start(20.00) end(30.00)); INSERT INTO mpp21090_xchange_pttab_dropcol_idx_dml_decimal VALUES(2.00,2.00,'a',0, 2.00); +ANALYZE mpp21090_xchange_pttab_dropcol_idx_dml_decimal; SELECT * FROM mpp21090_xchange_pttab_dropcol_idx_dml_decimal ORDER BY 1,2,3,4; DROP INDEX IF EXISTS mpp21090_xchange_pttab_dropcol_idx_dml_idx_decimal; @@ -8170,6 +8184,7 @@ DISTRIBUTED by (col1) PARTITION BY RANGE(col2)(partition partone start(1) end(100000001) WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=column),partition parttwo start(100000001) end(200000001) WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=row),partition partthree start(200000001) end(300000001)); INSERT INTO mpp21090_xchange_pttab_dropcol_idx_dml_int4 VALUES(20000000,20000000,'a',0, 20000000); +ANALYZE mpp21090_xchange_pttab_dropcol_idx_dml_int4; SELECT * FROM mpp21090_xchange_pttab_dropcol_idx_dml_int4 ORDER BY 1,2,3,4; DROP INDEX IF EXISTS mpp21090_xchange_pttab_dropcol_idx_dml_idx_int4; @@ -8225,6 +8240,7 @@ DISTRIBUTED by (col1) PARTITION BY RANGE(col2)(partition partone start(1) end(1000000000000000001) WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=column),partition parttwo start(1000000000000000001) end(2000000000000000001) WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=row),partition partthree start(2000000000000000001) end(3000000000000000001)); INSERT INTO mpp21090_xchange_pttab_dropcol_idx_dml_int8 VALUES(200000000000000000,200000000000000000,'a',0, 200000000000000000); +ANALYZE mpp21090_xchange_pttab_dropcol_idx_dml_int8; SELECT * FROM mpp21090_xchange_pttab_dropcol_idx_dml_int8 ORDER BY 1,2,3,4; DROP INDEX IF EXISTS mpp21090_xchange_pttab_dropcol_idx_dml_idx_int8; @@ -8280,6 +8296,7 @@ DISTRIBUTED by (col1) PARTITION BY RANGE(col2)(partition partone start('1 sec') end('1 min') WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=column),partition parttwo start('1 min') end('1 hour') WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=row),partition partthree start('1 hour') end('12 hours')); INSERT INTO mpp21090_xchange_pttab_dropcol_idx_dml_interval VALUES('10 secs','10 secs','a',0, '10 secs'); +ANALYZE mpp21090_xchange_pttab_dropcol_idx_dml_interval; SELECT * FROM mpp21090_xchange_pttab_dropcol_idx_dml_interval ORDER BY 1,2,3,4; DROP INDEX IF EXISTS mpp21090_xchange_pttab_dropcol_idx_dml_idx_interval; @@ -8335,6 +8352,7 @@ DISTRIBUTED by (col1) PARTITION BY RANGE(col2)(partition partone start(1.000000) end(10.000000) WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=column),partition parttwo start(10.000000) end(20.000000) WITH (APPENDONLY=true, COMPRESSLEVEL=5, ORIENTATION=row),partition partthree start(20.000000) end(30.000000)); INSERT INTO mpp21090_xchange_pttab_dropcol_idx_dml_numeric VALUES(2.000000,2.000000,'a',0, 2.000000); +ANALYZE mpp21090_xchange_pttab_dropcol_idx_dml_numeric; SELECT 
* FROM mpp21090_xchange_pttab_dropcol_idx_dml_numeric ORDER BY 1,2,3,4; DROP INDEX IF EXISTS mpp21090_xchange_pttab_dropcol_idx_dml_idx_numeric; From 48226510891a02b74378d7fb754bc8f3f78005f9 Mon Sep 17 00:00:00 2001 From: Ning Yu Date: Tue, 18 Feb 2020 17:29:52 +0800 Subject: [PATCH 065/102] replication/basebackup: use hash table for the exclude list The replication basebackup command supports excluding path names with the EXCLUDE clause, the excluding list used to be stored internally as a string list and be matched one by one, this can be very inefficient when the excluding list is large. Now we store the path names to exclude in a hash table for better performance. (cherry picked from commit b6236a50e5f5c15a97b4b68a063be88b72164ad3) --- src/backend/replication/basebackup.c | 101 ++++++++++++++++++++++----- 1 file changed, 84 insertions(+), 17 deletions(-) diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index e592e427e666..6ecb6b1029c6 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -19,6 +19,7 @@ #include "miscadmin.h" #include "access/genam.h" +#include "access/hash.h" #include "access/xact.h" #include "access/xlog_internal.h" /* for pg_start/stop_backup */ #include "cdb/cdbvars.h" @@ -61,12 +62,12 @@ typedef struct bool nowait; bool includewal; uint32 maxrate; - List *exclude; + HTAB *exclude; } basebackup_options; -static bool match_exclude_list(char *path, List *exclude); -static int64 sendDir(char *path, int basepathlen, bool sizeonly, List *tablespaces, List *exclude); +static bool match_exclude_list(char *path, HTAB *exclude); +static int64 sendDir(char *path, int basepathlen, bool sizeonly, List *tablespaces, HTAB *exclude); static int64 sendTablespace(char *path, bool sizeonly); static bool sendFile(char *readfilename, char *tarfilename, struct stat * statbuf, bool missing_ok); @@ -673,6 +674,35 @@ compareWalFileNames(const void *a, const void *b) return strcmp(fna + 8, fnb + 8); } +/* Hash entire string */ +static uint32 +key_string_hash(const void *key, Size keysize) +{ + Size s_len = strlen((const char *) key); + + Assert(keysize == sizeof(char *)); + return DatumGetUInt32(hash_any((const unsigned char *) key, (int) s_len)); +} + +/* Compare entire string. */ +static int +key_string_compare(const void *key1, const void *key2, Size keysize) +{ + Assert(keysize == sizeof(char *)); + + return strcmp(*((const char **) key1), key2); +} + +/* Copy string by copying pointer. */ +static void * +key_string_copy(void *dest, const void *src, Size keysize) +{ + Assert(keysize == sizeof(char *)); + + *((char **) dest) = (char *) src; /* trust caller re allocation */ + return NULL; /* not used */ +} + /* * Parse the base backup options passed down by the parser */ @@ -688,7 +718,14 @@ parse_basebackup_options(List *options, basebackup_options *opt) bool o_maxrate = false; MemSet(opt, 0, sizeof(*opt)); - opt->exclude = NIL; + + /* + * The exclude hash table is only created if EXCLUDE options are specified. + * The matching function is optimized to run fast when the hash table is + * NULL. 
+ */ + opt->exclude = NULL; + foreach(lopt, options) { DefElem *defel = (DefElem *) lfirst(lopt); @@ -760,7 +797,39 @@ parse_basebackup_options(List *options, basebackup_options *opt) else if (strcmp(defel->defname, "exclude") == 0) { /* EXCLUDE option can be specified multiple times */ - opt->exclude = lappend(opt->exclude, defel->arg); + bool found; + + if (unlikely(opt->exclude == NULL)) + { + HASHCTL hashctl; + + /* + * The hash table stores the string keys in-place if the + * `match` and `keycopy` functions are not explicitly + * specified. In our case MAXPGPATH bytes need to be reserved + * for each key, which is too wasteful. + * + * By specifying the `match` and `keycopy` functions we could + * allocate the strings separately and store only the string + * pointers in the hash table. + */ + hashctl.hash = key_string_hash; + hashctl.match = key_string_compare; + hashctl.keycopy = key_string_copy; + + /* The hash table is used as a set, only the keys are meaningful */ + hashctl.keysize = sizeof(char *); + hashctl.entrysize = hashctl.keysize; + + opt->exclude = hash_create("replication exclude", + 64 /* nelem */, + &hashctl, + HASH_ELEM | HASH_FUNCTION | + HASH_COMPARE | HASH_KEYCOPY); + } + + hash_search(opt->exclude, pstrdup(strVal(defel->arg)), + HASH_ENTER, &found); } else elog(ERROR, "option \"%s\" not recognized", @@ -769,6 +838,9 @@ parse_basebackup_options(List *options, basebackup_options *opt) if (opt->label == NULL) opt->label = "base backup"; + if (opt->exclude) + hash_freeze(opt->exclude); + elogif(debug_basebackup, LOG, "basebackup options -- " "label = %s, " @@ -1065,7 +1137,7 @@ sendTablespace(char *path, bool sizeonly) size = 512; /* Size of the header just added */ /* Send all the files in the tablespace version directory */ - size += sendDir(pathbuf, strlen(path), sizeonly, NIL, NIL); + size += sendDir(pathbuf, strlen(path), sizeonly, NIL, NULL); return size; } @@ -1076,19 +1148,14 @@ sendTablespace(char *path, bool sizeonly) * "./pg_log" etc). */ static bool -match_exclude_list(char *path, List *exclude) +match_exclude_list(char *path, HTAB *exclude) { - ListCell *l; + bool found = false; - foreach (l, exclude) - { - char *val = strVal(lfirst(l)); - - if (strcmp(val, path) == 0) - return true; - } + if (unlikely(exclude)) + hash_search(exclude, path, HASH_FIND, &found); - return false; + return found; } /* @@ -1103,7 +1170,7 @@ match_exclude_list(char *path, List *exclude) */ static int64 sendDir(char *path, int basepathlen, bool sizeonly, List *tablespaces, - List *exclude) + HTAB *exclude) { DIR *dir; struct dirent *de; From dcd7b87157ebff430ff61ad619a222c9265701d3 Mon Sep 17 00:00:00 2001 From: Ning Yu Date: Tue, 18 Feb 2020 17:30:52 +0800 Subject: [PATCH 066/102] pg_basebackup: new option to load exclude lists from files The pg_basebackup command supports the --exclude option to exclude a path name, it can be provided multiple times to exclude multiple path names. However it can be provided at most 255 times, the limit is hard-coded in the source code because it is internall a fix-sized string array. To support excluding more path names, we could enlarge this number, however there is still limits on the cmdline size or the argument count. So we now provide a new option, --exclude-from=FILE, to get the path names to exclude from a file. 
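As a rough usage sketch of the new option (the file name, target directory, and paths here are hypothetical, not taken from the patch): given a file such as exclude.list containing one path name per line, for example

    ./pg_log
    ./exclude/1
    ./exclude/2

the backup could be started with something along the lines of

    pg_basebackup -D /tmp/backup --target-gp-dbid 123 --exclude-from=exclude.list

Based on the code added below, each line read from the file is escaped and appended to the replication command as its own EXCLUDE 'path' clause, just as if it had been supplied with a separate --exclude option.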
(cherry picked from commit f0bae10bc8944cb405b691267452a419a392f6fc) --- src/bin/pg_basebackup/pg_basebackup.c | 83 ++++++++++++++++---- src/bin/pg_basebackup/t/010_pg_basebackup.pl | 43 +++++++++- 2 files changed, 111 insertions(+), 15 deletions(-) diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c index 58caaf45b31f..c506c894c939 100644 --- a/src/bin/pg_basebackup/pg_basebackup.c +++ b/src/bin/pg_basebackup/pg_basebackup.c @@ -74,6 +74,8 @@ static bool forceoverwrite = false; #define MAX_EXCLUDE 255 static int num_exclude = 0; static char *excludes[MAX_EXCLUDE]; +static int num_exclude_from = 0; +static char *excludefroms[MAX_EXCLUDE]; static int target_gp_dbid = 0; /* Progress counters */ @@ -271,6 +273,8 @@ usage(void) printf(_(" -w, --no-password never prompt for password\n")); printf(_(" -W, --password force password prompt (should happen automatically)\n")); printf(_(" -E, --exclude exclude path names\n")); + printf(_(" --exclude-from=FILE\n" + " get path names to exclude from FILE\n")); printf(_("\nReport bugs to .\n")); } @@ -1681,31 +1685,69 @@ WriteRecoveryConf(void) fclose(cf); } +static void +add_to_exclude_list(PQExpBufferData *buf, const char *exclude) +{ + char quoted[MAXPGPATH]; + int error; + size_t len; + + error = 1; + len = PQescapeStringConn(conn, quoted, exclude, MAXPGPATH, &error); + if (len == 0 || error != 0) + { + fprintf(stderr, _("%s: could not process exclude \"%s\": %s\n"), + progname, exclude, PQerrorMessage(conn)); + disconnect_and_exit(1); + } + appendPQExpBuffer(buf, " EXCLUDE '%s'", quoted); +} + static char * -build_exclude_list(char **exclude_list, int num) +build_exclude_list(void) { PQExpBufferData buf; int i; - char quoted[MAXPGPATH]; - int error; - size_t len; - if (num == 0) + if (num_exclude == 0 && num_exclude_from == 0) return ""; initPQExpBuffer(&buf); - for (i = 0; i < num; i++) + for (i = 0; i < num_exclude; i++) + add_to_exclude_list(&buf, excludes[i]); + + for (i = 0; i < num_exclude_from; i++) { - error = 1; - len = PQescapeStringConn(conn, quoted, exclude_list[i], MAXPGPATH, &error); - if (len == 0 || error != 0) + const char *filename = excludefroms[i]; + FILE *file = fopen(filename, "r"); + char str[MAXPGPATH]; + + if (file == NULL) { - fprintf(stderr, _("%s: could not process exclude \"%s\": %s\n"), - progname, exclude_list[i], PQerrorMessage(conn)); + fprintf(stderr, _("%s: could not open exclude-from file \"%s\": %m\n"), + progname, filename); disconnect_and_exit(1); } - appendPQExpBuffer(&buf, "EXCLUDE '%s'", quoted); + + /* + * Each line contains a pathname to exclude. + * + * We must use fgets() instead of fscanf("%s") to correctly handle the + * spaces in the filenames. + */ + while (fgets(str, sizeof(str), file)) + { + /* Remove all trailing \r and \n */ + for (int len = strlen(str); + len > 0 && (str[len - 1] == '\r' || str[len - 1] == '\n'); + len--) + str[len - 1] = '\0'; + + add_to_exclude_list(&buf, str); + } + + fclose(file); } if (PQExpBufferDataBroken(buf)) @@ -1798,7 +1840,7 @@ BaseBackup(void) if (maxrate > 0) maxrate_clause = psprintf("MAX_RATE %u", maxrate); - exclude_list = build_exclude_list(excludes, num_exclude); + exclude_list = build_exclude_list(); if (verbose) fprintf(stderr, @@ -1818,7 +1860,7 @@ BaseBackup(void) maxrate_clause ? 
maxrate_clause : "", exclude_list); - if (num_exclude != 0) + if (exclude_list[0] != '\0') free(exclude_list); if (PQsendQuery(conn, basebkp) == 0) @@ -2138,6 +2180,7 @@ main(int argc, char **argv) {"progress", no_argument, NULL, 'P'}, {"xlogdir", required_argument, NULL, 1}, {"exclude", required_argument, NULL, 'E'}, + {"exclude-from", required_argument, NULL, 2}, {"force-overwrite", no_argument, NULL, 128}, {"target-gp-dbid", required_argument, NULL, 129}, {NULL, 0, NULL, 0} @@ -2165,6 +2208,7 @@ main(int argc, char **argv) } num_exclude = 0; + num_exclude_from = 0; while ((c = getopt_long(argc, argv, "D:F:r:RT:xX:l:zZ:d:c:h:p:U:s:S:wWvPE:", long_options, &option_index)) != -1) { @@ -2306,11 +2350,22 @@ main(int argc, char **argv) { fprintf(stderr, _("%s: too many elements in exclude list: max is %d"), progname, MAX_EXCLUDE); + fprintf(stderr, _("hint: use --exclude-from to load a large exclude list from a file")); exit(1); } excludes[num_exclude++] = pg_strdup(optarg); break; + case 2: /* --exclude-from=FILE */ + if (num_exclude_from >= MAX_EXCLUDE) + { + fprintf(stderr, _("%s: too many elements in exclude-from list: max is %d"), + progname, MAX_EXCLUDE); + exit(1); + } + + excludefroms[num_exclude_from++] = pg_strdup(optarg); + break; case 128: forceoverwrite = true; break; diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl index 23ec917435b4..7f184ffcfa58 100644 --- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl +++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl @@ -2,7 +2,7 @@ use warnings; use Cwd; use TestLib; -use Test::More tests => 39; +use Test::More tests => 42; program_help_ok('pg_basebackup'); program_version_ok('pg_basebackup'); @@ -179,3 +179,44 @@ command_fails( [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-Tfoo" ], '-T with invalid format fails'); + +# +# GPDB: Exclude some files with the --exclude-from option +# + +my $exclude_tempdir = "$tempdir/backup_exclude"; +my $excludelist = "$tempdir/exclude.list"; + +mkdir "$exclude_tempdir"; +mkdir "$tempdir/pgdata/exclude"; + +open EXCLUDELIST, ">$excludelist"; + +# Put a large amount of non-exist patterns in the exclude-from file, +# the pattern matching is efficient enough to handle them. +for my $i (1..1000000) { + print EXCLUDELIST "./exclude/non_exist.$i\n"; +} + +# Create some files to exclude +for my $i (1..1000) { + print EXCLUDELIST "./exclude/$i\n"; + + open FILE, ">$tempdir/pgdata/exclude/$i"; + close FILE; +} + +# Below file should not be excluded +open FILE, ">$tempdir/pgdata/exclude/keep"; +close FILE; + +close EXCLUDELIST; + +command_ok( + [ 'pg_basebackup', + '-D', "$exclude_tempdir", + '--target-gp-dbid', '123', + '--exclude-from', "$excludelist" ], + 'pg_basebackup runs with exclude-from file'); +ok(! -f "$exclude_tempdir/exclude/0", 'excluded files were not created'); +ok(-f "$exclude_tempdir/exclude/keep", 'other files were created'); From 1ca4a4a1d9b20102d2dbec412bc174f050869bda Mon Sep 17 00:00:00 2001 From: Ning Yu Date: Wed, 4 Mar 2020 16:44:52 +0800 Subject: [PATCH 067/102] Fix flaky aocsam unittest We used to mock AppendOnlyStorageRead_Init() when testing aocs_begin_headerscan(), however as the mocked one does not initialize desc->ao_read.storageAttributes at all, it makes a following assertion, which checks the attributes, flaky. 
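To illustrate why the mocked initializer made that assertion flaky, here is a self-contained sketch of the failure mode (this is not GPDB or cmockery code; the struct and function names are stand-ins): a stub that returns without writing the attribute leaves the field holding whatever bytes were already in memory, so a check on it can pass or fail from run to run.

    #include <assert.h>
    #include <stdbool.h>
    #include <stdlib.h>

    struct attrs { bool compress; };
    struct desc  { struct attrs storageAttributes; };

    /* Real initializer: always writes the attribute. */
    static void init_real(struct desc *d)   { d->storageAttributes.compress = false; }

    /* Stand-in for the mocked initializer: returns without touching the struct. */
    static void init_mocked(struct desc *d) { (void) d; }

    int main(void)
    {
        struct desc *d = malloc(sizeof(*d));   /* deliberately left uninitialized */

        if (d == NULL)
            return 1;

        init_mocked(d);
        /* May pass or fail depending on what the heap happened to contain:
         * this is the flakiness described in the commit message. */
        assert(!d->storageAttributes.compress);

        init_real(d);                          /* the un-mocked path is deterministic */
        assert(!d->storageAttributes.compress);

        free(d);
        return 0;
    }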
On the other hand aocs_begin_headerscan() itself does not do many useful things, the actual job is done inside AppendOnlyStorageRead_Init(), so to make the test more useful we removed the mocking and test against the real AppendOnlyStorageRead_Init() now. Reviewed-by: Jinbao Chen --- src/backend/access/aocs/test/Makefile | 1 - src/backend/access/aocs/test/aocsam_test.c | 20 ++++++++------------ 2 files changed, 8 insertions(+), 13 deletions(-) diff --git a/src/backend/access/aocs/test/Makefile b/src/backend/access/aocs/test/Makefile index 3583397922b0..7f97fce857a6 100644 --- a/src/backend/access/aocs/test/Makefile +++ b/src/backend/access/aocs/test/Makefile @@ -7,6 +7,5 @@ TARGETS=aocsam include $(top_builddir)/src/backend/mock.mk aocsam.t: \ - $(MOCK_DIR)/backend/cdb/cdbappendonlystorageread_mock.o \ $(MOCK_DIR)/backend/catalog/pg_attribute_encoding_mock.o \ $(MOCK_DIR)/backend/utils/datumstream/datumstream_mock.o diff --git a/src/backend/access/aocs/test/aocsam_test.c b/src/backend/access/aocs/test/aocsam_test.c index d055d55ce64d..df086526d6d1 100644 --- a/src/backend/access/aocs/test/aocsam_test.c +++ b/src/backend/access/aocs/test/aocsam_test.c @@ -37,21 +37,17 @@ test__aocs_begin_headerscan(void **state) strncpy(&pgclass.relname.data[0], "mock_relation", 13); expect_value(RelationGetAttributeOptions, rel, &reldata); will_return(RelationGetAttributeOptions, &opts); - expect_any(AppendOnlyStorageRead_Init, storageRead); - expect_any(AppendOnlyStorageRead_Init, memoryContext); - expect_any(AppendOnlyStorageRead_Init, maxBufferLen); - expect_any(AppendOnlyStorageRead_Init, relationName); - expect_any(AppendOnlyStorageRead_Init, title); - expect_any(AppendOnlyStorageRead_Init, storageAttributes); /* - * AppendOnlyStorageRead_Init assigns storageRead->storageAttributes. - * will_assign_*() functions mandate a paramter as an argument. Here we - * want to set selective members of a parameter. I don't know how this - * can be achieved using cmockery. This test will be meaningful only when - * we are able to set storageAttributes member of desc.ao_read. + * We used to mock AppendOnlyStorageRead_Init() here, however as the mocked + * one does not initialize desc->ao_read.storageAttributes at all, it makes + * the following assertion flaky. + * + * On the other hand aocs_begin_headerscan() itself does not do many useful + * things, the actual job is done inside AppendOnlyStorageRead_Init(), so + * to make the test more useful we removed the mocking and test against the + * real AppendOnlyStorageRead_Init() now. 
*/ - will_be_called(AppendOnlyStorageRead_Init); desc = aocs_begin_headerscan(&reldata, 0); assert_false(desc->ao_read.storageAttributes.compress); assert_int_equal(desc->colno, 0); From fedb7c66cf7d305f100445925ccf64fe73f31e41 Mon Sep 17 00:00:00 2001 From: David Yozie Date: Thu, 5 Mar 2020 14:03:22 -0500 Subject: [PATCH 068/102] docs - pxf jdbc connector supports bigint PARTITION_BY column (#9685) * docs - pxf jdbc connector supports bigint PARTITION_BY column * int represents all integral types, 64-bit signed integer range --- gpdb-doc/markdown/pxf/jdbc_pxf.html.md.erb | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/gpdb-doc/markdown/pxf/jdbc_pxf.html.md.erb b/gpdb-doc/markdown/pxf/jdbc_pxf.html.md.erb index 10071bb009b3..0183eb92c13a 100644 --- a/gpdb-doc/markdown/pxf/jdbc_pxf.html.md.erb +++ b/gpdb-doc/markdown/pxf/jdbc_pxf.html.md.erb @@ -94,9 +94,9 @@ You include JDBC connector custom options in the `LOCATION` URI, prefacing each | FETCH_SIZE | Read | Integer that identifies the number of rows to buffer when reading from an external SQL database. Read row batching is enabled by default; the default read fetch size is 1000. | | QUERY_TIMEOUT | Read/Write | Integer that identifies the amount of time (in seconds) that the JDBC driver waits for a statement to execute. The default wait time is infinite. | | POOL_SIZE | Write | Enable thread pooling on `INSERT` operations and identify the number of threads in the pool. Thread pooling is disabled by default. | -| PARTITION_BY | Read | Enables read partitioning. The partition column, \:\. You may specify only one partition column. The JDBC connector supports `date`, `int`, and `enum` \ values. If you do not identify a `PARTITION_BY` column, a single PXF instance services the read request. | -| RANGE | Read | Required when `PARTITION_BY` is specified. The query range; used as a hint to aid the creation of partitions. The `RANGE` format is dependent upon the data type of the partition column. When the partition column is an `enum` type, `RANGE` must specify a list of values, \:\[:\[...]], each of which forms its own fragment. If the partition column is an `int` or `date` type, `RANGE` must specify \:\ and represents the interval from \ through \, inclusive. If the partition column is a `date` type, use the `yyyy-MM-dd` date format. | -| INTERVAL | Read | Required when `PARTITION_BY` is specified and of the `int` or `date` type. The interval, \[:\], of one fragment. Used with `RANGE` as a hint to aid the creation of partitions. Specify the size of the fragment in \. If the partition column is a `date` type, use the \ to specify `year`, `month`, or `day`. PXF ignores `INTERVAL` when the `PARTITION_BY` column is of the `enum` type. | +| PARTITION_BY | Read | Enables read partitioning. The partition column, \:\. You may specify only one partition column. The JDBC connector supports `date`, `int`, and `enum` \ values, where `int` represents any JDBC integral type. If you do not identify a `PARTITION_BY` column, a single PXF instance services the read request. | +| RANGE | Read | Required when `PARTITION_BY` is specified. The query range; used as a hint to aid the creation of partitions. The `RANGE` format is dependent upon the data type of the partition column. When the partition column is an `enum` type, `RANGE` must specify a list of values, \:\[:\[...]], each of which forms its own fragment. 
If the partition column is an `int` or `date` type, `RANGE` must specify \:\ and represents the interval from \ through \, inclusive. The `RANGE` for an `int` partition column may span any 64-bit signed integer values. If the partition column is a `date` type, use the `yyyy-MM-dd` date format. | +| INTERVAL | Read | Required when `PARTITION_BY` is specified and of the `int`, `bigint`, or `date` type. The interval, \[:\], of one fragment. Used with `RANGE` as a hint to aid the creation of partitions. Specify the size of the fragment in \. If the partition column is a `date` type, use the \ to specify `year`, `month`, or `day`. PXF ignores `INTERVAL` when the `PARTITION_BY` column is of the `enum` type. | | QUOTE_COLUMNS | Read | Controls whether PXF should quote column names when constructing an SQL query to the external database. Specify `true` to force PXF to quote all column names; PXF does not quote column names if any other value is provided. If `QUOTE_COLUMNS` is not specified (the default), PXF automatically quotes *all* column names in the query when *any* column name:
    - includes special characters, or
    - is mixed case and the external database does not support unquoted mixed case identifiers. | From 0a88983bf2eed819ea42ba176dbf660476c191ae Mon Sep 17 00:00:00 2001 From: David Yozie Date: Thu, 5 Mar 2020 16:46:09 -0500 Subject: [PATCH 069/102] docs - add description for jsonb data type (#9660) description is based on a comment for the jsonb_recv function that I found in gpdb/src/backend/utils/adt/jsonb.c --- gpdb-doc/dita/ref_guide/data_types.xml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/gpdb-doc/dita/ref_guide/data_types.xml b/gpdb-doc/dita/ref_guide/data_types.xml index d08830d3741f..aea5ae5be4db 100644 --- a/gpdb-doc/dita/ref_guide/data_types.xml +++ b/gpdb-doc/dita/ref_guide/data_types.xml @@ -188,9 +188,9 @@ jsonb - ??? - ??? - ??? + 1 byte + binary string + json of any length in a decomposed binary format + variable unlimited length lseg From 0740922f349c04f606c584a85168bdd5854368fa Mon Sep 17 00:00:00 2001 From: Huiliang Liu Date: Fri, 6 Mar 2020 10:48:57 +0800 Subject: [PATCH 070/102] Make sure ECONNRESET is correct on windows for gpfdist (#9688) ECONNRESET is defined as POSIX errno(108) in some windows build environment. Then gpfdist doesn't redefine it. But that is incorrect value on windows. So we make sure ECONNRESET is defined as WSAECONNRESET in gpfdist. (cherry-pick from master commit: 4f2b5fc9b17bd858c830e44c8ec67f85a0f56d58) --- src/bin/gpfdist/gpfdist.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/src/bin/gpfdist/gpfdist.c b/src/bin/gpfdist/gpfdist.c index 6489716e2f2d..c3979b873f00 100644 --- a/src/bin/gpfdist/gpfdist.c +++ b/src/bin/gpfdist/gpfdist.c @@ -47,12 +47,10 @@ #include #define SHUT_WR SD_SEND #define socklen_t int -#ifndef ECONNRESET +#undef ECONNRESET #define ECONNRESET WSAECONNRESET #endif -#endif - #include #include "gpfdist_helper.h" #ifdef USE_SSL From 38d43c5eb4906c8775fcdc7bc837011034741ed0 Mon Sep 17 00:00:00 2001 From: Shaoqi Bai Date: Mon, 17 Feb 2020 13:20:38 +0800 Subject: [PATCH 071/102] Add job for Pivotal Greenplum 6 Server for SLES 12 Add compile job for sles12 Add ICW job for sles12 Note: Modify comments for workaround_before_concourse_stops_stripping_suid_bits, and tiny code style change Co-authored-by: Bradford D. 
Boyle Co-authored-by: Shaoqi Bai --- concourse/pipelines/gen_pipeline.py | 7 +- .../pipelines/gpdb_6X_STABLE-generated.yml | 145 +++++++++++++++- concourse/pipelines/templates/gpdb-tpl.yml | 161 ++++++++++++++++++ concourse/scripts/compile_gpdb.bash | 21 ++- concourse/scripts/ic_gpdb.bash | 2 +- concourse/scripts/setup_gpadmin_user.bash | 17 +- gpAux/Makefile | 6 +- gpAux/Makefile.global | 1 + 8 files changed, 341 insertions(+), 19 deletions(-) diff --git a/concourse/pipelines/gen_pipeline.py b/concourse/pipelines/gen_pipeline.py index 0d9ef621edb8..db7592c0f0c1 100755 --- a/concourse/pipelines/gen_pipeline.py +++ b/concourse/pipelines/gen_pipeline.py @@ -74,6 +74,9 @@ 'test_gpdb_clients_windows', 'walrep_2', 'Publish Server Builds', + 'compile_gpdb_sles12', + 'icw_gporca_sles12', + 'icw_planner_sles12', ] + RELEASE_VALIDATOR_JOB + JOBS_THAT_ARE_GATES ) @@ -292,7 +295,7 @@ def main(): action='store', dest='os_types', default=['centos6'], - choices=['centos6', 'centos7', 'ubuntu18.04', 'win'], + choices=['centos6', 'centos7', 'ubuntu18.04', 'sles12', 'win'], nargs='+', help='List of OS values to support' ) @@ -369,7 +372,7 @@ def main(): args.pipeline_configuration = 'prod' if args.pipeline_configuration == 'prod' or args.pipeline_configuration == 'full': - args.os_types = ['centos6', 'centos7', 'ubuntu18.04', 'win'] + args.os_types = ['centos6', 'centos7', 'ubuntu18.04', 'sles12', 'win'] args.test_sections = [ 'ICW', 'Replication', diff --git a/concourse/pipelines/gpdb_6X_STABLE-generated.yml b/concourse/pipelines/gpdb_6X_STABLE-generated.yml index d60ba24bd4d7..c38ed2bb36e0 100644 --- a/concourse/pipelines/gpdb_6X_STABLE-generated.yml +++ b/concourse/pipelines/gpdb_6X_STABLE-generated.yml @@ -12,9 +12,9 @@ ## file (example: templates/gpdb-tpl.yml) and regenerate the pipeline ## using appropriate tool (example: gen_pipeline.py -t prod). 
## ---------------------------------------------------------------------- -## Generated by gen_pipeline.py at: 2020-01-23 11:20:29.571301 +## Generated by gen_pipeline.py at: 2020-03-05 12:19:44.070911 ## Template file: gpdb-tpl.yml -## OS Types: ['centos6', 'centos7', 'ubuntu18.04', 'win'] +## OS Types: ['centos6', 'centos7', 'ubuntu18.04', 'sles12', 'win'] ## Test Sections: ['ICW', 'Replication', 'ResourceGroups', 'Interconnect', 'CLI', 'UD', 'AA', 'Extensions', 'Gpperfmon'] ## ====================================================================== @@ -171,6 +171,7 @@ groups: - prepare_binary_swap_gpdb_centos6 - compile_gpdb_centos7 - compile_gpdb_ubuntu18.04 + - compile_gpdb_sles12 - compile_gpdb_clients_windows - test_gpdb_clients_windows ## -------------------------------------------------------------------- @@ -183,6 +184,8 @@ groups: - icw_planner_centos7 - icw_gporca_ubuntu18.04 - icw_planner_ubuntu18.04 + - icw_gporca_sles12 + - icw_planner_sles12 - gate_icw_end ## -------------------------------------------------------------------- - interconnect @@ -254,6 +257,7 @@ groups: - compile_gpdb_centos6 - compile_gpdb_centos7 - compile_gpdb_ubuntu18.04 + - compile_gpdb_sles12 - compile_gpdb_clients_windows - test_gpdb_clients_windows @@ -272,6 +276,9 @@ groups: - compile_gpdb_centos7 - icw_gporca_ubuntu18.04 - icw_planner_ubuntu18.04 + - icw_gporca_sles12 + - icw_planner_sles12 + - compile_gpdb_sles12 - gate_icw_end @@ -499,6 +506,27 @@ resources: bucket: ((gcs-bucket)) json_key: ((concourse-gcs-resources-service-account-key)) regexp: gp-internal-artifacts/ubuntu18.04/libquicklz-dev_(1\.5\.0-.*)-1_amd64.deb +- name: libquicklz-sles12 + type: gcs + source: + bucket: ((gcs-bucket)) + json_key: ((concourse-gcs-resources-service-account-key)) + regexp: gp-internal-artifacts/sles12/libquicklz-(1\.5\.0-.*)-1.x86_64.rpm + +- name: libquicklz-devel-sles12 + type: gcs + source: + bucket: ((gcs-bucket)) + json_key: ((concourse-gcs-resources-service-account-key)) + regexp: gp-internal-artifacts/sles12/libquicklz-devel-(1\.5\.0-.*)-1.x86_64.rpm + +- name: libsigar-sles12 + type: gcs + source: + bucket: ((gcs-bucket)) + json_key: ((concourse-gcs-resources-service-account-key)) + regexp: gp-internal-artifacts/sles12/sigar-sles12_x86_64-(1\.6\.5-.*).targz + - name: python-centos6 type: gcs source: @@ -520,6 +548,13 @@ resources: json_key: ((concourse-gcs-resources-service-account-key)) regexp: gp-internal-artifacts/ubuntu18.04/python-(2\.7\.12-.*).tar.gz +- name: python-sles12 + type: gcs + source: + bucket: ((gcs-bucket)) + json_key: ((concourse-gcs-resources-service-account-key)) + regexp: gp-internal-artifacts/sles12/python-(2\.7\.12-.*).tar.gz + - name: gpdb6-centos6-build type: docker-image source: @@ -556,6 +591,24 @@ resources: repository: pivotaldata/gpdb6-ubuntu18.04-test tag: latest + +- name: gpdb6-sles12-build + type: docker-image + source: + repository: pivotaldata/gpdb6-sles12-build + tag: latest + username: ((docker_username)) + password: ((docker_password)) + +- name: gpdb6-sles12-test + type: docker-image + source: + repository: pivotaldata/gpdb6-sles12-test + tag: latest + username: ((docker_username)) + password: ((docker_password)) + + - name: bin_gpdb_centos6 type: gcs source: @@ -677,6 +730,21 @@ resources: json_key: ((concourse-gcs-resources-service-account-key)) regexp: server/published/gpdb6/server-build-(.*)-ubuntu18.04_x86_64((rc-build-type-gcs)).tar.gz + +- name: bin_gpdb_sles12 + type: gcs + source: + bucket: ((gcs-bucket-intermediates)) + json_key: 
((concourse-gcs-resources-service-account-key)) + versioned_file: ((pipeline-name))/bin_gpdb_sles12/bin_gpdb.tar.gz + +- name: bin_gpdb_clients_sles12 + type: gcs + source: + bucket: ((gcs-bucket-intermediates)) + json_key: ((concourse-gcs-resources-service-account-key)) + versioned_file: ((pipeline-name))/bin_gpdb_clients_sles12/bin_gpdb_clients.tar.gz + - name: terraform_windows type: terraform source: @@ -945,6 +1013,41 @@ jobs: params: file: gpdb_artifacts/gpdb-clients-ubuntu18.04.tar.gz + +- name: compile_gpdb_sles12 + plan: + - in_parallel: + steps: + - get: reduced-frequency-trigger + trigger: ((reduced-frequency-trigger-flag)) + - get: gpdb_src + trigger: ((gpdb_src-trigger-flag)) + - get: gpdb6-sles12-build + - get: libquicklz-installer + resource: libquicklz-sles12 + - get: libquicklz-devel-installer + resource: libquicklz-devel-sles12 + - get: libsigar-installer + resource: libsigar-sles12 + - get: python-tarball + resource: python-sles12 + - task: compile_gpdb + image: gpdb6-sles12-build + file: gpdb_src/concourse/tasks/compile_gpdb.yml + params: + CONFIGURE_FLAGS: {{configure_flags_with_extensions}} + TARGET_OS: sles + TARGET_OS_VERSION: "12" + BLD_TARGETS: "clients" + RC_BUILD_TYPE_GCS: ((rc-build-type-gcs)) + - in_parallel: + steps: + - put: bin_gpdb_sles12 + params: + file: gpdb_artifacts/bin_gpdb.tar.gz + - put: bin_gpdb_clients_sles12 + params: + file: gpdb_artifacts/gpdb-clients-sles12.tar.gz - name: compile_gpdb_clients_windows serial: true plan: @@ -1152,6 +1255,44 @@ jobs: TEST_OS: ubuntu CONFIGURE_FLAGS: {{configure_flags}} +- name: icw_gporca_sles12 + plan: + - in_parallel: + steps: + - get: gpdb_src + passed: [compile_gpdb_sles12] + - get: bin_gpdb + resource: bin_gpdb_sles12 + passed: [compile_gpdb_sles12] + trigger: true + - get: gpdb6-sles12-test + - task: ic_gpdb + file: gpdb_src/concourse/tasks/ic_gpdb.yml + image: gpdb6-sles12-test + params: + MAKE_TEST_COMMAND: -k PGOPTIONS='-c optimizer=on' installcheck-world + TEST_OS: sles + CONFIGURE_FLAGS: {{configure_flags}} + +- name: icw_planner_sles12 + plan: + - in_parallel: + steps: + - get: gpdb_src + passed: [compile_gpdb_sles12] + - get: bin_gpdb + passed: [compile_gpdb_sles12] + resource: bin_gpdb_sles12 + trigger: true + - get: gpdb6-sles12-test + - task: ic_gpdb + file: gpdb_src/concourse/tasks/ic_gpdb.yml + image: gpdb6-sles12-test + params: + MAKE_TEST_COMMAND: -k PGOPTIONS='-c optimizer=off' installcheck-world + TEST_OS: sles + CONFIGURE_FLAGS: {{configure_flags}} + - name: gate_icw_end plan: - in_parallel: diff --git a/concourse/pipelines/templates/gpdb-tpl.yml b/concourse/pipelines/templates/gpdb-tpl.yml index 54ddf24b1761..4d8537faddae 100644 --- a/concourse/pipelines/templates/gpdb-tpl.yml +++ b/concourse/pipelines/templates/gpdb-tpl.yml @@ -194,6 +194,9 @@ groups: {% if "ubuntu18.04" in os_types %} - compile_gpdb_ubuntu18.04 {% endif %} +{% if "sles12" in os_types %} + - compile_gpdb_sles12 +{% endif %} {% if "win" in os_types %} - compile_gpdb_clients_windows - test_gpdb_clients_windows @@ -216,6 +219,10 @@ groups: {% if "ubuntu18.04" in os_types %} - icw_gporca_ubuntu18.04 - icw_planner_ubuntu18.04 +{% endif %} +{% if "sles12" in os_types %} + - icw_gporca_sles12 + - icw_planner_sles12 {% endif %} - gate_icw_end {% endif %} @@ -295,6 +302,9 @@ groups: {% if "ubuntu18.04" in os_types %} - compile_gpdb_ubuntu18.04 {% endif %} +{% if "sles12" in os_types %} + - compile_gpdb_sles12 +{% endif %} {% if "win" in os_types %} - compile_gpdb_clients_windows - test_gpdb_clients_windows @@ -323,6 +333,11 @@ 
groups: {% if "ubuntu18.04" in os_types %} - icw_gporca_ubuntu18.04 - icw_planner_ubuntu18.04 +{% endif %} +{% if "sles12" in os_types %} + - icw_gporca_sles12 + - icw_planner_sles12 + - compile_gpdb_sles12 {% endif %} - gate_icw_end @@ -569,6 +584,29 @@ resources: json_key: ((concourse-gcs-resources-service-account-key)) regexp: gp-internal-artifacts/ubuntu18.04/libquicklz-dev_(1\.5\.0-.*)-1_amd64.deb {% endif %} +{% if "sles12" in os_types %} +- name: libquicklz-sles12 + type: gcs + source: + bucket: ((gcs-bucket)) + json_key: ((concourse-gcs-resources-service-account-key)) + regexp: gp-internal-artifacts/sles12/libquicklz-(1\.5\.0-.*)-1.x86_64.rpm + +- name: libquicklz-devel-sles12 + type: gcs + source: + bucket: ((gcs-bucket)) + json_key: ((concourse-gcs-resources-service-account-key)) + regexp: gp-internal-artifacts/sles12/libquicklz-devel-(1\.5\.0-.*)-1.x86_64.rpm + +- name: libsigar-sles12 + type: gcs + source: + bucket: ((gcs-bucket)) + json_key: ((concourse-gcs-resources-service-account-key)) + regexp: gp-internal-artifacts/sles12/sigar-sles12_x86_64-(1\.6\.5-.*).targz + +{% endif %} {% if "centos6" in os_types %} - name: python-centos6 type: gcs @@ -595,6 +633,15 @@ resources: json_key: ((concourse-gcs-resources-service-account-key)) regexp: gp-internal-artifacts/ubuntu18.04/python-(2\.7\.12-.*).tar.gz +{% endif %} +{% if "sles12" in os_types %} +- name: python-sles12 + type: gcs + source: + bucket: ((gcs-bucket)) + json_key: ((concourse-gcs-resources-service-account-key)) + regexp: gp-internal-artifacts/sles12/python-(2\.7\.12-.*).tar.gz + {% endif %} {% if "centos6" in os_types %} - name: gpdb6-centos6-build @@ -638,6 +685,26 @@ resources: tag: latest {% endif %} + +- name: gpdb6-sles12-build + type: docker-image + source: + repository: pivotaldata/gpdb6-sles12-build + tag: latest + username: ((docker_username)) + password: ((docker_password)) + +{% if "sles12" in os_types %} +- name: gpdb6-sles12-test + type: docker-image + source: + repository: pivotaldata/gpdb6-sles12-test + tag: latest + username: ((docker_username)) + password: ((docker_password)) + +{% endif %} + {% if "centos6" in os_types %} - name: bin_gpdb_centos6 type: gcs @@ -774,6 +841,23 @@ resources: regexp: server/published/gpdb6/server-build-(.*)-ubuntu18.04_x86_64((rc-build-type-gcs)).tar.gz {% endif %} +{% endif %} + +{% if "sles12" in os_types %} +- name: bin_gpdb_sles12 + type: gcs + source: + bucket: ((gcs-bucket-intermediates)) + json_key: ((concourse-gcs-resources-service-account-key)) + versioned_file: ((pipeline-name))/bin_gpdb_sles12/bin_gpdb.tar.gz + +- name: bin_gpdb_clients_sles12 + type: gcs + source: + bucket: ((gcs-bucket-intermediates)) + json_key: ((concourse-gcs-resources-service-account-key)) + versioned_file: ((pipeline-name))/bin_gpdb_clients_sles12/bin_gpdb_clients.tar.gz + {% endif %} {% if "win" in os_types %} - name: terraform_windows @@ -1055,6 +1139,43 @@ jobs: file: gpdb_artifacts/gpdb-clients-ubuntu18.04.tar.gz {% endif %} + +{% if "sles12" in os_types %} +- name: compile_gpdb_sles12 + plan: + - in_parallel: + steps: + - get: reduced-frequency-trigger + trigger: ((reduced-frequency-trigger-flag)) + - get: gpdb_src + trigger: ((gpdb_src-trigger-flag)) + - get: gpdb6-sles12-build + - get: libquicklz-installer + resource: libquicklz-sles12 + - get: libquicklz-devel-installer + resource: libquicklz-devel-sles12 + - get: libsigar-installer + resource: libsigar-sles12 + - get: python-tarball + resource: python-sles12 + - task: compile_gpdb + image: gpdb6-sles12-build + file: 
gpdb_src/concourse/tasks/compile_gpdb.yml + params: + CONFIGURE_FLAGS: {{configure_flags_with_extensions}} + TARGET_OS: sles + TARGET_OS_VERSION: "12" + BLD_TARGETS: "clients" + RC_BUILD_TYPE_GCS: ((rc-build-type-gcs)) + - in_parallel: + steps: + - put: bin_gpdb_sles12 + params: + file: gpdb_artifacts/bin_gpdb.tar.gz + - put: bin_gpdb_clients_sles12 + params: + file: gpdb_artifacts/gpdb-clients-sles12.tar.gz +{% endif %} {% if "win" in os_types %} - name: compile_gpdb_clients_windows serial: true @@ -1270,6 +1391,46 @@ jobs: TEST_OS: ubuntu CONFIGURE_FLAGS: {{configure_flags}} +{% endif %} +{% if "sles12" in os_types %} +- name: icw_gporca_sles12 + plan: + - in_parallel: + steps: + - get: gpdb_src + passed: [compile_gpdb_sles12] + - get: bin_gpdb + resource: bin_gpdb_sles12 + passed: [compile_gpdb_sles12] + trigger: [[ test_trigger ]] + - get: gpdb6-sles12-test + - task: ic_gpdb + file: gpdb_src/concourse/tasks/ic_gpdb.yml + image: gpdb6-sles12-test + params: + MAKE_TEST_COMMAND: -k PGOPTIONS='-c optimizer=on' installcheck-world + TEST_OS: sles + CONFIGURE_FLAGS: {{configure_flags}} + +- name: icw_planner_sles12 + plan: + - in_parallel: + steps: + - get: gpdb_src + passed: [compile_gpdb_sles12] + - get: bin_gpdb + passed: [compile_gpdb_sles12] + resource: bin_gpdb_sles12 + trigger: [[ test_trigger ]] + - get: gpdb6-sles12-test + - task: ic_gpdb + file: gpdb_src/concourse/tasks/ic_gpdb.yml + image: gpdb6-sles12-test + params: + MAKE_TEST_COMMAND: -k PGOPTIONS='-c optimizer=off' installcheck-world + TEST_OS: sles + CONFIGURE_FLAGS: {{configure_flags}} + {% endif %} - name: gate_icw_end plan: diff --git a/concourse/scripts/compile_gpdb.bash b/concourse/scripts/compile_gpdb.bash index 3f5bca671ca3..b80bf25f6993 100755 --- a/concourse/scripts/compile_gpdb.bash +++ b/concourse/scripts/compile_gpdb.bash @@ -36,10 +36,19 @@ function prep_env() { ;; esac ;; + sles) + case "${TARGET_OS_VERSION}" in + 12) export BLD_ARCH=sles12_x86_64 ;; + *) + echo "TARGET_OS_VERSION not set or recognized for SLES" + exit 1 + ;; + esac + ;; esac } -function install_deps_for_centos() { +function install_deps_for_centos_or_sles() { rpm -i libquicklz-installer/libquicklz-*.rpm rpm -i libquicklz-devel-installer/libquicklz-*.rpm # install libsigar from tar.gz @@ -52,7 +61,7 @@ function install_deps_for_ubuntu() { function install_deps() { case "${TARGET_OS}" in - centos) install_deps_for_centos;; + centos | sles) install_deps_for_centos_or_sles;; ubuntu) install_deps_for_ubuntu;; esac } @@ -115,7 +124,7 @@ function unittest_check_gpdb() { function include_zstd() { local libdir case "${TARGET_OS}" in - centos) libdir=/usr/lib64 ;; + centos | sles) libdir=/usr/lib64 ;; ubuntu) libdir=/usr/lib ;; *) return ;; esac @@ -129,7 +138,7 @@ function include_zstd() { function include_quicklz() { local libdir case "${TARGET_OS}" in - centos) libdir=/usr/lib64 ;; + centos | sles) libdir=/usr/lib64 ;; ubuntu) libdir=/usr/local/lib ;; *) return ;; esac @@ -231,7 +240,7 @@ function _main() { mkdir gpdb_src/gpAux/ext case "${TARGET_OS}" in - centos|ubuntu) + centos|ubuntu|sles) prep_env fetch_orca_src "${ORCA_TAG}" build_xerces @@ -244,7 +253,7 @@ function _main() { CONFIGURE_FLAGS="${CONFIGURE_FLAGS} --disable-pxf" ;; *) - echo "only centos, ubuntu, and win32 are supported TARGET_OS'es" + echo "only centos, ubuntu, sles and win32 are supported TARGET_OS'es" false ;; esac diff --git a/concourse/scripts/ic_gpdb.bash b/concourse/scripts/ic_gpdb.bash index fe9d91019080..39441081d15d 100755 --- a/concourse/scripts/ic_gpdb.bash +++ 
b/concourse/scripts/ic_gpdb.bash @@ -56,7 +56,7 @@ function _main() { fi case "${TEST_OS}" in - centos|ubuntu) ;; #Valid + centos|ubuntu|sles) ;; #Valid *) echo "FATAL: TEST_OS is set to an invalid value: $TEST_OS" echo "Configure TEST_OS to be centos, or ubuntu" diff --git a/concourse/scripts/setup_gpadmin_user.bash b/concourse/scripts/setup_gpadmin_user.bash index fda418205979..38b9046c9b74 100755 --- a/concourse/scripts/setup_gpadmin_user.bash +++ b/concourse/scripts/setup_gpadmin_user.bash @@ -66,6 +66,10 @@ setup_gpadmin_user() { ubuntu) /usr/sbin/useradd -G supergroup,tty gpadmin -s /bin/bash ;; + sles) + # create a default group gpadmin, and add user gpadmin to group gapdmin, supergroup, tty + /usr/sbin/useradd -U -G supergroup,tty gpadmin + ;; *) echo "Unknown OS: $TEST_OS"; exit 1 ;; esac echo -e "password\npassword" | passwd gpadmin @@ -112,16 +116,19 @@ determine_os() { echo "ubuntu" return fi + if grep -q 'ID="sles"' /etc/os-release ; then + echo "sles" + return + fi echo "Could not determine operating system type" >/dev/stderr exit 1 } -# This might no longer be necessary, as the centos7 base image has been updated -# with ping's setcap set properly, although it would need to be verified to work -# for other OSs used by Concourse. -# https://github.com/Pivotal-DataFabric/toolsmiths-images/pull/27 +# Set the "Set-User-ID" bit of ping, or else gpinitsystem will error by following message: +# [FATAL]:-Unknown host d6f9f630-65a3-4c98-4c03-401fbe5dd60b: ping: socket: Operation not permitted +# This is needed in centos7, sles12sp5, but not for centos6, ubuntu18.04 workaround_before_concourse_stops_stripping_suid_bits() { - chmod u+s /bin/ping + chmod u+s $(which ping) } _main() { diff --git a/gpAux/Makefile b/gpAux/Makefile index a93bb093b321..3d46812eedf5 100644 --- a/gpAux/Makefile +++ b/gpAux/Makefile @@ -132,8 +132,8 @@ aix7_ppc_64_CONFIGFLAGS=--disable-gpcloud --without-readline --without-libcurl - rhel6_x86_64_CONFIGFLAGS=--with-quicklz --enable-gpperfmon --with-gssapi --enable-mapreduce --enable-orafce ${ORCA_CONFIG} --with-libxml $(APU_CONFIG) rhel7_x86_64_CONFIGFLAGS=--with-quicklz --enable-gpperfmon --with-gssapi --enable-mapreduce --enable-orafce ${ORCA_CONFIG} --with-libxml $(APU_CONFIG) linux_x86_64_CONFIGFLAGS=${ORCA_CONFIG} --with-libxml $(APR_CONFIG) -ubuntu18.04_x86_64_CONFIGFLAGS=--with-quicklz --enable-gpperfmon --with-gssapi --enable-mapreduce --enable-orafce ${ORCA_CONFIG} --with-libxml $(APU_CONFIG) - +ubuntu18.04_x86_64_CONFIGFLAGS=--with-quicklz --with-gssapi --enable-mapreduce --enable-orafce ${ORCA_CONFIG} --with-libxml +sles12_x86_64_CONFIGFLAGS=--with-quicklz --with-gssapi --enable-mapreduce --enable-orafce ${ORCA_CONFIG} --with-libxml BLD_CONFIGFLAGS=$($(BLD_ARCH)_CONFIGFLAGS) CONFIGFLAGS=$(strip $(BLD_CONFIGFLAGS) --with-pgport=$(DEFPORT) $(BLD_DEPLOYMENT_SETTING)) @@ -165,7 +165,7 @@ CONFIGFLAGS+= --with-includes="$(CONFIG_INCLUDES)" --with-libraries="$(CONFIG_LI # Configure in "authlibs"... 
# ...without an ext/ dir copy of curl-config on Tiger or AIX -ifeq "$(findstring $(BLD_ARCH),aix7_ppc_64 rhel6_x86_64 rhel7_x86_64 ubuntu18.04_x86_64)" "" +ifeq "$(findstring $(BLD_ARCH),aix7_ppc_64 rhel6_x86_64 rhel7_x86_64 ubuntu18.04_x86_64 sles12_x86_64)" "" BLD_CURL_CONFIG=CURL_CONFIG=$(BLD_THIRDPARTY_BIN_DIR)/curl-config endif # ...and do not include the authlibs on Windows or AIX diff --git a/gpAux/Makefile.global b/gpAux/Makefile.global index ba2014ebf780..a4a56cac1a01 100644 --- a/gpAux/Makefile.global +++ b/gpAux/Makefile.global @@ -60,6 +60,7 @@ aix7_ppc_64_PYTHONHOME=/opt/freeware rhel6_x86_64_PYTHONHOME=$(BLD_THIRDPARTY_DIR)/python-2.7.12 rhel7_x86_64_PYTHONHOME=$(BLD_THIRDPARTY_DIR)/python-2.7.12 ubuntu18.04_x86_64_PYTHONHOME=$(BLD_THIRDPARTY_DIR)/python-2.7.12 +sles12_x86_64_PYTHONHOME=$(BLD_THIRDPARTY_DIR)/python-2.7.12 ifneq "$($(BLD_ARCH)_PYTHONHOME)" "" export PYTHONHOME=$($(BLD_ARCH)_PYTHONHOME) From e680ba8a04de21b3faa8d6bc4d8b490f07b9690f Mon Sep 17 00:00:00 2001 From: "Bradford D. Boyle" Date: Mon, 24 Feb 2020 12:01:58 -0800 Subject: [PATCH 072/102] Increase max ELF program sections processed for core files To fix packcore test failure in sles12 Co-authored-by: Bradford D. Boyle Co-authored-by: Shaoqi Bai --- gpMgmt/sbin/packcore | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/gpMgmt/sbin/packcore b/gpMgmt/sbin/packcore index 43320cf5d2ed..a6fdae4b035d 100755 --- a/gpMgmt/sbin/packcore +++ b/gpMgmt/sbin/packcore @@ -10,7 +10,7 @@ import shutil import stat import sys from optparse import OptionParser -from subprocess import Popen, PIPE +from subprocess import Popen, PIPE, STDOUT def _getPlatformInfo(): @@ -28,7 +28,17 @@ def _getPlatformInfo(): def _getFileInfo(coreFile): - cmd = Popen(['/usr/bin/file', coreFile], stdout=PIPE) + cmd = Popen(['/usr/bin/file', '--version'], stdout=PIPE, stderr=STDOUT) + fileVersion = cmd.communicate()[0].split()[0].strip() + # file allow setting parameters from command line from version 5.21, refer: + # https://github.com/file/file/commit/6ce24f35cd4a43c4bdd249e8e0c4952c1f8eac67 + # Set ELF program sections processed for core files to suppres "too many + # program headers" output + opts = ['/usr/bin/file'] + if fileVersion >= 'file-5.21': + opts += ['-P', 'elf_phnum=2048'] + opts += [coreFile] + cmd = Popen(opts, stdout=PIPE) return cmd.communicate()[0] From 485e23c11e58a6c7df4180012a7d57791f236c05 Mon Sep 17 00:00:00 2001 From: "Bradford D. Boyle" Date: Mon, 2 Mar 2020 11:21:16 -0800 Subject: [PATCH 073/102] Unset LD_LIBRARY_PATH before invoking `runGdb.sh` from packcore file [#168654694] Authored-by: Bradford D. 
Boyle --- src/test/isolation2/expected/packcore.out | 2 +- src/test/isolation2/sql/packcore.sql | 7 +++++++ 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/src/test/isolation2/expected/packcore.out b/src/test/isolation2/expected/packcore.out index e8969d47a2d4..1b50a22aed1a 100644 --- a/src/test/isolation2/expected/packcore.out +++ b/src/test/isolation2/expected/packcore.out @@ -10,7 +10,7 @@ def check_call(cmds): ret = subprocess.Popen(cmds, stdout=subprocess.PIPE, stder # generate the tarball, the packcore command should return 0 check_call(cmds) assert os.path.isfile(tarball) # extract the tarball check_call(['tar', '-zxf', tarball]) assert os.path.isdir(dirname) # verify that binary and shared libraries are included assert os.path.exists('{}/postgres'.format(dirname)) assert os.path.exists('{}/lib64/ld-linux-x86-64.so.2'.format(dirname)) -if os.path.exists('/usr/bin/gdb'): # load the coredump and run some simple gdb commands os.chdir(dirname) check_call(['./runGDB.sh', '--batch', '--nx', '--eval-command=bt', '--eval-command=p main', '--eval-command=p fork']) os.chdir('..') +if os.path.exists('/usr/bin/gdb'): # load the coredump and run some simple gdb commands os.chdir(dirname) # remove LD_LIBRARY_PATH before invoking gdb ld_library_path = None if 'LD_LIBRARY_PATH' in os.environ: ld_library_path = os.environ.pop('LD_LIBRARY_PATH') check_call(['./runGDB.sh', '--batch', '--nx', '--eval-command=bt', '--eval-command=p main', '--eval-command=p fork']) # restore LD_LIBRARY_PATH to its previous value if ld_library_path is not None: os.environ['LD_LIBRARY_PATH'] = ld_library_path os.chdir('..') # gzip runs much faster with -1 os.putenv('GZIP', '-1') # do not put the packcore results under master data, that will cause # failures in other tests os.chdir('/tmp') gphome = os.getenv('GPHOME') assert gphome diff --git a/src/test/isolation2/sql/packcore.sql b/src/test/isolation2/sql/packcore.sql index 7d3c0b681f1d..3e124f039d70 100644 --- a/src/test/isolation2/sql/packcore.sql +++ b/src/test/isolation2/sql/packcore.sql @@ -53,12 +53,19 @@ stderr: {stderr} if os.path.exists('/usr/bin/gdb'): # load the coredump and run some simple gdb commands os.chdir(dirname) + # remove LD_LIBRARY_PATH before invoking gdb + ld_library_path = None + if 'LD_LIBRARY_PATH' in os.environ: + ld_library_path = os.environ.pop('LD_LIBRARY_PATH') check_call(['./runGDB.sh', '--batch', '--nx', '--eval-command=bt', '--eval-command=p main', '--eval-command=p fork']) + # restore LD_LIBRARY_PATH to its previous value + if ld_library_path is not None: + os.environ['LD_LIBRARY_PATH'] = ld_library_path os.chdir('..') # gzip runs much faster with -1 From d5e169e5a3f8304698efd05eaa3a21ea80b705cb Mon Sep 17 00:00:00 2001 From: Shaoqi Bai Date: Thu, 5 Mar 2020 15:53:27 +0800 Subject: [PATCH 074/102] Fix unittest failure in syslogger_test.c The syslogger_test.c is failing by message: make[3]: *** [syslogger-check] Segmentation fault The root cause is not analyzed The use of time() mock seems pointless. We just remove it. 
Co-authored-by: Asim R P Co-authored-by: Daniel Gustafsson Co-authored-by: Shaoqi Bai --- src/backend/postmaster/test/syslogger_test.c | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/src/backend/postmaster/test/syslogger_test.c b/src/backend/postmaster/test/syslogger_test.c index efb1ff91390d..80a1e1753018 100644 --- a/src/backend/postmaster/test/syslogger_test.c +++ b/src/backend/postmaster/test/syslogger_test.c @@ -8,13 +8,6 @@ #include "../syslogger.c" -time_t -time(time_t *unused) -{ - return (time_t)mock(); -} - - static void test__open_alert_log_file__NonGucOpen(void **state) { @@ -37,11 +30,10 @@ test__logfile_getname(void **state) char *alert_file_name; alert_file_pattern = "alert_log"; - will_return(time, 12345); log_timezone = pg_tzset("GMT"); - alert_file_name = logfile_getname(time(NULL), NULL, "gpperfmon/logs", "alert_log-%F"); + alert_file_name = logfile_getname((pg_time_t) 12345, NULL, "gpperfmon/logs", "alert_log-%F"); assert_true(strcmp(alert_file_name, "gpperfmon/logs/alert_log-1970-01-01") == 0); } From c74341463335fcb12d1eb3f2cb04094a1efc1ea7 Mon Sep 17 00:00:00 2001 From: Zhenghua Lyu Date: Thu, 27 Feb 2020 14:46:27 +0800 Subject: [PATCH 075/102] Refactor and correct prefetch joinqual We introduces prefetch joinqual logic in commit fa762b and later refactoed it in commit 36a93ba. But the logic before does not handle things correct. Previously we finally check this until ExecInitXXX and forgot to check if there is joinqual. This commit fixes the issue for planner. It determines the bool field `prefetch_joinqual` during planning stage. The logic is first set the field to show the risk in create_xxxjoin_plan, and then in create_join_plan after we get the plan, if there is risk then we loop each joinqual to see if there is motion in it (The only case is the joinqual is a SubPlan). In 6X we add motions for SubPlan after we have gotten the plan. So at the time of this function calling, we do not know if the SubPlan contains motion. We might walk the plannedstmt to accurately determine this, but that makes the code not that clean. So, we adopt an over kill method here: any SubPlan contains motion. At least, this can avoid motion deadlock. Also this commit improves the logic of ExecPrefetchJoinQual to solve the github issue: https://github.com/greenplum-db/gpdb/issues/8677 It fixes this by flattening the bool expr of the joinqual and force each of them to be Prefetched. 
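To make the flattening idea described above concrete, here is a minimal standalone C sketch. It deliberately does not use the real GPDB node types (BoolExprState, List, ExprState); the names Expr, NodeKind and flatten() are invented for illustration only. It shows just the technique the commit message names: recursively collecting every leaf of a nested AND/OR tree into a flat list so each leaf can be evaluated unconditionally, defeating boolean short-circuit evaluation.

    /* Simplified sketch, not the actual flatten_logic_exprs(): binary
     * AND/OR nodes only, leaves carry a label instead of an ExprState. */
    #include <stdio.h>
    #include <stddef.h>

    typedef enum { NODE_LEAF, NODE_AND, NODE_OR } NodeKind;

    typedef struct Expr
    {
        NodeKind     kind;
        const char  *name;   /* leaf label, e.g. "NOT EXISTS (subplan)" */
        struct Expr *left;
        struct Expr *right;
    } Expr;

    /* Append every leaf under 'e' into out[]; returns the new count. */
    static int
    flatten(const Expr *e, const Expr **out, int count)
    {
        if (e == NULL)
            return count;
        if (e->kind == NODE_LEAF)
        {
            out[count++] = e;
            return count;
        }
        count = flatten(e->left, out, count);
        count = flatten(e->right, out, count);
        return count;
    }

    int
    main(void)
    {
        /* (a IS NULL) OR (NOT EXISTS (...)) -- the qual shape from issue 8677 */
        Expr leaf1 = { NODE_LEAF, "a IS NULL", NULL, NULL };
        Expr leaf2 = { NODE_LEAF, "NOT EXISTS (subplan)", NULL, NULL };
        Expr orex  = { NODE_OR, NULL, &leaf1, &leaf2 };

        const Expr *leaves[8];
        int         n = flatten(&orex, leaves, 0);

        /* Every leaf gets "prefetched" regardless of short-circuiting. */
        for (int i = 0; i < n; i++)
            printf("prefetch qual: %s\n", leaves[i]->name);
        return 0;
    }

The point of the flat list is that a qual such as "a IS NULL OR NOT EXISTS (...)" would otherwise short-circuit on the first disjunct and never execute the SubPlan, leaving its motion unmaterialized; forcing each leaf through evaluation avoids that.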
--- src/backend/executor/execUtils.c | 88 ++++++++++----- src/backend/executor/nodeHashjoin.c | 14 ++- src/backend/executor/nodeMergejoin.c | 14 ++- src/backend/executor/nodeNestloop.c | 17 ++- src/backend/optimizer/plan/createplan.c | 102 +++++++++++++++--- src/backend/utils/misc/guc_gp.c | 12 +++ src/include/executor/executor.h | 3 +- src/include/utils/sync_guc_name.h | 1 + src/test/regress/expected/deadlock2.out | 35 ++++++ src/test/regress/expected/join_gp.out | 68 ++++++++++++ .../regress/expected/join_gp_optimizer.out | 68 ++++++++++++ src/test/regress/sql/deadlock2.sql | 15 +++ src/test/regress/sql/join_gp.sql | 29 +++++ 13 files changed, 413 insertions(+), 53 deletions(-) diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c index 9ccd320f216f..e19d9db1a8b5 100644 --- a/src/backend/executor/execUtils.c +++ b/src/backend/executor/execUtils.c @@ -87,6 +87,7 @@ static bool index_recheck_constraint(Relation index, Oid *constr_procs, Datum *existing_values, bool *existing_isnull, Datum *new_values); static void ShutdownExprContext(ExprContext *econtext, bool isCommit); +static List *flatten_logic_exprs(Node *node); /* ---------------------------------------------------------------- @@ -1682,6 +1683,49 @@ ExecGetShareNodeEntry(EState* estate, int shareidx, bool fCreate) return (ShareNodeEntry *) list_nth(*estate->es_sharenode, shareidx); } +/* + * flatten_logic_exprs + * This function is only used by ExecPrefetchJoinQual. + * ExecPrefetchJoinQual need to prefetch subplan in join + * qual that contains motion to materialize it to avoid + * motion deadlock. This function is going to flatten + * the bool exprs to avoid shortcut of bool logic. + * An example is: + * (a and b or c) or (d or e and f or g) and (h and i or j) + * will be transformed to + * (a, b, c, d, e, f, g, h, i, j). + */ +static List * +flatten_logic_exprs(Node *node) +{ + if (node == NULL) + return NIL; + + if (IsA(node, BoolExprState)) + { + BoolExprState *be = (BoolExprState *) node; + return flatten_logic_exprs((Node *) (be->args)); + } + + if (IsA(node, List)) + { + List *es = (List *) node; + List *result = NIL; + ListCell *lc = NULL; + + foreach(lc, es) + { + Node *n = (Node *) lfirst(lc); + result = list_concat(result, + flatten_logic_exprs(n)); + } + + return result; + } + + return list_make1(node); +} + /* * Prefetch JoinQual to prevent motion hazard. * @@ -1709,7 +1753,7 @@ ExecGetShareNodeEntry(EState* estate, int shareidx, bool fCreate) * * Return true if the JoinQual is prefetched. */ -bool +void ExecPrefetchJoinQual(JoinState *node) { EState *estate = node->ps.state; @@ -1719,8 +1763,10 @@ ExecPrefetchJoinQual(JoinState *node) List *joinqual = node->joinqual; TupleTableSlot *innertuple = econtext->ecxt_innertuple; - if (!joinqual) - return false; + ListCell *lc = NULL; + List *quals = NIL; + + Assert(joinqual); /* Outer tuples should not be fetched before us */ Assert(econtext->ecxt_outertuple == NULL); @@ -1731,36 +1777,22 @@ ExecPrefetchJoinQual(JoinState *node) econtext->ecxt_outertuple = ExecInitNullTupleSlot(estate, ExecGetResultType(outer)); + quals = flatten_logic_exprs((Node *) joinqual); + /* Fetch subplan with the fake inner & outer tuples */ - ExecQual(joinqual, econtext, false); + foreach(lc, quals) + { + /* + * Force every joinqual is prefech because + * our target is to materialize motion node. 
+ */ + ExprState *clause = (ExprState *) lfirst(lc); + (void) ExecQual(list_make1(clause), econtext, false); + } /* Restore previous state */ econtext->ecxt_innertuple = innertuple; econtext->ecxt_outertuple = NULL; - - return true; -} - -/* - * Decide if should prefetch joinqual. - * - * Joinqual should be prefetched when both outer and joinqual contain motions. - * In create_*join_plan() functions we set prefetch_joinqual according to the - * outer motions, now we detect for joinqual motions to make the final - * decision. - * - * See ExecPrefetchJoinQual() for details. - * - * This function should be called in ExecInit*Join() functions. - * - * Return true if JoinQual should be prefetched. - */ -bool -ShouldPrefetchJoinQual(EState *estate, Join *join) -{ - return (join->prefetch_joinqual && - findSenderMotion(estate->es_plannedstmt, - estate->currentSliceIdInPlan)); } /* ---------------------------------------------------------------- diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c index 1c9b6cc7c6a9..59db6ac563c7 100644 --- a/src/backend/executor/nodeHashjoin.c +++ b/src/backend/executor/nodeHashjoin.c @@ -45,6 +45,8 @@ /* Returns true if doing null-fill on inner relation */ #define HJ_FILL_INNER(hjstate) ((hjstate)->hj_NullOuterTupleSlot != NULL) +extern bool Test_print_prefetch_joinqual; + static TupleTableSlot *ExecHashJoinOuterGetTuple(PlanState *outerNode, HashJoinState *hjstate, uint32 *hashvalue); @@ -239,8 +241,11 @@ ExecHashJoin_guts(HashJoinState *node) * * See ExecPrefetchJoinQual() for details. */ - if (node->prefetch_joinqual && ExecPrefetchJoinQual(&node->js)) + if (node->prefetch_joinqual) + { + ExecPrefetchJoinQual(&node->js); node->prefetch_joinqual = false; + } /* * We just scanned the entire inner side and built the hashtable @@ -600,7 +605,12 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags) * the fix to MPP-989) */ hjstate->prefetch_inner = node->join.prefetch_inner; - hjstate->prefetch_joinqual = ShouldPrefetchJoinQual(estate, &node->join); + hjstate->prefetch_joinqual = node->join.prefetch_joinqual; + + if (Test_print_prefetch_joinqual && hjstate->prefetch_joinqual) + elog(NOTICE, + "prefetch join qual in slice %d of plannode %d", + currentSliceId, ((Plan *) node)->plan_node_id); /* * initialize child nodes diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c index c34079124675..88ce1fc27192 100644 --- a/src/backend/executor/nodeMergejoin.c +++ b/src/backend/executor/nodeMergejoin.c @@ -160,6 +160,7 @@ typedef enum #define MarkInnerTuple(innerTupleSlot, mergestate) \ ExecCopySlot((mergestate)->mj_MarkedTupleSlot, (innerTupleSlot)) +extern bool Test_print_prefetch_joinqual; /* * MJExamineQuals @@ -685,8 +686,11 @@ ExecMergeJoin_guts(MergeJoinState *node) * * See ExecPrefetchJoinQual() for details. */ - if (node->prefetch_joinqual && ExecPrefetchJoinQual(&node->js)) + if (node->prefetch_joinqual) + { + ExecPrefetchJoinQual(&node->js); node->prefetch_joinqual = false; + } /* * ok, everything is setup.. 
let's go to work @@ -1572,7 +1576,13 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags) mergestate->mj_ConstFalseJoin = false; mergestate->prefetch_inner = node->join.prefetch_inner; - mergestate->prefetch_joinqual = ShouldPrefetchJoinQual(estate, &node->join); + mergestate->prefetch_joinqual = node->join.prefetch_joinqual; + + if (Test_print_prefetch_joinqual && mergestate->prefetch_joinqual) + elog(NOTICE, + "prefetch join qual in slice %d of plannode %d", + currentSliceId, ((Plan *) node)->plan_node_id); + /* Prepare inner operators for rewind after the prefetch */ rewindflag = mergestate->prefetch_inner ? EXEC_FLAG_REWIND : 0; diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c index ca697bd866ed..38f245670eb5 100644 --- a/src/backend/executor/nodeNestloop.c +++ b/src/backend/executor/nodeNestloop.c @@ -23,12 +23,15 @@ #include "postgres.h" +#include "cdb/cdbvars.h" #include "executor/execdebug.h" #include "executor/nodeNestloop.h" #include "optimizer/clauses.h" #include "utils/lsyscache.h" #include "utils/memutils.h" +extern bool Test_print_prefetch_joinqual; + static void splitJoinQualExpr(NestLoopState *nlstate); static void extractFuncExprArgs(FuncExprState *fstate, List **lclauses, List **rclauses); @@ -147,8 +150,11 @@ ExecNestLoop_guts(NestLoopState *node) * * See ExecPrefetchJoinQual() for details. */ - if (node->prefetch_joinqual && ExecPrefetchJoinQual(&node->js)) + if (node->prefetch_joinqual) + { + ExecPrefetchJoinQual(&node->js); node->prefetch_joinqual = false; + } /* * Ok, everything is setup for the join so now loop until we return a @@ -390,8 +396,13 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags) nlstate->shared_outer = node->shared_outer; nlstate->prefetch_inner = node->join.prefetch_inner; - nlstate->prefetch_joinqual = ShouldPrefetchJoinQual(estate, &node->join); - + nlstate->prefetch_joinqual = node->join.prefetch_joinqual; + + if (Test_print_prefetch_joinqual && nlstate->prefetch_joinqual) + elog(NOTICE, + "prefetch join qual in slice %d of plannode %d", + currentSliceId, ((Plan *) node)->plan_node_id); + /*CDB-OLAP*/ nlstate->reset_inner = false; nlstate->require_inner_reset = !node->singleton_outer; diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index cc9a4eb9e063..25fdbe25b12a 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -64,6 +64,14 @@ #include "cdb/cdbsreh.h" #include "cdb/cdbvars.h" +typedef struct +{ + plan_tree_base_prefix base; /* Required prefix for + * plan_tree_walker/mutator */ + Bitmapset *seen_subplans; + bool result; +} contain_motion_walk_context; + static Plan *create_subplan(PlannerInfo *root, Path *best_path); /* CDB */ static Plan *create_scan_plan(PlannerInfo *root, Path *best_path); static List *build_path_tlist(PlannerInfo *root, Path *path); @@ -188,6 +196,8 @@ static Motion *cdbpathtoplan_create_motion_plan(PlannerInfo *root, CdbMotionPath *path, Plan *subplan); static void append_initplan_for_function_scan(PlannerInfo *root, Path *best_path, Plan *plan); +static bool contain_motion(PlannerInfo *root, Node *node); +static bool contain_motion_walk(Node *node, contain_motion_walk_context *ctx); /* * GPDB_92_MERGE_FIXME: The following functions have been removed in PG 9.2 @@ -790,18 +800,6 @@ create_join_plan(PlannerInfo *root, JoinPath *best_path) if (partition_selector_created) ((Join *) plan)->prefetch_inner = true; - /* - * A motion deadlock can also happen when outer 
and joinqual both contain - * motions. It is not easy to check for joinqual here, so we set the - * prefetch_joinqual mark only according to outer motion, and check for - * joinqual later in the executor. - * - * See ExecPrefetchJoinQual() for details. - */ - if (best_path->outerjoinpath && - best_path->outerjoinpath->motionHazard) - ((Join *) plan)->prefetch_joinqual = true; - /* CDB: if the join's locus is bottleneck which means the * join gang only contains one process, so there is no * risk for motion deadlock. @@ -827,6 +825,20 @@ create_join_plan(PlannerInfo *root, JoinPath *best_path) plan->targetlist = add_to_flat_tlist_junk(plan->targetlist, plan->flow->hashExprs, true /* resjunk */ ); } + /* + * We may set prefetch_joinqual to true if there is + * potential risk when create_xxxjoin_plan. Here, we + * have all the information at hand, this is the final + * logic to set prefetch_joinqual. + */ + if (((Join *) plan)->prefetch_joinqual) + { + List *joinqual = ((Join *) plan)->joinqual; + + ((Join *) plan)->prefetch_joinqual = contain_motion(root, + (Node *) joinqual); + } + /* * If there are any pseudoconstant clauses attached to this node, insert a * gating Result node that evaluates the pseudoconstants as one-time @@ -3330,7 +3342,8 @@ create_nestloop_plan(PlannerInfo *root, * See ExecPrefetchJoinQual() for details. */ if (best_path->outerjoinpath && - best_path->outerjoinpath->motionHazard) + best_path->outerjoinpath->motionHazard && + join_plan->join.joinqual != NIL) join_plan->join.prefetch_joinqual = true; return join_plan; @@ -3667,14 +3680,16 @@ create_mergejoin_plan(PlannerInfo *root, * See ExecPrefetchJoinQual() for details. */ if (best_path->jpath.outerjoinpath && - best_path->jpath.outerjoinpath->motionHazard) + best_path->jpath.outerjoinpath->motionHazard && + join_plan->join.joinqual != NIL) join_plan->join.prefetch_joinqual = true; /* * If inner motion is not under a Material or Sort node then there could * also be motion deadlock between inner and joinqual in mergejoin. */ if (best_path->jpath.innerjoinpath && - best_path->jpath.innerjoinpath->motionHazard) + best_path->jpath.innerjoinpath->motionHazard && + join_plan->join.joinqual != NIL) join_plan->join.prefetch_joinqual = true; /* Costs of sort and material steps are included in path cost already */ @@ -3840,7 +3855,8 @@ create_hashjoin_plan(PlannerInfo *root, * See ExecPrefetchJoinQual() for details. */ if (best_path->jpath.outerjoinpath && - best_path->jpath.outerjoinpath->motionHazard) + best_path->jpath.outerjoinpath->motionHazard && + join_plan->join.joinqual != NIL) join_plan->join.prefetch_joinqual = true; copy_path_costsize(root, &join_plan->join.plan, &best_path->jpath.path); @@ -7321,3 +7337,57 @@ append_initplan_for_function_scan(PlannerInfo *root, Path *best_path, Plan *plan : NULL, &initplan->scan.plan); } + +/* + * contain_motion + * This function walks the joinqual list to see there is + * any motion node in it. The only case a qual contains motion + * is that it is a SubPlan and the SubPlan contains motion. + * XXX: an over kill method is used here, for details please + * refer to the following function `contain_motion_walk`. 
+ */ +static bool +contain_motion(PlannerInfo *root, Node *node) +{ + contain_motion_walk_context ctx; + planner_init_plan_tree_base(&ctx.base, root); + ctx.result = false; + ctx.seen_subplans = NULL; + + (void) contain_motion_walk(node, &ctx); + + return ctx.result; +} + +static bool +contain_motion_walk(Node *node, contain_motion_walk_context *ctx) +{ + if (ctx->result) + return true; + + if (node == NULL) + return false; + + if (IsA(node, SubPlan)) + { + /* + * In 6X we add motions for SubPlan after we have gotten + * the plan. So at the time of this function calling, + * we do not know if the SubPlan contains motion. We might + * walk the plannedstmt to accurately determine this, but + * that makes the code not that clean. So, we adopt an over + * kill method here: any SubPlan contains motion. At least, + * this can avoid motion deadlock. + */ + ctx->result = true; + return true; + } + + if (IsA(node, Motion)) + { + ctx->result = true; + return true; + } + + return plan_tree_walker((Node *) node, contain_motion_walk, ctx); +} diff --git a/src/backend/utils/misc/guc_gp.c b/src/backend/utils/misc/guc_gp.c index 0d7c28737567..8c5a46c3b5f0 100644 --- a/src/backend/utils/misc/guc_gp.c +++ b/src/backend/utils/misc/guc_gp.c @@ -132,6 +132,7 @@ bool Debug_appendonly_print_compaction = false; bool Debug_resource_group = false; bool Debug_bitmap_print_insert = false; bool Test_print_direct_dispatch_info = false; +bool Test_print_prefetch_joinqual = false; bool Test_copy_qd_qe_split = false; bool gp_permit_relation_node_change = false; int gp_max_local_distributed_cache = 1024; @@ -1547,6 +1548,17 @@ struct config_bool ConfigureNamesBool_gp[] = NULL, NULL, NULL }, + { + {"test_print_prefetch_joinqual", PGC_SUSET, DEVELOPER_OPTIONS, + gettext_noop("For testing purposes, print information about if we prefetch join qual."), + NULL, + GUC_SUPERUSER_ONLY | GUC_NO_SHOW_ALL | GUC_NOT_IN_SAMPLE + }, + &Test_print_prefetch_joinqual, + false, + NULL, NULL, NULL + }, + { {"test_copy_qd_qe_split", PGC_SUSET, DEVELOPER_OPTIONS, gettext_noop("For testing purposes, print information about which columns are parsed in QD and which in QE."), diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h index a2b4308f341c..1d14ba271fe5 100644 --- a/src/include/executor/executor.h +++ b/src/include/executor/executor.h @@ -470,8 +470,7 @@ extern void UnregisterExprContextCallback(ExprContext *econtext, /* Share input utilities defined in execUtils.c */ extern ShareNodeEntry * ExecGetShareNodeEntry(EState *estate, int shareid, bool fCreate); -extern bool ExecPrefetchJoinQual(JoinState *node); -extern bool ShouldPrefetchJoinQual(EState *estate, Join *join); +extern void ExecPrefetchJoinQual(JoinState *node); /* ResultRelInfo and Append Only segment assignment */ void ResultRelInfoSetSegno(ResultRelInfo *resultRelInfo, List *mapping); diff --git a/src/include/utils/sync_guc_name.h b/src/include/utils/sync_guc_name.h index def32ea01a42..0d4df234c4b9 100644 --- a/src/include/utils/sync_guc_name.h +++ b/src/include/utils/sync_guc_name.h @@ -104,6 +104,7 @@ "statement_timeout", "temp_buffers", "test_copy_qd_qe_split", + "test_print_prefetch_joinqual", "TimeZone", "verify_gpfdists_cert", "vmem_process_interrupt", diff --git a/src/test/regress/expected/deadlock2.out b/src/test/regress/expected/deadlock2.out index 22fff567732d..bd2a63418c09 100644 --- a/src/test/regress/expected/deadlock2.out +++ b/src/test/regress/expected/deadlock2.out @@ -89,8 +89,43 @@ insert into t_subplan select :x0, :x0 from 
generate_series(1,:scale) i; set enable_hashjoin to on; set enable_mergejoin to off; set enable_nestloop to off; +set Test_print_prefetch_joinqual = on; select count(*) from t_inner right join t_outer on t_inner.c2=t_outer.c2 and not exists (select 0 from t_subplan where t_subplan.c2=t_outer.c1); +NOTICE: prefetch join qual in slice 0 of plannode 4 +NOTICE: prefetch join qual in slice 5 of plannode 4 (seg0 slice5 127.0.1.1:6002 pid=27794) +NOTICE: prefetch join qual in slice 5 of plannode 4 (seg1 slice5 127.0.1.1:6003 pid=27795) +NOTICE: prefetch join qual in slice 5 of plannode 4 (seg2 slice5 127.0.1.1:6004 pid=27796) + count +------- + 10000 +(1 row) + +-- The logic of ExecPrefetchJoinQual is to use two null +-- tuples to fake inner and outertuple and then to ExecQual. +-- It may short cut if some previous qual is test null expr. +-- So ExecPrefetchJoinQual has to force ExecQual for each +-- qual expr in the joinqual list. See the Github issue +-- https://github.com/greenplum-db/gpdb/issues/8677 +-- for details. +select count(*) from t_inner right join t_outer on t_inner.c2=t_outer.c2 + and (t_inner.c1 is null or not exists (select 0 from t_subplan where t_subplan.c2=t_outer.c1)); +NOTICE: prefetch join qual in slice 0 of plannode 4 +NOTICE: prefetch join qual in slice 5 of plannode 4 (seg0 slice5 127.0.1.1:6002 pid=27794) +NOTICE: prefetch join qual in slice 5 of plannode 4 (seg1 slice5 127.0.1.1:6003 pid=27795) +NOTICE: prefetch join qual in slice 5 of plannode 4 (seg2 slice5 127.0.1.1:6004 pid=27796) + count +------- + 10000 +(1 row) + +select count(*) from t_inner right join t_outer on t_inner.c2=t_outer.c2 + and not exists (select 0 from t_subplan where t_subplan.c2=t_outer.c1) + and not exists (select 1 from t_subplan where t_subplan.c2=t_outer.c1); +NOTICE: prefetch join qual in slice 0 of plannode 4 +NOTICE: prefetch join qual in slice 7 of plannode 4 (seg0 slice7 127.0.1.1:6002 pid=27794) +NOTICE: prefetch join qual in slice 7 of plannode 4 (seg1 slice7 127.0.1.1:6003 pid=27795) +NOTICE: prefetch join qual in slice 7 of plannode 4 (seg2 slice7 127.0.1.1:6004 pid=27796) count ------- 10000 diff --git a/src/test/regress/expected/join_gp.out b/src/test/regress/expected/join_gp.out index 8f5a7a1a9d1b..24b60a258598 100644 --- a/src/test/regress/expected/join_gp.out +++ b/src/test/regress/expected/join_gp.out @@ -1049,3 +1049,71 @@ on true; 1 | Sun Jan 02 02:04:00 2000 PST | Sun Jan 02 03:04:00 2000 PST (2 rows) +-- test prefetch join qual +-- we do not handle this correct +-- the only case we need to prefetch join qual is: +-- 1. outer plan contains motion +-- 2. the join qual contains subplan that contains motion +reset client_min_messages; +set Test_print_prefetch_joinqual = true; +-- prefetch join qual is only set correct for planner +set optimizer = off; +create table t1_test_pretch_join_qual(a int, b int, c int); +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +create table t2_test_pretch_join_qual(a int, b int, c int); +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. 
+-- the following plan contains redistribute motion in both inner and outer plan +-- the join qual is t1.c > t2.c, it contains no motion, should not prefetch +explain (costs off) select * from t1_test_pretch_join_qual t1 join t2_test_pretch_join_qual t2 +on t1.b = t2.b and t1.c > t2.c; + QUERY PLAN +------------------------------------------------------------------ + Gather Motion 3:1 (slice3; segments: 3) + -> Hash Join + Hash Cond: (t1.b = t2.b) + Join Filter: (t1.c > t2.c) + -> Redistribute Motion 3:3 (slice1; segments: 3) + Hash Key: t1.b + -> Seq Scan on t1_test_pretch_join_qual t1 + -> Hash + -> Redistribute Motion 3:3 (slice2; segments: 3) + Hash Key: t2.b + -> Seq Scan on t2_test_pretch_join_qual t2 + Optimizer: Postgres query optimizer +(12 rows) + +create table t3_test_pretch_join_qual(a int, b int, c int); +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +-- the following plan contains motion in both outer plan and join qual, +-- so we should prefetch join qual +explain (costs off) select * from t1_test_pretch_join_qual t1 join t2_test_pretch_join_qual t2 +on t1.b = t2.b and t1.a > any (select sum(b) from t3_test_pretch_join_qual t3 where c > t2.a); +NOTICE: prefetch join qual in slice 0 of plannode 2 + QUERY PLAN +------------------------------------------------------------------------------- + Gather Motion 3:1 (slice4; segments: 3) + -> Hash Join + Hash Cond: (t1.b = t2.b) + Join Filter: (SubPlan 1) + -> Redistribute Motion 3:3 (slice1; segments: 3) + Hash Key: t1.b + -> Seq Scan on t1_test_pretch_join_qual t1 + -> Hash + -> Redistribute Motion 3:3 (slice2; segments: 3) + Hash Key: t2.b + -> Seq Scan on t2_test_pretch_join_qual t2 + SubPlan 1 (slice4; segments: 3) + -> Aggregate + -> Result + Filter: (t3.c > t2.a) + -> Materialize + -> Broadcast Motion 3:3 (slice3; segments: 3) + -> Seq Scan on t3_test_pretch_join_qual t3 + Optimizer: Postgres query optimizer +(19 rows) + +reset Test_print_prefetch_joinqual; +reset optimizer; diff --git a/src/test/regress/expected/join_gp_optimizer.out b/src/test/regress/expected/join_gp_optimizer.out index 93100a33d915..6e518e38642b 100644 --- a/src/test/regress/expected/join_gp_optimizer.out +++ b/src/test/regress/expected/join_gp_optimizer.out @@ -1065,3 +1065,71 @@ on true; 1 | Sun Jan 02 02:04:00 2000 PST | Sun Jan 02 03:04:00 2000 PST (2 rows) +-- test prefetch join qual +-- we do not handle this correct +-- the only case we need to prefetch join qual is: +-- 1. outer plan contains motion +-- 2. the join qual contains subplan that contains motion +reset client_min_messages; +set Test_print_prefetch_joinqual = true; +-- prefetch join qual is only set correct for planner +set optimizer = off; +create table t1_test_pretch_join_qual(a int, b int, c int); +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +create table t2_test_pretch_join_qual(a int, b int, c int); +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Greenplum Database data distribution key for this table. 
+HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +-- the following plan contains redistribute motion in both inner and outer plan +-- the join qual is t1.c > t2.c, it contains no motion, should not prefetch +explain (costs off) select * from t1_test_pretch_join_qual t1 join t2_test_pretch_join_qual t2 +on t1.b = t2.b and t1.c > t2.c; + QUERY PLAN +------------------------------------------------------------------ + Gather Motion 3:1 (slice3; segments: 3) + -> Hash Join + Hash Cond: (t1.b = t2.b) + Join Filter: (t1.c > t2.c) + -> Redistribute Motion 3:3 (slice1; segments: 3) + Hash Key: t1.b + -> Seq Scan on t1_test_pretch_join_qual t1 + -> Hash + -> Redistribute Motion 3:3 (slice2; segments: 3) + Hash Key: t2.b + -> Seq Scan on t2_test_pretch_join_qual t2 + Optimizer: Postgres query optimizer +(12 rows) + +create table t3_test_pretch_join_qual(a int, b int, c int); +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +-- the following plan contains motion in both outer plan and join qual, +-- so we should prefetch join qual +explain (costs off) select * from t1_test_pretch_join_qual t1 join t2_test_pretch_join_qual t2 +on t1.b = t2.b and t1.a > any (select sum(b) from t3_test_pretch_join_qual t3 where c > t2.a); +NOTICE: prefetch join qual in slice 0 of plannode 2 + QUERY PLAN +------------------------------------------------------------------------------- + Gather Motion 3:1 (slice4; segments: 3) + -> Hash Join + Hash Cond: (t1.b = t2.b) + Join Filter: (SubPlan 1) + -> Redistribute Motion 3:3 (slice1; segments: 3) + Hash Key: t1.b + -> Seq Scan on t1_test_pretch_join_qual t1 + -> Hash + -> Redistribute Motion 3:3 (slice2; segments: 3) + Hash Key: t2.b + -> Seq Scan on t2_test_pretch_join_qual t2 + SubPlan 1 (slice4; segments: 3) + -> Aggregate + -> Result + Filter: (t3.c > t2.a) + -> Materialize + -> Broadcast Motion 3:3 (slice3; segments: 3) + -> Seq Scan on t3_test_pretch_join_qual t3 + Optimizer: Postgres query optimizer +(19 rows) + +reset Test_print_prefetch_joinqual; +reset optimizer; diff --git a/src/test/regress/sql/deadlock2.sql b/src/test/regress/sql/deadlock2.sql index cd197dcb6e36..678db5894e80 100644 --- a/src/test/regress/sql/deadlock2.sql +++ b/src/test/regress/sql/deadlock2.sql @@ -92,6 +92,21 @@ insert into t_subplan select :x0, :x0 from generate_series(1,:scale) i; set enable_hashjoin to on; set enable_mergejoin to off; set enable_nestloop to off; +set Test_print_prefetch_joinqual = on; select count(*) from t_inner right join t_outer on t_inner.c2=t_outer.c2 and not exists (select 0 from t_subplan where t_subplan.c2=t_outer.c1); + +-- The logic of ExecPrefetchJoinQual is to use two null +-- tuples to fake inner and outertuple and then to ExecQual. +-- It may short cut if some previous qual is test null expr. +-- So ExecPrefetchJoinQual has to force ExecQual for each +-- qual expr in the joinqual list. See the Github issue +-- https://github.com/greenplum-db/gpdb/issues/8677 +-- for details. 
+select count(*) from t_inner right join t_outer on t_inner.c2=t_outer.c2 + and (t_inner.c1 is null or not exists (select 0 from t_subplan where t_subplan.c2=t_outer.c1)); + +select count(*) from t_inner right join t_outer on t_inner.c2=t_outer.c2 + and not exists (select 0 from t_subplan where t_subplan.c2=t_outer.c1) + and not exists (select 1 from t_subplan where t_subplan.c2=t_outer.c1); diff --git a/src/test/regress/sql/join_gp.sql b/src/test/regress/sql/join_gp.sql index f13cc14c24f9..06629f32c69f 100644 --- a/src/test/regress/sql/join_gp.sql +++ b/src/test/regress/sql/join_gp.sql @@ -472,6 +472,7 @@ reset enable_hashjoin; reset enable_mergejoin; reset enable_nestloop; + -- test lateral join inner plan contains limit -- we cannot pass params across motion so we -- can only generate a plan to gather all the @@ -515,3 +516,31 @@ inner join lateral (select myid, log_date as next_date from t_mylog_issue_8860 where myid = ml1.myid and log_date > ml1.log_date order by log_date asc limit 1) ml2 on true; + +-- test prefetch join qual +-- we do not handle this correct +-- the only case we need to prefetch join qual is: +-- 1. outer plan contains motion +-- 2. the join qual contains subplan that contains motion +reset client_min_messages; +set Test_print_prefetch_joinqual = true; +-- prefetch join qual is only set correct for planner +set optimizer = off; + +create table t1_test_pretch_join_qual(a int, b int, c int); +create table t2_test_pretch_join_qual(a int, b int, c int); + +-- the following plan contains redistribute motion in both inner and outer plan +-- the join qual is t1.c > t2.c, it contains no motion, should not prefetch +explain (costs off) select * from t1_test_pretch_join_qual t1 join t2_test_pretch_join_qual t2 +on t1.b = t2.b and t1.c > t2.c; + +create table t3_test_pretch_join_qual(a int, b int, c int); + +-- the following plan contains motion in both outer plan and join qual, +-- so we should prefetch join qual +explain (costs off) select * from t1_test_pretch_join_qual t1 join t2_test_pretch_join_qual t2 +on t1.b = t2.b and t1.a > any (select sum(b) from t3_test_pretch_join_qual t3 where c > t2.a); + +reset Test_print_prefetch_joinqual; +reset optimizer; From d59fbe1f8f3fee20a8c4d236e5a8731fe7b7435e Mon Sep 17 00:00:00 2001 From: Zhenghua Lyu Date: Fri, 6 Mar 2020 17:28:43 +0800 Subject: [PATCH 076/102] Fix PANIC if prefetch inner or joinqual that contains outerParams-ref. To avoid motion deadlock, Greenplum may decide to prefetch joinqual or inner plan. However, for NestLoop join, inner plan or joinqual may depend on outerParams. Previously, we do not handle outerParams correct in prefetch logic and thus may lead to PANIC. See Github Issue https://github.com/greenplum-db/gpdb/issues/9679 for Details. This commit fixes this by faking the outertuple to a null tuple and then build the params in econtext for NestLoop join's prefetch logic. 
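A minimal sketch of the failing pattern, mirroring the bfv_joins regression test added
below (tables a, b and c are fixtures that the test file already creates, so this is
illustrative rather than self-contained):

    -- Composite-typed column referenced through an outer Param in the join qual.
    create type mytype_prefetch_params as (x int, y int);
    alter table b add column mt_col mytype_prefetch_params;
    -- Before this fix, prefetching the inner plan (or the join qual) evaluated the
    -- qual before the outer Params were set up in econtext; dereferencing the
    -- composite value through the unset Param could PANIC.
    select a.*, b.i, c.* from a, b, c
    where ((mt_col).x > a.i or b.i = a.i) and (a.i + b.i) = c.j;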
--- src/backend/executor/execUtils.c | 53 +++++++++++++++++++ src/backend/executor/nodeNestloop.c | 20 +++++++ src/include/executor/executor.h | 1 + src/test/regress/expected/bfv_joins.out | 32 +++++++++++ .../regress/expected/bfv_joins_optimizer.out | 32 +++++++++++ src/test/regress/sql/bfv_joins.sql | 11 ++++ 6 files changed, 149 insertions(+) diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c index e19d9db1a8b5..bfd0d5399d76 100644 --- a/src/backend/executor/execUtils.c +++ b/src/backend/executor/execUtils.c @@ -1726,6 +1726,52 @@ flatten_logic_exprs(Node *node) return list_make1(node); } +/* + * fake_outer_params + * helper function to fake the nestloop's nestParams + * so that prefetch inner or prefetch joinqual will + * not encounter NULL pointer reference issue. It is + * only invoked in ExecNestLoop and ExecPrefetchJoinQual + * when the join is a nestloop join. + */ +void +fake_outer_params(JoinState *node) +{ + ExprContext *econtext = node->ps.ps_ExprContext; + PlanState *inner = innerPlanState(node); + TupleTableSlot *outerTupleSlot = econtext->ecxt_outertuple; + NestLoop *nl = (NestLoop *) (node->ps.plan); + ListCell *lc = NULL; + + /* only nestloop contains nestParams */ + Assert(IsA(node->ps.plan, NestLoop)); + + /* econtext->ecxt_outertuple must have been set fakely. */ + Assert(outerTupleSlot != NULL); + /* + * fetch the values of any outer Vars that must be passed to the + * inner scan, and store them in the appropriate PARAM_EXEC slots. + */ + foreach(lc, nl->nestParams) + { + NestLoopParam *nlp = (NestLoopParam *) lfirst(lc); + int paramno = nlp->paramno; + ParamExecData *prm; + + prm = &(econtext->ecxt_param_exec_vals[paramno]); + /* Param value should be an OUTER_VAR var */ + Assert(IsA(nlp->paramval, Var)); + Assert(nlp->paramval->varno == OUTER_VAR); + Assert(nlp->paramval->varattno > 0); + prm->value = slot_getattr(outerTupleSlot, + nlp->paramval->varattno, + &(prm->isnull)); + /* Flag parameter value as changed */ + inner->chgParam = bms_add_member(inner->chgParam, + paramno); + } +} + /* * Prefetch JoinQual to prevent motion hazard. * @@ -1777,6 +1823,13 @@ ExecPrefetchJoinQual(JoinState *node) econtext->ecxt_outertuple = ExecInitNullTupleSlot(estate, ExecGetResultType(outer)); + if (IsA(node->ps.plan, NestLoop)) + { + NestLoop *nl = (NestLoop *) (node->ps.plan); + if (nl->nestParams) + fake_outer_params(node); + } + quals = flatten_logic_exprs((Node *) joinqual); /* Fetch subplan with the fake inner & outer tuples */ diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c index 38f245670eb5..b8fee7e28a0a 100644 --- a/src/backend/executor/nodeNestloop.c +++ b/src/backend/executor/nodeNestloop.c @@ -111,6 +111,26 @@ ExecNestLoop_guts(NestLoopState *node) */ if (node->prefetch_inner) { + /* + * Prefetch inner is Greenplum specific behavior. + * However, inner plan may depend on outer plan as + * outerParams. If so, we have to fake those params + * to avoid null pointer reference issue. And because + * of the nestParams, those inner results prefetched + * will be discarded (following code will rescan inner, + * even if inner's top is material node because of chgParam + * it will be re-executed too) that it is safe to fake + * nestParams here. The target is to materialize motion scan. 
+ */ + if (nl->nestParams) + { + EState *estate = node->js.ps.state; + + econtext->ecxt_outertuple = ExecInitNullTupleSlot(estate, + ExecGetResultType(outerPlan)); + fake_outer_params(&(node->js)); + } + innerTupleSlot = ExecProcNode(innerPlan); node->reset_inner = true; econtext->ecxt_innertuple = innerTupleSlot; diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h index 1d14ba271fe5..348ab31bcb34 100644 --- a/src/include/executor/executor.h +++ b/src/include/executor/executor.h @@ -470,6 +470,7 @@ extern void UnregisterExprContextCallback(ExprContext *econtext, /* Share input utilities defined in execUtils.c */ extern ShareNodeEntry * ExecGetShareNodeEntry(EState *estate, int shareid, bool fCreate); +extern void fake_outer_params(JoinState *node); extern void ExecPrefetchJoinQual(JoinState *node); /* ResultRelInfo and Append Only segment assignment */ diff --git a/src/test/regress/expected/bfv_joins.out b/src/test/regress/expected/bfv_joins.out index 984bf0b4866a..9cf9f6ea8adb 100644 --- a/src/test/regress/expected/bfv_joins.out +++ b/src/test/regress/expected/bfv_joins.out @@ -3310,6 +3310,38 @@ select * from a, b, c where b.i = a.i and (a.i + b.i) = c.j; 1 | 1 | 2 | 2 (1 row) +-- The above plan will prefetch inner plan and the inner plan refers +-- outerParams. Previously, we do not handle this case correct and forgot +-- to set the Params for nestloop in econtext. The outer Param is a compound +-- data type instead of simple integer, it will lead to PANIC. +-- See Github Issue: https://github.com/greenplum-db/gpdb/issues/9679 +-- for details. +create type mytype_prefetch_params as (x int, y int); +alter table b add column mt_col mytype_prefetch_params; +explain select a.*, b.i, c.* from a, b, c where ((mt_col).x > a.i or b.i = a.i) and (a.i + b.i) = c.j; + QUERY PLAN +------------------------------------------------------------------------------------------------------------- + Gather Motion 3:1 (slice1; segments: 3) (cost=0.19..765906.72 rows=24 width=16) + -> Nested Loop (cost=0.19..765906.26 rows=8 width=16) + -> Broadcast Motion 3:3 (slice2; segments: 3) (cost=0.00..1.05 rows=1 width=36) + -> Seq Scan on b (cost=0.00..1.01 rows=1 width=36) + -> Materialize (cost=0.19..255301.72 rows=2 width=12) + -> Nested Loop (cost=0.19..255301.70 rows=2 width=12) + Join Filter: (((b.mt_col).x > a.i) OR (b.i = a.i)) + -> Materialize (cost=0.00..1.06 rows=1 width=4) + -> Broadcast Motion 3:3 (slice3; segments: 3) (cost=0.00..1.05 rows=1 width=4) + -> Seq Scan on a (cost=0.00..1.01 rows=1 width=4) + -> Index Only Scan using c_i_j_idx on c (cost=0.19..85100.20 rows=1 width=8) + Index Cond: (j = (a.i + b.i)) + Optimizer: Postgres query optimizer +(13 rows) + +select a.*, b.i, c.* from a, b, c where ((mt_col).x > a.i or b.i = a.i) and (a.i + b.i) = c.j; + i | i | i | j +---+---+---+--- + 1 | 1 | 2 | 2 +(1 row) + reset enable_hashjoin; reset enable_mergejoin; reset enable_nestloop; diff --git a/src/test/regress/expected/bfv_joins_optimizer.out b/src/test/regress/expected/bfv_joins_optimizer.out index ae042ac7e110..359b8925b6fd 100644 --- a/src/test/regress/expected/bfv_joins_optimizer.out +++ b/src/test/regress/expected/bfv_joins_optimizer.out @@ -3307,6 +3307,38 @@ select * from a, b, c where b.i = a.i and (a.i + b.i) = c.j; 1 | 1 | 2 | 2 (1 row) +-- The above plan will prefetch inner plan and the inner plan refers +-- outerParams. Previously, we do not handle this case correct and forgot +-- to set the Params for nestloop in econtext. 
The outer Param is a compound +-- data type instead of simple integer, it will lead to PANIC. +-- See Github Issue: https://github.com/greenplum-db/gpdb/issues/9679 +-- for details. +create type mytype_prefetch_params as (x int, y int); +alter table b add column mt_col mytype_prefetch_params; +explain select a.*, b.i, c.* from a, b, c where ((mt_col).x > a.i or b.i = a.i) and (a.i + b.i) = c.j; + QUERY PLAN +------------------------------------------------------------------------------------------------------------- + Gather Motion 3:1 (slice1; segments: 3) (cost=0.19..765906.72 rows=24 width=16) + -> Nested Loop (cost=0.19..765906.26 rows=8 width=16) + -> Broadcast Motion 3:3 (slice2; segments: 3) (cost=0.00..1.05 rows=1 width=36) + -> Seq Scan on b (cost=0.00..1.01 rows=1 width=36) + -> Materialize (cost=0.19..255301.72 rows=2 width=12) + -> Nested Loop (cost=0.19..255301.70 rows=2 width=12) + Join Filter: (((b.mt_col).x > a.i) OR (b.i = a.i)) + -> Materialize (cost=0.00..1.06 rows=1 width=4) + -> Broadcast Motion 3:3 (slice3; segments: 3) (cost=0.00..1.05 rows=1 width=4) + -> Seq Scan on a (cost=0.00..1.01 rows=1 width=4) + -> Index Only Scan using c_i_j_idx on c (cost=0.19..85100.20 rows=1 width=8) + Index Cond: (j = (a.i + b.i)) + Optimizer: Postgres query optimizer +(13 rows) + +select a.*, b.i, c.* from a, b, c where ((mt_col).x > a.i or b.i = a.i) and (a.i + b.i) = c.j; + i | i | i | j +---+---+---+--- + 1 | 1 | 2 | 2 +(1 row) + reset enable_hashjoin; reset enable_mergejoin; reset enable_nestloop; diff --git a/src/test/regress/sql/bfv_joins.sql b/src/test/regress/sql/bfv_joins.sql index 709c6b24ad68..91084579e6ed 100644 --- a/src/test/regress/sql/bfv_joins.sql +++ b/src/test/regress/sql/bfv_joins.sql @@ -305,6 +305,17 @@ explain (costs off) select * from a, b, c where b.i = a.i and (a.i + b.i) = c.j; select * from a, b, c where b.i = a.i and (a.i + b.i) = c.j; +-- The above plan will prefetch inner plan and the inner plan refers +-- outerParams. Previously, we do not handle this case correct and forgot +-- to set the Params for nestloop in econtext. The outer Param is a compound +-- data type instead of simple integer, it will lead to PANIC. +-- See Github Issue: https://github.com/greenplum-db/gpdb/issues/9679 +-- for details. +create type mytype_prefetch_params as (x int, y int); +alter table b add column mt_col mytype_prefetch_params; +explain select a.*, b.i, c.* from a, b, c where ((mt_col).x > a.i or b.i = a.i) and (a.i + b.i) = c.j; +select a.*, b.i, c.* from a, b, c where ((mt_col).x > a.i or b.i = a.i) and (a.i + b.i) = c.j; + reset enable_hashjoin; reset enable_mergejoin; reset enable_nestloop; From ce0b417bd9e4e96d0f2685ad5fe7fa14775b59c0 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Mon, 16 Jul 2018 13:35:41 +0200 Subject: [PATCH 077/102] Add plan_cache_mode setting This allows overriding the choice of custom or generic plan. 
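Usage sketch, mirroring the plancache regression test added below (the test_mode table
and the test_mode_pp prepared statement are created there):

    prepare test_mode_pp (int) as select count(*) from test_mode where a = $1;
    set plan_cache_mode to force_custom_plan;   -- replan for each bound parameter value
    explain (costs off) execute test_mode_pp(2);
    set plan_cache_mode to force_generic_plan;  -- always reuse the parameterized plan
    explain (costs off) execute test_mode_pp(2);
    set plan_cache_mode to auto;                -- default heuristic (the test drives it to
                                                -- a generic plan after five executions)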
Author: Pavel Stehule Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRAGLaiEm8ur5DWEBo7qHRWTk9HxkuUAz00CZZtJj-LkCA%40mail.gmail.com (cherry picked from commit f7cb2842bf47715133b40e4a503f35dbe60d1b72) --- doc/src/sgml/config.sgml | 30 ++ src/backend/utils/cache/plancache.c | 8 + src/backend/utils/misc/guc.c | 19 + src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/utils/plancache.h | 13 +- src/include/utils/unsync_guc_name.h | 1 + src/test/regress/expected/plancache.out | 91 +++++ .../regress/expected/plancache_optimizer.out | 352 ++++++++++++++++++ src/test/regress/sql/plancache.sql | 35 ++ 9 files changed, 549 insertions(+), 1 deletion(-) create mode 100644 src/test/regress/expected/plancache_optimizer.out diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 1143dada0ab9..5d6a76b428c6 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -3683,6 +3683,36 @@ SELECT * FROM parent WHERE key = 2400;
    + + plan_cache_mode (enum) + + plan_cache_mode configuration parameter + + + + + Prepared statements (either explicitly prepared or implicitly + generated, for example in PL/pgSQL) can be executed using custom or + generic plans. A custom plan is replanned for a new parameter value, + a generic plan is reused for repeated executions of the prepared + statement. The choice between them is normally made automatically. + This setting overrides the default behavior and forces either a custom + or a generic plan. This can be used to work around performance + problems in specific cases. Note, however, that the plan cache + behavior is subject to change, so this setting, like all settings that + force the planner's hand, should be reevaluated regularly. + + + + The allowed values are auto, + force_custom_plan and + force_generic_plan. The default value is + auto. The setting is applied when a cached plan is + to be executed, not when it is prepared. + + + + diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c index d8ec247b7558..d658755f3118 100644 --- a/src/backend/utils/cache/plancache.c +++ b/src/backend/utils/cache/plancache.c @@ -106,6 +106,8 @@ static void PlanCacheRelCallback(Datum arg, Oid relid); static void PlanCacheFuncCallback(Datum arg, int cacheid, uint32 hashvalue); static void PlanCacheSysCallback(Datum arg, int cacheid, uint32 hashvalue); +/* GUC parameter */ +int plan_cache_mode; /* * InitPlanCache: initialize module during InitPostgres. @@ -1030,6 +1032,12 @@ choose_custom_plan(CachedPlanSource *plansource, ParamListInfo boundParams, Into if (IsTransactionStmtPlan(plansource)) return false; + /* Let settings force the decision */ + if (plan_cache_mode == PLAN_CACHE_MODE_FORCE_GENERIC_PLAN) + return false; + if (plan_cache_mode == PLAN_CACHE_MODE_FORCE_CUSTOM_PLAN) + return true; + /* See if caller wants to force the decision */ if (plansource->cursor_options & CURSOR_OPT_GENERIC_PLAN) return false; diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 9e88f8ed7b77..c3f800aa3589 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -406,6 +406,13 @@ static const struct config_enum_entry huge_pages_options[] = { {NULL, 0, false} }; +static const struct config_enum_entry plan_cache_mode_options[] = { + {"auto", PLAN_CACHE_MODE_AUTO, false}, + {"force_generic_plan", PLAN_CACHE_MODE_FORCE_GENERIC_PLAN, false}, + {"force_custom_plan", PLAN_CACHE_MODE_FORCE_CUSTOM_PLAN, false}, + {NULL, 0, false} +}; + /* * Options for enum values stored in other modules */ @@ -3576,6 +3583,18 @@ static struct config_enum ConfigureNamesEnum[] = NULL, NULL, NULL }, + { + {"plan_cache_mode", PGC_USERSET, QUERY_TUNING_OTHER, + gettext_noop("Controls the planner's selection of custom or generic plan."), + gettext_noop("Prepared statements can have custom and generic plans, and the planner " + "will attempt to choose which is better. 
This can be set to override " + "the default behavior.") + }, + &plan_cache_mode, + PLAN_CACHE_MODE_AUTO, plan_cache_mode_options, + NULL, NULL, NULL + }, + /* End-of-list marker */ { {NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index 508045fe4b8c..59d1968e467f 100755 --- a/src/backend/utils/misc/postgresql.conf.sample +++ b/src/backend/utils/misc/postgresql.conf.sample @@ -310,6 +310,7 @@ max_prepared_transactions = 250 # can be 0 or more #gp_segments_for_planner = 0 # if 0, actual number of segments is used #gp_enable_direct_dispatch = on +#plan_cache_mode = auto optimizer_analyze_root_partition = on # stats collection on root partitions diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h index 95d75aae7e69..abb041e92a62 100644 --- a/src/include/utils/plancache.h +++ b/src/include/utils/plancache.h @@ -176,4 +176,15 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource, IntoClause *intoClause); extern void ReleaseCachedPlan(CachedPlan *plan, bool useResOwner); -#endif /* PLANCACHE_H */ +/* possible values for plan_cache_mode */ +typedef enum +{ + PLAN_CACHE_MODE_AUTO, + PLAN_CACHE_MODE_FORCE_GENERIC_PLAN, + PLAN_CACHE_MODE_FORCE_CUSTOM_PLAN +} PlanCacheMode; + +/* GUC parameter */ +extern int plan_cache_mode; + +#endif /* PLANCACHE_H */ diff --git a/src/include/utils/unsync_guc_name.h b/src/include/utils/unsync_guc_name.h index 09419a8ab720..737505939125 100644 --- a/src/include/utils/unsync_guc_name.h +++ b/src/include/utils/unsync_guc_name.h @@ -423,6 +423,7 @@ "optimizer_use_gpdb_allocators", "password_encryption", "password_hash_algorithm", + "plan_cache_mode", "pljava_classpath_insecure", "pljava_debug", "port", diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out index a64cc6692723..42fdb60f7851 100644 --- a/src/test/regress/expected/plancache.out +++ b/src/test/regress/expected/plancache.out @@ -262,3 +262,94 @@ NOTICE: 3 (1 row) +-- Test plan_cache_mode +create table test_mode (a int); +-- GPDB:setting the number of rows slightly higher to get a plan with +-- Index Only Scan (similar to upstream) +insert into test_mode select 1 from generate_series(1,15000) union all select 2; +create index on test_mode (a); +analyze test_mode; +prepare test_mode_pp (int) as select count(*) from test_mode where a = $1; +-- up to 5 executions, custom plan is used +explain (costs off) execute test_mode_pp(2); + QUERY PLAN +---------------------------------------------------------- + Aggregate + -> Gather Motion 1:1 (slice1; segments: 1) + -> Aggregate + -> Index Only Scan using test_mode_a_idx on test_mode + Index Cond: (a = 2) + Optimizer: Postgres query optimizer +(6 rows) + +-- force generic plan +set plan_cache_mode to force_generic_plan; +explain (costs off) execute test_mode_pp(2); + QUERY PLAN +----------------------------- + Aggregate + -> Gather Motion 3:1 (slice1; segments: 3) + -> Aggregate + -> Seq Scan on test_mode + Filter: (a = $1) + Optimizer: Postgres query optimizer +(6 rows) + +-- get to generic plan by 5 executions +set plan_cache_mode to auto; +execute test_mode_pp(1); -- 1x + count +------- + 15000 +(1 row) + +execute test_mode_pp(1); -- 2x + count +------- + 15000 +(1 row) + +execute test_mode_pp(1); -- 3x + count +------- + 15000 +(1 row) + +execute test_mode_pp(1); -- 4x + count +------- + 15000 +(1 row) + +execute test_mode_pp(1); -- 5x + count +------- + 15000 +(1 row) + +-- we 
should now get a really bad plan +explain (costs off) execute test_mode_pp(2); + QUERY PLAN +----------------------------- + Aggregate + -> Gather Motion 3:1 (slice1; segments: 3) + -> Aggregate + -> Seq Scan on test_mode + Filter: (a = $1) + Optimizer: Postgres query optimizer +(6 rows) + +-- but we can force a custom plan +set plan_cache_mode to force_custom_plan; +explain (costs off) execute test_mode_pp(2); + QUERY PLAN +---------------------------------------------------------- + Aggregate + -> Gather Motion 1:1 (slice1; segments: 1) + -> Aggregate + -> Index Only Scan using test_mode_a_idx on test_mode + Index Cond: (a = 2) + Optimizer: Postgres query optimizer +(6 rows) + +drop table test_mode; diff --git a/src/test/regress/expected/plancache_optimizer.out b/src/test/regress/expected/plancache_optimizer.out new file mode 100644 index 000000000000..37541d73c7ad --- /dev/null +++ b/src/test/regress/expected/plancache_optimizer.out @@ -0,0 +1,352 @@ +-- +-- Tests to exercise the plan caching/invalidation mechanism +-- +CREATE TEMP TABLE pcachetest AS SELECT * FROM int8_tbl; +-- create and use a cached plan +PREPARE prepstmt AS SELECT * FROM pcachetest; +EXECUTE prepstmt; + q1 | q2 +------------------+------------------- + 123 | 456 + 123 | 4567890123456789 + 4567890123456789 | 123 + 4567890123456789 | 4567890123456789 + 4567890123456789 | -4567890123456789 +(5 rows) + +-- and one with parameters +PREPARE prepstmt2(bigint) AS SELECT * FROM pcachetest WHERE q1 = $1; +EXECUTE prepstmt2(123); + q1 | q2 +-----+------------------ + 123 | 456 + 123 | 4567890123456789 +(2 rows) + +-- invalidate the plans and see what happens +DROP TABLE pcachetest; +EXECUTE prepstmt; +ERROR: relation "pcachetest" does not exist +EXECUTE prepstmt2(123); +ERROR: relation "pcachetest" does not exist +-- recreate the temp table (this demonstrates that the raw plan is +-- purely textual and doesn't depend on OIDs, for instance) +CREATE TEMP TABLE pcachetest AS SELECT * FROM int8_tbl; +EXECUTE prepstmt; + q1 | q2 +------------------+------------------- + 4567890123456789 | -4567890123456789 + 4567890123456789 | 123 + 123 | 456 + 123 | 4567890123456789 + 4567890123456789 | 4567890123456789 +(5 rows) + +EXECUTE prepstmt2(123); + q1 | q2 +-----+------------------ + 123 | 456 + 123 | 4567890123456789 +(2 rows) + +-- prepared statements should prevent change in output tupdesc, +-- since clients probably aren't expecting that to change on the fly +ALTER TABLE pcachetest ADD COLUMN q3 bigint; +EXECUTE prepstmt; +ERROR: cached plan must not change result type +EXECUTE prepstmt2(123); +ERROR: cached plan must not change result type +-- but we're nice guys and will let you undo your mistake +ALTER TABLE pcachetest DROP COLUMN q3; +EXECUTE prepstmt; + q1 | q2 +------------------+------------------- + 4567890123456789 | -4567890123456789 + 4567890123456789 | 123 + 123 | 456 + 123 | 4567890123456789 + 4567890123456789 | 4567890123456789 +(5 rows) + +EXECUTE prepstmt2(123); + q1 | q2 +-----+------------------ + 123 | 456 + 123 | 4567890123456789 +(2 rows) + +-- Try it with a view, which isn't directly used in the resulting plan +-- but should trigger invalidation anyway +CREATE TEMP VIEW pcacheview AS + SELECT * FROM pcachetest; +PREPARE vprep AS SELECT * FROM pcacheview; +EXECUTE vprep; + q1 | q2 +------------------+------------------- + 4567890123456789 | -4567890123456789 + 4567890123456789 | 123 + 123 | 456 + 123 | 4567890123456789 + 4567890123456789 | 4567890123456789 +(5 rows) + +CREATE OR REPLACE TEMP VIEW pcacheview AS + 
SELECT q1, q2/2 AS q2 FROM pcachetest; +EXECUTE vprep; + q1 | q2 +------------------+------------------- + 4567890123456789 | -2283945061728394 + 4567890123456789 | 61 + 123 | 228 + 123 | 2283945061728394 + 4567890123456789 | 2283945061728394 +(5 rows) + +-- Check basic SPI plan invalidation +create function cache_test(int) returns int as $$ +declare total int; +begin + create temp table t1(f1 int) distributed by (f1); + insert into t1 values($1); + insert into t1 values(11); + insert into t1 values(12); + insert into t1 values(13); + select sum(f1) into total from t1; + drop table t1; + return total; +end +$$ language plpgsql; +select cache_test(1); + cache_test +------------ + 37 +(1 row) + +select cache_test(2); + cache_test +------------ + 38 +(1 row) + +select cache_test(3); + cache_test +------------ + 39 +(1 row) + +-- Check invalidation of plpgsql "simple expression" +create temp view v1 as + select 2+2 as f1; +create function cache_test_2() returns int as $$ +begin + return f1 from v1; +end$$ language plpgsql; +select cache_test_2(); + cache_test_2 +-------------- + 4 +(1 row) + +create or replace temp view v1 as + select 2+2+4 as f1; +select cache_test_2(); + cache_test_2 +-------------- + 8 +(1 row) + +create or replace temp view v1 as + select 2+2+4+(select max(unique1) from tenk1) as f1; +select cache_test_2(); + cache_test_2 +-------------- + 10007 +(1 row) + +--- Check that change of search_path is honored when re-using cached plan +create schema s1 + create table abc (f1 int); +create schema s2 + create table abc (f1 int); +insert into s1.abc values(123); +insert into s2.abc values(456); +set search_path = s1; +prepare p1 as select f1 from abc; +execute p1; + f1 +----- + 123 +(1 row) + +set search_path = s2; +select f1 from abc; + f1 +----- + 456 +(1 row) + +execute p1; + f1 +----- + 456 +(1 row) + +alter table s1.abc add column f2 float8; -- force replan +execute p1; + f1 +----- + 456 +(1 row) + +drop schema s1 cascade; +NOTICE: drop cascades to table s1.abc +drop schema s2 cascade; +NOTICE: drop cascades to table abc +reset search_path; +-- Check that invalidation deals with regclass constants +create temp sequence seq; +prepare p2 as select nextval('seq'); +execute p2; + nextval +--------- + 1 +(1 row) + +drop sequence seq; +create temp sequence seq; +execute p2; + nextval +--------- + 1 +(1 row) + +-- Check DDL via SPI, immediately followed by SPI plan re-use +-- (bug in original coding) +create function cachebug() returns void as $$ +declare r int; +begin + drop table if exists temptable cascade; + -- Ignore NOTICE about missing DISTRIBUTED BY. It was annoying here, as + -- usually you would only see it on the first invocation, but sometimes + -- you'd also get it on the second invocation, if the plan cache + -- got invalidated in between the invocations. 
+ set client_min_messages=warning; + create temp table temptable as select * from generate_series(1,3) as f1; + reset client_min_messages; + create temp view vv as select * from temptable; + for r in select * from vv order by f1 loop + raise notice '%', r; + end loop; +end$$ language plpgsql; +select cachebug(); +NOTICE: table "temptable" does not exist, skipping +CONTEXT: SQL statement "drop table if exists temptable cascade" +PL/pgSQL function cachebug() line 4 at SQL statement +NOTICE: 1 +NOTICE: 2 +NOTICE: 3 + cachebug +---------- + +(1 row) + +select cachebug(); +NOTICE: drop cascades to view vv +CONTEXT: SQL statement "drop table if exists temptable cascade" +PL/pgSQL function cachebug() line 4 at SQL statement +NOTICE: 1 +NOTICE: 2 +NOTICE: 3 + cachebug +---------- + +(1 row) + +-- Test plan_cache_mode +create table test_mode (a int); +-- GPDB:setting the number of rows slightly higher to get a plan with +-- Index Only Scan (similar to upstream) +insert into test_mode select 1 from generate_series(1,15000) union all select 2; +create index on test_mode (a); +analyze test_mode; +prepare test_mode_pp (int) as select count(*) from test_mode where a = $1; +-- up to 5 executions, custom plan is used +explain (costs off) execute test_mode_pp(2); + QUERY PLAN +---------------------------------------------------------- + Aggregate + -> Gather Motion 3:1 (slice1; segments: 3) + -> Index Scan using test_mode_a_idx on test_mode + Index Cond: (a = 2) + Optimizer: Pivotal Optimizer (GPORCA) version 3.93.0 +(5 rows) + +-- force generic plan +set plan_cache_mode to force_generic_plan; +explain (costs off) execute test_mode_pp(2); + QUERY PLAN +----------------------------- + Aggregate + -> Gather Motion 3:1 (slice1; segments: 3) + -> Aggregate + -> Seq Scan on test_mode + Filter: (a = $1) + Optimizer: Postgres query optimizer +(6 rows) + +-- get to generic plan by 5 executions +set plan_cache_mode to auto; +execute test_mode_pp(1); -- 1x + count +------- + 15000 +(1 row) + +execute test_mode_pp(1); -- 2x + count +------- + 15000 +(1 row) + +execute test_mode_pp(1); -- 3x + count +------- + 15000 +(1 row) + +execute test_mode_pp(1); -- 4x + count +------- + 15000 +(1 row) + +execute test_mode_pp(1); -- 5x + count +------- + 15000 +(1 row) + +-- we should now get a really bad plan +explain (costs off) execute test_mode_pp(2); + QUERY PLAN +----------------------------- + Aggregate + -> Gather Motion 3:1 (slice1; segments: 3) + -> Index Scan using test_mode_a_idx on test_mode + Index Cond: (a = 2) + Optimizer: Pivotal Optimizer (GPORCA) version 3.93.0 +(5 rows) + +-- but we can force a custom plan +set plan_cache_mode to force_custom_plan; +explain (costs off) execute test_mode_pp(2); + QUERY PLAN +---------------------------------------------------------- + Aggregate + -> Gather Motion 3:1 (slice1; segments: 3) + -> Index Scan using test_mode_a_idx on test_mode + Index Cond: (a = 2) + Optimizer: Pivotal Optimizer (GPORCA) version 3.93.0 +(5 rows) + +drop table test_mode; diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql index aeed6d936a75..51a648cea038 100644 --- a/src/test/regress/sql/plancache.sql +++ b/src/test/regress/sql/plancache.sql @@ -162,3 +162,38 @@ end$$ language plpgsql; select cachebug(); select cachebug(); + +-- Test plan_cache_mode + +create table test_mode (a int); +-- GPDB:setting the number of rows slightly higher to get a plan with +-- Index Only Scan (similar to upstream) +insert into test_mode select 1 from generate_series(1,15000) union all 
select 2; +create index on test_mode (a); +analyze test_mode; + +prepare test_mode_pp (int) as select count(*) from test_mode where a = $1; + +-- up to 5 executions, custom plan is used +explain (costs off) execute test_mode_pp(2); + +-- force generic plan +set plan_cache_mode to force_generic_plan; +explain (costs off) execute test_mode_pp(2); + +-- get to generic plan by 5 executions +set plan_cache_mode to auto; +execute test_mode_pp(1); -- 1x +execute test_mode_pp(1); -- 2x +execute test_mode_pp(1); -- 3x +execute test_mode_pp(1); -- 4x +execute test_mode_pp(1); -- 5x + +-- we should now get a really bad plan +explain (costs off) execute test_mode_pp(2); + +-- but we can force a custom plan +set plan_cache_mode to force_custom_plan; +explain (costs off) execute test_mode_pp(2); + +drop table test_mode; From 61b9f87c46b2d800d1f3ec2876394778c7a9075c Mon Sep 17 00:00:00 2001 From: Bhuvnesh Chaudhary Date: Mon, 2 Mar 2020 12:59:46 -0800 Subject: [PATCH 078/102] Return early if toast table is not required While attempting to create a toast table in case of binary upgrade, preassigned oids are required to see if a toast table is required. Currently, in the code, the check to see if there is AccessExclusiveLock is executed early even though a toast table is not required. In case, there is a no preassigned oid to create a toast table, we should bail out early as no AccessExclusiveLock would have been taken on the relation. --- .../expected/alter_statistic_on_column.out | 6 ++ .../upgraded_alter_statistic_on_column.out | 10 ++++ .../test/integration/gpdb5_schedule | 2 +- .../test/integration/gpdb6_schedule | 2 +- .../sql/alter_statistic_on_column.sql | 5 ++ .../upgraded_alter_statistic_on_column.sql | 2 + src/backend/catalog/toasting.c | 60 +++++++++++-------- 7 files changed, 59 insertions(+), 28 deletions(-) create mode 100644 contrib/pg_upgrade/test/integration/expected/alter_statistic_on_column.out create mode 100644 contrib/pg_upgrade/test/integration/expected/upgraded_alter_statistic_on_column.out create mode 100644 contrib/pg_upgrade/test/integration/sql/alter_statistic_on_column.sql create mode 100644 contrib/pg_upgrade/test/integration/sql/upgraded_alter_statistic_on_column.sql diff --git a/contrib/pg_upgrade/test/integration/expected/alter_statistic_on_column.out b/contrib/pg_upgrade/test/integration/expected/alter_statistic_on_column.out new file mode 100644 index 000000000000..ec46aa69640a --- /dev/null +++ b/contrib/pg_upgrade/test/integration/expected/alter_statistic_on_column.out @@ -0,0 +1,6 @@ +CREATE TABLE explicitly_set_statistic_table ( col1 integer NOT NULL ); +CREATE +ALTER TABLE ONLY explicitly_set_statistic_table ALTER COLUMN col1 SET STATISTICS 10; +ALTER +INSERT INTO explicitly_set_statistic_table SELECT i FROM generate_series(1,10)i; +INSERT 10 diff --git a/contrib/pg_upgrade/test/integration/expected/upgraded_alter_statistic_on_column.out b/contrib/pg_upgrade/test/integration/expected/upgraded_alter_statistic_on_column.out new file mode 100644 index 000000000000..5023c68379bf --- /dev/null +++ b/contrib/pg_upgrade/test/integration/expected/upgraded_alter_statistic_on_column.out @@ -0,0 +1,10 @@ +SELECT count(*) FROM explicitly_set_statistic_table; + count +------- + 10 +(1 row) +SELECT attname, attstattarget from pg_attribute, pg_class where attrelid=oid and relname='explicitly_set_statistic_table' and attname='col1'; + attname | attstattarget +---------+--------------- + col1 | 10 +(1 row) diff --git a/contrib/pg_upgrade/test/integration/gpdb5_schedule 
b/contrib/pg_upgrade/test/integration/gpdb5_schedule index d97eac218b29..e80062335a7b 100644 --- a/contrib/pg_upgrade/test/integration/gpdb5_schedule +++ b/contrib/pg_upgrade/test/integration/gpdb5_schedule @@ -1,4 +1,4 @@ -test: heap_table ao_table aoco_table partitioned_heap_table partitioned_ao_table partitioned_aoco_table user_defined_types pl_functions external_table check_constraints ao_table_without_base_relfilenode user_defined_aggregates +test: heap_table ao_table aoco_table partitioned_heap_table partitioned_ao_table partitioned_aoco_table user_defined_types pl_functions external_table check_constraints ao_table_without_base_relfilenode user_defined_aggregates alter_statistic_on_column test: exchange_partition_heap_table test: partitioned_heap_table_with_differently_sized_dropped_columns test: partitioned_heap_table_with_differently_aligned_dropped_columns diff --git a/contrib/pg_upgrade/test/integration/gpdb6_schedule b/contrib/pg_upgrade/test/integration/gpdb6_schedule index 2fbdee103288..2eab01394cd9 100644 --- a/contrib/pg_upgrade/test/integration/gpdb6_schedule +++ b/contrib/pg_upgrade/test/integration/gpdb6_schedule @@ -1,4 +1,4 @@ -test: upgraded_heap_table upgraded_ao_table upgraded_aoco_table upgraded_partitioned_heap_table upgraded_partitioned_ao_table upgraded_partitioned_aoco_table upgraded_user_defined_types upgraded_pl_functions upgraded_external_table upgraded_unsupported_dist_coltypes upgraded_unsupported_name_col upgraded_unsupported_tsquery_col upgraded_check_constraints upgraded_gphdfs upgraded_mismatched_aopartition_indexes upgraded_ao_table_without_base_relfilenode upgraded_different_name_index_backed_constraint upgraded_user_defined_aggregates +test: upgraded_heap_table upgraded_ao_table upgraded_aoco_table upgraded_partitioned_heap_table upgraded_partitioned_ao_table upgraded_partitioned_aoco_table upgraded_user_defined_types upgraded_pl_functions upgraded_external_table upgraded_unsupported_dist_coltypes upgraded_unsupported_name_col upgraded_unsupported_tsquery_col upgraded_check_constraints upgraded_gphdfs upgraded_mismatched_aopartition_indexes upgraded_ao_table_without_base_relfilenode upgraded_different_name_index_backed_constraint upgraded_user_defined_aggregates upgraded_alter_statistic_on_column test: upgraded_exchange_partition_heap_table test: upgraded_partitioned_heap_table_with_differently_sized_dropped_columns test: upgraded_partitioned_heap_table_with_differently_aligned_dropped_columns diff --git a/contrib/pg_upgrade/test/integration/sql/alter_statistic_on_column.sql b/contrib/pg_upgrade/test/integration/sql/alter_statistic_on_column.sql new file mode 100644 index 000000000000..3361881a3b28 --- /dev/null +++ b/contrib/pg_upgrade/test/integration/sql/alter_statistic_on_column.sql @@ -0,0 +1,5 @@ +CREATE TABLE explicitly_set_statistic_table ( + col1 integer NOT NULL +); +ALTER TABLE ONLY explicitly_set_statistic_table ALTER COLUMN col1 SET STATISTICS 10; +INSERT INTO explicitly_set_statistic_table SELECT i FROM generate_series(1,10)i; diff --git a/contrib/pg_upgrade/test/integration/sql/upgraded_alter_statistic_on_column.sql b/contrib/pg_upgrade/test/integration/sql/upgraded_alter_statistic_on_column.sql new file mode 100644 index 000000000000..03eab6642368 --- /dev/null +++ b/contrib/pg_upgrade/test/integration/sql/upgraded_alter_statistic_on_column.sql @@ -0,0 +1,2 @@ +SELECT count(*) FROM explicitly_set_statistic_table; +SELECT attname, attstattarget from pg_attribute, pg_class where attrelid=oid and relname='explicitly_set_statistic_table' 
and attname='col1'; diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c index 202f8030768b..88ba2dfb3a7f 100644 --- a/src/backend/catalog/toasting.c +++ b/src/backend/catalog/toasting.c @@ -210,6 +210,23 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, if (rel->rd_rel->reltoastrelid != InvalidOid) return false; + /* + * Toast tables for regular relations go in pg_toast; those for temp + * relations go into the per-backend temp-toast-table namespace. + */ + if (isTempOrToastNamespace(rel->rd_rel->relnamespace)) + namespaceid = GetTempToastNamespace(); + else + namespaceid = PG_TOAST_NAMESPACE; + + /* + * Create the toast table name and its index name + */ + snprintf(toast_relname, sizeof(toast_relname), + "pg_toast_%u", relOid); + snprintf(toast_idxname, sizeof(toast_idxname), + "pg_toast_%u_index", relOid); + /* * Check to see whether the table actually needs a TOAST table. */ @@ -250,6 +267,23 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, * !OidIsValid(binary_upgrade_next_toast_pg_type_oid)) * return false; */ + + /* + * Check if there is a preassigned oid for the toast relation, bailout + * if not. + * When the default statistic target for a column is not the default, pg_dump + * will dump an ALTER command to set the statistics for the + * target column. Thus, during restore, it will come here to add a toast table + * if required. + * + */ + toastOid = GetPreassignedOidForRelation(namespaceid, toast_relname); + if (!OidIsValid(toastOid)) + return false; + + /* Use binary-upgrade override for pg_type.oid */ + toast_typid = GetPreassignedOidForType(namespaceid, toast_relname, true); + } /* @@ -259,14 +293,6 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, if (check && lockmode != AccessExclusiveLock) elog(ERROR, "AccessExclusiveLock required to add toast table."); - /* - * Create the toast table and its index - */ - snprintf(toast_relname, sizeof(toast_relname), - "pg_toast_%u", relOid); - snprintf(toast_idxname, sizeof(toast_idxname), - "pg_toast_%u_index", relOid); - /* this is pretty painful... need a tuple descriptor */ tupdesc = CreateTemplateTupleDesc(3, false); TupleDescInitEntry(tupdesc, (AttrNumber) 1, @@ -291,24 +317,6 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, tupdesc->attrs[1]->attstorage = 'p'; tupdesc->attrs[2]->attstorage = 'p'; - /* - * Toast tables for regular relations go in pg_toast; those for temp - * relations go into the per-backend temp-toast-table namespace. 
- */ - if (isTempOrToastNamespace(rel->rd_rel->relnamespace)) - namespaceid = GetTempToastNamespace(); - else - namespaceid = PG_TOAST_NAMESPACE; - - /* Use binary-upgrade override for pg_type.oid */ - if (IsBinaryUpgrade) - { - toastOid = GetPreassignedOidForRelation(namespaceid, toast_relname); - if (!OidIsValid(toastOid)) - return false; - toast_typid = GetPreassignedOidForType(namespaceid, toast_relname, true); - } - toast_relid = heap_create_with_catalog(toast_relname, namespaceid, rel->rd_rel->reltablespace, From ea2b71bd36dfcde6dd68e0a4e52cf6c8a466bc9b Mon Sep 17 00:00:00 2001 From: Ashuka Xue Date: Tue, 3 Mar 2020 14:45:16 -0800 Subject: [PATCH 079/102] Add optimizer_enable_range_predicate_dpe GUC This commit adds a GUC needed to enable/disable ORCA traceflag introduced in ORCA commit : "Allow only equality comparisons for Dynamic Partition Elimination" --- concourse/tasks/compile_gpdb.yml | 2 +- config/orca.m4 | 4 ++-- configure | 4 ++-- depends/conanfile_orca.txt | 2 +- src/backend/gpopt/config/CConfigParamMapping.cpp | 6 ++++++ src/backend/utils/misc/guc_gp.c | 11 +++++++++++ src/include/utils/guc.h | 1 + src/include/utils/unsync_guc_name.h | 1 + 8 files changed, 25 insertions(+), 6 deletions(-) diff --git a/concourse/tasks/compile_gpdb.yml b/concourse/tasks/compile_gpdb.yml index dc5d1a3d3a5e..cfa63a24b938 100644 --- a/concourse/tasks/compile_gpdb.yml +++ b/concourse/tasks/compile_gpdb.yml @@ -19,5 +19,5 @@ params: BLD_TARGETS: OUTPUT_ARTIFACT_DIR: gpdb_artifacts CONFIGURE_FLAGS: - ORCA_TAG: v3.93.0 + ORCA_TAG: v3.94.0 RC_BUILD_TYPE_GCS: diff --git a/config/orca.m4 b/config/orca.m4 index ed9328808193..fc81f7c219d7 100644 --- a/config/orca.m4 +++ b/config/orca.m4 @@ -40,10 +40,10 @@ AC_RUN_IFELSE([AC_LANG_PROGRAM([[ #include ]], [ -return strncmp("3.93.", GPORCA_VERSION_STRING, 5); +return strncmp("3.94.", GPORCA_VERSION_STRING, 5); ])], [AC_MSG_RESULT([[ok]])], -[AC_MSG_ERROR([Your ORCA version is expected to be 3.93.XXX])] +[AC_MSG_ERROR([Your ORCA version is expected to be 3.94.XXX])] ) AC_LANG_POP([C++]) ])# PGAC_CHECK_ORCA_VERSION diff --git a/configure b/configure index e908f34d4af8..cc949c5a4d56 100755 --- a/configure +++ b/configure @@ -14948,7 +14948,7 @@ int main () { -return strncmp("3.93.", GPORCA_VERSION_STRING, 5); +return strncmp("3.94.", GPORCA_VERSION_STRING, 5); ; return 0; @@ -14958,7 +14958,7 @@ if ac_fn_cxx_try_run "$LINENO"; then : { $as_echo "$as_me:${as_lineno-$LINENO}: result: ok" >&5 $as_echo "ok" >&6; } else - as_fn_error $? "Your ORCA version is expected to be 3.93.XXX" "$LINENO" 5 + as_fn_error $? 
"Your ORCA version is expected to be 3.94.XXX" "$LINENO" 5 fi rm -f core *.core core.conftest.* gmon.out bb.out conftest$ac_exeext \ diff --git a/depends/conanfile_orca.txt b/depends/conanfile_orca.txt index e144cc0c49a0..62c467d4c398 100644 --- a/depends/conanfile_orca.txt +++ b/depends/conanfile_orca.txt @@ -1,5 +1,5 @@ [requires] -orca/v3.93.0@gpdb/stable +orca/v3.94.0@gpdb/stable [imports] include, * -> build/include diff --git a/src/backend/gpopt/config/CConfigParamMapping.cpp b/src/backend/gpopt/config/CConfigParamMapping.cpp index 3203f4e5b687..960b12678968 100644 --- a/src/backend/gpopt/config/CConfigParamMapping.cpp +++ b/src/backend/gpopt/config/CConfigParamMapping.cpp @@ -403,6 +403,12 @@ CConfigParamMapping::SConfigMappingElem CConfigParamMapping::m_elements[] = &optimizer_prune_unused_columns, true, // m_negate_param GPOS_WSZ_LIT("Prune unused columns from the query.") + }, + { + EopttraceAllowGeneralPredicatesforDPE, + &optimizer_enable_range_predicate_dpe, + false, // m_negate_param + GPOS_WSZ_LIT("Enable range predicates for dynamic partition elimination.") } }; diff --git a/src/backend/utils/misc/guc_gp.c b/src/backend/utils/misc/guc_gp.c index 8c5a46c3b5f0..921e7c4360c4 100644 --- a/src/backend/utils/misc/guc_gp.c +++ b/src/backend/utils/misc/guc_gp.c @@ -408,6 +408,7 @@ bool optimizer_cte_inlining; bool optimizer_enable_space_pruning; bool optimizer_enable_associativity; bool optimizer_enable_eageragg; +bool optimizer_enable_range_predicate_dpe; /* Analyze related GUCs for Optimizer */ bool optimizer_analyze_root_partition; @@ -3019,6 +3020,16 @@ struct config_bool ConfigureNamesBool_gp[] = NULL, NULL, NULL }, + { + {"optimizer_enable_range_predicate_dpe", PGC_USERSET, DEVELOPER_OPTIONS, + gettext_noop("Enable range predicates for dynamic partition elimination."), + NULL, + GUC_NO_SHOW_ALL | GUC_NOT_IN_SAMPLE + }, + &optimizer_enable_range_predicate_dpe, + false, + NULL, NULL, NULL + }, /* End-of-list marker */ { {NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h index f1de9465c437..42d819928982 100644 --- a/src/include/utils/guc.h +++ b/src/include/utils/guc.h @@ -533,6 +533,7 @@ extern bool optimizer_array_constraints; extern bool optimizer_cte_inlining; extern bool optimizer_enable_space_pruning; extern bool optimizer_enable_associativity; +extern bool optimizer_enable_range_predicate_dpe; /* Analyze related GUCs for Optimizer */ extern bool optimizer_analyze_root_partition; diff --git a/src/include/utils/unsync_guc_name.h b/src/include/utils/unsync_guc_name.h index 737505939125..353d4f137660 100644 --- a/src/include/utils/unsync_guc_name.h +++ b/src/include/utils/unsync_guc_name.h @@ -370,6 +370,7 @@ "optimizer_enable_partial_index", "optimizer_enable_partition_propagation", "optimizer_enable_partition_selection", + "optimizer_enable_range_predicate_dpe", "optimizer_enable_sort", "optimizer_enable_space_pruning", "optimizer_enable_streaming_material", From 6eac24a35e9bab63cea14110946a32289baa8f93 Mon Sep 17 00:00:00 2001 From: Mel Kiyama Date: Wed, 11 Mar 2020 15:01:31 -0700 Subject: [PATCH 080/102] docs - add examples for deprecated timestamp format YYYYMMDDHH24MISS. (#9665) Add examples to note when migration from GPDB 4,5 to 6 What fails in 6 and some workarounds. 
--- gpdb-doc/dita/install_guide/migrate.xml | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/gpdb-doc/dita/install_guide/migrate.xml b/gpdb-doc/dita/install_guide/migrate.xml index f9587b2ea53e..60e6fcb94c75 100644 --- a/gpdb-doc/dita/install_guide/migrate.xml +++ b/gpdb-doc/dita/install_guide/migrate.xml @@ -216,11 +216,24 @@ not permit tables DISTRIBUTED RANDOMLY to have primary keys or unique indexes. Restoring such a table from a Greenplum 4.3 or 5 backup will cause an error.
  • Greenplum 6 no longer automatically converts from the deprecated timestamp format - YYYYMMDDHH24MISS. The format could not be parsed unambiguously in previous Greenplum - Database releases. You can still specify the YYYYMMDDHH24MISS format in conversion - functions such as to_timestamp and to_char for - compatibility with other database systems. You can use input formats for converting text - to date or timestamp values to avoid unexpected results or query execution failures.
  • + YYYYMMDDHH24MISS. The format could not be parsed unambiguously in + previous Greenplum Database releases. You can still specify the + YYYYMMDDHH24MISS format in conversion functions such as + to_timestamp and to_char for compatibility with other + database systems. You can use input formats for converting text to date or timestamp + values to avoid unexpected results or query execution failures. For example, this + SELECT command returns a timestamp in Greenplum Database 5 and fails in + 6.SELECT to_timestamp('20190905140000');

    To convert the string + to a timestamp in Greenplum Database 6, you must use a valid format. Both of these + commands return a timestamp in Greenplum Database 6. The first example explicitly + specifies a timestamp format. The second example uses the string in a format that + Greenplum Database + recognizes.SELECT to_timestamp('20190905140000','YYYYMMDDHH24MISS'); +SELECT to_timestamp('201909051 40000');

    The + timestamp issue also applies when you use the :: syntax. In Greenplum + Database 6, the first command returns an error. The second command returns a timestamp. + SELECT '20190905140000'::timestamp ; +SELECT '20190905 140000'::timestamp ;

  • Creating a table using the CREATE TABLE AS command in Greenplum 4.3 or 5 could create a table with a duplicate distribution key. The gpbackup utility saves the table to the backup using a CREATE TABLE command that From 01166586be31a78d1235eb689a6da37fd74a87f4 Mon Sep 17 00:00:00 2001 From: Mel Kiyama Date: Wed, 11 Mar 2020 15:10:05 -0700 Subject: [PATCH 081/102] docs - CREATE FUNCTION - new attribute EXECUTE ON INITPLAN (#9693) * docs - CREATE FUNCTION - new EXECUTE ON INITPLAN attribute Updated CREATE FUNCTION ALTER FUNCTION Using Functions and Operators This will be backported to 6X_STABLE HTML output on temporary GPDB doc review site https://docs-msk-gpdb6-dev.cfapps.io/7-0/ref_guide/sql_commands/CREATE_FUNCTION.html https://docs-msk-gpdb6-dev.cfapps.io/7-0/ref_guide/sql_commands/ALTER_FUNCTION.html https://docs-msk-gpdb6-dev.cfapps.io/7-0/admin_guide/query/topics/functions-operators.html * docs - update EXECUTE ON INITPLAN attribute description based on review comments. Labeled the attribute as Beta. * docs - removed Beta note, fixed minor errors and typos. --- .../query/topics/functions-operators.xml | 8 + .../ref_guide/sql_commands/ALTER_FUNCTION.xml | 7 +- .../sql_commands/CREATE_FUNCTION.xml | 137 +++++++++++++++--- 3 files changed, 126 insertions(+), 26 deletions(-) diff --git a/gpdb-doc/dita/admin_guide/query/topics/functions-operators.xml b/gpdb-doc/dita/admin_guide/query/topics/functions-operators.xml index 110b289e9445..75de2896d333 100644 --- a/gpdb-doc/dita/admin_guide/query/topics/functions-operators.xml +++ b/gpdb-doc/dita/admin_guide/query/topics/functions-operators.xml @@ -126,6 +126,14 @@ master. + + EXECUTE ON INITPLAN + Indicates that the function contains an SQL + command that dispatches queries to the segment instances and + requires special processing on the master instance by Greenplum + Database when possible. + +
  • diff --git a/gpdb-doc/dita/ref_guide/sql_commands/ALTER_FUNCTION.xml b/gpdb-doc/dita/ref_guide/sql_commands/ALTER_FUNCTION.xml index 9bf472665aa6..ad663ac6ee6a 100644 --- a/gpdb-doc/dita/ref_guide/sql_commands/ALTER_FUNCTION.xml +++ b/gpdb-doc/dita/ref_guide/sql_commands/ALTER_FUNCTION.xml @@ -21,7 +21,7 @@ ALTER FUNCTION name ( [ [argmode] [{CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT | STRICT} {IMMUTABLE | STABLE | VOLATILE | [ NOT ] LEAKPROOF} {[EXTERNAL] SECURITY INVOKER | [EXTERNAL] SECURITY DEFINER} -EXECUTE ON { ANY | MASTER | ALL SEGMENTS } +EXECUTE ON { ANY | MASTER | ALL SEGMENTS | INITPLAN } COST execution_cost SET configuration_parameter { TO | = } { value | DEFAULT } SET configuration_parameter FROM CURRENT @@ -108,6 +108,7 @@ RESET ALL EXECUTE ON ANY EXECUTE ON MASTER EXECUTE ON ALL SEGMENTS + EXECUTE ON INITPLAN The EXECUTE ON attributes specify where (master or segment instance) a function executes when it is invoked during the query execution process. EXECUTE ON ANY (the default) indicates that the function can be @@ -120,11 +121,13 @@ RESET ALL all primary segment instances, but not the master, for each invocation. The overall result of the function is the UNION ALL of the results from all segment instances. + EXECUTE ON INITPLAN indicates that the function contains an SQL + command that dispatches queries to the segment instances and requires special processing + on the master instance by Greenplum Database when possible. For more information about the EXECUTE ON attributes, see CREATE FUNCTION. - COST execution_cost Change the estimated execution cost of the function. See execution_cost     | SET configuration_parameter { TO value | = value | FROM CURRENT } | AS 'definition'     | AS 'obj_file', 'link_symbol' } ...     [ WITH ({ DESCRIBE = describe_function - } [, ...] ) ] -
    Description

    CREATE FUNCTION defines - a new function. CREATE OR REPLACE FUNCTION either creates a new - function, or replaces an existing definition.

    The name of the new function - must not match any existing function with the same input argument types in the same - schema. However, functions of different argument types may share a name - (overloading).

    To update the definition of an existing function, use - CREATE OR REPLACE FUNCTION. It is not possible to change the - name or argument types of a function this way (this would actually create a new, - distinct function). Also, CREATE OR REPLACE FUNCTION will not let - you change the return type of an existing function. To do that, you must drop and - recreate the function. When using OUT parameters, that means you - cannot change the types of any OUT parameters except by dropping - the function. If you drop and then recreate a function, you will have to drop - existing objects (rules, views, triggers, and so on) that refer to the old function. - Use CREATE OR REPLACE FUNCTION to change a function definition - without breaking objects that refer to the function.

    + } [, ...] ) ] +
    +
    + Description +

    CREATE FUNCTION defines a new function. CREATE OR REPLACE + FUNCTION either creates a new function, or replaces an existing + definition.

    +

    The name of the new function must not match any existing function with the same input + argument types in the same schema. However, functions of different argument types + may share a name (overloading).

    +

    To update the definition of an existing function, use CREATE OR REPLACE + FUNCTION. It is not possible to change the name or argument types of a + function this way (this would actually create a new, distinct function). Also, + CREATE OR REPLACE FUNCTION will not let you change the return + type of an existing function. To do that, you must drop and recreate the function. + When using OUT parameters, that means you cannot change the types + of any OUT parameters except by dropping the function. If you drop + and then recreate a function, you will have to drop existing objects (rules, views, + triggers, and so on) that refer to the old function. Use CREATE OR REPLACE + FUNCTION to change a function definition without breaking objects that + refer to the function.

    The user that creates the function becomes the owner of the function.

    To be able to create a function, you must have USAGE privilege on the argument types and the return type.
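For example, if an argument or return type is a user-defined type owned by another role, that role (or a superuser) would first need to grant usage on it; the type and role names here are illustrative only:

```sql
GRANT USAGE ON TYPE order_status TO report_writer;
```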

    For more information about creating functions, see the User Defined Functions section of the PostgreSQL - documentation.

    Limited Use of VOLATILE and STABLE - Functions

    To prevent data from becoming out-of-sync across the segments in - Greenplum Database, any function classified as STABLE or + documentation.

    + Limited Use of VOLATILE and STABLE Functions

    To + prevent data from becoming out-of-sync across the segments in Greenplum + Database, any function classified as STABLE or VOLATILE cannot be executed at the segment level if it contains SQL or modifies the database in any way. For example, functions such as random() or timeofday() are not allowed to @@ -250,11 +255,11 @@ SELECT foo();

    In optional since, unlike in SQL, this feature applies to all functions not just external ones. - EXECUTE ON ANY EXECUTE ON MASTER EXECUTE ON ALL SEGMENTS + EXECUTE ON INITPLAN The EXECUTE ON attributes specify where (master or segment instance) a function executes when it is invoked during the query execution process. @@ -268,10 +273,29 @@ SELECT foo();

    In execute on all primary segment instances, but not the master, for each invocation. The overall result of the function is the UNION ALL of the results from all segment instances. + EXECUTE ON INITPLAN indicates that the function contains an + SQL command that dispatches queries to the segment instances and requires + special processing on the master instance by Greenplum Database when + possible. + EXECUTE ON INITPLAN is only supported in functions + that are used in the FROM clause of a CREATE + TABLE AS or INSERT command such as the + get_data() function in these + commands.CREATE TABLE t AS SELECT * FROM get_data(); + +INSERT INTO t1 SELECT * FROM get_data();

    Greenplum + Database does not support the EXECUTE ON INITPLAN + attribute in a function that is used in the WITH + clause of a query, a CTE (common table expression). For example, + specifying EXECUTE ON INITPLAN in function + get_data() in this CTE is not + supported.WITH tbl_a AS (SELECT * FROM get_data() ) + SELECT * from tbl_a + UNION + SELECT * FROM tbl_b;

    For information about using EXECUTE ON attributes, see Notes. - COST execution_cost A positive number identifying the estimated execution cost for the function, @@ -434,7 +458,72 @@ $SomeTag$Dianne's horse$SomeTag$ FROM clause.
  • A query that includes the function falls back from GPORCA to the Postgres Planner.
  • -
    +

    The attribute EXECUTE ON INITPLAN indicates that the + function contains an SQL command that dispatches queries to the segment + instances and requires special processing on the master instance by Greenplum + Database. When possible, Greenplum Database handles the function on the master + instance in the following manner.

      +
1. First, Greenplum Database executes the function as part of an InitPlan node on the master instance and holds the function output temporarily.
2. Then, in the MainPlan of the query plan, the function is called in an EntryDB (a special query executor (QE) that runs on the master instance) and Greenplum Database returns the data that was captured when the function was executed as part of the InitPlan node. The function is not executed in the MainPlan.

    This simple example uses the function get_data() in + a CTAS command to create a table using data from the table + country. The function contains a SELECT + command that retrieves data from the table country and uses the + EXECUTE ON INITPLAN + attribute.CREATE TABLE country( + c_id integer, c_name text, region int) + DISTRIBUTED RANDOMLY; + +INSERT INTO country VALUES (11,'INDIA', 1 ), (22,'CANADA', 2), (33,'USA', 3); + +CREATE OR REPLACE FUNCTION get_data() + RETURNS TABLE ( + c_id integer, c_name text + ) +AS $$ + SELECT + c.c_id, c.c_name + FROM + country c; +$$ +LANGUAGE SQL EXECUTE ON INITPLAN; + +CREATE TABLE t AS SELECT * FROM get_data() DISTRIBUTED RANDOMLY;

    If + you view the query plan of the CTAS command with EXPLAIN ANALYZE + VERBOSE, the plan shows that the function is run as part of an + InitPlan node, and one of the listed slices is labeled as entry + db. The query plan of a simple CTAS command without the function + does not have an InitPlan node or an entry db slice.
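That inspection looks like the following sketch; the target table name is illustrative, and the plan output is omitted here because it varies by cluster and data volume:

```sql
-- Note: EXPLAIN ANALYZE executes the statement, so the table t_check is created.
EXPLAIN ANALYZE VERBOSE
CREATE TABLE t_check AS SELECT * FROM get_data() DISTRIBUTED RANDOMLY;
```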

    If + the function did not contain the EXECUTE ON INITPLAN attribute, + the CTAS command returns the error function cannot execute on a QE + slice.

    When a function uses the EXECUTE ON + INITPLAN attribute, a command that uses the function such as + CREATE TABLE t AS SELECT * FROM get_data() gathers the + results of the function onto the master segment and then redistributes the + results to segment instances when inserting the data. If the function returns a + large amount of data, the master might become a bottleneck when gathering and + redistributing data. Performance might improve if you rewrite the function to + run the CTAS command in the user defined function and use the table name as an + input parameter. In this example, the function executes a CTAS command and does + not require the EXECUTE ON INITPLAN attribute. Running the + SELECT command creates the table t1 using + the function that executes the CTAS + command.CREATE OR REPLACE FUNCTION my_ctas(_tbl text) RETURNS VOID AS +$$ +BEGIN + EXECUTE format('CREATE TABLE %s AS SELECT c.c_id, c.c_name FROM country c DISTRIBUTED RANDOMLY', _tbl); +END +$$ +LANGUAGE plpgsql; + +SELECT my_ctas('t1');
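The ALTER FUNCTION synopsis shown earlier in this patch also lists INITPLAN, so an existing function can be given the attribute without being re-created. A brief sketch, assuming a function get_data2() defined like get_data() above but originally created without the attribute (the name is illustrative):

```sql
ALTER FUNCTION get_data2() EXECUTE ON INITPLAN;

-- The function can then be used in the FROM clause of CREATE TABLE AS or INSERT:
INSERT INTO t SELECT * FROM get_data2();
```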

    +
    Examples

    A very simple addition function:

    CREATE FUNCTION add(integer, integer) RETURNS integer AS 'select $1 + $2;' From b4692794a0abd8d8051c23019d26e22f7f3d0aa5 Mon Sep 17 00:00:00 2001 From: Ashwin Agrawal Date: Wed, 11 Mar 2020 14:57:21 -0700 Subject: [PATCH 082/102] Provide workaround to reclaim space on truncate cmd in sub-transaction As discussed in gpdb-users thread [1], currently no mechanism exist to reclaim disk space for a table created and truncated or dropped iteratively in a plpython function. PostgreSQL 11 provides `plpy.commit()` using which this can be achieved. But in absence of same need to provide some mechanism as MADlib has requirement for this functionality for GPDB6 and forward. Hence, considering the requirement as interim work-around adding guc to be able to perform unsafe truncate instead of safe truncate from plpython execute function. Setting the GUC can force unsafe truncation only if - inside sub-transaction and not in top transaction - table was created somewhere within this transactions scope and not outside of it The GUC will be set in the plpython udf and if any `plpy.execute()` errors out the top transaction will also rollback. The GUC can't be set in postgresql.conf file. Also, added description to warn the guc is not for general purpose use and is developer only guc. Added test showcases in simple form the scenario. [1] https://groups.google.com/a/greenplum.org/d/msg/gpdb-users/YCtI4oUA3r0/t0CzhtL6AQAJ Reviewed-by: Soumyadeep Chakraborty (cherry picked from commit 6c16abcad31337233aa34f39107d63079374c3bf) --- src/backend/commands/tablecmds.c | 12 ++++++++- src/backend/utils/misc/guc_gp.c | 16 ++++++++++++ src/include/utils/guc.h | 1 + src/include/utils/sync_guc_name.h | 1 + src/test/regress/expected/guc_gp.out | 37 ++++++++++++++++++++++++++++ src/test/regress/sql/guc_gp.sql | 31 +++++++++++++++++++++++ 6 files changed, 97 insertions(+), 1 deletion(-) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index c1483a295136..caabb3de9b6b 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -1771,6 +1771,8 @@ ExecuteTruncate(TruncateStmt *stmt) foreach(cell, rels) { Relation rel = (Relation) lfirst(cell); + bool inSubTransaction = mySubid != TopSubTransactionId; + bool createdInThisTransactionScope = rel->rd_createSubid != InvalidSubTransactionId; Assert(CheckExclusiveAccess(rel)); @@ -1780,9 +1782,17 @@ ExecuteTruncate(TruncateStmt *stmt) * a new relfilenode in the current (sub)transaction, then we can just * truncate it in-place, because a rollback would cause the whole * table or the current physical file to be thrown away anyway. 
+ * + * GPDB: Using GUC dev_opt_unsafe_truncate_in_subtransaction can force + * unsafe truncation only if + + * - inside sub-transaction and not in top transaction + * - table was created somewhere within this transaction scope */ if (rel->rd_createSubid == mySubid || - rel->rd_newRelfilenodeSubid == mySubid) + rel->rd_newRelfilenodeSubid == mySubid || + (dev_opt_unsafe_truncate_in_subtransaction && + inSubTransaction && createdInThisTransactionScope)) { /* Immediate, non-rollbackable truncation is OK */ heap_truncate_one_rel(rel); diff --git a/src/backend/utils/misc/guc_gp.c b/src/backend/utils/misc/guc_gp.c index 921e7c4360c4..0c2915fd181f 100644 --- a/src/backend/utils/misc/guc_gp.c +++ b/src/backend/utils/misc/guc_gp.c @@ -109,6 +109,7 @@ bool gp_guc_need_restore = false; char *Debug_dtm_action_sql_command_tag; +bool dev_opt_unsafe_truncate_in_subtransaction = false; bool Debug_print_full_dtm = false; bool Debug_print_snapshot_dtm = false; bool Debug_disable_distributed_snapshot = false; @@ -1219,6 +1220,21 @@ struct config_bool ConfigureNamesBool_gp[] = NULL, NULL, NULL }, + { + {"dev_opt_unsafe_truncate_in_subtransaction", PGC_USERSET, DEVELOPER_OPTIONS, + gettext_noop("Pick unsafe truncate instead of safe truncate inside sub-transaction."), + gettext_noop("Usage of this GUC is strongly discouraged and only " + "should be used after understanding the impact of using " + "the same. Setting the GUC comes with cost of losing " + "table data on truncate command despite sub-transaction " + "rollback for table created within transaction."), + GUC_NO_SHOW_ALL | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_DISALLOW_IN_AUTO_FILE + }, + &dev_opt_unsafe_truncate_in_subtransaction, + false, + NULL, NULL, NULL + }, + { {"debug_print_full_dtm", PGC_SUSET, LOGGING_WHAT, gettext_noop("Prints full DTM information to server log."), diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h index 42d819928982..4855046b003c 100644 --- a/src/include/utils/guc.h +++ b/src/include/utils/guc.h @@ -240,6 +240,7 @@ extern bool Debug_print_parse; extern bool Debug_print_rewritten; extern bool Debug_pretty_print; +extern bool dev_opt_unsafe_truncate_in_subtransaction; extern bool Debug_print_full_dtm; extern bool Debug_print_snapshot_dtm; extern bool Debug_disable_distributed_snapshot; diff --git a/src/include/utils/sync_guc_name.h b/src/include/utils/sync_guc_name.h index 0d4df234c4b9..f5c4c283448a 100644 --- a/src/include/utils/sync_guc_name.h +++ b/src/include/utils/sync_guc_name.h @@ -5,6 +5,7 @@ "coredump_on_memerror", "DateStyle", "default_tablespace", + "dev_opt_unsafe_truncate_in_subtransaction", "dml_ignore_target_partition_check", "dtx_phase2_retry_count", "execute_pruned_plan", diff --git a/src/test/regress/expected/guc_gp.out b/src/test/regress/expected/guc_gp.out index 6b129d34a378..0dc4f04f534a 100644 --- a/src/test/regress/expected/guc_gp.out +++ b/src/test/regress/expected/guc_gp.out @@ -299,3 +299,40 @@ select gp_inject_fault('all', 'reset', dbid) from gp_segment_configuration; (8 rows) set allow_segment_DML to off; +-- test for guc dev_opt_unsafe_truncate_in_subtransaction +-- start_ignore +CREATE LANGUAGE plpythonu; +-- end_ignore +CREATE OR REPLACE FUNCTION run_all_in_one() RETURNS VOID AS +$$ + plpy.execute('CREATE TABLE unsafe_truncate(a int, b int) DISTRIBUTED BY (a)') + plpy.execute('INSERT INTO unsafe_truncate SELECT * FROM generate_series(1, 10)') + for i in range(1,4): + plpy.execute('UPDATE unsafe_truncate SET b = b + 1') + plpy.execute('CREATE TABLE foobar AS SELECT * FROM 
unsafe_truncate DISTRIBUTED BY (a)') + + before_truncate = plpy.execute('SELECT relfilenode FROM gp_dist_random(\'pg_class\') WHERE relname=\'unsafe_truncate\' ORDER BY gp_segment_id') + plpy.execute('truncate unsafe_truncate') + after_truncate = plpy.execute('SELECT relfilenode FROM gp_dist_random(\'pg_class\') WHERE relname=\'unsafe_truncate\' ORDER BY gp_segment_id') + + plpy.execute('DROP TABLE unsafe_truncate') + plpy.execute('ALTER TABLE foobar RENAME TO unsafe_truncate') + + if before_truncate[0]['relfilenode'] == after_truncate[0]['relfilenode']: + plpy.info('iteration:%d unsafe truncate performed' % (i)) + else: + plpy.info('iteration:%d safe truncate performed' % (i)) + + plpy.execute('SET dev_opt_unsafe_truncate_in_subtransaction TO ON') + plpy.execute('DROP TABLE unsafe_truncate') + plpy.execute('RESET dev_opt_unsafe_truncate_in_subtransaction') +$$ language plpythonu; +select run_all_in_one(); +INFO: iteration:1 safe truncate performed +INFO: iteration:2 unsafe truncate performed +INFO: iteration:3 unsafe truncate performed + run_all_in_one +---------------- + +(1 row) + diff --git a/src/test/regress/sql/guc_gp.sql b/src/test/regress/sql/guc_gp.sql index 3a55e4355e07..063140f57e5a 100644 --- a/src/test/regress/sql/guc_gp.sql +++ b/src/test/regress/sql/guc_gp.sql @@ -187,3 +187,34 @@ select current_setting('datestyle') from gp_dist_random('gp_id'); select gp_inject_fault('all', 'reset', dbid) from gp_segment_configuration; set allow_segment_DML to off; + +-- test for guc dev_opt_unsafe_truncate_in_subtransaction +-- start_ignore +CREATE LANGUAGE plpythonu; +-- end_ignore +CREATE OR REPLACE FUNCTION run_all_in_one() RETURNS VOID AS +$$ + plpy.execute('CREATE TABLE unsafe_truncate(a int, b int) DISTRIBUTED BY (a)') + plpy.execute('INSERT INTO unsafe_truncate SELECT * FROM generate_series(1, 10)') + for i in range(1,4): + plpy.execute('UPDATE unsafe_truncate SET b = b + 1') + plpy.execute('CREATE TABLE foobar AS SELECT * FROM unsafe_truncate DISTRIBUTED BY (a)') + + before_truncate = plpy.execute('SELECT relfilenode FROM gp_dist_random(\'pg_class\') WHERE relname=\'unsafe_truncate\' ORDER BY gp_segment_id') + plpy.execute('truncate unsafe_truncate') + after_truncate = plpy.execute('SELECT relfilenode FROM gp_dist_random(\'pg_class\') WHERE relname=\'unsafe_truncate\' ORDER BY gp_segment_id') + + plpy.execute('DROP TABLE unsafe_truncate') + plpy.execute('ALTER TABLE foobar RENAME TO unsafe_truncate') + + if before_truncate[0]['relfilenode'] == after_truncate[0]['relfilenode']: + plpy.info('iteration:%d unsafe truncate performed' % (i)) + else: + plpy.info('iteration:%d safe truncate performed' % (i)) + + plpy.execute('SET dev_opt_unsafe_truncate_in_subtransaction TO ON') + plpy.execute('DROP TABLE unsafe_truncate') + plpy.execute('RESET dev_opt_unsafe_truncate_in_subtransaction') +$$ language plpythonu; + +select run_all_in_one(); From 89036771747ea48d482c704ce71ea282a44165aa Mon Sep 17 00:00:00 2001 From: Mel Kiyama Date: Wed, 11 Mar 2020 15:31:38 -0700 Subject: [PATCH 083/102] docs - add deflate support for s3 protocol (#9707) * docs - add deflate support for s3 protocol * docs - review updates for s3 protocol compressed file information * docs - clarified how s3 protocol recognizes gzip, deflate compressed files. 
--- gpdb-doc/dita/admin_guide/external/g-s3-protocol.xml | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/gpdb-doc/dita/admin_guide/external/g-s3-protocol.xml b/gpdb-doc/dita/admin_guide/external/g-s3-protocol.xml index 80b0074d7c63..f4e9473dd66b 100644 --- a/gpdb-doc/dita/admin_guide/external/g-s3-protocol.xml +++ b/gpdb-doc/dita/admin_guide/external/g-s3-protocol.xml @@ -197,7 +197,7 @@ s3://s3-us-west-2.amazonaws.com/test1/abcdefff About S3 Data Files

    For each INSERT operation to a writable S3 table, each Greenplum Database segment uploads a single file to the configured S3 bucket using the filename - format <prefix><segment_id><random>.<extension>[.gz] + format <prefix><segment_id><random>.<extension>[.gz] where:

    • <prefix> is the prefix specified in the S3 URL.
    • <segment_id> is the Greenplum Database segment ID.
    • @@ -222,8 +222,11 @@ s3://s3-us-west-2.amazonaws.com/test1/abcdefff or a carriage return (\r). Also, the column delimiter cannot be a newline character (\n) or a carriage return character (\r).

      -

      The s3 protocol recognizes the gzip format and uncompress the files. - Only the gzip compression format is supported.

      +

      For read-only S3 tables, the s3 protocol recognizes gzip and deflate + compressed files and automatically decompresses the files. For gzip compression, the + protocol recognizes the format of a gzip compressed file. For deflate compression, the + protocol assumes a file with the .deflate suffix is a deflate + compressed file.
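For example, a readable external table defined along the following lines (the endpoint, bucket, configuration file path, and column list are illustrative and not part of this change) reads plain, gzip-compressed, and .deflate files found under the same S3 prefix:

```sql
CREATE READABLE EXTERNAL TABLE sales_s3 (id int, amount numeric)
LOCATION ('s3://s3-us-west-2.amazonaws.com/example-bucket/sales/ config=/home/gpadmin/s3/s3.conf')
FORMAT 'csv';

-- Rows from compressed and uncompressed objects are returned together.
SELECT count(*) FROM sales_s3;
```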

      The S3 file permissions must be Open/Download and View for the S3 user ID that is accessing the files. Writable S3 tables require the S3 user ID to have Upload/Delete permissions.

      @@ -586,7 +589,8 @@ gpcheckcloud -h Download data from the specified S3 location with the configuration specified in the s3 protocol URL and send the output to STDOUT. - If files are gzip compressed, the uncompressed data is sent to + If files are gzip compressed or have a .deflate suffix to + indicate deflate compression, the uncompressed data is sent to STDOUT. From e92accd0dcf403714800db14cdcefced570cba88 Mon Sep 17 00:00:00 2001 From: Mel Kiyama Date: Wed, 11 Mar 2020 15:32:57 -0700 Subject: [PATCH 084/102] docs - gpload new option --max-retries (#9716) --- gpdb-doc/dita/utility_guide/ref/gpload.xml | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/gpdb-doc/dita/utility_guide/ref/gpload.xml b/gpdb-doc/dita/utility_guide/ref/gpload.xml index 13b1b91d4806..88928013da4e 100644 --- a/gpdb-doc/dita/utility_guide/ref/gpload.xml +++ b/gpdb-doc/dita/utility_guide/ref/gpload.xml @@ -9,7 +9,7 @@ Synopsis gpload -f control_file [-l log_file] [-h hostname] [-p port] [-U username] [-d database] [-W] [--gpfdist_timeout seconds] - [--no_auto_trans] [[-v | -V] [-q]] [-D] + [--no_auto_trans] [--max_retries retry_times] [[-v | -V] [-q]] [-D] gpload -? @@ -140,6 +140,17 @@ load control file, the environment variable $PGPORT or defaults to 5432. + + --max_retries retry_times + +

Specifies the maximum number of times gpload attempts to connect to Greenplum Database after a connection timeout. The default value is 0; with this value, gpload does not attempt to connect again after a connection timeout. A negative integer, such as -1, specifies an unlimited number of attempts.

      +
      +
      -U username The database role name to connect as. If not specified, reads from the From 2a34394214702b759ba4cd3fa5e9592573f61036 Mon Sep 17 00:00:00 2001 From: xiong-gang Date: Thu, 12 Mar 2020 11:27:29 +0800 Subject: [PATCH 085/102] Allocate and zero out the memory of MyTmGxactLocal --- src/backend/storage/lmgr/proc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c index a3318d60beb6..cb49376a1fd7 100644 --- a/src/backend/storage/lmgr/proc.c +++ b/src/backend/storage/lmgr/proc.c @@ -383,7 +383,7 @@ InitProcess(void) } MyPgXact = &ProcGlobal->allPgXact[MyProc->pgprocno]; MyTmGxact = &ProcGlobal->allTmGxact[MyProc->pgprocno]; - MyTmGxactLocal = (TMGXACTLOCAL*)MemoryContextAlloc(TopMemoryContext, sizeof(TMGXACTLOCAL)); + MyTmGxactLocal = (TMGXACTLOCAL*)MemoryContextAllocZero(TopMemoryContext, sizeof(TMGXACTLOCAL)); if (MyTmGxactLocal == NULL) elog(FATAL, "allocating TMGXACTLOCAL failed"); @@ -638,7 +638,7 @@ InitAuxiliaryProcess(void) lockHolderProcPtr = auxproc; MyPgXact = &ProcGlobal->allPgXact[auxproc->pgprocno]; MyTmGxact = &ProcGlobal->allTmGxact[auxproc->pgprocno]; - MyTmGxactLocal = (TMGXACTLOCAL*)MemoryContextAlloc(TopMemoryContext, sizeof(TMGXACTLOCAL)); + MyTmGxactLocal = (TMGXACTLOCAL*)MemoryContextAllocZero(TopMemoryContext, sizeof(TMGXACTLOCAL)); if (MyTmGxactLocal == NULL) elog(FATAL, "allocating TMGXACTLOCAL failed"); From 37c18c3e2322fd2d562af660823c926dfb3587dc Mon Sep 17 00:00:00 2001 From: Chris Hajas Date: Mon, 9 Mar 2020 17:40:49 -0700 Subject: [PATCH 086/102] Fix analyzedb with config file to work with partitioned tables Previously, running analyzedb with an input file (`analyzedb -f config_file" + When the user runs "analyzedb -a -d incr_analyze -f config_file" + Then analyzedb should return a return code of 0 + And output should contain both "-public.sales_1_prt_2" and "-public.sales_1_prt_2" + And "public.sales_1_prt_2" should appear in the latest state files + And "public.sales_1_prt_3" should appear in the latest state files + And "public.sales_1_prt_4" should appear in the latest state files + + @analyzedb_core @analyzedb_partition_tables + Scenario: Partition table with root partition passed to config file for heap table + Given no state files exist for database "incr_analyze" + And the user runs "psql -d incr_analyze -c 'create table foo (a int, b int) partition by range (b) (start (1) end (4) every (1))'" + And the user runs command "printf 'public.foo' > config_file" + When the user runs "analyzedb -a -d incr_analyze -f config_file" + Then analyzedb should return a return code of 0 + And output should contain both "-public.foo_1_prt_1" and "-public.foo_1_prt_3" + And the user runs "psql -d incr_analyze -c 'drop table foo'" + @analyzedb_core @analyzedb_root_and_partition_tables Scenario: Partition tables, (entries for all parts, no change, some parts, root parts) Given no state files exist for database "incr_analyze" From 7883c3ac7e2f0b4c5f4408eaf55959f0eb42bb6b Mon Sep 17 00:00:00 2001 From: Lisa Owen Date: Thu, 20 Feb 2020 09:46:22 -0800 Subject: [PATCH 087/102] docs - pxf cluster restart (#9569) --- gpdb-doc/markdown/pxf/cfginitstart_pxf.html.md.erb | 7 +++---- gpdb-doc/markdown/pxf/pxf_kerbhdfs.html.md.erb | 3 +-- gpdb-doc/markdown/pxf/ref/pxf-cluster.html.md.erb | 6 +++++- gpdb-doc/markdown/pxf/reg_jar_depend.html.md.erb | 3 +-- 4 files changed, 10 insertions(+), 9 deletions(-) diff --git a/gpdb-doc/markdown/pxf/cfginitstart_pxf.html.md.erb 
b/gpdb-doc/markdown/pxf/cfginitstart_pxf.html.md.erb index c3dddfbb4231..b93c19c18675 100644 --- a/gpdb-doc/markdown/pxf/cfginitstart_pxf.html.md.erb +++ b/gpdb-doc/markdown/pxf/cfginitstart_pxf.html.md.erb @@ -27,7 +27,7 @@ PXF provides two management commands: - `pxf cluster` - manage all PXF service instances in the Greenplum Database cluster - `pxf` - manage the PXF service instance on a specific Greenplum Database host -The [`pxf cluster`](ref/pxf-cluster.html) command supports `init`, `start`, `status`, `stop`, and `sync` subcommands. When you run a `pxf cluster` subcommand on the Greenplum Database master host, you perform the operation on all segment hosts in the Greenplum Database cluster. PXF also runs the `init` and `sync` commands on the standby master host. +The [`pxf cluster`](ref/pxf-cluster.html) command supports `init`, `start`, `restart`, `status`, `stop`, and `sync` subcommands. When you run a `pxf cluster` subcommand on the Greenplum Database master host, you perform the operation on all segment hosts in the Greenplum Database cluster. PXF also runs the `init` and `sync` commands on the standby master host. The [`pxf`](ref/pxf.html) command supports `init`, `start`, `stop`, `restart`, and `status` operations. These operations run locally. That is, if you want to start or stop the PXF agent on a specific Greenplum Database segment host, you log in to the host and run the command. @@ -110,7 +110,7 @@ Perform the following procedure to stop PXF on each segment host in your Greenpl ## Restarting PXF -If you must restart PXF, for example if you updated PXF user configuration files in `$PXF_CONF/conf`, you can stop, and then start, PXF in your Greenplum Database cluster. +If you must restart PXF, for example if you updated PXF user configuration files in `$PXF_CONF/conf`, you run `pxf cluster restart` to stop, and then start, PXF on all segment hosts in your Greenplum Database cluster. Only the `gpadmin` user can restart the PXF service. @@ -131,7 +131,6 @@ Perform the following procedure to restart PXF in your Greenplum Database cluste 2. Restart PXF: ```shell - gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster stop - gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster start + gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster restart ``` diff --git a/gpdb-doc/markdown/pxf/pxf_kerbhdfs.html.md.erb b/gpdb-doc/markdown/pxf/pxf_kerbhdfs.html.md.erb index 17e467519540..0486adc58beb 100644 --- a/gpdb-doc/markdown/pxf/pxf_kerbhdfs.html.md.erb +++ b/gpdb-doc/markdown/pxf/pxf_kerbhdfs.html.md.erb @@ -119,8 +119,7 @@ When you configure PXF for secure HDFS using an AD Kerberos KDC server, you will ``` shell gpadmin@master$ $GPHOME/pxf/bin/pxf cluster sync - gpadmin@master$ $GPHOME/pxf/bin/pxf cluster stop - gpadmin@master$ $GPHOME/pxf/bin/pxf cluster start + gpadmin@master$ $GPHOME/pxf/bin/pxf cluster restart ``` 6. Step 7 does not synchronize the keytabs in `$PXF_CONF`. You must distribute the keytab file to `$PXF_CONF/keytabs/`. Locate the keytab file, copy the file to the `$PXF_CONF` user configuration directory, and set required permissions. 
For example: diff --git a/gpdb-doc/markdown/pxf/ref/pxf-cluster.html.md.erb b/gpdb-doc/markdown/pxf/ref/pxf-cluster.html.md.erb index c907729d2cc0..6cc7275f158a 100644 --- a/gpdb-doc/markdown/pxf/ref/pxf-cluster.html.md.erb +++ b/gpdb-doc/markdown/pxf/ref/pxf-cluster.html.md.erb @@ -16,6 +16,7 @@ where `` is: help init reset +restart start status stop @@ -28,7 +29,7 @@ The `pxf cluster` utility command manages PXF on the master, standby master, and - Initialize PXF configuration on all hosts in the Greenplum Database cluster. - Reset the PXF service instance on all hosts to its uninitialized state. -- Start and stop the PXF service instance on all segment hosts. +- Start, stop, and restart the PXF service instance on all segment hosts. - Display the status of the PXF service instance on all segment hosts. - Synchronize the PXF configuration from the Greenplum Database master host to the standby master and to all segment hosts. @@ -47,6 +48,9 @@ The `pxf cluster` utility command manages PXF on the master, standby master, and
      reset
      Reset the PXF service instance on the master, standby master, and on all segment hosts. Resetting removes PXF runtime files and directories, and returns PXF to an uninitialized state. You must stop the PXF service instance running on each segment host before you reset PXF in your Greenplum Database cluster.
      +
      restart
      +
      Stop, and then start, the PXF service instance on all segment hosts.
      +
      start
      Start the PXF service instance on all segment hosts.
      diff --git a/gpdb-doc/markdown/pxf/reg_jar_depend.html.md.erb b/gpdb-doc/markdown/pxf/reg_jar_depend.html.md.erb index 606e8f031537..958a0572f24e 100644 --- a/gpdb-doc/markdown/pxf/reg_jar_depend.html.md.erb +++ b/gpdb-doc/markdown/pxf/reg_jar_depend.html.md.erb @@ -31,7 +31,6 @@ Should you need to add an additional JAR dependency for PXF, for example a JDBC $ ssh gpadmin@ gpadmin@gpmaster$ cp new_dependent_jar.jar $PXF_CONF/lib/ gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster sync -gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster stop -gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster start +gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster restart ``` From 46bb4284204aa0db90aa78cd4426a43ebd36fb05 Mon Sep 17 00:00:00 2001 From: Lisa Owen Date: Fri, 28 Feb 2020 08:48:28 -0800 Subject: [PATCH 088/102] =?UTF-8?q?docs=20-=20pxf=20projection=20and=20pus?= =?UTF-8?q?hdown=20info=20to=20tbl=20format;=20add=20parquet=20pushdo?= =?UTF-8?q?=E2=80=A6=20(#9618)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * docs - projection and pushdown info to tbl format; add parquet pushdown info * combine arithmetic ops in single column * multi-line header, center, combine is/not null * add parquet data type write mapping --- gpdb-doc/markdown/pxf/col_project.html.md.erb | 14 ++-- gpdb-doc/markdown/pxf/filter_push.html.md.erb | 31 ++++++--- .../markdown/pxf/hdfs_parquet.html.md.erb | 69 ++++++++++++++----- 3 files changed, 83 insertions(+), 31 deletions(-) diff --git a/gpdb-doc/markdown/pxf/col_project.html.md.erb b/gpdb-doc/markdown/pxf/col_project.html.md.erb index 5dbe14f93f4f..6f66c330da06 100644 --- a/gpdb-doc/markdown/pxf/col_project.html.md.erb +++ b/gpdb-doc/markdown/pxf/col_project.html.md.erb @@ -8,10 +8,16 @@ PXF supports column projection, and it is always enabled. With column projection Column projection is automatically enabled for the `pxf` external table protocol. PXF accesses external data sources using different connectors, and column projection support is also determined by the specific connector implementation. The following PXF connector and profile combinations support column projection on read operations: -- PXF Hive Connector, `HiveORC` profile -- PXF JDBC Connector, `Jdbc` profile -- PXF Hadoop and Object Store Connectors, `hdfs:parquet`, `adl:parquet`, `gs:parquet`,`s3:parquet`, and `wasbs:parquet` profiles -- PXF S3 Connector using Amazon S3 Select service, `s3:parquet` and `s3:text` profiles +| Data Source | Connector | Profile(s) | +|-------------|---------------|---------| +| External SQL database | JDBC Connector | Jdbc | +| Hive | Hive Connector | HiveORC, HiveVectorizedORC | +| Hadoop | HDFS Connector | hdfs:parquet | +| Amazon S3 | S3-Compatible Object Store Connectors | s3:parquet | +| Amazon S3 using S3 Select | S3-Compatible Object Store Connectors | s3:parquet, s3:text | +| Google Cloud Storage | GCS Object Store Connector | gs:parquet | +| Azure Blob Storage | Azure Object Store Connector | wasbs:parquet | +| Azure Data Lake | Azure Object Store Connector | adl:parquet | **Note:** PXF may disable column projection in cases where it cannot successfully serialize a query filter; for example, when the `WHERE` clause resolves to a `boolean` type. 
diff --git a/gpdb-doc/markdown/pxf/filter_push.html.md.erb b/gpdb-doc/markdown/pxf/filter_push.html.md.erb index c1082e9f69fb..fddf46b1e42d 100644 --- a/gpdb-doc/markdown/pxf/filter_push.html.md.erb +++ b/gpdb-doc/markdown/pxf/filter_push.html.md.erb @@ -32,14 +32,7 @@ SET gp_external_enable_filter_pushdown TO 'on'; **Note:** Some external data sources do not support filter pushdown. Also, filter pushdown may not be supported with certain data types or operators. If a query accesses a data source that does not support filter push-down for the query constraints, the query is instead executed without filter pushdown (the data is filtered after it is transferred to Greenplum Database). -PXF accesses data sources using different connectors, and filter pushdown support is determined by the specific connector implementation. The following PXF connectors support filter pushdown: - -- Hive Connector, all profiles -- HBase Connector -- JDBC Connector -- S3 Connector using the Amazon S3 Select service to access CSV and Parquet data - -PXF filter pushdown can be used with these data types (connector-specific): +PXF filter pushdown can be used with these data types (connector- and profile-specific): - `INT2`, `INT4`, `INT8` - `CHAR`, `TEXT` @@ -48,14 +41,32 @@ PXF filter pushdown can be used with these data types (connector-specific): - `BOOL` - `DATE`, `TIMESTAMP` (available only with the JDBC connector and the S3 connector when using S3 Select) -You can use PXF filter pushdown with these operators: +You can use PXF filter pushdown with these arithmetic and logical operators (connector- and profile-specific): - `<`, `<=`, `>=`, `>` - `<>`, `=` - `AND`, `OR`, `NOT` -- `IN` operator on arrays of `INT` and `TEXT` (JDBC connector only) - `LIKE` (`TEXT` fields, JDBC connector only) +PXF accesses data sources using profiles exposed by different connectors, and filter pushdown support is determined by the specific connector implementation. The following PXF profiles support some aspect of filter pushdown: + +|Profile | <,   >,
<=,   >=,
=,  <> | LIKE | IS [NOT] NULL | IN | AND | OR | NOT |
+|-------|:------------------------:|:----:|:----:|:----:|:----:|:----:|:----:|
+| Jdbc | Y | Y | Y | Y | Y | Y | Y |
+| *:parquet | Y1 | N | Y1 | N | Y1 | Y1 | Y1 |
+| s3:parquet and s3:text with S3-Select | Y | N | Y | Y | Y | Y | Y |
+| HBase | Y | N | Y | N | Y | Y | N |
+| Hive | Y2 | N | N | N | Y2 | Y2 | N |
+| HiveText | Y2 | N | N | N | Y2 | Y2 | N |
+| HiveRC | Y2 | N | N | N | Y2 | Y2 | N |
+| HiveORC | Y, Y2 | N | Y | Y | Y, Y2 | Y, Y2 | Y |
+| HiveVectorizedORC | Y, Y2 | N | Y | Y | Y, Y2 | Y, Y2 | Y |
+
+
      1 PXF applies the predicate, rather than the remote system, reducing CPU usage and the memory footprint. +
      2 PXF supports partition pruning based on partition keys. + +PXF does not support filter pushdown for any profile not mentioned in the table above, including: *:avro, *:AvroSequenceFile, *:SequenceFile, *:json, *:text, and *:text:multi. + To summarize, all of the following criteria must be met for filter pushdown to occur: * You enable external table filter pushdown by setting the `gp_external_enable_filter_pushdown` server configuration parameter to `'on'`. diff --git a/gpdb-doc/markdown/pxf/hdfs_parquet.html.md.erb b/gpdb-doc/markdown/pxf/hdfs_parquet.html.md.erb index ab9c96610e23..967ffb7bb0ed 100644 --- a/gpdb-doc/markdown/pxf/hdfs_parquet.html.md.erb +++ b/gpdb-doc/markdown/pxf/hdfs_parquet.html.md.erb @@ -33,26 +33,61 @@ Ensure that you have met the PXF Hadoop [Prerequisites](access_hdfs.html#hadoop_ ## Data Type Mapping -To read and write Parquet primitive data types in Greenplum Database, map Parquet data values to Greenplum Database columns of the same type. The following table summarizes the external mapping rules: +To read and write Parquet primitive data types in Greenplum Database, map Parquet data values to Greenplum Database columns of the same type. + +Parquet supports a small set of primitive data types, and uses metadata annotations to extend the data types that it supports. These annotations specify how to interpret the primitive type. For example, Parquet stores both `INTEGER` and `DATE` types as the `INT32` primitive type. An annotation identifies the original type as a `DATE`. + +### Read Mapping -| Parquet Data Type | PXF/Greenplum Data Type | -|-------------------|-------------------------| -| boolean | Boolean | -| byte_array | Bytea, Text | -| double | Float8 | -| fixed\_len\_byte\_array | Numeric | -| float | Real | -| int\_8, int\_16 | Smallint, Integer | -| int64 | Bigint | -| int96 | Timestamp, Timestamptz | - -
      When writing to Parquet: -
        -
      • PXF localizes a timestamp to the current system timezone and converts it to universal time (UTC) before finally converting to int96.
      • -
      • PXF converts a timestamptz to a UTC timestamp and then converts to int96. PXF loses the time zone information during this conversion.
      • -
      +PXF uses the following data type mapping when reading Parquet data: + +| Parquet Data Type | Original Type | PXF/Greenplum Data Type | +|-------------------|---------------|--------------------------| +| binary (byte_array) | Date | Date | +| binary (byte_array) | Timestamp_millis | Timestamp | +| binary (byte_array) | all others | Text | +| binary (byte_array) | -- | Bytea | +| boolean | -- | Boolean | +| double | -- | Float8 | +| fixed\_len\_byte\_array | -- | Numeric | +| float | -- | Real | +| int32 | Date | Date | +| int32 | Decimal | Numeric | +| int32 | int_8 | Smallint | +| int32 | int_16 | Smallint | +| int32 | -- | Integer | +| int64 | Decimal | Numeric | +| int64 | -- | Bigint | +| int96 | -- | Timestamp | + +**Note**: PXF supports filter predicate pushdown on all parquet data types listed above, *except* the `fixed_len_byte_array` and `int96` types. + +### Write Mapping + +PXF uses the following data type mapping when writing Parquet data: + +| PXF/Greenplum Data Type | Original Type | Parquet Data Type | +|-------------------|---------------|--------------------------| +| Boolean | -- | boolean | +| Bytea | -- | binary | +| Bigint | -- | int64 | +| SmallInt | int_16 | int32 | +| Integer | -- | int32 | +| Real | -- | float | +| Float8 | -- | double | +| Numeric/Decimal | Decimal | fixed\_len\_byte\_array | +| Timestamp1 | -- | int96 | +| Timestamptz2 | -- | int96 | +| Date | utf8 | binary | +| Time | utf8 | binary | +| Varchar | utf8 | binary | +| Text | utf8 | binary | +| OTHERS | -- | UNSUPPORTED | + +
      1 PXF localizes a Timestamp to the current system timezone and converts it to universal time (UTC) before finally converting to int96. +
      2 PXF converts a Timestamptz to a UTC timestamp and then converts to int96. PXF loses the time zone information during this conversion. ## Creating the External Table From a5f3fee1022a05e3e9401da4b45184c6fcd6833e Mon Sep 17 00:00:00 2001 From: Lisa Owen Date: Fri, 28 Feb 2020 08:48:00 -0800 Subject: [PATCH 089/102] docs - pxf [cluster] sync now supports a delete option (#9631) * docs - pxf [cluster] sync now supports a delete option * option and argument * replace single dash with unicode * misc edir --- .../markdown/pxf/ref/pxf-cluster.html.md.erb | 18 ++++++++++++++++-- gpdb-doc/markdown/pxf/ref/pxf.html.md.erb | 13 ++++++++----- 2 files changed, 24 insertions(+), 7 deletions(-) diff --git a/gpdb-doc/markdown/pxf/ref/pxf-cluster.html.md.erb b/gpdb-doc/markdown/pxf/ref/pxf-cluster.html.md.erb index 6cc7275f158a..7299fbe6595b 100644 --- a/gpdb-doc/markdown/pxf/ref/pxf-cluster.html.md.erb +++ b/gpdb-doc/markdown/pxf/ref/pxf-cluster.html.md.erb @@ -7,7 +7,7 @@ Manage the PXF configuration and the PXF service instance on all Greenplum Datab ## Synopsis ``` pre -pxf cluster +pxf cluster [