Skip to content

Commit

Permalink
Fix stuck query after cancel or termination when segment is not respo…
Browse files Browse the repository at this point in the history
…nding (#948)

Problem:
The following scenario caused a stuck state of a coordinator backend process:
1. At some moment one of the segments stopped processing incoming requests (the
reason itself is not important, as sometimes bad things may happen with any
segment, and the system should be able to recover).
2. A query, which dispatches requests to segments, was executed (for example,
select from 'gp_toolkit.gp_resgroup_status_per_segment'). As one of the segments
was not responding, the coordinator hanged in 'checkDispatchResult' (it is
expected, and can be handled by the FTS).
3. The stuck query was canceled or terminated (by 'pg_cancel_backend' or
'pg_terminate_backend'). It didn't help to return from the stuck state, but
after this step the query became completely unrecoverable. Even after FTS had
detected the malfunction segment and had promoted the mirror, the query was
still hanging forever.

Expected behavior is that after FTS mirror promotion, all stuck queries are
unblocked and canceled successfully.

Note: if FTS mirror promotion happened before step 3, the FTS canceled the
query successfully.

Root cause:
During cancel or termination, the coordinator tried to do abort of the current
transaction, and it hanged in the function 'internal_cancel' (called from
PQcancel) on 'poll' system call. It has a timeout of 660 seconds, and, moreover,
after the timeout expired, it looped forever trying to do 'poll' again. So, if
the socket was opened on the segment side, but nobody replied (as the segment
process became not available for some reason), 'internal_cancel' had no way to
return.

Fix:
Once FTS promotes a mirror, it sends the  signal to the coordinator
postmaster (with PMSIGNAL_FTS_PROMOTED_MIRROR reason). On receiving of the
signal, the coordinator postmaster sends the SIGUSR1 signal (with
PROCSIG_FTS_PROMOTED_MIRROR reason) to all of its usual backends. Once the
backend receives the signal, if it is in the state of cancelling or terminating
of the query, it sets a flag in libpq. 'internal_cancel' checks this flag before
calling the 'poll' system call. If it is set, it will return with an error.
Thus:
a. if the FTS promotion happens before cancel/terminate, the query will be
canceled by the old logic;
b. if the FTS promotion happens after cancel/terminate, but before the
'internal_cancel' calls the 'poll', 'internal_cancel' will return an error
without calling the 'poll';
c. if the FTS promotion happens when the 'poll' is already called, the 'poll'
will return EINTR (as the SIGUSR1 was received), and a new 'poll' will not be
called, as the flag is set.

(cherry picked from commit 96ed1ad)

Differences comparing to the original commit:
1. exec_cmd_on_segments() is placed inside setup.sql file instead of
server_helper.sql, as the latter doesn't exist in GPDB 7 (and all functions from
this file are located in setup.sql)
2. The check 'GpIdentity.segindex == MASTER_CONTENT_ID' in 'sigusr1_handler()'
is replaced with 'IS_QUERY_DISPATCHER()', which is equivalent. It is done,
because MASTER_CONTENT_ID was renamed to COORDINATOR_CONTENT_ID in GPDB 7.
3. 'TermSignalReceived' global var was removed in GPDB 7 (when it was merged
with PostgreSQL 9.5), so it is brought back.
4. exec_cmd_on_segments() is reworked due to limitation in GPDB 7. Previous
version failed with "ERROR:  EXECUTE ON ALL SEGMENTS is only supported for
set-returning functions", so now it returns a table with TEXT field instead of
a single string. Also, exec_cmd_on_segments() is changed to use 'plpython3u'
language instead of 'plpythonu'.
5. Test is updated to work on GPDB 7.
6. File 'isolation2_resgroup_schedule' doesn't exist anymore, so the test is
added to 'isolation2_resgroup_v1_schedule' and 'isolation2_resgroup_v2_schedule'
files.
  • Loading branch information
whitehawk committed Nov 26, 2024
1 parent fb35e05 commit 8001db6
Show file tree
Hide file tree
Showing 17 changed files with 563 additions and 0 deletions.
3 changes: 3 additions & 0 deletions src/backend/cdb/cdbutil.c
Original file line number Diff line number Diff line change
Expand Up @@ -647,6 +647,9 @@ cdbcomponent_cleanupIdleQEs(bool includeWriter)
}
}

/* reset flag in libpq, that is used to avoid stuck in a loop in PQcancel*/
PQbypassConnCloseAtCancel(false);

return;
}

Expand Down
2 changes: 2 additions & 0 deletions src/backend/fts/ftsprobe.c
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@
#include "postmaster/fts.h"
#include "postmaster/ftsprobe.h"
#include "postmaster/postmaster.h"
#include "storage/pmsignal.h"
#include "utils/builtins.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
Expand Down Expand Up @@ -1254,6 +1255,7 @@ processResponse(fts_context *context)
primary->config->segindex, primary->config->dbid);
updateSegmentDownStatus(primary,true, context->has_mirrors);
ftsInfo->state = FTS_RESPONSE_PROCESSED;
SendPostmasterSignal(PMSIGNAL_FTS_PROMOTED_MIRROR);
break;
case FTS_SYNCREP_OFF_SUCCESS:
elogif(gp_log_fts >= GPVARS_VERBOSITY_VERBOSE, LOG,
Expand Down
23 changes: 23 additions & 0 deletions src/backend/postmaster/postmaster.c
Original file line number Diff line number Diff line change
Expand Up @@ -5870,6 +5870,29 @@ sigusr1_handler(SIGNAL_ARGS)
signal_child(DtxRecoveryPID(), SIGINT);
}

if (CheckPostmasterSignal(PMSIGNAL_FTS_PROMOTED_MIRROR))
{
/*
* Notify all child coordinator backends that FTS has promoted a mirror.
*/
if (pmState == PM_RUN &&
IS_QUERY_DISPATCHER())
{
dlist_iter iter;
dlist_foreach(iter, &BackendList)
{
Backend *bp = dlist_container(Backend, elem, iter.cur);

if (bp->dead_end || !(BACKEND_TYPE_NORMAL & bp->bkend_type))
continue;

SendProcSignal(bp->pid,
PROCSIG_FTS_PROMOTED_MIRROR,
InvalidBackendId);
}
}
}

/*
* Try to advance postmaster's state machine, if a child requests it.
*
Expand Down
21 changes: 21 additions & 0 deletions src/backend/storage/ipc/procsignal.c
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
#include "access/parallel.h"
#include "cdb/cdbvars.h"
#include "commands/async.h"
#include "libpq-fe.h"
#include "miscadmin.h"
#include "replication/walsender.h"
#include "storage/ipc.h"
Expand Down Expand Up @@ -273,6 +274,23 @@ QueryFinishHandler(void)
}
}

/*
* Coordinator postmaster signals that FTS has detected a failed segment and
* promoted the mirror.
*/
static void
FtsPromotedMirrorHandler(void)
{
/*
* In case the promotion is done during cancel or termination of a query,
* there is a very high chance that libpq will be stuck forever trying to
* wait cancel confirmation from the segment, which is not responding. Code
* below handles this case.
*/
if (QueryCancelCleanup || TermSignalReceived)
PQbypassConnCloseAtCancel(true);
}

/*
* procsignal_sigusr1_handler - handle SIGUSR1 signal.
*/
Expand Down Expand Up @@ -320,6 +338,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_RESOURCE_GROUP_MOVE_QUERY))
HandleMoveResourceGroup();

if (CheckProcSignal(PROCSIG_FTS_PROMOTED_MIRROR))
FtsPromotedMirrorHandler();

SetLatch(MyLatch);

latch_sigusr1_handler();
Expand Down
1 change: 1 addition & 0 deletions src/backend/tcop/postgres.c
Original file line number Diff line number Diff line change
Expand Up @@ -3553,6 +3553,7 @@ die(SIGNAL_ARGS)
{
InterruptPending = true;
ProcDiePending = true;
TermSignalReceived = true;
}

/* If we're still here, waken anything waiting on the process latch */
Expand Down
2 changes: 2 additions & 0 deletions src/backend/utils/init/globals.c
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,8 @@ volatile sig_atomic_t QueryCancelCleanup = false;
volatile sig_atomic_t QueryCancelPending = false;
volatile sig_atomic_t QueryFinishPending = false;

volatile bool TermSignalReceived = false;

/*
* GPDB: Make these signed integers (instead of uint32) to detect garbage
* negative values.
Expand Down
1 change: 1 addition & 0 deletions src/include/miscadmin.h
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,7 @@ extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;

/* these are marked volatile because they are examined by signal handlers: */
extern PGDLLIMPORT volatile bool TermSignalReceived;
extern PGDLLIMPORT volatile int32 InterruptHoldoffCount;
extern PGDLLIMPORT volatile int32 QueryCancelHoldoffCount;
extern PGDLLIMPORT volatile int32 CritSectionCount;
Expand Down
2 changes: 2 additions & 0 deletions src/include/storage/pmsignal.h
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,8 @@ typedef enum
PMSIGNAL_WAKEN_DTX_RECOVERY, /* wake up dtx recovery to abort dtx xacts */
PMSIGNAL_DTM_RECOVERED, /* distributed recovery completed */

PMSIGNAL_FTS_PROMOTED_MIRROR, /* FTS has detected failed primary and promoted mirror*/

NUM_PMSIGNALS /* Must be last value of enum! */
} PMSignalReason;

Expand Down
2 changes: 2 additions & 0 deletions src/include/storage/procsignal.h
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,8 @@ typedef enum
PROCSIG_QUERY_FINISH, /* query finish */
PROCSIG_RESOURCE_GROUP_MOVE_QUERY, /* move query to a new resource group */

PROCSIG_FTS_PROMOTED_MIRROR, /* FTS has detected failed primary and promoted mirror*/

NUM_PROCSIGNALS /* Must be last! */
} ProcSignalReason;

Expand Down
31 changes: 31 additions & 0 deletions src/interfaces/libpq/fe-connect.c
Original file line number Diff line number Diff line change
Expand Up @@ -419,6 +419,8 @@ static const PQEnvironmentOption EnvironmentOptions[] =
static const char uri_designator[] = "postgresql://";
static const char short_uri_designator[] = "postgres://";

static bool bypass_conn_close_at_cancel = false;

static bool connectOptions1(PGconn *conn, const char *conninfo);
static bool connectOptions2(PGconn *conn);
static int connectDBStart(PGconn *conn);
Expand Down Expand Up @@ -4369,6 +4371,15 @@ PQfreeCancel(PGcancel *cancel)
free(cancel);
}

/*
* PQbypassConnCloseAtCancel:
* set a flag, that allows to fix an issue with internal hanging in
* 'internal_cancel'. For details refer to comments in 'internal_cancel'.
*/
void PQbypassConnCloseAtCancel(pqbool bypass)
{
bypass_conn_close_at_cancel = bypass;
}

/*
* PQcancel and PQrequestCancel: attempt to request cancellation of the
Expand Down Expand Up @@ -4462,6 +4473,26 @@ internal_cancel(SockAddr *raddr, int be_pid, int be_key,
*/
#ifndef WIN32
retry5:

/*
* GPDB: in case this function is called by the coordinator process to
* communicate with the segments (when performing cancel/terminate of a
* backend) and one of the segments has hanged keeping the listening socket
* open, we will be stuck in this function for infinite amount of time
* (hanging on the 'poll' in a loop). The check below helps to avoid this
* situation:
* 1. The flag 'bypass_conn_close_at_cancel' is set from a signal handler.
* 2. The signal is originally initiated by the FTS, which detects that a
* segment went down.
* 3. If the signal comes before the 'poll' is called, it will be just
* skipped and this function will return an error.
* 4. If the signal comes after the 'poll' is called, the 'poll' will be
* interrupted with EINTR, and the next iteration will be skipped and this
* function will return an error.
*/
if (bypass_conn_close_at_cancel)
goto cancel_errReturn;

pollFds[0].fd = tmpsock;
pollFds[0].events = POLLIN;
pollFds[0].revents = 0;
Expand Down
2 changes: 2 additions & 0 deletions src/interfaces/libpq/libpq-fe.h
Original file line number Diff line number Diff line change
Expand Up @@ -316,6 +316,8 @@ extern void PQfreeCancel(PGcancel *cancel);
/* issue a cancel request */
extern int PQcancel(PGcancel *cancel, char *errbuf, int errbufsize);

extern void PQbypassConnCloseAtCancel(pqbool bypass); /* GPDB only */

/* issue a finsh request */
extern int PQrequestFinish(PGcancel *cancel, char *errbuf, int errbufsize);

Expand Down
Loading

0 comments on commit 8001db6

Please sign in to comment.