Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fail reconfigure cmd on invalid auto subscription #298

Merged
merged 4 commits into from
May 28, 2024

Conversation

dorjesinpo
Copy link
Collaborator

Fix. If auto subscription expression is invalid, the reconfigure admin command should fail

@dorjesinpo dorjesinpo requested a review from a team as a code owner May 21, 2024 22:28
@dorjesinpo dorjesinpo added the bug Something isn't working label May 21, 2024
@dorjesinpo dorjesinpo force-pushed the fix/domain-reconfigure branch 3 times, most recently from ce81937 to 33761a0 Compare May 21, 2024 23:14
@dorjesinpo dorjesinpo force-pushed the fix/domain-reconfigure branch from 33761a0 to 6571134 Compare May 21, 2024 23:21
@dorjesinpo dorjesinpo requested a review from 678098 May 22, 2024 00:00
@@ -686,19 +686,25 @@ int DomainManager::processCommand(mqbcmd::DomainsResult* result,
}
}

DecodeAndUpsertValue unused;
DecodeAndUpsertValue configureResult;
d_configProvider_p->clearCache(domainSp->name());
d_configProvider_p->getDomainConfig(
Copy link
Collaborator

@678098 678098 May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to do anything, just my observation here

Note that docs for getDomainConfig are incorrect. They state that the method is asynchronous:

    /// Asynchronously get the configuration for the specified `domainName`
    /// and invoke the specified `callback` with the result.

In fact, this method is synchronous and thread safe, protected by mutex.
It means that you are safe to access configureResult just after the getDomainConfig call because you are guaranteed that the method was already executed.

I will update the docs later.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that Domain::configure calls mqbi::Queue::configure with false as the wait argument.
Even if it was true, we call it from the cluster thread and do not synchronize. There are 3 threads involved here.

The validation is synchronous, yes. The reconfiguration of queues is asynchronous.

result->makeSuccess();

if (configureResult.isError()) {
result->makeError().message() = configureResult.error().d_details;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we return -1; in the error branch?

// Validate newConfig.subscriptions()

bsl::size_t size = newConfig.subscriptions().size();
bool result = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather rename the flag to make its meaning more obvious:

Suggested change
bool result = true;
bool success = true;

The problem with result is that it can mean everything: positive and negative result.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, let's be more implicit then, like allSubscriptionsAreValid

else {
if (expression.text().length()) {
errorDescription << "Invalid Expression: [ expression: "
<< expression;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<< expression;
<< expression << " ]";

}
else {
if (expression.text().length()) {
errorDescription << "Invalid Expression: [ expression: "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would rather leave a more explicit comment, so our users can clearly understand why it fails here

Something like
This expression version expects only empty expressions: [ expression: <...>, version: <...> ]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the second thought, let's reject unsupported versions.

@@ -429,7 +464,10 @@ int Domain::configure(bsl::ostream& errorDescription,
}

// Validate config. Return early if the configuration is not valid.
if (int rc = validateConfig(errorDescription, d_config, finalConfig)) {
if (int rc = validateConfig(errorDescription,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (int rc = validateConfig(errorDescription,
if (const int rc = validateConfig(errorDescription,

def test_configure_invalid(self, cluster: Cluster):
"""
Configure the priority queue to evaluate auto subscription negatively.
Make sure the queue does not get the message.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For these both tests, there are no messages posted.
Does it worth it to extend these tests with producer?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not much of a value, I think.
The goal is to check if queue is opened or not.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we should probably remove mentions of messages from these tests' docs

] = "invalid expression"

cluster.reconfigure_domain(tc.DOMAIN_PRIORITY_SC, succeed=None)
# TODO: why not succeed=False
Copy link
Collaborator

@678098 678098 May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a problem with reconfigure_domain not checking a negative result?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we want to make sure the command fails (so the whoever has issued the command can roll back)
The code insists on "processed successfully"

    def command(self, command, succeed=None, timeout=BLOCK_TIMEOUT):
        """
        Send the specified 'cmd' command to the broker, prefixing it with 'CMD '.
        """

        cmd = "CMD " + command
        self.send(cmd)

        return (
            None
            if succeed is None
            else self.capture(f"'{cmd}' processed successfully", timeout=timeout)
        )
        ```

Signed-off-by: dorjesinpo <[email protected]>
@dorjesinpo dorjesinpo force-pushed the fix/domain-reconfigure branch from 48ff5d6 to 2eb05f8 Compare May 22, 2024 14:37
newConfig.subscriptions()[i].expression();

if (mqbconfm::ExpressionVersion::E_VERSION_1 == expression.version()) {
if (expression.text().length()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (expression.text().length()) {
if (!expression.text().empty()) {

I would prefer this for clarity and also in general for collection types it's usually much more efficient to check if a collection is empty than get its length.

<< expression << ", rc: " << context.lastError()
<< ", reason: \"" << context.lastErrorMessage()
<< "\" ]";
result = false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just return rc_INVALID_SUBSCRIPTION here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I would prefer to validate and report everything

Comment on lines 146 to 172
for (bsl::size_t i = 0; i < size; ++i) {
const mqbconfm::Expression& expression =
newConfig.subscriptions()[i].expression();

if (mqbconfm::ExpressionVersion::E_VERSION_1 == expression.version()) {
if (expression.text().length()) {
bmqeval::CompilationContext context(allocator);

if (!bmqeval::SimpleEvaluator::validate(expression.text(),
context)) {
errorDescription
<< "Expression validation failed: [ expression: "
<< expression << ", rc: " << context.lastError()
<< ", reason: \"" << context.lastErrorMessage()
<< "\" ]";
allSubscriptionsAreValid = false;
}
}
}
else {
errorDescription
<< "Unsupported version: [ expression: " << expression << " ]";
allSubscriptionsAreValid = false;
}
}

return allSubscriptionsAreValid ? 0 : rc_INVALID_SUBSCRIPTION;
Copy link
Collaborator

@hallfox hallfox May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole block is pretty complex, can we break it down into something more like this?

for (Subscriptions::const_iterator it = newConfig.subscriptions().cbegin(),
                                   end = newConfig.subscriptions().cend();
       it != end;
      ++it)
{
    const mqbconfm::Expression& expression = it->expression();
    if (!validateExpression(errorDescription, expression, allocator)) {
        return rc_INVALID_SUBSCRIPTION;
    } 
}
return 0;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, if we're looking to log every invalid subscription expression (which I see the value of)

bool allSubscriptionsAreValid = true;
for (...) {
    const mqbconfm::Expression& expression = it->expression();
    allSubscriptionsAreValid = allSubscriptionsAreValid && validateExpression(errorDescription, expression, allocator);
}
return allSubscriptionsAreValid;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally I may even suggest trying to figure out which expresssions are invalid first before populating errorDescription so instead of getting a wall of

"Expression validation failed: [ expression: "
                        << expression << ", rc: " << context.lastError()
                        << ", reason: \"" << context.lastErrorMessage()
                        << "\" ]";

for each expression, we get one error message with a list of invalid expressions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hallfox I believe with this code we do not evaluate validateExpression if the bool flag was set to false before due to short-circuit evaluation with && operator
allSubscriptionsAreValid && validateExpression(errorDescription, expression, allocator);

Copy link
Collaborator

@hallfox hallfox May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah duh, good catch. It should be broken up into two lines

bool isValidExpression = validateExpression(errorDescription, expression, allocator);
allSubscriptionsAreValid = allSubscriptionsAreValid && isValidExpression;

Another reason to filter out invalid expressions first before building the error message.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also like the iterators approach since we don't actually need the exact index of a subscription

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for each expression, we get one error message with a list of invalid expressions.

We might have details pointing out why exactly each expression is invalid. For example, if the expression is too complex, or if we have a syntax error. If we print just the expressions, it might make the debug process more difficult

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the number of expressions is huge, though, we might use a LimitedPrinter instead, or print at most N errors

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what we will see
single 22May2024_13:59:23.696 (139804138272448) ERROR *gmq.tsk.bmqbrkr.bmqbrkr bmqbrkr.m.cpp:324 Error processing command [rc: -2] error = 'Expression validation failed: [ expression: [ version = E_VERSION_1 text = "invalid expression" ], rc: -100, reason: "syntax error, unexpected property at offset 8" ]' config = '[ name = "bmq.test.mmap.priority.sc" mode = [ priority = ] storage = [ domainLimits = [ messages = 2000 messagesWatermarkRatio = 0.8 bytes = 2097152 bytesWatermarkRatio = 0.8 ] queueLimits = [ messages = 1000 messagesWatermarkRatio = 0.8 bytes = 1048576 bytesWatermarkRatio = 0.8 ] config = [ fileBacked = ] ] maxConsumers = 0 maxProducers = 0 maxQueues = 0 msgGroupIdConfig = NULL maxIdleTime = 0 messageTtl = 300 maxDeliveryAttempts = 0 deduplicationTimeMs = 300000 consistency = [ strong = ] subscriptions = [ [ appId = "" expression = [ version = E_VERSION_1 text = "invalid expression" ] ] ] ]'

I think, this is sufficient enough and there is no need to limit since this per admin command

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(there is no such type as mqbconfm::Subscriptions so involving iterator is not going to look prettier)

src/groups/mqb/mqbblp/mqbblp_domain.cpp Outdated Show resolved Hide resolved
Signed-off-by: dorjesinpo <[email protected]>
Copy link
Collaborator

@678098 678098 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments

And also, what do you think about merging 2 integration tests in one test? My concern here is that launching a cluster for each test takes 10-15 seconds. If we can change the tests scenario so we can launch cluster once, we can save some time.

In this PR, there are 2 scenarios:
_1:

  • configure invalid auto subscription
  • expect open queue fails
  • reconfigure valid auto subscription
  • expect queue opens

_2:

  • configure valid auto subscription
  • expect queue opens
  • try reconfigure invalid auto subscription - expect admin command fail
  • expect queue opens

These 2 scenarios could be merged into one to save cluster start time:

  • configure invalid auto subscription
  • expect open queue fails
  • reconfigure valid auto subscription
  • expect queue opens
  • try reconfigure invalid auto subscription - expect admin command fail
  • expect queue opens

So basically the test might check that the domain configuration remains in valid state once this state is reached.
If you don't want to increase the scope of this PR, but also agree that this change is good, I can do it myself in a separate PR once this is merged.

@@ -95,18 +95,48 @@ void afterAppIdUnregisteredDispatched(mqbi::Queue* queue,
mqbi::Storage::AppIdKeyPair(appId, mqbu::StorageKey()));
}

/// Validates an application subscription.
bool validdateSubscriptionExpression(bsl::ostream& errorDescription,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 d

Suggested change
bool validdateSubscriptionExpression(bsl::ostream& errorDescription,
bool validateSubscriptionExpression(bsl::ostream& errorDescription,

src/integration-tests/test_auto_subscriptions.py Outdated Show resolved Hide resolved
src/integration-tests/test_auto_subscriptions.py Outdated Show resolved Hide resolved
Signed-off-by: dorjesinpo <[email protected]>
@dorjesinpo dorjesinpo merged commit c5bfaac into main May 28, 2024
17 checks passed
alexander-e1off pushed a commit to alexander-e1off/blazingmq that referenced this pull request Oct 24, 2024
* Fail reconfigure cmd on invalid auto subscription

Signed-off-by: dorjesinpo <[email protected]>

* Addressing review

Signed-off-by: dorjesinpo <[email protected]>

* addressing review

Signed-off-by: dorjesinpo <[email protected]>

* merging two tests

Signed-off-by: dorjesinpo <[email protected]>

---------

Signed-off-by: dorjesinpo <[email protected]>
alexander-e1off pushed a commit to alexander-e1off/blazingmq that referenced this pull request Oct 24, 2024
* Fail reconfigure cmd on invalid auto subscription

Signed-off-by: dorjesinpo <[email protected]>

* Addressing review

Signed-off-by: dorjesinpo <[email protected]>

* addressing review

Signed-off-by: dorjesinpo <[email protected]>

* merging two tests

Signed-off-by: dorjesinpo <[email protected]>

---------

Signed-off-by: dorjesinpo <[email protected]>
alexander-e1off pushed a commit to alexander-e1off/blazingmq that referenced this pull request Oct 24, 2024
* Fail reconfigure cmd on invalid auto subscription

Signed-off-by: dorjesinpo <[email protected]>

* Addressing review

Signed-off-by: dorjesinpo <[email protected]>

* addressing review

Signed-off-by: dorjesinpo <[email protected]>

* merging two tests

Signed-off-by: dorjesinpo <[email protected]>

---------

Signed-off-by: dorjesinpo <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants