Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix github actions build failure #1054

Merged
merged 170 commits into from
Aug 17, 2022

Conversation

wind57
Copy link
Contributor

@wind57 wind57 commented Jul 26, 2022

No description provided.

@ryanjbaxter
Copy link
Contributor

Thanks!

@wind57
Copy link
Contributor Author

wind57 commented Jul 26, 2022

note for myself, things to do here:

  • cache docker images via actions/upload-artifact@v2 and actions/download-artifact@v2
  • check if a certain test phase does not run tests in every module
  • why is the cache not re-used? It fails on a SocketTimeout about a certain junit-platform pom dependency, but shouldn't this one be part of the ~/m2 already? why is it trying to download it again?
  • check if tests per job_instance are actually run (I already did that when I made the PR, but somehow the logs are misleading in this direction)

@@ -188,11 +184,17 @@ jobs:

diff=$((right_bound - left_bound))
sliced_array=("${CLASSNAMES[@]:$left_bound:$diff}")
TEST_ARG=$(echo $sliced_array | sed 's/ /,/g')
TEST_ARG=$(echo ${sliced_array[@]} | sed 's/ /,/g')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a bug here, I was only running the first test from a sliced_array, fixed.

@wind57
Copy link
Contributor Author

wind57 commented Jul 27, 2022

just found one more problem here with abstract classes... it is also present in circleci, so need to fix that also.

@@ -1,107 +0,0 @@
/*
* Copyright 2013-2020 the original author or authors.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these two files were not used, I noticed this by accident

# - find all tests
# - exclude Fabric8IstionIT
# - only take classes that have @Test inside them
# - ignore the ones that have 'abstract class'. we do this because otherwise we would pass
Copy link
Contributor Author

@wind57 wind57 Jul 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this small comment explains the problem, I noticed this by accident and could reproduce locally too... fixed via this logic.

@@ -43,6 +43,11 @@ jobs:
# needed for .withReuse(true) to work
echo "testcontainers.reuse.enable=true" > ~/.testcontainers.properties
- checkout
- restore_cache:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restore cache for tests

@wind57
Copy link
Contributor Author

wind57 commented Jul 27, 2022

this gets funner and funner, it seems with 653 tests we are hitting this issue

@wind57
Copy link
Contributor Author

wind57 commented Aug 5, 2022

I am still in the blur what is going on here. The test indeed fails (ActuatorRefreshKafkaIT), for me locally too, so I started to debug it a bit. The test itself does a few things:

  • starts configuration-watcher
  • starts configuration-watcher-it (the test itself)
  • starts an instance of kafka

So we start the application, then create a configmap with the same name and a proper label, this should trigger the informer (it does), in turn this will call applicationEventPublisher::publishEvent that is of type RefreshRemoteApplicationEvent (this happens too). In my understanding, at this point in time, I should see a message in springCloudBus topic, but when I have a lister:

bin/kafka-console-consumer.sh --topic springCloudBus --from-beginning --bootstrap-server localhost:9092

I don't see anything there. Weird. Still trying to figure out what is going on

@@ -140,6 +141,8 @@ void after() throws Exception {
k8SUtils.waitForDeploymentToBeDeleted(SPRING_CLOUD_K8S_CONFIG_WATCHER_IT_DEPLOYMENT_NAME, NAMESPACE);
}

// TODO figure out why this one fails on bus-starter-4.0.0-SNAPSHOT
@Disabled
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am disabling it for now. I've spent a few good days trying to figure out what is going on here, but my limited understanding of bus/binders/functions/kafka/etc did not help. I'll try to spent some more time in the future, as I don't like disabling this at all :(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done the same test in 2.1.x branch and when I do bin/kafka-console-consumer.sh --topic springCloudBus --from-beginning --bootstrap-server localhost:9092, I do get a message in the kafka topic. interesting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you can reproduce this locally? What do I need to do to reproduce it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's going to be a bit of mouthful to explain how to reproduce this locally, but in the meanwhile I am on to something (I think). May be you know this off the top of your head.

So in 2.1.xwe get a Bean from BusRefreshAutoConfiguration called : RefreshListener:

    @Bean
    @ConditionalOnProperty(
        value = {"spring.cloud.bus.refresh.enabled"},
        matchIfMissing = true
    )
    @ConditionalOnBean({ContextRefresher.class})
    public RefreshListener refreshListener(ContextRefresher contextRefresher, ServiceMatcher serviceMatcher) {
        return new RefreshListener(contextRefresher, serviceMatcher);
    }

After this one is invoked, the message ends up in the kafka queue.

What is interested is that this Bean is present in main - I can get it from the context, but it's not present in the "applicationListeners" of the app. It's almost like there should be some code that registers it, but it's not. I'm sorry if this is obscure explanation, but I am still trying to figure out...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird, the autoconfiguration runs in main though?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, it does not even matter if that name is wrong. You would still not see a message in the kafka queue, and that is the main problem. If you look at my example, I do not even have a consumer ( there is no spring-cloud-kubernetes-configuration-watcher-it deployed ).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also do not know if you noticed, but the problem is for sure in a SNAPSHOT version. While this test was failing, the circleci one was passing - and we had a bug there where the cache was not updating with the latest snapshots.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was talking to some people yesterday and there were some changes in Kafka recently that seems to correlate with the timeframe in which we saw these failures. I am looking into it some more. Hopefully will have an update next week.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK so I finally nailed down that was happening after I was able to reproduce the problem using a simple Spring Cloud Bus + Kafka app. This exception was being thrown when we called triggerRefresh

java.lang.NoSuchMethodError: 'org.springframework.util.concurrent.ListenableFuture org.springframework.kafka.core.KafkaTemplate.send(org.apache.kafka.clients.producer.ProducerRecord)'
	at org.springframework.integration.kafka.outbound.KafkaProducerMessageHandler.handleRequestMessage(KafkaProducerMessageHandler.java:513) ~[spring-integration-kafka-6.0.0-M4.jar:6.0.0-M4]

The reason why we didn't see it is because the ExecutorService was swallowing it 🤦‍♂️.

I added a try/catch with a log statement around that call so we can catch anything like this in the future.

The reason for the NoSuchMethodError was because there was a chance in Spring Core to ListenableFuture but the Spring Boot snapshots were using Spring Integration Milestone 4 which didn't have the corresponding change to support the change to ListenableFuture. I got the Boot team to move Spring Integration to snapshots and that fixed the underlying problem.

So we should be able to remove the @Ignore from the test now and that build should pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NICE!! Indeed it builds just fine now :) take a look when you have time

@wind57 wind57 requested a review from ryanjbaxter August 6, 2022 17:36
@ryanjbaxter ryanjbaxter merged commit d817507 into spring-cloud:main Aug 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants