More helpful error messages. #519

Open
pardueaws opened this issue May 2, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@pardueaws

pardueaws commented May 2, 2024

Feature name
Meaningful error messages.

Is your feature request related to a problem? Please describe.
A customer has deployed Workload Discovery but is not seeing all of their resources. We found errors in the GremlinAppSync file (when searching for #500), but the error messages are not helpful.

Describe the feature you'd like to see implemented
Can the errors provide more information about what exactly is failing in the discovery service?

Describe the value this feature will add to AWS Perspective
This would be helpful when users have problems with discovery.

@pardueaws pardueaws added the enhancement New feature or request label May 2, 2024
@svozza
Contributor

svozza commented May 2, 2024

Errors about missing resources will be in the ECS logs. Instructions for retrieving them are at the bottom of this page, in the section titled "To retrieve the logs for the discovery component": (https://aws-solutions.github.io/workload-discovery-on-aws/workload-discovery-on-aws/2.0/debugging-the-discovery-component.html).

There is also an extensive flowchart for diagnosing common issues in the troubleshooting section of the README:

(https://aws-solutions.github.io/workload-discovery-on-aws/workload-discovery-on-aws/2.0/debugging-the-discovery-component.html).

Out of interest, has this been deployed in AWS_ORGANIZATION mode? There is a known issue with writes to OpenSearch being dropped on the very first ingestion cycle the discovery process does, which would appear in the UI as missing resources.

@pardueaws
Author

pardueaws commented May 2, 2024 via email

@svozza
Contributor

svozza commented May 2, 2024

Then it's very likely the last issue I mentioned. To verify:

  1. Identify a resource that is missing, for example an EC2 instance, and get its ARN.
  2. Log into the AppSync console and select the Workload Discovery GraphQL API.
  3. Choose Queries from the side panel.
  4. Choose the Login with User Pools button and authenticate with your WD password.
  5. Execute the following GraphQL query with the ARN from step 1:
query MyQuery {
  getResourceGraph(ids: ["<your-arn>"]) {
    edges {
      id
    }
    nodes {
      id
    }
  }
}
  6. Any successful response that is not empty means that the resources are in Neptune but not OpenSearch. An empty response looks like this:
{
  "data": {
    "getResourceGraph": {
      "edges": [],
      "nodes": []
    }
  }
}
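
For anyone who prefers to script this check, here is a minimal Python sketch of the same logic. The helper names `build_query` and `graph_has_resource` are made up for illustration and are not part of Workload Discovery. It only builds the query text and interprets a response you have already retrieved. It does not handle authentication or send the request.

```python
import json


def build_query(arn: str) -> str:
    """Return the getResourceGraph query from step 5 with the ARN filled in."""
    # json.dumps produces a double-quoted, escaped string; for a plain ASCII
    # ARN this is also a valid GraphQL string literal (assumption: no exotic
    # characters in the ARN).
    return (
        "query MyQuery {\n"
        f"  getResourceGraph(ids: [{json.dumps(arn)}]) {{\n"
        "    edges {\n      id\n    }\n"
        "    nodes {\n      id\n    }\n"
        "  }\n"
        "}\n"
    )


def graph_has_resource(response: dict) -> bool:
    """True if the response contains at least one node or edge.

    A non-empty result means the resource is in Neptune; combined with the
    resource missing from the UI, that points at OpenSearch.
    """
    graph = (response.get("data") or {}).get("getResourceGraph") or {}
    return bool(graph.get("nodes") or graph.get("edges"))


if __name__ == "__main__":
    # Hypothetical placeholder ARN, for illustration only.
    print(build_query("arn:aws:ec2:eu-west-1:111111111111:instance/i-0abc"))
    empty = {"data": {"getResourceGraph": {"edges": [], "nodes": []}}}
    print(graph_has_resource(empty))
```

Paste the printed query into the AppSync Queries page as in step 5, then pass the JSON response to `graph_has_resource`.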

The simplest way to rectify this is to clear the Neptune database and when the discovery process runs again, it will repopulate both databases:

  1. Log into the Lambda console.
  2. Find the Lambda function that writes to Neptune. It will have a name such as <stack-name>-GremlinAppSyncFunction-<ID-string>.
  3. Select the Test tab and create a test event with the following JSON:
{
  "arguments": {
  },
  "source": null,
  "prev": null,
  "info": {
    "parentTypeName": "Mutation",
    "fieldName": "deleteAllResources",
    "variables": {}
  },
  "stash": {}
}
  4. Execute the test event. Depending on how many resources are in Neptune, the Lambda function may time out, but it should still clear the DB.
  5. Wait for the discovery process to run again in 15 minutes and re-ingest the resources.
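
If you would rather trigger the clear from a script than from the console Test tab, a rough sketch with boto3 might look like the following. It assumes boto3 is installed and credentials are configured. The helper names are made up for illustration, and you need to substitute your real function name for the placeholder. As with the console test, the call may time out on the client side before the function finishes.

```python
import json


def build_delete_all_event() -> dict:
    """Build the same test event as the JSON in step 3."""
    return {
        "arguments": {},
        "source": None,
        "prev": None,
        "info": {
            "parentTypeName": "Mutation",
            "fieldName": "deleteAllResources",
            "variables": {},
        },
        "stash": {},
    }


def invoke_clear(function_name: str) -> int:
    """Invoke the Gremlin AppSync Lambda with the delete-all event.

    Returns the HTTP status code of the invoke call (200 for a synchronous
    invocation that was accepted and run).
    """
    import boto3  # third-party; imported lazily so the builder works without it

    client = boto3.client("lambda")
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=json.dumps(build_delete_all_event()).encode("utf-8"),
    )
    return response["StatusCode"]


if __name__ == "__main__":
    # Print the event so it can be compared with the JSON in step 3.
    print(json.dumps(build_delete_all_event(), indent=2))
    # Uncomment and replace the placeholder with your function's real name:
    # invoke_clear("<stack-name>-GremlinAppSyncFunction-<ID-string>")
```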
