More helpful error messages. #519

pardueaws · 2024-05-02T20:00:04Z

Feature name
Meaningful error messages.

Is your feature request related to a problem? Please describe.
Customer has deployed Workload Discovery but is not seeing all of their resources. We have found errors in the GremlinAppSync file (when searching for #500), but the error messages are not helpful.

Describe the feature you'd like to see implemented
Can the errors provide more information about what exactly is failing in the discovery service?

Describe the value this feature will add to AWS Perspective
This would help users troubleshoot problems with discovery.

svozza · 2024-05-02T21:25:58Z

Errors about missing resources will be in the ECS logs. Instructions for retrieving them are at the bottom of the page, in the section titled "To retrieve the logs for the discovery component": (https://aws-solutions.github.io/workload-discovery-on-aws/workload-discovery-on-aws/2.0/debugging-the-discovery-component.html).

There is also an extensive flowchart for diagnosing common issues in the troubleshooting section of the README:

(https://aws-solutions.github.io/workload-discovery-on-aws/workload-discovery-on-aws/2.0/debugging-the-discovery-component.html).

Out of interest, has this been deployed in AWS_ORGANIZATION mode? There is a known issue where writes to OpenSearch are dropped during the very first ingestion cycle of the discovery process, which would appear in the UI as missing resources.

pardueaws · 2024-05-02T22:25:08Z

Yes, AWS_ORGANIZATION mode.

svozza · 2024-05-02T22:59:12Z

Then it's very likely the last issue I mentioned. To verify:

1. Identify a resource that is missing, for example an EC2 instance, and get its ARN.
2. Log into the AppSync console and select the Workload Discovery GraphQL API.
3. Choose Queries from the side panel.
4. Choose the Login with User Pools button and authenticate with your Workload Discovery password.
5. Execute the following GraphQL query with the ARN from step 1:

```graphql
query MyQuery {
  getResourceGraph(ids: ["<your-arn>"]) {
    edges {
      id
    }
    nodes {
      id
    }
  }
}
```
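Because the `ids` argument takes a list, the same query can presumably check several missing resources in one request. The following minimal Python sketch builds the query text for pasting into the AppSync Queries editor. `build_resource_graph_query` is a hypothetical helper, not part of Workload Discovery; it only reproduces the query shown above with the ARNs filled in.

```python
import json


def build_resource_graph_query(arns: list[str]) -> str:
    """Build the getResourceGraph query from this thread for one or more ARNs.

    Hypothetical helper: it reproduces the query text only and does not call AppSync.
    """
    # json.dumps wraps each ARN in double quotes and escapes special
    # characters, which yields a valid GraphQL string literal.
    ids = ", ".join(json.dumps(arn) for arn in arns)
    return (
        "query MyQuery {\n"
        f"  getResourceGraph(ids: [{ids}]) {{\n"
        "    edges {\n"
        "      id\n"
        "    }\n"
        "    nodes {\n"
        "      id\n"
        "    }\n"
        "  }\n"
        "}"
    )


if __name__ == "__main__":
    # Placeholder ARN: replace with the ARN of a resource missing from the UI.
    print(build_resource_graph_query(["<your-arn>"]))
```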

A successful response that is not empty (that is, unlike the one below) means the resources are in Neptune but not in OpenSearch. An empty response looks like this:

```json
{
  "data": {
    "getResourceGraph": {
      "edges": [],
      "nodes": []
    }
  }
}
```
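When checking many ARNs, the response can also be classified programmatically. This is a minimal Python sketch that encodes only the rule stated above: any returned nodes or edges mean the resource is in Neptune but not in OpenSearch. `classify_resource_graph_response` and its return labels are hypothetical names chosen for illustration.

```python
import json


def classify_resource_graph_response(body: str) -> str:
    """Classify the JSON body returned by the getResourceGraph query.

    Hypothetical helper. Returns "in_neptune_not_opensearch" when any nodes or
    edges come back, and "empty" when both lists are empty.
    """
    payload = json.loads(body)
    if payload.get("errors"):
        # A response carrying GraphQL errors is not a successful response,
        # so it says nothing about where the resource is stored.
        raise ValueError(f"query failed: {payload['errors']}")
    graph = (payload.get("data") or {}).get("getResourceGraph") or {}
    if graph.get("nodes") or graph.get("edges"):
        return "in_neptune_not_opensearch"
    # Empty result: the resource was not returned from Neptune either, so this
    # check does not point to the dropped OpenSearch writes issue.
    return "empty"


if __name__ == "__main__":
    example = '{"data": {"getResourceGraph": {"edges": [], "nodes": []}}}'
    print(classify_resource_graph_response(example))  # empty
```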

The simplest way to rectify this is to clear the Neptune database. When the discovery process runs again, it will repopulate both databases:

1. Log into the Lambda console.
2. Find the Lambda function that writes to Neptune. It will have a name such as <stack-name>-GremlinAppSyncFunction-<ID-string>.
3. Select the Test tab and create a test event with the following JSON:

```json
{
  "arguments": {},
  "source": null,
  "prev": null,
  "info": {
    "parentTypeName": "Mutation",
    "fieldName": "deleteAllResources",
    "variables": {}
  },
  "stash": {}
}
```

4. Execute the test event. Depending on how many resources are in Neptune, the Lambda function may time out, but it should still clear the database.
5. Wait 15 minutes for the discovery process to run again and re-ingest the resources.
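If you prefer to script the console steps above, the following is a minimal Python sketch. It assumes boto3 is installed and credentials for the deployment account are configured. `build_delete_all_event`, `find_gremlin_functions` and `invoke_delete_all` are hypothetical helper names, and the name match assumes the function name contains `-GremlinAppSyncFunction-` as in the pattern above; CloudFormation can shorten long generated names, so check the match before invoking. Like the console test, this deletes every resource in Neptune.

```python
import json


def build_delete_all_event() -> dict:
    """Return the test event from this thread that triggers deleteAllResources."""
    return {
        "arguments": {},
        "source": None,
        "prev": None,
        "info": {
            "parentTypeName": "Mutation",
            "fieldName": "deleteAllResources",
            "variables": {},
        },
        "stash": {},
    }


def find_gremlin_functions(function_names, prefix=""):
    """Keep names that look like <stack-name>-GremlinAppSyncFunction-<ID-string>."""
    return [
        name
        for name in function_names
        if name.startswith(prefix) and "-GremlinAppSyncFunction-" in name
    ]


def invoke_delete_all(prefix=""):
    """Find the Gremlin AppSync function and invoke it with the delete-all event.

    WARNING: this clears every resource from Neptune, exactly like the console test.
    """
    # Imported here so the helpers above can be used without the AWS SDK.
    import boto3
    from botocore.config import Config

    # Lambda's maximum timeout is 15 minutes, so wait slightly longer than
    # that, and disable retries so a slow call is not sent a second time.
    client = boto3.client(
        "lambda", config=Config(read_timeout=910, retries={"max_attempts": 0})
    )
    names = []
    for page in client.get_paginator("list_functions").paginate():
        names.extend(fn["FunctionName"] for fn in page["Functions"])

    matches = find_gremlin_functions(names, prefix)
    if len(matches) != 1:
        raise RuntimeError(f"expected exactly one match, found: {matches}")

    response = client.invoke(
        FunctionName=matches[0],
        InvocationType="RequestResponse",
        Payload=json.dumps(build_delete_all_event()).encode("utf-8"),
    )
    # A Lambda timeout is reported as FunctionError "Unhandled"; per the
    # thread, the database may still have been cleared in that case.
    return {
        "function": matches[0],
        "status": response["StatusCode"],
        "function_error": response.get("FunctionError"),
    }
```

Raising the client read timeout matters here because the function may run until its own timeout while clearing a large graph; with the default client settings the call could give up (and retry) before Lambda finishes.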

pardueaws added the enhancement New feature or request label May 2, 2024
