Add error message if returned search HTML does not contain required elements #273

Open: wants to merge 3 commits into base: main


phoehnel

Change

This change adds a check that the HTML returned by non-SERP scrapers, such as Scraping Robot or a proxy, contains the expected base structure of a search results page.

This mitigates the problem from #272, where parsing fails on a bot-protection popup and the keyword position is recorded as 0.
Results still cannot be parsed when a bot-protection page is returned, but SerpBear now reports an error instead of dropping the position to zero.
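A minimal sketch of the caller-side effect described above. Everything here is hypothetical and not SerpBear's actual code: the `KeywordState` shape, the `refreshKeywordSketch` helper, and the choice to keep the previous position on failure are assumptions made for illustration. The point is that a thrown parse error is recorded as an error instead of being turned into position 0.

```typescript
// HYPOTHETICAL sketch: names and state shape are illustrative, not SerpBear's API.
interface KeywordState {
  keyword: string;
  position: number;
  lastError: string | null;
}

// `scrape` returns the position for a keyword, or throws if the page cannot be parsed.
function refreshKeywordSketch(
  state: KeywordState,
  scrape: (keyword: string) => number,
): KeywordState {
  try {
    const position = scrape(state.keyword);
    // Successful parse: store the new position (0 is a legitimate "not found" result here).
    return { ...state, position, lastError: null };
  } catch (err) {
    // ASSUMPTION: on a parse failure the previous position is kept
    // and only the error message is recorded, so the graph does not drop to 0.
    const message = err instanceof Error ? err.message : String(err);
    return { ...state, lastError: message };
  }
}
```

With this split, a bot-protection page surfaces as `lastError` while `position` keeps its last known value.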

Behavior

  • If the returned HTML is not a valid search results page: an error is raised.
  • If the returned HTML is valid, but the search has 0 results: behavior is unchanged; no error is raised and results = 0.
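The two cases above can be sketched as follows. This is a hypothetical illustration, not the code in this PR: the function names, the marker strings (`id="search"`, `id="rso"`) used to recognize a results page, and the regex-based link extraction are all assumptions; a real implementation would use an HTML parser rather than a regex. Only the error message text is taken from the log in this PR.

```typescript
// HYPOTHETICAL sketch: not the implementation merged in this PR.
interface SearchResult {
  title: string;
  url: string;
  position: number;
}

// ASSUMPTION: a genuine results page contains at least one of these containers.
const REQUIRED_MARKERS = ['id="search"', 'id="rso"'];

function hasExpectedStructure(html: string): boolean {
  return REQUIRED_MARKERS.some((marker) => html.includes(marker));
}

function extractResultsSketch(html: string): SearchResult[] {
  // Case 1: not a results page (e.g. a bot-protection popup), so raise an error.
  if (!hasExpectedStructure(html)) {
    throw new Error(
      "[ERROR] Scraped search results do not adhere to expected format. Unable to parse results",
    );
  }
  // Case 2: valid page; zero matches simply yields an empty array, with no error.
  // Simplified regex for illustration: matches <a href="..."><h3>title</h3>.
  const linkPattern = /<a href="(https?:\/\/[^"]+)"[^>]*><h3[^>]*>([^<]*)<\/h3>/g;
  const results: SearchResult[] = [];
  let match: RegExpExecArray | null;
  while ((match = linkPattern.exec(html)) !== null) {
    results.push({ title: match[2], url: match[1], position: results.length + 1 });
  }
  return results;
}
```

An empty results container passes the structure check and returns `[]`, matching the unchanged zero-results behavior, while a page without any of the markers throws.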

Screenshot of changed version on error

Note that the drop in the position graph was caused by #272 and should no longer occur with this change.

Log of changed version on error

serpbear  | [0] POST /api/refresh?id=5
serpbear  | [0] keywordIDs:  [ 5 ]
serpbear  | [0] START SCRAPE:  mykeword
serpbear  | [0] GET /api/keywords?domain=mydomain.de
serpbear  | [0] [ERROR] Scraped search results do not adhere to expected format. Unable to parse results
serpbear  | [0] [ERROR] Scraping Keyword :  mykeword
serpbear  | [0] [ERROR_MESSAGE]:  Error: [ERROR] Scraped search results do not adhere to expected format. Unable to parse results
serpbear  | [0]     at extractScrapedResult (/app/.next/server/chunks/941.js:322:15)
serpbear  | [0]     at scrapeKeywordFromGoogle (/app/.next/server/chunks/941.js:277:100)
serpbear  | [0]     at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
serpbear  | [0]     at async refreshAndUpdateKeyword (/app/.next/server/chunks/941.js:79:34)
serpbear  | [0]     at async refreshAndUpdateKeywords (/app/.next/server/chunks/941.js:59:37)
serpbear  | [0] [SUCCESS] Updating the Keyword:  mykeword
serpbear  | [0] time taken: 2985.039358ms
