InvalidDataException Unable to find deck with gatherling id '84479' -- recurrence of the gatherling-deck-changed-id problem #7072

Closed
vorpal-buildbot opened this issue Feb 23, 2020 · 8 comments
Labels
triage Used by bot to label unlabeled tasks, will be removed automatically upon labeling

Comments

@vorpal-buildbot
Contributor

Reported on Discord by bakert#2193

@triage-new-issues (bot) added the triage label on Feb 23, 2020
@bakert
Member

bakert commented Feb 24, 2020

This may not be a recurrence of that problem as we have this entry in our db:

| 68137 |      3976 |         2 | 84479      | Deck - Guildgate Control |   1582396200 |   1582410504 |           3139 | https://gatherling.com/deck.php?mode=view&id=84479 |          308 | NULL         | NULL          |  NULL | NULL          | NULL                |     11 | 4a5ae1c65f944e733b5b2ffbba6dcf5cac4003e3 |       0 |        1 |

Meanwhile this page loads just fine and shows a deck:

https://gatherling.com/deck.php?mode=view&id=84479


Whereas with the former bug I'd expect the URL in the db to lead us to a blank page with id not known.

@bakert
Member

bakert commented Feb 24, 2020

I am not able to repro this bug on local. Local finds the deck and adds it fine and passes over it in subsequent runs fine too.

@bakert
Member

bakert commented Feb 24, 2020

The problem seems to be that https://gatherling.com/deck.php?mode=view&id=68249 has no matches in the db. The competition is partially scraped.

Edit: that's a Pauper deck; I'm not sure what I meant here.

@bakert
Member

bakert commented Feb 24, 2020

The first occurrence of the error is Sat, Feb 22, 11:28 PM GMT (2 days ago), so the failure that caused the partial scrape was probably an hour before that. Note that the PD server is on Australian time.

@bakert
Member

bakert commented Feb 25, 2020

That's 10:28am AEDT on Feb 23.
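
For anyone double-checking the conversion, here is a minimal sketch. It assumes the year is 2020 (from the issue dates) and models AEDT as a fixed UTC+11 offset rather than looking up the server's actual timezone configuration, which this thread does not show.

```python
from datetime import datetime, timedelta, timezone

# Assumption: AEDT modelled as a fixed UTC+11 offset, not read from the server.
AEDT = timezone(timedelta(hours=11), "AEDT")

# First occurrence of the error as reported: Sat, Feb 22, 11:28 PM GMT.
first_error_gmt = datetime(2020, 2, 22, 23, 28, tzinfo=timezone.utc)

# Same instant expressed in server-local time.
first_error_server = first_error_gmt.astimezone(AEDT)

print(first_error_server.isoformat())
```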

There is nothing suspicious in the log at that time. But does the scraper log to uwsgi journal?

@bakert
Member

bakert commented Feb 27, 2020

scraper.log just logs the last run so nothing interesting there.

@bakert
Member

bakert commented Feb 27, 2020

OK so the problem is that https://pennydreadfulmagic.com/decks/68137/ (Gatherling identifier 84479) already has an id and matches in the decksite db. Then, when we try to add_ids in the scraper to the match between Gatherling id 84489 and Gatherling id 84479, we fail because 84479 is not in our list, since it already has an id and matches.
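
The failure mode described above can be sketched roughly as follows. This is not the actual scraper code: `find_deck`, `decks_to_process`, `already_stored`, and the dict shape are hypothetical names made up for illustration. Only the identifiers and the error message come from this issue.

```python
from typing import Dict, List


class InvalidDataException(Exception):
    """Stand-in for the exception named in the issue title."""


def find_deck(identifier: str, decks: List[Dict[str, str]]) -> Dict[str, str]:
    # Hypothetical lookup: search only the decks being processed in this run.
    for deck in decks:
        if deck["identifier"] == identifier:
            return deck
    raise InvalidDataException(f"Unable to find deck with gatherling id '{identifier}'")


# Assumption: decks that already have an id and matches are left out of the list.
already_stored = {"84479"}
scraped = [{"identifier": "84479"}, {"identifier": "84489"}]
decks_to_process = [d for d in scraped if d["identifier"] not in already_stored]

# The match pairs a deck that is in the list with one that is not.
match = ("84489", "84479")
error = None
try:
    for identifier in match:
        find_deck(identifier, decks_to_process)
except InvalidDataException as e:
    error = str(e)

print(error)
```

In this sketch the lookup for 84489 succeeds and the lookup for 84479 raises, which matches the message in the issue title.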

MariaDB [decksite]> SELECT * FROM deck WHERE identifier = '84489';
Empty set (0.32 sec)

@bakert
Member

bakert commented Feb 27, 2020

84489 was 84486 when we scraped the tournament. This is the root cause of the issue. Gatherling ids should be stable. See PennyDreadfulMTG/gatherling#273
