Wr309567 crawl active courses #74

Open: wants to merge 7 commits into master

kristian-94 (Contributor) commented Apr 1, 2019

This is another option for limiting the scope of the crawler so that it focuses on courses that are active. When enabled, it only crawls courses whose end date is in the future, and it adds an option to only crawl courses that have a certain block enabled. This way we avoid crawling unnecessary pages, of which there could be many on a big site.
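The two filters described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the `Course` record, the `courses_to_crawl` function and the block name `crawler_optin` are hypothetical. It assumes end dates are Unix timestamps and keeps only courses whose end date is strictly after "now", which also excludes a course stored with an end date of 0; how the PR treats courses without an end date is not stated here.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Course:
    # Hypothetical course record for illustration only.
    id: int
    enddate: int  # assumed to be a Unix timestamp
    blocks: set = field(default_factory=set)  # names of blocks added to the course


def courses_to_crawl(courses, now=None, only_active=True, required_block=None):
    """Return the courses the crawler should visit.

    only_active: keep only courses whose end date is after `now`.
    required_block: if set, keep only courses that have this block.
    """
    if now is None:
        now = int(time.time())
    selected = []
    for course in courses:
        if only_active and course.enddate <= now:
            continue  # course has ended (an end date of 0 is also skipped)
        if required_block is not None and required_block not in course.blocks:
            continue  # course does not have the opt-in block
        selected.append(course)
    return selected


courses = [Course(1, 2000, {"crawler_optin"}), Course(2, 500), Course(3, 3000)]
print([c.id for c in courses_to_crawl(courses, now=1000)])  # [1, 3]
print([c.id for c in courses_to_crawl(courses, now=1000, required_block="crawler_optin")])  # [1]
```

Keeping both filters behind separate options mirrors the description: the end-date filter and the block filter can be enabled independently.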

Kristian Ringer added 4 commits April 1, 2019 15:07
    The SQL now returns a valid queue item directly, so we don't
    need to iterate through queue items to validate them before
    crawling them.
… the queue

        We add all the recent courses once the crawler starts a new
        cycle; this is the appropriate place to add the seed URLs of
        each new course we want to start crawling.
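The two commits above (seeding the queue with recent courses when a new cycle starts, and letting SQL hand back the next valid queue item) can be sketched with an in-memory SQLite database. The schema, the table and column names, and the functions `make_db`, `start_cycle` and `next_queue_item` are all hypothetical; only the `/course/view.php?id=` path is the standard Moodle course URL, and `moodle.example.com` is a placeholder site.

```python
import sqlite3


def make_db():
    # Hypothetical schema for illustration; the project's real tables differ.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE course (id INTEGER PRIMARY KEY, enddate INTEGER NOT NULL);
        CREATE TABLE queue (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            url TEXT NOT NULL UNIQUE,
            courseid INTEGER,
            done INTEGER NOT NULL DEFAULT 0
        );
    """)
    return db


def start_cycle(db, wwwroot, now):
    """At the start of a cycle, queue the front page of every course that has not ended."""
    rows = db.execute("SELECT id FROM course WHERE enddate > ? ORDER BY id", (now,)).fetchall()
    for (courseid,) in rows:
        url = f"{wwwroot}/course/view.php?id={courseid}"
        # INSERT OR IGNORE skips seed URLs that are already in the queue.
        db.execute("INSERT OR IGNORE INTO queue (url, courseid) VALUES (?, ?)", (url, courseid))
    db.commit()


def next_queue_item(db, now):
    """Let SQL return one valid item instead of looping over candidates in code."""
    return db.execute(
        """
        SELECT q.id, q.url
          FROM queue q
          JOIN course c ON c.id = q.courseid
         WHERE q.done = 0 AND c.enddate > ?
         ORDER BY q.id
         LIMIT 1
        """,
        (now,),
    ).fetchone()


db = make_db()
db.executemany("INSERT INTO course (id, enddate) VALUES (?, ?)", [(2, 500), (7, 5000)])
start_cycle(db, "https://moodle.example.com", now=1000)
print(next_queue_item(db, now=1000))  # (1, 'https://moodle.example.com/course/view.php?id=7')
```

Pushing the "is this course still active?" condition into the query means the crawler never has to fetch and then discard stale queue rows.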
kristian-94 and others added 3 commits April 2, 2019 12:18
  - Before parsing the HTML, we check whether this is a recent
  course, so we don't need a different queue query.
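The "check before parsing" approach from the last commit can be sketched as a guard in front of the parser, so the queue query stays generic and filtering happens per item. The `Crawler` class, its methods and the course-to-end-date map are hypothetical stand-ins; treating pages with no course (`courseid=None`) as always parsed is an assumption, not something the PR states.

```python
import time


class Crawler:
    """Hypothetical crawler skeleton showing the 'check before parsing' idea."""

    def __init__(self, course_enddates, now=None):
        # Map of course id -> end date (Unix time); stands in for a DB lookup.
        self.course_enddates = course_enddates
        self.now = int(time.time()) if now is None else now
        self.parsed = []  # records which URLs actually reached the parser

    def is_recent_course(self, courseid):
        enddate = self.course_enddates.get(courseid)
        return enddate is not None and enddate > self.now

    def parse_html(self, url, html):
        # Stand-in for the real (expensive) HTML parsing and link extraction.
        self.parsed.append(url)
        return []

    def process(self, url, courseid, html):
        """Parse only pages outside a course (assumption) or in a recent course."""
        if courseid is not None and not self.is_recent_course(courseid):
            return None  # not parsed, so none of its links get queued
        return self.parse_html(url, html)


crawler = Crawler({7: 5000, 2: 500}, now=1000)
crawler.process("https://moodle.example.com/course/view.php?id=7", 7, "<html></html>")
crawler.process("https://moodle.example.com/course/view.php?id=2", 2, "<html></html>")
print(crawler.parsed)  # ['https://moodle.example.com/course/view.php?id=7']
```

Because skipped pages are never parsed, links inside ended courses never enter the queue, which achieves the same narrowing without a separate queue query.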
brendanheywood changed the title from "Wr309567" to "Wr309567 crawl active courses" Mar 5, 2020