Day 354 #703

vaskoz opened this issue Aug 11, 2019 · 0 comments
vaskoz commented Aug 11, 2019

Good morning! Here's your coding interview problem for today.

This problem was asked by Google.

Design a system to crawl and copy all of Wikipedia using a distributed network of machines.

More specifically, suppose your server has access to a set of client machines. Your client machines can execute code you have written to access Wikipedia pages, download and parse their data, and write the results to a database.
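Not part of the prompt, but a minimal sketch of one client's fetch-and-parse step might look like the Go below. The `save` callback stands in for the database write, and the regex-based link extraction is a deliberate simplification; a real crawler would use a proper HTML parser such as golang.org/x/net/html.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"regexp"
)

// linkRe crudely extracts internal /wiki/ links, skipping fragments and
// namespaced pages (File:, Talk:, ...); a production crawler would parse
// the HTML properly instead of using a regex.
var linkRe = regexp.MustCompile(`href="(/wiki/[^"#:]+)"`)

// crawlOne downloads a single page, hands its body to save (a stand-in
// for the database write), and returns the outgoing Wikipedia links.
func crawlOne(url string, save func(url, body string)) ([]string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	body := string(data)
	save(url, body)

	var links []string
	for _, m := range linkRe.FindAllStringSubmatch(body, -1) {
		links = append(links, "https://en.wikipedia.org"+m[1])
	}
	return links, nil
}

func main() {
	save := func(url, body string) {
		fmt.Printf("saved %s (%d bytes)\n", url, len(body))
	}
	links, err := crawlOne("https://en.wikipedia.org/wiki/Web_crawler", save)
	if err != nil {
		panic(err)
	}
	fmt.Println("discovered", len(links), "links")
}
```

In a full design, the links returned here would be fed back to the coordinator, which decides which client machine crawls each of them next.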

Some questions you may want to consider as part of your solution (a sketch and notes touching on them follow the list):

  • How will you reach as many pages as possible?
  • How can you keep track of pages that have already been visited?
  • How will you deal with your client machines being blacklisted?
  • How can you update your database when Wikipedia pages are added or updated?
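None of the following is prescribed by the problem; it is one hedged way to approach the visited-pages question. If each normalized URL is deterministically assigned to a single client machine, that machine can keep the visited set for its shard locally, avoiding a global coordination bottleneck. The function names, the in-memory map, and the fixed machine count below are assumptions for illustration:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
	"sync"
)

// normalize strips fragments and trailing slashes so trivially different
// URLs map to one key. (Assumption: real normalization would also handle
// percent-encoding and redirects.)
func normalize(url string) string {
	if i := strings.IndexByte(url, '#'); i >= 0 {
		url = url[:i]
	}
	return strings.TrimSuffix(url, "/")
}

// ownerOf assigns each URL to one of n client machines by hashing.
// A production system would use consistent hashing so that adding or
// removing machines does not reshuffle the whole URL space.
func ownerOf(url string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(normalize(url)))
	return int(h.Sum32()) % n
}

// visitedSet is the per-shard "have we seen this page" store; a mutex-
// guarded map stands in for a persistent store such as RocksDB or Redis.
type visitedSet struct {
	mu   sync.Mutex
	seen map[string]bool
}

// markNew records the URL and reports whether it was unseen until now.
func (v *visitedSet) markNew(url string) bool {
	v.mu.Lock()
	defer v.mu.Unlock()
	key := normalize(url)
	if v.seen[key] {
		return false
	}
	v.seen[key] = true
	return true
}

func main() {
	v := &visitedSet{seen: make(map[string]bool)}
	urls := []string{
		"https://en.wikipedia.org/wiki/Web_crawler",
		"https://en.wikipedia.org/wiki/Web_crawler/", // duplicate after normalization
	}
	for _, u := range urls {
		fmt.Printf("%s -> shard %d, new=%v\n", u, ownerOf(u, 8), v.markNew(u))
	}
}
```

For the remaining questions, hedged directions only: clients can reduce the risk of being blocked by rate-limiting per machine and honoring Retry-After headers, and freshness can come from periodically polling Wikipedia's public MediaWiki RecentChanges API (action=query&list=recentchanges) rather than re-crawling everything.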
vaskoz self-assigned this Aug 11, 2019
vaskoz mentioned this issue Feb 2, 2020