Good morning! Here's your coding interview problem for today.
This problem was asked by Google.
Design a system to crawl and copy all of Wikipedia using a distributed network of machines.
More specifically, suppose your server has access to a set of client machines. Your client machines can execute code you have written to access Wikipedia pages, download and parse their data, and write the results to a database.
Some questions you may want to consider as part of your solution are:
How will you reach as many pages as possible?
How can you keep track of pages that have already been visited?
How will you deal with your client machines being blacklisted?
How can you update your database when Wikipedia pages are added or updated?
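One possible starting point for the first two questions is a breadth-first crawl that follows links outward from seed pages and records every URL it has already queued. The sketch below is a single-process illustration only, not a full design: `crawl`, `normalize`, and the injected `fetch_links` callable are hypothetical names, and it assumes that in a distributed setup the `visited` set and the `frontier` queue would live in storage shared by the server and the client machines.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag


def normalize(url: str) -> str:
    """Drop the #fragment so links to sections of a page map to one URL."""
    return urldefrag(url).url


def crawl(seeds: Iterable[str],
          fetch_links: Callable[[str], Iterable[str]]) -> list[str]:
    """Breadth-first crawl; returns URLs in the order they were visited.

    fetch_links is a hypothetical callable that downloads one page and
    returns the URLs it links to. It is injected so the traversal logic
    can be exercised without network access.
    """
    visited: set[str] = set()       # every URL ever queued
    frontier: deque[str] = deque()  # URLs waiting to be fetched

    def enqueue(raw_url: str) -> None:
        url = normalize(raw_url)
        if url not in visited:
            visited.add(url)
            frontier.append(url)

    for seed in seeds:
        enqueue(seed)

    order: list[str] = []
    while frontier:
        url = frontier.popleft()
        order.append(url)
        for link in fetch_links(url):
            enqueue(link)
    return order
```

Marking a URL as visited when it is queued, rather than when it is fetched, keeps the same page from being handed to two clients at once. Blacklisting and change detection are not covered by this sketch.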