Day 354 #703

vaskoz opened this issue Aug 11, 2019 · 0 comments
vaskoz commented Aug 11, 2019

Good morning! Here's your coding interview problem for today.

This problem was asked by Google.

Design a system to crawl and copy all of Wikipedia using a distributed network of machines.

More specifically, suppose your server has access to a set of client machines. Your client machines can execute code you have written to access Wikipedia pages, download and parse their data, and write the results to a database.
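Not part of the prompt, but a minimal sketch of one client's fetch-and-parse step might look like the Go below. The `save` callback stands in for the database write, and the regex-based link extraction is a deliberate simplification; a real crawler would use a proper HTML parser such as golang.org/x/net/html.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"regexp"
)

// linkRe crudely extracts internal /wiki/ links, skipping fragments and
// namespaced pages (File:, Talk:, ...); a production crawler would parse
// the HTML properly instead of using a regex.
var linkRe = regexp.MustCompile(`href="(/wiki/[^"#:]+)"`)

// crawlOne downloads a single page, hands its body to save (a stand-in
// for the database write), and returns the outgoing Wikipedia links.
func crawlOne(url string, save func(url, body string)) ([]string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	body := string(data)
	save(url, body)

	var links []string
	for _, m := range linkRe.FindAllStringSubmatch(body, -1) {
		links = append(links, "https://en.wikipedia.org"+m[1])
	}
	return links, nil
}

func main() {
	save := func(url, body string) {
		fmt.Printf("saved %s (%d bytes)\n", url, len(body))
	}
	links, err := crawlOne("https://en.wikipedia.org/wiki/Web_crawler", save)
	if err != nil {
		panic(err)
	}
	fmt.Println("discovered", len(links), "links")
}
```

In a full design, the links returned here would be fed back to the coordinator, which decides which client machine crawls each of them next.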

Some questions you may want to consider as part of your solution (a sketch and notes touching on them follow the list):

  • How will you reach as many pages as possible?
  • How can you keep track of pages that have already been visited?
  • How will you deal with your client machines being blacklisted?
  • How can you update your database when Wikipedia pages are added or updated?
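None of the following is prescribed by the problem; it is one hedged way to approach the visited-pages question. If each normalized URL is deterministically assigned to a single client machine, that machine can keep the visited set for its shard locally, avoiding a global coordination bottleneck. The function names, the in-memory map, and the fixed machine count below are assumptions for illustration:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
	"sync"
)

// normalize strips fragments and trailing slashes so trivially different
// URLs map to one key. (Assumption: real normalization would also handle
// percent-encoding and redirects.)
func normalize(url string) string {
	if i := strings.IndexByte(url, '#'); i >= 0 {
		url = url[:i]
	}
	return strings.TrimSuffix(url, "/")
}

// ownerOf assigns each URL to one of n client machines by hashing.
// A production system would use consistent hashing so that adding or
// removing machines does not reshuffle the whole URL space.
func ownerOf(url string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(normalize(url)))
	return int(h.Sum32()) % n
}

// visitedSet is the per-shard "have we seen this page" store; a mutex-
// guarded map stands in for a persistent store such as RocksDB or Redis.
type visitedSet struct {
	mu   sync.Mutex
	seen map[string]bool
}

// markNew records the URL and reports whether it was unseen until now.
func (v *visitedSet) markNew(url string) bool {
	v.mu.Lock()
	defer v.mu.Unlock()
	key := normalize(url)
	if v.seen[key] {
		return false
	}
	v.seen[key] = true
	return true
}

func main() {
	v := &visitedSet{seen: make(map[string]bool)}
	urls := []string{
		"https://en.wikipedia.org/wiki/Web_crawler",
		"https://en.wikipedia.org/wiki/Web_crawler/", // duplicate after normalization
	}
	for _, u := range urls {
		fmt.Printf("%s -> shard %d, new=%v\n", u, ownerOf(u, 8), v.markNew(u))
	}
}
```

For the remaining questions, hedged directions only: clients can reduce the risk of being blocked by rate-limiting per machine and honoring Retry-After headers, and freshness can come from periodically polling Wikipedia's public MediaWiki RecentChanges API (action=query&list=recentchanges) rather than re-crawling everything.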
vaskoz self-assigned this Aug 11, 2019
vaskoz mentioned this issue Feb 2, 2020