It might be helpful to add arguments that let a user specify a max_crawl_depth (maximum folder depth) or max_crawl_total (maximum total number of files). This is not something we need currently, just a potentially useful addition.
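A minimal sketch of how the two limits could be enforced in a breadth-first crawl. `list_entries` is a hypothetical helper standing in for whatever directory-listing call the crawler actually uses; the function and parameter names beyond `max_crawl_depth` and `max_crawl_total` are illustrative only.

```python
from collections import deque
from typing import Callable, Iterable, List, Optional, Tuple


def crawl(
    root: str,
    list_entries: Callable[[str], Iterable[Tuple[str, bool]]],
    max_crawl_depth: Optional[int] = None,   # None = no folder-depth cap
    max_crawl_total: Optional[int] = None,   # None = no file-count cap
) -> List[str]:
    """Breadth-first crawl that stops at the depth or total-file limit."""
    files: List[str] = []
    queue = deque([(root, 0)])  # (folder, depth)

    while queue:
        folder, depth = queue.popleft()
        for path, is_dir in list_entries(folder):
            if is_dir:
                # Only descend if the child folder is still within the depth cap.
                if max_crawl_depth is None or depth + 1 <= max_crawl_depth:
                    queue.append((path, depth + 1))
            else:
                files.append(path)
                if max_crawl_total is not None and len(files) >= max_crawl_total:
                    return files  # stop as soon as the file cap is hit
    return files
```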
Add optional max_crawl_depth and max_crawl_total args to the crawler. The state of the crawl (all local queues and in-flight tasks) will be pickled and stored in S3 as a checkpoint. Once the service stops, the user gets a 'crawlNext' token that resumes the crawl from wherever the previous limit was hit. The 'crawlNext' token (and its checkpoint) will be deleted after 24 hours to save space, since these queues can get pretty hefty and the state of a repo can change drastically in the meantime.
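A rough sketch of the checkpoint/resume flow described above, assuming boto3 and an S3 bucket with a lifecycle rule that expires objects under the checkpoint prefix after 24 hours. The `CrawlState` dataclass, bucket name, and prefix are assumptions for illustration, not the actual implementation.

```python
import pickle
import uuid
from dataclasses import dataclass, field
from typing import List, Tuple

import boto3

BUCKET = "my-crawler-bucket"               # hypothetical bucket
CHECKPOINT_PREFIX = "crawl-checkpoints/"   # lifecycle rule expires this prefix after 24h

s3 = boto3.client("s3")


@dataclass
class CrawlState:
    """Everything needed to resume: queued folders, in-flight tasks, progress so far."""
    queue: List[Tuple[str, int]] = field(default_factory=list)  # (folder, depth)
    in_flight: List[str] = field(default_factory=list)
    files_seen: int = 0


def save_checkpoint(state: CrawlState) -> str:
    """Pickle the crawl state to S3 and return a crawlNext token."""
    crawl_next = uuid.uuid4().hex
    s3.put_object(
        Bucket=BUCKET,
        Key=f"{CHECKPOINT_PREFIX}{crawl_next}",
        Body=pickle.dumps(state),
    )
    return crawl_next


def load_checkpoint(crawl_next: str) -> CrawlState:
    """Fetch and unpickle the state for a previously issued crawlNext token."""
    obj = s3.get_object(Bucket=BUCKET, Key=f"{CHECKPOINT_PREFIX}{crawl_next}")
    return pickle.loads(obj["Body"].read())
```

Using a lifecycle rule on the prefix keeps the 24-hour cleanup on the S3 side, so the crawler never has to track or delete expired checkpoints itself.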