Each Nutch crawl to its own ES index #745

Open
ahmadia opened this issue Oct 30, 2015 · 2 comments

@ahmadia
Contributor

ahmadia commented Oct 30, 2015

Right now we're just dumping everything into the nutch index. Since we're already reconfiguring Nutch for each crawl when we do visualization, should I have each Nutch crawl dump into a different index?
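
A rough sketch of what per-crawl index selection could look like, assuming the Nutch Elasticsearch indexer picks its target index from an `elastic.index` property in the crawl's configuration. The helper names and the `nutch-<slug>` naming scheme are hypothetical and only illustrate the idea.

```python
import re
import xml.etree.ElementTree as ET


def index_name_for_crawl(crawl_slug: str) -> str:
    """Build a lowercase index name for one crawl (hypothetical naming scheme)."""
    # Elasticsearch index names must be lowercase; collapse anything else to "-".
    safe = re.sub(r"[^a-z0-9_-]+", "-", crawl_slug.lower()).strip("-")
    return f"nutch-{safe}"


def elastic_index_property(index_name: str) -> str:
    """Render a <property> element that could be merged into the crawl's Nutch config.

    Assumes the indexer reads its target index from `elastic.index`.
    """
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "elastic.index"
    ET.SubElement(prop, "value").text = index_name
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    print(elastic_index_property(index_name_for_crawl("My Crawl #1")))
```

Since the configuration is already rewritten per crawl, this would only mean adding one more property at that step.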

@ahmadia ahmadia added this to the v0.4 milestone Oct 30, 2015
@ahmadia
Contributor Author

ahmadia commented Oct 30, 2015

Discussed this with Katrina in Flowdock. The ideal would be one index per project, with a crawl_id field on each document. I don't think Nutch can do the latter, but I'll look at what options are available to the indexer.
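
To make the one-index-per-project idea concrete, here is a minimal sketch of the data model, assuming documents can be tagged somewhere outside Nutch before they reach Elasticsearch. The function names, the `project-<slug>` index naming, and the tagging step itself are all hypothetical. The filter also assumes `crawl_id` is mapped as an exact-match (not analyzed) field.

```python
from typing import Dict, Iterable, Iterator


def index_name_for_project(project_slug: str) -> str:
    """One index per project (hypothetical naming scheme)."""
    return f"project-{project_slug.lower()}"


def tag_with_crawl(docs: Iterable[Dict], project_slug: str, crawl_id: str) -> Iterator[Dict]:
    """Yield bulk-style actions that route each doc to the project index
    and stamp it with the crawl it came from."""
    index = index_name_for_project(project_slug)
    for doc in docs:
        yield {"_index": index, "_source": {**doc, "crawl_id": crawl_id}}


def crawl_filter(crawl_id: str) -> Dict:
    """Query body that restricts a search to a single crawl within the project index."""
    return {"query": {"term": {"crawl_id": crawl_id}}}


if __name__ == "__main__":
    pages = [{"url": "http://example.com", "title": "Example"}]
    for action in tag_with_crawl(pages, "Demo", "crawl-7"):
        print(action)
    print(crawl_filter("crawl-7"))
```

With this layout, visualization for one crawl becomes a filtered query on the project index rather than a separate index per crawl.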

@ahmadia
Contributor Author

ahmadia commented Nov 2, 2015

I'm going to punt this to 0.5 since we can't control Ache for this yet and it's a little late in 0.4 to be mucking with our data model.

@ahmadia ahmadia modified the milestones: v0.5, v0.4 Nov 2, 2015