A Quick Guide to Creating a Profile
Alex Osborne edited this page Jul 4, 2018 · 3 revisions
Profiles can be created from either the Heritrix job page or the profile page, which display the details of a job or profile respectively. To create a new profile, first choose the job or profile that the new profile will be based on and click on it from the Main Console page. The job/profile page will be displayed. At the bottom of the page, enter the name of the new profile in the "Copy job to" or "Copy profile to" text box, select the "as profile" checkbox, and then click "copy." A new profile will be created. At this point you can configure the profile the same way that a job is configured, by editing its crawler-beans.cxml file.
Note
- A profile is an independent copy. Later changes to the job or profile it was based on do not modify it.
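The copy step can also be scripted instead of clicked. The following is a minimal sketch, not a definitive client: it assumes the Heritrix 3 REST API accepts a form POST to `<engine>/job/<name>` with `copyTo` and `asProfile` fields (mirroring the text box and checkbox described above), and that the engine runs at `https://localhost:8443/engine`. The job name `myjob`, the profile name `myprofile`, and the helper `build_copy_request` are hypothetical. Verify the endpoint and field names against the REST API documentation for your Heritrix version before relying on them. The sketch only builds the request; sending it would additionally need the operator credentials and handling for the engine's HTTPS certificate.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_copy_request(engine_url, source, new_name, as_profile=True):
    """Build (but do not send) a POST that copies a job or profile.

    Assumption: field names 'copyTo' and 'asProfile' match the
    "Copy job to" text box and "as profile" checkbox in the web UI.
    """
    fields = {"copyTo": new_name}
    if as_profile:
        # Equivalent of ticking the "as profile" checkbox.
        fields["asProfile"] = "on"
    body = urlencode(fields).encode("ascii")
    url = f"{engine_url.rstrip('/')}/job/{source}"
    return Request(url, data=body, method="POST",
                   headers={"Accept": "application/xml"})


# Hypothetical names, for illustration only.
req = build_copy_request("https://localhost:8443/engine", "myjob", "myprofile")
print(req.full_url)
print(req.data.decode("ascii"))
```

Leaving `as_profile` set to `False` omits the `asProfile` field, which corresponds to leaving the checkbox unticked and copying to a regular job.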
makeprofile.png (image/png)