Docker #360
Comments
+1 |
Just noting that if anyone would like to see a Dockerfile merged, please submit it as a pull request and include the documentation/examples you feel appropriate. I'm willing to merge it and connect it to Docker Hub under the IIPC group, but I don't use Docker much myself, so you'll need to do the legwork and testing. :-) |
I find myself unable to really stress-test my own docker image. It works for some toy samples but I'm not sure about more involved scenarios and how docker handles this. Mine was more for short-term and low url count crawls. 😃 |
I added the |
So, after a request I added a Now my |
All I had in mind was a pull request that adds the Dockerfile itself and maybe a section named something like 'Running Heritrix under Docker' with some brief usage instructions to docs/operating.rs. By testing I just meant manually verifying that the instructions work, not automated tests. :-) |
Ok. I'm working on it. I did not yet add a description on how to build the docker image. Would a I found the following Docker Hub users:
Which should then also be used in the documentation. (instead of just |
Thanks. That looks great. I've merged it and pushed the main and contrib images to iipc/heritrix. I had intended to automate this with the autobuilder but it seems the free tier of that has been discontinued. I'll look into alternative options but I guess it's not too difficult to build them manually after each release. I used the IIPC Docker org because the Heritrix "interim" releases are currently maintained by some members of the IIPC community and several of us (including someone from IA) have access to that org. |
I can take a look at using GH Actions. It seems to me that the tags correspond to the releases. Then, we can probably also transfer all the old images from my hub account to the iipc one, if necessary? And thanks about the IIPC explanation. :-) As for the tags, I had Then, I also added the Docker wiki page. If anyone plans to rename it, please update the link in |
I wrote a Dockerfile for the current version(s). Maybe you want to look at it and integrate it here.
It works for me but I only have some simple use-cases (like API tests with python3), so I do not know how it performs under stress. And whether users require more configuration options. (But they could theoretically bind-mount other files if required.)
See Docker-Hub: https://hub.docker.com/r/ekoerner/heritrix
My `Dockerfile` (currently in a private repository, so I can't provide a link, just the content here)
Build it:
docker build --build-arg version=3.4.0-20210923 -t heritrix .
Build `heritrix-contrib` (requires Java 8; with Java 11 (JRE/JDK) there is some JNI error, maybe related to #265?):
docker build --build-arg version=3.4.0-20210923 --build-arg contrib=1 --build-arg java=8-jre -t heritrix-contrib .
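The two build commands differ only in their build arguments, so they could be wrapped in a small helper. This is a minimal sketch, not part of the image: the function name `heritrix_build_cmd` and the `main`/`contrib` variant labels are made up for illustration, and the function only prints the command so it can be inspected before running.

```shell
#!/bin/sh
# Sketch: print the docker build command for either image variant.
# The build args (version, contrib, java) are the ones used in this thread;
# the function name and the "main"/"contrib" labels are hypothetical.
heritrix_build_cmd() {
    version="$1"            # e.g. 3.4.0-20210923
    variant="${2:-main}"    # "main" (default) or "contrib"
    if [ "$variant" = "contrib" ]; then
        # The contrib build is pinned to Java 8, as noted in the thread.
        echo "docker build --build-arg version=$version --build-arg contrib=1 --build-arg java=8-jre -t heritrix-contrib ."
    else
        echo "docker build --build-arg version=$version -t heritrix ."
    fi
}

# Print both commands; pipe the output to sh to actually run them.
heritrix_build_cmd 3.4.0-20210923
heritrix_build_cmd 3.4.0-20210923 contrib
```

Run it from the directory containing the Dockerfile, e.g. `heritrix_build_cmd 3.4.0-20210923 contrib | sh`.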
Example `docker-compose.yml` (also on Docker Hub currently)
UPDATE: I added the `-r <jobname>` option to my image on Docker Hub. Simply set the `JOBNAME=jobname` environment variable to run the job `jobname`. Take care to mount the (preconfigured) job folder into the image, see above. This only works from version 3.4.0-20210803, see pull request #406.
UPDATE2: I added a `contrib` image that uses `heritrix-contrib`. For now it only includes `youtube-dl` as an extra dependency and it only works with the Java 8 JRE. The `contrib` image is only available from version 3.4.0-20210923.
UPDATE3: Added a custom user to make it a bit more secure (e.g., no package installs possible anymore). Note that `-b /` is required to make the web UI visible in the docker image.
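To illustrate the `JOBNAME` update, here is a rough sketch of a run command that mounts a preconfigured job folder and starts that job. Several parts are assumptions and not taken from this thread: the container-side jobs directory `/opt/heritrix/jobs`, the host directory `/srv/heritrix-jobs`, the job name `myjob`, and publishing port 8443 (Heritrix's default web UI port). Check them against the actual Dockerfile before use. The helper `heritrix_run_cmd` is hypothetical and only prints the command.

```shell
#!/bin/sh
# Sketch: print a docker run command that starts a preconfigured job.
# ASSUMPTIONS (not from the thread): the container-side jobs directory
# /opt/heritrix/jobs and the published port 8443.
heritrix_run_cmd() {
    jobname="$1"                                # job to start via JOBNAME
    host_jobs="$2"                              # host dir containing <jobname>/
    container_jobs="${3:-/opt/heritrix/jobs}"   # hypothetical container path
    echo "docker run --rm -p 8443:8443 -e JOBNAME=$jobname -v $host_jobs/$jobname:$container_jobs/$jobname heritrix"
}

# Example with made-up names; pipe the output to sh to actually run it.
heritrix_run_cmd myjob /srv/heritrix-jobs
```

The image name `heritrix` matches the tag from the build command; swap in `heritrix-contrib` for the contrib variant.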