-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
accumulo-proxy #22 - Reduce Docker image size #23
base: main
Are you sure you want to change the base?
Conversation
cff4e29
to
f8137ac
Compare
After discussion on the main issue I've pulled the alpine change into this pull. This is probably where I'm going to stop on this pull, there's a further discussion to be had around Hadoop docs but for now this pull has reduced the image size from 1.86GB to 1.06GB. |
f8137ac
to
5f09502
Compare
…final image size to 1.46GB from 1.86GB. I also tried to keep the download_verify_and_extract() function readable given it now downloads the TARs, extracts them and useful symlinks in place. I moved the HASH strings to SHA512 to match the apache.org artifacts found along side the binaries. Switched to alpine after discussion on the issue.
5f09502
to
8147a65
Compare
@@ -13,20 +13,18 @@ | |||
# See the License for the specific language governing permissions and | |||
# limitations under the License. | |||
|
|||
FROM openjdk:8 | |||
FROM openjdk:8-alpine3.9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did you select alpine? I was looking at the openjdk tags on dockerhub to see what else was small and saw openjdk:8-jre-slim
. It seems to be based on Debian, is a similar size, and was updated more recently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I spotted it to be fair, it's about 3M smaller so might be worth it and as you say it seems to be getting kept up to date.
Note, all values here unless specified otherwise are from the Docker command.
I did a quick rebuild with the new one and this is the sizes:
- openjdk:8-alpine3.9 1058822286 bytes
- openjdk:8-jre-slim 1136341123 bytes (+73 MB)
So it was actually larger to use the smaller base image, not what I expected, I did have to add wget and removed the installation of bash from the current Dockerfile as it was already present in 8-jre-slim
On disk the pulled images were different to dockerhub values
- openjdk:8-alpine3.9 - 105MB (disk) 70.09MB (dockerhub)
- opendjk:8-jre-slim - 184MB (disk) (dockerhub)
Looking into it, it seems that dockerhub reports the compressed size, not the size on disk once you pull the image.
I did the comparison for the final image sizes:
- openjdk:8-alpine3.9 base - 1.06GB
- openjdk:8-jre-slim base - 1.14GB
I tested both builds so they both function the same, perhaps taking the 80MB increase is worth it for being up to date?
What are your thoughts, stick with what we have or update to 8-jre-slim?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am in favor of 8-jre-slim because its more up to date and its based on Debian which I think a larger base of contributors and users may be familiar with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice analysis and write up of the analysis.
@volmasoft on the issue removing the hadoop docs folder was mentioned, but I did not see it. Based on the default Accumulo classpath, maybe everything except |
I wasn't sure if it's allowable to remove it e.g. does the license restrict it, as I'm no expert in this arena I was hoping someone else would be able to provide guidance. Looking at the official license (https://apache.org/licenses/LICENSE-2.0) it appears that we should be fine (see 4. Redistribution) So perhaps we can modify this but if so do we need to put any documentation/license information anywhere? I didn't want to do something in good faith but potentially cause issues for us down the line so hence I haven't included that change currently. |
I think we would be fine making a modification w/o doing anything additional. We could also wget the two needed jars from maven central possibly instead of getting the hadoop tar ball and then trimming it.
I think if anything was done to reduce the Hadoop size, it would make sense to do it in its own PR. |
@volmasoft @keith-turner This PR got lost. Either of you OK with merging as-is? |
@@ -71,7 +71,7 @@ You can test if this will work for you by executing the following steps | |||
|
|||
Start the accumulo-proxy container and enter it | |||
```commandline | |||
docker run -it --rm -p 42424:42424 --network="host" --name accumulo-proxy accumulo-proxy:latest bash; | |||
docker run -it --rm -p 42424:42424 --network="host" --name accumulo-proxy accumulo-proxy:latest sh; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure the impact of this, but I do know a lot of Accumulo scripts use bash specifically.
It seems like there are some good improvements here. If there is no activity on this PR I may open a new one and try to incorporate some of these changes at some point. |
Updated the Dockerfile to change some of the behavior and reduce the image size to 1.46GB from 1.86GB.
I also tried to keep the download_verify_and_extract() function readable given it now downloads the TARs, extracts them and useful symlinks in place.
I moved the HASH strings to SHA512 to match the apache.org artifacts found along side the binaries.
I've put this up early as there's some discussion on the issue #22