accumulo-proxy #22 - Reduce Docker image size #23

Open
wants to merge 1 commit into
base: main
8 changes: 4 additions & 4 deletions DOCKER.md
@@ -71,7 +71,7 @@ You can test if this will work for you by executing the following steps

Start the accumulo-proxy container and enter it
```commandline
docker run -it --rm -p 42424:42424 --network="host" --name accumulo-proxy accumulo-proxy:latest bash;
docker run -it --rm -p 42424:42424 --network="host" --name accumulo-proxy accumulo-proxy:latest sh;
```

Member

I'm not sure what the impact of this is, but I do know a lot of Accumulo scripts use bash specifically.
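
To gauge the concern about bash-specific scripts raised in the comment above, one option is to list the scripts whose shebang line mentions bash. The following is a minimal sketch, not part of the PR; the helper name `find_bash_scripts` is hypothetical, and it only inspects the first line of each file.

```shell
#!/bin/sh
# Print every regular file under a directory whose first line is a shebang
# that mentions bash (for example "#!/bin/bash" or "#!/usr/bin/env bash").
# find_bash_scripts is a hypothetical helper name used for illustration.
find_bash_scripts() {
  dir="$1"
  find "$dir" -type f | sort | while IFS= read -r file; do
    # tr drops NUL bytes so binary files do not trigger shell warnings.
    first_line=$(head -n 1 "$file" 2>/dev/null | tr -d '\000')
    case "$first_line" in
      '#!'*bash*) printf '%s\n' "$file" ;;
    esac
  done
}
```

Inside the container this could be run as `find_bash_scripts /opt/accumulo/bin`, assuming the scripts live under the `ACCUMULO_HOME` path set in the Dockerfile. Any file it prints needs bash to be installed in the image.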

Once inside the container, execute the curl command to attempt to connect to the monitor webserver:
@@ -119,10 +119,10 @@ docker rm accumulo-proxy;
## Troubleshooting
It can often be difficult to know where to start with troubleshooting inside containers, if you need to enter the container without starting the proxy we support this:
```commandline
docker run -it --rm -p 42424:42424 --network="host" --name accumulo-proxy accumulo-proxy:latest bash
docker run -it --rm -p 42424:42424 --network="host" --name accumulo-proxy accumulo-proxy:latest sh
```

The container is very slim so if need be you can add additional tools using `apt`.
The container is very slim so if need be you can add additional tools using `apk`.

If you wish to manually execute the accumulo-proxy in the container you can:
```commandline
@@ -131,4 +131,4 @@ If you wish to manually execute the accumulo-proxy in the container you can:

Some resources for additional help:
* [Main Accumulo Website](https://accumulo.apache.org/)
* [Contact Us page](https://accumulo.apache.org/contact-us/)
* [Contact Us page](https://accumulo.apache.org/contact-us/)
60 changes: 35 additions & 25 deletions Dockerfile
@@ -13,20 +13,18 @@
# See the License for the specific language governing permissions and
# limitations under the License.

FROM openjdk:8
FROM openjdk:8-alpine3.9

Contributor

How did you select alpine? I was looking at the openjdk tags on dockerhub to see what else was small and saw openjdk:8-jre-slim. It seems to be based on Debian, is a similar size, and was updated more recently.

Contributor Author

To be fair, I don't think I spotted it. It's about 3MB smaller, so it might be worth it, and as you say it seems to be kept up to date.

Note: unless specified otherwise, all values here are from the Docker command.

I did a quick rebuild with the new base image, and these are the resulting sizes:

  • openjdk:8-alpine3.9 1058822286 bytes
  • openjdk:8-jre-slim 1136341123 bytes (+77518837 bytes, about 74 MiB)

So the final image was actually larger when built on the base image that is listed as smaller, which is not what I expected. I had to add wget, and I removed the installation of bash from the current Dockerfile because it is already present in 8-jre-slim.

On disk, the pulled images differed from the sizes shown on Docker Hub:

  • openjdk:8-alpine3.9 - 105MB (disk) 70.09MB (dockerhub)
  • openjdk:8-jre-slim - 184MB (disk) (dockerhub)

Looking into it, it seems that Docker Hub reports the compressed size, not the size on disk once you pull the image.

I did the comparison for the final image sizes:

  • openjdk:8-alpine3.9 base - 1.06GB
  • openjdk:8-jre-slim base - 1.14GB

I tested both builds and they function the same. Perhaps taking the roughly 80MB increase is worth it for a more up-to-date base?

What are your thoughts: stick with what we have, or update to 8-jre-slim?
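
The byte counts quoted in this comment can be compared with plain shell arithmetic. This is a small sketch for checking the figures, not something from the PR; the function names are hypothetical, and the divisions truncate to whole units.

```shell
#!/bin/sh
# Difference between two image sizes in bytes (second minus first).
size_diff_bytes() {
  echo $(( $2 - $1 ))
}

# Convert bytes to whole megabytes (10^6) and whole mebibytes (2^20).
# Integer division truncates, so the results are rounded down.
bytes_to_mb()  { echo $(( $1 / 1000000 )); }
bytes_to_mib() { echo $(( $1 / 1048576 )); }

alpine_bytes=1058822286   # final image built on openjdk:8-alpine3.9
slim_bytes=1136341123     # final image built on openjdk:8-jre-slim

diff=$(size_diff_bytes "$alpine_bytes" "$slim_bytes")
echo "difference: ${diff} bytes, $(bytes_to_mb "$diff") MB, $(bytes_to_mib "$diff") MiB"
```

This prints `difference: 77518837 bytes, 77 MB, 73 MiB`. The comment does not say which Docker command produced the byte counts; one that reports an image's size in bytes is `docker image inspect --format '{{.Size}}' <image>`, so that is a reasonable guess rather than a confirmed detail.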

Contributor

I am in favor of 8-jre-slim because it's more up to date and it's based on Debian, which I think a larger base of contributors and users may be familiar with.

Contributor

Very nice analysis and write-up.


EXPOSE 42424

WORKDIR /opt/accumulo-proxy

ARG HADOOP_VERSION=3.2.1
ARG ZOOKEEPER_VERSION=3.5.7
ARG ACCUMULO_VERSION=2.0.0
ARG ACCUMULO_PROXY_VERSION=2.0.0-SNAPSHOT

ARG HADOOP_HASH=a57962a24d178193349917730bf95cdc99bde9df
ARG ZOOKEEPER_HASH=619928c8553b62775119e3d7d143a4714a160365
ARG ACCUMULO_HASH=b72bf5c3dcaa25387933a032925046234f30e17a
ARG HADOOP_SHA512_HASH=d62709c3d7144fcaafc60e18d0fa03d7d477cc813e45526f3646030cd87dbf010aeccf3f4ce795b57b08d2884b3a55f91fe9d74ac144992d2dfe444a4bbf34ee
ARG ZOOKEEPER_SHA512_HASH=b9baa1ecb3d4dc0ef648ce7c74da4c5267ee89534c7614b8f27d3b0bc52004dcfbb8cecec810ffb7c8c45053daf8a5e849ce60ba241280fa1e2ab1d3b4990494
ARG ACCUMULO_SHA512_HASH=1e2b822e0fd6ba5293b09203eb0c5cc230e9f111361634b4d5665b0ddd2b28f42d76699cb08aaeff9b3242efd5fe369bfc871a7dc361e935980889bcb7b4568f

# Download from Apache mirrors instead of archive #9
ENV APACHE_DIST_URLS \
@@ -36,43 +34,55 @@ ENV APACHE_DIST_URLS \
https://www.apache.org/dist/ \
https://archive.apache.org/dist/

ENV HADOOP_HOME /opt/hadoop
ENV ZOOKEEPER_HOME /opt/apache-zookeeper
ENV ACCUMULO_HOME /opt/accumulo

RUN set -eux; \
download_bin() { \
local f="$1"; shift; \
local hash="$1"; shift; \
local distFile="$1"; shift; \
download_verify_and_extract() { \
local expectedHash="$1"; \
local distFile="$2"; \
local extractPath="$3"; \
local symlinkPath="$4"; \
local success=; \
local distUrl=; \
for distUrl in ${APACHE_DIST_URLS}; do \
if wget -nv -O "/tmp/${f}" "${distUrl}${distFile}"; then \
success=1; \
if wget -nv -O "/tmp/download.tar.gz" "${distUrl}${distFile}"; then \
# Checksum the download
echo "${hash}" "/tmp/${f}" | sha1sum -c -; \
echo "${expectedHash} /tmp/download.tar.gz" | sha512sum -c -; \
# Extract the download
mkdir "${extractPath}"; \
tar xzf "/tmp/download.tar.gz" -C "${extractPath}" --strip 1;\
# Symlink
ln -s "${extractPath}" "${symlinkPath}"; \
# Tidy up the download
rm -f "/tmp/download.tar.gz"; \
# Set success now we've done all our checks and tidied up
success=1; \
break; \
fi; \
done; \
[ -n "${success}" ]; \
};\
\
download_bin "apache-zookeeper.tar.gz" "${ZOOKEEPER_HASH}" "zookeeper/zookeeper-${ZOOKEEPER_VERSION}/apache-zookeeper-${ZOOKEEPER_VERSION}-bin.tar.gz"; \
download_bin "hadoop.tar.gz" "$HADOOP_HASH" "hadoop/core/hadoop-${HADOOP_VERSION}/hadoop-$HADOOP_VERSION.tar.gz"; \
download_bin "accumulo.tar.gz" "${ACCUMULO_HASH}" "accumulo/${ACCUMULO_VERSION}/accumulo-${ACCUMULO_VERSION}-bin.tar.gz";
download_verify_and_extract "${ZOOKEEPER_SHA512_HASH}" "zookeeper/zookeeper-${ZOOKEEPER_VERSION}/apache-zookeeper-${ZOOKEEPER_VERSION}-bin.tar.gz" "/opt/apache-zookeeper-${ZOOKEEPER_VERSION}" "${ZOOKEEPER_HOME}"; \
download_verify_and_extract "${HADOOP_SHA512_HASH}" "hadoop/core/hadoop-${HADOOP_VERSION}/hadoop-$HADOOP_VERSION.tar.gz" "/opt/hadoop-${HADOOP_VERSION}" "${HADOOP_HOME}"; \
download_verify_and_extract "${ACCUMULO_SHA512_HASH}" "accumulo/${ACCUMULO_VERSION}/accumulo-${ACCUMULO_VERSION}-bin.tar.gz" "/opt/accumulo-${ACCUMULO_VERSION}" "${ACCUMULO_HOME}" ;

# Install the dependencies into /opt/
RUN tar xzf /tmp/hadoop.tar.gz -C /opt/ && ln -s /opt/hadoop-${HADOOP_VERSION} /opt/hadoop
RUN tar xzf /tmp/apache-zookeeper.tar.gz -C /opt/ && ln -s /opt/apache-zookeeper-${ZOOKEEPER_VERSION}-bin /opt/apache-zookeeper
RUN tar xzf /tmp/accumulo.tar.gz -C /opt/ && ln -s /opt/accumulo-${ACCUMULO_VERSION} /opt/accumulo && sed -i 's/\${ZOOKEEPER_HOME}\/\*/\${ZOOKEEPER_HOME}\/\*\:\${ZOOKEEPER_HOME}\/lib\/\*/g' /opt/accumulo/conf/accumulo-env.sh
# Fix the ZooKeeper classpath for Accumulo
RUN sed -i 's/\${ZOOKEEPER_HOME}\/\*/\${ZOOKEEPER_HOME}\/\*\:\${ZOOKEEPER_HOME}\/lib\/\*/g' /opt/accumulo/conf/accumulo-env.sh
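
The effect of the `sed` expression above is easier to see on a single line of input. The sketch below uses `|` as the delimiter to avoid the escaped slashes and is intended to be equivalent to the expression in the Dockerfile. The function name `fix_zookeeper_classpath` is hypothetical, and the sample line is illustrative only, not the actual contents of `accumulo-env.sh`.

```shell
#!/bin/sh
# Append ${ZOOKEEPER_HOME}/lib/* after every literal ${ZOOKEEPER_HOME}/* read
# from stdin. The single quotes stop the shell from expanding the variables,
# so sed sees the text "${ZOOKEEPER_HOME}" exactly as written.
fix_zookeeper_classpath() {
  sed 's|\${ZOOKEEPER_HOME}/\*|${ZOOKEEPER_HOME}/*:${ZOOKEEPER_HOME}/lib/*|g'
}

# Illustrative sample line (an assumption, not copied from accumulo-env.sh).
sample='CLASSPATH="${CLASSPATH}:${ZOOKEEPER_HOME}/*:${HADOOP_HOME}/share/hadoop/client/*"'
printf '%s\n' "$sample" | fix_zookeeper_classpath
```

For the sample line this prints `CLASSPATH="${CLASSPATH}:${ZOOKEEPER_HOME}/*:${ZOOKEEPER_HOME}/lib/*:${HADOOP_HOME}/share/hadoop/client/*"`, so jars under the ZooKeeper `lib` directory are added to the classpath as well.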

ENV HADOOP_HOME /opt/hadoop
ENV ZOOKEEPER_HOME /opt/apache-zookeeper
ENV ACCUMULO_HOME /opt/accumulo
# Add bash as a dependency for accumulo-proxy and accumulo shell scripts
RUN apk --no-cache add bash

# Add the proxy binary
COPY target/accumulo-proxy-${ACCUMULO_PROXY_VERSION}-bin.tar.gz /tmp/
RUN tar xzf /tmp/accumulo-proxy-${ACCUMULO_PROXY_VERSION}-bin.tar.gz -C /opt/accumulo-proxy --strip 1
ENV ACCUMULO_PROXY_HOME /opt/accumulo-proxy
ADD target/accumulo-proxy-${ACCUMULO_PROXY_VERSION}-bin.tar.gz /opt/
RUN ln -s "/opt/accumulo-proxy-${ACCUMULO_PROXY_VERSION}/" "${ACCUMULO_PROXY_HOME}"

# Ensure Accumulo is on the path.
ENV PATH "${PATH}:${ACCUMULO_HOME}/bin"

WORKDIR ${ACCUMULO_PROXY_HOME}

CMD ["/opt/accumulo-proxy/bin/accumulo-proxy", "-p", "/opt/accumulo-proxy/conf/proxy.properties"]
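
The checksum step inside `download_verify_and_extract` can be tried outside a Docker build. The following is a minimal sketch of the same `sha512sum -c` technique, assuming a GNU coreutils or BusyBox `sha512sum` is available; the helper name `verify_sha512` and the demo file are hypothetical and not part of the PR.

```shell
#!/bin/sh
# Return success only if FILE's SHA-512 digest equals EXPECTED.
# verify_sha512 is a hypothetical helper name used for illustration.
verify_sha512() {
  expected="$1"
  file="$2"
  # "HASH  FILE" (two spaces) is the line format that sha512sum itself prints.
  printf '%s  %s\n' "$expected" "$file" | sha512sum -c - >/dev/null 2>&1
}

# Demo on a throwaway file: compute its real digest, then check both a
# matching digest and a deliberately wrong one (128 zeros).
demo_file=$(mktemp)
printf 'example payload\n' > "$demo_file"
good_hash=$(sha512sum "$demo_file" | cut -d ' ' -f 1)
bad_hash=$(printf '%0128d' 0)

if verify_sha512 "$good_hash" "$demo_file"; then echo "checksum ok"; fi
if ! verify_sha512 "$bad_hash" "$demo_file"; then echo "checksum mismatch detected"; fi
rm -f "$demo_file"
```

In the Dockerfile the expected digests come from the `*_SHA512_HASH` build arguments, so overriding a `*_VERSION` argument presumably requires overriding the matching hash argument too.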