simplify AWS setup
galargh committed Mar 17, 2022
1 parent 4e05e28 commit 72b52fe
Showing 27 changed files with 201 additions and 449 deletions.
72 changes: 0 additions & 72 deletions .github/workflows/build.yml

This file was deleted.

5 changes: 5 additions & 0 deletions .gitignore
@@ -19,3 +19,8 @@ node_modules
/kiwix-tools

bin/zimdump

*.tfstate
*.tfstate.*
*.terraform
*.terraform.*
48 changes: 24 additions & 24 deletions Dockerfile
@@ -1,34 +1,34 @@
# This Dockerfile creates a self-contained image in which mirrorzim.sh can be executed
# This Dockerfile creates a self-contained image in which mirrorzim.sh can be executed.
# It also runs ipfs daemon.
#
# You can build the image as follows (remember to use this repo as context for the build):
# docker build . -f Dockerfile -t distributed-wikipedia-mirror
# docker build . --platform=linux/amd64 -f Dockerfile -t distributed-wikipedia-mirror
#
# You can then run the container anywhere as follows
# docker run --rm -v $(pwd)/snapshots:/github/workspace/snapshots -v $(pwd)/tmp:/github/workspace/tmp distributed-wikipedia-mirror <mirrorzim.sh arguments>
# NOTE(s):
# - volume attached at /github/workspace/snapshots will contain downloaded zim files after the run
# - volume attached at /github/workspace/tmp will contain created website directories after the run
# You can then run the container anywhere as follows:
# docker run --ulimit nofile=65536:65536 -p 4001:4001/tcp -p 4001:4001/udp distributed-wikipedia-mirror <mirrorzim_arguments>

FROM openzim/zim-tools:3.1.0 AS openzim
FROM stedolan/jq:latest AS jq
FROM openzim/zim-tools:3.1.0 AS zimdump
FROM ipfs/go-ipfs:v0.12.0 AS ipfs
FROM node:16

FROM node:16.14.0-buster-slim
RUN apt-get update && apt-get install --no-install-recommends --assume-yes rsync moreutils

RUN apt update && apt upgrade && apt install -y curl wget rsync
COPY --from=jq /usr/local/bin/jq /usr/local/bin/
COPY --from=zimdump /usr/local/bin/zimdump /usr/local/bin/
COPY --from=ipfs /usr/local/bin/ipfs /usr/local/bin/

COPY --from=openzim /usr/local/bin/zimdump /usr/local/bin
COPY assets /root/assets
COPY bin /root/bin
COPY src /root/src
COPY tools /root/tools
COPY mirrorzim.sh package.json tsconfig.json /root/

COPY tools/docker_entrypoint.sh /usr/local/bin
RUN mkdir /root/snapshots /root/tmp
RUN cd /root && yarn

RUN mkdir -p /github/distributed-wikipedia-mirror
RUN mkdir -p /github/distributed-wikipedia-mirror/snapshots
RUN mkdir -p /github/distributed-wikipedia-mirror/tmp
RUN mkdir -p /github/workspace
EXPOSE 4001/tcp
EXPOSE 4001/udp

COPY . /github/distributed-wikipedia-mirror

RUN cd /github/distributed-wikipedia-mirror && yarn

VOLUME [ "/github/workspace" ]

WORKDIR /github/distributed-wikipedia-mirror
ENTRYPOINT [ "docker_entrypoint.sh" ]
WORKDIR /root
ENTRYPOINT [ "tools/entrypoint.sh" ]
24 changes: 12 additions & 12 deletions README.md
@@ -254,21 +254,25 @@ $ ./mirrorzim.sh --languagecode=cu --wikitype=wikipedia --hostingdnsdomain=cu.wi
## Docker build

A `Dockerfile` with all the software requirements is provided.
For now it is only a handy container for running the process on non-Linux
systems or if you don't want to pollute your system with all the dependencies.
In the future it will be end-to-end blackbox that takes ZIM and spits out CID
and repo.
It is an end-to-end blackbox that takes mirrorzim.sh arguments, spits out CID
and runs IPFS daemon.

To build the docker image:
To run the publicly available docker image:

```sh
docker build . -t distributed-wikipedia-mirror-build
docker run --ulimit nofile=65536:65536 -p 4001:4001/tcp -p 4001:4001/udp public.ecr.aws/c4h1q7d1/distributed-wikipedia-mirror:latest <mirrorzim_arguments>
```

To use it as a development environment:
Alternatively, to build the docker image:

```sh
docker run -it -v $(pwd):/root/distributed-wikipedia-mirror --net=host --entrypoint bash distributed-wikipedia-mirror-build
docker build . --platform=linux/amd64 -f Dockerfile -t distributed-wikipedia-mirror
```

And then, to run it:

```sh
docker run --ulimit nofile=65536:65536 -p 4001:4001/tcp -p 4001:4001/udp distributed-wikipedia-mirror <mirrorzim_arguments>
```

# How to Help
@@ -340,7 +344,3 @@ We are working on improving deduplication between snapshots, but for now YMMV.
## Code

If you would like to contribute more to this effort, look at the [issues](https://github.com/ipfs/distributed-wikipedia-mirror/issues) in this github repo. Especially check for [issues marked with the "wishlist" label](https://github.com/ipfs/distributed-wikipedia-mirror/labels/wishlist) and issues marked ["help wanted"](https://github.com/ipfs/distributed-wikipedia-mirror/labels/help%20wanted).

## GitHub Actions Workflow

The GitHub Actions workflow that is available in this repository takes information about the wiki website that you want to mirror, downloads its' zim, unpacks it, converts it to a website and uploads it to S3 as a tar.gz package which is publicly accessible.
50 changes: 0 additions & 50 deletions action.yml

This file was deleted.

29 changes: 11 additions & 18 deletions mirrorzim.sh
@@ -16,7 +16,6 @@ usage() {
echo " [--hostingdnsdomain=<HOSTING_DNS_DOMAIN>]"
echo " [--hostingipnshash=<HOSTING_IPNS_HASH>]"
echo " [--mainpageversion=<MAIN_PAGE_VERSION>]"
echo " [--push=<true|false>]"
echo ""
echo "OPTIONS"
echo ""
@@ -28,7 +27,6 @@ usage() {
echo " -d, --hostingdnsdomain string - the DNS domain name the mirror will be hosted at e.g. tr.wikipedia-on-ipfs.org"
echo " -i, --hostingipnshash string - the IPNS hash the mirror will be hosted at e.g. QmVH1VzGBydSfmNG7rmdDjAeBZ71UVeEahVbNpFQtwZK8W"
echo " -v, --mainpageversion string - an override hack used on Turkish Wikipedia, it sets the main page version as there are issues with the Kiwix version id"
echo " -p, --push boolean - push to local ipfs instance (defaults to true)"
exit 2
}

@@ -68,10 +66,6 @@ case $i in
MAIN_PAGE_VERSION="${i#*=}"
shift
;;
-p=*|--push=*)
PUSH="${i#*=}"
shift
;;
--default)
DEFAULT=YES
shift
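
The option parsing in this hunk relies on the `${i#*=}` parameter expansion to pull the value out of each `--key=value` argument. The following is a minimal sketch of that expansion in isolation; `option_value` is a hypothetical helper used only for illustration and does not appear in mirrorzim.sh.

```sh
# Hypothetical helper (not part of mirrorzim.sh): prints the value part of a
# "--key=value" or "-k=value" argument.
option_value() {
  # "${1#*=}" strips the shortest prefix matching "*=", that is,
  # everything up to and including the first "=".
  printf '%s\n' "${1#*=}"
}

option_value "--languagecode=cu"
option_value "--hostingdnsdomain=tr.wikipedia-on-ipfs.org"
```

Because the shortest matching prefix is removed, a value that itself contains `=` is kept whole.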
@@ -116,10 +110,6 @@ if [ -z ${MAIN_PAGE_VERSION+x} ]; then
MAIN_PAGE_VERSION=""
fi

if [ -z ${PUSH+x} ]; then
PUSH="true"
fi

printf "\nEnsure zimdump is present...\n"
PATH=$PATH:$(realpath ./bin)
which zimdump &> /dev/null || (curl --progress-bar -L https://download.openzim.org/release/zim-tools/zim-tools_linux-x86_64-3.0.0.tar.gz | tar -xvz --strip-components=1 -C ./bin zim-tools_linux-x86_64-3.0.0/zimdump && chmod +x ./bin/zimdump)
@@ -154,11 +144,14 @@ node ./bin/run $TMP_DIRECTORY \
${HOSTING_IPNS_HASH:+--hostingipnshash=$HOSTING_IPNS_HASH} \
${MAIN_PAGE_VERSION:+--mainpageversion=$MAIN_PAGE_VERSION}

if [[ "$PUSH" == "true" ]]; then
./tools/add_website_to_ipfs.sh "$ZIM_FILE" "$TMP_DIRECTORY" "-p"
else
printf "\n\n-------------------------\nD O N E !\n-------------------------\n"
printf "ZIM: $ZIM_FILE\n"
printf "TMP: $TMP_DIRECTORY"
printf "\n-------------------------\n"
fi
printf "\nAdding the processed tmp directory to IPFS\n(this part may take long time on a slow disk):\n"
CID=$(ipfs add -r --cid-version 1 --pin=false --offline -Q -p $TMP_DIRECTORY)
MFS_DIR="/${ZIM_FILE}__$(date +%F_%T)"

# pin by adding to MFS under a meaningful name
ipfs files cp /ipfs/$CID "$MFS_DIR"

printf "\n\n-------------------------\nD O N E !\n-------------------------\n"
printf "MFS: $MFS_DIR\n"
printf "CID: $CID"
printf "\n-------------------------\n"
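
The new tail of mirrorzim.sh adds the processed directory with `--pin=false` and then copies the resulting CID into MFS under a timestamped name, so the MFS entry is what keeps the data referenced. The naming step can be sketched on its own as below. `mfs_dir_name` is a hypothetical helper, and stripping leading directories with `basename` is an assumption made for this sketch; the commit itself interpolates `$ZIM_FILE` directly.

```sh
# Hypothetical helper (not part of mirrorzim.sh): builds an MFS directory name
# of the form "/<zim file name>__<timestamp>".
# $1: path to the ZIM file
# $2: optional timestamp; defaults to the script's format, date +%F_%T
mfs_dir_name() {
  # Assumption: only the file name is wanted, so leading directories are dropped.
  printf '/%s__%s\n' "$(basename "$1")" "${2:-$(date +%F_%T)}"
}

# With a fixed timestamp the result is deterministic:
mfs_dir_name "snapshots/wikipedia_cu_all_maxi_2022-03.zim" "2022-03-17_12:00:00"
# prints /wikipedia_cu_all_maxi_2022-03.zim__2022-03-17_12:00:00
```

The value returned by such a helper would then serve as the destination of `ipfs files cp /ipfs/$CID "$MFS_DIR"`, as in the diff above.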
3 changes: 0 additions & 3 deletions packer/README.md

This file was deleted.

50 changes: 0 additions & 50 deletions packer/provisioner.sh

This file was deleted.
