docs: ways one can access mirror (#124)
* docs: ways one can access mirror
* docs: reword the existing mirrors section (#125)

Co-authored-by: Marcin Rataj <[email protected]>
Co-authored-by: Piotr Galar <[email protected]>
lidel and galargh authored Mar 25, 2022
1 parent 0b87f67 commit d099c96
Showing 1 changed file (`README.md`) with 19 additions and 7 deletions.
@@ -10,6 +10,12 @@ Putting Wikipedia Snapshots on IPFS and working towards making it fully read-wri

## Existing Mirrors

There are various ways one can access the mirrors: through a [DNSLink](https://docs.ipfs.io/concepts/glossary/#dnslink), a public [gateway](https://docs.ipfs.io/concepts/glossary/#gateway), or directly with a [CID](https://docs.ipfs.io/concepts/glossary/#cid).

You can [read all about the available methods here](https://blog.ipfs.io/2021-05-31-distributed-wikipedia-mirror-update/#improved-access-to-wikipedia-mirrors).
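
To make the three access methods concrete, here is a minimal sketch that prints the address you would use for each one. The `mirror_urls` helper is hypothetical, it assumes every mirror follows the `<lang>.wikipedia-on-ipfs.org` naming used in the list below, and `ipfs.io` merely stands in for any public gateway.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the three kinds of addresses for one mirror.
#   $1 = language code, e.g. "tr"
#   $2 = snapshot CID (assumed to be one of the published snapshot CIDs)
mirror_urls() {
  local lang="$1" cid="$2"
  local domain="${lang}.wikipedia-on-ipfs.org"   # assumed naming pattern
  # 1. DNSLink: the domain itself, served over plain HTTPS
  echo "https://${domain}/"
  # 2. A public gateway resolving the same DNSLink name (ipfs.io as an example)
  echo "https://ipfs.io/ipns/${domain}/"
  # 3. Direct CID: an immutable snapshot that never changes
  echo "https://ipfs.io/ipfs/${cid}/"
}
```

For example, `mirror_urls tr <CID>` prints the `tr` addresses. With a local IPFS node running, `ipfs resolve -r /ipns/tr.wikipedia-on-ipfs.org` shows which `/ipfs/<CID>` path the DNSLink currently points at.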

### DNSLinks

- https://en.wikipedia-on-ipfs.org
- https://tr.wikipedia-on-ipfs.org
- https://my.wikipedia-on-ipfs.org
@@ -19,7 +25,13 @@ Putting Wikipedia Snapshots on IPFS and working towards making it fully read-wri
- https://ru.wikipedia-on-ipfs.org
- https://fa.wikipedia-on-ipfs.org
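
Per the DNSLink convention, each domain above maps to a snapshot through a DNS TXT record of the form `dnslink=/ipfs/<CID>`, typically published at `_dnslink.<domain>`. The sketch below, built around a hypothetical `dnslink_path` helper, pulls the IPFS path out of such a record. It assumes a single TXT value as printed by `dig +short`, surrounding quotes included.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the IPFS path from a DNSLink TXT value.
# Expected input:  "dnslink=/ipfs/<cid>"  (dig +short keeps the quotes)
dnslink_path() {
  local record="$1"
  record="${record//\"/}"            # drop the quotes dig prints
  case "$record" in
    dnslink=*) echo "${record#dnslink=}" ;;   # keep only the /ipfs/... part
    *) return 1 ;;                   # not a DNSLink record
  esac
}
```

Assuming `dig` is installed, `dnslink_path "$(dig +short TXT _dnslink.en.wikipedia-on-ipfs.org)"` prints the current path. If the name is served through a CNAME or carries several TXT values, `dig` prints extra lines, and `ipfs resolve` is the more robust option.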

### CIDs

The latest CIDs that the DNSLinks point at can be found in [snapshot-hashes.yml](snapshot-hashes.yml).
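
If you want to help host a snapshot, you can pin one of these CIDs on your own IPFS node. The sketch below guards `ipfs pin add` with a hypothetical `looks_like_cid` check. That check is only a loose pattern match for the common CIDv0 (`Qm...`, base58) and lowercase base32 CIDv1 (`b...`) string forms, not a real CID parser, and its length bound is an arbitrary assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: loose sanity check that the argument looks like a CID.
# Heuristic only, not a full CID decoder.
looks_like_cid() {
  local c="$1"
  # CIDv0: "Qm" followed by 44 base58btc characters (46 in total)
  [[ "$c" =~ ^Qm[1-9A-HJ-NP-Za-km-z]{44}$ ]] && return 0
  # CIDv1 in lowercase base32: multibase prefix "b", then a-z and 2-7
  # (the minimum length of 50 is an arbitrary, loose bound)
  [[ "$c" =~ ^b[a-z2-7]{50,}$ ]]
}

# Hypothetical helper: pin a snapshot so your node keeps a full local copy.
pin_snapshot() {
  local cid="$1"
  if ! looks_like_cid "$cid"; then
    echo "refusing to pin: '$cid' does not look like a CID" >&2
    return 1
  fi
  ipfs pin add --progress "/ipfs/${cid}"
}
```

Keep in mind that pinning fetches the whole snapshot, which for the larger languages is a lot of data.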

---

Each mirror has a link to the original [Kiwix](https://kiwix.org) ZIM archive in the footer. It can be downloaded and opened offline with the [Kiwix Reader](https://www.kiwix.org/en/download/).

## Table of Contents

@@ -144,7 +156,7 @@ Make sure you use go-ipfs 0.12 or later, it has automatic sharding of big direct

### Step 3: Download the latest snapshot from kiwix.org

ZIM files are published at https://download.kiwix.org/zim/wikipedia/.
Make sure you download `_all_maxi_` snapshots, as those include images.
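
Snapshot file names follow the `wikipedia_<lang>_all_maxi_<YYYY-MM>.zim` pattern seen in the `zimdump` example further down, and because the date is zero-padded, the newest file sorts last. The sketch below relies on that; `latest_maxi_zim` is a hypothetical helper and no replacement for the project's `getzim.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read ZIM file names (one per line) on stdin and print
# the newest "_all_maxi_" snapshot for the given language code.
# Assumes names like wikipedia_tr_all_maxi_2021-01.zim, so YYYY-MM sorts lexically.
latest_maxi_zim() {
  local lang="$1"
  grep -E "^wikipedia_${lang}_all_maxi_"'[0-9]{4}-[0-9]{2}\.zim$' | sort | tail -n 1
}
```

To feed it the real listing, assuming the download page is a plain HTML directory index:
`curl -s https://download.kiwix.org/zim/wikipedia/ | grep -oE 'wikipedia_tr_all_maxi_[0-9]{4}-[0-9]{2}\.zim' | sort -u | latest_maxi_zim tr`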

To automate this, you can also use the `getzim.sh` script:
@@ -172,8 +184,8 @@ $ zimdump dump ./snapshots/wikipedia_tr_all_maxi_2021-01.zim --dir ./tmp/wikiped

> ### ℹ️ ZIM's main page
>
> Each ZIM file has a "main page" attribute that defines the landing page of the ZIM archive.
> It is often different from the "main page" of upstream Wikipedia.
> The Kiwix main page needs to be passed in the next step, so until there is an automated way to determine the "main page" of a ZIM, you need to open the ZIM in the Kiwix reader and eyeball the name of the landing page.

### Step 5: Convert the unpacked ZIM directory to a website with mirror info
@@ -250,7 +262,7 @@ Make sure at least two full reliable copies exist before updating DNSLink.

## mirrorzim.sh

It is possible to automate steps 3-6 via a wrapper script named `mirrorzim.sh`.
It will download the latest snapshot of the specified language (if needed), unpack it, and add it to IPFS.

To see how the script behaves, try running it on one of the smallest wikis, such as `cu`:
@@ -261,9 +273,9 @@ $ ./mirrorzim.sh --languagecode=cu --wikitype=wikipedia --hostingdnsdomain=cu.wi
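
To mirror several small wikis in one go, you could generate one `mirrorzim.sh` call per language. This is a hypothetical sketch: it uses only the three flags visible in the `cu` example, assumes each hostname follows the `<lang>.wikipedia-on-ipfs.org` pattern of the DNSLinks above, and prints the commands instead of running them so you can check them first. The script may accept or require other flags.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) one mirrorzim.sh invocation per
# language code. Flags mirror the README's `cu` example; the hostname
# pattern is an assumption.
mirrorzim_cmds() {
  local lang
  for lang in "$@"; do
    printf './mirrorzim.sh --languagecode=%s --wikitype=wikipedia --hostingdnsdomain=%s.wikipedia-on-ipfs.org\n' \
      "$lang" "$lang"
  done
}
```

Once the output looks right, `mirrorzim_cmds cu tr | sh` runs the calls one after another from the repository root.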

## Docker build

A `Dockerfile` with all the software requirements is provided.
For now it is only a handy container for running the process on non-Linux
systems, or if you don't want to pollute your system with all the dependencies.
In the future it will be an end-to-end black box that takes a ZIM and outputs a CID
and repo.
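
A minimal sketch of building the image and opening a shell in it with the repository mounted. The helper names, the image tag, the `/work` mount point, and the assumption that the image ships `bash` are all illustrative choices, not taken from the `Dockerfile`. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, or just print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: build the image from the repo's Dockerfile, then start
# an interactive shell with the current directory mounted at /work.
docker_shell() {
  local tag="${1:-wikipedia-mirror}"   # hypothetical image tag
  run docker build -t "$tag" . || return 1
  # /work is a hypothetical mount point; bash is assumed to exist in the image
  run docker run --rm -it -v "$PWD:/work" -w /work "$tag" bash
}
```

Run `docker_shell` from the repository root, where the `Dockerfile` lives; inside the container the checkout is then available at `/work`.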
