diff --git a/README.md b/README.md
index 58fe7ac..3d69500 100644
--- a/README.md
+++ b/README.md
@@ -15,6 +15,7 @@ Putting Wikipedia Snapshots on IPFS and working towards making it fully read-wri
 - https://my.wikipedia-on-ipfs.org
 - https://ar.wikipedia-on-ipfs.org
 - https://zh.wikipedia-on-ipfs.org
+- https://uk.wikipedia-on-ipfs.org
 - https://ru.wikipedia-on-ipfs.org
 - https://fa.wikipedia-on-ipfs.org
 
@@ -115,24 +116,31 @@ It is advised to use separate IPFS node for this:
 
 ```console
 $ export IPFS_PATH=/path/to/IPFS_PATH_WIKIPEDIA_MIRROR
-$ ipfs init -p server,local-discovery,badgerds,randomports --empty-repo
+$ ipfs init -p server,local-discovery,flatfs,randomports --empty-repo
 ```
 
-#### Tune datastore for speed
+#### Tune DHT for speed
 
-Make sure repo is initialized with datastore backed by `badgerds` for improved performance, or if you choose to use slower `flatfs` at least use it with `sync` set to `false`.
+Wikipedia has a lot of blocks. To publish them as fast as possible,
+enable [Accelerated DHT Client](https://github.com/ipfs/go-ipfs/blob/master/docs/experimental-features.md#accelerated-dht-client):
 
-**NOTE:** While badgerv1 datastore _is_ faster, one may choose to avoid using it with bigger builds like English because of [memory issues due to the number of files](https://github.com/ipfs/distributed-wikipedia-mirror/issues/85). Potential workaround is to use [`filestore`](https://github.com/ipfs/go-ipfs/blob/master/docs/experimental-features.md#ipfs-filestore) that avoids duplicating data and reuses unpacked files as-is.
+```console
+$ ipfs config --json Experimental.AcceleratedDHTClient true
+```
 
-#### Enable HAMT sharding
+#### Tune datastore for speed
 
-Configure your IPFS node to enable directory sharding
+Make sure repo uses `flatfs` with `sync` set to `false`:
 
-```sh
-$ ipfs config --json 'Experimental.ShardingEnabled' true
+```console
+$ ipfs config --json 'Datastore.Spec.mounts' "$(ipfs config 'Datastore.Spec.mounts' | jq -c '.[0].child.sync=false')"
 ```
 
-This step won't be necessary when automatic sharding lands in go-ipfs (wip).
+**NOTE:** While badgerv1 datastore is faster in some configurations, we choose to avoid using it with bigger builds like English because of [memory issues due to the number of files](https://github.com/ipfs/distributed-wikipedia-mirror/issues/85). Potential workaround is to use [`filestore`](https://github.com/ipfs/go-ipfs/blob/master/docs/experimental-features.md#ipfs-filestore) that avoids duplicating data and reuses unpacked files as-is.
+
+#### HAMT sharding
+
+Make sure you use go-ipfs 0.12 or later, which has automatic sharding of big directories.
 
 ### Step 3: Download the latest snapshot from kiwix.org
 
diff --git a/mirrorzim.sh b/mirrorzim.sh
index a80c6da..f065e70 100755
--- a/mirrorzim.sh
+++ b/mirrorzim.sh
@@ -84,7 +84,7 @@ fi
 
 printf "\nEnsure zimdump is present...\n"
 PATH=$PATH:$(realpath ./bin)
-which zimdump &> /dev/null || (curl --progress-bar -L https://download.openzim.org/release/zim-tools/zim-tools_linux-x86_64-3.0.0.tar.gz | tar -xvz --strip-components=1 -C ./bin zim-tools_linux-x86_64-3.0.0/zimdump && chmod +x ./bin/zimdump)
+which zimdump &> /dev/null || (curl --progress-bar -L https://download.openzim.org/release/zim-tools/zim-tools_linux-x86_64-3.1.0.tar.gz | tar -xvz --strip-components=1 -C ./bin zim-tools_linux-x86_64-3.1.0/zimdump && chmod +x ./bin/zimdump)
 
 printf "\nDownload and verify the zim file...\n"
 ZIM_FILE_SOURCE_URL="$(./tools/getzim.sh download $WIKI_TYPE $WIKI_TYPE $LANGUAGE_CODE all maxi latest | grep 'URL:' | cut -d' ' -f3)"
diff --git a/snapshot-hashes.yml b/snapshot-hashes.yml
index 73f309b..a6b4705 100644
--- a/snapshot-hashes.yml
+++ b/snapshot-hashes.yml
@@ -37,13 +37,20 @@ zh:
   date: 2021-03-16
   ipns:
   ipfs: https://dweb.link/ipfs/bafybeiazgazbrj6qprr4y5hx277u4g2r5nzgo3jnxkhqx56doxdqrzms6y
+uk:
+  name: Ukrainian
+  original: uk.wikipedia.org
+  source: wikipedia_uk_all_maxi_2022-03.zim
+  date: 2022-03-09
+  ipns:
+  ipfs: https://dweb.link/ipfs/bafybeibiqlrnmws6psog7rl5ofeci3ontraitllw6wyyswnhxbwdkmw4ka
 ru:
   name: Russian
   original: ru.wikipedia.org
-  source: wikipedia_ru_all_maxi_2021-03.zim
-  date: 2021-03-25
+  source: wikipedia_ru_all_maxi_2022-03.zim
+  date: 2022-03-12
   ipns:
-  ipfs: https://dweb.link/ipfs/bafybeieto6mcuvqlechv4iadoqvnffondeiwxc2bcfcewhvpsd2odvbmvm
+  ipfs: https://dweb.link/ipfs/bafybeiezqkklnjkqywshh4lg65xblaz2scbbdgzip4vkbrc4gn37horokq
 fa:
   name: Persian
   original: fa.wikipedia.org
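The zimdump bootstrap line in mirrorzim.sh names the zim-tools release twice, once in the download URL and once in the archive member path, so a version bump has to touch both places. A minimal sketch of deriving both strings from a single value, assuming a hypothetical `ZIM_TOOLS_VERSION` variable and two hypothetical helper functions, none of which exist in the script:

```shell
#!/usr/bin/env bash
# Hypothetical variable (not in mirrorzim.sh): the zim-tools release to fetch.
ZIM_TOOLS_VERSION="${ZIM_TOOLS_VERSION:-3.1.0}"

# Print the release tarball URL for the given zim-tools version.
zim_tools_url() {
  printf 'https://download.openzim.org/release/zim-tools/zim-tools_linux-x86_64-%s.tar.gz\n' "$1"
}

# Print the path of the zimdump binary inside that tarball.
zim_tools_member() {
  printf 'zim-tools_linux-x86_64-%s/zimdump\n' "$1"
}

# Show both derived strings for the configured version.
zim_tools_url "$ZIM_TOOLS_VERSION"
zim_tools_member "$ZIM_TOOLS_VERSION"
```

Under these assumptions, a release bump would change only the default value of `ZIM_TOOLS_VERSION`, and the existing `curl | tar` pipeline could consume the two printed strings unchanged.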