To run the pipeline, you will need:
- a Python 3.3+ installation
- Pip (for Python 3.3+)
- seesaw (automatically installed by Pip)
- rsync
- wpull (automatically installed by Pip)
- PhantomJS 1.9.8
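The list above can be checked with a short script before going further. This is only a sketch: the command names (python3, pip3, rsync, phantomjs) are assumptions about how the tools are installed on your system, and have_cmd and version_at_least are hypothetical helpers, not part of ArchiveBot. seesaw and wpull are left out because Pip installs them later.

```shell
#!/bin/sh
# Sketch: report which pipeline prerequisites are visible on PATH.
# The command names below are assumptions; adjust them for your distro.

# have_cmd NAME: succeed if NAME resolves to a command.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# version_at_least HAVE WANT: compare the MAJOR.MINOR parts of two
# dotted version strings; succeed if HAVE >= WANT.
version_at_least() {
    have_major=${1%%.*}
    have_rest=${1#*.}
    have_minor=${have_rest%%.*}
    want_major=${2%%.*}
    want_rest=${2#*.}
    want_minor=${want_rest%%.*}
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] &&
          [ "$have_minor" -ge "$want_minor" ]; }
}

for tool in python3 pip3 rsync phantomjs; do
    if have_cmd "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done

# Only query the interpreter version if python3 exists at all.
if have_cmd python3; then
    py_version=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
    if version_at_least "$py_version" 3.3; then
        echo "python3 $py_version is new enough"
    else
        echo "python3 $py_version is older than 3.3"
    fi
fi
```

The script only reports what it finds; it does not install anything and does not check the PhantomJS version.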
Quick install, for Debian and Debian-esque systems:
    sudo apt-get update
    sudo apt-get install build-essential python3-dev python3-pip \
      libxml2-dev libxslt-dev zlib1g-dev libssl-dev libsqlite3-dev \
      libffi-dev git tmux fontconfig-config fonts-dejavu-core \
      libfontconfig1 libjpeg-turbo8 libjpeg8
(If your distro does not have a recent Python, try the deadsnakes PPA
for Ubuntu, or compile one easily with yyuu's pyenv. You may also want
to use virtualenv to micromanage your Python dependencies.)
Set up a dedicated account:
    adduser archivebot
As user archivebot:
    ssh-keygen
    [keep hitting Enter]
    cat ~/.ssh/id_rsa.pub
    [send the public key to yipdw]
    cd ~/
    git clone https://github.com/ArchiveTeam/ArchiveBot
    cd ArchiveBot
    git submodule update --init
    pip3 install --user -r pipeline/requirements.txt
As user archivebot, in first tmux session:
    ssh -C -L 127.0.0.1:16379:127.0.0.1:6379 \
As user archivebot, in second tmux session:
    cd ~/ArchiveBot/pipeline
    mkdir -p ~/warcs4fos
    export RSYNC_URL=rsync://fos.textfiles.com/archivebot/
    export REDIS_URL=redis://127.0.0.1:16379/0
    export FINISHED_WARCS_DIR=$HOME/warcs4fos
    ~/.local/bin/run-pipeline3 pipeline.py --disable-web-server \
      --concurrent 2 NAME 2>&1 | \
      tee "pipeline-$(date -u +"%Y-%m-%dT%H_%M_%SZ").log"
Adjust --concurrent as needed.
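Before launching, it can help to confirm that the three environment variables are set and that REDIS_URL points at the tunnelled port. The following is a minimal sketch, not part of ArchiveBot: require_env, redis_port and log_name are hypothetical helper names, and the values are copied from the commands above.

```shell
#!/bin/sh
# Sketch: sanity-check the pipeline environment before launching.
# require_env, redis_port and log_name are hypothetical helpers.

# require_env NAME...: fail if any named variable is unset or empty.
require_env() {
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "error: $name is not set" >&2
            return 1
        fi
    done
}

# redis_port URL: print the port from a redis://host:port/db URL.
redis_port() {
    hostport=${1#redis://}      # strip the scheme
    hostport=${hostport%%/*}    # strip the /db suffix
    echo "${hostport##*:}"      # keep what follows the last colon
}

# log_name: print a UTC-timestamped log file name in the same format
# as the tee command in the instructions.
log_name() {
    echo "pipeline-$(date -u +"%Y-%m-%dT%H_%M_%SZ").log"
}

# Values copied from the instructions.
export RSYNC_URL=rsync://fos.textfiles.com/archivebot/
export REDIS_URL=redis://127.0.0.1:16379/0
export FINISHED_WARCS_DIR=$HOME/warcs4fos

if require_env RSYNC_URL REDIS_URL FINISHED_WARCS_DIR; then
    echo "environment looks complete"
    echo "Redis tunnel port: $(redis_port "$REDIS_URL")"
    echo "log file would be: $(log_name)"
fi
```

The port printed should match the local end of the ssh tunnel (16379 in these instructions). The sketch does not contact Redis or start the pipeline.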
If you want your pipeline to only handle !ao/!archiveonly jobs, run it
with the AO_ONLY environment variable set:
    AO_ONLY=1 ~/.local/bin/run-pipeline3 pipeline.py \
      --disable-web-server --concurrent 2 NAME
or
    export AO_ONLY=1
    ~/.local/bin/run-pipeline3 pipeline.py --disable-web-server \
      --concurrent 2 NAME
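The two forms differ in scope: the prefix form sets AO_ONLY for that one command, while export keeps it set for everything started later from the same shell. The sketch below shows the difference using a child sh in place of the pipeline; show_ao_only is a hypothetical helper for illustration only.

```shell
#!/bin/sh
# Sketch: how the two AO_ONLY forms differ in scope.
# show_ao_only stands in for the pipeline process (hypothetical helper).

show_ao_only() {
    sh -c 'echo "child sees AO_ONLY=${AO_ONLY:-<unset>}"'
}

# Start from a clean state so the demonstration is predictable.
unset AO_ONLY

# Form 1: the assignment prefix applies to that one command only.
AO_ONLY=1 sh -c 'echo "child sees AO_ONLY=${AO_ONLY:-<unset>}"'
show_ao_only    # a later command does not inherit it

# Form 2: export keeps it set for every later command in this shell.
export AO_ONLY=1
show_ao_only
```

With the export form, remember to run `unset AO_ONLY` (or open a new shell) before starting a pipeline that should accept all jobs.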
As user archivebot, in third tmux session:
    export RSYNC_URL=rsync://fos.textfiles.com/archivebot/
    ~/ArchiveBot/uploader/uploader.py $HOME/warcs4fos
If you start multiple pipelines, you can safely point them to the
same FINISHED_WARCS_DIR and run just one uploader.
To gracefully stop the pipeline:
    touch ~/ArchiveBot/pipeline/STOP
To gracefully stop the uploader, hit ctrl-c in its tmux session.
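If you stop pipelines often, the touch step can be wrapped in a small function that refuses to create a STOP file in a directory that does not exist. This is a sketch under that assumption: request_stop is a hypothetical helper, and the demonstration runs against a throwaway directory rather than a real pipeline.

```shell
#!/bin/sh
# Sketch: wrap the STOP-file step so a mistyped path is caught.
# request_stop is a hypothetical helper, not part of ArchiveBot.

# request_stop DIR: create the STOP file in pipeline directory DIR.
request_stop() {
    if [ ! -d "$1" ]; then
        echo "error: no such directory: $1" >&2
        return 1
    fi
    touch "$1/STOP" && echo "stop requested via $1/STOP"
}

# Demonstrate against a temporary directory, then clean up.
demo_dir=$(mktemp -d)
request_stop "$demo_dir"
ls "$demo_dir"
rm -rf "$demo_dir"
```

For a real pipeline the call would be `request_stop ~/ArchiveBot/pipeline`, which does the same thing as the touch command in the instructions.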
To upgrade, run from ~/ArchiveBot:
    pip3 install --user --upgrade -r pipeline/requirements.txt
vim:ts=2:sw=2:tw=72:et