Lightweight copies of databases with personal data obfuscated.
Two developers one dump 💩
To keep customer data safe and reduce the liability on developers, live database access is only granted to people who really need it for operational support. You really don't want to be in that group: you'll get called at all hours and asked to help solve horrible problems for angry customers and bosses.
Developer: 🤔 I need a copy of the production database to do my work.
Developer: 💬 Hey privileged user, can I have a copy of the acme-cms-production database?
Privileged user: 💬 Sure, I'll send you a link.
Privileged user: 1. ssh to the database server. 2. Run a oneliner. 3. Copy and paste the oneliner for the Developer to run.
Privileged user: 💬 Here you go! {ONELINER GOES HERE}
Developer: 💬 Thanks.
Developer: {ONELINER GOES HERE} Running... ⏳
Developer: 💬 Got it, thanks! 🎉
Privileged user: 💬 You are most welcome. 😀 Have a great day!
Note: This is a clock.co.uk-specific workflow!
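The oneliner itself is generated and printed by the dump script, so nobody has to write it by hand. Purely for illustration, it will be built from the pieces described below: a restore step, a target database name, and a download URL on xfer.clock.co.uk. Everything in this sketch (the local database name, the file name, and the use of restore.sh directly) is a hypothetical example, not the exact command the tool emits:

./restore.sh acme-cms-local https://xfer.clock.co.uk/acme-cms-production-example.zst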
This package requires the zstd and unzstd binaries. You can get them from:
Debian-based Linux (Ubuntu, etc):
apt install zstd
OSX:
brew install zstd
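A quick way to check that both binaries ended up on your PATH:

zstd --version
which unzstd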
NB: With this tool it is possible to exhaust available disk space on the MongoDB server. For now, it is essential to manually check the size of the database you are going to dump, and the available disk space. Furthermore, if possible avoid running the tool close to the beginning of a new hour; Clock's servers are snapshotted every hour on the hour, and it is best to avoid storing the temporary dump files in those backups.
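For example, one way to do that check is to compare the database's storage size (from the mongo shell) with the free disk space on the server. The database name here is hypothetical, and depending on the server you may have the older mongo shell instead of mongosh:

# Storage size of the database you intend to dump, scaled to MB
mongosh acme-cms-production --quiet --eval 'db.stats(1024 * 1024).storageSize'
# Free disk space on the server
df -h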
Privileged user:

- First, ssh into the database server.
- If this is your first dump 💩:

git clone https://github.com/clocklimited/mongo-wrangler.git
cd mongo-wrangler

Then either, if you have node installed:

./dump.js {database name}

If you want to exclude additional collections, use -e:

./dump.js -e member,subscriber,duck,log {database name}

Or, using the binary (no node required):

./dump.sh database-name

- Check the output for instructions and copy and paste the correct oneliner (a full example session is sketched after this list).
- Send it to the requester.

Developer (requester):

- Paste the oneliner sent to you.
- 🎉
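Putting the privileged-user steps together, a minimal example session might look like this. The hostname and database name are hypothetical, and the exact oneliner to send comes from the script's own output:

ssh db1.example.com                   # hypothetical database server
git clone https://github.com/clocklimited/mongo-wrangler.git   # first dump only
cd mongo-wrangler
./dump.sh acme-cms-production         # or: ./dump.js acme-cms-production (needs node)
# Copy the oneliner printed in the output and send it to the requester.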
These collections are excluded by default. If you need them, please ask the privileged user to include them by providing -i collectionName to the dump script, e.g. ./dump.js -i sessions {database name}.
If you find other big collections that are slowing down your dumps or taking up a lot of space, please send a PR or edit dump.js on a per-project basis.
Properties whose names contain these strings are obfuscated by default. This can result in some data being obfuscated that you didn't want obfuscated. If this causes a problem, you'll have to ask the privileged user to do a custom dump, or submit a PR.
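After restoring a dump, it can be worth spot-checking that personal fields really were obfuscated. A minimal sketch, assuming a hypothetical user collection and a hypothetical local database named acme-cms-local:

mongosh acme-cms-local --quiet --eval 'db.user.findOne()'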
The latest version of mongo-wrangler provides two executables for Node-free dumping and restoration. Linux and macOS binaries are available in dist/. You simply need to clone the repo and use them as you would through Node; some options are supported via environment variables. Alternatively, you can use the automatic cross-platform scripts dump.sh and restore.sh.
If you are developing changes to mongo-wrangler, you should be able to do nave use stable to get a newer Node runtime:

wget http://github.com/isaacs/nave/raw/main/nave.sh
bash ./nave.sh use stable
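You can confirm which Node version is now active with:

node --version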
Add -v or VERBOSE=1 on either command to see the commands being run and get verbose output.
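For example, either of the following would run a verbose dump (the database name is just an example; this assumes the -v flag applies to the Node scripts and the VERBOSE variable to the shell script and binaries):

./dump.js -v acme-cms-production
VERBOSE=1 ./dump.sh acme-cms-production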
Use -n or NO_INDEX=1 to ignore indexes on restore:

./restore.js -n {DB} {URL}
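The environment-variable form, assuming (as with VERBOSE) that it applies to the shell script and binaries, would look like:

NO_INDEX=1 ./restore.sh {DB} {URL}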
This solution relies on xfer.clock.co.uk