feat: new job to backup pins to s3 #1844

Open
wants to merge 69 commits into
base: main

@joshghent joshghent commented Sep 6, 2022

⚠️ Requires a new migration on the psa_pin_request table to add the new backup_urls column. ⚠️

About this job

This new cron job runs every 4 hours. Each run grabs 10,000 pin requests, fetches the CAR file for each one, and uploads it to S3.
It behaves in a similar way to nftstorage/backup.
It uses Dagula to fetch the CAR file from the IPFS peer.
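
For readers skimming the description, the flow above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `fetchBatch`, `exportCar` and `uploadCar` are hypothetical helpers injected by the caller, standing in for the database query, the Dagula export and the S3 upload.

```javascript
// Sketch only: fetchBatch, exportCar and uploadCar are hypothetical,
// caller-supplied helpers, not functions from this PR.
async function backupPins ({ fetchBatch, exportCar, uploadCar, batchSize = 10000 }) {
  const results = []
  const pinRequests = await fetchBatch(batchSize)
  for (const pin of pinRequests) {
    try {
      const car = await exportCar(pin.cid) // e.g. exported from an IPFS peer
      const url = await uploadCar(pin.cid, car) // e.g. uploaded to an S3 bucket
      results.push({ cid: pin.cid, ok: true, url })
    } catch (err) {
      // One failing pin should not abort the rest of the batch.
      results.push({ cid: pin.cid, ok: false, error: err.message })
    }
  }
  return results
}
```

Collecting a result per pin, rather than throwing on the first failure, keeps one unreachable DAG from blocking the rest of the batch.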

github-actions bot commented Sep 6, 2022

@mbommerez mbommerez linked an issue Sep 6, 2022 that may be closed by this pull request
Contributor

@flea89 flea89 left a comment

This is coming together 🎉
I did a really quick review pass and left some high-level comments for you.

Also noticed some types still need updating across the board.

packages/cron/src/jobs/pins-backup.js
Comment on lines 19 to 29
this.MAX_DAG_SIZE = 1024 * 1024 * 1024 * 32 // don't try to transfer a DAG that's bigger than 32GB
this.log = debug('backup:pins')
Contributor

What happens to these DAGs?
Is there a different approach we have in mind for those?

Contributor Author

We don't... but it's a pattern that is found in other areas of the system. So might be worth flagging.

Member

@alanshaw alanshaw Sep 12, 2022

My thinking was along the lines of that being the max in our ToS and wanting to have a cap at some point on the size of data we're willing to have uploaded/pinned. If you uploaded more than what we've said is the max allowed then we shouldn't be obliged to store that.

Contributor

But we don't have that limit for pins do we? I can't find the ToS for pins, but I might just be missing it?

Assuming there isn't one, there's a chance content bigger than that is stored there and users should expect it to be migrated?

Contributor

I've commented out the size check part, because, unless we have strong reasons not to, we should move all the files to eipfs.

Contributor

Add logging to keep track of enormous dags

Contributor

I added 2 pieces of logging:

  1. Every time a DAG export fails, for whatever reason, we log the number of bytes read up to that point.
  2. If a successful export is greater than 32TiB, we log it.

Contributor

32 TiB or 32 GiB? I think we want to know about pins >= 32 GiB

Contributor

yeah, I actually added 32 GiB in the code 👍 (see here)
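
The two pieces of logging discussed in this thread could look roughly like the sketch below. It is an illustration under assumptions, not the PR's code: `countBytes` and its `log` option are hypothetical names, and the source is taken to be an async iterable of `Uint8Array` chunks.

```javascript
const MAX_UPLOAD_DAG_SIZE = 32 * 1024 ** 3 // 32 GiB

// Hypothetical wrapper: passes chunks through unchanged while counting bytes.
async function * countBytes (source, { cid, log, maxSize = MAX_UPLOAD_DAG_SIZE }) {
  let bytesReceived = 0
  let completed = false
  try {
    for await (const chunk of source) {
      bytesReceived += chunk.byteLength
      yield chunk
    }
    completed = true
  } catch (err) {
    // 1. On any export failure, record how far we got before rethrowing.
    log(`export of ${cid} failed after ${bytesReceived} bytes: ${err.message}`)
    throw err
  } finally {
    // 2. Flag successful exports that exceed the size threshold.
    if (completed && bytesReceived > maxSize) {
      log(`CID ${cid} DAG is greater than ${maxSize} bytes (${bytesReceived} read)`)
    }
  }
}
```

Because the wrapper is a pass-through generator, it can sit between the export and the upload without buffering the DAG in memory.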

@joshghent
Contributor Author

Ok, so before this is merged, the migration that adds the new backup_urls column to psa_pin_request needs to be run.
The cron job then needs verifying on staging, to make sure that it successfully backs up the pins.
After that, it can be safely merged to production.

@joshghent joshghent force-pushed the feat/794-copy-pins-to-eips branch from 2d4715a to 6017c94 Compare September 11, 2022 20:55
github-actions bot commented Sep 11, 2022

package-lock.json changes

Summary

Status Count
ADDED 98
UPDATED 6
Name Status Previous Current
@achingbrain/ip-address ADDED - 8.1.0
@achingbrain/nat-port-mapper ADDED - 1.0.7
@achingbrain/ssdp ADDED - 4.0.1
@aws-sdk/lib-storage ADDED - 3.194.0
@aws-sdk/middleware-endpoint ADDED - 3.193.0
@aws-sdk/util-middleware ADDED - 3.193.0
@chainsafe/libp2p-noise ADDED - 7.0.3
@libp2p/connection ADDED - 2.0.4
@libp2p/crypto ADDED - 1.0.7
@libp2p/interface-connection-encrypter ADDED - 1.0.3
@libp2p/interface-connection ADDED - 3.0.2
@libp2p/interface-keys ADDED - 1.0.3
@libp2p/interface-peer-id ADDED - 1.0.5
@libp2p/interface-peer-info ADDED - 1.0.3
@libp2p/interface-peer-store ADDED - 1.2.2
@libp2p/interface-record ADDED - 2.0.1
@libp2p/interface-transport ADDED - 1.0.4
@libp2p/interfaces ADDED - 2.0.4
@libp2p/logger ADDED - 2.0.2
@libp2p/mplex ADDED - 1.2.2
@libp2p/multistream-select ADDED - 1.0.6
@libp2p/peer-collections ADDED - 2.2.0
@libp2p/peer-id-factory ADDED - 1.0.19
@libp2p/peer-id ADDED - 1.1.16
@libp2p/peer-record ADDED - 1.0.12
@libp2p/peer-store ADDED - 1.0.17
@libp2p/tcp ADDED - 3.1.2
@libp2p/tracked-map ADDED - 1.0.8
@libp2p/utils ADDED - 3.0.2
@libp2p/websockets ADDED - 3.0.4
@multiformats/mafmt ADDED - 11.0.3
@multiformats/multiaddr-to-uri ADDED - 9.0.2
@multiformats/multiaddr ADDED - 10.5.0
@noble/secp256k1 UPDATED 1.4.0 1.7.0
@stablelib/aead ADDED - 1.0.1
@stablelib/binary ADDED - 1.0.1
@stablelib/bytes ADDED - 1.0.1
@stablelib/chacha ADDED - 1.0.1
@stablelib/chacha20poly1305 ADDED - 1.0.1
@stablelib/constant-time ADDED - 1.0.1
@stablelib/hash ADDED - 1.0.1
@stablelib/hkdf ADDED - 1.0.1
@stablelib/hmac ADDED - 1.0.1
@stablelib/int ADDED - 1.0.1
@stablelib/keyagreement ADDED - 1.0.1
@stablelib/poly1305 ADDED - 1.0.1
@stablelib/random ADDED - 1.0.2
@stablelib/sha256 ADDED - 1.0.1
@stablelib/wipe ADDED - 1.0.1
@stablelib/x25519 ADDED - 1.0.3
@web3-storage/fast-unixfs-exporter ADDED - 0.2.1
abortable-iterator ADDED - 4.0.2
aws-sdk ADDED - 2.1239.0
byte-access ADDED - 1.0.1
clone-regexp ADDED - 3.0.0
conf UPDATED 10.1.1 10.2.0
convert-hrtime ADDED - 5.0.0
dagula ADDED - 3.1.1
datastore-core ADDED - 7.0.3
default-gateway ADDED - 6.0.3
event-iterator ADDED - 2.0.0
format-number ADDED - 3.0.0
freeport-promise ADDED - 2.0.0
function-timeout ADDED - 0.1.1
hashlru ADDED - 2.3.0
interface-blockstore UPDATED 2.0.2 2.0.3
interface-datastore UPDATED 6.0.3 6.1.1
is-loopback-addr ADDED - 2.0.1
is-regexp ADDED - 3.1.0
it-foreach ADDED - 0.1.1
it-handshake ADDED - 4.1.2
it-length-prefixed ADDED - 7.0.1
it-merge ADDED - 1.0.4
it-pair ADDED - 2.0.3
it-pb-stream ADDED - 2.0.2
it-pushable ADDED - 2.0.2
it-reader ADDED - 6.0.1
it-sort ADDED - 1.0.1
it-stream-types ADDED - 1.0.4
it-ws ADDED - 5.0.3
jmespath ADDED - 0.16.0
jsbn ADDED - 1.1.0
libp2p ADDED - 0.37.3
longbits ADDED - 1.1.0
mime-db UPDATED 1.51.0 1.52.0
mime-types UPDATED 2.1.34 2.1.35
mortice ADDED - 3.0.1
mutable-proxy ADDED - 1.0.0
netmask ADDED - 2.0.2
observable-webworkers ADDED - 2.0.1
p-queue ADDED - 7.3.0
private-ip ADDED - 2.3.4
protons-runtime ADDED - 2.0.2
sanitize-filename ADDED - 1.6.3
set-delayed-interval ADDED - 1.0.0
super-regex ADDED - 0.2.0
time-span ADDED - 5.1.0
truncate-utf8-bytes ADDED - 1.0.2
ts-mocha ADDED - 9.0.2
uint8-varint ADDED - 1.0.4
uint8arraylist ADDED - 2.3.3
utf8-byte-length ADDED - 1.0.4
wherearewe ADDED - 2.0.1
xsalsa20 ADDED - 1.2.0

@joshghent joshghent requested a review from flea89 September 11, 2022 20:57
@flea89
Contributor

flea89 commented Sep 12, 2022

@alanshaw, while there are still a few tweaks required (ie. some types are missing/need fixing), I wonder if you could review this PR to see if the approach is what you expected it to be.
@joshghent is off for a few days, it'd be great to have your thoughts so that he can tidy everything up and action feedback (if any) from you.

Member

@alanshaw alanshaw left a comment

You should check with @olizilla that the DB change is similar to what he's expecting to do.


on:
  schedule:
    - cron: '*/30 * * * *'
Member

Why not just schedule it at the maximum amount of time a job can run for (6h)?

Contributor

If I understand this correctly the job has 2 goals:

  1. move historical pins to EIPFS
  2. keep moving new psa requests to EIPFS until we move to pickup.

For 2, I guess it's ideal to keep moving stuff as promptly as possible (30 min makes sense, maybe even less than that?), while we know the first runs of the job will be super slow (since they will have to go through all the historical data).

Wouldn't a solution that satisfies both goals be to keep the schedule as is and set concurrency on the job?
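
The suggestion here is presumably the GitHub Actions `concurrency` setting, which lives in the workflow file rather than in the job's code. Purely as an in-process analogue of the same idea (a frequent schedule where a new run is skipped while a slow one is still going), a sketch with a hypothetical `nonOverlapping` helper:

```javascript
// Hypothetical helper, not part of this PR: wraps an async job so that a new
// invocation is skipped while a previous one is still running.
function nonOverlapping (job) {
  let running = false
  return async function run (...args) {
    if (running) return { skipped: true }
    running = true
    try {
      return { skipped: false, value: await job(...args) }
    } finally {
      running = false
    }
  }
}
```

With this shape, the slow first runs over historical data would simply cause intermediate ticks to be skipped, and later runs would pick up new requests at the scheduled cadence.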

@@ -307,11 +307,13 @@ CREATE TABLE IF NOT EXISTS psa_pin_request
   meta jsonb,
   deleted_at TIMESTAMP WITH TIME ZONE,
   inserted_at TIMESTAMP WITH TIME ZONE DEFAULT timezone('utc'::text, now()) NOT NULL,
-  updated_at TIMESTAMP WITH TIME ZONE DEFAULT timezone('utc'::text, now()) NOT NULL
+  updated_at TIMESTAMP WITH TIME ZONE DEFAULT timezone('utc'::text, now()) NOT NULL,
+  backup_urls TEXT[]
Member

Might be nice if this was NOT NULL DEFAULT '{}' (an empty array) so you don't have to distinguish between null and empty.

Contributor Author

Changed :) FWIW, I used the same definition as backup_urls in uploads. Should I change that one too?
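
To illustrate the point about null versus empty: with a nullable column, every consumer has to normalise the value before using it, as in this sketch. The row shapes and helper names are hypothetical, not taken from the PR.

```javascript
// Hypothetical row shape: { backup_urls: string[] | null }
function backupUrlsOf (row) {
  // Only needed while the column is nullable; with NOT NULL DEFAULT '{}'
  // row.backup_urls could be used directly.
  return row.backup_urls ?? []
}

function needsBackup (row) {
  return backupUrlsOf(row).length === 0
}
```

If the column is NOT NULL with an empty-array default, `needsBackup` reduces to a plain length check and the `?? []` guard can go.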

@joshghent joshghent force-pushed the feat/794-copy-pins-to-eips branch from 4edf2bf to b1fd1a9 Compare September 28, 2022 13:33
@joshghent joshghent temporarily deployed to production September 28, 2022 13:50 Inactive
@flea89 flea89 temporarily deployed to production October 24, 2022 21:12 Inactive
@flea89 flea89 requested a review from olizilla October 25, 2022 07:59
@flea89 flea89 temporarily deployed to production October 25, 2022 08:17 Inactive
@flea89 flea89 temporarily deployed to production October 25, 2022 08:39 Inactive
@flea89 flea89 force-pushed the feat/794-copy-pins-to-eips branch from e6d131b to df32027 Compare October 25, 2022 08:48
@flea89 flea89 force-pushed the feat/794-copy-pins-to-eips branch from df32027 to 52984d6 Compare October 25, 2022 08:51
@flea89 flea89 temporarily deployed to production October 25, 2022 08:57 Inactive
@flea89 flea89 temporarily deployed to production October 25, 2022 09:39 Inactive
@flea89 flea89 force-pushed the feat/794-copy-pins-to-eips branch from 2fe2914 to 9dec63f Compare October 25, 2022 10:19
@flea89 flea89 temporarily deployed to production October 25, 2022 10:25 Inactive
@flea89
Contributor

flea89 commented Oct 25, 2022

@olizilla can you please give this a thorough review? 🙏
Let me know if you want to take it from here, or if you need more support from my end as well.

let reportInterval
const libp2p = await getLibp2p()
try {
  const dagula = await Dagula.fromNetwork(libp2p, { peer })
Contributor

@flea89 flea89 Oct 28, 2022

Cache instances for the same peer location.

It should be fine to keep a single libp2p instance.

Contributor

Done.
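
One way the per-peer caching suggested above could be done is sketched below. It is not the change that was actually pushed: `createPeerCache` is a hypothetical helper, and `createInstance` stands in for something like `peer => Dagula.fromNetwork(libp2p, { peer })` with a single shared libp2p instance.

```javascript
// Hypothetical helper: memoises an async factory by peer address.
function createPeerCache (createInstance) {
  const cache = new Map()
  return function getInstance (peer) {
    const key = String(peer)
    if (!cache.has(key)) {
      // Cache the promise itself so concurrent callers share one creation.
      const pending = Promise.resolve()
        .then(() => createInstance(peer))
        .catch(err => {
          cache.delete(key) // allow a retry after a failed connection
          throw err
        })
      cache.set(key, pending)
    }
    return cache.get(key)
  }
}
```

Caching the promise rather than the resolved value avoids opening two connections when several pins on the same peer are processed at once.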

  throw (err)
} finally {
  if (bytesReceived > this.MAX_UPLOAD_DAG_SIZE) {
    this.log(`⚠️ CID: ${cid} dag is greater than ${this.fmt(this.MAX_UPLOAD_DAG_SIZE)}`)
Contributor

@flea89 flea89 Oct 28, 2022

Add a per batch summary:

  • failed cids (and their size)
  • Successful (bigger than MAX_UPLOAD_DAG_SIZE)

Remove all unnecessary per-CID logging and log only errors by default (keep the rest behind a more verbose debug level).

Contributor

Done!

Default logging (DEBUG=backupPins:log) is quite succinct now, and the job can be run manually with a more verbose DEBUG=backupPins:* passed through workflow inputs.

Example of default logging:

❯ DEBUG=backupPins:log npm test --workspace=packages/cron

Screenshot 2022-10-31 at 16 24 12

❯ DEBUG=backupPins:log npm test --workspace=packages/cron

Screenshot 2022-10-31 at 16 26 17
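
Since the screenshots do not survive as text, here is a rough sketch of what a per-batch summary along the lines requested could compute. The result shape `{ cid, ok, bytes }` and the `summarizeBatch` name are assumptions for illustration, not the PR's implementation.

```javascript
const MAX_UPLOAD_DAG_SIZE = 32 * 1024 ** 3 // 32 GiB

// results: hypothetical array of { cid, ok, bytes }, one entry per pin in the batch
function summarizeBatch (results, maxSize = MAX_UPLOAD_DAG_SIZE) {
  const failed = []
  const oversized = []
  let succeeded = 0
  for (const { cid, ok, bytes } of results) {
    if (!ok) {
      failed.push({ cid, bytes }) // bytes read before the failure
      continue
    }
    succeeded++
    if (bytes > maxSize) oversized.push({ cid, bytes })
  }
  return { total: results.length, succeeded, failed, oversized }
}
```

Logging this object once per batch keeps the default output short while still listing the failed CIDs and the successful ones above the threshold.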

* @returns {Promise<number | undefined>}
*/

// Given for PIN requests we never limited files size we shouldn't check this. ie.
Contributor

Delete stale code

Contributor

Done!

Bucket: bucketName,
Key: key,
Body: bak.content,
Metadata: { structure: 'Complete' }
Contributor

Look into sending a checksum for the file, and reject the upload if the bytes of the CAR don't match the CID.

Contributor

I wonder if this is required.
Looking at the headers sent by the client:

{
  'content-type': 'application/xml',
  'content-length': '11985',
  Expect: '100-continue',
  host: '127.0.0.1',
  'x-amz-user-agent': 'aws-sdk-js/3.53.1',
  'user-agent': 'aws-sdk-js/3.53.1 os/darwin/21.6.0 lang/js md/nodejs/16.14.0 md/crt-avail api/s3/3.53.1',
  'amz-sdk-invocation-id': '25110079-acaf-425f-8933-3527fd8366c7',
  'amz-sdk-request': 'attempt=1; max=3',
  'x-amz-date': '20221031T122520Z',
  'x-amz-content-sha256': 'ebd8a0f42b66a7756aaee73e6275d918143525f125137b584e6b079b364a6b5f',
  authorization: 'AWS4-HMAC-SHA256 Credential=minioadmin/20221031/us-east-1/s3/aws4_request, SignedHeaders=amz-sdk-invocation-id;amz-sdk-request;content-length;content-type;host;x-amz-content-sha256;x-amz-date;x-amz-user-agent, Signature=90453c633c07234480f3319eb3c1b058d25b39a077eb3653063165c5bc137722'
}

You can see it sends x-amz-content-sha256, which is the SHA-256 of the payload and implies the payload is signed (see docs).

If every chunk is hashed and verified I don't think we need the overall one? Or am I missing something?
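
For reference, the value carried in `x-amz-content-sha256` for a signed payload is the hex-encoded SHA-256 of the request body, which can be reproduced with Node's built-in `crypto` module. The helper names below are hypothetical. Note that this only checks the bytes sent in a request; checking that the blocks inside a CAR hash to their CIDs is a separate step that is not shown here.

```javascript
const { createHash } = require('node:crypto')

// Hex SHA-256 of a payload, the same encoding used by x-amz-content-sha256.
function sha256Hex (bytes) {
  return createHash('sha256').update(bytes).digest('hex')
}

// Hypothetical guard: throws if the payload does not match the expected digest.
function assertDigest (bytes, expectedHex) {
  const actual = sha256Hex(bytes)
  if (actual !== expectedHex) {
    throw new Error(`digest mismatch: expected ${expectedHex}, got ${actual}`)
  }
  return actual
}
```

A mismatch here would indicate corruption in transit, which is the case the per-request hash already covers.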

Successfully merging this pull request may close these issues.

Existing PSA_Requests should be available to Elastic Provider
5 participants