Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use LND subscriptions #726

Merged
merged 24 commits into from
Jan 8, 2024
Merged

Conversation

ekzyis
Copy link
Member

@ekzyis ekzyis commented Jan 2, 2024

This PR adds confirming invoices by subscribing to invoice updates using subscribeToInvoices since this will be relevant for good UX in #715.

This should increase the speed since the worker only polls LND every 5 seconds but our frontend query polls every second. Difference in speed can also be seen in these two videos:

subscription
2024-01-02.01-26-03.mp4
polling
2024-01-02.01-27-21.mp4

We could run this for a few weeks in parallel with the old battle-tested polling as a fallback.


TODO since a429bf2:

  • test deposits on a429bf2 with following test cases:
    • 1a) worker is running (subscriptions together with polling), create invoice, pay invoice
    • 1b) worker is running (subscriptions standalone), create invoice, pay invoice
    • 1c) worker is not running (subscriptions together with polling), create invoice, start worker, pay invoice
    • 1d) worker is not running (subscriptions together with polling), create invoice, pay invoice, start worker
    • 1e) worker is not running (subscriptions standalone), create invoice, start worker, pay invoice
    • 1f) worker is not running (subscriptions standalone), create invoice, pay invoice, start worker
  • test withdrawals on a429bf2 with following test cases:
    • 2a) worker is running (subscriptions together with polling), paste bolt11 invoice into withdrawal form, invoice gets paid using payViaPaymentRequest
    • 2b) worker is running (subscriptions standalone), paste bolt11 invoice, invoice gets paid using payViaPaymentRequest
    • 2c) worker is not running (subscriptions together with polling), paste bolt11 invoice, invoice gets paid using payViaPaymentRequest, start worker
    • 2d) worker is not running (subscriptions together with polling), paste bolt11 invoice, start worker, invoice gets paid using lncli payinvoice
    • 2e) worker is not running (subscriptions standalone), paste bolt11 invoice, invoice gets paid using payViaPaymentRequest, start worker
    • 2f) worker is not running (subscriptions standalone), paste bolt11 invoice, start worker, invoice gets paid using lncli payinvoice

I think these test cases should cover everything 🤔

And in general, payViaPaymentRequest should trigger the same behavior as lncli payinvoice. But for 2d) and 2f), I will comment out the payViaPaymentRequest code lines and then pay the invoice manually via lncli payinvoice to simulate a successful payment after the worker was restarted.


update:

2d) and 2f) results in unknown failure since the payment is not known to LND (since I commented out payViaPaymentRequest). but this makes sense so is expected behavior. Both work when I simply return in checkWithdrawal instead of failing the withdrawal if it wasn't found (this skip queueing a new check). Using lncli payinvoice then makes the withdrawal successful.

@ekzyis ekzyis added the enhancement improvements to existing features label Jan 2, 2024
@ekzyis ekzyis changed the title Use parallel invoice subscriptions Use invoice subscriptions in parallel with polling Jan 2, 2024
worker/index.js Outdated Show resolved Hide resolved
@ekzyis ekzyis marked this pull request as draft January 2, 2024 00:36
@ekzyis
Copy link
Member Author

ekzyis commented Jan 2, 2024

Ok, we definitely need to make sure that our invoice code is idempotent when we want to run both methods in parallel for a while.

  1. confirm_invoice is idempotent because it's a noop if confirmed_at is anything but NULL:

IF confirmed_at IS NULL THEN
UPDATE "Invoice" SET "msatsReceived" = lnd_received, "confirmedAt" = now_utc(), updated_at = now_utc()
WHERE hash = lnd_id;
UPDATE users SET msats = msats + lnd_received WHERE id = user_id;
END IF;

  1. Sending user notifications is not idempotent so we need to prevent this line getting hit multiple times:

sendUserNotification(dbInv.userId, {
title: `${numWithUnits(msatsToSats(inv.received_mtokens), { abbreviate: false })} were deposited in your account`,
body: dbInv.comment || undefined,
tag: 'DEPOSIT',
data: { sats: msatsToSats(inv.received_mtokens) }
}).catch(console.error)

  1. Cancel code is idempotent:

if (inv.is_canceled) {
return serialize(models,
models.invoice.update({
where: {
hash: inv.id
},
data: {
cancelled: true
}
}))
}

  1. Hodl invoice confirmation is idempotent:

if (inv.is_held && !isHeldSet) {
// this is basically confirm_invoice without setting confirmed_at since it's not settled yet
// and without setting the user balance since that's done inside the same tx as the HODL invoice action.
await serialize(models,
models.invoice.update({ where: { hash }, data: { msatsReceived: Number(inv.received_mtokens), isHeld: true } }))
// remember that we already executed this if clause
// (even though the query above is idempotent but imo, this makes the flow more clear)
isHeldSet = true
}

  1. Scheduling a new job is not idempotent:

if (!expired) {
// recheck in 5 seconds if the invoice is younger than 5 minutes
// otherwise recheck in 60 seconds
const startAfter = new Date(inv.created_at) > datePivot(new Date(), { minutes: -5 }) ? 5 : 60
await boss.send('checkInvoice', { hash, isHeldSet }, { ...walletOptions, startAfter })
}

  1. Canceling hodl invoices is idempotent:

if (expired && inv.is_held) {
await cancelHodlInvoice({ id: hash, lnd })
}

@ekzyis
Copy link
Member Author

ekzyis commented Jan 2, 2024

This is still a draft since I still think I am making too many assumptions here. For example, I am not entirely sure yet when invoice_updated gets fired. It might trigger in cases where we don't want to call checkInvoice - for example, for withdrawals.

However, the changes are small enough that a review at this stage also makes sense imo. I expect the code to not much change at this point. It just needs more testing, especially because I mentioned to run this in parallel with the old code.

However, with enough testing, we don't even need to run this in parallel with the checkInvoice job? 🤔

And maybe I can also go a step further and also convert checkWithdrawal to use LND subscriptions instead of polling if it requires similar small changes only.

@huumn
Copy link
Member

huumn commented Jan 2, 2024

If you haven't yet, you can check my Outer Space code which uses subscriptions instead of polling. The main difference IIRC is that you want to store an invoice cursor so that when the worker restarts it can get any invoices it missed while it was off.

We should be able to switch completely off of polling but using both while migrating might make sense.


Withdrawals can be subscribed too as well, but they don't have a cursor iirc so we'll continue to need polling as a fallback.

@ekzyis ekzyis force-pushed the worker-invoice-subscriptions branch from 49bac29 to bb74b5f Compare January 4, 2024 01:51
@ekzyis
Copy link
Member Author

ekzyis commented Jan 4, 2024

If you haven't yet, you can check my Outer Space code which uses subscriptions instead of polling. The main difference IIRC is that you want to store an invoice cursor so that when the worker restarts it can get any invoices it missed while it was off.

Did that in bb74b5f

Withdrawals can be subscribed too as well, but they don't have a cursor iirc so we'll continue to need polling as a fallback.

You're right, according to the ln-service docs of subscribeToPayments, there is no cursor.

However, I believe we don't need a cursor since iiuc, LND already does what we want:

TrackPayments returns an update stream for every payment that is not in a terminal state. Note that if payments are in-flight while starting a new subscription, the start of the payment stream could produce out-of-order and/or duplicate events. In order to get updates for every in-flight payment attempt make sure to subscribe to this method before initiating any payments.

-- https://lightning.engineering/api-docs/api/lnd/router/track-payments

So since we don't need to get updates for every in-flight payment attempt, I think this subscription is fine. We only need to be subscribed to "every payment that is not in a terminal state".

And TrackPayments is what ln-service uses under the hood. See here.

So my current conclusion is that we might not even need polling for withdrawals.

@huumn
Copy link
Member

huumn commented Jan 4, 2024

So my current conclusion is that we might not even need polling for withdrawals.

We don't need polling, but when the worker starts up, we will need to check the status of any pending payments in case any payments completed while the worker was down.

@ekzyis
Copy link
Member Author

ekzyis commented Jan 4, 2024

when the worker starts up, we will need to check the status of any pending payments in case any payments completed while the worker was down.

Ah, makes sense! I was only thinking about pending payments that were not completed while the worker was down.

@ekzyis ekzyis force-pushed the worker-invoice-subscriptions branch from 7b8ce18 to a429bf2 Compare January 4, 2024 18:50
@huumn
Copy link
Member

huumn commented Jan 4, 2024

Ah, makes sense! I was only thinking about pending payments.

That's what nice about the cursor on receives. We can always resume where we left off.

I asked Bos why there wasn't a cursor for withdrawals last year, "y no api symmetry?" I forget what he said, but he recommended still using subscriptions for withdrawals but checking for missed payment statuses if the connection is lost/restarted.

@ekzyis
Copy link
Member Author

ekzyis commented Jan 4, 2024

checking for missed payment statuses if the connection is lost/restarted.

I am doing this on worker startup in a429bf2 now:

// queue status check of all pending withdrawals since they might have been paid by LND while worker was down
await models.$queryRaw`
  INSERT INTO pgboss.job (name, data, retrylimit, retrybackoff, startafter)
  SELECT 'checkWithdrawal', json_build_object('id', w.id, 'hash', w.hash), 21, true, now() + interval '10 seconds'
  FROM "Withdrawl" w WHERE w.status IS NULL`

However, since we do polling and subscription in parallel (so we still queue a job during create_withdrawal), there will be two jobs queued for every withdrawal that was still pending in the database when the worker is restarted.

I need to add some deduplication since else, there will always be two jobs running in parallel for these withdrawals until they are settled since they requeue themselves.

So basically some JOIN with pgboss.job (or AND NOT EXISTS) to check if there is already a checkWithdrawal job queued for this id.


update: or don't use pgboss for this and just fetch these withdrawals and pipe them through the checkWithdrawal once ourselves. But it's nice to queue jobs using a simple SQL query with pgboss. Just not nice while/if we're running subscriptions and polling in parallel.

@huumn
Copy link
Member

huumn commented Jan 4, 2024

Alternatively, on startup:

  1. subscribe to withdrawals
  2. loop over withdrawals that are still pending and check them

This should allow you to get rid of the queuing all together.

@ekzyis ekzyis marked this pull request as ready for review January 5, 2024 00:37
@ekzyis ekzyis force-pushed the worker-invoice-subscriptions branch from 4c58384 to ec25e58 Compare January 5, 2024 00:51
worker/index.js Outdated Show resolved Hide resolved
@ekzyis
Copy link
Member Author

ekzyis commented Jan 6, 2024

edit: For some reason, the callback does not get triggered if hodl invoices expire. Will take another look at this tomorrow.

Okay, I think I isolated the problem now. SubscribeToInvoice does get called if the hodl invoice expires since from "unpaid" -> "expired" is considered to be a state transition. LND simply considers the invoice to be "canceled" when it expires (inv.is_canceled is true).

However, it does not get called if the invoice was already paid and is thus in the is_held state.

In that case, "expired" no longer exists. The invoice must be manually settled or canceled now. This does work in our happy path (when there is no error when the client uses the hash and hmac of the paid invoice to trigger an action) but not in the bad path since we relied on the polling there.

So I think I'll just insert a one time job for that case that gets run when the invoice would expire and then cancels it if it's not settled already.

@ekzyis
Copy link
Member Author

ekzyis commented Jan 6, 2024

Okay, this should be it now.

video

Timestamps:

00:00 - 00:10 Deposits

00:13 - 00:26 Withdrawals

00:27 - 00:38 Successful anon zap / HODL invoice

00:39 - 01:15 Expired anon zap / HODL invoice without paying

01:20 - 02:20 Simulate error after anon zap payment and wait for expiration to cancel invoice

@ekzyis ekzyis marked this pull request as ready for review January 6, 2024 19:25
worker/wallet.js Outdated Show resolved Hide resolved
worker/wallet.js Outdated Show resolved Hide resolved
@huumn
Copy link
Member

huumn commented Jan 7, 2024

Oh man! So snappy! I should've done this literally years ago. I'll give it a review first thing tomorrow

@ekzyis ekzyis changed the title Use invoice subscriptions in parallel with polling Use LND subscriptions Jan 7, 2024
@ekzyis
Copy link
Member Author

ekzyis commented Jan 7, 2024

Oh man! So snappy! I should've done this literally years ago. I'll give it a review first thing tomorrow

It could get even more snappier since we're still polling every second in the frontend but looks snappy enough for #715 so far :)

And in production, payments will probably take a lot longer since this is just running locally. So the 1 second poll might be less significant in production. And one second is just the worst case. On average we would only wait 500 ms extra.

worker/wallet.js Outdated Show resolved Hide resolved
@huumn
Copy link
Member

huumn commented Jan 7, 2024

Working on this locally. I noticed we don't have a good strategy for dealing with the invoice_updated callback failing. (or the withdrawal callback)

For example:

  1. invoice_updated is called with confirm_index = 1 and the db update fails. As is, we just log it.
  2. invoice_update is called for another invoice confirm_index = 2 and the update succeeds. Next time the worker restarts, we subscribe at confirm_index = 2 so the invoice in (1) will never confirm in the db.

This should be rare but it's a critical problem. AFAICT we have a few options:

  1. edit: this isn't really a solution exit the worker process immediately when this happens (but concurrency - especially across the worker cluster - could still allow the above to happen)
  2. fallback to polling in such cases
  3. check invoices on startup like with withdrawals (basically always keep our "migration" code) AND have a periodic job that does this
  4. record add_index and on startup subscribeToInvoices with the lowest add_index that hasn't confirmed

(4) doesn't solve this problem for withdrawals and requires a restart, so I think (2) or (3) are among our best bets, but I think (3) is the better bet:

  1. on the basis of code simplicity because we'll always need (3) for withdrawals
  2. if the DB connection is failing, we might not be able to insert a job to poll so we'll need (3) anyway

@huumn
Copy link
Member

huumn commented Jan 7, 2024

Would you mind explaining why exactly:

  1. we subscribe to single hodl invoices? The comment there says a lot of things, but I can't tell which case this subscription is handling.
  2. we need the finalizeHodlInvoice job? This is where we refund the payment possibly, right? Does the invoice_updated subscription to the single invoice not tell us when an invoice is expired?

edit I think this answers (2):

SubscribeToInvoice does get called if the hodl invoice expires since from "unpaid" -> "expired" is considered to be a state transition. LND simply considers the invoice to be "canceled" when it expires (inv.is_canceled is true).

However, it does not get called if the invoice was already paid and is thus in the is_held state.

So if (1) isn't doing that, what is (1) doing?

@ekzyis
Copy link
Member Author

ekzyis commented Jan 7, 2024

Would you mind explaining why exactly:

  1. we subscribe to single hodl invoices? The comment there says a lot of things, but I can't tell which case this subscription is handling.
  2. we need the finalizeHodlInvoice job? This is where we refund the payment possibly, right? Does the invoice_updated subscription to the single invoice not tell us when an invoice is expired?

edit I think this answers (2):

SubscribeToInvoice does get called if the hodl invoice expires since from "unpaid" -> "expired" is considered to be a state transition. LND simply considers the invoice to be "canceled" when it expires (inv.is_canceled is true).
However, it does not get called if the invoice was already paid and is thus in the is_held state.

So if (1) isn't doing that, what is (1) doing?

(1) handles the transition from "unpaid" -> "is_held". SubscribeToInvoices does not get called if an HODL invoice was paid since it's not settled yet. It only gets called on invoice creation and invoice settlement.

@huumn
Copy link
Member

huumn commented Jan 7, 2024

Thanks! That makes a lot of sense.

@ekzyis
Copy link
Member Author

ekzyis commented Jan 7, 2024

Working on this locally. I noticed we don't have a good strategy for dealing with the invoice_updated callback failing. (or the withdrawal callback)

For example:

  1. invoice_updated is called with confirm_index = 1 and the db update fails. As is, we just log it.
  2. invoice_update is called for another invoice confirm_index = 2 and the update succeeds. Next time the worker restarts, we subscribe at confirm_index = 2 so the invoice in (1) will never confirm in the db.

This should be rare but it's a critical problem. AFAICT we have a few options:

  1. edit: this isn't really a solution exit the worker process immediately when this happens (but concurrency - especially across the worker cluster - could still allow the above to happen)
  2. fallback to polling in such cases
  3. check invoices on startup like with withdrawals (basically always keep our "migration" code) AND have a periodic job that does this
  4. record add_index and on startup subscribeToInvoices with the lowest add_index that hasn't confirmed

(4) doesn't solve this problem for withdrawals and requires a restart, so I think (2) or (3) are among our best bets, but I think (3) is the better bet:

  1. on the basis of code simplicity because we'll always need (3) for withdrawals
  2. if the DB connection is failing, we might not be able to insert a job to poll so we'll need (3) anyway

Ah, makes sense. Yes, I think (3) is the best option, too.

@ekzyis ekzyis force-pushed the worker-invoice-subscriptions branch from 023ddda to 2ad9d8f Compare January 8, 2024 00:10
@huumn
Copy link
Member

huumn commented Jan 8, 2024

I made some very small readability changes as I reviewed, updated some of the comments to help someone like me who was less familiar with the code, and made sure we aren't suppressing some of the errors (which I might've overdone but better to be a little overzealous).

This still needs another testing/review pass, but my polar network needs to be updated for TrackPayments I think.

@huumn
Copy link
Member

huumn commented Jan 8, 2024

I'm not going to be pushing to this anymore tonight fyi. Just wanted to make sure I shared what I had in case you had other mods chambered.

@ekzyis ekzyis force-pushed the worker-invoice-subscriptions branch from b04d65c to 2ad9d8f Compare January 8, 2024 08:20
@huumn
Copy link
Member

huumn commented Jan 8, 2024

BTW you keep force pushing over your prs

@ekzyis
Copy link
Member Author

ekzyis commented Jan 8, 2024

BTW you keep force pushing over your prs

Oh no, I thought I disabled the mirroring from my repo on gitea. Will check after the call, thanks for letting me know

update: Sorry, pushing from gitea to github was still enabled. I realized I don't need my forks anymore. I can push branches to this repo, probably since a long time lol

@huumn
Copy link
Member

huumn commented Jan 8, 2024

Going to merge. I think this is rock solid now.

@ekzyis I made some more significant changes so please look it over at your convenience and just in case.

@huumn huumn merged commit 2151323 into stackernews:master Jan 8, 2024
1 check passed
@ekzyis
Copy link
Member Author

ekzyis commented Jan 8, 2024

@ekzyis I made some more significant changes so please look it over at your convenience and just in case.

Understood, will try to break this in 8 hours :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement improvements to existing features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants