Skip to content
This repository has been archived by the owner on Feb 8, 2018. It is now read-only.

Somehow record.nonce is not None #4483

Closed
chadwhitacre opened this issue May 24, 2017 · 52 comments
Closed

Somehow record.nonce is not None #4483

chadwhitacre opened this issue May 24, 2017 · 52 comments
Labels

Comments

@chadwhitacre
Copy link
Contributor

We have a sanity check during email verification, and somehow we're violating it.

@chadwhitacre
Copy link
Contributor Author

Several folks have hit this. We're up to 17 events in Sentry.

@chadwhitacre
Copy link
Contributor Author

Alright, so record.verified is True.

@chadwhitacre
Copy link
Contributor Author

... and there are no packages, but record.nonce is not None.

@chadwhitacre
Copy link
Contributor Author

Alright, so what is the circumstance in which nonce is set at all? (There's only one such circumstance, right?)

@chadwhitacre
Copy link
Contributor Author

Hypothesis: verification restart is implicated.

@chadwhitacre
Copy link
Contributor Author

We create a nonce during start_email_verification. Where might we update it?

@chadwhitacre
Copy link
Contributor Author

Is record.nonce ever None at this point?

@chadwhitacre
Copy link
Contributor Author

gratipay::DATABASE=> select count(id) from emails;
┌───────┐
│ count │
├───────┤
│  8229 │
└───────┘
(1 row)

gratipay::DATABASE=> select count(id) from emails where nonce is null and verified is not null;
┌───────┐
│ count │
├───────┤
│  6459 │
└───────┘
(1 row)

gratipay::DATABASE=> select count(id) from emails where nonce is not null and verified is null;
┌───────┐
│ count │
├───────┤
│  1760 │
└───────┘
(1 row)

gratipay::DATABASE=> select count(id) from emails where nonce is not null and verified is not null;
┌───────┐
│ count │
├───────┤
│    10 │
└───────┘
(1 row)

gratipay::DATABASE=> select count(id) from emails where nonce is null and verified is null;
┌───────┐
│ count │
├───────┤
│     0 │
└───────┘
(1 row)

gratipay::DATABASE=>

@chadwhitacre
Copy link
Contributor Author

The 10 that have both nonce and verified include two of the people that are definitely affected by this bug, and several others that have definitely used package claiming—including me!

@chadwhitacre
Copy link
Contributor Author

verification_start is after verification_end for all of them. Is that normal?

@chadwhitacre
Copy link
Contributor Author

No!

gratipay::DATABASE=> select count(*) from emails where verification_start > verification_end;
┌───────┐
│ count │
├───────┤
│    10 │
└───────┘
(1 row)

@chadwhitacre
Copy link
Contributor Author

gratipay::DATABASE=> select count(*) from emails where verification_start <= verification_end;
┌───────┐
│ count │
├───────┤
│  6459 │
└───────┘
(1 row)

@chadwhitacre
Copy link
Contributor Author

Does that account for everyone who has claimed packages?

@chadwhitacre
Copy link
Contributor Author

We have 13 claimants of npm packages (query lifted from gratipay/inside.gratipay.com#1045):

gratipay::DATABASE=> select count(distinct(owner)) as nowners, count(team_id) as npackages from teams_to_packages tp join teams t on tp.team_id = t.id;
┌─────────┬───────────┐
│ nowners │ npackages │
├─────────┼───────────┤
│      13 │       224 │
└─────────┴───────────┘
(1 row)

@chadwhitacre
Copy link
Contributor Author

Hypothesis: all 10 are people who already had a verified email and then claimed packages. The remaining three had no verified email before claiming a package.

@chadwhitacre
Copy link
Contributor Author

So I want to see everyone who has claimed a package, and what their email verification record looks like.

@chadwhitacre
Copy link
Contributor Author

gratipay::DATABASE=> select distinct(owner), e.address, e.verified, e.nonce, e.verification_start, e.verification_end from teams_to_packages tp join teams t on tp.team_id = t.id join participants p on p.username = t.owner join emails e on p.id=e.participant_id order by owner;

@chadwhitacre
Copy link
Contributor Author

It appears that verification_start is only set in one place, and only ever to now().

@chadwhitacre
Copy link
Contributor Author

chadwhitacre commented Jun 26, 2017

So why do we have null for verification_start? Are all of those from prior to #4488? Or do we have an insert with a default value?

Oh! We don't. 😊

gratipay::DATABASE=> select count(*) from emails where verification_start is null;
┌───────┐
│ count │
├───────┤
│     0 │
└───────┘
(1 row)

@chadwhitacre
Copy link
Contributor Author

The INSERT case will have the default of CURRENT_TIMESTAMP.

@chadwhitacre
Copy link
Contributor Author

chadwhitacre commented Jun 26, 2017

Alright, so the fact that verification_start is after verification_end is highly suggestive of hitting the else.

@chadwhitacre
Copy link
Contributor Author

It feels like we're not handling that case properly, that we're deleting package claims but not ... doing something else necessary.

@chadwhitacre
Copy link
Contributor Author

Is the problem that we're getting into that code path when we aren't expecting to? Or that the code path is misbehaved even when we get there properly?

@chadwhitacre
Copy link
Contributor Author

get_email_verification_nonce
  ⇖ get_email_verification_link
    ⇖ start_email_verification
      ⇖ emails/modify.json.spt
        ⇖ js/gratipay/emails.js
        ⇖ js/gratipay/packages.js

@chadwhitacre
Copy link
Contributor Author

This should throw an IntegrityError if the address is verified for a different participant.

https://github.com/gratipay/gratipay.com/blob/2096/gratipay/models/participant/email.py#L169-L170

What about for the same participant? This will throw an IntegrityError in that case, too, right? Is that what we want?

@chadwhitacre
Copy link
Contributor Author

Actually, that's protected against just before, where we load existing.

@chadwhitacre
Copy link
Contributor Author

Gosh I really just have this hunch that what happens is that isaacs et al. are coming through the else when claiming their packages—so that would've been via packages.js—and so they get a nonce with a late start and no packages linked to it. But the other condition is that the address is already verified. How do all three line up at once?

@chadwhitacre
Copy link
Contributor Author

Because the bug manifests on finish_email_verification, not start_email_verification. It's when trying to use the nonce that gets kicked out during start_.

@chadwhitacre
Copy link
Contributor Author

The bug manifests here, when these three are true:

  • the email address is verified
  • there are no packages being claimed
  • the record does have a nonce

@chadwhitacre
Copy link
Contributor Author

It feels like they should not be getting into the start_ else code path in the first place.

(I suspect the verification_end being before _start weirdness is not directly related to the bug.)

@chadwhitacre
Copy link
Contributor Author

Isaac hit this before we deployed bulk claiming. Therefore this bug exists even with manual claiming of multiple packages, one at a time. Is that the only case where it manifests?

@chadwhitacre
Copy link
Contributor Author

Do we clear nonce on verify? Did Isaac come at us with an old nonce?

@chadwhitacre
Copy link
Contributor Author

No, not an old nonce. That was the point of #4483 (comment). Until the Recent 10, everyone is either verified or has a nonce. Only with the Recent 10 do we have both.

@chadwhitacre
Copy link
Contributor Author

Alright, I believe the start_ else path is implicated, because the update we do there would seem to be able to explain how verification_start can end up later than verification_end.

If there is an existing nonce, there won't be a trace of it in the claims table because of the delete.

verification_end is nullable with no default and only ever gets set in one place, to now().

@chadwhitacre
Copy link
Contributor Author

So to end up with verification_end prior to verification_start—in some cases just prior—means the user is hitting save_email_address and then start_email_verification.

save_email_address comes under finish_email_verification.

So they must be hitting finish just before hitting start. Hmm ... 🤔

@chadwhitacre
Copy link
Contributor Author

in some cases just prior

gratipay::DATABASE=> select verification_start, verification_end, verification_start - verification_end delta from emails where verification_start > verification_end order by verification_end;
┌───────────────────────────────┬───────────────────────────────┬──────────────────────────┐
│      verification_start       │       verification_end        │          delta           │
├───────────────────────────────┼───────────────────────────────┼──────────────────────────┤
│ 2017-05-10 00:26:05.293957+00 │ 2014-12-09 19:24:52.29397+00  │ 882 days 05:01:12.999987 │
│ 2017-06-17 09:36:00.309672+00 │ 2014-12-10 03:11:20.383187+00 │ 920 days 06:24:39.926485 │
│ 2017-06-17 18:30:53.417246+00 │ 2015-02-24 10:09:42.301325+00 │ 844 days 08:21:11.115921 │
│ 2017-05-17 18:06:01.163001+00 │ 2017-05-17 18:05:31.882616+00 │ 00:00:29.280385          │
│ 2017-05-19 01:14:29.792552+00 │ 2017-05-19 01:02:22.695691+00 │ 00:12:07.096861          │
│ 2017-05-30 14:03:10.047133+00 │ 2017-05-30 13:55:22.432168+00 │ 00:07:47.614965          │
│ 2017-06-15 19:40:41.072505+00 │ 2017-06-15 15:34:25.358339+00 │ 04:06:15.714166          │
│ 2017-06-17 09:35:58.9947+00   │ 2017-06-17 09:35:30.625728+00 │ 00:00:28.368972          │
│ 2017-06-17 13:37:30.956824+00 │ 2017-06-17 13:31:15.339565+00 │ 00:06:15.617259          │
│ 2017-06-20 14:02:43.245817+00 │ 2017-06-20 09:30:44.071424+00 │ 04:31:59.174393          │
└───────────────────────────────┴───────────────────────────────┴──────────────────────────┘
(10 rows)

@chadwhitacre
Copy link
Contributor Author

Interesting that the deltas are all over the place!

@chadwhitacre
Copy link
Contributor Author

verification_end

The place where verification_end is set is also the only place where verified is set to true.

@chadwhitacre
Copy link
Contributor Author

And verified = true is one of the three conditions we need to account for to reproduce this bug.

@chadwhitacre
Copy link
Contributor Author

chadwhitacre commented Jun 26, 2017

We need to get verified set to true, and then start an email verification with a nonce but no packages.

@chadwhitacre
Copy link
Contributor Author

My hunch is that the user is verifying their email address first, independent of package claiming, and then claiming a package using that email.

@chadwhitacre
Copy link
Contributor Author

chadwhitacre commented Jun 26, 2017

Because they have to get into save_email_address to get verified=true + the verification_end.

emails/verify.html.spt
  ⇘ finish_email_verification
    ⇘ save_email_address

@chadwhitacre
Copy link
Contributor Author

30 seconds delta is about the minimum time for these two user actions.

@chadwhitacre
Copy link
Contributor Author

Perhaps they generate both emails, then click one, then click the other?

@chadwhitacre
Copy link
Contributor Author

Alright, getting into a test over here, trying to repro ...

diff --git a/tests/py/test_email.py b/tests/py/test_email.py
index a964a86..f3dbfd5 100644
--- a/tests/py/test_email.py
+++ b/tests/py/test_email.py
@@ -324,6 +324,22 @@ class TestEndpoints(Alice):
                                       )
         assert response.code == 400
 
+    def test_find_root_cause_for_record_nonce_none_bug(self):
+        foo = self.make_package()
+        address = '[email protected]'
+
+        self.hit_email_spt('add-email', address)
+        nonce1 = self.alice.get_email(address).nonce
+
+        self.hit_email_spt('start-verification', address, package_ids=[foo.id])
+        nonce2 = self.alice.get_email(address).nonce
+
+        result1 = self.alice.finish_email_verification(address, nonce1)
+        result2 = self.alice.finish_email_verification(address, nonce2)
+
+        import pdb; pdb.set_trace()
+        assert '???'
+
 
 class TestFunctions(Alice):

@chadwhitacre
Copy link
Contributor Author

chadwhitacre commented Jun 26, 2017

So the hypothesis is that they hit save_email_address to get verified and an end time.

Then ... what do they do in order to hit finish_email_verification with a) a nonce, and b) no packages? 🤔

@chadwhitacre
Copy link
Contributor Author

I need to get into that else to get the nonce and late start ...

@chadwhitacre
Copy link
Contributor Author

Alright, an add-email followed by a start-verification gets me verified and through the else with a nonce (and a late start) ... but I still have packages from the start-verification. How do I get packages to be empty?

@chadwhitacre
Copy link
Contributor Author

Either packages claiming are deleted or they are never created in the first place.

@chadwhitacre
Copy link
Contributor Author

I'm just thinking that our verification flows might not be idempotent. We delay up to 60 seconds on send, would be quite understandable for folks to reissue a verification thinking we'd dropped the ball.

@chadwhitacre
Copy link
Contributor Author

Yes!

(Pdb) l
233                 record = self.get_email(email, cursor, and_lock=True)
234                 if record is None:
235                     return VERIFICATION_FAILED, None, None
236                 packages = self.get_packages_claiming(cursor, nonce)
237                 import pdb; pdb.set_trace()
238  ->             if record.verified and not packages:
239     
240                     # If an email address is already verified, then we should only
241                     # be going through this flow if they are verifying ownership of
242                     # some packages.
243     
(Pdb) packages
[]
(Pdb) record.verified
True
(Pdb) record.nonce
u'9ac4c0c8-130e-43b2-8064-94ce17703dc5'
(Pdb)

@chadwhitacre
Copy link
Contributor Author

Moving to a PR in #4534 ...

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

1 participant