Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GPG intermittent self-test failure #3543

Closed
claudioandre-br opened this issue Dec 20, 2018 · 25 comments
Closed

GPG intermittent self-test failure #3543

claudioandre-br opened this issue Dec 20, 2018 · 25 comments

Comments

@claudioandre-br
Copy link
Member

claudioandre-br commented Dec 20, 2018

Intermitent

. Inside my Circle

Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (3xOMP) FAILED (cmp_all(192))

The following issue is not GPG related. It is related to batch mode.

. GPG random test
. Or, you can also use ../run/john alltests.in --max-run=30 --format=sha512crypt

# wget http://openwall.info/wiki/_media/john/test.gpg.tar.gz
# ../run/gpg2john *.asc > ~/file4

# $JtR ~/file4
Using default input encoding: UTF-8
Loaded 4 password hashes with 4 different salts (gpg, OpenPGP / GnuPG Secret Key [32/64])
Cost 1 (s2k-count) is 65536 for all loaded hashes
Cost 2 (hash algorithm [1:MD5 2:SHA1 3:RIPEMD160 8:SHA256 9:SHA384 10:SHA512 11:SHA224]) is 2 for all loaded hashes
Cost 3 (cipher algorithm [1:IDEA 2:3DES 3:CAST5 4:Blowfish 7:AES128 8:AES192 9:AES256 10:Twofish 11:Camellia128 12:Camellia192 13:Camellia256]) is 3 for all loaded hashes
Will run 2 OpenMP threads
Proceeding with single, rules:Wordlist
recovery.c:424:28: runtime error: 1.63246e+18 is outside the range of representable values of type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior recovery.c:424:28 in 
Press 'q' or Ctrl-C to abort, almost any other key for status
qwerty2          (Brandon User)
alamakota        (Original)
normal           (Łąśóęśżźć Normal)
Warning: Only 4 candidates buffered for the current salt, minimum 8
needed for performance.
Almost done: Processing the remaining buffered candidate passwords, if any
Warning: Only 1 candidates buffered for the current salt, minimum 8
needed for performance.
Proceeding with wordlist:../run/password.lst, rules:Wordlist
qwerty           (Random User)
4g 0:00:00:07 DONE 2/3 (2018-12-20 11:31) 0.5012g/s 8251p/s 8252c/s 8252C/s 123456..john
Use the "--show" option to display all of the cracked passwords reliably
Session completed

=================================================================
==80==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1024 byte(s) in 1 object(s) allocated from:
    #0 0x4ed437 in __interceptor_malloc (/cwd/bin/bleeding/run/john+0x4ed437)
    #1 0xff5fa3 in mem_alloc /cwd/src/memory.c:76:14
    #2 0xf6fd3f in crk_init /cwd/src/cracker.c:218:27
    #3 0x102bb31 in single_init /cwd/src/single.c:407:2
    #4 0x102a9bc in do_single_crack /cwd/src/single.c:839:2
    #5 0xf46fdd in do_single_pass /cwd/src/batch.c:25:2
    #6 0xf46d38 in do_batch_crack /cwd/src/batch.c:51:3
    #7 0xfa11ef in john_run /cwd/src/john.c:1875:4
    #8 0xf9f270 in main /cwd/src/john.c:2102:2
    #9 0x7f6405c7409a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

SUMMARY: AddressSanitizer: 1024 byte(s) leaked in 1 allocation(s).
@claudioandre-br
Copy link
Member Author

claudioandre-br commented Dec 20, 2018

The ticket contains two unrelated issues.

If the leak is on purpose, just close the ticket.

@claudioandre-br claudioandre-br changed the title gpg failure and leak Batch mode leak Dec 20, 2018
@magnumripper
Copy link
Member

Leak fixed now. What about the self-test fail?

@claudioandre-br
Copy link
Member Author

What about the self-test fail?

It is intermittent. Happens once in a while.

@magnumripper
Copy link
Member

Then let's check this out

@magnumripper magnumripper reopened this Dec 21, 2018
@magnumripper magnumripper changed the title Batch mode leak GPG intermittent self-test failure Dec 21, 2018
@kholia
Copy link
Member

kholia commented Dec 23, 2018

These failures have been plaguing the GPG format for a long time it seems. I was never able to get to the bottom of the issue earlier.

Earlier, I was looking for the problem in the gpg-opencl format (when it had fall-back on CPU code functionality in it).

It now seems that the problem exists within the base CPU code itself? This makes sense to me and would explain those weird gpg-opencl failures earlier.

@magnumripper
Copy link
Member

I bet something in OpenSSL is not thread safe. Have we ever seen problems with a non-omp build?

@kholia
Copy link
Member

kholia commented Dec 24, 2018

I cannot reproduce the GPG format failures with the following command,

$ while true; do OMP_NUM_THREADS=3 ../run/john --format=gpg --test; sleep 1; done

I am running Ubuntu 18.04.1 64-bit here.

@magnumripper
Copy link
Member

@claudioandre-br what version of OpenSSL was that?

@claudioandre-br
Copy link
Member Author

Crypto library: OpenSSL
OpenSSL library version: 01000208f
OpenSSL 1.0.2h  3 May 2016

@kholia
Copy link
Member

kholia commented Dec 26, 2018

@claudioandre-br How do I reproduce this problem locally? Are you able to reproduce the problem on Ubuntu 18.0.1 LTS or something similar?

@magnumripper
Copy link
Member

Try --stress-test=0. How often does it fail? Once in 10? Once in 100? Using how many threads?

@kholia
Copy link
Member

kholia commented Dec 29, 2018

I am running ../run/john --format=gpg --stress-test=0 on Ubuntu 18.04.1 LTS which has OpenSSL 1.1.0g-2ubuntu4.3.

...
#529 Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (4xOMP) PASS
#530 Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (4xOMP) PASS
#531 Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (4xOMP) PASS

I see no failures at all. I tried OMP_NUM_THREADS=3 but no failures again.

@claudioandre-br Can you please give some pointers on how to reproduce this bug?

@solardiz
Copy link
Member

solardiz commented Jun 4, 2024

Looks like this (or similar) is still an issue, gpg format often failing on Linux cfarm29 6.1.0-21-powerpc64le #1 SMP Debian 6.1.90-1 (2024-05-03) ppc64le GNU/Linux: #5491 (comment)

@solardiz
Copy link
Member

solardiz commented Jun 4, 2024

On the above system, I am unable to trigger the issue with --stress-test=0, but I am with a shell while loop around a --test-full=0. This suggests that maybe the problem is exposed/hidden by variations in memory layout (ASLR).

The failure is usually on cmp_all of the full size, but on one occasion it was on get_key(47) (with mkpc 64), which suggests we have heap memory corruption.

@solardiz
Copy link
Member

solardiz commented Jun 4, 2024

I could also get it to fail in an --enable-asan build, without triggering ASan. This suggests we may have memory corruption via a system library (a bug in there, or in our usage of the library). Or a race condition in our code, but the one failure on get_key suggests otherwise.

solar@cfarm29:~/john/run$ while :; do ../run/john-asan --test-full=0 --format=gpg -tune=2 || break; done
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) PASS
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) PASS
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) PASS
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) PASS
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) PASS
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) PASS
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) PASS
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) PASS
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) PASS
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) FAILED (cmp_all(64))

@solardiz
Copy link
Member

solardiz commented Jun 4, 2024

This hack didn't catch anything:

+++ b/src/gpg_common_plug.c
@@ -1220,6 +1220,10 @@ int gpg_common_check(unsigned char *keydata, int ks)
        // other things (like mpz integer creation) are needed, we know that
        // our sizes will not overflow.
        out = mem_alloc((gpg_common_cur_salt->datalen) + 0x10000);
+       unsigned char *out_last = &out[gpg_common_cur_salt->datalen + 0x10000 - 1];
+       *out_last = 'c';
+#undef MEM_FREE
+#define MEM_FREE(out) { assert(*out_last == 'c'); free(out); out = NULL; }
        // Quick Hack
        if (!gpg_common_cur_salt->symmetric_mode)
                memcpy(ivec, gpg_common_cur_salt->iv, gpg_common_blockSize(gpg_common_cur_salt->cipher_algorithm));

@solardiz
Copy link
Member

solardiz commented Jun 4, 2024

In fact, even this works:

+++ b/src/gpg_common_plug.c
@@ -1220,6 +1220,10 @@ int gpg_common_check(unsigned char *keydata, int ks)
        // other things (like mpz integer creation) are needed, we know that
        // our sizes will not overflow.
        out = mem_alloc((gpg_common_cur_salt->datalen) + 0x10000);
+       unsigned char *out_last = &out[gpg_common_cur_salt->datalen];
+       *out_last = 'c';
+#undef MEM_FREE
+#define MEM_FREE(out) { assert(*out_last == 'c'); free(out); out = NULL; }

So the extra 0x10000 looks unneeded (and in general it was a bad idea to be allocating/freeing memory during cracking).

@solardiz
Copy link
Member

solardiz commented Jun 4, 2024

So the extra 0x10000 looks unneeded (and in general it was a bad idea to be allocating/freeing memory during cracking).

This and many other over-allocations were added by JimF in 7147798 in 2016, with commit title "gpg: bugs found in -test-full=0 using ASAN".

@solardiz
Copy link
Member

solardiz commented Jun 4, 2024

Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) FAILED (cmp_all(64))

This triggers the same for -cost=0,-8,-7 vs. -cost=0,8,7, so the problem doesn't look specific to (usage of) any one S2K, hash, nor cipher.

Edit: perhaps I am wrong - I actually don't know if -cost affects our self-tests, or only later benchmark/usage. If so, that wasn't a valid test.

@solardiz
Copy link
Member

solardiz commented Jun 4, 2024

perhaps I am wrong - I actually don't know if -cost affects our self-tests, or only later benchmark/usage. If so, that wasn't a valid test.

So I went to patch the test vectors in the source. With only the symmetric tests left (last 7 test vectors), the testing is really quick and it didn't fail for me. With the opposite change (removing these symmetric tests), after a while I got:

Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) FAILED (get_key(57))

which is the second time I see this format fail on get_key. The build was with ASan, which didn't trigger. So this confirms memory corruption via library.

@solardiz
Copy link
Member

solardiz commented Jun 4, 2024

The two gpg --homedir . --s2k-cipher-algo 3des --simple-sk-checksum --gen-key test vectors appear to be both required and sufficient to reproduce the issue.

@solardiz
Copy link
Member

solardiz commented Jun 4, 2024

So the extra 0x10000 looks unneeded (and in general it was a bad idea to be allocating/freeing memory during cracking).

This and many other over-allocations were added by JimF in 7147798 in 2016, with commit title "gpg: bugs found in -test-full=0 using ASAN".

OK, this over-allocation was in fact needed - we have major over-reads (being reads, they were not caught by my canary). About the smallest that works is 0x2700 extra bytes. I wondered if these over-read of uninitialized memory could result in the cmp_all failure mode, but initializing this memory to 0 didn't appear to make a difference.

@solardiz
Copy link
Member

solardiz commented Jun 4, 2024

I didn't look into this further yet, but just to write down my understanding from earlier today: when trying usually wrong passwords, we end up decrypting what isn't a valid key. Yet it has embedded length fields in it. Those may end up indicating bignums of up to 64 kbit each. This is why the over-reads. Then we pass those unusually huge bignums into OpenSSL routines, thereby torturing them (and wasting lots of CPU cycles, too, which I guess we'd rather avoid by detecting the over-reads instead). We also ask BN_bin2bn to do its own memory allocation for the output bignum - well, I'm not sure it's able to calculate its size correctly for an unusual input like this - will need to review/test OpenSSL code for that. This could very well be the issue.

@solardiz solardiz added this to the Definitely 2.0.0 milestone Jun 5, 2024
@solardiz
Copy link
Member

solardiz commented Jun 7, 2024

we pass those unusually huge bignums into OpenSSL routines, thereby torturing them (and wasting lots of CPU cycles, too, which I guess we'd rather avoid by detecting the over-reads instead). We also ask BN_bin2bn to do its own memory allocation for the output bignum - well, I'm not sure it's able to calculate its size correctly for an unusual input like this - will need to review/test OpenSSL code for that.

I reviewed current OpenSSL code - looks correct even for such inputs.

Our issue appears to be different. We have larger p and q fields in the salt struct to account for the up to 64kbit values, but our d field is tiny, yet we also write an up to 64kbit bignum in there (which we actually just skip).

I wonder why ASan does not trigger. The struct ends in unsigned char data[1]; which is idiomatic for a dynamically-allocated struct size. Are these maybe excluded from ASan protection?

Anyway, patching the code to skip this field without writing it to d makes the issue go away in my testing.

solardiz added a commit to solardiz/john that referenced this issue Jun 7, 2024
Importantly, such copying frequently overflowed the d[] array with a value
that we skip anyway.

Fixes openwall#3543
solardiz added a commit to solardiz/john that referenced this issue Jun 8, 2024
Importantly, such copying frequently overflowed the d[] array with a value
that we skip anyway.

Fixes openwall#3543
@solardiz
Copy link
Member

solardiz commented Jun 8, 2024

I wonder why ASan does not trigger. The struct ends in unsigned char data[1]; which is idiomatic for a dynamically-allocated struct size. Are these maybe excluded from ASan protection?

That's not how ASan works, so was probably a wrong guess. Another probably wrong guess is the overwrite is somehow contained to the struct (which is valid behavior in C), but results in us making incorrect calls into OpenSSL later (because of the overwritten salt data, which maybe wouldn't have passed our valid). I do not have a really good idea as to what was going on, but I think the issue is now fixed - at least I cannot reproduce it anymore after my fixes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants