Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EmailSettings password encryption improvement #35403

Merged
merged 15 commits into from
Dec 9, 2024

Conversation

Jtang-1
Copy link
Contributor

@Jtang-1 Jtang-1 commented Nov 19, 2024

Product Description

Technical Summary

Ticket: https://dimagi.atlassian.net/browse/SAAS-16191

Introduces a solution for these code scan alerts:
https://github.com/dimagi/commcare-hq/security/code-scanning/295
https://github.com/dimagi/commcare-hq/security/code-scanning/296

CodeQL recommends using CBC cipher mode instead of ECB. From the doc:

ECB should not be used as a mode for encryption as it has dangerous weaknesses. Data is encrypted the same way every time, which means that the same plaintext input will always produce the same ciphertext. This behavior makes messages encrypted with ECB more vulnerable to replay attacks.

This PR introduces a encryption and decryption function that uses ECB as well as a helper function to convert existing ECB encrypted strings to a CBC encrypted string.

Many models store either a password, API key, etc. that is encrypted with ECB. Those fields will need to be migrated to be CBC encrypted. This PR begins the migration process for only EmailSettings model.

Feature Flag

Not specific to a feature flag but the model effected is only relevant for this feature flag https://www.commcarehq.org/hq/flags/edit/custom_email_gateway/

Safety Assurance

Safety story

Locally tested

Automated test coverage

Added tests that the encryption and decryption with CBC results in the expected plaintext.
Also tests that the function for reencryption from ECB to CBC on the EmailSettings model existing password field results in password being overwritten with the same plaintext password but cbc encrypted.

QA Plan

no QA

This is the first phase which includes:

  • Support read from both CBC and ECB decryption of the password field (depending on the prefix)
  • Write to password only with CBC encryption

Following PRs will:
1.

  • Re-encrypting password field to CBC mode (and include aes-cbc prefix) and overwriting the ECB encrypted password
  • Update: support read from only CBC decryption

Rollback instructions

  • This PR can be reverted after deploy with no further considerations

Labels & Review

  • Risk label is set correctly
  • The set of people pinged as reviewers is appropriate for the level of risk of the change

@dimagimon dimagimon added the reindex/migration Reindex or migration will be required during or before deploy label Nov 19, 2024
Copy link

sentry-io bot commented Nov 19, 2024

🔍 Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

📄 File: corehq/motech/utils.py

Function Unhandled Issue
b64_aes_decrypt UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb1 in position 0: invalid start byte ...
Event Count: 1

Did you find this useful? React with a 👍 or 👎

@Jtang-1 Jtang-1 requested a review from jingcheng16 November 19, 2024 21:21
jingcheng16 and others added 4 commits November 19, 2024 15:08
removed test for encrypting since it's already covered by test_encrypt_decrypt_cbc_ascii
and there's no way to mock the randomized "iv" value via doctest.testmod
@Jtang-1 Jtang-1 force-pushed the jt+jc/encryption_improvement branch from d92e102 to 0820feb Compare November 19, 2024 23:11
@Jtang-1 Jtang-1 marked this pull request as ready for review November 19, 2024 23:12
@Jtang-1 Jtang-1 requested a review from kaapstorm as a code owner November 19, 2024 23:12
@Jtang-1 Jtang-1 mentioned this pull request Dec 4, 2024
4 tasks


class EmailSettings(models.Model):
domain = models.CharField(max_length=255, unique=True)
username = models.CharField(max_length=255)
password = models.CharField(max_length=255)
password = models.CharField(max_length=255) # This will be replaced by password_cbc after migration is done
password_cbc = models.CharField(max_length=255)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of creating a new field, just use a different prefix.

Define a new prefix, say,

# corehq/motech/const.py
ALGO_AES_CBC = 'aes-cbc'

And then, you can use it to encrypt and decrypt:

    @property
    def plaintext_password(self):
        if self.password.startswith(f'${ALGO_AES_CBC}$'):
            ciphertext = self.password.split('$', 2)[2]
            return b64_aes_cbc_decrypt(ciphertext)
        if self.password.startswith(f'${ALGO_AES}$'):
            ciphertext = self.password.split('$', 2)[2]
            return b64_aes_decrypt(ciphertext)
        return self.password

    @plaintext_password.setter
    def plaintext_password(self, plaintext):
        if plaintext != PASSWORD_PLACEHOLDER:
            ciphertext = b64_aes_cbc_encrypt(plaintext)
            self.password = f'${ALGO_AES_CBC}${ciphertext}'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion - this way is cleaner. ac20dac

Jing and I did consider this but decided to create another field so that we're not overriding existing data just to make the transition a bit safer. But looking at it again, it's pretty straightforward so i'm comfortable doing it this way.

Just curious - is the main benefit of this strategy to reduce overhead of creating a new field and eventually having to delete the original? Or is there anything else additional?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main benefit is simplicity of implementation and rollout. It's much easier to roll out if there are no database columns to be added/removed.

@@ -29,8 +36,21 @@ def plaintext_password(self):
return b64_aes_decrypt(ciphertext)
return self.password

@property
def plaintext_password_cbc(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name is confusing to me. Is the return value plaintext or CBC-encrpyted? I think it can be removed if Norman's suggestion of using an aes-cbc prefix is accepted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair point, I ended up removing it in favor of Norman's suggestion.

@@ -12,8 +12,9 @@ def copy_and_reencrypt_password_to_password_cbc(apps, schema_editor):
EmailSettings = apps.get_model('email', 'EmailSettings')

for email_settings in EmailSettings.objects.all():
email_settings.password = reencrypt_ecb_to_cbc_mode(email_settings.password, f'${ALGO_AES}$')
email_settings.save()
if not email_settings.password.startswith(f'${ALGO_AES_CBC}$'):
Copy link
Contributor

@millerdev millerdev Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This filter could be applied to the database query.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's better 1601646

Comment on lines 14 to 17
for email_settings in EmailSettings.objects.all():
email_settings.password = reencrypt_ecb_to_cbc_mode(email_settings.password, f'${ALGO_AES}$')
email_settings.save()
if not email_settings.password.startswith(f'${ALGO_AES_CBC}$'):
email_settings.password = reencrypt_ecb_to_cbc_mode(email_settings.password, f'${ALGO_AES}$')
email_settings.save()
Copy link
Contributor

@millerdev millerdev Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will load all rows into memory at once, and then save each one individually, which has two potential problems:

  • May use a lot of memory.
  • Will take a long time because many individual updates are much less efficient than bulk updates.

What do you think of reimplementing this to load rows in batches and update each batch with a single batch update query?

Edit: for batch loading look at paginate_query, and for bulk updates look at bulk_update().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea but I don't think is necessary in this case. Jing and I checked and (I forgot the exact number) there's around 10 rows in that table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually didn't know bulk_update existed, that's really handy. thanks for sharing.

@dimagimon dimagimon removed the reindex/migration Reindex or migration will be required during or before deploy label Dec 9, 2024
@Jtang-1 Jtang-1 changed the title Jt+jc/encryption improvement EmailSettings password encryption improvement Dec 9, 2024
@@ -149,6 +149,23 @@ def reencrypt_ecb_to_cbc_mode(encrypted_text, existing_prefix=None):
return f'${ALGO_AES_CBC}${new_ciphertext}'


# Only needed for migration revert from CBC to ECB mode.
def reencrypt_cbc_to_ecb_mode(encrypted_text, existing_prefix=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is only referenced in the migration, what do you think of moving it into the migration file? (And removing it from this PR since the migration has been removed.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to that, but this same function will be needed for migration of other models (in future PRs). Does it make sense to copy this same function into each migration file?

Copy link
Contributor

@millerdev millerdev Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so. Migration files should avoid importing things from other modules as much as possible because they should be frozen-in-time rather than depending on things that may be refactored and change in other ways over time. Granted, this function is probably unlikely to change over time, but I still think it's safer to include it in each migration file that needs it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, how would you advise i unit test the function in that case? I don't think i can import functions defined within a migration file

Comment on lines 43 to 46
def test_empty_password_reencrypt_cbc_to_ecb_mode_match_plaintext(self):
plaintext_password = ''
self.email_settings.password = plaintext_password
reencrypted_password = reencrypt_ecb_to_cbc_mode(self.email_settings.password, f'${ALGO_AES}$')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this test says cbc_to_ecb, but it uses the reencrypt_ecb_to_cbc_mode function. Is that intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops typo, fixed 0a94aa1

Comment on lines 45 to 48
self.email_settings.password = plaintext_password
reencrypted_password = reencrypt_ecb_to_cbc_mode(self.email_settings.password, f'${ALGO_AES}$')
self.email_settings.password = reencrypted_password
self.assertEqual(plaintext_password, reencrypted_password)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why does this test need to interact with self.email_settings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was intended to assert against self.email_settings.plaintext_password, fixed b260ec9

@Jtang-1 Jtang-1 merged commit 8a72efa into master Dec 9, 2024
13 checks passed
@Jtang-1 Jtang-1 deleted the jt+jc/encryption_improvement branch December 9, 2024 21:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants