-
-
Notifications
You must be signed in to change notification settings - Fork 218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use natural keys for foreign key relationships during serialization/deserialization #33999
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any equivalent change that needs to be made to the load code, to make it know to interpret these as the "natural" key?
(My two comments are not blocking—I agree that this won't break anything that currently works)
Danny asked a couple of important questions that have led to some more discoveries.
Through testing this and the behavior I observed, my understanding was that this change would only result in a different output for the SQLUserData model, and not for Form and Case related models. My unjustified rationale was that the foreign key fields for Form and Case related models didn't reference the primary key directly, and that this somehow impacted how they were serialized, whereas the SQLUserData foreign key field to auth.User depended on the primary key. However this is not the case. I think this behavior is due to how our
So the All that to say, I don't foresee any immediate issues with this change, but we are not aligned with Django's recommended approach to using natural keys which could lead to unknown issues in future Django upgrades. Perhaps it is worth aligning with Django's recommendation just to be safe. |
Actually, it looks like we do return a tuple for LedgerTransaction, which means as it stands right now, data generated with this change would not be successfully imported. I'm leaning towards doing what I said above, which will entail returning tuples for all |
This updated any natural_key methods to ensure tuples were returned which ensures the serialized keys will be usable in get_by_natural_key.
This goes against Django's recommendation and has the potential to change behavior in future Django upgrades. However it is worth doing to both accommodate our custom SQL load code, and to avoid needing to do DB lookups for cases and forms for every object that foreign keys to either.
More updates! I ended up reverting back to returning the case_id and form_id for CommCareCase and XFormInstance instead of a tuple in 0d29b4e. After looking closer at Django's deserializer code for foreign keys with Daniel, the implications of returning a tuple as the natural key for CommCareCase and XFormInstance became more clear. Given that code, we would be incurring an additional DB lookup for a case or form when loading in large tables like CaseTransaction. There doesn't seem to be any reason we need to return tuples at the current time, but as I mentioned in comments in that change, Django upgrades could potentially break this in the future. So the remaining piece was to ensure foreign key fields for both iterable natural keys and non-iterable natural keys are loaded correctly. I added tests in 65cdf6c testing both of those paths which confirm data is loaded as expected. I figured I could use a full "end to end" test to ensure SQLUserData is read from our DB, serialized, deserialized, and loaded as expected so added that in ce3cb54. |
Includes some general cleanup as well
Product Description
Technical Summary
I added what is hopefully a thorough description of this change in the comment for the dump method, but I'll rephrase that here and provide some additional proof for why this is necessary.
When loading domain data into a fresh environment from a file generated by the existing
dump_domain_data
command, we hit a ForeignKey violation when loading SQLUserData, which foreign keys to the Django auth.User object via the primary key.The problem is that when we serialize our dumped data, we set
use_natural_primary_keys=True
to use natural keys for our sharded objects (the only place we definednatural_key
methods in our code). However the auth.User defines anatural_key
method, which ensures that the primary key is not included when serializing this object (docs here). Since SQLUserData foreign keys to this object, and we haveuse_natural_foreign_keys=False
in our serializer, the foreign key relationship is serialized using the auth.User's primary key, instead of its natural key.Setting
use_natural_foreign_keys=True
ensures that the serialized SQLUserData will reference auth.User's natural key. Also have glossed over it, but auth.User also defines aget_by_natural_key
method which is used when deserializing data, enabling SQLUserData to foreign key to the correct object at that point.Here's an example of what a serialized SQLUserData object looks like before and after this change, looking most closely at the value for the "django_user" field:
Before
After
The reason this came up recently is because SQLUserData is the first object we include in dumped data that foreign keys to auth.User. All other objects that foreign key to auth.User are excluded from the dump.
Feature Flag
Safety Assurance
Safety story
This code is run manually by developers and does not have any direct user facing impact.
I confirmed the following:
natural_key
is only used on sharded objects in our codeget_by_natural_key
is not used anywhere in our codejThe worst case is that this change corrupts the load_domain_data command, which is already broken because of this issue so 🤷🏻.
Automated test coverage
This area doesn't have the greatest test coverage. It is a bit tricky to ensure that all models are dumped and loaded properly, but I intend to spend time improving the test suite here. When debugging this specific issue, I used some simple unit tests, but those didn't feel worth committing as they really just verify django serialization behavior as it is documented.
QA Plan
N/A
Rollback instructions
Labels & Review