V1.3.3.013 Add type_id and rank to all contact and project linker tables #144
chado/migrations/V1.3.3.013__add_type_id_to_contact_and_project_linkers.sql
Hi @laceysanderson all Flyway tests passed for me. I used this PR to upgrade Chado in a test Tripal site and then added content to many of the content linker tables. All seemed to work fine. I left a comment above about the unique constraints. I understand the reasoning behind having a |
Thanks for the review @spficklin! My intent with this PR and the original issue was to bring all the linker tables into a consistent state. That said, some more examples of why this is needed would probably have been good to include! Justifying the type for all linker tables is really easy IMO and I'll include a number of examples below. The rank makes sense for ordering as you had mentioned but now that you brought up the comment above I'm not convinced it needs to be in the unique constraint 🤔

Why include a type_id in project linker tables?
IMO it is always important to indicate the type of linkage between two things. For projects, this is often in the form of describing if the project produced the item, used an existing item, referenced an item, improved upon an item, etc. Here are some specific examples:

project_feature
We often connect features with projects for multiple different reasons and it would be good to have a clear way to indicate this. For example, it would be really helpful to be able to specify if a genetic marker or gene is connected to a project because it was (a) generated by that project or (b) reused from a previous project or (c) referenced to provide context? If this is a project looking for the genetic control of flowering colour, then was the gene attached because it has been implicated in previous literature, because it showed up within a QTL region in this analysis or because it may be a homolog of a known flower colour gene in another species?

project_analysis
Picture a genome assembly project. It may be linked to an analysis describing how the scaffolds were generated, and another one describing the annotation process. You may want to use the project_analysis.type_id to indicate genome assembly or annotation. Alternatively, you may want to indicate genome assembly or annotation as the analysis type and use the project_analysis.type_id to indicate that this project funded/did both analyses.
Then later another researcher might do some work using this genome version that indicates an issue in the assembly process. We may want to link this new analysis to the genome assembly project to provide warning to anyone using this version. |
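To make the project_feature example concrete, here is a minimal sketch of a query that lists the features linked to one project together with the type of each linkage. It assumes the type_id and rank columns added by this PR and the standard Chado project, feature, and cvterm tables. The project name is hypothetical, and which cvterms a site uses to describe linkages is left open.

```sql
-- Sketch only: list each feature linked to one project, with the linkage type.
-- type_id is nullable, so a LEFT JOIN keeps links that have no type assigned.
SELECT f.uniquename AS feature,
       t.name       AS linkage_type,
       pf.rank
FROM project_feature pf
JOIN project p ON p.project_id = pf.project_id
JOIN feature f ON f.feature_id = pf.feature_id
LEFT JOIN cvterm t ON t.cvterm_id = pf.type_id
WHERE p.name = 'Flower colour genetics'  -- hypothetical project name
ORDER BY t.name, pf.rank;
```

With typed links, the same feature can appear more than once for a project, once per reason it is connected.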
Okay, I understand and that makes sense. Thanks for the explanation. So, it's really meant to explain the relationship between the two records. |
Yes, exactly :-) |
This is marked as a medium change since it is backwards compatible and simply adds columns to existing chado tables. As such it required 4+ approvals (2+ from PMC members and 1+ non-Tripal reviews). |
Just to update the conversation thread. @laceysanderson and I had a conversation and thought it would be best to remove

There is a potential case where we might want rank in the unique constraint. For example, if we had a property table with details about a linking record (e.g. project_contactprop) then someone could add additional details to annotate each relationship that might be the same but have different properties. Perhaps a person performs the same action multiple times (and hence deserves a contact) but then has different properties we need to record about each time. However, no one is asking for that flexibility at the moment and we felt backwards compatibility would be better served if we removed

Thanks @laceysanderson for making the change. I approve the PR. |
Looks good to me!
I very much like the idea of bringing all of the linker tables into consistency; it's much better that they all have these additional columns, and I appreciate parallel construction.
- Tested both dockers and flyway migrations. Additional columns not present before, but present after migration
- I thought I would actually test the constraint with both Postgres versions.
As expected, I can create a duplicate entry in v13
chado=# insert into project (name) values ('testproject');
INSERT 0 1
chado=# insert into contact (name) values ('testcontact');
INSERT 0 1
chado=# insert into project_contact (project_id, contact_id, rank) values (1, 1, 0);
INSERT 0 1
chado=# insert into project_contact (project_id, contact_id, rank) values (1, 1, 0);
INSERT 0 1
chado=# select * from project_contact;
 project_contact_id | project_id | contact_id | type_id | rank
--------------------+------------+------------+---------+------
                  1 |          1 |          1 |         |    0
                  2 |          1 |          1 |         |    0
(2 rows)
And as expected, with v16 the constraint is now in effect for NULLs, and I cannot add duplicate records.
...
chado=# insert into project_contact (project_id, contact_id, rank) values (1, 1, 0);
INSERT 0 1
chado=# insert into project_contact (project_id, contact_id, rank) values (1, 1, 0);
ERROR: duplicate key value violates unique constraint "project_contact_c1"
DETAIL: Key (project_id, contact_id, type_id)=(1, 1, null) already exists.
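The difference between the two sessions above comes down to how a unique constraint treats NULL. As a minimal sketch on scratch tables (the table and column names here are hypothetical and not part of Chado), the following can be run on PostgreSQL 15 or later to see both behaviours side by side:

```sql
-- Sketch only: scratch tables, not part of Chado. Requires PostgreSQL 15+.

-- Default behaviour: NULLs are treated as distinct from each other.
CREATE TEMP TABLE link_distinct (
  a_id    int NOT NULL,
  b_id    int NOT NULL,
  type_id int,
  UNIQUE (a_id, b_id, type_id)
);
INSERT INTO link_distinct (a_id, b_id) VALUES (1, 1);
-- Succeeds: the two NULL type_id values do not count as equal.
INSERT INTO link_distinct (a_id, b_id) VALUES (1, 1);

-- PostgreSQL 15+ option: NULLs are treated as equal for uniqueness.
CREATE TEMP TABLE link_not_distinct (
  a_id    int NOT NULL,
  b_id    int NOT NULL,
  type_id int,
  UNIQUE NULLS NOT DISTINCT (a_id, b_id, type_id)
);
INSERT INTO link_not_distinct (a_id, b_id) VALUES (1, 1);
-- Fails with a unique violation: (1, 1, NULL) already exists.
INSERT INTO link_not_distinct (a_id, b_id) VALUES (1, 1);
```

On versions before 15 only the first form is available, which is why the duplicate row with a NULL type_id is accepted there.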
Thank you everyone for the reviews! That's four with two non-Tripal so according to the guidelines I can now merge this 🥰 🎉 |
Issue #140
Description
This PR adds a nullable type_id and rank column to all linker tables involving the project table and contact table. The unique constraint is added via a PostgreSQL procedure in order to use the UNIQUE NULLS NOT DISTINCT option provided by PostgreSQL 15+ if available. It is backwards compatible because these columns are nullable.
Tables updated:
Testing
Ensure the automated tests pass. This shows that this migration applies without error to chado across multiple PostgreSQL versions and both when chado is in the public schema and when it is in a named schema (i.e. teacup).
The automated testing also shows that it applies cleanly before PostgreSQL 15 and after. This shows the procedure that applies the unique constraint is not causing errors.
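For readers who want a picture of what a version-dependent constraint can look like, here is a minimal sketch. It is not the code from the migration in this PR; it is one possible approach, written as an anonymous DO block for project_contact only. The constraint name and column list are taken from the error message quoted in the review above, and the assumption that an older constraint of the same name may need to be dropped first is mine.

```sql
-- Sketch only, not the migration's actual procedure.
DO $$
BEGIN
  -- Assumption: a constraint with this name may already exist, so drop it first.
  EXECUTE 'ALTER TABLE project_contact
             DROP CONSTRAINT IF EXISTS project_contact_c1';

  -- server_version_num is an integer such as 150004 for PostgreSQL 15.4.
  IF current_setting('server_version_num')::int >= 150000 THEN
    -- Dynamic SQL keeps the newer syntax away from the parser on older servers.
    EXECUTE 'ALTER TABLE project_contact
               ADD CONSTRAINT project_contact_c1
               UNIQUE NULLS NOT DISTINCT (project_id, contact_id, type_id)';
  ELSE
    EXECUTE 'ALTER TABLE project_contact
               ADD CONSTRAINT project_contact_c1
               UNIQUE (project_id, contact_id, type_id)';
  END IF;
END
$$;
```

The statements are passed to EXECUTE as strings so that a server older than PostgreSQL 15 never has to parse the NULLS NOT DISTINCT clause in the branch it does not run.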
Manual Testing
Before PostgreSQL 15
Create a docker image/container locally that uses PostgreSQL 12.
This will open a bash session inside the new container. Use flyway migrate to apply the current migrations. Note: This will apply all previous approved migrations plus the one in this PR. Ensure no errors happen and it applies successfully.
Use psql -U postgres -h localhost -d chado with the password chadotest and query the various changed table specs to confirm they are updated as expected. For example,
Note: Before PostgreSQL 15 you should expect to see the unique constraint to look like this:
After PostgreSQL 15
Create a docker image/container locally that uses PostgreSQL 15 or later.
This will open a bash session inside the new container. Use flyway migrate to apply the current migrations. Note: This will apply all previous approved migrations plus the one in this PR. Ensure no errors happen and it applies successfully.
Use psql -U postgres -h localhost -d chado with the password chadotest and query the various changed table specs to confirm they are updated as expected. For example,
Note: With PostgreSQL 15 or later you should expect to see the unique constraint to look like this:
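The expected constraint definitions are not reproduced here. As a sketch for checking them yourself on either PostgreSQL version, the following catalog query prints the unique constraints on one of the changed tables. It uses project_contact as the example and assumes the table is reachable through the current search_path.

```sql
-- Sketch only: show the unique constraints defined on project_contact.
-- If chado lives in a named schema, qualify the name,
-- e.g. 'teacup.project_contact'::regclass.
SELECT conname,
       pg_get_constraintdef(oid) AS definition
FROM pg_constraint
WHERE conrelid = 'project_contact'::regclass
  AND contype = 'u';  -- 'u' marks unique constraints
```

The psql meta-command \d project_contact shows the same information alongside the column list, which is also a quick way to confirm that type_id and rank are present.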