Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

V1.3.3.013 Add type_id and rank to all contact and project linker tables #144

Merged
merged 8 commits into from
Jun 24, 2024

Conversation

laceysanderson
Copy link
Contributor

@laceysanderson laceysanderson commented May 6, 2024

Issue #140

Description

This PR adds a nullable type_id and rank column to all linker tables involving the project table and contact table. The unique constraint is added via a Postgresql procedure in order to use the UNIQUE NULLS NOT DISTINCT provided by PostgreSQL 15+ if available. It is backwards compatible because these columns are nullable.

Tables updated:

  • _contact linkers: feature_contact, featuremap_contact, library_contact, nd_experiment_contact, project_contact, pubauthor_contact.
  • _project linkers: project_feature, project_pub, nd_experiment_project, project_analysis, project_stock, assay_project, project_dbxref, biomaterial_project

Testing

Ensure the automated tests pass. This shows that this migration applies without error to chado across multiple PostgreSQL versions and both when chado is in the public schema and when it is in a named schema (i.e. teacup).

The automated testing also shows that it applies cleanly before PostgreSQL 15 and after. This shows the procedure that applies the unique constraint is not causing errors.

Manual Testing

Before PostgreSQL 15

Create a docker image/container locally that uses PostgreSQL 12.

git clone https://github.com/GMOD/Chado chado-pr144
cd chado-pr144
git checkout 1.4-1.3.3.013-add-type_id-linkers
docker build --tag gmod/chado:pgsql12 --file docker/Dockerfile --build-arg PGSQL_VERSION=12 ./
docker run -it --rm --volume=$(pwd):/Chado gmod/chado:pgsql12

This will open a bash session inside the new container. Use flyway migrate to apply the current migrations. Note: This will apply all previous approved migrations plus the one in this PR. Ensure no errors happen and it applies successfully.

Use psql -U postgres -h localhost -d chado with the password chadotest and query the various changed table specs to confirm they are updated as expected. For example,

root@366f9b002abd:/Chado# psql -U postgres -h localhost -d chado
Password for user postgres:
psql (12.19 (Ubuntu 12.19-1.pgdg24.04+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.

chado=# \d+ feature_contact
Table "public.feature_contact"

Column Type Collation Nullable Default Storage Stats target Description
feature_contact_id bigint not null nextval('feature_contact_feature_contact_id_seq'::regclass) plain
feature_id bigint not null plain
contact_id bigint not null plain
type_id bigint plain
rank integer 0 plain

Indexes:
"feature_contact_pkey" PRIMARY KEY, btree (feature_contact_id)
"feature_contact_c1" UNIQUE CONSTRAINT, btree (feature_id, contact_id, type_id)
"feature_contact_idx1" btree (feature_id)
"feature_contact_idx2" btree (contact_id)

Foreign-key constraints:
"feature_contact_contact_id_fkey" FOREIGN KEY (contact_id) REFERENCES contact(contact_id) ON DELETE CASCADE
"feature_contact_feature_id_fkey" FOREIGN KEY (feature_id) REFERENCES feature(feature_id) ON DELETE CASCADE
"feature_contact_type_id_fkey" FOREIGN KEY (type_id) REFERENCES cvterm(cvterm_id) ON DELETE SET NULL
Access method: heap

Note: Before PostgreSQL 15 you should expect to see the unique constraint to look like this:

"feature_contact_c1" UNIQUE CONSTRAINT, btree (feature_id, contact_id, type_id)

After PostgreSQL 15

Create a docker image/container locally that uses PostgreSQL 12.

git clone https://github.com/GMOD/Chado chado-pr144
cd chado-pr144
git checkout 1.4-1.3.3.013-add-type_id-linkers
docker build --tag gmod/chado:pgsql16 --file docker/Dockerfile --build-arg PGSQL_VERSION=16 ./
docker run -it --rm --volume=$(pwd):/Chado gmod/chado:pgsql16

This will open a bash session inside the new container. Use flyway migrate to apply the current migrations. Note: This will apply all previous approved migrations plus the one in this PR. Ensure no errors happen and it applies successfully.

Use psql -U postgres -h localhost -d chado with the password chadotest and query the various changed table specs to confirm they are updated as expected. For example,

root@859898a8b4ac:/Chado# psql -U postgres -h localhost -d chado
Password for user postgres:
psql (16.3 (Ubuntu 16.3-1.pgdg24.04+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

chado=# \d+ feature_contact
Table "public.feature_contact"

Column Type Collation Nullable Default Storage Compression Stats target Description
feature_contact_id bigint not null nextval('feature_contact_feature_contact_id_seq'::regclass) plain
feature_id bigint not null plain
contact_id bigint not null plain
type_id bigint plain
rank integer 0 plain

Indexes:
"feature_contact_pkey" PRIMARY KEY, btree (feature_contact_id)
"feature_contact_c1" UNIQUE CONSTRAINT, btree (feature_id, contact_id, type_id) NULLS NOT DISTINCT
"feature_contact_idx1" btree (feature_id)
"feature_contact_idx2" btree (contact_id)

Foreign-key constraints:
"feature_contact_contact_id_fkey" FOREIGN KEY (contact_id) REFERENCES contact(contact_id) ON DELETE CASCADE
"feature_contact_feature_id_fkey" FOREIGN KEY (feature_id) REFERENCES feature(feature_id) ON DELETE CASCADE
"feature_contact_type_id_fkey" FOREIGN KEY (type_id) REFERENCES cvterm(cvterm_id) ON DELETE SET NULL
Access method: heap

Note: After PostgreSQL 16 you should expect to see the unique constraint to look like this:

"feature_contact_c1" UNIQUE CONSTRAINT, btree (feature_id, contact_id, type_id) NULLS NOT DISTINCT

@laceysanderson laceysanderson changed the title V1.3.3.013 Add type_id and rank to all linker tables V1.3.3.013 Add type_id and rank to all contact and project linker tables May 31, 2024
@laceysanderson laceysanderson marked this pull request as ready for review May 31, 2024 23:21
@spficklin
Copy link
Contributor

Hi @laceysanderson all flyway tests passed for me. I used this PR to upgrade Chado in a test Tripal site and then added content to many of the content linker tables. All seemd to work fine. I left a comment above out the unique constraints. I understand the reasoning behind having a type_id in the contact linker tables that specifies the role the person has. I also understnd the purpose of rank so you can order those contacts. But I don't undrestand why the other linkers need a type_id and rank? Can you give an example for the project_feature linker table. Is there an issue this PR is responding to that has more details?

@laceysanderson
Copy link
Contributor Author

laceysanderson commented Jun 14, 2024

Thanks for the review @spficklin! My intent with this PR and the original issue was to bring all the linker tables into a consistent state. That said, some more examples of why this is needed would probably have been good to include! Justifying the type for all linker tables is really easy IMO and I'll include a number of examples below. The rank makes sense for ordering as you had mentioned but now that you brought up the comment above I'm not convinced it needs to be in the unique constraint 🤔

Why include a type_id in project linker tables?

IMO it is always important to indicate the type of linkage between two things. For projects, this is often in the form of describing if the project produced the item, used an existing item, referenced an item, improved upon an item, etc. Here are some specific examples:

project_feature

We often connect features with projects for multiple different reasons and it would be good to have a clear way to indicate this. For example, it would be really helpful to be able to specify if a genetic marker or gene is connected to a project because it was (a) generated by that project or (b) reused from a previous project or (c) referenced to provide context? If this is a project looking for the genetic control of flowering colour, then was the gene attached because it has been implicated in previous literature, because it showed up within a QTL region in this analysis or because it may be a homolog of a known flower colour gene in another species?

project_analysis

Picture a genome assembly project. It may be linked to an analysis describing how the scaffolds were generated, and another one describing the annotation process. You may want to use the project_analysis.type_id to indicate genome assembly or annotation. Alternatively, you may want to indicate genome assembly or annotation as the analysis type and use the project_analysis.type_id to indicate that this project funded/did both analyses. Then later another researcher might do some work using this genome version that indicates an issue in the assembly process. We may want to link this new analysis to the genome assembly project to provide warning to anyone using this version.

@spficklin
Copy link
Contributor

spficklin commented Jun 14, 2024

Okay, I understand and that makes sense. Thanks for the explanation. So, it's really meant to explain the relationship between the two records.

@laceysanderson
Copy link
Contributor Author

Yes, exactly :-)

@laceysanderson
Copy link
Contributor Author

laceysanderson commented Jun 17, 2024

This is marked as a medium change since it is backwards compatible and simply adds columns to existing chado tables. As such it required 4+ approvals (2+ from PMC members and 1+ non-Tripal reviews).

@spficklin
Copy link
Contributor

Just to update the conversation thread. @laceysanderson and I had a conversation and thoguht it would be best to remove rank from the unique contratint.

There is a potential case where we might want rank in the unique constraint. For example, if we had a property table with details about a linking record (e.g. project_contactprop) then someone could add additional details to annotate each relationship that might be the same but have different properties. Perhaps a person performs the same action multiple times (and hence deserves a contact) but then has different properties we need to record about each time.

However, no one is asking for that flexibility at the moment and we felt backwards compatiblity would be better served if we removed rank now and added it back in later if there was enough clamour for it. Adding it back in is backwards compatible whereas taking it out later is not.

Thanks @laceysanderson for making the change. I approve the PR.

@nmenda nmenda assigned nmenda and unassigned nmenda Jun 18, 2024
@nmenda nmenda self-requested a review June 18, 2024 20:24
Copy link
Member

@nmenda nmenda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@scottcain scottcain self-requested a review June 20, 2024 16:46
Copy link
Member

@scottcain scottcain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I very much like the idea of bringing all of linker tables into consistency; it's much better that they all have the these additional columns, and I appreciate parallel construction.

Copy link

@dsenalik dsenalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Tested both dockers and flyway migrations. Additional columns not present before, but present after migration
  • I thought I would actually test the constraint with both Postgres versions.
    As expected, I can create a duplicate entry in v13
chado=# insert into project (name) values ('testproject');
INSERT 0 1
chado=# insert into contact (name) values ('testcontact');
INSERT 0 1
chado=# insert into project_contact (project_id, contact_id, rank) values (1, 1, 0);
INSERT 0 1
chado=# insert into project_contact (project_id, contact_id, rank) values (1, 1, 0);
INSERT 0 1
chado=# select * from project_contact;
 project_contact_id | project_id | contact_id | type_id | rank 
--------------------+------------+------------+---------+------
                  1 |          1 |          1 |         |    0
                  2 |          1 |          1 |         |    0
(2 rows)

And as expected, with v16 the constraint is now in effect for NULLs, and I cannot add duplicate records.

...
chado=# insert into project_contact (project_id, contact_id, rank) values (1, 1, 0);
INSERT 0 1
chado=# insert into project_contact (project_id, contact_id, rank) values (1, 1, 0);
ERROR:  duplicate key value violates unique constraint "project_contact_c1"
DETAIL:  Key (project_id, contact_id, type_id)=(1, 1, null) already exists.

@laceysanderson
Copy link
Contributor Author

Thank you everyone for the reviews! That's four with two non-Tripal so according to the guidelines I can now merge this 🥰 🎉

@laceysanderson laceysanderson merged commit ef04588 into 1.4 Jun 24, 2024
11 checks passed
@laceysanderson laceysanderson deleted the 1.4-1.3.3.013-add-type_id-linkers branch June 24, 2024 19:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants