V1.3.3.013 Add type_id and rank to all contact and project linker tables #144

laceysanderson · 2024-05-06T19:15:49Z

Issue #140

Description

This PR adds a nullable type_id and rank column to all linker tables involving the project table and contact table. The unique constraint is added via a Postgresql procedure in order to use the UNIQUE NULLS NOT DISTINCT provided by PostgreSQL 15+ if available. It is backwards compatible because these columns are nullable.

Tables updated:

_contact linkers: feature_contact, featuremap_contact, library_contact, nd_experiment_contact, project_contact, pubauthor_contact.
_project linkers: project_feature, project_pub, nd_experiment_project, project_analysis, project_stock, assay_project, project_dbxref, biomaterial_project

Testing

Ensure the automated tests pass. This shows that this migration applies without error to chado across multiple PostgreSQL versions and both when chado is in the public schema and when it is in a named schema (i.e. teacup).

The automated testing also shows that it applies cleanly before PostgreSQL 15 and after. This shows the procedure that applies the unique constraint is not causing errors.

Manual Testing

Before PostgreSQL 15

Create a docker image/container locally that uses PostgreSQL 12.

git clone https://github.com/GMOD/Chado chado-pr144
cd chado-pr144
git checkout 1.4-1.3.3.013-add-type_id-linkers
docker build --tag gmod/chado:pgsql12 --file docker/Dockerfile --build-arg PGSQL_VERSION=12 ./
docker run -it --rm --volume=$(pwd):/Chado gmod/chado:pgsql12

This will open a bash session inside the new container. Use flyway migrate to apply the current migrations. Note: This will apply all previous approved migrations plus the one in this PR. Ensure no errors happen and it applies successfully.

Use psql -U postgres -h localhost -d chado with the password chadotest and query the various changed table specs to confirm they are updated as expected. For example,

root@366f9b002abd:/Chado# psql -U postgres -h localhost -d chado
Password for user postgres:
psql (12.19 (Ubuntu 12.19-1.pgdg24.04+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.

chado=# \d+ feature_contact
Table "public.feature_contact"

Column Type Collation Nullable Default Storage Stats target Description

feature_contact_id bigint not null nextval('feature_contact_feature_contact_id_seq'::regclass) plain

feature_id bigint not null plain

contact_id bigint not null plain

type_id bigint plain

rank integer 0 plain

Indexes:
"feature_contact_pkey" PRIMARY KEY, btree (feature_contact_id)
"feature_contact_c1" UNIQUE CONSTRAINT, btree (feature_id, contact_id, type_id)
"feature_contact_idx1" btree (feature_id)
"feature_contact_idx2" btree (contact_id)

Foreign-key constraints:
"feature_contact_contact_id_fkey" FOREIGN KEY (contact_id) REFERENCES contact(contact_id) ON DELETE CASCADE
"feature_contact_feature_id_fkey" FOREIGN KEY (feature_id) REFERENCES feature(feature_id) ON DELETE CASCADE
"feature_contact_type_id_fkey" FOREIGN KEY (type_id) REFERENCES cvterm(cvterm_id) ON DELETE SET NULL
Access method: heap

Note: Before PostgreSQL 15 you should expect to see the unique constraint to look like this:

"feature_contact_c1" UNIQUE CONSTRAINT, btree (feature_id, contact_id, type_id)

After PostgreSQL 15

Create a docker image/container locally that uses PostgreSQL 12.

git clone https://github.com/GMOD/Chado chado-pr144
cd chado-pr144
git checkout 1.4-1.3.3.013-add-type_id-linkers
docker build --tag gmod/chado:pgsql16 --file docker/Dockerfile --build-arg PGSQL_VERSION=16 ./
docker run -it --rm --volume=$(pwd):/Chado gmod/chado:pgsql16

This will open a bash session inside the new container. Use flyway migrate to apply the current migrations. Note: This will apply all previous approved migrations plus the one in this PR. Ensure no errors happen and it applies successfully.

Use psql -U postgres -h localhost -d chado with the password chadotest and query the various changed table specs to confirm they are updated as expected. For example,

root@859898a8b4ac:/Chado# psql -U postgres -h localhost -d chado
Password for user postgres:
psql (16.3 (Ubuntu 16.3-1.pgdg24.04+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

chado=# \d+ feature_contact
Table "public.feature_contact"

Column Type Collation Nullable Default Storage Compression Stats target Description

feature_contact_id bigint not null nextval('feature_contact_feature_contact_id_seq'::regclass) plain

feature_id bigint not null plain

contact_id bigint not null plain

type_id bigint plain

rank integer 0 plain

Indexes:
"feature_contact_pkey" PRIMARY KEY, btree (feature_contact_id)
"feature_contact_c1" UNIQUE CONSTRAINT, btree (feature_id, contact_id, type_id) NULLS NOT DISTINCT
"feature_contact_idx1" btree (feature_id)
"feature_contact_idx2" btree (contact_id)

Foreign-key constraints:
"feature_contact_contact_id_fkey" FOREIGN KEY (contact_id) REFERENCES contact(contact_id) ON DELETE CASCADE
"feature_contact_feature_id_fkey" FOREIGN KEY (feature_id) REFERENCES feature(feature_id) ON DELETE CASCADE
"feature_contact_type_id_fkey" FOREIGN KEY (type_id) REFERENCES cvterm(cvterm_id) ON DELETE SET NULL
Access method: heap

Note: After PostgreSQL 16 you should expect to see the unique constraint to look like this:

"feature_contact_c1" UNIQUE CONSTRAINT, btree (feature_id, contact_id, type_id) NULLS NOT DISTINCT

chado/migrations/V1.3.3.013__add_type_id_to_contact_and_project_linkers.sql

spficklin · 2024-06-14T18:10:26Z

Hi @laceysanderson all flyway tests passed for me. I used this PR to upgrade Chado in a test Tripal site and then added content to many of the content linker tables. All seemd to work fine. I left a comment above out the unique constraints. I understand the reasoning behind having a type_id in the contact linker tables that specifies the role the person has. I also understnd the purpose of rank so you can order those contacts. But I don't undrestand why the other linkers need a type_id and rank? Can you give an example for the project_feature linker table. Is there an issue this PR is responding to that has more details?

laceysanderson · 2024-06-14T21:03:06Z

Thanks for the review @spficklin! My intent with this PR and the original issue was to bring all the linker tables into a consistent state. That said, some more examples of why this is needed would probably have been good to include! Justifying the type for all linker tables is really easy IMO and I'll include a number of examples below. The rank makes sense for ordering as you had mentioned but now that you brought up the comment above I'm not convinced it needs to be in the unique constraint 🤔

Why include a type_id in project linker tables?

IMO it is always important to indicate the type of linkage between two things. For projects, this is often in the form of describing if the project produced the item, used an existing item, referenced an item, improved upon an item, etc. Here are some specific examples:

project_feature

We often connect features with projects for multiple different reasons and it would be good to have a clear way to indicate this. For example, it would be really helpful to be able to specify if a genetic marker or gene is connected to a project because it was (a) generated by that project or (b) reused from a previous project or (c) referenced to provide context? If this is a project looking for the genetic control of flowering colour, then was the gene attached because it has been implicated in previous literature, because it showed up within a QTL region in this analysis or because it may be a homolog of a known flower colour gene in another species?

project_analysis

Picture a genome assembly project. It may be linked to an analysis describing how the scaffolds were generated, and another one describing the annotation process. You may want to use the project_analysis.type_id to indicate genome assembly or annotation. Alternatively, you may want to indicate genome assembly or annotation as the analysis type and use the project_analysis.type_id to indicate that this project funded/did both analyses. Then later another researcher might do some work using this genome version that indicates an issue in the assembly process. We may want to link this new analysis to the genome assembly project to provide warning to anyone using this version.

spficklin · 2024-06-14T22:01:42Z

Okay, I understand and that makes sense. Thanks for the explanation. So, it's really meant to explain the relationship between the two records.

laceysanderson · 2024-06-14T22:20:32Z

Yes, exactly :-)

… its not part of the unique key.

laceysanderson · 2024-06-17T19:48:47Z

This is marked as a medium change since it is backwards compatible and simply adds columns to existing chado tables. As such it required 4+ approvals (2+ from PMC members and 1+ non-Tripal reviews).

spficklin · 2024-06-18T15:30:51Z

Just to update the conversation thread. @laceysanderson and I had a conversation and thoguht it would be best to remove rank from the unique contratint.

There is a potential case where we might want rank in the unique constraint. For example, if we had a property table with details about a linking record (e.g. project_contactprop) then someone could add additional details to annotate each relationship that might be the same but have different properties. Perhaps a person performs the same action multiple times (and hence deserves a contact) but then has different properties we need to record about each time.

However, no one is asking for that flexibility at the moment and we felt backwards compatiblity would be better served if we removed rank now and added it back in later if there was enough clamour for it. Adding it back in is backwards compatible whereas taking it out later is not.

Thanks @laceysanderson for making the change. I approve the PR.

nmenda

Looks good to me!

scottcain

I very much like the idea of bringing all of linker tables into consistency; it's much better that they all have the these additional columns, and I appreciate parallel construction.

dsenalik

Tested both dockers and flyway migrations. Additional columns not present before, but present after migration
I thought I would actually test the constraint with both Postgres versions.
As expected, I can create a duplicate entry in v13

chado=# insert into project (name) values ('testproject');
INSERT 0 1
chado=# insert into contact (name) values ('testcontact');
INSERT 0 1
chado=# insert into project_contact (project_id, contact_id, rank) values (1, 1, 0);
INSERT 0 1
chado=# insert into project_contact (project_id, contact_id, rank) values (1, 1, 0);
INSERT 0 1
chado=# select * from project_contact;
 project_contact_id | project_id | contact_id | type_id | rank 
--------------------+------------+------------+---------+------
                  1 |          1 |          1 |         |    0
                  2 |          1 |          1 |         |    0
(2 rows)

And as expected, with v16 the constraint is now in effect for NULLs, and I cannot add duplicate records.

...
chado=# insert into project_contact (project_id, contact_id, rank) values (1, 1, 0);
INSERT 0 1
chado=# insert into project_contact (project_id, contact_id, rank) values (1, 1, 0);
ERROR:  duplicate key value violates unique constraint "project_contact_c1"
DETAIL:  Key (project_id, contact_id, type_id)=(1, 1, null) already exists.

laceysanderson · 2024-06-24T19:53:33Z

Thank you everyone for the reviews! That's four with two non-Tripal so according to the guidelines I can now merge this 🥰 🎉

laceysanderson added 2 commits May 6, 2024 10:00

Start with _contact linkers.

64e6bed

Add procedure so we can add constraint based on postgresql version.

6b32d1b

laceysanderson mentioned this pull request May 8, 2024

Chado Contact by Role field tripal/tripal#1875

Merged

laceysanderson added 3 commits May 21, 2024 16:01

Merge branch '1.4' into 1.4-1.3.3.013-add-type_id-linkers

9d0e858

Merge branch '1.4' into 1.4-1.3.3.013-add-type_id-linkers

f380dc9

Decide to only table contact and project linkers in this PR/Migration.

498d414

laceysanderson changed the title ~~V1.3.3.013 Add type_id and rank to all linker tables~~ V1.3.3.013 Add type_id and rank to all contact and project linker tables May 31, 2024

laceysanderson marked this pull request as ready for review May 31, 2024 23:21

spficklin reviewed Jun 14, 2024

View reviewed changes

chado/migrations/V1.3.3.013__add_type_id_to_contact_and_project_linkers.sql Outdated Show resolved Hide resolved

laceysanderson added 3 commits June 17, 2024 12:56

Remove rank from the unique constraint.

1650624

Add comment to describe the role of the type_id in these linker tables.

e3d21d0

Adds a comment on the rank column to indicate the role it plays since…

2aa02ad

… its not part of the unique key.

laceysanderson added the medium change label Jun 17, 2024

spficklin approved these changes Jun 18, 2024

View reviewed changes

nmenda assigned nmenda and unassigned nmenda Jun 18, 2024

nmenda self-requested a review June 18, 2024 20:24

nmenda approved these changes Jun 18, 2024

View reviewed changes

scottcain self-requested a review June 20, 2024 16:46

scottcain approved these changes Jun 20, 2024

View reviewed changes

dsenalik approved these changes Jun 24, 2024

View reviewed changes

laceysanderson merged commit ef04588 into 1.4 Jun 24, 2024
11 checks passed

laceysanderson deleted the 1.4-1.3.3.013-add-type_id-linkers branch June 24, 2024 19:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

V1.3.3.013 Add type_id and rank to all contact and project linker tables #144

V1.3.3.013 Add type_id and rank to all contact and project linker tables #144

laceysanderson commented May 6, 2024 •

edited

Loading

spficklin commented Jun 14, 2024

laceysanderson commented Jun 14, 2024 •

edited

Loading

spficklin commented Jun 14, 2024 •

edited

Loading

laceysanderson commented Jun 14, 2024

laceysanderson commented Jun 17, 2024 •

edited

Loading

spficklin commented Jun 18, 2024

nmenda left a comment

scottcain left a comment

dsenalik left a comment

laceysanderson commented Jun 24, 2024

Column	Type	Nullable	Default	Storage
feature_contact_id	bigint	not null	nextval('feature_contact_feature_contact_id_seq'::regclass)	plain
feature_id	bigint	not null		plain
contact_id	bigint	not null		plain
type_id	bigint			plain
rank	integer		0	plain

V1.3.3.013 Add type_id and rank to all contact and project linker tables #144

V1.3.3.013 Add type_id and rank to all contact and project linker tables #144

Conversation

laceysanderson commented May 6, 2024 • edited Loading

Issue #140

Description

Testing

Manual Testing

Before PostgreSQL 15

After PostgreSQL 15

spficklin commented Jun 14, 2024

laceysanderson commented Jun 14, 2024 • edited Loading

Why include a type_id in project linker tables?

project_feature

project_analysis

spficklin commented Jun 14, 2024 • edited Loading

laceysanderson commented Jun 14, 2024

laceysanderson commented Jun 17, 2024 • edited Loading

spficklin commented Jun 18, 2024

nmenda left a comment

Choose a reason for hiding this comment

scottcain left a comment

Choose a reason for hiding this comment

dsenalik left a comment

Choose a reason for hiding this comment

laceysanderson commented Jun 24, 2024

laceysanderson commented May 6, 2024 •

edited

Loading

laceysanderson commented Jun 14, 2024 •

edited

Loading

spficklin commented Jun 14, 2024 •

edited

Loading

laceysanderson commented Jun 17, 2024 •

edited

Loading