Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support associations with composite foreign keys #3638

Open
wants to merge 34 commits into
base: master
Choose a base branch
from

Conversation

soundmonster
Copy link

@soundmonster soundmonster commented May 18, 2021

This PR introduces support for associations on composite foreign keys.

Motivation

It's possible to define composite foreign keys in migrations using refences(..., with: [...]), but Ecto doesn't currently support multi-column references out-of-the-box. Splitting the primary key into multiple columns allows using a part of the primary key for partitioning, constraints/validation, etc.

Rough idea

  • Allow lists of atoms (in addition to just an atom) for :reference, :foreign_keys, :join_keys and :type in schema macros, and
  • Allow only lists of atoms :owner_key and:related_key in association structs

Status

This is a work in progress with very limited functionality. Early feedback is highly appreciated! What has been worked on so far:

  • Schema support
    • has_one
    • has_many
    • has_through
    • belongs_to
    • many_to_many
  • Changeset support
  • Integration tests
    • has_one
    • has_many
    • has_through
    • belongs_to
    • many_to_many
  • Update docs

@soundmonster
Copy link
Author

Might be related to #3351

@@ -60,7 +60,7 @@ integration-test-base:
apk del .build-dependencies && rm -f msodbcsql*.sig mssql-tools*.apk
ENV PATH="/opt/mssql-tools/bin:${PATH}"

GIT CLONE https://github.com/elixir-ecto/ecto_sql.git /src/ecto_sql
GIT CLONE --branch composite_foreign_keys https://github.com/soundmonster/ecto_sql.git /src/ecto_sql
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch needs a few changes to the base migration used for the integration tests; I've opened a PR for this so I hope to be able to drop this line soon.

@@ -54,6 +54,8 @@ defmodule Ecto.Integration.Post do
has_one :update_permalink, Ecto.Integration.Permalink, foreign_key: :post_id, on_delete: :delete_all, on_replace: :update
has_many :comments_authors, through: [:comments, :author]
belongs_to :author, Ecto.Integration.User
belongs_to :composite, Ecto.Integration.CompositePk,
foreign_key: [:composite_a, :composite_b], references: [:a, :b], type: [:integer, :integer], on_replace: :nilify
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the key names here can probably be inferred from the models themselves; I'm not sure if this is something I should target in this PR, too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think being explicit is probably better for now.

@@ -152,6 +152,24 @@ defmodule Ecto.Integration.RepoTest do
assert TestRepo.all(PostUserCompositePk) == []
end

@tag :composite_pk
# TODO this needs a better name
test "insert, update and delete with associated composite pk #2" do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should remove the associated from the test before and then this one will be good. :)

@@ -35,7 +35,7 @@ defmodule Ecto.Association do
required(:cardinality) => :one | :many,
required(:relationship) => :parent | :child,
required(:owner) => atom,
required(:owner_key) => atom,
required(:owner_key) => atom | list(atom),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably normalize it to always be a list? 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will make sure we consider owner_key everywhere possible. Possibly the same change needs to be done for related_key!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth noting that this changing to a list caused a regression in absinthe’s dataloader in my project. I’ve set up an initial PR there to address it for when this feature lands

Copy link
Member

@v0idpwn v0idpwn Mar 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably should be kept as atom() | list(atom), then? To avoid breaking compatibility (and making so that it only fails for composite keys)?

foreign_key_types = if is_list(foreign_key_type) do
foreign_key_type
else
List.duplicate(foreign_key_type, length(foreign_keys))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a test for this branch. :)

@josevalim
Copy link
Member

Ping. If this is ready for another review, please let me know. :)

@soundmonster
Copy link
Author

I'll have time to work on this next week. Thank you for the reminder!

@soundmonster
Copy link
Author

Two days of work later, the PR is somewhat further into the woods but quite some work is still needed around supporting composite keys for many-to-many associations. The most puzzling thing so far has been finding a good way to join tables on a variable list of columns. Tried and works (at least with Postgres):

  • Join on just the first column, then add the rest in a where clause (e.g. with Enum.reduce/3). Postgres' planner doesn't care if the join condition is defined in the JOIN clause or in a WHERE clause, but I'm not sure if that's the case with other databases.
  • Use a recursive function together with Ecto.Query.dynamic/2 to generate the on: option for the join. I'm not sure about the runtime overhead of doing this; it's also hard to generalize this because the bindings used by dynamic are often in different positions ("first and last" vs. "the last two"). However, it is much easier to understand what's happening and there's no "hd(keys) goes into the join, tl(keys) goes into a reduce with where" shenanigans.

I'm still very interested in getting this finished and would appreciate your thoughts on:

  • is this going in the right direction, at all?
  • what's the best way to handle a slow contributor? 😅 I'm committed to eventually finishing this up, but I'm not sure on timelines.

@josevalim
Copy link
Member

@soundmonster my suggestion is to not support composite foreign keys on many_to_many for now. What do you think? it will allow us to merge and review part of the work and you can take your own pace. :)

@soundmonster
Copy link
Author

Thank you for your reply, José. I've tried backtracking on ManyToMany, but the amount of changes needed to get the query planner and the HasThrough associations to work with both atoms and lists of atoms is more work than what I think is needed to complete support in ManyToMany. The current state of this branch passes all unit tests and only fails 6 integration tests (only checked with Postgres so far). I would have liked to split it into better reviewable chunks but I'm afraid that would be 2x more work 😅

@braulio
Copy link

braulio commented Nov 2, 2021

we look forward to completion!

@soundmonster soundmonster force-pushed the composite_foreign_keys branch 2 times, most recently from 157ad30 to aaf6b2f Compare November 17, 2021 15:39
@soundmonster
Copy link
Author

Status update:

  • composite keys work for all associations 🥳
  • unit tests are green (on my machine, workflows need approval)
  • integration tests are green with Postgres; I haven't yet gotten Earthly to run properly to test MySQL and MSSQL, maybe it's worth trying it out on CI?
  • there's still quite a few TODOs for cleanup, esp. removing List.wrap/1 everywhere
  • docs are still open.

Where I need help:

  • I'm using dynamic fragments to generate fields for association joins and wheres. This doesn't only affect composite keys, but all associations.
    • Is this expected to have a significantly negative performance impact?
    • Is there a better/other way to do this, apart from returning plain AST?
  • It looks to me like all detailed association docs are in association.ex and there's no dedicated guide, but I might be missing something. Could you please point me to documentation / examples on associations outside or association.ex?
  • Probably workflow approval here and maybe for the companion PR in ecto_sql.

@isavita
Copy link

isavita commented Dec 16, 2021

I am looking forward to completion as well!

@josevalim
Copy link
Member

josevalim commented Dec 16, 2021 via email

@soundmonster soundmonster force-pushed the composite_foreign_keys branch 2 times, most recently from 4768811 to b2a5bf7 Compare December 27, 2021 10:51
lib/ecto/changeset.ex Outdated Show resolved Hide resolved
lib/ecto/schema.ex Outdated Show resolved Hide resolved
raise ArgumentError, "foreign_key #{inspect name} must be distinct from corresponding association name"
end

if Keyword.get(opts, :define_field, true) do
__field__(mod, opts[:foreign_key], foreign_key_type, opts)
foreign_keys = List.wrap(opts[:foreign_key])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is why the line above works... yeah, we should clean up a bit.

foreign_key_type = opts[:type] || Module.get_attribute(mod, :foreign_key_type)

if name == Keyword.get(opts, :foreign_key) do
if [name] == Keyword.get(opts, :foreign_key) do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe name in Keyword.get(opts, :foreign_key)?

lib/ecto/association.ex Outdated Show resolved Hide resolved
@@ -261,6 +272,65 @@ defmodule Ecto.Association do
combine_assoc_query(query, source.where || [])
end

def transpose_values([[_]] = values), do: values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should @doc false this and the functions below.

@soundmonster soundmonster force-pushed the composite_foreign_keys branch from 919d366 to d42688c Compare May 16, 2023 12:39
@MarthinL
Copy link

I've extracted minimal schema code from my main project by which to test composite primary keys and came across something that's either a misbehaviour or something I have misunderstood about schemas. It's still a little complicated so please bear with me as I try to explain.

For the sake of the example, I have three schemas, Region, Coxel and Tie. As explained before, logically speaking I have only Region and Coxel, where Coxels have a self-referencing many-to-many relationship, like such:

erDiagram
    Region ||--o{ Coxel : has
    Coxel }o--o{ Coxel : "tied to"
Loading

But because in my use case each of the associations have a schema with additional attributes, the "hidden" associative table (coxelcoxels) is insufficient and I need to introduce a named schema which I call Tie (because it ties together two Coxels). So now we can represent the self-referencing many-to-many relationship in the manner it would actually be implemented in an RDBMS, like such"

erDiagram
    Region ||--o{ Coxel : has
    Region ||--o{ Tie : has
    Tie }o--|| Coxel : child
    Tie |o--|| Coxel : parent
Loading

If we then expand the diagram to show and track the actual database fields involved it becomes possible to describe the problem I am experiencing. Note that I've excluded many fields and associations used in my actual project since they don't impact on the problem but would obfuscate the issue.

erDiagram
    Region {
        id id PK
        string name
    } 
    Coxel {
        id region_id PK, FK
        id id PK
        text value
    }
    Tie {
        id region_id PK, FK "Tie region"
        id id PK
        string label
        id child_region_id FK "Child Coxel Region"
        id child_id FK
        id parent_region_id FK "Parent Coxel Region"
        id parent_id FK
    }
    Region ||--o{ Coxel : has
    Region ||--o{ Tie : has
    Tie }o--|| Coxel : child
    Tie |o--|| Coxel : parent

Loading

@MarthinL
Copy link

If we then expand the diagram to show and track the actual database fields involved it becomes possible to describe the problem I am experiencing. Note that I've excluded many fields and associations used in my actual project since they don't impact on the problem but would obfuscate the issue.

In code the three schemas look like this:

  # Region
  schema "regions" do
    field :name, :string

    timestamps()
  end


  # Coxel
  schema "coxels" do
    field :region_id, :id, primary_key: true
    field :value, :string

    belongs_to :region, Region,
               foreign_key: :region_id,
               references: :id,
               define_field: false

    has_many  :child_ties, Tie,
              foreign_key: [:parent_region_id, :parent_id],
              references: [:region_id, :id]

    has_many  :parent_ties, Tie,
              foreign_key: [:child_region_id, :child_id],
              references: [:region_id, :id]

    timestamps()
  end


  # Tie
  schema "ties" do
    field :label, :string
    field :region_id, :id, primary_key: true

    belongs_to :region, Region,
               foreign_key: :region_id,
               references: :id,
               define_field: false

    belongs_to :child, Coxel,
               foreign_key: [:child_region_id, :child_id],
               references: [:region_id, :id]

    belongs_to :child_region, Region,
               foreign_key: :child_region_id,
               references: :id,
               define_field: false

    belongs_to :parent, Coxel,
               foreign_key: [:parent_region_id, :parent_id],
               references: [:region_id, :id]

    belongs_to :parent_region, Region,
               foreign_key: :parent_region_id,
               references: :id,
               define_field: false

    timestamps()
  end

This all compiles and I am able to create regions and coxels that correctly references the regions.

If I manualy construct a Tie I can insert it and when I query Coxels again with preload([child_ties: :child]) it returns the correct data.

But I cannot get build_assoc to work. Granted, the test code I use is not directly the code from my main project that used to work, but a manually recreated version of it.

Given that where I run the test, cm2 and ct3 are two valid Coxel variables, I'm expecting

Ecto.build_assoc(cm2, :child_ties, %{label: "Main2Test3"}) 

or

Ecto.build_assoc(cm1, :child_ties, %{label: "Main2Test3", child_region_id: ct3.region_id, child_id: ct3.id})

or at the very least

Ecto.build_assoc(cm2, :child_ties, %{label: "Main2Test3", child_region_id: ct3.region_id, child_id: ct3.id, parent_region_id: cm1.region_id, parent_id: cm1.id}) 

to return a duely constructed Tie I can insert using Repo.insert(). But in all cases I get the error

** (KeyError) key [:child_region_id, :child_id] not found
    (ecto 3.10.1) lib/ecto/association.ex:873: Ecto.Association.Has.build/3

I constructed a separate test case without using composite primary keys which seems to work as expected but I lack the proficiency with Phoenix to be certain that the two cases are otherwise equivalent.

@MarthinL
Copy link

I've made some progresss isolating the error when doing build_assoc but can't tell what's really going wrong. Hopefully my feedback/findings would get you closer to an answer.

** (KeyError) key .... not found
(ecto 3.10.1) lib/ecto/association.ex:873: Ecto.Association.Has.build/3

I've set up four minimalistic projects to help isolate the error.

  • Project one uses the standard libraries and (obviously) no composite keys. The build_assoc statement succeeds.
  • Project two uses the modified libraries but no composite keys. The build_assoc statement succeeds.
  • Project three uses the modified libraries and the notation for composite keys in the association specification but with single item lists, essentially mimicking Project two above but with the modified syntax. The build_assoc statement fails with the error message
    ** (KeyError) key [:parent_id] not found
    (ecto 3.10.1) lib/ecto/association.ex:873: Ecto.Association.Has.build/3
  • Project four uses the modified libraries and region_id added to the composite keys. The build_assoc statement fails with the error message
    ** (KeyError) key [:parent_region_id, :parent_id] not found
    (ecto 3.10.1) lib/ecto/association.ex:873: Ecto.Association.Has.build/3

Although I don't have a minimal replication of that exact project to test with, I do have evidence that the build_assoc statement had been working as intended in the old version of the libraries modified for composite keys which I have modified in the manner documented in the thread above. I don't know enough about how association.ex and the build function is constructed to understand how the version that is in the code today achieves the same result my modification achieved but it clearly has a different outcome.

How can I be of more help? I'm committed to using composite keys and I wawnt to get my project into production so I don't lack motivation to help get the problem sorted. I just lack the context of how the ecto code goes about its business and I lack exposure and experience in open source project culture. Tell me what I can do to help and I will get it done.

@MarthinL
Copy link

MarthinL commented May 18, 2023

Although I don't have a minimal replication of that exact project to test with, I do have evidence that the build_assoc statement had been working as intended in the old version of the libraries modified for composite keys which I have modified in the manner documented in the thread above. I don't know enough about how association.ex and the build function is constructed to understand how the version that is in the code today achieves the same result my modification achieved but it clearly has a different outcome.

Update: I now have a project referencing the old version of the PR with my modified build function. I can confirm that the build_assoc call which fails with the current version works as expected with the old but modified code. It seems to be a logical conclusion that the reason it now fails has something to do with the alternative way you elected to address the issue whereby the build function did not cater for composite keys. The original issue I addressed with the change I made to the code was because the association specification in the schema didn't respond to the list syntax for own and foreign keys. It's baffling that the schema syntax now works but functions like build_assoc which I presume uses the constructs built from the schema definition now fails to match the relevant keys when it's a list.

@MarthinL
Copy link

Hey @soundmonster,

My environment isn't ideal for testing and contributing to the code-base so it's non-trivial for me to make and test changes to your code. I have done that (again) anyway, only to find that the change to the build function of Ecto.Association.Has I previously suggested is still what the build_assoc function requires to work again for either normal as well as composite keys.

How do I get that change into your code so it can make its way into the master branch when yours does? Must I create a pull request on your pull request, must I push my changes or can I just copy and paste the way I've rewritten the function hwre so you can apply the change?

@cspeper
Copy link

cspeper commented Sep 12, 2023

Hi @soundmonster @MarthinL. I appreciate your efforts so far to get this over the line. I've become interested in this PR as I have a production application in which we'd love to start making use of PG table partitioning and composite FKs seems like a must have to make that work. Is this PR still as close as it seems to getting merged or should we be exploring other medium term solutions?

@soundmonster
Copy link
Author

@cspeper Thank you for reaching out, and I appreciate your interest.

TBH, I would love to use it in prod for my projects as well, but please do not expect this to be on the main branch this year.

From my point of view, this PR needs:

  • a rebase, and
  • a proof-of-concept app that would showcase it, and validate that it actually works (some things are hard to test here)
  • probably another iteration driven by the feedback on the PoC.

I think I'll have some first results in October, but realistically I don't think the feature will be on hex.pm in 2023.

@fendent
Copy link

fendent commented Mar 1, 2024

If there is anything I can do to help get this over the line, happy to help!

@hickscorp
Copy link

hickscorp commented Nov 6, 2024

@soundmonster I am really interested in using this work... I'm dealing with a schema where records in most tables are identified with primary keys such as [id: UUID, version: String].
It's all fine in migrations:

add :version_here, :string
add :foo_id, references(:foos, type: :uuid, with: [version_here: :version_there]

But then has_one, has_many, belongs_to... Not fine at all.

Are there any plans to bring this to either a usable fork, or have it merged?

Alternativelly, did you find a reliable way to approach this problem (composite foreign keys) in production Ecto projects?

@MarthinL
Copy link

MarthinL commented Nov 6, 2024

Alternativelly, did you find a reliable way to approach this problem (composite foreign keys) in production Ecto projects?

I ultimately made the call not to use this in production. I had to use my own modification compensate for an omission from @soundmonster's code but the ecto and composite key changes was just too foreign for me to be comfortable with either putting my change into production or take on the task of rebasing the entire thing. Ecto and ecto_sql had changed quite a bit since the fork so it was never going to be a trivial task.

For my production app I switched to a completely different approach with regards to the solution I used to need composite keys in associations for. You might want to consider doing something similar.

Let me explain.

My composite key used to consist of a region identifier and a normal bigint identifier within the region, meaning I maintained a region table and the database could enfore referential integrity on the region_id field of my regionalised tables. I had it working with the PR code but it's a royal PITA to modify especially generated code with the additional field that needs to be set, passed, handled in page urls, defaulted and generally taken care of everywhere where Phoenix assumes a single id.

So what I did was to use an opaque primary key that presented as one binary value but consisted internally of several components. In my case I also discovered that I couldn't use all the bits of a bigint anyway so I was able to split a bigint into bits for the region and bits for the id. You're already looking at a UUID key for reasons of your own. UUID is in effect already such an opaque type comprising several internal components gerring generated differently.

My recommendation would be to look at implementing your own type similar to UUID (I'm sure I've seen examples and tutorials lurking around the web) which include additional bits for the version id you want to add. You could shave some bits off one oof the components if you wish to keep it the same size as UUID or you could simply make it bigger by as much as you need. It's already merely a binary at the database level. You'd then use custom functions in your code to compare these keys with or without regard for the version, to extract and update the version portion as required and whatever else you need done.

In hindsight I'm not really surprised that this PR never went mainstream. Once patched it does what it says on the box, but the rest of the Phoenix and LiveView eco-system where ecto normally get used in really is not geared towards composite keys. It's one primary key field or none at all (which legitimately happens in a few places). That's the idiom and ethos of Phoenix. When you venture outside of those boundaries especially for new applications and code, you're in for a world of hurt. The stated and proven reason why there's any support for composite keys in migrations etc. is to allow Ecto to be set up to read from legacy databases that were created with different design constraints. The assumption being that those tables would not really get actively involved in associations but would be donor tables from which the new schemas import legacy data into structures with proper associations. In the end the people who saw opportunity in having composite keys involved in active associations probably all came to the same painful conclusion and lost interest in the initiative.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.