Ecto.Schema resolution proposal #4
Thanks for this -- very thorough, great work! I love that you dug into field selection as well. @benwilson512 and I will give this a better look-over soon. My guess is that we'll push you to PR, with some feedback.
As I already wrote the code in a very generic way for my own needs, I thought it may help this project as well, even if it just brings some ideas. There are certainly a few concerns which might be relevant, though.
```elixir
def resolve_artist_albums(artist, _args, info) do
  # this:
  batch_query(Album, artist, info, &order_by(&1, [desc: :release_date]))
  # equals that:
  batch_query(Album, {:artist_id, artist.id}, info, &order_by(&1, [desc: :release_date]))
end
```

I'm not sure how 1. is relevant. Maybe it's ok to use the same:

```elixir
defmodule MyApp.GraphQL.Schema.Types do
  use Absinthe.Ecto.Schema.Notation

  alias MyApp.Ecto.Schema.{Artist, Album, Track}

  # create an object from the given Ecto.Schema
  object :artist, ecto_schema: Artist

  # same, but exporting only specific fields
  object :artist, ecto_schema: Artist, only: [:id, :name, :albums]

  # same, but using :except instead of :only
  object :artist, ecto_schema: Artist, except: [:tracks]

  # same, but allows defining specific resolvers
  object :artist, ecto_schema: Artist, except: [:tracks] do
    # defines the resolver for :albums
    field :albums, do: resolve(&resolve_artist_albums/3)
    # other fields are resolved automatically like in the above examples
  end
end
```
This is a seriously impressive amount of work. There's a lot going on here and I'd like the chance to discuss it with you in a slightly more flexible medium. Do you have ScreenHero or Skype? You can find me on SH with my github handle [at] gmail [dot] com, and skype with just the same name as my github handle. There are some philosophical differences between this approach and the one I had been targeting, but this is a ton of valuable capability, so I want to see where we can go with this.
ya great work @redrabbit! Projecting only selected fields is great, but it also has some challenges. There are two use cases where we will probably need something to load some fields even though they are not directly required by the GraphQL query: [...]
@benwilson512 did you have a chance to talk with @redrabbit already?
Both above points could probably be easily solved using the new meta facility.
I had the pleasure of having a very interesting discussion with @benwilson512 a few days ago. I think the proposed code works quite well for what it is meant to do: mapping GraphQL queries into Ecto queries if their schemas match exactly. The problem is that currently, it is an all-or-nothing solution. If your GraphQL schema deviates from it, [...]. For example, let's say your [...].

@tlvenn, adding [...]:

```elixir
def build_query(schema, info, transform \\ nil) do
  # Resolves fields using info;
  # basically, a shortcut for:
  #   fields = info.definition.fields
  fields = resolve_fields(info)

  # Resolves the select statement using the Ecto.Schema and fields;
  # this is the :select statement from Ecto.Query.
  # It also fetches the foreign keys required
  # in order to resolve associations.
  select = resolve_fields(schema, fields)

  # Resolves associations using the Ecto.Schema and fields;
  # this is a bit more complicated than the :select statement above.
  # It handles :has_one, :has_many, :belongs_to and :many_to_many.
  assocs = resolve_assocs(schema, fields)

  # Creates the Ecto.Query using the given Ecto.Schema
  # and the above :select and :assoc statements.
  # The optional transform parameter is a function
  # used to transform/compose the final query.
  new_query(schema, select, assocs, transform)
end
```

We could simply pass an extra [...].

I really like how Ecto is really explicit, in contrast to other ORMs (say Django or ActiveRecord) which do a lot of magic behind the scenes. Having [...].

I've posted the code in order to bring some ideas to the table. Maybe [...]
Ya, that's pretty much my point 2, and it can easily be solved by adding meta at the field level listing the Ecto schema fields needed to properly resolve that GraphQL field. Then, as you pointed out, it's fairly trivial to update the [...].

Thanks again for your contribution @redrabbit, very much appreciated.
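For reference, a minimal sketch of what that could look like with Absinthe's meta notation; the `:ecto_fields` key and the `:display_name` field are made up for illustration, not part of any library:

```elixir
object :artist do
  field :display_name, :string do
    # hypothetical key: the Ecto columns this GraphQL field depends on,
    # even though the client never requests them directly
    meta ecto_fields: [:first_name, :last_name]

    resolve fn artist, _args, _info ->
      {:ok, "#{artist.first_name} #{artist.last_name}"}
    end
  end
end
```

A query builder could then read the tag with `Absinthe.Type.meta(field, :ecto_fields)` and merge those columns into its `:select` list.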
@redrabbit This is great! I've been using it in a project where my Absinthe schema almost perfectly matches my Ecto schema. Better than writing resolution batchers. And 1 query with 4 joins beats 5 queries. That said, I think I've found a bug. Using this query:

```graphql
query {
  visits {
    id
    patient {
      id
      clients {
        id
      }
    }
    staff {
      id
    }
  }
}
```

I get this error: [...]

It looks like [...]
What we have here at the moment are two distinct ways of trying to efficiently load database records.

**Whole Tree Up Front**

This is the approach of the linked code. When resolving a top level field, some form of introspection happens on the remaining tree, producing a Big Fancy Query (BFQ). This query will effectively try to load the rest of the sub tree from the database in a single query. Resolver functions further down the tree don't really need to do anything, since the data is already loaded.

Main Advantage: Load multiple tables' worth of data in a single database query.

**Lazy Batching**

This is the current approach of this library. Nothing is done up front. Resolution functions that map to a database association register the desire to load that association when run, and then groups of such associations are executed all together.

Major Advantage: Extremely flexible.

**A new scenario**

Here's where things get interesting. My work with Apollo / Relay transport batching, along with the recent work on subscriptions, has put me in a situation where in each case I want to execute a SET of documents. That is, instead of simply having:

[...]

I have:

[...]

This happens with transport batching when the front end groups together many GraphQL queries and then sends them back in one HTTP request. It happens in subscriptions because we track groups of documents that belong to particular subscription fields. Batching here turns out to be super useful, because not only can you batch the [...]. No matter how many documents are there, you'll have just one database query for the [...].

Consider a document set:

[...]

For some posts we load the creator, for some we do not. If we tree merge, we're going to load the creator for ALL posts, whether or not the document actually asked us to. Obviously this will get filtered out prior to going to the client, but it's database work we don't need to do and data loaded into memory that serves no purpose.

Things to think about: [...]
@benwilson512 I always thought that it would be best to leverage both approaches. There are scenarios where the BFQ will win and others where batching queries is more efficient. The resolver from the batching strategy already checks if the data is already there and, if that's the case, simply returns the data directly. The missing piece would be a piece of middleware that could flag each query in the document set so the BFQ does not load any association at all for those flagged queries and defers the work to the batching resolver. The other scenario where it would be useful to fall back to the batching strategy is once we deal with the [...]
I was thinking that for that piece of middleware we could do something as simple as reducing the document set by grouping queries by root query, then counting them and computing the max or average nesting level. If the number of queries is bigger than the nesting level, it's a good indication that batching will probably yield a better query time than BFQ; otherwise, it probably means that BFQ will be more performant (see the sketch below). Now, as you pointed out in Slack, this focuses solely on the response time for a given query and not on how your overall workload / concurrency impacts your system, where depending on what you do, a given approach might be more performant overall even though on a per-query basis it might be slower.
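A minimal sketch of that heuristic, with all names made up (a real version would compute the query count and nesting depths by walking the document set):

```elixir
defmodule LoadStrategy do
  # Given how many queries in the document set share a root field and
  # their average nesting level, pick a loading strategy: many shallow
  # queries amortize well with batching; few deep ones favour one BFQ.
  def choose(query_count, avg_nesting_level) do
    if query_count > avg_nesting_level, do: :batching, else: :bfq
  end
end

# LoadStrategy.choose(10, 2) #=> :batching
# LoadStrategy.choose(1, 4)  #=> :bfq
```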
One caveat that is not necessarily obvious with the whole tree approach, because Ecto hides it: while it minimises round trips to the database, when traversing one-to-many associations it substantially increases the size of the payload that has to be downloaded from the DB. In the example above, if a given artist has 10 albums, each with 20 tracks, the joined result set will have 10 × 20 = 200 rows, with the artist's columns repeated in all 200 rows and each album's columns repeated 20 times.
I'm really impressed and following this development closely. One issue I have with the "all or nothing" concept is that I don't immediately see how this would fit together with other real-world requirements. What I mean is, it looks like it works great if you just want to expose your Ecto schemas as a GraphQL graph, but how would one implement something like per-resolver permissions on top of this? Different objects in my API have different permissions logic. It seems to me that a batching solution could be used to help solve the issue of permitting resources based on their associations, in much the same way that it solves N+1 queries, but I haven't come up with the details of how yet. Any thoughts on this? @redrabbit, you say you use this for your own use case. I'm curious to know if your use case relies on permissions/authorization per object?
Hi @xtagon, just because the data is already resolved, it does not mean that some piece of middleware cannot proxy the data and, depending on some custom logic, decide whether to let the data go through or not. One tricky area with the current "all or nothing" approach is that it will fetch only the fields required by the GraphQL query, while your permission logic might need the value of other fields as well. This can be solved using the meta facility, just like you would address any issue where your GraphQL object does not match 1-for-1 with your Ecto schema. The batching approach is definitely a lot simpler and generally more flexible in that regard.
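A minimal sketch of that idea as Absinthe middleware, assuming a hypothetical `authorized?/2` policy function; it filters an already-loaded value rather than changing how it is loaded:

```elixir
defmodule MyApp.Middleware.Authorize do
  @behaviour Absinthe.Middleware

  # Runs after the resolver: the value is already loaded (whether by a
  # BFQ or by batching), so we only decide whether it may pass through.
  def call(%Absinthe.Resolution{state: :resolved, value: value} = res, policy)
      when not is_nil(value) do
    if policy.authorized?(value, res.context) do
      res
    else
      # drop the value; one could also put an error on the resolution instead
      %{res | value: nil}
    end
  end

  def call(res, _policy), do: res
end
```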
Thanks @tlvenn. I am still a beginner at all of this, but when I think about it there are basically two ways to authorize resources: [...]
I hope it's ok to comment as a newcomer to this conversation. It looks like this issue/conversation has gone a little stale. Is the problem right now that there is no consensus on whole tree vs. batching, so we can't move forward with the code? My understanding is that @redrabbit has a basic implementation that is whole-tree, but that we'd probably prefer batching? If that's the case, should we shoot for a batching implementation first, and then maybe work toward getting the whole-tree in as an option later?
I've been reading this conversation and the documentation on the Absinthe homepage on Ecto Best Practices. After looking at the README.md for this repo, it looks like there might be some Ecto-schema-directed batching implemented already, and the docs might just be out of date? Is that correct?
The Ecto Best Practices page is indeed out of date, as it doesn't reflect the existence of this library. I'll make sure to update it.

Batching is something that we, the core team, took on because it required changes to how resolution operates, which wasn't going to be achievable by third parties. As far as I can see though, nothing should be blocking community members from implementing this full-document SQL query generation, other than perhaps education on the tools Absinthe provides. You can tag values on fields with https://hexdocs.pm/absinthe/Absinthe.Schema.Notation.html#meta/2, and you can use https://hexdocs.pm/absinthe/Absinthe.Pipeline.html and the [...]. It will require getting familiar with Absinthe's internal representation of a document, called a Blueprint, to build the preloading phase, but this is true for most any contribution.
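For instance, using the documented `Absinthe.Pipeline` API, a custom preloading phase (the `MyApp.Phase.Preload` module here is hypothetical) could be slotted in just before resolution:

```elixir
document = "{ artists { name } }"

pipeline =
  MyApp.Schema
  |> Absinthe.Pipeline.for_document(context: %{})
  # run the hypothetical Blueprint-walking preload phase
  # right before field resolution happens
  |> Absinthe.Pipeline.insert_before(
    Absinthe.Phase.Document.Execution.Resolution,
    MyApp.Phase.Preload
  )

{:ok, blueprint, _phases} = Absinthe.Pipeline.run(document, pipeline)
```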
This is a long way of saying: I like this proposal, I think it has value, and if people have specific questions related to how they might make progress on it I'm happy to answer. However, it just isn't on the core team's TODO list for the time being.
@benwilson512 I just wanted to check my understanding here. The new [...]
Correct. After some discussions with Chris McCord about Phoenix 1.3 conventions, there will also soon be a [...]
That's cool. I assumed from the name of this repo, "absinthe_ecto", that it was just a dependency of Absinthe that takes care of the Ecto code. I skimmed over the first line of the README: "Provides some helper functions for easy batching of Ecto associations." I'll have to try it out ASAP. Long-term, is there a plan to separate the rest of the Ecto-related code in this repo? Or move this code into the main repo? Or is this going to stay its own thing?
I'm not sure what this means.
I also wanted to add something to this discussion. Anyone considering the big-fat-query/whole-tree approach might want to consider some inefficiencies with how SQL joins return data. You will save in terms of latency, but take a hit in bandwidth and data processing. Note this comment here: elixir-ecto/ecto#659 (comment)
To perhaps clarify what features belong in what project: Absinthe has resolution middleware. It includes a couple of different middleware to serve as a common base of functionality. One of those is the batching middleware, which can be used to batch any field resolution functionality; there is nothing Ecto-specific about it. Absinthe.Ecto exists as a package to provide specific Ecto-related functionality that anyone using Absinthe as well as Ecto probably wants, and that might be difficult to do on one's own. There is no intention for this library to get rolled up into Absinthe. Absinthe has no reason to know about any particular backend store.
Sorry, yeah. I was confused. I got the impression that there was Ecto-aware code inside the Absinthe core repo based on comments and other things, but there isn't. That's all done manually in resolvers. I should probably have spent more time with Absinthe before commenting on here. Just got working through things yesterday. I'd like to help with the guides and documentation to help make things clearer, once I have it all straight in my head. I'm glad that it's decoupled like it is.
No problem! Further guide help is always appreciated!
I wrote an extended version of the `ecto_batch` helper:

```elixir
defp default_callback(result) do
  {:ok, result}
end

def add_preloads_to_ultra_batch(structs, nil, field, caller) do
  Repo.preload(structs, [field], caller: caller)
end

def add_preloads_to_ultra_batch(structs, preloads, field, caller) do
  Repo.preload(structs, [{field, preloads}], caller: caller)
end

def perform_ultra_batch({owner, owner_key, field, preloads, caller}, ids) do
  unique_ids = ids |> MapSet.new() |> MapSet.to_list()

  unique_ids
  |> Enum.map(&Map.put(struct(owner), owner_key, &1))
  |> add_preloads_to_ultra_batch(preloads, field, caller)
  |> Enum.map(&{Map.get(&1, owner_key), Map.get(&1, field)})
  |> Map.new()
end

# This is a slightly different version than ecto_batch; it allows specifying
# extra preloads for the association, to be loaded efficiently.
def ultra_batch(%schema{} = parent, association, opts \\ []) do
  assoc = schema.__schema__(:association, association)
  callback = Keyword.get(opts, :callback) || &default_callback/1
  preloads = Keyword.get(opts, :preloads)

  %{owner: owner, owner_key: owner_key, field: field} = assoc

  id = Map.fetch!(parent, owner_key)
  meta = {owner, owner_key, field, preloads, self()}

  batch({__MODULE__, :perform_ultra_batch, meta}, id, fn results ->
    results
    |> Map.get(id)
    |> callback.()
  end)
end
```

This allows you to do something like this:

```elixir
@visible_content_attachments from(a in Schema.Content.Attachment,
  where: a.deleted == false,
  preload: ^Schema.Content.Attachment.authorization_preloads())

field :supporting_content_attachments, list_of(:content_attachment), resolve: fn parent, _, _ ->
  ultra_batch(
    parent,
    :content_attachments,
    preloads: @visible_content_attachments,
    callback: fn content_attachments ->
      {:ok, remove_primary_content_attachment(content_attachments, parent.primary_content_attachment_id)}
    end)
end
```
Does dataloader.ecto do whole-tree loads, or do we need the solution from absinthe-graphql/dataloader#127?
Hi, I've been using `Absinthe` and `Absinthe.Relay` intensively for a private project and wrote a few helper functions in order to resolve GraphQL queries into Ecto queries. I've put the code here: https://gist.github.com/redrabbit/be3528a4e4479886acbe648a693e65c0

I don't know if `Absinthe.Ecto` is going to be implemented in a similar manner. If there is interest, I can write a PR and add some tests to it.

Basically, it uses `Ecto.Schema` and `Absinthe.Resolution` to resolve pretty much every kind of GraphQL query I've come across so far. There are two distinct functions, `build_query/3` and `batch_query/5`.

By default, `build_query/3` automatically resolves `:join`, `:preload` and `:select` clauses. It tries to be as performant as possible and selects only the required fields.

**Setup**
Let's say we have schemas representing a music library:
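The schema snippet itself did not survive the page extraction. Based on the fields referenced throughout the thread (`:name`, `:release_date`, `:track_nr`), it presumably looked something like this sketch (not the gist's actual code):

```elixir
defmodule MyApp.Ecto.Schema.Artist do
  use Ecto.Schema

  schema "artists" do
    field :name, :string
    has_many :albums, MyApp.Ecto.Schema.Album
    has_many :tracks, through: [:albums, :tracks]
  end
end

defmodule MyApp.Ecto.Schema.Album do
  use Ecto.Schema

  schema "albums" do
    field :title, :string
    field :release_date, :date
    belongs_to :artist, MyApp.Ecto.Schema.Artist
    has_many :tracks, MyApp.Ecto.Schema.Track
  end
end

defmodule MyApp.Ecto.Schema.Track do
  use Ecto.Schema

  schema "tracks" do
    field :title, :string
    field :track_nr, :integer
    belongs_to :album, MyApp.Ecto.Schema.Album
  end
end
```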
And the following matching GraphQL types:
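This block is also missing; a minimal sketch of matching Absinthe types:

```elixir
defmodule MyApp.GraphQL.Schema.Types do
  use Absinthe.Schema.Notation

  object :artist do
    field :id, :id
    field :name, :string
    field :albums, list_of(:album)
    field :tracks, list_of(:track)
  end

  object :album do
    field :id, :id
    field :title, :string
    field :release_date, :string
    field :tracks, list_of(:track)
  end

  object :track do
    field :id, :id
    field :title, :string
    field :track_nr, :integer
  end
end
```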
**Usage**
The following GraphQL query:
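(The original example was lost in extraction; a query of this shape fits the sketched schemas above.)

```graphql
{
  artists {
    name
    albums {
      title
      tracks {
        title
      }
    }
  }
}
```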
returns a single `Ecto.Query`:

[...]

It handles `:belongs_to`, `:has_one`, `:has_many` and `:many_to_many` associations and will try to join and preload as much as possible. It also supports resolving cyclic graphs.

Writing resolvers is straightforward:
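The example block did not survive extraction either; given the `build_query/3` signature quoted earlier in the thread, a top-level resolver presumably looked something like:

```elixir
def resolve_artists(_args, info) do
  # build_query/3 derives :select, :join and :preload from the GraphQL query
  {:ok, Repo.all(build_query(Artist, info))}
end
```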
In order to support batching and avoid N+1 queries, `batch_query/5` provides similar functionality and resolves the given associations automatically. For example:
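This example block is missing as well. Based on the `batch_query` calls quoted at the top of the thread, it plausibly resembled the following reconstruction (not the original code):

```elixir
object :artist do
  field :id, :id
  field :name, :string

  # resolved within the same query via a :join
  field :albums, list_of(:album)

  # resolved with a second, batched query
  field :tracks, list_of(:track) do
    resolve fn artist, _args, info ->
      batch_query(Track, artist, info)
    end
  end
end
```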
In the above `object`, the `:albums` field is resolved automatically within the query using a `:join`. The `:tracks` field will require a second query (using `where: [artist_id: id]` or `where: a1.artist_id in ^ids`).

This results in two SQL queries being executed for the following GraphQL:
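(The original query is also missing; one in this shape would exercise both paths, given the sketched types above.)

```graphql
{
  artists {
    name
    albums {
      title
    }
    tracks {
      title
    }
  }
}
```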
**Customization**
You can customize your queries using the optional `transform` parameter both functions provide. For example, to fetch albums sorted by `:release_date` and album tracks sorted by `:track_nr`, one can write two new resolver functions and update the schema types accordingly:
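Both blocks are missing; a sketch reconstructing the idea, reusing the `resolve_artist_albums` resolver quoted at the top of the thread (the `resolve_album_tracks` counterpart is an assumption):

```elixir
def resolve_artist_albums(artist, _args, info) do
  batch_query(Album, artist, info, &order_by(&1, desc: :release_date))
end

def resolve_album_tracks(album, _args, info) do
  batch_query(Track, album, info, &order_by(&1, asc: :track_nr))
end

# ...and in the schema types:
object :artist do
  field :id, :id
  field :name, :string
  field :albums, list_of(:album), resolve: &resolve_artist_albums/3
end

object :album do
  field :id, :id
  field :title, :string
  field :tracks, list_of(:track), resolve: &resolve_album_tracks/3
end
```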