
[WIP] Entity Framework Core 7 Providers #8654

Open
galvesribeiro wants to merge 6 commits into main from efcore-providers
Conversation

galvesribeiro
Member

@galvesribeiro galvesribeiro commented Oct 3, 2023

This PR adds Entity Framework Core 7 providers.

The goal is to provide the basic providers, which are storage agnostic and rely only on Microsoft.EntityFrameworkCore, named like Microsoft.Orleans.XXXXX.EntityFrameworkCore.

We can then provide database-specific packages like Microsoft.Orleans.XXXXX.EntityFrameworkCore.SqlServer, Microsoft.Orleans.XXXXX.EntityFrameworkCore.Postgresql, etc., which would depend on Microsoft.EntityFrameworkCore.SqlServer, for example.

Developers can also extend the underlying DbContext (e.g. add extra indexes, constraints, etc.) and/or provide custom migrations.

  • Clustering
  • Persistence
  • Reminders
  • Directory
  • SQL Server support
  • MySQL support
  • PostgreSQL support

@galvesribeiro galvesribeiro changed the title [WIP] EFCore Provider [WIP] Entity Framework Core 7 Provider Oct 3, 2023
@galvesribeiro galvesribeiro changed the title [WIP] Entity Framework Core 7 Provider [WIP] Entity Framework Core 7 Providers Oct 5, 2023
@galvesribeiro galvesribeiro force-pushed the efcore-providers branch 5 times, most recently from 74d392a to 719fe27 on October 6, 2023 at 16:53
@m3nax
Contributor

m3nax commented Oct 24, 2023

Any ETA for this? I'm interested

@galvesribeiro
Member Author

@m3nax I'm trying to figure out the MySQL and PostgreSQL parts. Since they don't have a timestamp/etag primitive, it will be kinda hard to do without having to lock the tables. The current ADO providers do a lot of aggressive locking, which caused issues under high load; the providers here don't do that.

The problem is to have a stable timestamp/etag mechanism outside SQL Server.

Also, the team is busy with the .NET 8 release, so I would say this should be reviewed only after that.
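
For illustration only, one database-agnostic alternative to a native rowversion is an application-generated ETag marked as an EF Core concurrency token (GrainStateRecord and SketchDbContext below are hypothetical names, not this PR's schema):

```csharp
using System;
using Microsoft.EntityFrameworkCore;

public class GrainStateRecord
{
    public string GrainId { get; set; } = default!;
    public string StateJson { get; set; } = default!;
    public Guid ETag { get; set; }                 // regenerated by the provider on every write
}

public class SketchDbContext : DbContext
{
    public DbSet<GrainStateRecord> GrainStates => Set<GrainStateRecord>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<GrainStateRecord>(b =>
        {
            b.HasKey(e => e.GrainId);
            // Concurrency token: an UPDATE with a stale ETag affects zero rows and
            // EF throws DbUpdateConcurrencyException instead of silently overwriting.
            b.Property(e => e.ETag).IsConcurrencyToken();
        });
    }
}
```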

@koenbeuk
Contributor

koenbeuk commented Dec 5, 2023

I've been implementing grains with EF-backed storage for a while now, and my strategy is fundamentally different. Given that a DbContext should be short-lived, I essentially use this pattern:

  1. Register a DbContext which has all of the state types configured as entities. Each root entity needs to have a primary key that maps to the GrainId (I use Grain so I only deal with 1 state).
  2. Upon reading state, I fetch the entity from a new DbContext (including any navigated entities). I then make a deep clone of this entity such that I have an original and a copy.
  3. Upon writing state, I attach the original to a new DbContext and then apply the copy, essentially letting EF do the diff for me. This causes writes where nothing has changed to be no-ops and otherwise only updates what needs updating.

The benefits here are:

  1. Writes become much cheaper, in particular with grains whose state is large (as before, I would deliberately defer or omit commits, since serializing the entire object graph to JSON and storing that would be too expensive).
  2. I can migrate grain states through EF migrations.

I would suggest considering an implementation similar to the one described above; a rough sketch of the read/write steps follows.
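
For illustration, a minimal sketch of steps 2 and 3 above, assuming a hypothetical MyState entity (with a Children navigation), a MyStateDbContext, and a DeepClone helper; none of these names are from this PR:

```csharp
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

public class EfBackedStateStore
{
    // Read: load without tracking, keep the original, hand a deep clone to the grain.
    public async Task<(MyState Original, MyState Working)> ReadStateAsync(string grainId)
    {
        await using var db = new MyStateDbContext();
        var original = await db.MyStates
            .AsNoTracking()
            .Include(s => s.Children)                 // include navigated entities
            .SingleAsync(s => s.GrainId == grainId);

        return (original, DeepClone(original));
    }

    // Write: attach the original as the baseline, apply the working copy, let EF diff.
    // Note: SetValues copies scalar properties only; navigations need per-entity handling.
    public async Task WriteStateAsync(MyState original, MyState working)
    {
        await using var db = new MyStateDbContext();
        db.Attach(original);
        db.Entry(original).CurrentValues.SetValues(working);
        await db.SaveChangesAsync();                  // no-op if nothing changed
    }

    // Placeholder: a real implementation would do e.g. a serializer round-trip.
    private static MyState DeepClone(MyState source) => source;
}
```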

@AndrewBabbitt97

> I've been implementing grains with EF-backed storage for a while now, and my strategy is fundamentally different. Given that a DbContext should be short-lived, I essentially use this pattern:
>
>   1. Register a DbContext which has all of the state types configured as entities. Each root entity needs to have a primary key that maps to the GrainId (I use Grain so I only deal with 1 state).
>   2. Upon reading state, I fetch the entity from a new DbContext (including any navigated entities). I then make a deep clone of this entity such that I have an original and a copy.
>   3. Upon writing state, I attach the original to a new DbContext and then apply the copy, essentially letting EF do the diff for me. This causes writes where nothing has changed to be no-ops and otherwise only updates what needs updating.
>
> The benefits here are:
>
>   1. Writes become much cheaper, in particular with grains whose state is large (as before, I would deliberately defer or omit commits, since serializing the entire object graph to JSON and storing that would be too expensive).
>   2. I can migrate grain states through EF migrations.
>
> I would suggest considering an implementation similar to the one described above.

When I saw this PR, I figured this is what was going to be implemented. I'm currently evaluating Orleans for an upcoming project, and this is the type of route I want to go. This type of implementation makes entities much more flexible.

A hard-set schema for storage like this isn't great: 20231005033501_InitialPersistenceSchema.cs

I'd much rather see something similar to what ASP.NET Core Identity uses with its IdentityUser class: IdentityUser.cs. You can inherit from it in your own data model to add additional fields as needed.

Make it so users can implement their own data model entities; that way they can benefit from the full power of Entity Framework, add indexes to their own columns, use JSON columns, and use LINQ to query data from the context when needed in places like stateless workers.
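
For illustration, an IdentityUser-style approach could look roughly like this (GrainStateEntity and PlayerStateEntity are hypothetical names, not types from this PR):

```csharp
using System;

// A base entity the provider could own, analogous to IdentityUser.
public class GrainStateEntity
{
    public string GrainType { get; set; } = default!;
    public string GrainId { get; set; } = default!;
    public Guid ETag { get; set; }
}

// A user-defined entity extending the base with its own queryable columns;
// the user is then free to add indexes, JSON columns, etc. in their DbContext.
public class PlayerStateEntity : GrainStateEntity
{
    public string DisplayName { get; set; } = default!;
    public int Score { get; set; }
}
```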

@galvesribeiro
Member Author

@koenbeuk / @AndrewBabbitt97 thank you both for the feedback.

The approach @koenbeuk suggested is more or less what I had before in production in a private project. Yes, the benefits you mention are clear. However, the developer is then forced to deal with EF directly by having to define the DbContext, register all state types, deal with migrations, and whatnot. This introduces some level of friction which we tried to avoid in this implementation.

This implementation (changes coming soon) aims to behave the same way as the Azure Tables, Cosmos, and other providers that read/write the whole state at once. The grain state is supposed to be owned by the grain itself, not to be read/written elsewhere. That can cause serious problems with consistency which will potentially be hard to debug/fix.

That being said, I do agree that having the power to allow the developer to customize the DbContext is a good thing, if they are willing to take on the responsibility of dealing with it.

What about this:

  1. Have an "easy mode", which is what is currently in the PR. It essentially serializes any state class to its JSON representation and stores it in a pre-defined table with that DbContext. This is precisely the out-of-the-box behavior of the current ADO.NET provider.
  2. Have an "advanced mode", which allows the developer to register a custom DbContext type that will be used by the provider. In this model the only requirement is that the state type has some pre-defined properties like the grain type, grain id, eTag, etc.

Both "modes" can be registered on a per-provider basis since the storage provider is a named/keyed registration on the system so you can mix and match.

Also, this would only be something implemented on the storage provider. There is no point of having that complexity for the other providers.

What do you guys think?
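
For a rough idea only, the two modes described above could be surfaced through per-provider registration along these lines (the AddEntityFrameworkGrainStorage extension methods, option names, and MyGameDbContext below are hypothetical, not an API in this PR):

```csharp
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Hosting;

var connectionString = "...";                    // placeholder
var builder = Host.CreateApplicationBuilder(args);

builder.UseOrleans(silo =>
{
    // "Easy mode" (hypothetical API): JSON-serialized state in a provider-owned table.
    silo.AddEntityFrameworkGrainStorage("profiles", options =>
    {
        options.ConfigureDbContext = db => db.UseSqlServer(connectionString);
    });

    // "Advanced mode" (hypothetical API): a user-supplied DbContext with mapped state entities.
    silo.AddEntityFrameworkGrainStorage<MyGameDbContext>("game-state");
});

builder.Build().Run();
```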

@AndrewBabbitt97

AndrewBabbitt97 commented Dec 6, 2023

> @koenbeuk / @AndrewBabbitt97 thank you both for the feedback.
>
> The approach @koenbeuk suggested is more or less what I had before in production in a private project. Yes, the benefits you mention are clear. However, the developer is then forced to deal with EF directly by having to define the DbContext, register all state types, deal with migrations, and whatnot. This introduces some level of friction which we tried to avoid in this implementation.
>
> This implementation (changes coming soon) aims to behave the same way as the Azure Tables, Cosmos, and other providers that read/write the whole state at once. The grain state is supposed to be owned by the grain itself, not to be read/written elsewhere. That can cause serious problems with consistency which will potentially be hard to debug/fix.
>
> That being said, I do agree that having the power to allow the developer to customize the DbContext is a good thing, if they are willing to take on the responsibility of dealing with it.

I'm aware of the caveats with consistency if, say, someone goes and updates records of grains that have activations, and how this could cause data inconsistencies. Keep in mind the big reason and value I see in a more advanced setup is being able to use EF for read-type operations and for grain relationships/indexing for search purposes. While there have been attempts at things like this (like Orleans.Indexing), such projects haven't really been looked at much over the past few years and have mostly been research/pet projects.

The recommended approach to things like searching all grains has been to search the underlying data store, from what I can tell, and this makes sense. An EF provider opens up opportunities to do a lot more for state management and indexing of fields in grain state, along with things like migrations support.

There are footguns that come with this approach, plus the additional work of managing things like migrations; the risks and situations that users could end up in should be documented and considered.

> What about this:
>
>   1. Have an "easy mode", which is what is currently in the PR. It essentially serializes any state class to its JSON representation and stores it in a pre-defined table with that DbContext. This is precisely the out-of-the-box behavior of the current ADO.NET provider.
>   2. Have an "advanced mode", which allows the developer to register a custom DbContext type that will be used by the provider. In this model the only requirement is that the state type has some pre-defined properties like the grain type, grain id, eTag, etc.
>
> Both "modes" can be registered on a per-provider basis, since the storage provider is a named/keyed registration in the system, so you can mix and match.
>
> Also, this would only be something implemented in the storage provider. There is no point in having that complexity for the other providers.
>
> What do you guys think?

Huge fan of having two modes for this; I think it's very appropriate.

I would just like your thoughts on whether, for the "advanced mode", rather than using a "grain type" field on an entity, you could instead associate types to a specific DbSet<TEntity>? I'm not entirely sure what this would look like from a configuration perspective at the moment, but that's at least how I initially envisioned it working.

No other real concerns or thoughts on it beyond that, and I could see myself using both the easy and expert modes in different situations 😊

@galvesribeiro
Member Author

galvesribeiro commented Dec 6, 2023

> I'm aware of the caveats with consistency if, say, someone goes and updates records of grains that have activations, and how this could cause data inconsistencies. Keep in mind the big reason and value I see in a more advanced setup is being able to use EF for read-type operations and for grain relationships/indexing for search purposes. While there have been attempts at things like this (like Orleans.Indexing), such projects haven't really been looked at much over the past few years and have mostly been research/pet projects.

Yeah, indexing was a good experiment within MSR that proved it is possible to do with Orleans. I would hope at some point we get back to it and implement it as a first-class citizen.

If properly implemented, it would avoid the risks of having direct access to the underlying storage. But again, this is a discussion for another forum, and yes, I understand it would help a lot of different scenarios in regards to querying for grains by leveraging the storage's indexing capabilities.

> I would just like your thoughts on whether, for the "advanced mode", rather than using a "grain type" field on an entity, you could instead associate types to a specific DbSet<TEntity>? I'm not entirely sure what this would look like from a configuration perspective at the moment, but that's at least how I initially envisioned it working.

This approach is easy to implement if we have only one underlying database and only one context type, like, for example, the private project I had back in the day. The tricky part is how to allow the user-provided DbContext type to be used with the correct storage provider registration for the correct grain type in a generic fashion, as this should work for any arbitrary grain/state. It would be insane to have one DbContext per state type, and using reflection to figure out the correct DbSet<T> property is also not an option, as it would hurt performance.

I will have to experiment a bit to find a solution.
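
One possible way to avoid reflecting over DbSet<T> properties is EF Core's generic DbContext.Set<TEntity>() method, which resolves the set from the model for any mapped entity type. A minimal sketch with hypothetical names (EfStateReader, and a string GrainId as the key):

```csharp
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Hypothetical generic helper: works for any user-supplied DbContext and any
// entity type mapped in its model, with no reflection over DbSet<T> properties.
public class EfStateReader<TContext> where TContext : DbContext
{
    private readonly IDbContextFactory<TContext> _factory;

    public EfStateReader(IDbContextFactory<TContext> factory) => _factory = factory;

    public async Task<TState?> ReadAsync<TState>(string grainId) where TState : class
    {
        await using var db = await _factory.CreateDbContextAsync();
        // Set<TState>() is resolved from the EF model; FindAsync looks up by primary key.
        return await db.Set<TState>().FindAsync(grainId);
    }
}
```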

@ReubenBond
Member

Have the "advanced mode" which allow the developer to register custom DbContext type which will be used by the provider. In this model the only requirement is that the state type have to have some pre-defined properties like the grain type, grain id, eTag, etc.

I especially like EF Core for this. Even if the EF Core integration was not a typical IGrainStorage implementation, and was instead something else, this could be the best approach. What are the biggest challenges here?

@galvesribeiro
Member Author

No challenge, actually.

Just trying to find the best way to map and use the custom DbContext properly in the provider so we can support whatever structure the developer defines, without having to rely on reflection to figure out the correct DbSet<TGrainState> property.

Also trying to define a simple abstraction which enforces the GrainType/Id/ETag properties so the provider can work properly.
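
Such an abstraction could be as small as an interface that mapped state entities must implement; a hypothetical sketch (IGrainStateEntity and IEfGrainStateStore are illustrative names only, not from this PR):

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical minimal contract for user-mapped state entities.
public interface IGrainStateEntity
{
    string GrainType { get; set; }
    string GrainId { get; set; }
    Guid ETag { get; set; }
}

// The provider's generic surface could then be constrained to that contract,
// so it always knows how to key lookups and guard writes with the ETag.
public interface IEfGrainStateStore<TState> where TState : class, IGrainStateEntity
{
    Task<TState?> ReadAsync(string grainType, string grainId);
    Task WriteAsync(TState state);
}
```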

@veikkoeeva
Contributor

veikkoeeva commented Dec 9, 2023

> The current ADO providers do a lot of aggressive locking, which caused issues under high load; the providers here don't do that.

They can do whatever. Aggressive locking is needed for the case where the grain does not exist and two grains are creating a state (if I recall correctly), but lighter-weight locking could be used, and is used, when the lighter route is taken. Just to point this out.

@AndrewBabbitt97

> Just trying to find the best way to map and use the custom DbContext properly in the provider so we can support whatever structure the developer defines, without having to rely on reflection to figure out the correct DbSet<TGrainState> property.

Not sure how much you have been thinking about this, given the holidays (Happy New Year!).

Thinking about it, however: if the goal is to avoid reflection, perhaps a source generator would be a better approach?
