Filter migrations #438
base: main
Conversation
Conflicts: main.go pkg/dbmate/db.go
specify a range of migrations using `...` syntax; also support positive and negative offsets
@dossy please review
Before reviewing the code itself, I'd like to understand the requirement driving the change.
My opinion is that those users are trying to use a database migration tool as a means to execute ad-hoc SQL against their database schema, and that's not what a schema migration tool is meant to be used for. As schema migrations express transitions in schema state, they are context-dependent, and we shouldn't create the expectation that executing them in an arbitrary order should always work. My personal workflow looks like this: …
Following a different process can lead to migrations that may produce different resulting schema states, and that violates the primary use case for a database migration tool.
Great!
We also shouldn't create an expectation that the time-based ordering implies a hard dependency. dbmate provides no way to track the dependencies of schema changes, and different schema changes with no dependency on each other are often committed at the same time. There is prior art here: sqitch has been around a long time and tracks dependencies. dbmate chose not to, so the onus is on the user to figure out how to apply them appropriately.

You are describing the development workflow. I like your workflow, and I think those who are in a position to follow it should. The main issue I see with it is that once there are hundreds of migrations, one needs to ensure they are compressed into something like a schema.sql. But the existing schema.sql is inadequate.

My use case for selecting migrations is less about development and more about production. I might have a couple of different migrations that perform a schema change, add an index, or delete or update some data. Maybe the delete/update is an ad-hoc SQL query according to you, but I need to ensure these are performed in a controlled and reviewed manner, potentially rolled out to a staging environment and applied locally as well, and a migration system works quite well for this. Also note that in the discussion about schema.sql being inadequate, we discussed the concept of "schema data" that is necessary for the application to function properly; this data needs alterations (DML) after the schema (DDL) is in place.

In a zero-downtime setup, none of the migrations should be required to be run immediately before a deploy, creating downtime until they are applied. Some of the migrations may need to be run before the new deployment. Some may be able to be run after the new deployment (for example, adding indexes). I may want to wait for a lower-traffic period before adding indexes, or before applying a schema change that is blocking. The only way I have figured out to accomplish these workflows with the existing dbmate is to delete/move migrations that I don't want applied.
Actually, that's exactly the promise and guarantee that dbmate makes to you: that it will apply migrations in a time-based ordering, if you use a numerically sortable string representation of time as the version numbering scheme. This is a first-class feature of dbmate ("Timestamp-versioned migration files"). It's what sets it apart from some of the other schema migration tools available. It's why I chose dbmate over sqitch, sql-migrate, laravel and even prisma, for schema migrations for my projects. I would really prefer to not turn dbmate into sqitch. If you want sqitch's features, why not use sqitch, instead of using dbmate and wishing it were sqitch?
Why? From a freshly created development database, applying literally hundreds of migrations takes tens of seconds. In higher environments like staging or production that already contain data, you really should not be applying more than 1 or 2 new migrations at a time, if you Release Early, Release Often.
Here, I agree with you: in my opinion, dbmate really isn't the right tool for schema management for applications that require zero-downtime schema changes. This is part of the motivation for #424, being able to specify multiple migrations directories. I think extending dbmate to be capable of applying schema changes with the full functionality of …
Yes, having multiple migrations directories was largely how I solved the problem, which is why #424 was so important to me. I wouldn't consider it unreasonable to have a migrations folder structure that might look something like:
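Purely as an illustration (the folder names here are hypothetical, and pointing dbmate at more than one directory assumes the multi-directory support discussed in #424, so the exact flag usage may differ):

```sh
# Hypothetical layout:
#
#   db/migrations/
#     pre-deploy/    # schema changes that must land before the new code ships
#     post-deploy/   # e.g. index builds that can wait for a low-traffic window
#     dev-only/      # seed data applied only in development

# Production, ahead of a deploy -- apply only the pre-deploy directory:
dbmate --migrations-dir ./db/migrations/pre-deploy up

# Development -- apply everything in one go:
dbmate --migrations-dir ./db/migrations/pre-deploy \
       --migrations-dir ./db/migrations/post-deploy \
       --migrations-dir ./db/migrations/dev-only up
```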
And, you can selectively apply the migrations from one or more of those directories, as needed. In development, you can let dbmate just apply all of them in one go, but when deploying to production, perhaps you only use dbmate to apply the migrations in some of those folders.

Either way, I agree, dbmate isn't well-suited for managing production database schemas when the underlying DBMS doesn't have great support for applying online DDL operations. Trying to enhance dbmate to compensate for the deficiency of the underlying DBMS feels like massive scope creep for dbmate to me, and might not even be possible to fully achieve, anyway. That's why I chose to use different tooling to solve that problem, while still being able to use dbmate for the non-production environments.
One kind of potentially wild idea I had was implementing support for migrations to contain either SQL or shell scripts, and having dbmate execute the shell script in sequence based on the version numbering. For example, dbmate would recognize that a migration file named …

As I haven't personally needed this much flexibility -- yet -- I haven't been motivated to implement it, but it was something I seriously considered when solving some of my own needs. Something to evaluate further, perhaps.
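A sketch of what such a layout could look like (entirely hypothetical; dbmate does not support this today):

```sh
# One migrations directory mixing SQL and executable migrations,
# all ordered by the same version prefix:
#
#   db/migrations/
#     20240101000000_create_users.sql    # applied as SQL, as today
#     20240102000000_backfill_users.sh   # would be executed as a script, in version order
#     20240103000000_add_index.sql
```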
Applying migrations in a time-based ordering is just that. It does not mean that migration #5 has a hard dependency on migration #4. It just means that dbmate will apply #4 before #5. Where the actual dependencies exist is only known to the operator of the migrations.
This is the way most schema migration tools seem to work today. Some might use an integer, but a timestamp is just an integer with less coordination (which can be a good thing). I don't think dbmate is set apart here- it is doing the simplest thing possible (for better and worse)- and what it is doing is very common now.
Why would we want to wait tens of seconds? Developers may not run this workflow a lot, but CI does, and that gets run constantly.
So this opened with a lot of arguments against selecting migrations, but it is actually already known that migrations need to be selected.
This is where the conversation is getting silly. The need has been acknowledged. I am a Postgres user, and I can't have this feature because I am supposed to use a MySQL tool? Postgres has some migrations that can be done in a non-blocking way (e.g. …).
It's also worth noting that TiDB is MySQL compatible and would work with dbmate, and that almost all migrations done with TiDB would be online.
With respect to extensibility, exec takes total control over something, which limits extensibility. Outputting the migration content as JSON would be most extensible. Simpler in this case might be a switch to output just the up or the down.
It's worth noting how orthogonal this feature is to all other workflows -- for example, the suggestion about working with other tools. If I were to use a different tool to apply a migration in production, I would want to be able to use the `-m` flag to select a single migration to give to the other tool, along with a …
As mentioned in the PR description, I think there is a pretty clear request from users for at least the following features:
Agree with @dossy: we don't want to turn dbmate into something complex by tracking dependencies etc.; other tools do that. Dbmate likes to keep things simple. However, I do agree the above 3 features would be nice to support.

I originally liked the range syntax proposed, but I thought about it some more and I think it's a bit weird for the rollback case. I guess I would have to do … The range syntax would also get a bit weird if we added support for the period character in migration versions, e.g. to support #371.

Here's another idea (open to bike shedding the flag names):

```sh
dbmate migrate               # existing: migrate to latest version
dbmate rollback              # existing: rollback one version
dbmate migrate --only 0003   # apply only this version
dbmate rollback --only 0003  # rollback only this version
dbmate migrate --to 0003     # migrate up to this version (inclusive)
dbmate rollback --to 0003    # roll back to this version (exclusive, i.e. the specified version will become the current applied version)
dbmate rollback --all        # special case to rollback all migrations
dbmate migrate --all         # maybe we should support this too for consistency (it wouldn't do anything since "all" is the default)

# support these flags for `dbmate up` and `dbmate down` too
```

I think this solves all 3 use cases while keeping a fairly simple CLI. Anything I'm missing?
I guess there is a 4th request, coming from #256, which is to migrate or rollback a certain number of migrations, without regard for migration versions (e.g. "rollback 2 migrations"). I'm not sure what the expected behavior would be if not all available migrations were applied. We could maybe support it with an additional flag:

```sh
dbmate migrate --limit 2   # apply two migrations
dbmate rollback --limit 2  # rollback two migrations
```

But I don't know if this is a good idea. What would happen in the following case when I run migrate or rollback with …
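For illustration, the kind of state in question might look like this (hypothetical migrations; status output format approximate):

```sh
$ dbmate status
[X] 20240101000000_create_users.sql
[ ] 20240102000000_add_index.sql      # pending, but older than an applied migration
[X] 20240103000000_add_column.sql
[ ] 20240104000000_backfill.sql

# Does "dbmate migrate --limit 2" apply the two oldest pending migrations
# (...0102 and ...0104)? Does "dbmate rollback --limit 2" roll back the two most
# recently applied versions (...0103 and ...0101), skipping the pending ones?
```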
I think maybe we should leave this use case out for now, and see if people can survive by specifying the target version to migrate/roll back to.
The way I wrote it, this feature is for migration filtering. It filters regardless of whether the migration is applied or unapplied. It is important to operate that way. It allows you to use it with …

As implemented in this PR it would use: …

This single flag I started calling "migration filtering" (I was considering changing the flag name to be …).
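To make this concrete, a hedged sketch of how the filter described in this PR might be invoked (placeholder version strings; treat the PR's code and test cases, not this sketch, as authoritative for the exact flag name, quoting, and placement):

```sh
# Preview exactly which migrations a filter selects, applied or unapplied:
dbmate -m "20240101000000...20240103000000" status

# The same filter can then be passed to migrate or rollback:
dbmate -m "20240101000000...20240103000000" rollback
```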
There are 2 concerns here. One is whether the syntax is intuitive. At first it doesn't seem to match the thinking of down, but I went for uniformity with up and down. In this way the same filter can be used with the …

The second concern would be inclusive versus non-inclusive ranges. I don't think this is a big problem, because you can always start with …
tern has a single migrate command. This solves the issues we are talking about with the difference between up/down. I do think, though, that there is an issue around using this approach when there are unapplied migrations. In this scenario: … with a bi-directional migrate command, …

One might think that up/down-only commands or flags would be the obvious solution, but note that tern supports migrating down and re-running migrations in a single command with …

I do think it is reasonable to split these up into separate commands (migrate, rollback, remigrate) and use a …
Merging this does not preclude having … I can change the flag to …
I know this is only my opinion, but I feel strongly that any schema migration tool that allows this state to occur is either broken-by-design or is being improperly used. Migrations are a linear journal of operations that mutate the system from one state to another. Schema versions must be strictly monotonically increasing.

The primary use case of a schema migration tool is that it makes the hard guarantee that if migrations are applied in order, the system will resolve to the same state each time. A tool which is used for, or otherwise allows, applying migrations in an arbitrary order cannot guarantee that the resulting state will be identical.

I feel it's a mistake to implement or introduce features that make it easier for users to misuse the tool, either intentionally or, worse, accidentally, or that encourage users to misuse it. The tool's design should encourage proper practices, discourage misuse, and prevent abuse, as much as reasonably possible.

If others feel strongly opposed to my views, I'd like to hear from anyone who cares to share their reasoning. Ultimately, a good tool should also meet its users' needs, and I'm only one user out of many, but I hope sharing my views can help explain what I'm looking to get out of dbmate when I choose it.
@dossy fundamentally this tool is not designed the way you want it to be designed. You can use it the way you want, and I can use it the way I want.

I am using it in a team scenario. People are making branches and making changes. When a branch is started, a prefix is chosen for a migration. I could require that the prefix be updated, but at some point there can still be 2 pull requests with migrations, and I would have to merge one and then update the prefix on the other. Even then, it's possible at deployment time to perform a point release with the newer migration first; now I would need to merge a new pull request to rename an existing migration. Then maybe we decide to roll back the point release due to some issue; now I need to roll back that PR to change the prefix. In the meantime the local developers need to reset their schema. Obviously I am going from common occurrences to less common ones, but the point is that all of this is a bunch of make-work to satisfy an ideal that you have.

Applying the migrations as-is, in any order, works just fine with transactional migrations in Postgres: if a migration fails, it gets rolled back. Obviously, for data migrations or non-transactional migrations there would be more concerns here. However, in practice I haven't once had an issue with migration dependencies, whereas if we followed your philosophy we would be doing prefix renaming on a weekly basis. If you are worried about migration safety, the biggest threat to safety would be if it were unclear to users whether or not migrations are going to get rolled back!
I don't understand how that addresses the key issue I'm describing. You're saying that your team regularly operates in an environment where newly created migrations are introduced between previously applied migrations? You're saying that your team regularly operates in an environment where the order in which migrations are defined differs from the order in which they are applied in each database across your platform? How do transactional migrations make this work?

If migration 1 is "create table foo" and that is applied, and then a new migration 2 is introduced that is "alter table foo add column bar" and this is applied, but then someone introduces a new migration 1.5 that is "drop table foo" and you apply that out of order in production, it "works" -- but anyone who replays all the migrations from an empty initial state in order (1, 1.5, 2) will end up with an error when migration 2 is applied after 1.5, because the table it refers to no longer exists.

I feel bad for any team who regularly operates this way. I cannot see how this is tenable or how it doesn't result in constant frustration. How do transactional migrations solve this problem?
Transactional migrations solve the problem to the extent that, when there is a migration failure, things won't be left in an invalid state. So it's simple at that point to fix things if there were a problem running a migration for any reason, including if it were accidentally run out of order. With that being said, no one on my team has ever had an issue with migrations being run out of order. The reason is that when we do new work on branches with migrations, the migrations never depend on each other. By the time there is a branch with a migration, any migrations it depends on have already been applied. The only time we have unapplied dependencies is on the same branch.
Although this is couched in your own imagination, this statement represents the tenor of your comments here, and they are really out of line with being a good maintainer. It's the job of a maintainer to focus on understanding the users rather than lashing out at them with emotional statements. I appreciate being asked to explain my workflow, but I have done it multiple times now in various ways, and at this point it doesn't seem that you are actually curious about my workflows but instead focused on figuring out how to declare them wrong.
I am trying really hard to understand your workflow and, as you've described it, how you make it work. The key value proposition of a schema migration tool is acknowledging that having developers apply ad-hoc schema changes in arbitrary order leads to problems such as version drift between environments and makes testability difficult, as you don't have a consistently repeatable process.

I'm not interested in declaring what you do as being wrong. I'm still struggling to see how what you've described can be done without creating avoidable problems, and I'm genuinely puzzled and trying to figure out the right question to ask to get an answer that helps me understand it. If I had already decided what you're doing is wrong, I wouldn't bother continuing to ask questions for which I already knew the answer. But I am clearly missing something here, so I continue to ask questions as best I can.

But, you are right: I am not a good maintainer. I know this, and I appreciate you having the candor to say so. I stepped up as a volunteer to participate as a maintainer in this project as an opportunity to learn how to become a better maintainer. A good way to learn is by doing, or at least trying. I also know that you don't owe me anything, and I appreciate the time you've already spent answering my questions; you have no obligation to help me become a better maintainer, so I appreciate whatever you're willing to share.
I appreciate what you are doing to move this project forward: you are being a great maintainer in all other aspects. There are just some things we can improve on communication here.

What I explained should be pretty clear: whenever we create a branch, all dependencies of the migration on the branch have already been applied. It is logically impossible for there to be any missing-dependency issues in this scenario when creating a single migration, regardless of the merge order of branches and the application order of commits. If someone creates 2 migrations in one branch (which happens more rarely), the second might depend on the first; however, those two will still be ordered with respect to each other.

Now, it's true that logically there can be conflicts (dependencies aren't missing) where two branches attempt to migrate the same thing. But such conflicts are resolved pre-merge, where, depending on how your CI operates, the migration would fail or the SQL for the schema would conflict, etc. It's also worth noting that I don't know if we have even seen such conflicts other than with pull requests that have been abandoned for months. I don't know what workflows you are using, but it seems like you might not have a CI that is able to validate migrations?
We do, but the CI always starts with a clean, empty database and applies all the migrations in the branch in order. This is the "first checkpoint" -- if the migrations fail to apply successfully, it's generally the result of a newly introduced migration that has a defect, and it gets fixed.

However, it is possible for two or more branches to diverge, and depending on the order in which those branches are merged and the version numbers of the migrations from those branches, they may both apply cleanly in the CI environment, where each is applied to an empty database, but could result in migrations being applied out of order in the upstream branch. This is why the introduction of the …

An environment that allows different environments to have migrations applied in differing orders leaves me wondering if you have some other external process that periodically does a full schema comparison between environments to prove that they match. Otherwise, how do you avoid unintentional schema version drift between environments, if you allow migrations to be applied in different orders in different environments? That's what I'm trying to understand. Either you are accepting the risk that your schemas may differ between environments, which means testing in one may not reveal problems that will manifest in another, or you have some additional process to enforce consistency between environments?
It isn't actually possible for the migrations to be out of order if both branches are branched off of the same base branch. That is, if there is an issue, it isn't an ordering (dependency) issue; it is a conflict. But how are the migrations passing CI and then failing in the upstream branch? Is this because branches are being merged when they are not up to date?
@Retch This is still an open pull request. If you want to test it, you'll need to build dbmate from this PR branch.
@dossy do you use git flow or trunk-based development? |
I use GitHub flow or branch-based development. |
@dossy pretty much everyone uses branch-based development, at least everyone that uses GitHub and has more than one contributor. With trunk-based development there is only one special long-lived trunk branch (main). Releases are points on main and, if need be, temporary branches off a point of main. Which approach are you taking?
@Retch I merged the latest into my branch. I use the migration filtering feature all the time, give it a try and report back.
The latter: multiple long-lived branches, plus feature branches.
@dossy I think that explains the difference in perspective. With git flow there are lots of merges going on, and sometimes different entry points for new code and thus merges in both directions. Although I think it is too complicated for normal code maintenance, migrations could create some of the worst issues. With trunk-based development, none of the issues you are concerned about end up being issues.

I don't think we should design software with the primary constraint that it should work with git flow development. I believe git flow is a worse approach theoretically, as this discussion shows. In practice GitHub doesn't properly support trunk-based development but does properly support git flow, so I do understand why one would end up using git flow with GitHub.

Right now this PR is in the state of being held back by objections based around personal choices of how to manage branching strategy and the assumption that users cannot intelligently use this feature. I am more than happy to add some warnings to the documentation.
I see why you say this, but I'd like to state that I am not holding back this PR based on my objections. As my personal working style has no need for this PR, I have no horse in this race. I've shared my opinion for others to consider, but am recusing myself from making any decisions about this PR. Someone who is properly qualified to judge this PR's appropriateness should chime in here and carry this forward. In hindsight, I probably should have done this from the start, but since you tagged me specifically asking for a review, I shared my opinions about the PR; I'm not qualified to decide if it should be merged or not.

Edited to add: I'd like to personally apologize to Greg for having wasted so much of his time asking questions about this PR, when I really had no business getting involved in the first place. Because it's so easy to share an opinion, I open my mouth (or type with my fingers) when I really should refrain from speaking. It's a character flaw I know I need to work on, and it's even harder online because of how easy it is to just throw out a few cents without thinking it through more carefully. People close to me have said I "need to learn when to just shut up and keep your thoughts to yourself." I'll remember this as another instance where I've struggled to do just that, and hope to do better in the future.
I don't know whether I'm using it wrong, but I can't get it to work properly. These are my migrations, just a test project: …

I dropped the db and executed all migrations. As I understand it, with the `-m -3` flag, when rolling back, the last 3 migrations should get rolled back, but it seems only the last migration is rolled back, not the 3 most recent. I also tested specifying the fourth-most-recent migration directly and rolling back the 3 most recent, but only the last migration was rolled back. Maybe I'm using it wrong.
There may be some issues to work out with the range …
Overview
Many users want to tell dbmate to run specific migrations. Issues have been opened, as well as the following merge requests: …
None of these actually satisfied my use case, so I opened the following PR:
The nice thing about #434 is it works for all commands, including status, allowing one to preview what will be done. However, it doesn't satisfy everyone else's use case properly.
Solution
This PR builds on top of #434 but additionally adds a range syntax, so one can specify `-m version1...version3`. Additionally, it supports positive and negative numerals as position arguments, so to select the last 2 migrations one can use `-2...`. Please see the test cases for more examples.
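For illustration, the two forms described above might look like this in practice (placeholder version strings; see the test cases for the authoritative syntax, quoting, and flag placement):

```sh
# Explicit range of versions:
dbmate -m "20240101000000...20240103000000" migrate

# Negative offset: operate on the last 2 migrations only:
dbmate -m "-2..." migrate
```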
TODO
wait