[batch] A procedure to rename job_groups_cancelled.id -> job_groups_cancelled.batch_id #14672

ichengchang · 2024-09-06T16:59:24Z

The rename_job_groups_cancelled_column.sql file renames the job_groups_cancelled.id column to job_groups_cancelled.batch_id. The script also updates all constraints that reference the original column to use the new column name.

I have reviewed other tables and found no foreign keys referencing the job_groups_cancelled table.
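A check like this can be scripted against MySQL's `information_schema` views. The sketch below is an illustration, not code from this PR: `find_dependents` is a hypothetical helper that runs the queries through any DB-API cursor (for example, a pymysql cursor). The trigger and routine lookups are plain text matches, so expect some false positives.

```python
from typing import Any, Dict, List, Sequence

# Foreign keys declared on other tables that point at the target table.
FK_QUERY = """
SELECT TABLE_NAME, COLUMN_NAME, CONSTRAINT_NAME
FROM information_schema.KEY_COLUMN_USAGE
WHERE REFERENCED_TABLE_SCHEMA = %s AND REFERENCED_TABLE_NAME = %s
"""

# Triggers whose body mentions the table name (plain text match).
TRIGGER_QUERY = """
SELECT TRIGGER_NAME, EVENT_OBJECT_TABLE
FROM information_schema.TRIGGERS
WHERE TRIGGER_SCHEMA = %s AND ACTION_STATEMENT LIKE %s
"""

# Stored procedures and functions whose body mentions the table name.
ROUTINE_QUERY = """
SELECT ROUTINE_NAME, ROUTINE_TYPE
FROM information_schema.ROUTINES
WHERE ROUTINE_SCHEMA = %s AND ROUTINE_DEFINITION LIKE %s
"""


def find_dependents(cursor: Any, schema: str, table: str) -> Dict[str, List[Sequence[Any]]]:
    """Collect foreign keys, triggers and routines that reference `table`.

    `cursor` is any DB-API cursor using the %s paramstyle (pymysql does).
    `_` is a LIKE wildcard, so the text matches may over-report.
    """
    pattern = f"%{table}%"
    checks = (
        ("foreign_keys", FK_QUERY, (schema, table)),
        ("triggers", TRIGGER_QUERY, (schema, pattern)),
        ("routines", ROUTINE_QUERY, (schema, pattern)),
    )
    results: Dict[str, List[Sequence[Any]]] = {}
    for kind, sql, args in checks:
        cursor.execute(sql, args)
        results[kind] = list(cursor.fetchall())
    return results
```

If the finding above holds, `find_dependents(cursor, "batch", "job_groups_cancelled")` should return an empty `foreign_keys` list, while `triggers` and `routines` list the objects a rename would have to touch.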

All queries that previously used job_groups_cancelled.id have been updated to reference job_groups_cancelled.batch_id accordingly.

Resolve #14646

ehigham

Hi Ivan! Thanks so much for picking this up :) I don't have much experience with the batch system, but I'll try my best to give accurate feedback. I have a few questions/observations up front:

Your changes to stored procedures under batch/sql make me a little nervous.

Most of these are migrations applied in the order defined in the build step mentioned in [NOTE 1] except estimated-current.sql [NOTE 2].

I don't think changing these will have the desired effect and may make it impossible for someone to reproduce the database.
The only changes to existing sql you'll need to make are in the sql strings in python code.

This needs to be written as a migration and maybe could be simplified?

I think this needs to be done as a database migration. We'll have no need for a stored procedure once complete.
You can assume current columns and constraints exist, dispense with the error checking and simplify.
Can you convert this to a sql script and add it to the end of the list of migrations in build.yaml? You'll probably want online: false too.
I fear you'll have to take inspiration from rename-job-groups-tables.sql by applying one ALTER TABLE command, then dropping and recreating EVERYTHING that references that name (constraints, triggers, procedures, etc.).
This will likely involve copy+paste and rename.
Alternatively, create, execute then drop the procedure within rename-job-groups-cancelled.
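The shape of such a migration (one ALTER TABLE, then a drop and re-create of each dependent object) can be sketched as a small Python helper that assembles the statements for the new migration file. Assumptions: `build_rename_migration` and `DependentObject` are hypothetical. The recreated definitions have to be copied by hand from whichever migration last defined each trigger or procedure. `RENAME COLUMN` requires MySQL 8.0+ (older servers need `CHANGE COLUMN` with the full column definition). Multi-statement bodies may still need `DELIMITER` handling, depending on how the script is executed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DependentObject:
    kind: str        # "TRIGGER" or "PROCEDURE"
    name: str
    definition: str  # full CREATE statement, copied and edited to use the new column name


def build_rename_migration(table: str, old: str, new: str,
                           dependents: List[DependentObject]) -> List[str]:
    """Return the statements of a column-rename migration in execution order."""
    stale = f"{table}.{old}"
    statements = [f"ALTER TABLE {table} RENAME COLUMN {old} TO {new};"]
    for obj in dependents:
        # Guard against copy+paste slips: a recreated body must not use the
        # fully qualified old name (aliased references are not detected here).
        if stale in obj.definition:
            raise ValueError(f"{obj.name} still references {stale}")
        # Drop first so the CREATE below does not collide with the existing object.
        statements.append(f"DROP {obj.kind} IF EXISTS {obj.name};")
        statements.append(obj.definition)
    return statements
```

The rename comes first and every dependent object is replaced afterwards, matching the order suggested above. The resulting statements would go into a new .sql file appended to the migration list in build.yaml.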

[NOTE 1] migration applied in build.yaml

The relevant build step in build.yaml can be found by searching for the entry starting with the yaml below. This controls which migrations are applied and in what order.

kind: createDatabase2
name: batch_database
databaseName: batch

[NOTE 2] estimated-current.yaml

I don't agree with why we have this. It would be nice to generate this automatically. Anyway, please keep your changes to this file as it's meant for documentation purposes only. None of it is applied and who knows how much of it works.

ichengchang · 2024-09-06T20:39:55Z

> This needs to be written as a migration and maybe could be simplified?

Got it! I wasn't sure how Hail usually handles schema updates, but your description above makes the process much clearer to me. Here's my second try:

Updated build.yaml in the batch database migrations section.
Simplified the SQL in rename-job-groups-cancelled-column.sql.

> [NOTE 2] estimated-current.yaml

Do you mean estimated-current.sql rather than estimated-current.yaml above?

ichengchang · 2024-09-06T20:54:58Z

@ehigham Another question: how does the schema update enforce a particular order of operations?

The rename-job-groups-cancelled-column.sql migration should run before the other SQL files that depend on the renamed column in the job_groups_cancelled table, correct?

batch/sql/finalize-job-groups.sql

batch/sql/rename-job-groups-tables.sql

ehigham · 2024-09-09T20:44:32Z

> Do you mean estimated-current.sql rather than estimated-current.yaml above?

Yes, sorry for the confusion.

> Another question: how does the schema update enforce a particular order of operations?
>
> The rename-job-groups-cancelled-column.sql migration should run before the other SQL files that depend on the renamed column in the job_groups_cancelled table, correct?

Migrations are applied successively. You cannot edit a previous migration or change the order in which they're applied, because they've already been applied to the production database.
That's why I said this:

> I fear you'll have to take inspiration from rename-job-groups-tables.sql by applying one ALTER TABLE command, then dropping and recreating EVERYTHING that references that name (constraints, triggers, procedures, etc.). This will likely involve copy+paste and rename.

I think you need to find any trigger or stored procedure that references that column, drop it, and recreate it with the field renamed. It's a little scary.
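To illustrate why an already-applied migration can't be edited or reordered, here is a minimal sketch of an append-only migration runner. It is not Hail's actual implementation. The `Migration` type, the checksum comparison and the in-memory `applied` history are assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Migration:
    name: str
    script: str


def checksum(script: str) -> str:
    return hashlib.sha256(script.encode("utf-8")).hexdigest()


def apply_migrations(migrations: List[Migration],
                     applied: List[Tuple[str, str]],
                     run: Callable[[str], None]) -> List[Tuple[str, str]]:
    """Apply pending migrations in list order and return the updated history.

    `applied` holds (name, checksum) pairs already recorded for the database.
    It must be an unchanged prefix of `migrations`; editing or reordering an
    earlier entry is rejected instead of silently diverging from production.
    """
    if len(applied) > len(migrations):
        raise ValueError("database has migrations missing from the list")
    for (name, digest), m in zip(applied, migrations):
        if name != m.name or digest != checksum(m.script):
            raise ValueError(f"applied migration {name!r} was edited or reordered")
    history = list(applied)
    for m in migrations[len(applied):]:
        run(m.script)  # e.g. execute the SQL against the database
        history.append((m.name, checksum(m.script)))
    return history
```

Under this model, the only way to change behaviour that an earlier migration established is to append a new migration that redefines it, which is exactly the drop-and-recreate approach described above.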

ichengchang · 2024-09-10T14:26:48Z

> I think you need to find any trigger or stored procedure that references that column, drop it, and recreate it with the field renamed. It's a little scary.

@ehigham Thanks for your comments above. I've added drop-and-recreate statements for the triggers and stored procedures that reference the job_groups_cancelled table to rename-job-groups-cancelled-column.sql.

I was initially confused by estimated-current.sql: I thought it was a system-generated file that tracks the latest batch DDL after a schema update, rather than a manually maintained file. After reading this thread, I completely agree with your point. In other organizations I've worked with, we kept schema changes in a separate folder, grouped by release version (e.g., semver), with the DDL files ordered by sequence number. That gave us a clear history of the DDL and the order in which it was applied, and eliminated the need for files like estimated-current.sql.
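As a rough illustration of the versioned layout described above, migration files could be named `<semver>/<sequence>_<description>.sql` and applied in sorted order. The naming scheme and the `migration_sort_key` helper are hypothetical. The point the sketch makes is that versions and sequence numbers must be compared numerically, not as strings (as strings, "1.10.0" sorts before "1.2.0").

```python
import re
from typing import List, Tuple

# Hypothetical layout: "<major>.<minor>.<patch>/<sequence>_<description>.sql"
PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+)/(\d+)_[\w-]+\.sql$")


def migration_sort_key(path: str) -> Tuple[int, int, int, int]:
    """Turn a migration path into a numeric (major, minor, patch, sequence) key."""
    match = PATTERN.match(path)
    if match is None:
        raise ValueError(f"unexpected migration path: {path}")
    major, minor, patch, seq = (int(g) for g in match.groups())
    return (major, minor, patch, seq)


def ordered(paths: List[str]) -> List[str]:
    """Return migration paths in the order they should be applied."""
    return sorted(paths, key=migration_sort_key)
```

For example, `ordered(["1.10.0/001_b.sql", "1.2.0/002_c.sql", "1.2.0/001_a.sql"])` puts both 1.2.0 files (001 then 002) ahead of the 1.10.0 file.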

I just have one question: do we need to manually update estimated-current.sql with the schema changes from rename-job-groups-cancelled-column.sql?

ehigham · 2024-09-10T16:55:42Z

> I just have one question: do we need to manually update estimated-current.sql with the schema changes from rename-job-groups-cancelled-column.sql?

Yes. estimated-current.sql is an estimated current schema for documentation purposes only. Please update it to reflect the state of the database once your migration has been applied.

ehigham · 2024-09-10T19:09:23Z

I think you missed a reference in the Python function delete_prev_cancelled_job_group_cancellable_resources_records.

ichengchang · 2024-09-10T20:17:32Z

> I think you missed a reference in the Python function delete_prev_cancelled_job_group_cancellable_resources_records.

Good catch! I’ve fixed it and also updated the id field referenced by the alias cancelled_t to batch_id in a few places.
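Because the old column can be reached through table aliases, a plain search for `job_groups_cancelled.id` misses some uses. Below is a rough, regex-based sketch for flagging leftover `<alias>.id` references in SQL strings. It is a heuristic, not a SQL parser: it assumes aliases are introduced as `job_groups_cancelled [AS] alias`, and the helper names are hypothetical.

```python
import re
from typing import List, Set

# Words that can follow a table name without being an alias.
NOT_ALIASES = {"on", "where", "join", "inner", "left", "right", "cross", "natural",
               "straight_join", "using", "set", "values", "group", "order", "limit", "for"}


def aliases_for(sql: str, table: str) -> Set[str]:
    """Return the table name plus any alias introduced as `table [AS] alias`."""
    names = {table}
    pattern = rf"\b{re.escape(table)}\b(?:\s+AS)?\s+(\w+)"
    for match in re.finditer(pattern, sql, flags=re.IGNORECASE):
        alias = match.group(1)
        if alias.lower() not in NOT_ALIASES:
            names.add(alias)
    return names


def stale_column_refs(sql: str, table: str, column: str) -> List[str]:
    """Return qualified references such as `alias.column` that use the old column name."""
    refs: List[str] = []
    for name in sorted(aliases_for(sql, table)):
        refs += re.findall(rf"\b{re.escape(name)}\.{re.escape(column)}\b", sql)
    return refs
```

Running `stale_column_refs(query, "job_groups_cancelled", "id")` over each SQL string in the Python code would flag leftovers such as `cancelled.id`.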

ehigham · 2024-09-13T18:13:46Z

Still seeing this error in the deploy_batch job:

utils.py	retry_long_running:923	in delete_prev_cancelled_job_group_cancellable_resources_records
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/hailtop/utils/utils.py", line 915, in retry_long_running
    return await f(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/hailtop/utils/utils.py", line 959, in loop
    await f(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/batch/driver/main.py", line 1485, in delete_prev_cancelled_job_group_cancellable_resources_records
    async for target in targets:
  File "/usr/local/lib/python3.9/dist-packages/gear/database.py", line 334, in execute_and_fetchall
    async for row in tx.execute_and_fetchall(sql, args, query_name):
  File "/usr/local/lib/python3.9/dist-packages/gear/database.py", line 257, in execute_and_fetchall
    await cursor.execute(sql, args)
  File "/usr/local/lib/python3.9/dist-packages/aiomysql/cursors.py", line 239, in execute
    await self._query(query)
  File "/usr/local/lib/python3.9/dist-packages/aiomysql/cursors.py", line 457, in _query
    await conn.query(q)
  File "/usr/local/lib/python3.9/dist-packages/aiomysql/connection.py", line 469, in query
    await self._read_query_result(unbuffered=unbuffered)
  File "/usr/local/lib/python3.9/dist-packages/aiomysql/connection.py", line 683, in _read_query_result
    await result.read()
  File "/usr/local/lib/python3.9/dist-packages/aiomysql/connection.py", line 1164, in read
    first_packet = await self.connection._read_packet()
  File "/usr/local/lib/python3.9/dist-packages/aiomysql/connection.py", line 652, in _read_packet
    packet.raise_for_error()
  File "/usr/local/lib/python3.9/dist-packages/pymysql/protocol.py", line 219, in raise_for_error
    err.raise_mysql_exception(self._data)
  File "/usr/local/lib/python3.9/dist-packages/pymysql/err.py", line 150, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.OperationalError: (1054, "Unknown column 'cancelled.id' in 'on clause'")

ichengchang · 2024-09-16T15:21:06Z

> Still seeing this error in the deploy_batch job:
>
> pymysql.err.OperationalError: (1054, "Unknown column 'cancelled.id' in 'on clause'")

This error strikes me as odd because cancelled.id has been updated to cancelled.batch_id in delete_prev_cancelled_job_group_cancellable_resources_records:

batch/driver/main.py

Based on the error, it looks like the main.py being executed at /usr/local/lib/python3.9/dist-packages/batch/driver/main.py is still the old version of the code, i.e., the changes from this PR were not picked up in the deployed environment. Is it possible that we are missing a pip install step that ensures the latest code is deployed?
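One way to confirm which copy of a module the running interpreter resolves, and whether that copy contains the updated column name, is to look up the module's file with `importlib.util.find_spec` and search its source. This is a minimal sketch: the helper name is hypothetical, and it only works for modules installed as readable .py files. Note that for a dotted name such as `batch.driver.main`, `find_spec` imports the parent packages first.

```python
import importlib.util
from pathlib import Path
from typing import Tuple


def module_source_contains(module_name: str, needle: str) -> Tuple[str, bool]:
    """Return the file a module resolves to and whether its source contains `needle`."""
    spec = importlib.util.find_spec(module_name)
    # Built-in and frozen modules have no source file on disk to inspect.
    if spec is None or not spec.has_location or spec.origin is None:
        raise ModuleNotFoundError(f"no source file found for {module_name!r}")
    path = Path(spec.origin)
    return str(path), needle in path.read_text(encoding="utf-8")
```

Inside the deploy container, `module_source_contains("batch.driver.main", "cancelled.batch_id")` should report the same path as the traceback; a `False` result would support the stale-install theory.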

A procedure for renaming a column (non-primary) in job_groups_cancelled table (09f081c)

patrick-schultz requested a review from ehigham September 6, 2024 17:46

ehigham requested changes Sep 6, 2024

Added rename-job-groups-cancelled-column script to batch migration. Refactored sql for simplification. (45e0049)

ichengchang requested a review from ehigham September 6, 2024 20:40

ehigham requested changes Sep 9, 2024

batch/sql/finalize-job-groups.sql (outdated)

batch/sql/rename-job-groups-tables.sql (outdated)

ichengchang added 2 commits September 9, 2024 16:53

revert files (fec05d4)

Recreate triggers and stored procedures after column name change. (64dcea3)

ichengchang requested a review from ehigham September 10, 2024 14:52

Replaced job_groups_cancelled.id by job_groups_cancelled.batch_id, including those referenced by alias. (c6e3c66)
