
Conversation

@Arunodoy18
Contributor

What does this PR do?

  • Refactors the 0015_2_9_0_update_trigger_kwargs_type.py migration script to remove ORM model imports and usage
  • Removes direct imports of airflow.models.Trigger from the migration
  • Adds a pre-commit hook (check-no-orm-in-migrations) that prevents ORM model imports in future Alembic migration scripts
  • Adds explanatory comments about why the original ORM-based operations were unnecessary

Why is this needed?

Using ORM models directly in Alembic migration scripts causes issues when the model schema changes. Specifically:

  1. A new column is added to the Trigger model in a future version (e.g., 3.2.0).
  2. The ORM reference in the 2.9.0 migration script then includes that new column.
  3. Downgrades from 3.2.x to ≤ 2.9.0 fail with errors like:
    ```
    sqlalchemy.exc.ProgrammingError: column trigger.trigger_queue does not exist
    ```

This is a well-known anti-pattern in Alembic migrations. The general recommendation is to use SQLAlchemy Core (table reflection) or raw SQL instead of ORM models.
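
For illustration, here is a minimal sketch of the reflection-based pattern inside a migration. The table name matches this case, but the update shown is a made-up example, not the actual body of 0015_2_9_0_update_trigger_kwargs_type.py:

```python
import sqlalchemy as sa
from alembic import op


def upgrade():
    conn = op.get_bind()
    # Reflect the table as it exists at this point in the migration
    # chain, instead of importing airflow.models.Trigger, whose column
    # set may have changed in later Airflow versions.
    trigger = sa.Table("trigger", sa.MetaData(), autoload_with=conn)
    # Illustrative data fix-up only; the real migration's logic differs.
    conn.execute(
        trigger.update().where(trigger.c.kwargs.is_(None)).values(kwargs="{}")
    )
```

Because reflection reads the schema from the live database, the migration keeps working no matter how the Trigger model evolves later.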

How to test

  1. Run the pre-commit hook: python dev/check_no_orm_in_migrations.py (see the sketch after this list)
  2. Verify no ORM imports are detected in migration scripts
  3. Test upgrade/downgrade scenarios:
    ```bash
    breeze shell 'airflow db reset --skip-init -y && airflow db migrate --to-revision heads' \
      --use-airflow-version 2.11.0 --airflow-extras pydantic --answer y \
    && breeze shell "export AIRFLOW__DATABASE__EXTERNAL_DB_MANAGERS=airflow.providers.fab.auth_manager.models.db.FABDBManager \
    && airflow db migrate --to-revision heads \
    && airflow db downgrade -n 2.7.0 -y \
    && airflow db migrate"
    ```
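
For reference, a hypothetical sketch of what a checker like dev/check_no_orm_in_migrations.py could look like (the migrations path and forbidden prefix below are assumptions; the actual hook's implementation may differ):

```python
import ast
import sys
from pathlib import Path

# Both the directory and the forbidden prefix are assumptions here.
MIGRATIONS_DIR = Path("airflow/migrations/versions")
FORBIDDEN_PREFIX = "airflow.models"


def offending_imports(path: Path) -> list[str]:
    """Return 'file:line' entries for every ORM import in one migration."""
    hits = []
    for node in ast.walk(ast.parse(path.read_text())):
        if isinstance(node, ast.ImportFrom) and node.module:
            if node.module.startswith(FORBIDDEN_PREFIX):
                hits.append(f"{path}:{node.lineno}: from {node.module} import ...")
        elif isinstance(node, ast.Import):
            hits.extend(
                f"{path}:{node.lineno}: import {alias.name}"
                for alias in node.names
                if alias.name.startswith(FORBIDDEN_PREFIX)
            )
    return hits


if __name__ == "__main__":
    problems = [h for f in sorted(MIGRATIONS_DIR.glob("*.py")) for h in offending_imports(f)]
    print("\n".join(problems))
    sys.exit(1 if problems else 0)
```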

Related Issues

Fixes an issue where Alembic migrations that use ORM models break downgrades after future schema changes.

Closes: #59871

The help text for the --start-date and --end-date arguments documented
only the YYYY-MM-DD format, but the actual implementation uses
pendulum.parse(), which accepts a much wider variety of formats.

This commit updates the help text to accurately document the
commonly used formats:
- YYYY-MM-DD (date only)
- YYYY-MM-DDTHH:MM:SS (datetime)
- YYYY-MM-DDTHH:MM:SS±HH:MM (datetime with timezone offset, ISO 8601)

The help text also references pendulum.parse() to indicate that
additional formats are supported, improving clarity for users.
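
For example, all of the following parse successfully (a quick illustration, not an exhaustive list):

```python
import pendulum

# Each documented format round-trips through pendulum.parse().
print(pendulum.parse("2024-01-15"))                 # date only
print(pendulum.parse("2024-01-15T08:30:00"))        # naive datetime (UTC)
print(pendulum.parse("2024-01-15T08:30:00+05:30"))  # ISO 8601 with offset
```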

Fixes: Incomplete documentation of date format options

This change addresses an issue where the scheduler crashes when executors
(such as AWS Lambda Executor) generate external executor IDs exceeding 250
characters. Long dag_id, task_id, and run_id combinations can easily exceed
this limit, causing database constraint violations.

Changes:
- Created migration 0094_3_2_0_increase_external_executor_id_length.py to
  alter the external_executor_id column in both task_instance and
  task_instance_history tables from VARCHAR(250) to VARCHAR(1000)
- Updated TaskInstance model to use StringID(length=1000) for
  external_executor_id column
- Updated TaskInstanceHistory model to use StringID(length=1000) for
  external_executor_id column

This fix allows executors with longer identifiers to work properly without
causing StringDataRightTruncation errors.
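
A hedged sketch of the core of such a migration (revision metadata omitted; the actual file may differ in its details):

```python
import sqlalchemy as sa
from alembic import op

TABLES = ("task_instance", "task_instance_history")


def upgrade():
    for table in TABLES:
        op.alter_column(
            table,
            "external_executor_id",
            existing_type=sa.String(length=250),
            type_=sa.String(length=1000),
            existing_nullable=True,
        )


def downgrade():
    # Values longer than 250 characters would be rejected or truncated
    # by the database when shrinking the column back.
    for table in TABLES:
        op.alter_column(
            table,
            "external_executor_id",
            existing_type=sa.String(length=1000),
            type_=sa.String(length=250),
            existing_nullable=True,
        )
```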

Fixes: #<issue_number>

Prevent unnecessary cache invalidation in useRefreshOnNewDagRuns hook
when the component first mounts by adding an early return when the
previousDagRunIdRef is undefined.

On initial page load, when latestDagRunId is fetched for the first time,
it would differ from the undefined previousDagRunIdRef, causing cache
invalidation and duplicate HTTP requests to endpoints like /ui/grid/runs,
/ui/grid/structure, etc.

The fix sets previousDagRunIdRef without invalidating queries on the
first render, ensuring cache invalidation only occurs when a genuinely
new DAG run appears (not on initial load).

This reduces unnecessary network traffic and improves page load
performance while preserving the intended refresh behavior when new
DAG runs complete.
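
The guard boils down to "record the first observation without reacting". A cross-language sketch of that logic in Python (the real hook is React/TypeScript, and the names here are illustrative):

```python
class RefreshOnNewDagRuns:
    """Illustrative stand-in for the React hook's first-render guard."""

    def __init__(self, invalidate_queries):
        self._previous_run_id = None  # mirrors previousDagRunIdRef
        self._invalidate_queries = invalidate_queries

    def observe(self, latest_run_id):
        if self._previous_run_id is None:
            # First render: remember the id but don't invalidate, so the
            # initial page load doesn't trigger duplicate fetches.
            self._previous_run_id = latest_run_id
            return
        if latest_run_id != self._previous_run_id:
            self._previous_run_id = latest_run_id
            self._invalidate_queries()
```
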
Handle NotFound exception when deleting already-deleted Dataproc clusters
to prevent failures in ephemeral cluster cleanup patterns.

When using DataprocCreateClusterOperator with delete_on_error=True
(default), failed cluster creation automatically deletes the cluster.
Downstream DataprocDeleteClusterOperator with TriggerRule.ALL_DONE
would then fail with NotFound error when attempting to delete the
already-deleted cluster.

This change makes the operator idempotent by:
- Catching NotFound exceptions in both synchronous and deferrable modes
- Logging informational message when cluster is already deleted
- Completing successfully instead of failing

This enables clean ephemeral cluster patterns:
Create Cluster -> Run Jobs -> Delete Cluster (ALL_DONE trigger rule)

The operator now succeeds whether the cluster exists or not, preventing
cleanup task failures from masking actual upstream failures in monitoring.
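
In outline, the pattern looks like the following hedged sketch, which calls the Dataproc client directly; the operator's actual code in the Google provider is structured differently:

```python
import logging

from google.api_core.exceptions import NotFound
from google.cloud import dataproc_v1

log = logging.getLogger(__name__)


def delete_cluster_idempotent(project_id: str, region: str, cluster_name: str) -> None:
    client = dataproc_v1.ClusterControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )
    try:
        operation = client.delete_cluster(
            project_id=project_id, region=region, cluster_name=cluster_name
        )
        operation.result()  # block until the delete finishes
    except NotFound:
        # Already gone (e.g. removed by delete_on_error=True on create);
        # treat as success so ALL_DONE cleanup tasks don't fail.
        log.info("Cluster %s not found; assuming already deleted.", cluster_name)
```
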
The NeedsReviewButton component was continuously polling the
/api/v2/dags/{dag_id}/dagRuns/{dag_run_id}/hitlDetails endpoint
even when there were no pending or deferred task instances. This
resulted in unnecessary API calls and increased server load.

This commit optimizes the polling behavior by:
1. First checking if there are any deferred task instances using
   the lighter-weight getTaskInstances endpoint
2. Only calling the hitlDetails endpoint when deferred tasks exist
3. Using the 'enabled' option in React Query to conditionally enable
   the hitlDetails query

This change significantly reduces API calls when DAGs have no
pending runs, while maintaining the same functionality when
pending actions are present.
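
Conceptually, the gate reduces to a cheap predicate (a Python sketch; the actual component expresses this through React Query's enabled option in TypeScript):

```python
def should_fetch_hitl_details(task_instances) -> bool:
    # Gate the heavier hitlDetails call on the lightweight task-instance
    # listing: only poll when a deferred task instance exists.
    return any(ti["state"] == "deferred" for ti in task_instances)


assert should_fetch_hitl_details([{"task_id": "wait_approval", "state": "deferred"}])
assert not should_fetch_hitl_details([])
```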

Fixes: Issue where the hitlDetails API was continuously polled even
when no pending runs existed

The scheduler was using potentially stale DagRun data when calculating
next_dagrun fields, which could cause next_dagrun_create_after to be
set back in time in distributed systems.

Problem:
- calculate_dagrun_date_fields() accepted a last_automated_dag_run parameter
- Parameter name implied it was always the latest run, but wasn't always true
- Scheduler passed whatever run it was processing at the time
- In distributed systems, newer runs could exist by the time calculation happened
- This led to next_dagrun being calculated from outdated data

Solution:
- Rename parameter to last_automated_data_interval to clarify it's for calculation
- Update docstring to make expectations explicit
- Add _get_latest_automated_dagrun_data_interval() helper in scheduler
- Always query for actual latest automated run before calculating next_dagrun
- Ensures next_dagrun fields are always calculated from current data

This prevents race conditions where:
1. Scheduler processes run A
2. Run B is created by another process
3. Scheduler calculates next_dagrun based on A (now stale)
4. next_dagrun is set incorrectly, potentially back in time

Changes:
- Renamed last_automated_dag_run to last_automated_data_interval in DagModel.calculate_dagrun_date_fields()
- Added helper method to query actual latest automated run
- Updated all scheduler call sites to query before calculating
- Improved documentation to prevent future misuse
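
A rough sketch of what the helper might look like (illustrative only; the actual query, run types, and session handling in the scheduler differ):

```python
from sqlalchemy import select

from airflow.models.dagrun import DagRun
from airflow.utils.types import DagRunType


def _get_latest_automated_dagrun_data_interval(dag_id, session):
    """Query the *current* latest automated run instead of trusting
    whichever run the scheduler happens to be processing."""
    run = session.scalars(
        select(DagRun)
        .where(DagRun.dag_id == dag_id, DagRun.run_type == DagRunType.SCHEDULED)
        .order_by(DagRun.logical_date.desc())
        .limit(1)
    ).first()
    if run is None:
        return None
    return run.data_interval_start, run.data_interval_end
```
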
…cess, failed); clarify OpenAPI docs; add tests for delete validation
- Refactor 0015_2_9_0_update_trigger_kwargs_type.py to avoid importing airflow.models.Trigger
- Remove ORM-based operations that caused issues with future schema changes
- Add pre-commit hook to prevent ORM model imports in future migrations
- Prevent downgrade issues when new columns are added to the Trigger model

This fixes the issue where adding new columns to the Trigger model would break
downgrades to 2.9.0 due to ORM references in the migration script.
boring-cyborg bot added the area:Scheduler, area:UI, backport-to-v3-1-test, and provider:google labels Jan 1, 2026
@Arunodoy18
Contributor Author

I've opened this PR to address the issue discussed above. The goal is to make the migration behavior safer and more predictable by avoiding ORM usage in Alembic scripts. I’d appreciate a review when you have time, and I’m happy to iterate on the approach based on feedback.

@potiuk
Member

potiuk commented Jan 1, 2026

You have conflicts, likely with #59239, which addresses the issue partially. My suggestion: always pull the latest main first before opening a PR :).

You should rebase and resolve conflicts.
