Skip to content

Conversation

@dstandish
Copy link
Contributor

Previously we used one field to serve two purposes.

  1. what's the next dag run that should be created
  2. whether max active runs is exceeded (and therefore scheduler should not bother creating)

Previously we would set next_dagrun and next_dagrun_create_after to NULL when max active runs is exceeded.

This .... basically worked but it had some bad consequences.

It made the logic at call site more confuing because of all the conditionals.

And by nulling out the value, it forced us to re-set the value with possibly incorrect data.

For example previously if it was null we would possibly re-set it every time a dag run completed. And we would use the data interval of the just-completed dag run to set it. But if the dag run that just completed was not the latest one, then we would set it to a value that is not the latest time! And we can't even easily check if we have the "latest run" because, if it is null, there's no value to compare with.

So, what I do here is add a boolean field to mark when the dag has exceeded max active runs. And I set that separately from the next dag run info. And I only set the next dag run info when we create a new dag run; if we never "unset" it, then we don't need to re-set it in those scenarios.

Previously we used one field to serve two purposes.

1. what's the next dag run that should be created
2. whether max active runs is exceeded (and therefore scheduler should not bother creating)

Previously we would set next_dagrun and next_dagrun_create_after to NULL when max active runs is exceeded.

This .... basically worked but it had some bad consequences.

It made the logic at call site more confuing because of all the conditionals.

And by nulling out the value, it forced us to re-set the value with possibly incorrect data.

For example previously if it was null we would possibly re-set it every time a dag run completed. And we would use the data interval of the just-completed dag run to set it.  But if the dag run that just completed was not the latest one, then we would set it to a value that is not the latest time! And we can't even easily *check* if we have the "latest run" because, if it is null, there's no value to compare with.

So, what I do here is add a boolean field to mark when the dag has exceeded max active runs.  And I set that separately from the next dag run info.  And I only set the next dag run info when we create a new dag run; if we never "unset" it, then we don't need to re-set it in those scenarios.
data_interval=data_interval,
)
self._check_exceeds_max_active_runs(dag_model=dag_model, session=session)
dag_model.calculate_dagrun_date_fields(dag=serdag, last_automated_dag_run=data_interval)
Copy link
Contributor Author

@dstandish dstandish Jan 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we have separated the "exceeds max" check from "next dagrun", we can now call them both unconditionally when creating a new run

DagRunType.MANUAL,
DagRunType.ASSET_TRIGGERED,
):
self._check_exceeds_max_active_runs(dag_model=dag_model, session=session)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should not need to call calculate_dagrun_date_fields here anymore, since it's not getting set to null to signal "exceeds max". since we're not adding any new runs here, we shouldn't need to change the next run.

note that we can now evaluate max active runs with more dag run types, since we've decoupled that from the dates.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:DAG-processing area:db-migrations PRs with DB migration area:Scheduler including HA (high availability) scheduler kind:documentation

1 participant