-
Notifications
You must be signed in to change notification settings - Fork 16.2k
Separate "next dag run" from "max active runs" #60006
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Previously we used one field to serve two purposes. 1. what's the next dag run that should be created 2. whether max active runs is exceeded (and therefore scheduler should not bother creating) Previously we would set next_dagrun and next_dagrun_create_after to NULL when max active runs is exceeded. This .... basically worked but it had some bad consequences. It made the logic at call site more confuing because of all the conditionals. And by nulling out the value, it forced us to re-set the value with possibly incorrect data. For example previously if it was null we would possibly re-set it every time a dag run completed. And we would use the data interval of the just-completed dag run to set it. But if the dag run that just completed was not the latest one, then we would set it to a value that is not the latest time! And we can't even easily *check* if we have the "latest run" because, if it is null, there's no value to compare with. So, what I do here is add a boolean field to mark when the dag has exceeded max active runs. And I set that separately from the next dag run info. And I only set the next dag run info when we create a new dag run; if we never "unset" it, then we don't need to re-set it in those scenarios.
| data_interval=data_interval, | ||
| ) | ||
| self._check_exceeds_max_active_runs(dag_model=dag_model, session=session) | ||
| dag_model.calculate_dagrun_date_fields(dag=serdag, last_automated_dag_run=data_interval) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we have separated the "exceeds max" check from "next dagrun", we can now call them both unconditionally when creating a new run
| DagRunType.MANUAL, | ||
| DagRunType.ASSET_TRIGGERED, | ||
| ): | ||
| self._check_exceeds_max_active_runs(dag_model=dag_model, session=session) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should not need to call calculate_dagrun_date_fields here anymore, since it's not getting set to null to signal "exceeds max". since we're not adding any new runs here, we shouldn't need to change the next run.
note that we can now evaluate max active runs with more dag run types, since we've decoupled that from the dates.
Previously we used one field to serve two purposes.
Previously we would set next_dagrun and next_dagrun_create_after to NULL when max active runs is exceeded.
This .... basically worked but it had some bad consequences.
It made the logic at call site more confuing because of all the conditionals.
And by nulling out the value, it forced us to re-set the value with possibly incorrect data.
For example previously if it was null we would possibly re-set it every time a dag run completed. And we would use the data interval of the just-completed dag run to set it. But if the dag run that just completed was not the latest one, then we would set it to a value that is not the latest time! And we can't even easily check if we have the "latest run" because, if it is null, there's no value to compare with.
So, what I do here is add a boolean field to mark when the dag has exceeded max active runs. And I set that separately from the next dag run info. And I only set the next dag run info when we create a new dag run; if we never "unset" it, then we don't need to re-set it in those scenarios.