Add boolean helm chart option dags.gitSync.syncToApiServer #63677
r-richmond wants to merge 1 commit into apache:main
Conversation
I am not a fan of it like this. Adding Git Sync to the API server sounds to me like a bit of a hack. Actually, for security reasons the architecture wants to limit the dependencies installed in the API server. I assume the documentation mostly just does not cater for this. I assume we should rather invest in a proper mechanism to get dashboard alerts from the database, or some other transport mechanism, rather than git-syncing them to the API server.
jscheffl left a comment
@pierrejeambrun @bbovenzi Do we have any concept for the alerts? Is my assumption right that the documentation might be outdated?
As I understand it, the dashboard alert code runs on the api-server pods. So, in order to adhere to the recommendation in the docs, I wanted to just run a subprocess command to get the githash/timestamp from the currently deployed/synced git side-car, rather than doing something more expensive like a DB query. (Although I wasn't aware this info would be in the DB; if it is, where?)
My initial assumption is that this is why the api-server pods were not configured with this, so I made the option default to False. For our deployment there is minimal/no additional risk, so we'd like to turn this feature on. Said another way, we would get a lot of value out of knowing exactly which githash/timestamp is currently deployed when we load the frontend.
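A minimal sketch of the subprocess idea described above, assuming the synced checkout is mounted on the pod and the `git` binary is available there; the function names and paths are hypothetical, not taken from the PR.

```python
import subprocess
from datetime import datetime, timezone


def parse_git_head(output: str) -> tuple[str, str]:
    """Parse `git log -1 --format='%H %ct'` output into (hash, ISO-8601 UTC timestamp)."""
    sha, epoch = output.strip().split()
    ts = datetime.fromtimestamp(int(epoch), tz=timezone.utc).isoformat()
    return sha, ts


def deployed_revision(repo_dir: str) -> tuple[str, str]:
    """Shell out to the synced checkout (e.g. the git-sync volume) for HEAD info."""
    out = subprocess.check_output(
        ["git", "-C", repo_dir, "log", "-1", "--format=%H %ct"], text=True
    )
    return parse_git_head(out)
```

A dashboard-alert callable could then format `deployed_revision("/opt/airflow/dags/repo")` into the alert text; the mount path depends on your chart values.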
I see, and we have exactly the same situation at our location. In Airflow 2 we had a modification that displayed the Git information in the footer... but still, adding GitSync for this is the wrong way. Architecturally.
I don't see the harm in having the git repo on the api-server pods via GitSync... one of the potential deployment methods talks about baking the DAG code into images, which would do the same thing but at a slower cadence... but I guess we wait for
I missed this question, but yes, dynamic alerts were added back in Airflow 3.1.1 via #54677 / #56259. So the docs seem up to date to me.
At the current stage, I agree with Jens, as we are not aiming to use Dags directly from files in the API. The design of how DagBundles work, and how we fetch them in the API, aligns with that. Moving files to the API, when the DagProcessor manages the DagBundles over files, can open a can of worms too, since by design we should have a separate DagProcessor deployed after v3. This can eliminate isolation while causing certain synchronisation issues between your process and the DagBundles.

You should be able to achieve the same thing by using API endpoints in your code and acquiring the DagBag from the API itself, without even loading it from files directly:

- Load DagBag in API: airflow/airflow-core/src/airflow/api_fastapi/common/dagbag.py, lines 38 to 41 in b87ed1d
- Get All Dags (even with good filters)

Even though there is a specific API for the UI only, it doesn't provide backwards compatibility, so I suggest using the Core API even within the UI scope; that would make more sense from a stability and maintenance perspective.
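The "use API endpoints instead of files" suggestion could look roughly like the sketch below. The `/api/v2/dags` path, query parameter, and bearer-token auth are assumptions about the Airflow 3 stable REST API, not something confirmed in this thread, so check them against your deployment.

```python
import json
from urllib.request import Request


def list_dags_request(base_url: str, token: str, only_active: bool = True) -> Request:
    """Build a GET request for the (assumed) stable Core API DAG listing."""
    url = f"{base_url.rstrip('/')}/api/v2/dags?only_active={str(only_active).lower()}"
    return Request(url, headers={"Authorization": f"Bearer {token}"})


def dag_ids_from_response(payload: str) -> list[str]:
    """Extract dag_ids from the JSON body of a DAG-listing response."""
    return [d["dag_id"] for d in json.loads(payload).get("dags", [])]
```

Sending the request with `urllib.request.urlopen` (or `requests`) then keeps the API server out of the file-sync path entirely.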
As I have the same demand, how about the following? I just tried it, and it might be a good option without needing this PR: (1) create a "pseudo Dag file", e.g. (some code to be added to read the Git info, though... but you have this at your hands anyway). Then use the following. Yeah... a Variable might not be ideal, but it is cached for 60s at least... as long as there are no other state structures available. (There is still an ongoing discussion about how to store state in the triggerer, for example... but that discussion is still ongoing.)
Looks possible, TY! The only thing that makes me a little nervous is that we have a rule of no Variables, since we got burned in the past by too many people/dags using Variables at parse time. I can make it the one exception to the rule, though... So the Variable would be updated every min_file_process_interval... Thinking out loud, I guess I could find an alternate way to write/update the Variable to alleviate this, and then we just lean on the TTL cache per pod. OK yeah, thanks for the suggestion; I think it's good enough to close this PR. I'd be interested if there is ever another way to do this, as I agree with
FYI, I implemented this today for our Test/Dev instances; users are happy :-D
Finally got around to it today. And yeah, the QoL improvement is significant. Thanks for rubber-ducking with me. I ended up keeping the logic in the dag run, so I scheduled it at a 5-minute interval. I think I'm going to end up putting an API call (trigger dag run) in the post hook inside the git-sync side-car, though, so it's event-driven instead of scheduled.

Add helm chart option dags.gitSync.syncToApiServer
What
Simply adds a boolean option (default: false) to the helm chart that allows git-sync to work with the api-server pods as well.
Why
We want to display the currently deployed githash using https://airflow.apache.org/docs/apache-airflow/stable/howto/customize-ui.html#dynamic-dashboard-alerts. But to do this we need the git-sync sidecar on the api-server pods, hence this PR.
Was generative AI tooling used to co-author this PR?
Generated-by: [GPT-5.3-Codex] following the guidelines
Note: I'm not familiar with helm chart internals at all, so maintainers should feel free to take over this branch. I've got no desire for the credit; I'd just like to slip this feature into 1.20. https://apache-airflow.slack.com/archives/C03G9H97MM2/p1773585210359609