Add boolean helm chart option dags.gitSync.syncToApiServer #63677

Closed
r-richmond wants to merge 1 commit into apache:main from r-richmond:helm-gitsync-to-apiserver-option

Conversation

@r-richmond
Contributor


Add helm chart option dags.gitSync.syncToApiServer

What

Adds a boolean option (default: false) to the Helm chart that allows git-sync to run on the api-server pods as well.

Why

We want to display the currently deployed git hash using https://airflow.apache.org/docs/apache-airflow/stable/howto/customize-ui.html#dynamic-dashboard-alerts. To do this I need the git-sync sidecar on the api-server pods, hence this PR.

Was generative AI tooling used to co-author this PR?

  • Yes

Generated-by: [GPT-5.3-Codex] following the guidelines

Note: I'm not familiar with Helm chart internals at all, so maintainers should feel free to take over this branch. I've got no desire for the credit; I'd just like to get this feature into 1.20. https://apache-airflow.slack.com/archives/C03G9H97MM2/p1773585210359609

@jscheffl
Contributor

I am not a fan of this approach. Adding git-sync to the API server sounds like a bit of a hack to me. For security reasons the architecture deliberately limits the dependencies installed in the API server.

I assume the current documentation simply does not cater for this case. We should rather invest in a proper mechanism to have dashboard alerts in the database, or some other transport mechanism, rather than git-syncing them to the API server.

Contributor

@jscheffl left a comment


@pierrejeambrun @bbovenzi Do we have any concept for the alerts? Is my assumption right that documentation might be outdated?

@r-richmond
Contributor Author

r-richmond commented Mar 15, 2026

I assume the current documentation simply does not cater for this case. We should rather invest in a proper mechanism to have dashboard alerts in the database, or some other transport mechanism, rather than git-syncing them to the API server.

As I understand it, the dashboard alert code runs on the api-server pods. So, in order to adhere to the recommendation in the docs:

When implementing dynamic alerts it is important to keep alert generation logic lightweight to avoid impacting dashboard load times. Consider caching results for expensive operations and handle exceptions gracefully to prevent alert generation from breaking the UI.

I wanted to just run a subprocess command to get the git hash/timestamp from the currently deployed/synced git sidecar, rather than doing something more expensive like a DB query. (Although I wasn't aware this info would be in the DB; if it is, where?)

For security reasons the architecture deliberately limits the dependencies installed in the API server.

My initial assumption is that this is why the api-server pods were not configured with this, so I made the option default to False. For our deployment there is minimal/no additional risk, so we'd like to turn this feature on. Said another way, we would get a lot of value out of knowing exactly which git hash/timestamp is currently deployed when we load the frontend.

@jscheffl
Contributor

My initial assumption is that this is why the api-server pods were not configured with this, so I made the option default to False. For our deployment there is minimal/no additional risk, so we'd like to turn this feature on. Said another way, we would get a lot of value out of knowing exactly which git hash/timestamp is currently deployed when we load the frontend.

I see, and we have exactly the same need at our site. In Airflow 2 we had a modification that displayed the Git information in the footer... but adding git-sync is still the wrong way to do it, architecturally.

@r-richmond
Contributor Author

r-richmond commented Mar 15, 2026

but adding git-sync is still the wrong way to do it, architecturally.

I don't see the harm in having the git repo on the api-server pods via git-sync... one of the potential deployment methods talks about baking the DAG code into images, which would do the same thing, just at a slower cadence... but I guess we wait for @pierrejeambrun / @bbovenzi to build more consensus on what the right way would be?

@r-richmond
Contributor Author

@pierrejeambrun @bbovenzi Do we have any concept for the alerts? Is my assumption right that documentation might be outdated?

I missed this question, but yes, dynamic alerts were added back in Airflow 3.1.1 via #54677 and #56259. So the docs seem up to date to me.

@potiuk added the "ready for maintainer review" label (set after triaging when all criteria pass) Mar 16, 2026
@bugraoz93
Contributor

At the current stage, I agree with Jens, as we are not aiming to use Dags directly from files in the API. The design of how DagBundles work and how we fetch them in the API aligns with this. Moving files to the API, when the DagProcessor is what manages the DagBundles over files, can open a can of worms too, since by design we should have a separately deployed DagProcessor after v3. This can break isolation and cause synchronisation issues between your process and the DagBundles.

You should be able to achieve the same thing by using API endpoints in your code and acquiring the DagBag from the API itself, without even loading it from files directly.

Load DagBag in API:

FastAPI dependency resolver that returns the shared DagBag instance from app.state. This ensures that all API routes using DagBag via dependency injection receive the same singleton instance that was initialized at app startup.

Get All Dags (even with good filters)

Even though there is a UI-specific API, it doesn't provide backwards compatibility, so I suggest using the Core API even within the UI scope; that makes more sense from a stability and maintenance perspective.
https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/api_fastapi/core_api/routes/ui/dags.py
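As a rough sketch of that suggestion (fetching DAG metadata over the REST API instead of loading files), using only the standard library. The /api/v2/dags path and bearer-token auth are assumptions about the deployment, not details taken from this thread:

```python
import json
import urllib.request


def dags_url(base_url: str, limit: int = 100) -> str:
    # Assumed endpoint path for the stable REST API; adjust to your deployment.
    return f"{base_url.rstrip('/')}/api/v2/dags?limit={limit}"


def fetch_dags(base_url: str, token: str, limit: int = 100) -> dict:
    """Fetch DAG metadata over HTTP rather than loading a DagBag from files."""
    req = urllib.request.Request(
        dags_url(base_url, limit),
        headers={"Authorization": f"Bearer {token}"},
    )
    # Network call: requires a running API server and a valid token.
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

This keeps the caller decoupled from DagBundle internals, which is the point bugraoz93 is making.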

@jscheffl
Contributor

jscheffl commented Mar 16, 2026

As I have the same demand, how about the following - just tried it, and it might be a good option without needing this PR:

(1) Create a "pseudo Dag file", e.g. __git_info__.py, in your Dags folder like:

```python
from __future__ import annotations

from airflow.models.variable import Variable
from airflow.sdk import DAG

# Assume we read the needed info from git here...
GIT_INFO = {
    "branch": "main",
    "commit": "abc123",
    "date": "2024-06-01T12:00:00Z",
}

# Runs on every parse, i.e. roughly every min_file_process_interval.
Variable.set("git_info", GIT_INFO, serialize_json=True)

# Dummy reference to DAG so the Dag parser considers this file worth parsing.
if isinstance("test", DAG):
    pass
```

(Some code still needs to be added to actually read the Git info, but you have that at hand anyway...)

Then use the following airflow_local_settings.py on your API server:

```python
from __future__ import annotations

from cachetools import TTLCache, cached

from airflow.api_fastapi.common.types import UIAlert
from airflow.models.variable import Variable


@cached(cache=TTLCache(maxsize=1, ttl=60))
def get_dashboard_ui_alerts() -> str:
    """Get the list of UI alerts to be shown on the dashboard."""
    git_info = Variable.get("git_info", default_var=None, deserialize_json=True)
    if not git_info:
        return "No Git info available."
    return (
        f"Current Git branch: {git_info.get('branch')}, "
        f"commit: {git_info.get('commit')}, date: {git_info.get('date')}"
    )


class DynamicAlerts(list):
    """Dynamically load the list of UI alerts to be shown on the dashboard."""

    def __iter__(self):
        yield UIAlert(text=get_dashboard_ui_alerts(), category="info")


DASHBOARD_UIALERTS = DynamicAlerts()
```

Yeah... Variable might not be ideal, but it is cached for 60s at least... as long as there are no other state structures available. (There is an ongoing discussion about how to store state, e.g. in the triggerer.)

The alert then renders like this: [screenshot: dashboard alert banner showing the Git branch, commit, and date]

@r-richmond
Contributor Author

As I have the same demand, how about the following

Looks possible, TY! The only thing that makes me a little nervous is that we have a rule of "no Variables", since we got burned in the past by too many people/DAGs using Variables at parse time... I can make this the one exception to the rule, though.

So the variable would be updated every min_file_process_interval... thinking out loud, I guess I could find an alternate way to write/update the variable to alleviate this... then we just lean on the TTL cache per pod.

Ok yeah, thanks for the suggestion; I think it's good enough to close this PR. I'd be interested if there is ever another way to do this, as I agree that "Variable might not be ideal".

@r-richmond r-richmond closed this Mar 16, 2026
@r-richmond r-richmond deleted the helm-gitsync-to-apiserver-option branch March 16, 2026 22:52
@jscheffl
Contributor

Ok yeah, thanks for the suggestion; I think it's good enough to close this PR. I'd be interested if there is ever another way to do this, as I agree that "Variable might not be ideal".

FYI, implemented this today for our Test/Dev instances, users are happy :-D

@r-richmond
Contributor Author

FYI, implemented this today for our Test/Dev instances, users are happy

Finally got around to it today. And yeah, the QoL improvement is significant. Thanks for rubber-ducking with me.

I ended up keeping the logic in the DAG run, so I scheduled it at a 5-minute interval. I think I'm going to end up putting an API call (trigger DAG run) in the post hook inside the git-sync sidecar, though, so it's event-driven instead of scheduled.
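A sketch of what that event-driven trigger could look like, sending the POST from the git-sync post hook using only the standard library. The DAG id git_info_refresh, the /api/v2 dagRuns path, and bearer-token auth are all assumptions, not details from this thread:

```python
import json
import urllib.request


def build_trigger_request(base_url: str, dag_id: str, token: str) -> urllib.request.Request:
    """Build a POST request that triggers a DAG run via the REST API."""
    # Assumed stable-API path; verify against your Airflow version's API docs.
    url = f"{base_url.rstrip('/')}/api/v2/dags/{dag_id}/dagRuns"
    body = json.dumps({"conf": {}}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )


def trigger(base_url: str, dag_id: str, token: str) -> None:
    # Intended to be called from the git-sync exechook after a successful sync.
    with urllib.request.urlopen(build_trigger_request(base_url, dag_id, token)):
        pass
```

Wired into git-sync's post-sync hook, this would refresh the git_info Variable only when the repo actually changes, instead of on a fixed schedule.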


Labels

area:helm-chart, kind:documentation, ready for maintainer review
