feat(profiling): support Python 3.14 #15546
Bootstrap import analysis
Comparison of import times between this PR and base.
Summary
The average import time from this PR is: 254 ± 3 ms.
The average import time from base is: 259 ± 4 ms.
The import time difference between this PR and base is: -4.4 ± 0.2 ms.
Import time breakdown
The following import paths have shrunk:
Performance SLOs
Comparing candidate taegyunkim/prof-12435-py314 (c99ec52) with baseline main (fd1f2f9)

📈 Performance Regressions (2 suites)

📈 iastaspectsospath - 24/24

| Scenario | Time | Time SLO (margin) | Time vs baseline | Memory | Memory SLO (margin) | Memory vs baseline |
|---|---|---|---|---|---|---|
| ✅ ospathbasename_aspect | ✅ 5.063µs | <10.000µs (📉 -49.4%) | 📈 +18.5% | ✅ 38.574MB | <41.000MB (-5.9%) | +4.8% |
| ✅ ospathbasename_noaspect | ✅ 1.086µs | <10.000µs (📉 -89.1%) | -0.2% | ✅ 38.653MB | <41.000MB (-5.7%) | +5.0% |
| ✅ ospathjoin_aspect | ✅ 6.010µs | <10.000µs (📉 -39.9%) | ~same | ✅ 38.594MB | <41.000MB (-5.9%) | +4.7% |
| ✅ ospathjoin_noaspect | ✅ 2.300µs | <10.000µs (📉 -77.0%) | +1.2% | ✅ 38.594MB | <41.000MB (-5.9%) | +4.8% |
| ✅ ospathnormcase_aspect | ✅ 3.481µs | <10.000µs (📉 -65.2%) | -0.1% | ✅ 38.614MB | <41.000MB (-5.8%) | +4.8% |
| ✅ ospathnormcase_noaspect | ✅ 0.572µs | <10.000µs (📉 -94.3%) | +0.2% | ✅ 38.712MB | <41.000MB (-5.6%) | +5.2% |
| ✅ ospathsplit_aspect | ✅ 4.837µs | <10.000µs (📉 -51.6%) | -0.5% | ✅ 38.457MB | <41.000MB (-6.2%) | +4.2% |
| ✅ ospathsplit_noaspect | ✅ 1.589µs | <10.000µs (📉 -84.1%) | -0.9% | ✅ 38.692MB | <41.000MB (-5.6%) | +5.2% |
| ✅ ospathsplitdrive_aspect | ✅ 3.761µs | <10.000µs (📉 -62.4%) | +1.9% | ✅ 38.614MB | <41.000MB (-5.8%) | +4.8% |
| ✅ ospathsplitdrive_noaspect | ✅ 0.705µs | <10.000µs (📉 -92.9%) | -0.4% | ✅ 38.614MB | <41.000MB (-5.8%) | +4.9% |
| ✅ ospathsplitext_aspect | ✅ 4.619µs | <10.000µs (📉 -53.8%) | +0.3% | ✅ 38.574MB | <41.000MB (-5.9%) | +4.6% |
| ✅ ospathsplitext_noaspect | ✅ 1.374µs | <10.000µs (📉 -86.3%) | -1.0% | ✅ 38.673MB | <41.000MB (-5.7%) | +5.0% |

📈 telemetryaddmetric - 30/30

| Scenario | Time | Time SLO (margin) | Time vs baseline | Memory | Memory SLO (margin) | Memory vs baseline |
|---|---|---|---|---|---|---|
| ✅ 1-count-metric-1-times | ✅ 3.517µs | <20.000µs (📉 -82.4%) | 📈 +13.8% | ✅ 34.819MB | <35.500MB (🟡 -1.9%) | +4.4% |
| ✅ 1-count-metrics-100-times | ✅ 205.325µs | <220.000µs (-6.7%) | -2.4% | ✅ 34.937MB | <35.500MB (🟡 -1.6%) | +4.9% |
| ✅ 1-distribution-metric-1-times | ✅ 3.390µs | <20.000µs (📉 -83.0%) | -0.8% | ✅ 34.918MB | <35.500MB (🟡 -1.6%) | +4.4% |
| ✅ 1-distribution-metrics-100-times | ✅ 218.917µs | <230.000µs (-4.8%) | -1.7% | ✅ 34.859MB | <35.500MB (🟡 -1.8%) | +4.1% |
| ✅ 1-gauge-metric-1-times | ✅ 2.201µs | <20.000µs (📉 -89.0%) | -0.3% | ✅ 34.839MB | <35.500MB (🟡 -1.9%) | +3.9% |
| ✅ 1-gauge-metrics-100-times | ✅ 136.207µs | <150.000µs (-9.2%) | -0.5% | ✅ 34.898MB | <35.500MB (🟡 -1.7%) | +4.7% |
| ✅ 1-rate-metric-1-times | ✅ 3.186µs | <20.000µs (📉 -84.1%) | -1.6% | ✅ 34.760MB | <35.500MB (-2.1%) | +3.6% |
| ✅ 1-rate-metrics-100-times | ✅ 220.906µs | <250.000µs (📉 -11.6%) | -0.2% | ✅ 34.859MB | <35.500MB (🟡 -1.8%) | +4.5% |
| ✅ 100-count-metrics-100-times | ✅ 20.930ms | <22.000ms (-4.9%) | +1.5% | ✅ 34.878MB | <35.500MB (🟡 -1.8%) | +5.2% |
| ✅ 100-distribution-metrics-100-times | ✅ 2.259ms | <2.550ms (📉 -11.4%) | -0.8% | ✅ 34.800MB | <35.500MB (🟡 -2.0%) | +4.7% |
| ✅ 100-gauge-metrics-100-times | ✅ 1.402ms | <1.550ms (-9.6%) | -0.2% | ✅ 34.918MB | <35.500MB (🟡 -1.6%) | +5.1% |
| ✅ 100-rate-metrics-100-times | ✅ 2.263ms | <2.550ms (📉 -11.3%) | -0.3% | ✅ 34.878MB | <35.500MB (🟡 -1.8%) | +5.0% |
| ✅ flush-1-metric | ✅ 4.694µs | <20.000µs (📉 -76.5%) | -0.6% | ✅ 35.232MB | <35.500MB (🟡 -0.8%) | +5.2% |
| ✅ flush-100-metrics | ✅ 174.212µs | <250.000µs (📉 -30.3%) | -0.6% | ✅ 35.212MB | <35.500MB (🟡 -0.8%) | +5.0% |
| ✅ flush-1000-metrics | ✅ 2.167ms | <2.500ms (📉 -13.3%) | -0.5% | ✅ 35.999MB | <36.500MB (🟡 -1.4%) | +4.9% |

🟡 Near SLO Breach (16 suites)

🟡 coreapiscenario - 10/10 (1 unstable)
KowalskiThomas
left a comment
Overall LGTM. I think we should just make clear what the expectation is with regard to generator frames.
taegyunkim
left a comment
I refactored the code a bit to keep the original address of the PyThreadState struct and pass it down, so that asyncio_tasks_head can be copied over from _PyThreadStateImpl.
This fixed the slowdown in the asyncio tests.
brettlangdon
left a comment
lgtm
## Description
For [feat(profiling): support Python 3.14](#15546), I first gathered how we use CPython internal headers/structs, then compared how they changed from 3.13.0 to 3.14.0, and finally generated an action plan based on both. This PR makes those two skills available so that we can follow similar steps when adding support for newer versions of Python (3.15+) and for free-threaded Python.
Co-authored-by: Brett Langdon <[email protected]>
## Description
Split out from #15546.
Description
This PR adds support for Python 3.14 in the profiler by updating it to handle CPython internal changes.
Key CPython changes addressed
`_PyInterpreterFrame` Structure Changes
- Moved from `Include/internal/pycore_frame.h` to `Include/internal/pycore_interpframe_structs.h`
- `PyObject *f_executable` and `PyObject *f_funcobj` changed to `_PyStackRef` type. Profilers like us now need to clear the LSB of these fields to get the `PyObject*`. See "Use deferred reference counting in some `_PyInterpreterFrame` fields" python/cpython#123923 for details
- `int stacktop` field removed, replaced with `_PyStackRef *stackpointer` pointer. See "GH-120024: Use pointer for stack pointer" python/cpython#121923 (GH-120024) for details
- `PyObject *localsplus[1]` changed to `_PyStackRef localsplus[1]`. See "gh-117139: Convert the evaluation stack to stack refs" python/cpython#118450 (gh-117139) for details
`FutureObj`/`TaskObj` Changes
- New fields `awaited_by`, `is_task`, `awaited_by_is_set` in the `FutureObj_HEAD` macro
- New `struct llist_node task_node` field for linked-list storage
Asyncio Task Storage Changes
Prior to Python 3.14,
- `_scheduled_tasks` WeakSet (exported from C extension)
- `_eager_tasks` set (exported from C extension)
From Python 3.14,
- `asyncio.Task`s are stored in a linked-list (`struct llist_node`) per thread and per interpreter
  - `tstate->asyncio_tasks_head` (in `_PyThreadStateImpl`)
  - `interp->asyncio_tasks_head` (for lingering tasks)
  - `TaskObj` has a `task_node` field with `next` and `prev` pointers
- `_scheduled_tasks` WeakSet (now Python-only, not exported from C extension)
- `_eager_tasks` set
Implementation Summary
Frame reading (`frame.h`, `frame.cc`):
- Updated header includes to use `pycore_interpframe_structs.h` for Python 3.14+.
- Implemented tagged pointer handling: clear the LSB of `f_executable` to recover `PyObject*` (per gh-123923).
- Replaced `stacktop` field access with `stackpointer` pointer arithmetic for stack depth calculation.
- Updated `PyGen_yf()` to use `_PyStackRef` and `stackpointer[-1]` instead of `localsplus[stacktop-1]`.
- Added handling for the `FRAME_OWNED_BY_INTERPRETER` frame type (introduced in 3.14).
Task structures (`cpython/tasks.h`):
- Added Python 3.14+ `FutureObj_HEAD` macro with new fields: `awaited_by`, `is_task`, `awaited_by_is_set`.
- Added `struct llist_node task_node` field to `TaskObj` for linked-list storage.
- Updated `PyGen_yf()` implementation to handle `_PyStackRef` and `stackpointer` instead of `stacktop`.
Asyncio discovery (`tasks.h`, `threads.h`):
- Implemented `get_tasks_from_linked_list()` to safely iterate over circular linked-lists with iteration limits (`MAX_ITERATIONS = 2 << 15`).
- Added `get_tasks_from_thread_linked_list()` to read tasks from `_PyThreadStateImpl.asyncio_tasks_head` (per-thread active tasks).
- Added `get_tasks_from_interpreter_linked_list()` to read lingering tasks from `PyInterpreterState.asyncio_tasks_head` (per-interpreter).
- Updated `get_all_tasks()` to handle both linked-list (native `asyncio.Task` instances) and WeakSet (third-party tasks).
Python integration (`_asyncio.py`):
- Added compatibility handling for the `BaseDefaultEventLoopPolicy` → `_BaseDefaultEventLoopPolicy` rename in 3.14.
- Updated `_scheduled_tasks` access to handle the Python-only WeakSet (no longer exported from C extension in 3.14+).
Testing
All existing tests pass, except for tests/profiling/collector/test_memalloc.py, which needed some edits.
Risks
Additional Notes
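To illustrate the tagged pointer handling and the `stackpointer` arithmetic listed under Frame reading, here is a minimal C++ sketch. It does not include any CPython headers: `Object`, `StackRef`, `Frame`, `kTagMask`, `untag`, `stack_depth`, `top_of_stack` and `make_tagged` are hypothetical stand-ins used only for illustration. The single-bit mask is an assumption taken from the "clear the LSB" wording above; the authoritative tag layout is whatever CPython's `_PyStackRef` defines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for PyObject.
struct Object {
    int id;
};

// Hypothetical stand-in for _PyStackRef: a pointer-sized word whose
// low bit may carry a tag instead of being part of the address.
union StackRef {
    std::uintptr_t bits;
};

// Hypothetical, simplified stand-in for _PyInterpreterFrame.
struct Frame {
    StackRef f_executable;
    StackRef* stackpointer;  // points one slot past the current top of stack
    StackRef localsplus[4];  // fixed size here for simplicity
};

// Assumption: only the least significant bit is used as a tag.
constexpr std::uintptr_t kTagMask = 1;

// Recover the plain object pointer by clearing the tag bit.
inline Object* untag(StackRef ref) {
    return reinterpret_cast<Object*>(ref.bits & ~kTagMask);
}

// Number of slots in use, i.e. what the removed `stacktop` index used to hold.
inline std::ptrdiff_t stack_depth(const Frame& frame) {
    return frame.stackpointer - frame.localsplus;
}

// Equivalent of localsplus[stacktop - 1], expressed as stackpointer[-1].
inline Object* top_of_stack(const Frame& frame) {
    assert(stack_depth(frame) > 0);  // guard against reading before localsplus
    return untag(frame.stackpointer[-1]);
}

// Test helper: build a reference with the tag bit set.
inline StackRef make_tagged(Object* obj) {
    StackRef ref;
    ref.bits = reinterpret_cast<std::uintptr_t>(obj) | kTagMask;
    return ref;
}
```

A sampling profiler typically works on a copy of the frame, in which `stackpointer` still holds an address from the target's memory. This sketch ignores that and assumes the frame is read in place.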
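The bounded traversal mentioned under Asyncio discovery can be sketched as follows. This is not the code of `get_tasks_from_linked_list()`; it only shows the general idea of walking a circular list with a sentinel head while capping the number of steps, so that a corrupted or concurrently modified list cannot cause an endless loop. `ListNode`, `collect_nodes` and `usage_example` are hypothetical names, the nodes are ordinary in-process pointers (an assumption made for simplicity), and only the `MAX_ITERATIONS` value comes from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node with the same shape as a doubly linked list node:
// a `next` and a `prev` pointer.
struct ListNode {
    ListNode* next;
    ListNode* prev;
};

// Value taken from the description: 2 << 15 == 65536.
constexpr std::size_t MAX_ITERATIONS = 2 << 15;

// Walk a circular list starting after the sentinel `head`.
// Stops when the walk returns to `head`, hits a null pointer,
// or has taken `limit` steps, whichever comes first.
inline std::vector<const ListNode*> collect_nodes(
    const ListNode* head, std::size_t limit = MAX_ITERATIONS) {
    std::vector<const ListNode*> nodes;
    if (head == nullptr) {
        return nodes;
    }
    const ListNode* current = head->next;
    std::size_t steps = 0;
    while (current != nullptr && current != head && steps < limit) {
        nodes.push_back(current);
        current = current->next;
        ++steps;
    }
    return nodes;
}

// Small usage example: a well-formed ring with two entries.
inline void usage_example() {
    ListNode head{}, first{}, second{};
    head.next = &first;
    first.next = &second;
    second.next = &head;
    head.prev = &second;
    second.prev = &first;
    first.prev = &head;
    assert(collect_nodes(&head).size() == 2);
}
```

The cap trades completeness for safety: a list longer than the limit is truncated, which for a sampling profiler is usually preferable to stalling the sampler.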