I have built a toy bot that runs arbitrary benchmarks on Azure Linux VMs and reports the results back to the PRs/issues it was invoked from. It also runs the native Linux perf tool to collect traces/flamegraphs on demand. What I noticed from the beginning is that the quality of flamegraphs on x64 is often poor and somewhat random between runs, compared to exactly the same config on arm64.
Example: #105593 (comment)
Here is how the flamegraphs look between sequential runs on arm64 (Ampere Altra):
- Default arm64 run 1: https://telegafiles.blob.core.windows.net/telega/base_flamegraph_8f9736ce.svg
- Default arm64 run 2: https://telegafiles.blob.core.windows.net/telega/diff_flamegraph_8f9736ce.svg
If you open these graphs and switch between the tabs in your browser, you will notice almost no difference between run 1 and run 2, as expected.
On x64 (AMD Milan) the picture is a bit different:
- Default x64 run 1: https://telegafiles.blob.core.windows.net/telega/base_flamegraph_1837f76f.svg
- Default x64 run 2: https://telegafiles.blob.core.windows.net/telega/diff_flamegraph_1837f76f.svg
The graphs are a lot more "random" between runs (presumably, with less aggressive or disabled inlining it's a lot worse). We wonder if this could be caused by an x64-specific optimization that omits frame pointers for simple methods (on arm64 we currently always emit them, although we have issues to eventually fix that: #88823 and #35274). Should we never use that optimization when PerfMap is enabled? (Or even introduce a new knob, since it's not only perf-specific.)
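To illustrate the suspicion, here is a toy Python model of a frame-pointer-based stack walk. It is only a sketch, not how perf or the JIT is actually implemented: the function `fp_unwind` and the method names are hypothetical, and it assumes a frameless method leaves the frame-pointer register untouched. Under that assumption, the walker sees the sampled method itself (from the instruction pointer), but each frame record only reveals the caller of the method that created it, so the direct caller of a frameless method drops out of the stack.

```python
def fp_unwind(stack):
    """Toy model of a frame-pointer stack walk.

    `stack` is a list of (method_name, has_frame_pointer) tuples ordered
    from the outermost caller to the method being sampled.
    Assumption: a frameless method does not modify the frame-pointer register.
    """
    names = [name for name, _ in stack]
    seen = [names[-1]]  # the sampled method is always known from the instruction pointer
    # Walk from the sampled method outwards. A method's frame record holds the
    # return address into its caller, so the caller is visible only if the
    # callee actually set up a frame record.
    for i in range(len(stack) - 1, 0, -1):
        _, has_frame = stack[i]
        if has_frame:
            seen.append(names[i - 1])
    return list(reversed(seen))


# Hypothetical call chain: Main -> Run -> Parse -> Helper
with_frames = [("Main", True), ("Run", True), ("Parse", True), ("Helper", True)]
leaf_frameless = [("Main", True), ("Run", True), ("Parse", True), ("Helper", False)]

print(fp_unwind(with_frames))     # full chain
print(fp_unwind(leaf_frameless))  # Parse is missing, Helper appears under Run
```

In this model the sample taken inside the frameless `Helper` is attributed to `Run` instead of `Parse`, so the same method can show up under different parents depending on where samples land. That would be consistent with the x64 graphs above, but it is only a hypothesis.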
cc @jkotas @dotnet/dotnet-diag