Add crashdump example and include snapshot/scratch in core dumps #1264
jsturtevant wants to merge 1 commit into hyperlight-dev:main from
Core dumps generated by Hyperlight were missing the snapshot and scratch memory regions, which made post-mortem debugging with GDB incomplete: register state was present, but the guest's code, stack, heap, and page tables were absent. This PR adds the snapshot and scratch regions to the ELF core dump alongside any dynamically mapped regions, so that GDB can show full backtraces, disassemble at the crash site, and inspect guest memory. A new runnable crashdump example demonstrates automatic dumps (VM-level faults), on-demand dumps (guest-caught exceptions), and per-sandbox opt-out, with GDB-based integration tests that validate register and memory content in the generated ELF files. The debugging docs are also updated with practical GDB commands for inspecting crash dumps.

Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
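Before opening a generated dump in GDB, a cheap first check is that the file really is an ELF core file. The following is a minimal sketch and not the PR's integration-test code; the helper names `is_elf_core` and `check_dump_file` are hypothetical. It reads only the first 18 bytes of the ELF header and checks the magic bytes, the data encoding, and that `e_type` is `ET_CORE` (4). It assumes a little-endian dump (as on x86_64) and says nothing about whether the snapshot and scratch segments are present; verifying those still needs GDB or a full ELF parser.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// ELF `e_type` value identifying a core file.
const ET_CORE: u16 = 4;

/// Returns true if `bytes` begins with a little-endian ELF header whose
/// `e_type` is ET_CORE. Only the first 18 bytes are inspected.
fn is_elf_core(bytes: &[u8]) -> bool {
    // e_ident occupies bytes 0..16; e_type is the 2-byte field at offset 16.
    if bytes.len() < 18 || !bytes.starts_with(b"\x7fELF") {
        return false;
    }
    let little_endian = bytes[5] == 1; // EI_DATA == ELFDATA2LSB
    let e_type = u16::from_le_bytes([bytes[16], bytes[17]]);
    little_endian && e_type == ET_CORE
}

/// Reads only the header prefix of a dump file and checks it.
fn check_dump_file(path: &Path) -> io::Result<bool> {
    let mut header = [0u8; 18];
    File::open(path)?.read_exact(&mut header)?;
    Ok(is_elf_core(&header))
}

fn main() {
    // Pass the path of a generated dump as the first argument.
    match std::env::args().nth(1) {
        Some(path) => match check_dump_file(Path::new(&path)) {
            Ok(is_core) => println!("{path}: ELF core file = {is_core}"),
            Err(e) => eprintln!("{path}: {e}"),
        },
        None => eprintln!("usage: check-dump <path-to-core-dump>"),
    }
}
```

A file that fails this check will not load as a core in GDB either, so it is a reasonable guard to run before the slower GDB-driven assertions.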
```diff
@@ -0,0 +1,599 @@
+/*
+Copyright 2025 The Hyperlight Authors.
```
Unrelated question: I remember from some training I took a long time ago that when we add a new file, the copyright year should reflect when the file was created. Also, if the file is modified in a later year, the header should be updated with the new year.
Is that true here too? I'm curious how we should approach these scenarios.
The way I've done this in the past is to have a linter in CI that verifies the header on files in the PR. Example inherited from long ago.
The date doesn't need to be updated; it can stay as the file's creation date. Some folks opt for a date range, $creation_year - $most_recent_update_year. For example, see https://github.com/kubernetes/kubernetes/blob/07a1af766fd54f1f495a854ddf3e5227241fb961/pkg/api/node/util.go in the kubernetes/kubernetes (k/k) repo.
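A CI header linter of the kind described above could be sketched as follows. This is an illustration under stated assumptions, not an existing Hyperlight tool: the functions `copyright_years` and `check_header` are hypothetical, and the accepted format ("Copyright 2025 The Hyperlight Authors." or a range such as "Copyright 2024 - 2025 The Hyperlight Authors.") is inferred from the header in this PR and the range convention mentioned in the thread. The policy encodes the discussion: every file needs a header, a new file should start at the current year, and existing files are not forced to bump their year.

```rust
/// Hypothetical CI helper: scans the first lines of a file for a header like
/// "Copyright 2025 The Hyperlight Authors." or
/// "Copyright 2024 - 2025 The Hyperlight Authors." and returns (first, last) years.
fn copyright_years(source: &str) -> Option<(u32, u32)> {
    const PREFIX: &str = "Copyright ";
    const SUFFIX: &str = " The Hyperlight Authors.";
    for line in source.lines().take(20) {
        let Some(rest) = line.trim().strip_prefix(PREFIX) else { continue };
        let Some(years) = rest.strip_suffix(SUFFIX) else { continue };
        // A single year counts as a range that starts and ends in the same year.
        let (first, last) = years.split_once('-').unwrap_or((years, years));
        if let (Ok(first), Ok(last)) = (first.trim().parse::<u32>(), last.trim().parse::<u32>()) {
            if first <= last {
                return Some((first, last));
            }
        }
    }
    None
}

/// Hypothetical policy check: a header must exist, and a newly added file
/// must start at `current_year`. Existing files may keep their creation year.
fn check_header(source: &str, is_new_file: bool, current_year: u32) -> Result<(), String> {
    match copyright_years(source) {
        None => Err("missing or malformed copyright header".to_string()),
        Some((first, _)) if is_new_file && first != current_year => {
            Err(format!("new file should start at {current_year}, found {first}"))
        }
        Some(_) => Ok(()),
    }
}

fn main() {
    let header = "/*\nCopyright 2025 The Hyperlight Authors.\n*/";
    println!("{:?}", copyright_years(header)); // Some((2025, 2025))
    println!("{:?}", check_header(header, true, 2025)); // Ok(())
}
```

The current year is passed in rather than computed because Rust's standard library has no calendar API; a CI job would supply it, along with the list of files the PR adds.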
```rust
println!("=== Hyperlight Crash Dump Example ===\n");

// -----------------------------------------------------------------------
// Part 1: Automatic crash dump (VM-level fault bypasses guest handler)
```
nit: Maybe "guest-caused crash dump" or something similar would be more descriptive than "automatic"?
```rust
let mut regions: Vec<MemoryRegion> = Vec::new();

// Snapshot region: contains guest code, read-only data, page tables, etc.
if let Some(snapshot) = &self.snapshot_memory {
```
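The excerpt above starts building the list of regions written to the dump, beginning with the snapshot. A self-contained sketch of the overall idea follows. `DumpRegion`, `SandboxMemory`, `scratch_memory`, `mapped_regions`, and `core_dump_regions` are hypothetical stand-ins, not Hyperlight's actual types or fields; only the `snapshot_memory` name comes from the diff. Each fixed region is added when present, followed by any dynamically mapped regions. Sorting by guest address is a choice made here only to keep the output deterministic; the real code may not do it.

```rust
/// Hypothetical stand-in for a guest memory region. The real Hyperlight
/// `MemoryRegion` carries more detail (host address, permissions, ...).
#[derive(Debug, Clone)]
struct DumpRegion {
    name: &'static str,
    guest_base: u64,
    size: u64,
}

/// Hypothetical view of the memory a sandbox owns at crash time.
struct SandboxMemory {
    snapshot_memory: Option<DumpRegion>, // guest code, read-only data, page tables
    scratch_memory: Option<DumpRegion>,  // assumed: mutable guest state
    mapped_regions: Vec<DumpRegion>,     // regions mapped in at runtime
}

impl SandboxMemory {
    /// Collects every region that should appear in the core dump,
    /// ordered by guest address.
    fn core_dump_regions(&self) -> Vec<DumpRegion> {
        let mut regions: Vec<DumpRegion> = Vec::new();
        if let Some(snapshot) = &self.snapshot_memory {
            regions.push(snapshot.clone());
        }
        if let Some(scratch) = &self.scratch_memory {
            regions.push(scratch.clone());
        }
        regions.extend(self.mapped_regions.iter().cloned());
        regions.sort_by_key(|r| r.guest_base);
        regions
    }
}

fn main() {
    let mem = SandboxMemory {
        snapshot_memory: Some(DumpRegion { name: "snapshot", guest_base: 0x1000, size: 0x20_0000 }),
        scratch_memory: Some(DumpRegion { name: "scratch", guest_base: 0x40_0000, size: 0x10_0000 }),
        mapped_regions: vec![DumpRegion { name: "mapped", guest_base: 0x30_0000, size: 0x1000 }],
    };
    // Prints snapshot, mapped, scratch, in ascending guest-address order.
    for r in mem.core_dump_regions() {
        println!("{:<8} base={:#x} size={:#x}", r.name, r.guest_base, r.size);
    }
}
```

Before this PR, only the equivalent of `mapped_regions` reached the dump, which is why GDB could show registers but not the code, stack, or page tables behind them.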
I'm not sure about this; I'll try running it locally to see how it looks.