Conversation
- With the new Copy-on-Write changes, the guest virtual addresses and guest physical addresses are no longer identity-mapped, so we need a way to do this translation when a new guest trace batch arrives from the guest. Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
- Use registers r12, r13 and r14 to pass trace data information from the guest to the host. This is in use temporarily, until the virtio queues are implemented. Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
- For each call to `internal_dispatch_function` we now create a span manually, because instrumenting the function would be redundant given the call to `new_call` that resets tracing state for each new function call. The same logic is used for `generic_init` to enable it for the initialise call. Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
Force-pushed from 78fb1fe to 6176db1.
Pull request overview
This PR fixes guest→host tracing breakage introduced by CoW by translating the guest-provided trace buffer pointer (GVA) via a page-table walk (using CR3) before decoding the FlatBuffer trace batch. It also refines guest-side spans around initialization/dispatch to improve trace usability and reduce mismatched span warnings.
Changes:
- Translate trace batch pointers from GVA→GPA on the host (CR3-driven page table walk) before decoding trace FlatBuffers.
- Update the guest↔host tracing ABI to pass trace batch metadata via r12/r13/r14 and adjust host handling accordingly.
- Improve guest tracing spans around `generic_init` and `internal_dispatch_function`, and enable `Debug` printing of `FunctionCall`.
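For context on what the CR3-driven page-table walk involves, the sketch below shows how a 4-level x86-64 walk decomposes a guest virtual address into table indices. This is background arithmetic only, not the Hyperlight implementation; the function name `pt_indices` is made up for illustration.

```rust
// Hedged sketch (not the Hyperlight code): index decomposition for a
// 4-level x86-64 page walk. Each table holds 512 entries, so each
// index is 9 bits wide; the low 12 bits are the offset in a 4 KiB page.
fn pt_indices(gva: u64) -> (usize, usize, usize, usize, usize) {
    let pml4 = ((gva >> 39) & 0x1ff) as usize; // index into the PML4 (root, from CR3)
    let pdpt = ((gva >> 30) & 0x1ff) as usize; // page-directory-pointer table
    let pd   = ((gva >> 21) & 0x1ff) as usize; // page directory
    let pt   = ((gva >> 12) & 0x1ff) as usize; // page table
    let off  = (gva & 0xfff) as usize;         // byte offset within the page
    (pml4, pdpt, pd, pt, off)
}

fn main() {
    // GVA 0x2000 lands in the first entry of every upper level,
    // page-table index 2, offset 0.
    assert_eq!(pt_indices(0x2000), (0, 0, 0, 2, 0));
    println!("ok");
}
```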
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/hyperlight_host/src/sandbox/trace/context.rs | Switch trace batch extraction to read via GVA translation and new from_regs API. |
| src/hyperlight_host/src/sandbox/snapshot.rs | Expose helpers for resolving GPAs and reading page tables from shared/scratch memory. |
| src/hyperlight_host/src/mem/mgr.rs | Add read_guest_memory_by_gva that walks page tables and reads from shared/scratch backing. |
| src/hyperlight_host/src/hypervisor/hyperlight_vm.rs | Pass CR3 into trace handling and adjust when trace handling is attempted. |
| src/hyperlight_guest_tracing/src/state.rs | Send trace batch pointer/len via r12/r13/r14 in the OUT instruction. |
| src/hyperlight_guest_bin/src/lib.rs | Add a generic_init span and flush/close ordering to avoid host span warnings. |
| src/hyperlight_guest_bin/src/guest_function/call.rs | Add an internal_dispatch_function span and trace log the FunctionCall in debug builds. |
| src/hyperlight_guest/src/exit.rs | Update OUT path to attach trace batch metadata via r12/r13/r14. |
| src/hyperlight_common/src/vmem.rs | Update comment noting virt_to_phys is now used for tracing reads. |
| src/hyperlight_common/src/flatbuffer_wrappers/function_call.rs | Derive Debug for FunctionCall to support tracing output. |
```rust
match self.vm.sregs().map(|s| s.cr3) {
    Ok(cr3) => {
        tc.handle_trace(&regs, mem_mgr, cr3).unwrap_or_else(|e| {
            tracing::error!("Cannot handle trace data: {}", e);
```
This path logs "Cannot handle trace data" at error level for any `tc.handle_trace` failure. Since `handle_trace` returns an error when the trace magic isn't present, this can become noisy on VM exits that don't carry trace batches. Consider either gating the call on `regs.r12 == OutBAction::TraceBatch as u64` or downgrading the expected "no trace batch" case to debug/trace logging.
Suggested change:
```diff
-tracing::error!("Cannot handle trace data: {}", e);
+tracing::debug!("Cannot handle trace data: {}", e);
```
```rust
/// Read guest memory at a Guest Virtual Address (GVA) by walking the
/// page tables to translate GVA → GPA, then reading from the correct
/// backing memory (shared_mem or scratch_mem).
///
/// This is necessary because with Copy-on-Write (CoW) the guest's
/// virtual pages may be backed by physical pages in the scratch
/// region rather than being identity-mapped.
///
/// # Arguments
/// * `gva` - The Guest Virtual Address to read from
/// * `len` - The number of bytes to read
/// * `root_pt` - The root page table physical address (CR3)
#[cfg(feature = "trace_guest")]
pub(crate) fn read_guest_memory_by_gva(
    &mut self,
    gva: u64,
    len: usize,
    root_pt: u64,
) -> Result<Vec<u8>> {
    use hyperlight_common::vmem::PAGE_SIZE;

    use crate::sandbox::snapshot::{SharedMemoryPageTableBuffer, access_gpa};

    let scratch_size = self.scratch_mem.mem_size();

    self.shared_mem.with_exclusivity(|snap| {
        self.scratch_mem.with_exclusivity(|scratch| {
            let pt_buf = SharedMemoryPageTableBuffer::new(snap, scratch, scratch_size, root_pt);

            // Walk page tables to get all mappings that cover the GVA range
            let mappings: Vec<_> = unsafe {
                hyperlight_common::vmem::virt_to_phys(&pt_buf, gva, len as u64)
            }
            .collect();

            if mappings.is_empty() {
                return Err(new_error!(
                    "No page table mappings found for GVA {:#x} (len {})",
                    gva,
                    len,
                ));
            }

            let mut result = Vec::with_capacity(len);

            for mapping in &mappings {
                // Calculate the offset within this page for the first byte we want
                let page_offset = if mapping.virt_base <= gva {
                    (gva - mapping.virt_base) as usize
                } else {
                    0
                };

                let bytes_remaining = len - result.len();
                let available_in_page = PAGE_SIZE - page_offset;
                let bytes_to_copy = bytes_remaining.min(available_in_page);

                // Translate the GPA to host memory
                let gpa = mapping.phys_base + page_offset as u64;
                let (mem, offset) = access_gpa(snap, scratch, scratch_size, gpa)
                    .ok_or_else(|| {
                        new_error!(
                            "Failed to resolve GPA {:#x} to host memory (GVA {:#x})",
                            gpa,
                            gva
                        )
                    })?;

                let slice = mem
                    .as_slice()
                    .get(offset..offset + bytes_to_copy)
                    .ok_or_else(|| {
                        new_error!(
                            "GPA {:#x} resolved to out-of-bounds host offset {} (need {} bytes)",
                            gpa,
                            offset,
                            bytes_to_copy
                        )
                    })?;

                result.extend_from_slice(slice);
            }

            if result.len() != len {
                return Err(new_error!(
                    "Could not read full GVA range: got {} of {} bytes",
                    result.len(),
                    len
                ));
            }

            Ok(result)
        })
```
New page-table-walk-based trace reading logic is introduced here, but there’s no unit test coverage validating cross-page reads, unmapped gaps, and scratch-vs-snapshot backing resolution. Since this file already has paging-related tests, consider adding tests (under appropriate feature gating) that exercise read_guest_memory_by_gva for both identity-mapped and CoW/scratch-backed pages.
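A test along these lines could start from a flat byte buffer standing in for guest memory. The sketch below is illustrative only and does not call the real `read_guest_memory_by_gva`; the `read_across_pages` helper and the identity mapping are assumptions, but the page-bounded chunking it asserts on mirrors the loop in the hunk above.

```rust
const PAGE_SIZE: usize = 4096;

// Hypothetical helper standing in for a GVA read over identity-mapped
// pages: copy `len` bytes starting at `gva`, one page-bounded chunk at a
// time, returning None if the range runs off the end of memory.
fn read_across_pages(mem: &[u8], gva: usize, len: usize) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(len);
    let mut cur = gva;
    while out.len() < len {
        let page_offset = cur % PAGE_SIZE;
        // Never copy past the end of the current page in one step.
        let chunk = (len - out.len()).min(PAGE_SIZE - page_offset);
        out.extend_from_slice(mem.get(cur..cur + chunk)?);
        cur += chunk;
    }
    Some(out)
}

fn main() {
    // Two pages of memory with a recognizable byte pattern.
    let mem: Vec<u8> = (0..2 * PAGE_SIZE).map(|i| (i % 251) as u8).collect();
    // A 32-byte read straddling the first page boundary must splice
    // both pages together correctly.
    let got = read_across_pages(&mem, PAGE_SIZE - 16, 32).unwrap();
    assert_eq!(got, &mem[PAGE_SIZE - 16..PAGE_SIZE + 16]);
    // A read past the end of memory must fail rather than truncate.
    assert!(read_across_pages(&mem, 2 * PAGE_SIZE - 8, 32).is_none());
    println!("ok");
}
```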
```rust
/// in register r9. With Copy-on-Write enabled, this GVA may not be
/// identity-mapped to its physical address, so we walk the guest page
/// tables to translate GVA → GPA before reading the data.
///
/// # Arguments
/// * `regs` - The guest registers (r8 = magic, r9 = GVA pointer, r10 = length)
```
The rustdoc for EventsBatch::from_regs still refers to r8/r9/r10 and says the GVA pointer is in r9, but the implementation now reads magic/pointer/len from r12/r13/r14. Please update the doc comment (including the # Arguments bullet) to match the new register convention so callers don’t mis-wire the ABI again later.
Suggested change:
```diff
-/// in register r9. With Copy-on-Write enabled, this GVA may not be
+/// in register r13. With Copy-on-Write enabled, this GVA may not be
 /// identity-mapped to its physical address, so we walk the guest page
 /// tables to translate GVA → GPA before reading the data.
 ///
 /// # Arguments
-/// * `regs` - The guest registers (r8 = magic, r9 = GVA pointer, r10 = length)
+/// * `regs` - The guest registers (r12 = magic, r13 = GVA pointer, r14 = length)
```
```rust
pub(crate) fn read_guest_memory_by_gva(
    &mut self,
    gva: u64,
    len: usize,
    root_pt: u64,
) -> Result<Vec<u8>> {
    // ... (page-table walk elided; same hunk as quoted above) ...
    let mut result = Vec::with_capacity(len);
```
read_guest_memory_by_gva allocates Vec::with_capacity(len) using a length provided by the guest (r14). A malicious/buggy guest can request an arbitrarily large len and cause host OOM. Consider enforcing a strict upper bound (e.g., a max trace batch size) before allocating, and return an error if exceeded.
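A minimal sketch of such a bound check, assuming a `MAX_TRACE_BATCH_SIZE` constant (the 4 KiB value is taken from the author's reply later in this thread; the helper name is made up):

```rust
// Hedged sketch: validate a guest-supplied length against a fixed cap
// before allocating on the host. MAX_TRACE_BATCH_SIZE is an assumed
// constant, not an existing Hyperlight symbol.
const MAX_TRACE_BATCH_SIZE: usize = 4096;

fn bounded_trace_len(len: usize) -> Result<usize, String> {
    if len > MAX_TRACE_BATCH_SIZE {
        // Reject before Vec::with_capacity ever sees the guest value.
        return Err(format!(
            "trace batch length {} exceeds maximum {}",
            len, MAX_TRACE_BATCH_SIZE
        ));
    }
    Ok(len)
}

fn main() {
    assert!(bounded_trace_len(1024).is_ok());
    assert!(bounded_trace_len(1 << 20).is_err()); // a 1 MiB request is rejected
    println!("ok");
}
```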
```rust
// to correct execution of the guest
match self.vm.sregs().map(|s| s.cr3) {
    Ok(cr3) => {
        tc.handle_trace(&regs, mem_mgr, cr3).unwrap_or_else(|e| {
```
virt_to_phys expects root_pt to be a page-aligned PML4 base, but the value passed here comes directly from sregs.cr3 and may include low flag/PCID bits. The guest-side root_table() masks with & !0xfff; the host should do the same (or use a shared helper) before walking page tables, otherwise translation can read the wrong entries.
Suggested change:
```diff
-tc.handle_trace(&regs, mem_mgr, cr3).unwrap_or_else(|e| {
+let root_pt = cr3 & !0xfff_u64;
+tc.handle_trace(&regs, mem_mgr, root_pt).unwrap_or_else(|e| {
```
```rust
// This span captures the internal dispatch function only, without tracing internals.
// Close the span before flushing to ensure that the `flush` call is not included in the span.
// NOTE: This is necessary to avoid closing the span twice. Flush closes all the open
// spans when preparing to close a guest function call context.
// It is not mandatory, but it avoids a warning on the host about a span
// that has not been opened but is being closed.
_entered.exit();

// Ensure that any tracing output during the call is flushed to
// the host, if necessary.
hyperlight_guest_tracing::flush();
}
```
Could this be factored out to not be duplicated here and in lib.rs?
not really, we need `flush` to be right before the call to `hlt` so that no other spans/logs are added.
```rust
#[cfg(all(feature = "trace_guest", target_arch = "x86_64"))]
hyperlight_guest_tracing::flush();
{
    // This span captures the internal dispatch function only, without tracing internals.
```
I think this comment should say `generic_init` as opposed to `internal_dispatch_function`, maybe?
```rust
match result.as_ref() {
    // On the Halt exit we don't receive any trace information
    Ok(VmExit::Halt()) => {}
    _ => {
```
nit: could be simplified slightly to something like

```rust
if !matches!(result.as_ref(), Ok(VmExit::Halt())) {
    let regs = self.vm.regs().map_err(RunVmError::GetRegs)?;
    if let Ok(cr3) = self.vm.sregs().map(|s| s.cr3) {
        tc.handle_trace(&regs, mem_mgr, cr3).unwrap_or_else(|e| {
            tracing::error!("Cannot handle trace data: {}", e);
        });
    } else {
        tracing::error!("Failed to get sregs for trace handling");
    }
}
```

```rust
/// tables to translate GVA → GPA before reading the data.
///
/// # Arguments
/// * `regs` - The guest registers (r8 = magic, r9 = GVA pointer, r10 = length)
```
I think this comment is using outdated registers.
```rust
/// backing memory (shared_mem or scratch_mem).
///
/// This is necessary because with Copy-on-Write (CoW) the guest's
/// virtual pages may be backed by physical pages in the scratch
```
```rust
        ));
    }

    let mut result = Vec::with_capacity(len);
```
Is this length controlled by the guest? Coming from context.rs:

```rust
let trace_data_len = regs.r14 as usize;
let buf = mem_mgr.read_guest_memory_by_gva(trace_data_gva, trace_data_len, root_pt)?;
```
very good point, we need to have some max size we allow for sure
yeah, I think I'll limit it at 4K for now, but in the future we'd be able to shrink it because we would use the virtio queues and we wouldn't care about this. We could even not batch them at all; the queue would do that for us.
```rust
Ok(VmExit::Halt()) => {}
_ => {
    // Handle the guest trace data if any
    let regs = self.vm.regs().map_err(RunVmError::GetRegs)?;
```
since we know the outb action (`in("r12") OutBAction::TraceBatch as u64`), could we skip doing this for every instance with something like:

```rust
if regs.r12 == OutBAction::TraceBatch as u64 {
    // do all the tracing stuff here
}
```
```rust
// Walk page tables to get all mappings that cover the GVA range
let mappings: Vec<_> = unsafe {
    hyperlight_common::vmem::virt_to_phys(&pt_buf, gva, len as u64)
```
This length isn't necessarily aligned to a page, I think?
This isn't in your code, but I think this might expose an issue where, if this length goes past one page, it could truncate the result and not give you the right mappings. We are missing a unit test that covers that scenario in vmem.
You could do `len as u64 + page_offset`, but we should probably fix it in vmem so that we don't accidentally misuse this function again.
I'll add some tests for this
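The boundary case being discussed can be made concrete: an unaligned range of `len` bytes can touch one more page than `len / PAGE_SIZE` suggests. A minimal sketch of the page-count arithmetic (not the vmem code itself; `pages_covered` is a made-up name):

```rust
const PAGE_SIZE: u64 = 4096;

// Number of pages the half-open range [gva, gva + len) touches.
fn pages_covered(gva: u64, len: u64) -> u64 {
    if len == 0 {
        return 0;
    }
    let first = gva / PAGE_SIZE;
    let last = (gva + len - 1) / PAGE_SIZE;
    last - first + 1
}

fn main() {
    // A page-aligned 4 KiB read touches exactly one page...
    assert_eq!(pages_covered(0x1000, 4096), 1);
    // ...but a 32-byte read starting 16 bytes before a page boundary
    // touches two, which is why walking `len` alone (without accounting
    // for the starting page offset) can truncate the mapping list.
    assert_eq!(pages_covered(0x1ff0, 32), 2);
    println!("ok");
}
```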
Description
This PR fixes #1242.
The guest tracing data received by the host assumed identity-mapped addresses; the address now needs to be translated to correctly decode the FlatBuffers data.
This can be reviewed commit by commit.
The first commit tackles this issue, the other two improve on the spans and usability.