LDEV-6275 Fix dump on Lucee 6.2 + native getMetadata + exit cleanup #2
Open
Adds DapClient.cfc wrappers for the dump/dumpAsJSON custom JSON requests and a DumpTest.cfc suite covering struct, array, JSON round-trip, and a server-survives-dump regression guard for the System.exit(1) in the dump catch blocks. Expected to fail on agent-mode Lucee 6.2 (JVM kill cascades through the rest of the run) and pass on jakarta Lucee 7.x — intentional red phase before the javax/jakarta servlet-API fix.
Swap TestBox toInclude/notToBe for CFML 'contains' operator on the dump content checks. Drop failure messages — TestBox auto-dumps the actual value when there's no message, which is more useful than a static string, so the systemOutput diagnostic lines become redundant.
Native mode returns JSON-serialized GetMetaData result; agent/JDWP mode returns the static 'not supported in JDWP mode' marker string. Guards both paths plus valid-JSON shape.
The 'Error:' substring check was brittle (false positives on metadata that legitimately contains the word) and 'content contains X' reduces to a boolean before the matcher, so TestBox can't auto-dump the offending content on failure. Keep: isJSON + agent-mode marker literal + server-survives. If GetMetaData throws for a plain struct on native, the fallback is still valid JSON - that's a separate discussion.
Tighten the native assertion: deserialized response must be a struct.
If GetMetaData throws or doGetMetadataWithPageContext falls back to a
JSON string ("Error: ...", "getMetadata failed", "No PageContext"),
this test will fail - which is what we want to investigate.
systemOutput the raw content each run since 'isJSON' and 'toBeTypeOf'
lose the value in the matcher chain.
GetMetaData lives in lucee.runtime.functions.system, which lucee.core does not self-import via OSGi - so cl.loadClass on the PageContext's bundle classloader fails with:
    Class 'lucee.runtime.functions.system.GetMetaData' was not found because bundle lucee.core does not import 'lucee.runtime.functions.system'
loadBIF is the loader-level API built for this: it resolves BIF classes across bundle boundaries and returns a typed BIF reference whose invoke is a normal interface call, not reflection. Pattern matches extension-websocket and other sibling extensions.
Applied to getMetadata (GetMetaData + SerializeJSON) and the dumpAsJSON branch of doDumpWithPageContext (SerializeJSON). ThreadLocalPageContext, DumpUtil, HTMLDumpWriter etc. stay on cl.loadClass - they're not BIFs, they're runtime utility classes and currently resolve fine.
The previous commit passed the fully-qualified class name to loadBIF. ClassUtilImpl.loadBIF tries loadClass first (same OSGi bundle issue) and only falls through to FunctionLib lookup when that returns null. The short name bypasses the classloader step entirely - FunctionLib is the right resolver for BIFs and doesn't care about bundle imports.
Four System.exit(1) sites in doDump and doDumpAsJSON would take the
whole Lucee process down on any Throwable from the dump worker thread
or from thread.join. A debugger feature that kills the host on failure
is indefensible.
Keep printStackTrace so the error is still visible, let the preset
result.value fallback strings ('...something went wrong when calling
writeDump(...)' / 'Something went wrong when calling serializeJSON(...)')
get returned to the DAP client. The containment fix on its own doesn't
make dump work on Lucee 6.2 - it just stops the Lucee 6.2 crash from
being catastrophic.
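A minimal sketch of the containment described above, not the actual doDump code: the result is preset to a fallback string, the worker catches Throwable and prints the stack trace, and nothing calls System.exit. The names DumpContainmentSketch, DumpCall, and dumpOrFallback are hypothetical and exist only for illustration.

```java
import java.util.concurrent.atomic.AtomicReference;

final class DumpContainmentSketch {
    /** Hypothetical stand-in for the real dump call run on the worker thread. */
    interface DumpCall {
        String run() throws Throwable;
    }

    static String dumpOrFallback(DumpCall call, String fallback) {
        // Preset the result so every failure path still has something to return.
        AtomicReference<String> result = new AtomicReference<>(fallback);
        Thread worker = new Thread(() -> {
            try {
                result.set(call.run());
            } catch (Throwable t) {
                // Keep the error visible, but leave the host process running.
                t.printStackTrace();
            }
        }, "dump-worker-sketch");
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            // A failed join is also contained: restore the flag and return what we have.
            Thread.currentThread().interrupt();
            e.printStackTrace();
        }
        return result.get();
    }
}
```

With this shape, a NoSuchMethodError thrown inside the worker surfaces to the DAP client as the fallback string rather than as a dead JVM.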
Drop the direct pc.getServletConfig() call from ephemeralPageContextFromOther - that method's return type flipped from javax to jakarta between Lucee 6 and 7, and the agent jar is compiled against jakarta, so calling it on 6.x throws NoSuchMethodError at link time. That was the underlying cause of the dump crash on Lucee 6.2 (the System.exit containment was only half the story). New path uses ClassUtil.callStaticMethod on ThreadUtil.createPageContext, which takes a ConfigWeb (servlet-API-agnostic on the caller side; Lucee plumbs the internal ServletConfig itself). Empty cookie array is built jakarta-first-with-javax-fallback via reflection. Same pattern as extension-websocket's WSUtil.createPageContext. One agent jar works on both javax-Lucee (6.2) and jakarta-Lucee (7.x).
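The jakarta-first-with-javax-fallback array construction can be sketched roughly as follows. This is an illustration under assumptions, not the code in ephemeralPageContextFromOther: the class EmptyArraySketch and the method emptyArrayOfFirstLoadable are hypothetical, and the call into ThreadUtil.createPageContext is deliberately left out.

```java
import java.lang.reflect.Array;

final class EmptyArraySketch {
    /**
     * Returns a zero-length array whose component type is the first candidate
     * class that the given loader can resolve.
     */
    static Object emptyArrayOfFirstLoadable(ClassLoader cl, String... candidates) {
        ClassNotFoundException last = null;
        for (String name : candidates) {
            try {
                // initialize=false: only the Class object is needed, not its static init.
                Class<?> type = Class.forName(name, false, cl);
                return Array.newInstance(type, 0);
            } catch (ClassNotFoundException e) {
                last = e; // try the next candidate
            }
        }
        throw new IllegalStateException("none of the candidate classes could be loaded", last);
    }
}
```

A caller in the agent would presumably pass "jakarta.servlet.http.Cookie" first and "javax.servlet.http.Cookie" second, so the same jar produces the right array type on either runtime without linking against either servlet API.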
Utils.unreachable had only two production callers, both in ExprEvaluator's Lucee5/6 detection fallbacks. If Renderer.tag signature detection throws something other than NoSuchMethodException, killing the JVM is indefensible - just fall through to Optional.empty() with a context-ful log so the caller tries the other evaluator. Delete Utils.java; no production caller of Utils.unreachable remains.
All 8 System.exit(1) sites in LuceeTransformer fired on catching a Throwable from classfile rewrite. Any one bad class - a weird inner class, an unexpected ASM opcode, anything - would take the whole Lucee host down. One class failing to instrument should degrade debugging for that one class, not kill the process. Each catch now logs a context-rich line (which class, what the consequence is) and returns the ORIGINAL classfileBuffer, so the class loads normally. Breakpoints in that specific class won't fire, but Lucee keeps running. The 'Got class X before PageContextImpl' branch now logs and continues too, rather than being fatal.
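The log-and-return-original pattern might look like the sketch below. It is not LuceeTransformer itself: SafeTransformerSketch is a hypothetical class, and the UnaryOperator stands in for the real ASM rewrite.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;
import java.util.function.UnaryOperator;

final class SafeTransformerSketch implements ClassFileTransformer {
    // Hypothetical stand-in for the bytecode rewrite step.
    private final UnaryOperator<byte[]> rewriter;

    SafeTransformerSketch(UnaryOperator<byte[]> rewriter) {
        this.rewriter = rewriter;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        try {
            return rewriter.apply(classfileBuffer);
        } catch (Throwable t) {
            // Say which class failed and what the consequence is, then let it load untouched.
            System.err.println("[sketch] failed to instrument " + className
                    + "; loading it unmodified, breakpoints in this class will not fire: " + t);
            return classfileBuffer;
        }
    }
}
```

A failure in one class then costs breakpoints in that class only; every other class still goes through the rewriter as usual.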
Cleanup per remove-runtime-exits.md - 20+ runtime exit sites in DebugManager, LuceeVm, and KlassMap replaced with context-ful stderr logs that name the operation, the inputs involved, and the consequence. Covers:
- step / pop-frame / bad-step-type handlers (user-triggered debug actions)
- JDWP event pump, thread tracking, class-ref tracking
- step in/over/out suspend-count assertions
- KlassMap build failures
*OrFail thread-lookup helpers now throw RuntimeException with context instead of calling exit - the caller's catch decides how to proceed.
Startup/init sites (classloader sanity, JDWP connector lookup, premain, DAP socket bind, instrumentation helper method lookup) are kept as System.exit - if they fire the debugger is fundamentally broken and failing loud is correct.
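As a rough illustration of the *OrFail change, a lookup helper can throw a RuntimeException that names the operation and the id involved, leaving the decision to the caller. ThreadLookupSketch, track, and getThreadOrFail are hypothetical names, and the String values stand in for whatever thread references the real maps hold.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ThreadLookupSketch {
    // Hypothetical registry: thread id -> display name.
    private final Map<Long, String> threadsById = new ConcurrentHashMap<>();

    void track(long id, String name) {
        threadsById.put(id, name);
    }

    String getThreadOrFail(long id, String operation) {
        String found = threadsById.get(id);
        if (found == null) {
            // Throw with context instead of exiting; the caller's catch decides what to do.
            throw new RuntimeException("[sketch] " + operation + ": no tracked thread with id " + id);
        }
        return found;
    }
}
```

A step handler calling getThreadOrFail(42, "stepOver") could catch the exception, log it, and drop that one request while the session keeps running.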
CI was intermittently hanging on 'Warmup debuggee (Lucee Express)' for up to 6 hours (GitHub's job cap).
Root cause: LUCEE_ENABLE_WARMUP=true tells Lucee to compile bundles and exit, but our agent spawned three non-daemon threads that stayed parked forever:
- DAP server thread (DebugManager.spawnWorker) - blocked on ServerSocket.accept()
- JDWP worker (LuceeVm.JdwpWorker.spawnThreadForJdwpToSuspend) - suspended in method
- JDWP event pump (LuceeVm.initEventPump) - blocked on vm_.eventQueue().remove()
Any one of them is enough to keep the JVM alive. When warmup 'worked' it was a race where JDWP self-connect failed fast and the DAP thread exited via its catch before ServerSocket.accept was reached.
Native mode (ExtensionActivator) already sets dapThread.setDaemon(true). Match that pattern in agent mode. During a real debug session, Tomcat's own non-daemon threads keep the JVM alive - daemon status on our background workers is invisible there.
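A minimal sketch of the daemon-thread fix, assuming a small helper wraps thread creation; DaemonWorkerSketch and spawnDaemon are hypothetical names, not the agent's API.

```java
final class DaemonWorkerSketch {
    static Thread spawnDaemon(String name, Runnable body) {
        Thread t = new Thread(body, name);
        // Must be set before start(); a daemon thread does not keep the JVM alive,
        // so a worker parked on a blocking call cannot stall process exit.
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

Thread.setDaemon throws IllegalThreadStateException if called after start(), which is why the flag is set at the spawn site rather than later.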
Summary
Fixes three stacked bugs discovered via a user report of Dump crashing
Lucee 6.2 in the VS Code CFML debugger.
LDEV: LDEV-6275
Bug 1 - Lucee 6.2 agent: dump kills the JVM
Agent jar is built against jakarta-servlet; 6.2 is still javax.
pc.getServletConfig() in the dump worker throws NoSuchMethodError, caught as Throwable, then System.exit(1) takes the JVM with it.
Fix: ephemeralPageContextFromOther no longer calls getServletConfig(). It uses ClassUtil.callStaticMethod on ThreadUtil.createPageContext (the pattern already in use by extension-websocket / dk). ConfigWeb is servlet-API-agnostic - one agent jar works on both javax (6.2) and jakarta (7.x) runtimes.
Bug 2 - Lucee 7.1 native: Inspect Metadata fails
NativeLuceeVm was loading lucee.runtime.functions.system.GetMetaData via cl.loadClass, but lucee.core doesn't self-import lucee.runtime.functions.system over OSGi.
Fix: BIF calls now resolve via ClassUtil.loadBIF(pc, shortName) - the FunctionLib fallback bypasses bundle-import issues entirely. Same change for SerializeJSON used by both getMetadata and the native dumpAsJSON.
Bug 3 - System.exit(1) sprinkled through runtime paths
The dump crash was the most visible symptom of a wider pattern. 41 exits across the agent code; many in runtime paths (steps, stack pop, bytecode rewrite, JDWP event pump, BIF-evaluator detection). Any Throwable → dead JVM.
Fix: ~30 runtime sites now log + continue with enough context to be actionable (op name, thread/class ids, consequence). The 11 startup/init sites (classloader sanity, premain, socket bind, instrumentation helper method lookup) stay as System.exit - if those fail the debugger is fundamentally broken and failing loud is correct.
Test coverage added
- DumpTest.cfc: dump / dumpAsJSON struct + array + JSON round-trip
- MetadataTest.cfc: native getMetadata returns a struct; agent mode returns the static "not supported in JDWP mode" marker
- DapClient.cfc: dump / dumpAsJSON / getMetadata wrappers
Test plan
debuggee, confirm content comes back and server stays up
native debuggee, confirm JSON metadata comes back