ADFA-3108: Improve CV-to-XML accuracy with fuzzy search and OCR refinement #1047
Daniel-ADFA merged 19 commits into stage
Conversation
…tionary to handle OCR mistakes
📝 Walkthrough
Release Notes: CV-to-XML Accuracy Improvements (ADFA-3108)
Key Features
API Changes
Risks & Best Practices Concerns
Dependencies
Files Modified: 13
Walkthrough
This PR refactors the computer vision OCR pipeline from simple full-image processing to region-aware analysis. It introduces RegionOcrProcessor to orchestrate parallel OCR streams: widget text extraction, margin annotation parsing, and full-image analysis. Supporting components include FuzzyAttributeParser for XML attribute parsing, enhanced bitmap preprocessing with adaptive thresholding, and improved margin annotation detection with spatial clustering. Repository and DI wiring are updated to integrate the new pipeline components.
Changes
Sequence Diagram
sequenceDiagram
participant ViewModel
participant RegionOcrProcessor
participant OcrSource
participant Bitmap as Image Processing
ViewModel->>RegionOcrProcessor: process(bitmap, yoloDetections, leftGuidePct, rightGuidePct)
par Widget OCR Path
RegionOcrProcessor->>Bitmap: crop widget regions with padding
Bitmap-->>RegionOcrProcessor: cropped bitmaps
RegionOcrProcessor->>Bitmap: preprocessForOcr
Bitmap-->>RegionOcrProcessor: preprocessed bitmap
RegionOcrProcessor->>OcrSource: recognizeText on widgets
OcrSource-->>RegionOcrProcessor: widget text blocks
RegionOcrProcessor->>RegionOcrProcessor: enrich YOLO detections with text
and Margin OCR Path
RegionOcrProcessor->>Bitmap: crop left/right margin strips
Bitmap-->>RegionOcrProcessor: margin bitmaps
RegionOcrProcessor->>Bitmap: preprocessForOcr
Bitmap-->>RegionOcrProcessor: preprocessed bitmap
RegionOcrProcessor->>OcrSource: recognizeText on margins
OcrSource-->>RegionOcrProcessor: margin text blocks
RegionOcrProcessor->>RegionOcrProcessor: convert to DetectionResults
and Full-Image OCR Path
RegionOcrProcessor->>OcrSource: recognizeText on full bitmap
OcrSource-->>RegionOcrProcessor: full-image text blocks
end
RegionOcrProcessor-->>ViewModel: RegionOcrResult(enriched, remaining, margin, fullImage)
ViewModel->>ViewModel: mergeDetections + region filtering
ViewModel->>ViewModel: aggregate canvas + margin detections
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches
✅ Actions performed
Review triggered.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt (1)
71-79: ⚠️ Potential issue | 🟠 Major
Avoid logging raw OCR content in production paths.
These debug logs include annotation/canvas text directly, which can leak user content into log storage.
🛡️ Suggested hardening
-Log.d(TAG, "Processed Margin Annotations: {$finalAnnotationLog}")
+if (Log.isLoggable(TAG, Log.DEBUG)) {
+    Log.d(TAG, "Processed Margin Annotations count=${annotationMap.size}")
+}
@@
-Log.d(TAG, "Parsed Canvas Content (Corrected): $canvasLogOutput")
+if (Log.isLoggable(TAG, Log.DEBUG)) {
+    Log.d(TAG, "Parsed Canvas Content count=${correctedCanvasDetections.size}")
+}
@@
-Log.d(TAG, "Block $i: tag=${result?.first ?: "none"}, ${block.size} lines, text='${annotationText.take(40)}'")
+if (Log.isLoggable(TAG, Log.DEBUG)) {
+    Log.d(TAG, "Block $i: tag=${result?.first ?: "none"}, lines=${block.size}")
+}
Also applies to: 102-111, 128-130, 147-148
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt` around lines 71 - 79, The logs in MarginAnnotationParser that build finalAnnotationLog and canvasLogOutput (and similar logs around lines with correctedCanvasDetections, annotationMap, and parsed canvas content) currently include raw OCR text and must be redacted before logging; update the code to avoid printing user content by replacing raw text with a sanitized placeholder or hashed token (e.g., hash or "<redacted>") and only log non-sensitive metadata (coordinates, sizes, keys) or conditionally log full content under a debug-only flag (e.g., BuildConfig.DEBUG). Ensure all occurrences (finalAnnotationLog, canvasLogOutput and the other listed log sites) follow the same redaction/conditional pattern so no raw OCR strings are written to production logs.
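A minimal, self-contained sketch of the redaction pattern the prompt above describes. The `redactForLog` helper name and its output format are illustrative assumptions, not part of the PR:

```kotlin
// Illustrative redaction helper: log only length and a stable hash, never the raw OCR text.
// `redactForLog` is a hypothetical name, not an API in this PR.
fun redactForLog(text: String): String =
    "<redacted len=${text.length} hash=${Integer.toHexString(text.hashCode())}>"

// Usage at a log site (Android Log call elided):
// Log.d(TAG, "Processed Margin Annotations ${redactForLog(annotationText)}")
```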
🧹 Nitpick comments (3)
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/BitmapUtils.kt (2)
119-134: Border pixels are unfiltered.
The median filter skips the 1-pixel border (iterating y in 1 until height - 1). Border pixels retain their original binary values from the adaptive threshold step. This is typically acceptable for OCR since text content rarely touches image edges, but worth noting if edge artifacts become problematic.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/BitmapUtils.kt` around lines 119 - 134, The medianFilter currently skips the 1-pixel border (function medianFilter using pixels, copy and window) so border pixels retain unfiltered values; change the loop to cover x in 0 until width and y in 0 until height and when building the 3x3 window handle out-of-bounds neighbors by clamping coordinates (e.g., nx = max(0, min(width-1, x+dx)), ny = max(0, min(height-1, y+dy))) or by filling missing entries with the center pixel before sorting, then sort window and assign pixels[y*width + x] = window[4]; this preserves current logic for interior pixels while applying a consistent median to edges.
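The clamped-border variant the prompt describes can be sketched without Android types by operating on a plain grayscale IntArray. Function and parameter names here are illustrative, not the project's actual BitmapUtils API:

```kotlin
// Standalone sketch of a clamped 3x3 median filter: every pixel, including the
// border, is filtered; out-of-bounds neighbors are clamped to the nearest edge pixel.
fun medianFilterClamped(pixels: IntArray, width: Int, height: Int): IntArray {
    val out = IntArray(pixels.size)
    val window = IntArray(9)
    for (y in 0 until height) {
        for (x in 0 until width) {
            var i = 0
            for (dy in -1..1) {
                for (dx in -1..1) {
                    // Clamp neighbor coordinates so border pixels reuse edge values.
                    val nx = (x + dx).coerceIn(0, width - 1)
                    val ny = (y + dy).coerceIn(0, height - 1)
                    window[i++] = pixels[ny * width + nx]
                }
            }
            window.sort()
            out[y * width + x] = window[4] // median of the 9-sample window
        }
    }
    return out
}
```

Interior pixels behave exactly as before; only the border gains a consistent median, which removes isolated corner speckles that the skip-the-border version leaves intact.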
11-11: Consider validating blockSize parameter.
The Gaussian blur algorithm requires blockSize to be a positive odd number for symmetric kernel behavior. While the default value (31) is correct and callers currently use defaults, invalid values (0, negative, or even) could cause unexpected behavior or incorrect results.
🛡️ Suggested validation
 fun preprocessForOcr(bitmap: Bitmap, blockSize: Int = 31, c: Int = 15): Bitmap {
+    require(blockSize > 0 && blockSize % 2 == 1) { "blockSize must be a positive odd number" }
     val width = bitmap.width
Verify each finding against the current code and only fix it if needed. In `@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/BitmapUtils.kt` at line 11, In preprocessForOcr ensure the blockSize parameter is validated at the start: check that blockSize is > 0 and odd, and if not throw an IllegalArgumentException with a clear message referencing the parameter (e.g., "blockSize must be a positive odd number"); update the method signature of preprocessForOcr to include this guard so callers cannot pass 0, negative, or even values that would break the Gaussian blur kernel.
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/ui/viewmodel/ComputerVisionViewModel.kt (1)
212-215: Track detection completion analytics on OCR/merge failure paths.
These exits call error handling but skip trackDetectionCompleted(success = false, ...), which makes failure metrics incomplete outside YOLO failures.
📉 Suggested patch
+    fun trackFailureAndHandle(exception: Throwable?) {
+        CvAnalyticsUtil.trackDetectionCompleted(
+            success = false,
+            detectionCount = 0,
+            durationMs = System.currentTimeMillis() - startTime
+        )
+        handleDetectionError(exception)
+    }
+
     val regionOcrResult = repository.runRegionOcr(
         bitmap, yoloDetections, state.leftGuidePct, state.rightGuidePct
     )
     if (regionOcrResult.isFailure) {
-        handleDetectionError(regionOcrResult.exceptionOrNull())
+        trackFailureAndHandle(regionOcrResult.exceptionOrNull())
         return@launch
     }
     val ocrResult = regionOcrResult.getOrThrow()
@@
-    .onFailure { handleDetectionError(it) }
+    .onFailure { trackFailureAndHandle(it) }
Also applies to: 255-255
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/ui/viewmodel/ComputerVisionViewModel.kt` around lines 212 - 215, When an OCR/merge result fails you currently call handleDetectionError(...) and return but never record the failure; before returning from the region OCR failure branch (where regionOcrResult.isFailure is checked) and the merge failure branch (where mergeOcrResult.isFailure is checked) call trackDetectionCompleted(success = false, error = regionOcrResult.exceptionOrNull() / mergeOcrResult.exceptionOrNull()) (matching the same metadata parameters used in the success path) and then proceed to handleDetectionError(...)/return so failure metrics are tracked consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt`:
- Around line 14-31: The extraction fails on lowercase OCR output; update
TAG_REGEX and TAG_EXTRACT_REGEX to be case-insensitive (e.g., use Regex(...,
RegexOption.IGNORE_CASE)) or alternatively uppercase the trimmed input before
matching in extractTag, then normalize the matched prefix and digits to
uppercase (handle the "8" -> "B" case after uppercasing) and build the tag in
the canonical uppercase form before calling isTag; also ensure isTag uses the
same case-insensitive check or receives the normalized tag.
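A hedged sketch of the normalization flow this prompt describes. The regex, the known-tag prefixes, and the helper shape below are assumptions for illustration, not the project's actual TAG_EXTRACT_REGEX or isTag():

```kotlin
// Case-insensitive tag extraction with OCR normalization: uppercase first,
// then repair the common '8' -> 'B' misread before validating the tag.
// Pattern and tag set are illustrative assumptions.
val TAG_EXTRACT_REGEX = Regex("""^([a-z8][0-9]{1,2})\b""", RegexOption.IGNORE_CASE)
val KNOWN_TAG_PREFIXES = setOf("B", "T") // assumption: Button/TextView style tags

fun extractTag(line: String): String? {
    val match = TAG_EXTRACT_REGEX.find(line.trim()) ?: return null
    var tag = match.groupValues[1].uppercase()
    // OCR often reads 'B' as '8'; normalize after uppercasing.
    if (tag.first() == '8') tag = "B" + tag.drop(1)
    return tag.takeIf { it.first().toString() in KNOWN_TAG_PREFIXES }
}
```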
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/RegionOcrProcessor.kt`:
- Around line 65-77: The async block that crops and preprocesses bitmaps (using
BitmapUtils.cropRegion and BitmapUtils.preprocessForOcr) must ensure
deterministic recycling on all paths; wrap the body in try/finally, declare crop
and preprocessed variables before the try, assign them inside try, and in
finally call recycle() safely (check for null and for crop !== bitmap) so both
normal, exception, and cancellation paths free native memory; apply the same
try/finally pattern to the other similar block around
ocrSource.recognizeText/component.copy mentioned at lines 106-109.
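The try/finally shape this prompt asks for can be sketched with a stand-in class replacing android.graphics.Bitmap. Everything here (FakeBitmap, withCroppedRegion) is illustrative scaffolding, not the PR's code:

```kotlin
// Stand-in for android.graphics.Bitmap so the recycling pattern is visible
// without Android classes.
class FakeBitmap {
    var recycled = false
        private set
    fun recycle() { recycled = true }
}

fun <T> withCroppedRegion(
    source: FakeBitmap,
    crop: (FakeBitmap) -> FakeBitmap,       // may return source itself on no-op crops
    preprocess: (FakeBitmap) -> FakeBitmap,
    body: (FakeBitmap) -> T
): T {
    var cropped: FakeBitmap? = null
    var preprocessed: FakeBitmap? = null
    try {
        cropped = crop(source)
        preprocessed = preprocess(cropped)
        return body(preprocessed)
    } finally {
        // Recycle on success, failure, and cancellation alike,
        // but never recycle the caller's source bitmap.
        if (preprocessed != null && preprocessed !== cropped) preprocessed.recycle()
        if (cropped != null && cropped !== source) cropped.recycle()
    }
}
```

Declaring the intermediates before the try and guarding the finally with identity checks is what makes the pattern safe for the crop-returns-source case the prompt calls out.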
- Around line 87-96: Validate and clamp the guide percentage inputs before
computing rects: ensure leftGuidePct and rightGuidePct are constrained to [0.0,
1.0] and that leftGuidePct <= rightGuidePct (or swap them) to avoid
negative/invalid regions; then compute width/height and build leftRect/rightRect
from the sanitized values and call ocrCroppedRegion(bitmap, leftRect, 0f) and
ocrCroppedRegion(bitmap, rightRect, rightOffsetX) using the corrected
rightOffsetX based on the clamped rightGuidePct. This change should be applied
in the RegionOcrProcessor logic surrounding the variables leftGuidePct,
rightGuidePct and the calls to ocrCroppedRegion to prevent malformed RectF
creation.
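The clamping step itself can be very small. A pure-Kotlin sketch; `sanitizeGuides` is a hypothetical helper name:

```kotlin
// Sanitize guide percentages before building margin rects: clamp both into
// [0, 1] and swap if they arrived out of order, so no negative-width region
// can be constructed downstream.
fun sanitizeGuides(leftGuidePct: Float, rightGuidePct: Float): Pair<Float, Float> {
    val left = leftGuidePct.coerceIn(0f, 1f)
    val right = rightGuidePct.coerceIn(0f, 1f)
    return if (left <= right) left to right else right to left
}
```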
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/YoloToXmlConverter.kt`:
- Around line 129-133: The fallback lookup is using raw tag text and can miss
matches; change both places where you call getTagType(tagBox.text) and
annotations[tagBox.text] to use normalizeTagText(tagBox.text) (same as
normalizedText) so lookups and getTagType use the normalized key; update the
identical fallback block later (the second occurrence around the alternate
assignment) to normalize before calling getTagType and indexing into annotations
(referencing deduplicatedTags, tagBox, normalizeTagText, getTagType, and
annotations).
- Line 329: Remove the full XML debug print in the appendSimpleView path: stop
logging the contents of the xml variable via Log.d(TAG, "appendSimpleView:
$xml"); either delete that Log.d call or replace it with a non-sensitive,
minimal message (e.g., an opaque event or size/count only) inside the
appendSimpleView function so OCR-derived text is not emitted to Logcat.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 96ee70e6-296f-4961-8223-94872394c4f4
📒 Files selected for processing (13)
cv-image-to-xml/build.gradle.kts
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/data/repository/ComputerVisionRepository.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/data/repository/ComputerVisionRepositoryImpl.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/data/source/OcrSource.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/di/ComputerVisionModule.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/DetectionMerger.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/FuzzyAttributeParser.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/RegionOcrProcessor.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/YoloToXmlConverter.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/ui/viewmodel/ComputerVisionViewModel.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/BitmapUtils.kt
cv-image-to-xml/src/test/java/org/appdevforall/codeonthego/computervision/domain/FuzzyAttributeParserTest.kt
…n YoloToXmlConverter
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt (1)
71-78: ⚠️ Potential issue | 🟠 Major
Don't ship raw OCR content to Logcat.
These Log.d calls dump recognized annotations and canvas text verbatim. Screenshots can contain user/customer content, so this should be removed, redacted, or at least gated to debug builds before release.
Also applies to: 102-110, 128-130, 147-148
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt` around lines 71 - 78, The Log.d calls in MarginAnnotationParser (specifically the computed finalAnnotationLog and canvasLogOutput that print raw OCR text from correctedCanvasDetections and annotationMap) must not output sensitive user content; change them to either remove the detailed strings or gate them behind a debug-only check (e.g., BuildConfig.DEBUG) or redact the text before logging (e.g., replace characters with a mask or log only lengths/counts and bounding boxes). Update the logging in the blocks that build finalAnnotationLog and canvasLogOutput and apply the same pattern to the other similar Log.d usages you flagged (lines 102-110, 128-130, 147-148) so no raw OCR content is written in production logs.
🧹 Nitpick comments (3)
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/FuzzyAttributeParser.kt (3)
7-13: Consider documenting fuzzy threshold rationale.
The threshold values (50-65) are quite low and may occasionally produce false positives. This is likely intentional for OCR error tolerance, but a brief comment explaining the trade-off would help maintainability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/FuzzyAttributeParser.kt` around lines 7 - 13, Add a brief comment above FUZZY_VALUE_THRESHOLD and fuzzyKeyThreshold explaining why the numeric thresholds (50,55,60,65) were chosen—note that they are intentionally low to tolerate OCR errors, the trade-off of increased false positives, and guidance on when to raise/lower values; also mention that these values are configurable/tunable (or suggest externalizing to config) so future maintainers understand rationale and how to adjust behavior of FUZZY_VALUE_THRESHOLD and fuzzyKeyThreshold().
299-319: Loop structure is intentional but confusing.
The static analysis warning about "unconditional jump" is a false positive. The break statements are correct: line 313 breaks from the innermost for loop, and line 315 breaks from the middle for loop when found is true. However, the triple-nested loop with multiple break conditions is hard to follow.
Consider extracting the inner search logic into a helper function that returns a match result, which would clarify intent and eliminate the found flag pattern.
♻️ Optional refactor to improve readability
+    private data class TrailingMatch(val keyStart: Int, val attr: String, val value: String)
+
+    private fun findTrailingAttribute(words: List<String>, tag: String): TrailingMatch? {
+        for (keyStart in words.size - 2 downTo 1) {
+            for (keyLen in minOf(3, words.size - keyStart - 1) downTo 1) {
+                val candidateKey = words.subList(keyStart, keyStart + keyLen).joinToString("_")
+                val matched = fuzzyMatchKey(candidateKey) ?: continue
+                val trailingValue = words.subList(keyStart + keyLen, words.size).joinToString(" ")
+                val cleanedValue = cleanValue(trailingValue, matched)
+                if (cleanedValue.isEmpty()) continue
+                val (attr, finalValue) = resolveXmlAttribute(matched, cleanedValue, tag)
+                return TrailingMatch(keyStart, attr, finalValue)
+            }
+        }
+        return null
+    }
+
     private fun extractTrailingAttributes(value: String, tag: String): Pair<String, Map<String, String>> {
         val attrs = mutableMapOf<String, String>()
         var remaining = value
         while (true) {
             val words = remaining.split(Regex("\\s+"))
             if (words.size < 2) break
-            var found = false
-            for (keyStart in words.size - 2 downTo 1) {
-                for (keyLen in minOf(3, words.size - keyStart - 1) downTo 1) {
-                    val candidateKey = words.subList(keyStart, keyStart + keyLen).joinToString("_")
-                    val matched = fuzzyMatchKey(candidateKey) ?: continue
-
-                    val trailingValue = words.subList(keyStart + keyLen, words.size).joinToString(" ")
-                    val cleanedValue = cleanValue(trailingValue, matched)
-                    if (cleanedValue.isEmpty()) continue
-
-                    val (attr, finalValue) = resolveXmlAttribute(matched, cleanedValue, tag)
-                    attrs[attr] = finalValue
-                    remaining = words.subList(0, keyStart).joinToString(" ")
-                    found = true
-                    break
-                }
-                if (found) break
-            }
-
-            if (!found) break
+            val match = findTrailingAttribute(words, tag) ?: break
+            attrs[match.attr] = match.value
+            remaining = words.subList(0, match.keyStart).joinToString(" ")
         }
Verify each finding against the current code and only fix it if needed. In `@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/FuzzyAttributeParser.kt` around lines 299 - 319, The triple-nested search loop in FuzzyAttributeParser.kt is confusing and uses the found flag to control breaks; refactor the inner search (the logic that iterates keyStart and keyLen, calls fuzzyMatchKey(candidateKey), computes trailingValue/cleanedValue, and calls resolveXmlAttribute) into a helper function (e.g., findAttributeMatch(words: List<String>, tag: String): Pair<String, String>? or a small data class) that returns the matched attribute and final value (or null) so the outer loop can simply call this helper and, if non-null, set attrs[attr]=finalValue and remaining accordingly; this removes the found flag and nested breaks and keeps usage of fuzzyMatchKey, cleanValue, resolveXmlAttribute, attrs and remaining intact.
364-376: Consider increasing fuzzy match thresholds or adding validation gates for low-confidence attribute matches.
Current thresholds (50–65) fall well below industry standard for high-confidence fuzzy matching (≥90). While fuzzyMatchKey() returns results based on these thresholds, those results flow directly through parseAttribute() into XML attributes without re-validation. Garbled OCR text like "layouT_wldth" could plausibly match "layout_width" at a score just above the 50–65 range, silently producing incorrect attributes in output.
Either raise thresholds toward 80–90 for auto-acceptance, or add validation/logging for matches below 80 to catch potential errors during debugging.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/FuzzyAttributeParser.kt` around lines 364 - 376, fuzzyMatchKey currently accepts low-confidence matches (via FuzzySearch.extractOne and fuzzyKeyThreshold), allowing potentially incorrect attributes to pass into parseAttribute; update fuzzyMatchKey (and its use in parseAttribute) to require a higher acceptance threshold (e.g., >=80–90) or add a secondary validation gate: after FuzzySearch.extractOne(normalizeOcrKey(rawKey)...) check if result.score >= 80 (or fuzzyKeyThreshold but clamped to min 80) before returning AttributeKey.findByAlias(result.string), otherwise log a warning including rawKey, normalizedKey and result.score and return null (or mark for manual review) so low-confidence matches aren’t auto-accepted without visibility.
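A self-contained sketch of the gating idea, using a hand-rolled Levenshtein ratio in place of the project's FuzzySearch dependency. The alias list and helper names are assumptions for illustration:

```kotlin
// Classic dynamic-programming edit distance.
fun levenshtein(a: String, b: String): Int {
    val dp = Array(a.length + 1) { IntArray(b.length + 1) }
    for (i in 0..a.length) dp[i][0] = i
    for (j in 0..b.length) dp[0][j] = j
    for (i in 1..a.length) for (j in 1..b.length) {
        val cost = if (a[i - 1] == b[j - 1]) 0 else 1
        dp[i][j] = minOf(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost)
    }
    return dp[a.length][b.length]
}

// Similarity in 0..100, roughly comparable to a fuzzywuzzy-style ratio.
fun ratio(a: String, b: String): Int {
    if (a.isEmpty() && b.isEmpty()) return 100
    val dist = levenshtein(a.lowercase(), b.lowercase())
    return (100.0 * (1.0 - dist.toDouble() / maxOf(a.length, b.length))).toInt()
}

val KNOWN_KEYS = listOf("layout_width", "layout_height", "text_size") // illustrative alias list

// Accept a fuzzy key match only above a high score; reject low-confidence matches.
fun fuzzyMatchKeyGated(rawKey: String, minScore: Int = 80): String? {
    val best = KNOWN_KEYS.maxByOrNull { ratio(rawKey, it) } ?: return null
    return if (ratio(rawKey, best) >= minScore) best else null
}
```

With the 80 gate, the garbled "layouT_wldth" example (one substitution away, ratio 91) still matches, while unrelated words fall through to null instead of being silently accepted.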
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/di/ComputerVisionModule.kt`:
- Around line 17-19: The shared TextRecognizer in OcrSource is being used
concurrently by RegionOcrProcessor.runWidgetOcr() which launches parallel async
jobs (widgetOcr, marginOcr, fullImageOcr); wrap all calls that invoke
OcrSource.recognizeText()/TextRecognizer.process() behind a Mutex to serialize
access, or change OcrSource to provide a factory/getPerWorkerTextRecognizer()
and have RegionOcrProcessor obtain a dedicated TextRecognizer per worker to
allow safe parallelism—update OcrSource and RegionOcrProcessor.runWidgetOcr()
accordingly and ensure recognizeText() calls reference the chosen serialized or
per-worker instance.
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/DetectionMerger.kt`:
- Around line 17-20: enrichedComponents already contain widget-level OCR but
orphanText later re-adds full-image Text.TextBlock entries, creating duplicate
detections; modify the logic in DetectionMerger (where usedTextBlocks,
enrichedComponents, orphanText, finalDetections, and remainingYoloDetections are
handled) so that when building orphanText you filter out any Text.TextBlock
instances present in usedTextBlocks or contained in enrichedComponents before
adding them to finalDetections, ensuring orphanText only includes truly
unclaimed text blocks and preventing duplicate "text" detections for the same
widget-level OCR.
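The claimed-block filtering can be sketched with simplified stand-ins for the ML Kit types. `TextBlock`, `Detection`, and `mergeDetections` here are illustrative, not the PR's classes:

```kotlin
data class TextBlock(val text: String)
data class Detection(val label: String, val text: String?)

// Only full-image blocks not claimed during widget enrichment become
// standalone "text" detections, preventing duplicates.
fun mergeDetections(
    enriched: List<Detection>,
    usedTextBlocks: Set<TextBlock>,
    fullImageBlocks: List<TextBlock>
): List<Detection> {
    val orphanText = fullImageBlocks
        .filter { it !in usedTextBlocks }
        .map { Detection(label = "text", text = it.text) }
    return enriched + orphanText
}
```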
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/FuzzyAttributeParser.kt`:
- Around line 431-438: The cleanId function is stripping a trailing underscore
plus single lowercase letter via .replace(Regex("_[a-z]$"), "") which
inadvertently truncates valid IDs like "button_a"; update cleanId to stop
removing that pattern (or make it optional/opt-in) by removing or guarding the
Regex("_[a-z]$") replacement, or add a flag/comment and only apply that cleanup
when an explicit OCR-noise mode is enabled; keep references to the cleanId
function and the Regex("_[a-z]$") so the change is easy to locate.
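A tiny demonstration of the truncation risk and a guarded alternative; both function names are hypothetical:

```kotlin
// Mirrors the flagged behavior: the OCR-noise cleanup Regex("_[a-z]$")
// also eats legitimate ids that end in "_<letter>".
fun cleanIdAggressive(id: String): String = id.replace(Regex("_[a-z]$"), "")

// Guarded variant: only strip the trailing "_<letter>" when an explicit
// OCR-noise mode is enabled.
fun cleanIdSafe(id: String, stripOcrNoise: Boolean = false): String =
    if (stripOcrNoise) id.replace(Regex("_[a-z]$"), "") else id
```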
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt`:
- Around line 134-138: The pass-2 filter in MarginAnnotationParser.kt is too
strict and drops valid short annotations because of the check ".filter { (_,
parsed) -> parsed.annotationText.length >= 5 }"; change that predicate to allow
short but meaningful tokens (e.g. use "parsed.annotationText.trim().length >= 3"
or accept any non-blank string via "parsed.annotationText.isNotBlank()") so
values like "red", "8dp" or "gone" are retained for fuzzy parsing; update the
filter on the remainingBlocks construction that references parsedBlocks,
matchedBlockIndices, and parsed.annotationText accordingly.
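The relaxed predicate can be sketched as follows; the `Parsed` type and helper name are illustrative:

```kotlin
data class Parsed(val annotationText: String)

// Keep short but meaningful annotation values ("red", "8dp", "gone")
// instead of requiring length >= 5.
fun keepForFuzzyParsing(parsed: Parsed): Boolean =
    parsed.annotationText.isNotBlank() // was: annotationText.length >= 5
```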
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/ui/viewmodel/ComputerVisionViewModel.kt`:
- Around line 228-236: The current filter keeps all YOLO detections regardless
of ROI because it uses "detection.isYolo || ...", so change the predicate to
only include YOLO boxes that fall within the leftBound..rightBound while still
keeping non-YOLO detections; update the lambda on mergedDetections (where
canvasOnlyMerged is built) so it uses a condition like "not detection.isYolo OR
detection.boundingBox.centerX() in leftBound..rightBound" (referencing
detection.isYolo, detection.boundingBox.centerX(), leftBound, rightBound,
mergedDetections, canvasOnlyMerged, allDetections and
MarginAnnotationParser.parse) to prevent out-of-bounds YOLO false positives from
being passed into MarginAnnotationParser.parse().
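The corrected predicate can be sketched without Android types; `Box` stands in for RectF and the names are illustrative:

```kotlin
data class Box(val left: Float, val right: Float) {
    fun centerX() = (left + right) / 2f
}
data class Det(val isYolo: Boolean, val boundingBox: Box)

// Non-YOLO detections pass through; YOLO boxes must have their horizontal
// center inside the canvas guides.
fun filterToCanvas(detections: List<Det>, leftBound: Float, rightBound: Float) =
    detections.filter { d ->
        !d.isYolo || d.boundingBox.centerX() in leftBound..rightBound
    }
```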
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/BitmapUtils.kt`:
- Around line 30-38: cropRegion currently returns the original Bitmap when the
computed crop width/height are non-positive, causing callers like
RegionOcrProcessor.runWidgetOcr to OCR the whole image; change cropRegion
(function name: cropRegion) to return null on invalid crop (w <= 0 || h <= 0)
instead of the full bitmap, and update callers such as
RegionOcrProcessor.runWidgetOcr to handle the nullable result (e.g., skip or
return the component when cropRegion returns null), referencing the bounding box
parameter (component.boundingBox) and padding argument (componentPadding) so
invalid ROIs are skipped rather than processed as the full image.
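A pure-logic sketch of the nullable-crop contract; `IntRect` stands in for Android's Rect, and the function is illustrative rather than the PR's BitmapUtils.cropRegion:

```kotlin
data class IntRect(val left: Int, val top: Int, val right: Int, val bottom: Int)

// Returns null for degenerate ROIs instead of silently falling back to the
// full image, so callers like runWidgetOcr can skip the region.
fun cropRegion(imageWidth: Int, imageHeight: Int, box: IntRect, padding: Int): IntRect? {
    val l = (box.left - padding).coerceAtLeast(0)
    val t = (box.top - padding).coerceAtLeast(0)
    val r = (box.right + padding).coerceAtMost(imageWidth)
    val b = (box.bottom + padding).coerceAtMost(imageHeight)
    if (r - l <= 0 || b - t <= 0) return null // invalid/empty ROI: signal the caller to skip
    return IntRect(l, t, r, b)
}
```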
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 9bf00bfc-4d52-4789-bb0f-5c4bfda2b356
📒 Files selected for processing (13)
cv-image-to-xml/build.gradle.kts
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/data/repository/ComputerVisionRepository.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/data/repository/ComputerVisionRepositoryImpl.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/data/source/OcrSource.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/di/ComputerVisionModule.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/DetectionMerger.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/FuzzyAttributeParser.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/RegionOcrProcessor.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/YoloToXmlConverter.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/ui/viewmodel/ComputerVisionViewModel.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/BitmapUtils.kt
cv-image-to-xml/src/test/java/org/appdevforall/codeonthego/computervision/domain/FuzzyAttributeParserTest.kt
No description provided.