
ADFA-3108: Improve CV-to-XML accuracy with fuzzy search and OCR refinement #1047

Merged
Daniel-ADFA merged 19 commits into stage from ADFA-3108-cv-fuzzy-search-experimental
Mar 6, 2026

Conversation

@Daniel-ADFA
Contributor

No description provided.

@coderabbitai
Contributor

coderabbitai bot commented Mar 5, 2026

📝 Walkthrough

Release Notes: CV-to-XML Accuracy Improvements (ADFA-3108)

Key Features

  • Fuzzy Search Integration: Added FuzzyAttributeParser with configurable fuzzy matching thresholds (50-65 based on key length) to handle OCR-derived attribute annotations with tolerance for character recognition errors
  • Region-Based OCR Processing: Introduced RegionOcrProcessor that orchestrates three parallel OCR paths:
    • Widget OCR for interactive UI components
    • Margin OCR for side regions defined by left/right guide percentages
    • Full-image OCR for comprehensive text coverage
  • Enhanced Margin Annotation Parsing: Refactored MarginAnnotationParser with spatial clustering and two-pass resolution to separate detections into left, right, and canvas regions
  • Improved OCR Preprocessing: Replaced simple thresholding with advanced adaptive preprocessing pipeline (grayscale conversion, normalization, Gaussian blur, adaptive thresholding, median filtering)
  • OCR Noise Reduction: Added normalization for common OCR digit mistakes (l/I/! → 1; o/O → 0) and robust tag extraction with regex-based parsing
  • Attribute Parsing Enhancements: Comprehensive support for dimension, color, drawable, ID, and numeric value types with Android-specific shortcuts (layout_width/height, padding/margin, text properties)
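The digit normalization mentioned above can be sketched as follows. This is an illustrative standalone helper, not the project's actual code: the function name and the "only substitute next to a real digit" heuristic are assumptions added so that ordinary words are left untouched.

```kotlin
// Sketch of the OCR digit-confusion normalization (l/I/! -> 1, o/O -> 0).
// Hypothetical helper; the real logic lives in YoloToXmlConverter.
fun normalizeOcrDigits(raw: String): String {
    val sb = StringBuilder(raw.length)
    for ((i, ch) in raw.withIndex()) {
        // Only swap a confusable character when it sits next to a real digit,
        // so words like "Hello" keep their letters.
        val nearDigit = (i > 0 && raw[i - 1].isDigit()) ||
            (i + 1 < raw.length && raw[i + 1].isDigit())
        sb.append(
            when {
                nearDigit && (ch == 'l' || ch == 'I' || ch == '!') -> '1'
                nearDigit && (ch == 'o' || ch == 'O') -> '0'
                else -> ch
            }
        )
    }
    return sb.toString()
}
```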

API Changes

  • Removed Methods:
    • runOcrRecognition(bitmap) - replaced with region-based approach
    • preprocessBitmapForOcr(bitmap) - moved to BitmapUtils with new signature
  • New Methods:
    • runRegionOcr(bitmap, yoloDetections, leftGuidePct, rightGuidePct) in ComputerVisionRepository
    • FuzzyAttributeParser.parse(annotation, tag) for attribute parsing
    • BitmapUtils.cropRegion(bitmap, rect, padding) for region cropping
  • Updated Method Signatures:
    • mergeDetections() now takes enrichedComponents, remainingDetections, fullImageTextBlocks instead of yoloDetections and textBlocks
    • BitmapUtils.preprocessForOcr() signature changed (blockSize, c parameters replace windowSize, threshold)
  • Dependency Injection: ComputerVisionRepositoryImpl constructor now receives RegionOcrProcessor instead of OcrSource

Risks & Best Practices Concerns

  • ⚠️ Limited Integration Testing: Only one unit test suite (FuzzyAttributeParserTest, 537 lines) covers the new functionality; there are no integration tests for the complete region-based OCR workflow with ComputerVisionViewModel
  • ⚠️ Hardcoded Fuzzy Thresholds: Matching thresholds (50, 55, 60, 65) are hardcoded without external configuration; may require tuning based on real-world OCR accuracy
  • ⚠️ Parallel OCR Execution: Three simultaneous OCR operations on different bitmap regions could impact memory usage and performance on lower-end devices; no resource constraints or thread pool configuration visible
  • ⚠️ Complex Margin Parsing Logic: Multi-step spatial clustering and two-pass resolution introduce complexity that may require careful tuning of block clustering thresholds and tag boundary detection
  • ⚠️ Breaking API Changes: Significant refactoring of public repository methods could affect consumers; ComputerVisionViewModel had to be updated accordingly but other callers may break
  • ⚠️ Removed preprocessBitmapForOcr: Deletion of original preprocessing method without deprecation period; direct callers will experience build failures
  • ⚠️ No Validation for Guide Percentages: leftGuidePct and rightGuidePct parameters have no documented constraints or validation for edge cases (e.g., leftGuidePct > rightGuidePct)

Dependencies

  • Added: libs.composite.fuzzysearch (provides FuzzySearch for fuzzy string matching)

Files Modified: 13

  • 1 new test file (537 lines)
  • 2 new domain classes (RegionOcrProcessor, FuzzyAttributeParser)
  • 1 enhanced utility module (BitmapUtils)
  • 9 existing files refactored for region-based workflow integration

Walkthrough

This PR refactors the computer vision OCR pipeline from simple full-image processing to region-aware analysis. It introduces RegionOcrProcessor to orchestrate parallel OCR streams: widget text extraction, margin annotation parsing, and full-image analysis. Supporting components include FuzzyAttributeParser for XML attribute parsing, enhanced bitmap preprocessing with adaptive thresholding, and improved margin annotation detection with spatial clustering. Repository and DI wiring are updated to integrate the new pipeline components.

Changes

  • Region-Based OCR Infrastructure (build.gradle.kts, ComputerVisionRepository.kt, ComputerVisionRepositoryImpl.kt, RegionOcrProcessor.kt, ComputerVisionModule.kt): Adds the fuzzysearch dependency; replaces the simple OCR invocation with a region-aware runRegionOcr accepting YOLO detections and guide percentages; introduces the RegionOcrProcessor singleton for parallel widget/margin/full-image OCR processing; updates DI to wire RegionOcrProcessor instead of OcrSource.
  • Detection Processing & Merging (DetectionMerger.kt, MarginAnnotationParser.kt): DetectionMerger is simplified to accept enriched components, remaining detections, and full-image text blocks, removing the prior IoU-based claiming logic; MarginAnnotationParser is refactored with spatial clustering, multi-pass tag resolution, and OCR digit normalization.
  • Attribute & Tag Parsing (FuzzyAttributeParser.kt, FuzzyAttributeParserTest.kt): Introduces a comprehensive FuzzyAttributeParser singleton with fuzzy key matching against the AttributeKey enum, a value resolution pipeline supporting dimensions/colors/drawables/IDs, special handling for the Button background-to-tint mapping, and extensive test coverage for edge cases and garbled OCR input.
  • OCR Enhancement & Utilities (YoloToXmlConverter.kt, BitmapUtils.kt, OcrSource.kt): YoloToXmlConverter adds OCR digit normalization (l/I/! → 1; o/O → 0), replaces static tag filtering with fuzzy deduplication, and uses FuzzyAttributeParser for margin attribute extraction; BitmapUtils replaces simple thresholding with an adaptive thresholding pipeline (grayscale, Gaussian blur, adaptive threshold, median filter) and adds a cropRegion utility; OcrSource adds a Log import.
  • ViewModel Integration (ComputerVisionViewModel.kt): Updates to use runRegionOcr with guide percentages; processes enriched detections, remaining detections, and margin detections; applies region-bound filtering (leftBound/rightBound) to merged detections before combining with margin results.

Sequence Diagram

sequenceDiagram
    participant ViewModel
    participant RegionOcrProcessor
    participant OcrSource
    participant Bitmap as Image Processing
    
    ViewModel->>RegionOcrProcessor: process(bitmap, yoloDetections, leftGuidePct, rightGuidePct)
    
    par Widget OCR Path
        RegionOcrProcessor->>Bitmap: crop widget regions with padding
        Bitmap-->>RegionOcrProcessor: cropped bitmaps
        RegionOcrProcessor->>Bitmap: preprocessForOcr
        Bitmap-->>RegionOcrProcessor: preprocessed bitmap
        RegionOcrProcessor->>OcrSource: recognizeText on widgets
        OcrSource-->>RegionOcrProcessor: widget text blocks
        RegionOcrProcessor->>RegionOcrProcessor: enrich YOLO detections with text
    and Margin OCR Path
        RegionOcrProcessor->>Bitmap: crop left/right margin strips
        Bitmap-->>RegionOcrProcessor: margin bitmaps
        RegionOcrProcessor->>Bitmap: preprocessForOcr
        Bitmap-->>RegionOcrProcessor: preprocessed bitmap
        RegionOcrProcessor->>OcrSource: recognizeText on margins
        OcrSource-->>RegionOcrProcessor: margin text blocks
        RegionOcrProcessor->>RegionOcrProcessor: convert to DetectionResults
    and Full-Image OCR Path
        RegionOcrProcessor->>OcrSource: recognizeText on full bitmap
        OcrSource-->>RegionOcrProcessor: full-image text blocks
    end
    
    RegionOcrProcessor-->>ViewModel: RegionOcrResult(enriched, remaining, margin, fullImage)
    ViewModel->>ViewModel: mergeDetections + region filtering
    ViewModel->>ViewModel: aggregate canvas + margin detections

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested reviewers

  • jatezzz

Poem

🐰 A hopping refactor, region by region we go,
Widget texts extracted, margins aglow,
Fuzzy attributes parsed with care divine,
From OCR chaos to XML that's fine! 🎨✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Description check — ❓ Inconclusive: No pull request description was provided by the author, making it impossible to assess relevance to the changeset. Resolution: add a pull request description explaining the changes, objectives, and testing performed for this PR.

✅ Passed checks (1 passed)

  • Title check — ✅ Passed: The title clearly references ADFA-3108 and accurately describes the main changes: introducing fuzzy search (evidenced by FuzzyAttributeParser) and OCR refinement (RegionOcrProcessor) to improve CV-to-XML accuracy.


@Daniel-ADFA
Contributor Author

@coderabbitai

@coderabbitai
Contributor

coderabbitai bot commented Mar 5, 2026

@Daniel-ADFA Sure! I'll review the pull request now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt (1)

71-79: ⚠️ Potential issue | 🟠 Major

Avoid logging raw OCR content in production paths.

These debug logs include annotation/canvas text directly, which can leak user content into log storage.

🛡️ Suggested hardening
-Log.d(TAG, "Processed Margin Annotations: {$finalAnnotationLog}")
+if (Log.isLoggable(TAG, Log.DEBUG)) {
+    Log.d(TAG, "Processed Margin Annotations count=${annotationMap.size}")
+}
@@
-Log.d(TAG, "Parsed Canvas Content (Corrected): $canvasLogOutput")
+if (Log.isLoggable(TAG, Log.DEBUG)) {
+    Log.d(TAG, "Parsed Canvas Content count=${correctedCanvasDetections.size}")
+}
@@
-Log.d(TAG, "Block $i: tag=${result?.first ?: "none"}, ${block.size} lines, text='${annotationText.take(40)}'")
+if (Log.isLoggable(TAG, Log.DEBUG)) {
+    Log.d(TAG, "Block $i: tag=${result?.first ?: "none"}, lines=${block.size}")
+}

Also applies to: 102-111, 128-130, 147-148

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt`
around lines 71 - 79, The logs in MarginAnnotationParser that build
finalAnnotationLog and canvasLogOutput (and similar logs around lines with
correctedCanvasDetections, annotationMap, and parsed canvas content) currently
include raw OCR text and must be redacted before logging; update the code to
avoid printing user content by replacing raw text with a sanitized placeholder
or hashed token (e.g., hash or "<redacted>") and only log non-sensitive metadata
(coordinates, sizes, keys) or conditionally log full content under a debug-only
flag (e.g., BuildConfig.DEBUG). Ensure all occurrences (finalAnnotationLog,
canvasLogOutput and the other listed log sites) follow the same
redaction/conditional pattern so no raw OCR strings are written to production
logs.
🧹 Nitpick comments (3)
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/BitmapUtils.kt (2)

119-134: Border pixels are unfiltered.

The median filter skips the 1-pixel border (iterating y in 1 until height - 1). Border pixels retain their original binary values from the adaptive threshold step. This is typically acceptable for OCR since text content rarely touches image edges, but worth noting if edge artifacts become problematic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/BitmapUtils.kt`
around lines 119 - 134, The medianFilter currently skips the 1-pixel border
(function medianFilter using pixels, copy and window) so border pixels retain
unfiltered values; change the loop to cover x in 0 until width and y in 0 until
height and when building the 3x3 window handle out-of-bounds neighbors by
clamping coordinates (e.g., nx = max(0, min(width-1, x+dx)), ny = max(0,
min(height-1, y+dy))) or by filling missing entries with the center pixel before
sorting, then sort window and assign pixels[y*width + x] = window[4]; this
preserves current logic for interior pixels while applying a consistent median
to edges.
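The clamped-border median filter the prompt above describes can be sketched as a standalone function over a flat pixel array. This is an illustrative sketch, not the project's BitmapUtils code: out-of-bounds neighbors are clamped to the nearest edge pixel so the full image, border included, receives a consistent 3x3 median.

```kotlin
// Sketch of a 3x3 median filter that covers border pixels by clamping
// neighbor coordinates, as the review comment suggests. Hypothetical helper.
fun medianFilterClamped(pixels: IntArray, width: Int, height: Int): IntArray {
    val out = IntArray(pixels.size)
    val window = IntArray(9)
    for (y in 0 until height) {
        for (x in 0 until width) {
            var k = 0
            for (dy in -1..1) {
                for (dx in -1..1) {
                    // Clamp out-of-bounds neighbors to the nearest edge pixel.
                    val nx = (x + dx).coerceIn(0, width - 1)
                    val ny = (y + dy).coerceIn(0, height - 1)
                    window[k++] = pixels[ny * width + nx]
                }
            }
            window.sort()
            out[y * width + x] = window[4] // median of the 3x3 window
        }
    }
    return out
}
```

A single salt-noise pixel in a 3x3 all-zero image is removed at every position, including the corners, which the original interior-only loop would have left untouched at the border.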

11-11: Consider validating blockSize parameter.

The Gaussian blur algorithm requires blockSize to be a positive odd number for symmetric kernel behavior. While the default value (31) is correct and callers currently use defaults, invalid values (0, negative, or even) could cause unexpected behavior or incorrect results.

🛡️ Suggested validation
 fun preprocessForOcr(bitmap: Bitmap, blockSize: Int = 31, c: Int = 15): Bitmap {
+    require(blockSize > 0 && blockSize % 2 == 1) { "blockSize must be a positive odd number" }
     val width = bitmap.width
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/BitmapUtils.kt`
at line 11, In preprocessForOcr ensure the blockSize parameter is validated at
the start: check that blockSize is > 0 and odd, and if not throw an
IllegalArgumentException with a clear message referencing the parameter (e.g.,
"blockSize must be a positive odd number"); update the method signature of
preprocessForOcr to include this guard so callers cannot pass 0, negative, or
even values that would break the Gaussian blur kernel.
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/ui/viewmodel/ComputerVisionViewModel.kt (1)

212-215: Track detection completion analytics on OCR/merge failure paths.

These exits call error handling but skip trackDetectionCompleted(success = false, ...), which makes failure metrics incomplete outside YOLO failures.

📉 Suggested patch
+            fun trackFailureAndHandle(exception: Throwable?) {
+                CvAnalyticsUtil.trackDetectionCompleted(
+                    success = false,
+                    detectionCount = 0,
+                    durationMs = System.currentTimeMillis() - startTime
+                )
+                handleDetectionError(exception)
+            }
+
             val regionOcrResult = repository.runRegionOcr(
                 bitmap, yoloDetections, state.leftGuidePct, state.rightGuidePct
             )
             if (regionOcrResult.isFailure) {
-                handleDetectionError(regionOcrResult.exceptionOrNull())
+                trackFailureAndHandle(regionOcrResult.exceptionOrNull())
                 return@launch
             }
             val ocrResult = regionOcrResult.getOrThrow()
@@
-                .onFailure { handleDetectionError(it) }
+                .onFailure { trackFailureAndHandle(it) }

Also applies to: 255-255

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/ui/viewmodel/ComputerVisionViewModel.kt`
around lines 212 - 215, When an OCR/merge result fails you currently call
handleDetectionError(...) and return but never record the failure; before
returning from the region OCR failure branch (where regionOcrResult.isFailure is
checked) and the merge failure branch (where mergeOcrResult.isFailure is
checked) call trackDetectionCompleted(success = false, error =
regionOcrResult.exceptionOrNull() / mergeOcrResult.exceptionOrNull()) (matching
the same metadata parameters used in the success path) and then proceed to
handleDetectionError(...)/return so failure metrics are tracked consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt`:
- Around line 14-31: The extraction fails on lowercase OCR output; update
TAG_REGEX and TAG_EXTRACT_REGEX to be case-insensitive (e.g., use Regex(...,
RegexOption.IGNORE_CASE)) or alternatively uppercase the trimmed input before
matching in extractTag, then normalize the matched prefix and digits to
uppercase (handle the "8" -> "B" case after uppercasing) and build the tag in
the canonical uppercase form before calling isTag; also ensure isTag uses the
same case-insensitive check or receives the normalized tag.
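The case-insensitive tag extraction described above might look like the following sketch. The tag format (a letter prefix plus digits, e.g. "B1") and the "8" → "B" confusion handling are assumptions inferred from the prompt; the real regexes live in MarginAnnotationParser.

```kotlin
// Sketch of case-insensitive tag extraction with uppercase normalization
// and the "8" -> "B" OCR-confusion fix. Names and tag shape are assumptions.
val TAG_EXTRACT = Regex("""\b([A-Za-z8])(\d{1,2})\b""")

fun extractTag(line: String): String? {
    val m = TAG_EXTRACT.find(line.trim()) ?: return null
    // Normalize to the canonical uppercase form before any tag lookup.
    var prefix = m.groupValues[1].uppercase()
    if (prefix == "8") prefix = "B" // common OCR confusion: 8 read for B
    return prefix + m.groupValues[2]
}
```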

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/RegionOcrProcessor.kt`:
- Around line 65-77: The async block that crops and preprocesses bitmaps (using
BitmapUtils.cropRegion and BitmapUtils.preprocessForOcr) must ensure
deterministic recycling on all paths; wrap the body in try/finally, declare crop
and preprocessed variables before the try, assign them inside try, and in
finally call recycle() safely (check for null and for crop !== bitmap) so both
normal, exception, and cancellation paths free native memory; apply the same
try/finally pattern to the other similar block around
ocrSource.recognizeText/component.copy mentioned at lines 106-109.
- Around line 87-96: Validate and clamp the guide percentage inputs before
computing rects: ensure leftGuidePct and rightGuidePct are constrained to [0.0,
1.0] and that leftGuidePct <= rightGuidePct (or swap them) to avoid
negative/invalid regions; then compute width/height and build leftRect/rightRect
from the sanitized values and call ocrCroppedRegion(bitmap, leftRect, 0f) and
ocrCroppedRegion(bitmap, rightRect, rightOffsetX) using the corrected
rightOffsetX based on the clamped rightGuidePct. This change should be applied
in the RegionOcrProcessor logic surrounding the variables leftGuidePct,
rightGuidePct and the calls to ocrCroppedRegion to prevent malformed RectF
creation.
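The guide-percentage sanitization the prompt above calls for reduces to a few lines of pure logic; a sketch under the assumption that swapped guides should be reordered rather than rejected (the RectF construction itself is omitted):

```kotlin
// Sketch of clamping and ordering the guide percentages before building
// margin rects, as suggested above. Hypothetical helper name.
fun sanitizeGuides(leftGuidePct: Float, rightGuidePct: Float): Pair<Float, Float> {
    val l = leftGuidePct.coerceIn(0f, 1f)
    val r = rightGuidePct.coerceIn(0f, 1f)
    // Swap if inverted so left <= right always holds and no negative-width
    // region can be produced downstream.
    return if (l <= r) l to r else r to l
}
```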

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/YoloToXmlConverter.kt`:
- Around line 129-133: The fallback lookup is using raw tag text and can miss
matches; change both places where you call getTagType(tagBox.text) and
annotations[tagBox.text] to use normalizeTagText(tagBox.text) (same as
normalizedText) so lookups and getTagType use the normalized key; update the
identical fallback block later (the second occurrence around the alternate
assignment) to normalize before calling getTagType and indexing into annotations
(referencing deduplicatedTags, tagBox, normalizeTagText, getTagType, and
annotations).
- Line 329: Remove the full XML debug print in the appendSimpleView path: stop
logging the contents of the xml variable via Log.d(TAG, "appendSimpleView:
$xml"); either delete that Log.d call or replace it with a non-sensitive,
minimal message (e.g., an opaque event or size/count only) inside the
appendSimpleView function so OCR-derived text is not emitted to Logcat.

---

Outside diff comments:
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt`:
- Around line 71-79: The logs in MarginAnnotationParser that build
finalAnnotationLog and canvasLogOutput (and similar logs around lines with
correctedCanvasDetections, annotationMap, and parsed canvas content) currently
include raw OCR text and must be redacted before logging; update the code to
avoid printing user content by replacing raw text with a sanitized placeholder
or hashed token (e.g., hash or "<redacted>") and only log non-sensitive metadata
(coordinates, sizes, keys) or conditionally log full content under a debug-only
flag (e.g., BuildConfig.DEBUG). Ensure all occurrences (finalAnnotationLog,
canvasLogOutput and the other listed log sites) follow the same
redaction/conditional pattern so no raw OCR strings are written to production
logs.

---

Nitpick comments:
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/ui/viewmodel/ComputerVisionViewModel.kt`:
- Around line 212-215: When an OCR/merge result fails you currently call
handleDetectionError(...) and return but never record the failure; before
returning from the region OCR failure branch (where regionOcrResult.isFailure is
checked) and the merge failure branch (where mergeOcrResult.isFailure is
checked) call trackDetectionCompleted(success = false, error =
regionOcrResult.exceptionOrNull() / mergeOcrResult.exceptionOrNull()) (matching
the same metadata parameters used in the success path) and then proceed to
handleDetectionError(...)/return so failure metrics are tracked consistently.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/BitmapUtils.kt`:
- Around line 119-134: The medianFilter currently skips the 1-pixel border
(function medianFilter using pixels, copy and window) so border pixels retain
unfiltered values; change the loop to cover x in 0 until width and y in 0 until
height and when building the 3x3 window handle out-of-bounds neighbors by
clamping coordinates (e.g., nx = max(0, min(width-1, x+dx)), ny = max(0,
min(height-1, y+dy))) or by filling missing entries with the center pixel before
sorting, then sort window and assign pixels[y*width + x] = window[4]; this
preserves current logic for interior pixels while applying a consistent median
to edges.
- Line 11: In preprocessForOcr ensure the blockSize parameter is validated at
the start: check that blockSize is > 0 and odd, and if not throw an
IllegalArgumentException with a clear message referencing the parameter (e.g.,
"blockSize must be a positive odd number"); update the method signature of
preprocessForOcr to include this guard so callers cannot pass 0, negative, or
even values that would break the Gaussian blur kernel.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 96ee70e6-296f-4961-8223-94872394c4f4

📥 Commits

Reviewing files that changed from the base of the PR and between 89ee046 and bf25e48.

📒 Files selected for processing (13)
  • cv-image-to-xml/build.gradle.kts
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/data/repository/ComputerVisionRepository.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/data/repository/ComputerVisionRepositoryImpl.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/data/source/OcrSource.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/di/ComputerVisionModule.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/DetectionMerger.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/FuzzyAttributeParser.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/RegionOcrProcessor.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/YoloToXmlConverter.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/ui/viewmodel/ComputerVisionViewModel.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/BitmapUtils.kt
  • cv-image-to-xml/src/test/java/org/appdevforall/codeonthego/computervision/domain/FuzzyAttributeParserTest.kt

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt (1)

71-78: ⚠️ Potential issue | 🟠 Major

Don't ship raw OCR content to Logcat.

These Log.d calls dump recognized annotations and canvas text verbatim. Screenshots can contain user/customer content, so this should be removed, redacted, or at least gated to debug builds before release.

Also applies to: 102-110, 128-130, 147-148

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt`
around lines 71 - 78, The Log.d calls in MarginAnnotationParser (specifically
the computed finalAnnotationLog and canvasLogOutput that print raw OCR text from
correctedCanvasDetections and annotationMap) must not output sensitive user
content; change them to either remove the detailed strings or gate them behind a
debug-only check (e.g., BuildConfig.DEBUG) or redact the text before logging
(e.g., replace characters with a mask or log only lengths/counts and bounding
boxes). Update the logging in the blocks that build finalAnnotationLog and
canvasLogOutput and apply the same pattern to the other similar Log.d usages you
flagged (lines 102-110, 128-130, 147-148) so no raw OCR content is written in
production logs.
🧹 Nitpick comments (3)
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/FuzzyAttributeParser.kt (3)

7-13: Consider documenting fuzzy threshold rationale.

The threshold values (50-65) are quite low and may occasionally produce false positives. This is likely intentional for OCR error tolerance, but a brief comment explaining the trade-off would help maintainability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/FuzzyAttributeParser.kt`
around lines 7 - 13, Add a brief comment above FUZZY_VALUE_THRESHOLD and
fuzzyKeyThreshold explaining why the numeric thresholds (50,55,60,65) were
chosen—note that they are intentionally low to tolerate OCR errors, the
trade-off of increased false positives, and guidance on when to raise/lower
values; also mention that these values are configurable/tunable (or suggest
externalizing to config) so future maintainers understand rationale and how to
adjust behavior of FUZZY_VALUE_THRESHOLD and fuzzyKeyThreshold().

299-319: Loop structure is intentional but confusing.

The static analysis warning about "unconditional jump" is a false positive. The break statements are correct: line 313 breaks from the innermost for loop, and line 315 breaks from the middle for loop when found is true. However, the triple-nested loop with multiple break conditions is hard to follow.

Consider extracting the inner search logic into a helper function that returns a match result, which would clarify intent and eliminate the found flag pattern.

♻️ Optional refactor to improve readability
+    private data class TrailingMatch(val keyStart: Int, val attr: String, val value: String)
+
+    private fun findTrailingAttribute(words: List<String>, tag: String): TrailingMatch? {
+        for (keyStart in words.size - 2 downTo 1) {
+            for (keyLen in minOf(3, words.size - keyStart - 1) downTo 1) {
+                val candidateKey = words.subList(keyStart, keyStart + keyLen).joinToString("_")
+                val matched = fuzzyMatchKey(candidateKey) ?: continue
+                val trailingValue = words.subList(keyStart + keyLen, words.size).joinToString(" ")
+                val cleanedValue = cleanValue(trailingValue, matched)
+                if (cleanedValue.isEmpty()) continue
+                val (attr, finalValue) = resolveXmlAttribute(matched, cleanedValue, tag)
+                return TrailingMatch(keyStart, attr, finalValue)
+            }
+        }
+        return null
+    }
+
     private fun extractTrailingAttributes(value: String, tag: String): Pair<String, Map<String, String>> {
         val attrs = mutableMapOf<String, String>()
         var remaining = value
 
         while (true) {
             val words = remaining.split(Regex("\\s+"))
             if (words.size < 2) break
 
-            var found = false
-            for (keyStart in words.size - 2 downTo 1) {
-                for (keyLen in minOf(3, words.size - keyStart - 1) downTo 1) {
-                    val candidateKey = words.subList(keyStart, keyStart + keyLen).joinToString("_")
-                    val matched = fuzzyMatchKey(candidateKey) ?: continue
-
-                    val trailingValue = words.subList(keyStart + keyLen, words.size).joinToString(" ")
-                    val cleanedValue = cleanValue(trailingValue, matched)
-                    if (cleanedValue.isEmpty()) continue
-
-                    val (attr, finalValue) = resolveXmlAttribute(matched, cleanedValue, tag)
-                    attrs[attr] = finalValue
-                    remaining = words.subList(0, keyStart).joinToString(" ")
-                    found = true
-                    break
-                }
-                if (found) break
-            }
-
-            if (!found) break
+            val match = findTrailingAttribute(words, tag) ?: break
+            attrs[match.attr] = match.value
+            remaining = words.subList(0, match.keyStart).joinToString(" ")
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/FuzzyAttributeParser.kt`
around lines 299 - 319, The triple-nested search loop in FuzzyAttributeParser.kt
is confusing and uses the found flag to control breaks; refactor the inner
search (the logic that iterates keyStart and keyLen, calls
fuzzyMatchKey(candidateKey), computes trailingValue/cleanedValue, and calls
resolveXmlAttribute) into a helper function (e.g., findAttributeMatch(words:
List<String>, tag: String): Pair<String, String>? or a small data class) that
returns the matched attribute and final value (or null) so the outer loop can
simply call this helper and, if non-null, set attrs[attr]=finalValue and
remaining accordingly; this removes the found flag and nested breaks and keeps
usage of fuzzyMatchKey, cleanValue, resolveXmlAttribute, attrs and remaining
intact.

364-376: Consider increasing fuzzy match thresholds or adding validation gates for low-confidence attribute matches.

Current thresholds (50–65) fall well below the industry standard for high-confidence fuzzy matching (≥90). While fuzzyMatchKey() returns results based on these thresholds, those results flow directly through parseAttribute() into XML attributes without re-validation. Garbled OCR text like "layouT_wldth" could plausibly match "layout_width" at a score only just above the 50–65 threshold, silently producing an incorrect attribute in the output.

Either raise the thresholds toward 80–90 for auto-acceptance, or add validation/logging for matches below 80 to catch potential errors during debugging.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/FuzzyAttributeParser.kt`
around lines 364 - 376, fuzzyMatchKey currently accepts low-confidence matches
(via FuzzySearch.extractOne and fuzzyKeyThreshold), allowing potentially
incorrect attributes to pass into parseAttribute; update fuzzyMatchKey (and its
use in parseAttribute) to require a higher acceptance threshold (e.g., >=80–90)
or add a secondary validation gate: after
FuzzySearch.extractOne(normalizeOcrKey(rawKey)...) check if result.score >= 80
(or fuzzyKeyThreshold but clamped to min 80) before returning
AttributeKey.findByAlias(result.string), otherwise log a warning including
rawKey, normalizedKey and result.score and return null (or mark for manual
review) so low-confidence matches aren’t auto-accepted without visibility.
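The gated acceptance described above can be sketched without the FuzzySearch dependency. `FuzzyResult` stands in for FuzzySearch's extracted result, and the 80 floor and log message are illustrative choices, not the project's values:

```kotlin
// Floor below which a fuzzy key match is rejected instead of silently accepted.
val ACCEPT_THRESHOLD = 80

// Stand-in for FuzzySearch.extractOne's result (matched key + similarity score).
data class FuzzyResult(val key: String, val score: Int)

// Accept the candidate only above the floor; below it, surface the rejection
// for debugging and return null so the caller treats the key as unmatched.
fun fuzzyMatchKeyGated(raw: String, best: FuzzyResult?): String? {
    if (best == null) return null
    if (best.score >= ACCEPT_THRESHOLD) return best.key
    println("Low-confidence fuzzy match rejected: raw='$raw' key='${best.key}' score=${best.score}")
    return null
}
```

In the real parser the floor could be `maxOf(fuzzyKeyThreshold(key), 80)` so per-key tuning can only tighten, never loosen, acceptance.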
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/di/ComputerVisionModule.kt`:
- Around line 17-19: The shared TextRecognizer in OcrSource is being used
concurrently by RegionOcrProcessor.runWidgetOcr() which launches parallel async
jobs (widgetOcr, marginOcr, fullImageOcr); wrap all calls that invoke
OcrSource.recognizeText()/TextRecognizer.process() behind a Mutex to serialize
access, or change OcrSource to provide a factory/getPerWorkerTextRecognizer()
and have RegionOcrProcessor obtain a dedicated TextRecognizer per worker to
allow safe parallelism—update OcrSource and RegionOcrProcessor.runWidgetOcr()
accordingly and ensure recognizeText() calls reference the chosen serialized or
per-worker instance.
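A thread-based toy of the serialization option. The real fix would wrap `OcrSource.recognizeText()` in a kotlinx.coroutines `Mutex` (or hand each worker its own `TextRecognizer`); here `FakeRecognizer` and its single-flight `check` are invented to model a recognizer that must not be entered concurrently:

```kotlin
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.withLock

// Models a recognizer that is unsafe under concurrent use: it trips an
// assertion if two callers overlap inside process().
class FakeRecognizer {
    @Volatile private var busy = false
    fun process(input: String): String {
        check(!busy) { "concurrent access to a single recognizer" }
        busy = true
        Thread.sleep(5) // simulate recognition latency
        busy = false
        return input.uppercase()
    }
}

// Serializes every call through one lock, the same shape a coroutine Mutex
// would give the widget/margin/full-image OCR workers.
class SerializedOcrSource(private val recognizer: FakeRecognizer) {
    private val lock = ReentrantLock()
    fun recognizeText(input: String): String = lock.withLock { recognizer.process(input) }
}
```

With the lock removed, three overlapping callers would trip the `check`; with it, the parallel widget/margin/full-image jobs complete safely, at the cost of serialized throughput.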

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/DetectionMerger.kt`:
- Around line 17-20: enrichedComponents already contain widget-level OCR but
orphanText later re-adds full-image Text.TextBlock entries, creating duplicate
detections; modify the logic in DetectionMerger (where usedTextBlocks,
enrichedComponents, orphanText, finalDetections, and remainingYoloDetections are
handled) so that when building orphanText you filter out any Text.TextBlock
instances present in usedTextBlocks or contained in enrichedComponents before
adding them to finalDetections, ensuring orphanText only includes truly
unclaimed text blocks and preventing duplicate "text" detections for the same
widget-level OCR.
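The de-duplication above can be sketched with simplified stand-ins for the real `Text.TextBlock` and enriched-component types (both hypothetical here): orphans are only the blocks neither marked used nor claimed by any component.

```kotlin
// Simplified stand-ins for ML Kit Text.TextBlock and the enriched detections.
data class TextBlock(val id: Int, val text: String)
data class Component(val label: String, val claimedText: List<TextBlock>)

// Keep only full-image blocks that no widget has already claimed, so the same
// text never appears both inside a component and as a standalone "text" detection.
fun collectOrphanText(
    fullImageBlocks: List<TextBlock>,
    usedTextBlocks: Set<TextBlock>,
    enrichedComponents: List<Component>,
): List<TextBlock> {
    val claimed = usedTextBlocks + enrichedComponents.flatMap { it.claimedText }
    return fullImageBlocks.filter { it !in claimed }
}
```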

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/FuzzyAttributeParser.kt`:
- Around line 431-438: The cleanId function is stripping a trailing underscore
plus single lowercase letter via .replace(Regex("_[a-z]$"), "") which
inadvertently truncates valid IDs like "button_a"; update cleanId to stop
removing that pattern (or make it optional/opt-in) by removing or guarding the
Regex("_[a-z]$") replacement, or add a flag/comment and only apply that cleanup
when an explicit OCR-noise mode is enabled; keep references to the cleanId
function and the Regex("_[a-z]$") so the change is easy to locate.
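The opt-in guard suggested above might look like this; the sanitizing first pass and the flag name `stripOcrNoise` are assumptions, only the `Regex("_[a-z]$")` cleanup comes from the review:

```kotlin
// cleanId sketch: the trailing "_x" strip only runs in an explicit OCR-noise
// mode, so legitimate IDs like "button_a" survive the default path.
fun cleanId(raw: String, stripOcrNoise: Boolean = false): String {
    var id = raw.trim().replace(Regex("[^A-Za-z0-9_]"), "")
    if (stripOcrNoise) {
        id = id.replace(Regex("_[a-z]$"), "") // drop a stray single-letter OCR fragment
    }
    return id
}
```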

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt`:
- Around line 134-138: The pass-2 filter in MarginAnnotationParser.kt is too
strict and drops valid short annotations because of the check ".filter { (_,
parsed) -> parsed.annotationText.length >= 5 }"; change that predicate to allow
short but meaningful tokens (e.g. use "parsed.annotationText.trim().length >= 3"
or accept any non-blank string via "parsed.annotationText.isNotBlank()") so
values like "red", "8dp" or "gone" are retained for fuzzy parsing; update the
filter on the remainingBlocks construction that references parsedBlocks,
matchedBlockIndices, and parsed.annotationText accordingly.
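The relaxed pass-2 construction can be sketched as follows, with `ParsedBlock` a hypothetical stand-in for the parser's parsed-detection type; the non-blank predicate is the looser of the two options the comment offers:

```kotlin
// Stand-in for the parser's per-detection parse result.
data class ParsedBlock(val annotationText: String)

// Pass 2: keep unmatched blocks whose text is non-blank, instead of requiring
// five characters, so short annotations like "red", "8dp" and "gone" survive.
fun remainingBlocks(
    parsedBlocks: List<Pair<Int, ParsedBlock>>,
    matchedBlockIndices: Set<Int>,
): List<Pair<Int, ParsedBlock>> =
    parsedBlocks
        .filter { (index, _) -> index !in matchedBlockIndices }
        .filter { (_, parsed) -> parsed.annotationText.isNotBlank() }
```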

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/ui/viewmodel/ComputerVisionViewModel.kt`:
- Around line 228-236: The current filter keeps all YOLO detections regardless
of ROI because it uses "detection.isYolo || ...", so change the predicate to
only include YOLO boxes that fall within the leftBound..rightBound while still
keeping non-YOLO detections; update the lambda on mergedDetections (where
canvasOnlyMerged is built) so it uses a condition like "not detection.isYolo OR
detection.boundingBox.centerX() in leftBound..rightBound" (referencing
detection.isYolo, detection.boundingBox.centerX(), leftBound, rightBound,
mergedDetections, canvasOnlyMerged, allDetections and
MarginAnnotationParser.parse) to prevent out-of-bounds YOLO false positives from
being passed into MarginAnnotationParser.parse().
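The corrected predicate boils down to one line; `Detection` with a precomputed `centerX` is a simplified stand-in for the real detection type's `boundingBox.centerX()`:

```kotlin
// Simplified detection: the real code reads detection.boundingBox.centerX().
data class Detection(val isYolo: Boolean, val centerX: Float)

// Non-YOLO detections always pass; YOLO boxes must have their horizontal
// center inside the canvas guides, so out-of-bounds false positives are dropped.
fun insideCanvas(detection: Detection, leftBound: Float, rightBound: Float): Boolean =
    !detection.isYolo || detection.centerX in leftBound..rightBound
```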

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/BitmapUtils.kt`:
- Around line 30-38: cropRegion currently returns the original Bitmap when the
computed crop width/height are non-positive, causing callers like
RegionOcrProcessor.runWidgetOcr to OCR the whole image; change cropRegion
(function name: cropRegion) to return null on invalid crop (w <= 0 || h <= 0)
instead of the full bitmap, and update callers such as
RegionOcrProcessor.runWidgetOcr to handle the nullable result (e.g., skip or
return the component when cropRegion returns null), referencing the bounding box
parameter (component.boundingBox) and padding argument (componentPadding) so
invalid ROIs are skipped rather than processed as the full image.
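The null-returning contract can be shown with plain arithmetic; `Rect` and the clamping order are assumptions standing in for the real `Bitmap`-based `cropRegion`, but the `w <= 0 || h <= 0` guard is the behavior the comment asks for:

```kotlin
// Plain-Int stand-in for android.graphics.Rect.
data class Rect(val left: Int, val top: Int, val right: Int, val bottom: Int)

// Compute the padded, image-clamped crop rect; return null when it collapses,
// so callers like runWidgetOcr skip the component instead of OCRing the full image.
fun cropRect(imageW: Int, imageH: Int, box: Rect, padding: Int): Rect? {
    val left = (box.left - padding).coerceAtLeast(0)
    val top = (box.top - padding).coerceAtLeast(0)
    val right = (box.right + padding).coerceAtMost(imageW)
    val bottom = (box.bottom + padding).coerceAtMost(imageH)
    val w = right - left
    val h = bottom - top
    if (w <= 0 || h <= 0) return null // invalid ROI: caller skips OCR for this box
    return Rect(left, top, right, bottom)
}
```

A caller would then write `val roi = cropRect(...) ?: return component` rather than silently processing the whole bitmap.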

---

Outside diff comments:
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt`:
- Around line 71-78: The Log.d calls in MarginAnnotationParser (specifically the
computed finalAnnotationLog and canvasLogOutput that print raw OCR text from
correctedCanvasDetections and annotationMap) must not output sensitive user
content; change them to either remove the detailed strings or gate them behind a
debug-only check (e.g., BuildConfig.DEBUG) or redact the text before logging
(e.g., replace characters with a mask or log only lengths/counts and bounding
boxes). Update the logging in the blocks that build finalAnnotationLog and
canvasLogOutput and apply the same pattern to the other similar Log.d usages you
flagged (lines 102-110, 128-130, 147-148) so no raw OCR content is written in
production logs.
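One shape the redaction could take: a single formatting helper that emits raw OCR text only under a debug flag (BuildConfig.DEBUG in the real app; a plain boolean in this hypothetical sketch) and otherwise logs only counts and lengths:

```kotlin
// Produce a log-safe description of OCR output: raw text in debug builds only,
// counts and lengths otherwise, so user content never reaches production logs.
fun describeForLog(ocrTexts: List<String>, debug: Boolean): String =
    if (debug) {
        ocrTexts.joinToString(" | ")
    } else {
        "blocks=${ocrTexts.size} lens=${ocrTexts.map { it.length }}"
    }
```

Each flagged `Log.d` site would then pass its block texts through this helper instead of interpolating them directly.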

---

Nitpick comments:
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/FuzzyAttributeParser.kt`:
- Around line 7-13: Add a brief comment above FUZZY_VALUE_THRESHOLD and
fuzzyKeyThreshold explaining why the numeric thresholds (50,55,60,65) were
chosen—note that they are intentionally low to tolerate OCR errors, the
trade-off of increased false positives, and guidance on when to raise/lower
values; also mention that these values are configurable/tunable (or suggest
externalizing to config) so future maintainers understand rationale and how to
adjust behavior of FUZZY_VALUE_THRESHOLD and fuzzyKeyThreshold().

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9bf00bfc-4d52-4789-bb0f-5c4bfda2b356

📥 Commits

Reviewing files that changed from the base of the PR and between 89ee046 and a236df8.

📒 Files selected for processing (13)
  • cv-image-to-xml/build.gradle.kts
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/data/repository/ComputerVisionRepository.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/data/repository/ComputerVisionRepositoryImpl.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/data/source/OcrSource.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/di/ComputerVisionModule.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/DetectionMerger.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/FuzzyAttributeParser.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/RegionOcrProcessor.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/YoloToXmlConverter.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/ui/viewmodel/ComputerVisionViewModel.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/BitmapUtils.kt
  • cv-image-to-xml/src/test/java/org/appdevforall/codeonthego/computervision/domain/FuzzyAttributeParserTest.kt

@Daniel-ADFA Daniel-ADFA merged commit 8abdd4c into stage Mar 6, 2026
2 checks passed
@Daniel-ADFA Daniel-ADFA deleted the ADFA-3108-cv-fuzzy-search-experimental branch March 6, 2026 21:14