
RT runtime cutover, generic purge strategy, and tracked-pieces rebuild#129

Open
mneuhaus wants to merge 222 commits into main from sorthive

Conversation


@mneuhaus mneuhaus commented Apr 23, 2026

Summary

This is the current sorthive branch delta against main. It carries the runtime rebuild from shim-heavy legacy wiring onto the new RT runtime, a generic cross-channel purge architecture, a fully persisted tracked-pieces view with per-segment debug crops, a broad service-extraction sweep, the Hive integration with Gemini-backed condition labeling, and a BoTSORT + OSNet ReID primary tracker that replaces motion-only association.

Architectural direction is codified in [lab/sorter-architecture-principles]({{ '/lab/sorter-architecture-principles/' | relative_url }}) — the docs tree has been realigned so the principles doc is the active guide.

Recent additions (on top of the base cutover work)

BoTSORT + OSNet ReID primary tracker

  • rt/perception/trackers/boxmot_reid.py wraps boxmot's BoTSORT with OSNet ReID and emits appearance_embedding on each Track. Weights (osnet_x0_25_msmt17, ~3 MB) auto-download into blob/reid_models/ on first use.
  • Primary tracker key is now botsort_reid (override via RT_PRIMARY_TRACKER_KEY); shadow is turntable_groundplane. Motion-only trackers stay registered as benchmarks.
  • Pure-geometry helpers factored into rt/perception/trackers/_geometry.py so the three legacy adapters share one implementation instead of three near-identical copies.
  • Intra-channel re-acquisition (including the Carousel "same-piece reappearing after occlusion" case) is now the tracker's job — previously papered over by ad-hoc logic in the transit stitcher.
  • TrackTransitRegistry still owns the cross-channel C3 → C4 hand-off but now carries the source track's embedding and gates claim() by cosine similarity (default 0.55). Catches the documented failure mode where a red slope's track got merged with an unrelated white-piece track. Missing embeddings never block, so pure-motion fleets keep working.
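The embedding-gated claim() described above can be sketched as follows. This is a minimal illustration under assumptions: `may_claim` and its signature are hypothetical names, only the 0.55 threshold and the "missing embeddings never block" rule come from the PR text.

```python
import math

SIMILARITY_THRESHOLD = 0.55  # default gate stated in the PR description

def cosine_similarity(a, b):
    """Plain cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)

def may_claim(source_embedding, candidate_embedding,
              threshold=SIMILARITY_THRESHOLD):
    """Gate a cross-channel claim by appearance similarity.

    A missing embedding on either side never blocks the claim, so
    pure-motion fleets keep working, matching the rule above.
    """
    if source_embedding is None or candidate_embedding is None:
        return True
    return cosine_similarity(source_embedding, candidate_embedding) >= threshold
```

With this shape, the red-slope-vs-white-piece merge is rejected because the two OSNet embeddings point in different directions, while a re-appearing identical piece passes easily.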

Hive integration

  • Hive linking + per-target queue management UI, live queue polling, retry/purge controls
  • Queue page filters by sample type (gemini boxes / condition / classification / other) and ships a unified Backfill dialog for upload backfills and condition-teacher runs
  • Hive training reports surface on each Hive model; balanced-dataset tooling lands alongside
  • Obsolete Hive register endpoint removed

Gemini condition teacher

  • condition_teacher.py adapter scans persisted piece crops and asks Gemini for composition (single_part / compound_part / multi_part / …) and condition (clean_ok / minor_wear / dirty / damaged / trash_candidate), emitting a condition sample type
  • sample_payloads.build_sample_payload produces a first-class condition analysis block
  • Hive uploader classifies each queued sample (teacher_detection / condition / classification / other) so the queue can filter and backfill by type
  • Background auto-teacher thread drives it on recent piece crops with a per-piece crop-count ceiling; Queue UI surfaces its live state
  • Hive frontend renders a SampleConditionCard on sample detail + review pages

Runtime + channels

  • Runtime track transit stitching across channels (with registry bootstrap fix and dossier-stitching audit)
  • Runtime ghost gating softened; confirmed tracker identities stay real
  • C4 piece-crop capture made more aggressive; C-channel sample capture/upload QA improved
  • Direct motion controls + named motion profiles for sample transport (C1–C4), continuous-motion ramping tuned
  • C4 hardware command backlog avoidance + unjam watchdog; handoff callbacks replaced by HandoffPort
  • Sorter runtime sidebar simplified; Carousel relabeled as Classification Channel; tracked-piece detail page redesigned around a verdict band

Extractions + sweep

  • Camera-settings service out of cameras router (and out of sidebar)
  • Stepper HTTP calls moved into a service; sorting-profile reads collapsed; app-settings persistence behind a helper
  • Empty frontend helper wrappers trimmed; generated local-state DBs ignored

Base cutover (what opened this PR)

Runtime cutover (RT)

  • RT contracts, registry, event bus, perception pipeline, and runtime handle as the primary runtime path
  • Channel runtimes for C1/C2/C3/C4 plus downstream distribution and classifier wiring
  • Detection endpoints and registry callers ported off the legacy path; detection shim deleted
  • MJPEG pipeline removed — WebSocket is the only stream
  • Runtime startup restored, detection previews and current-frame testing fixed after restarts

Generic purge architecture

  • New PurgePort contract and PurgeStrategy abstraction
  • C2, C3, and C4 bound onto the generic strategy (was previously C4-only, ad-hoc)
  • Bootstrap coordinator wires strategies uniformly; C234 coordinator extracted into a maintenance service
  • Startup purge forces owned-sweep when no exit_track exists
  • Orchestrator caps upstream capacity by downstream headroom (true backpressure)
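The PurgePort/PurgeStrategy split above separates "how a channel physically expels pieces" from "when to purge". A rough sketch of that shape, with all class and method names hypothetical except the owned-sweep rule quoted from the bullet list:

```python
from abc import ABC, abstractmethod

class PurgePort(ABC):
    """Hardware-facing side: how a channel physically expels pieces.
    (Hypothetical shape; the real contract lives in the rt/ tree.)"""

    @abstractmethod
    def sweep(self) -> None: ...

class PurgeStrategy(ABC):
    """Policy side: decides when and how a channel purges."""

    @abstractmethod
    def purge(self, port: PurgePort, has_exit_track: bool) -> str: ...

class OwnedSweepOnStartup(PurgeStrategy):
    """Mirrors the rule 'startup purge forces owned-sweep when no
    exit_track exists'."""

    def purge(self, port: PurgePort, has_exit_track: bool) -> str:
        if not has_exit_track:
            port.sweep()
            return "owned-sweep"
        return "skip"
```

Binding C2, C3, and C4 onto one strategy type is what lets the bootstrap coordinator wire them uniformly instead of keeping the previous C4-only ad-hoc path.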

Tracked pieces rebuild

  • Single /api/tracked/pieces endpoint feeds list + detail + modal (no more parallel code paths)
  • SegmentRecorder persists per-track path points, channel geometry, and wedge crops to SQLite + disk
  • Detail page renders track path and wedge composites directly from DB
  • Wedges sized dynamically from bbox (radial + angular) and gated for non-overlap — no more fixed 15° sectors
  • List view preview shows the latest captured wedge crop, not the intake snapshot
  • 4-char base36 track labels — tracker overlay and tracked-pieces UI speak the same language
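A 4-char base36 label as used above can be produced like this. Illustrative only: the exact encoding (zero-padding, wrap-around) is an assumption, not taken from the branch.

```python
def track_label(global_id: int) -> str:
    """Render a numeric track id as a fixed-width 4-char base36 label,
    so tracker overlay and tracked-pieces UI speak the same language.
    (Sketch; the shipped scheme may differ in padding or wrap rules.)"""
    digits = "0123456789abcdefghijklmnopqrstuvwxyz"
    n = global_id % (36 ** 4)  # wrap so the label is always 4 chars
    out = ""
    for _ in range(4):
        out = digits[n % 36] + out
        n //= 36
    return out
```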

Service extractions + ports (principle: composition wires, services coordinate)

  • SorterLifecyclePort replaces hardware callable-globals
  • rt/projections package with piece_dossier subscriber
  • Detection config read/save, polygon save, runtime_stats access, camera annotation introspection → dedicated services
  • rt/hardware/channel_callables.py, rt/config/channels.py, perception runner builder → own modules
  • blob_manager forwarder layer dissolved

Dead-code sweep

  • vision_manager + controller_ref compat stubs and branches removed from cameras/steppers/training/sorting-profile/detection-config/camera_preview_hub
  • runtime_variables module + endpoint path removed (hidden config store violating principle 4)
  • Dead debug endpoints wired to legacy stubs deleted
  • Empty detection wrappers removed (and the anti-abstraction rule codified)
  • Dead imports, empty re-export __init__.py files, dead local variables (ruff F841) swept
  • Direct gc_ref.xxx field reads removed from routers

Bug fixes surfaced by live-debug workflow

  • Detector zone filter tightened, confirmed_real gate dropped in C2/C3 (enables ring visibility)
  • Per-channel polygon resolution and crop scaling fixed
  • Stepper-init lambda loop-var binding + storage-layer hardware save NameError + layer-sync setattr-with-constant

Docs

  • Lab index points at sorter-architecture-principles
  • docs/sorter/architecture.md no longer carries the stale pre-RT component map; it's a short pointer into the principles doc
  • Old runtime-rebuild design / current-state map / audit HTML removed (~4200 LOC gone)

Validation

  • rt test suite green (363 passed) with the new test_boxmot_reid_tracker + embedding-aware test_track_transit + C3 embedding forwarding test
  • Condition-teacher + hive-uploader test suites green
  • Hive upload round-trip test covering condition_sample payload passing

Live hardware run (2026-04-24 evening, ~5 min total)

metric                            value
pieces_seen                       501
classified                        216
unknown                           214
tracker_id_switch_suspect_total   0
distributed                       0 (distributor bottleneck — separate bug)
  • BoTSORT adapter loaded cleanly on all three feeds (tracker=botsort_reid); ReID weights auto-downloaded to blob/reid_models/osnet_x0_25_msmt17.pt on first backend boot
  • 512-d OSNet appearance embeddings confirmed flowing to each live Track via the extended /api/rt/tracks/{feed_id} surface
  • Identity check on 150 archived dossiers: every unique tracked_global_id resolved to exactly one classified part_id/color combo — the Police-vs-Slope class of identity swap did not reproduce. BoTSORT keeps the same gid across full carousel rotations of the same physical piece, which is the intended behaviour and matches operator expectations.
  • Side-finds from this run (committed in the same branch):
    • runtime-stats widget (bottom-right UI) was reporting empty state_machines after the RT cutover — nothing in the rt graph was calling RuntimeStatsCollector.observeStateTransition. Fixed by adding a state_observer callback on BaseRuntime and wiring it in bootstrap; widget now shows current state + transition counts + per-state time shares for c1/c2/c3/c4/distributor.
  • Addressed in follow-up commits on this branch:
    • Distributor backoff (82980aa0): HandoffPort gets an available_slots() probe, each dossier has a 250 ms retry cooldown after a busy rejection. C4 stops hammering the distributor at tick rate — the 2.6% acceptance rate was pure bookkeeping noise from handoff_request being called repeatedly while the distributor worked through a single piece.
    • Pulse-settle-pulse C4 transport profile (fde8c8c): new PROFILE_TRANSPORT_PULSED splits a transport move into sub-pulses with explicit settle pauses between them, so pieces on the carousel can re-grip between kicks instead of sliding under the abrupt stops. Opt-in via RT_C4_TRANSPORT_PROFILE=pulsed; sub-pulse size (RT_C4_TRANSPORT_SUB_PULSE_DEG, default 2°) and settle duration (RT_C4_TRANSPORT_SETTLE_MS, default 120 ms) are env-tunable so Marc can A/B without a rebuild. Default profile stays transport — no behaviour change unless opted in.
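The distributor backoff in 82980aa0 combines the `available_slots()` probe with a per-dossier 250 ms cooldown. A small sketch of that throttling idea, where `HandoffThrottle` and its method names are hypothetical:

```python
import time

RETRY_COOLDOWN_S = 0.250  # per-dossier cooldown after a busy rejection

class HandoffThrottle:
    """Stops C4 from hammering the distributor at tick rate: a dossier
    rejected as busy may not retry for RETRY_COOLDOWN_S, and requests
    are skipped entirely when the slots probe reports no headroom.
    (Sketch of the logic around HandoffPort.available_slots().)"""

    def __init__(self, now=time.monotonic):
        self._now = now
        self._next_try = {}  # dossier_id -> earliest retry timestamp

    def may_request(self, dossier_id, available_slots: int) -> bool:
        if available_slots <= 0:
            return False  # probe says downstream is full, don't even try
        return self._now() >= self._next_try.get(dossier_id, 0.0)

    def on_busy_rejection(self, dossier_id):
        self._next_try[dossier_id] = self._now() + RETRY_COOLDOWN_S
```

This explains why the earlier 2.6% acceptance rate was bookkeeping noise: without the cooldown, every tick re-issued a handoff_request for the same dossier while the distributor worked through one piece.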

Notes

  • Intentionally a large integration PR — the full branch delta from main (200+ commits)
  • Architectural principles are the durable handle for future iteration; this PR is the point where the runtime is architecturally coherent enough to keep shipping from

mneuhaus added 30 commits April 21, 2026 18:27
Wumms snapshot of ~65 files from ongoing throughput/UX iteration:

Backend:
- classification_channel cleanup (running/recognition/ejecting), ghost
  handling, ignored regions, tracker exemption for pending-drop
- local_state: ghost ignore memory + dossier groundwork
- piece_transport: motion-sync metrics, distribution positioning fix
- vision: overlay scaling for 4K feeds, stream overlay cleanups
  (helper IN/white-red DROP removed), camera_feed/service tweaks,
  tracking history + polar_tracker refinements
- server/api + detection router: known-objects endpoints, dossier
  read paths
- new: role_aliases, overlays/scaling, tests/test_distribution_positioning

Frontend:
- RecentObjects + TrackPathComposite + tracked/[uuid] restored from
  APFS snapshot, burst filmstrip, KnownObject-by-uuid fallback
- recent-pieces.ts extracted (dedup key logic)
- settings sections updates, CameraFeed crop rotation gate
- stores/sortingProfile + manager.svelte.ts wiring

gitignore:
- exclude stray tmp-*.png / dashboard-*.png / sorter-dashboard-*.png
  debug screenshots from repo root

Safety-net commit before the unified piece-dossier refactor (plan at
~/.claude/plans/lass-uns-gerade-einmal-wiggly-parrot.md). Intent is
recoverability, not clean history — the next commits split this work
by subsystem.
- new piece_segments table persisting per-segment polar track data,
  sector snapshots, and recognition results per physical piece
- indexes for (piece_uuid, sequence) and (session_id, role, first_seen_ts)
- helpers remember_piece_segment / list_piece_segments /
  clear_piece_segments_for_session
- additive only: existing piece_dossiers untouched
- phase 3 will wire this table from the vision manager

Phase 2 of unified piece-dossier plan.
- piece_transport.registerIncomingPiece now cascades tracked_global_id
  through active lookup, DB dossier lookup, explicit piece_uuid, and
  finally fresh uuid — preventing double-register on tracker glitches
- new resumeExistingPiece() rehydrates a dossier from SQLite into
  active pieces
- new KnownObject.from_dossier() classmethod reciprocal to event serialiser
- classification_channel running.py _recoverExistingTrackedPieces uses
  resumeExistingPiece on DB-hit; _registerNewIntakePiece guards against
  double-register at call-site too
- tests for idempotent register + dossier rehydration

Phase 1 of unified piece-dossier plan.
- blob_manager: piece_crops_dir + write_piece_crop, best-effort on OSError
- PieceHistoryBuffer: on_segment_archived callback slot
- vision_manager: wires archive callback -> resolves piece_uuid via
  piece_transport, creates stub dossier on miss, writes sector_snapshot
  wedge/piece JPEGs to blob/piece_crops/<uuid>/seg<n>/, then
  remember_piece_segment() with paths instead of b64
- api.py: GET /api/piece-crops/{uuid}/seg{n}/{kind}/{idx}.jpg
  FileResponse with path-traversal guard and long cache
- piece_transport: bindStubPieceUuid helper
- tests for roundtrip, stub-dossier creation, and error tolerance
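The path-traversal guard on the piece-crop endpoint can be illustrated like this. Assumptions flagged: `safe_crop_path` is a hypothetical helper name, and the real endpoint returns a FastAPI FileResponse rather than a bare path.

```python
from pathlib import Path

def safe_crop_path(base_dir: str, uuid: str, seg: int, kind: str, idx: int) -> Path:
    """Resolve a blob/piece_crops path and refuse anything that escapes
    base_dir after resolution. (Sketch of the guard idea only.)"""
    base = Path(base_dir).resolve()
    candidate = (base / uuid / f"seg{seg}" / kind / f"{idx}.jpg").resolve()
    # Path.is_relative_to (Python 3.9+) rejects ../ traversal post-resolve
    if not candidate.is_relative_to(base):
        raise ValueError("path traversal rejected")
    return candidate
```

Resolving before comparing is the important part: a uuid like `../../etc` normalizes out of the base directory and is rejected, while legitimate uuids stay inside it.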

Phase 3 of unified piece-dossier plan.
- _LiveTrack.piece_uuid assigned when C3 track reaches 4 stable hits
- stub dossier written to SQLite via remember_piece_dossier on first
  bind, so downstream lookups find the record from the moment the
  piece becomes reliably tracked
- PendingHandoff and handoff.register_track propagate piece_uuid
  across C3 -> Carousel transition
- TrackAngularExtent surfaces piece_uuid to classification channel
- running.py passes piece_uuid through registerIncomingPiece so the
  idempotent cascade (phase 1) reuses the same UUID on C4 intake
- tests for binding threshold, handoff preservation, and register
  idempotency with explicit piece_uuid

Phase 4 of unified piece-dossier plan.
- local_state.build_piece_detail_payload merges dossier + segments
- GET /api/tracked/pieces/{uuid} now reads DB first, falls back to
  runtime LRU, returns 404 only when truly unknown — kills the
  spurious "Track Not Found" on still-known pieces
- response adds track_detail.{segments, live} block; live tracker
  details merged when the piece is still active
- GET /api/tracked/pieces items carry has_track_segments flag
- bulk segment-count helper keeps list endpoint O(1) per row
- tests for persisted-after-restart, live-merge, and fallback paths

Phase 5 of unified piece-dossier plan.
- recent-pieces.ts key logic: piece_uuid first, tracked_global_id
  fallback, recentPhysicalKeyOrNull() returns null when neither is
  available (drop the item instead of inventing keys)
- RecentObjects + tracked/+page dedupe skip null-key items
- tracked/[uuid] reads track_detail.segments from /api/tracked/pieces
  response, only hits /api/feeder/tracking/history when
  track_detail.live === true
- TrackPathComposite prefers jpeg_path (-> /api/piece-crops/...) over
  jpeg_b64 for snapshot rendering
- removed "Track Not Found" label; replaced with neutral loading/error
  states that reflect the DB-first reality

Phase 6 of unified piece-dossier plan.
- cursor-pointer on camera tile buttons (USB + network) so hover
  signals clickability, disabled:cursor-not-allowed while saving
- reassign-confirm modal raised above camera picker so the secondary
  dialog actually surfaces instead of being stacked under
- camera stream auto-refreshes after assign by propagating
  feedRevision into the background feed component (key or query param)
- preview-unavailable fallback on tiles whose MJPEG stream never
  produces a frame (was showing bare broken-image alt)
- tile MJPEG status callback uses direct property assignment
  (tileStreamStatus[idx] = status) instead of Object.spread — the
  spread cloned the whole record on every frame for 6 parallel
  streams, saturating Svelte's reactivity graph and freezing the
  renderer before saveCameraRole's fetch could resolve
- close camera picker unconditionally after save rather than gating
  on cameraError; an error now shows as an inline alert, and the modal
  no longer stays open silently when the backend has in fact accepted
  the assignment
- new CameraPreviewHub broadcaster: one VideoCapture per device,
  fan-out to N subscribers via asyncio.Queue at ~10fps preview rate
- new /ws/camera-preview/{index} websocket endpoint — clients
  receive raw jpeg bytes per message
- refuses subscription when device is owned by vision_manager
  (primary role feeds), to avoid device-handle conflicts
- new wsJpegStream svelte action mirroring the mjpeg action API
  (status callback, reconnect, destroy)
- ZoneSection modal tiles switch to wsJpegStream; the main
  zone feed and dashboard feed remain on MJPEG
- legacy GET /api/cameras/stream/{index} kept as fallback

Eliminates the 6-connection-per-host HTTP/1.1 pool exhaustion
that froze the picker whenever saveCameraRole's POST couldn't
acquire a socket.
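The fan-out core of such a preview hub can be sketched with per-subscriber queues. A minimal illustration under assumptions: `PreviewHub` and its method names are hypothetical, and only the one-producer/N-subscriber/asyncio.Queue shape comes from the commit message.

```python
import asyncio

class PreviewHub:
    """One frame producer per device, fanned out to N subscribers via
    per-subscriber asyncio.Queue. Drop-oldest keeps slow clients from
    stalling the producer or accumulating latency. (Sketch only.)"""

    def __init__(self, maxsize: int = 2):
        self._subs: set = set()
        self._maxsize = maxsize

    def subscribe(self) -> asyncio.Queue:
        q = asyncio.Queue(maxsize=self._maxsize)
        self._subs.add(q)
        return q

    def unsubscribe(self, q: asyncio.Queue) -> None:
        self._subs.discard(q)

    def publish(self, jpeg: bytes) -> None:
        for q in self._subs:
            if q.full():           # drop the oldest frame for this client
                q.get_nowait()
            q.put_nowait(jpeg)
```

Each websocket handler then just awaits `q.get()` and sends the raw JPEG bytes, so one VideoCapture serves any number of preview tiles without per-tile device handles.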
Clean-slate before the dossier refactor left a 1 GB
local_state.sqlite.backup-<ts> file behind as a safety net.
Keep those around locally but don't commit them.
motion_confirmed was a sticky latch — once a track briefly displaced
past the 18 px birth threshold, the stagnant-false-track filter was
disarmed for that track forever. Apparatus ghosts that wiggled during
autoexposure settle at startup latched True and then blocked the
dropzone indefinitely: feeder pipeline stalled on wait_chX_dropzone_clear
with no self-recovery path.

Add a role-agnostic recent-cartesian-stationary helper and short-circuit
the motion_confirmed gate: when a latched track has stayed within
RECENT_STATIONARY_MAX_DISPLACEMENT_PX (6 px) over RECENT_STATIONARY_MIN_COVERAGE_S
(1.8 s) of a 2.5 s window, allow the stagnant filter to fire. Legit
pieces pausing briefly mid-travel (sub-second) are unaffected because
their recent window still contains the anchor from pre-pause motion.
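The recent-cartesian-stationary check above can be reconstructed roughly as follows, using the thresholds quoted in the commit (6 px, 1.8 s coverage, 2.5 s window); the function name and sample format are assumptions.

```python
RECENT_STATIONARY_MAX_DISPLACEMENT_PX = 6
RECENT_STATIONARY_MIN_COVERAGE_S = 1.8
RECENT_WINDOW_S = 2.5

def recently_stationary(samples, now):
    """samples: list of (timestamp, x, y) track observations.

    True when the last 2.5 s of observations cover at least 1.8 s and
    stay within a 6 px spread -- the condition that re-arms the stagnant
    filter for a latched track. (Illustrative reconstruction.)"""
    recent = [(t, x, y) for t, x, y in samples if now - t <= RECENT_WINDOW_S]
    if not recent:
        return False
    coverage = recent[-1][0] - recent[0][0]
    if coverage < RECENT_STATIONARY_MIN_COVERAGE_S:
        return False
    xs = [x for _, x, _ in recent]
    ys = [y for _, _, y in recent]
    spread = max(max(xs) - min(xs), max(ys) - min(ys))
    return spread <= RECENT_STATIONARY_MAX_DISPLACEMENT_PX
```

The coverage requirement is what protects legit pieces pausing sub-second mid-travel: their recent window still contains pre-pause motion, so the spread check fails and the stagnant filter stays disarmed.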
… dynamic

Feeder roles on hive:* or gemini_sam never consume the MOG2 detector
output, yet a FeederAnalysisThread was still running per channel at
33Hz with BGR->Lab conversion + MOG2 background subtraction. Gate the
thread at init-time and on runtime algorithm-switch so it only runs
when static mog2 is actually selected.
…achinery

Fundamentally inverts the feeder-tracker gate: a track is no longer "real
until proven ghost" — it's now "unknown until proven real". Every tracked
piece starts with ``confirmed_real=False`` and flips sticky-True only
after demonstrating monotonic polar-angular progress ≥ 5° or centroid
drift ≥ 40 px across a ≥ 6-sample window. Apparatus ghosts (screws,
reflections, guides) physically cannot clear that bar — their jitter is
fixed-position, not monotonic.

Dossier mint + segment archival gate on ``confirmed_real`` instead of
the Phase-7 displacement thresholds. Everything removed in a single
tombstone:

 * stagnant_false_track filter (all flavours: carousel polar, motion-
   confirmed recovery, pending-drop protection);
 * persistent_static_ghost_regions infrastructure (load/save/prune/
   suppress + role wiring + state_entries rows with a migration that
   purges any legacy rows on backend boot);
 * motion_confirmed latch + MIN_ANGULAR/PIECE_UUID_DISPLACEMENT gates
   in _maybe_bind_piece_uuid;
 * ANGULAR_SPAN < 3° motion-gate in _archive_segment_to_dossier_impl
   (DB-lookup-before-mint path stays).

_archive_segment_to_dossier_impl now consults the live tracker for
``confirmed_real``; if the originating track is already dead, falls
back to segment_sector_angular_span_deg ≥ 5° as a safe-but-lax check.
TrackedPiece exposes ``confirmed_real`` so downstream consumers can
filter on it in the follow-up propagation commit.
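The "unknown until proven real" gate can be sketched from the thresholds stated above (monotonic angular progress ≥ 5°, or centroid drift ≥ 40 px, across a ≥ 6-sample window); the function name and argument shapes are illustrative.

```python
MIN_ANGULAR_PROGRESS_DEG = 5.0
MIN_CENTROID_DRIFT_PX = 40.0
MIN_SAMPLES = 6

def evaluate_confirmed_real(angles_deg, centroids):
    """Confirm a track only after monotonic polar-angular progress of
    at least 5 degrees, or centroid drift of at least 40 px, over at
    least 6 samples. Fixed-position ghost jitter (screws, reflections,
    guides) cannot clear either bar. (Illustrative sketch.)"""
    if len(angles_deg) < MIN_SAMPLES:
        return False
    monotonic = all(b >= a for a, b in zip(angles_deg, angles_deg[1:]))
    angular_progress = angles_deg[-1] - angles_deg[0]
    if monotonic and angular_progress >= MIN_ANGULAR_PROGRESS_DEG:
        return True
    (x0, y0), (x1, y1) = centroids[0], centroids[-1]
    drift = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    return drift >= MIN_CENTROID_DRIFT_PX
```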
…e, handoff

Downstream consumers of the polar tracker now all gate on the new
``confirmed_real`` whitelist flag:

 * ``_channelDetectionsFromTracks`` filters tracks before they reach
   ``analyzeFeederChannels`` — this is the single chokepoint that drives
   ch2/ch3 dropzone occupancy, clump/massage decisions, and exit wiggle.
   Apparatus ghosts never populate that path anymore, so the feeder
   stops stalling behind them.
 * ``subsystems/feeder/feeding.py`` classification-channel admission:
   ``classification_channel_track_count`` counts only confirmed-real
   tracks (prevents a carousel apparatus ghost from pinning ch3).
 * ``subsystems/channels/c1_bulk.py`` c2 saturation: the bulk feeder's
   max-ch2-pieces cap counts only confirmed tracks (prevents a c2 guide
   ghost from halting bulk feed forever).
 * ``vision/overlays/tracker.py`` live overlay: unconfirmed tracks
   render in dim grey with no id chip; confirmed tracks keep their
   green/amber/magenta colour coding + ``#xxxx`` pill.

Handoff manager kept as-is. The existing ghost-reject gate on
``PendingHandoff.last_displacement_px`` already filters stationary
upstream pendings, and a real piece near C3-exit can handoff before it
has covered the full 5° whitelist arc — requiring confirmed_real at
notify_track_death would break that edge-case.
… driver

Two emergency back-pressure gates so the machine doesn't pile pieces onto
C4 while upstream filtering stabilises:

- admission.py: MAX_CLASSIFICATION_CHANNEL_DETECTION_CAP=3 enforces a
  hard admission block on C3 once raw YOLO sees 3+ bboxes on the carousel,
  independent of transport/zone state. Protects the pipeline when the
  whitelist hasn't confirmed tracks yet but C4 is physically full.

- ch2_separation.py: CH2_SEPARATION_ENABLED=False kill-switch on the
  slip-stick driver. The CW/CCW rocking was firing even for 2 pieces on
  C2 with no real cluster; hard-disable until the cluster gate is
  tightened.
Snapshot of where tonight's ghost-elimination rework left the repo.
**Pipeline is NOT working end-to-end.** Captured as WIP so we can reset
with a clear head tomorrow.

### What works

- Vulkan-wired ncnn inference (6f2c307) — YOLO off CPU.
- MOG2 thread skipped when detection algorithm is dynamic (9bbf3e8) —
  CPU drops from ~600% to ~17% in idle pipeline.
- Hard cap of 3 raw C4 detections blocks C3 from piling the carousel
  (a636f9d).
- Ch2 slip-stick separation driver hard-disabled via kill-switch flag
  (a636f9d).

### What is broken

- **Whitelist refactor (706a24d + a213794) is too lax.** The
  `_evaluate_confirmed_real` check uses full-path median(first-5) vs
  median(last-5) centroid drift, which grows monotonically for ghost
  tracks that live forever now that the stagnant-false-track filter
  was removed. In a 45 s window on a physically clean C4 the pipeline
  still mints ~27 confirmed dossiers and ~24 segments. Expected fix:
  rolling window (~1.5 s) + dormancy kill when a track shows no
  progress for ~3 s.
- **CPU regression to ~350 %.** MOG2 was not the only hog; whitelist
  adds per-frame path-evaluation overhead and trackers never die
  naturally now, so active-track count grows.
- **dist = 0, multi_drop_fail = 3 out of 5 pieces seen** — classify
  success 0 %. Either cluster arrivals at C4 or downstream classify
  pipeline stalled. Root cause not isolated tonight.
- Startup-purge (`classification_channel/idle.py`) did not fire on
  reboot — C4 had to be emptied by hand. Either the Idle state's
  `transport.activePieces()` early-skip catches too much, or the
  classification state machine was stuck in a previous idle snapshot.
  Needs diagnosis.
- Runtime-stats snapshot appeared stale for minutes at a time across
  restarts — `main_to_server_queue` + broadcast pipe may have wedged.

### Open workstreams for next session

1. Rolling-window + dormancy fix for `_evaluate_confirmed_real`
   (polar_tracker.py) so unconfirmed ghost tracks die and never
   confirm via cumulative drift.
2. Investigate CPU regression post-whitelist — profile which calls
   dominate (suspect: per-frame path comparisons + embedding EMA for
   ever-living tracks).
3. Classify 0 % success: trace one C4 piece through
   `classification_channel` state machine to see where crops / hive
   recognition drop out.
4. Feeder over-eager on C2: cluster-gate needs tightening, not just
   the slip-stick disable.
5. Runtime-stats staleness — audit the main-loop broadcast path.
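Workstream 1's rolling-window + dormancy idea could look roughly like this. Everything here is a hypothetical sketch of the *proposed* fix, not shipped code; only the ~1.5 s window, ~3 s dormancy, and the 40 px/6 px thresholds are taken from the notes above.

```python
ROLLING_WINDOW_S = 1.5
DORMANCY_KILL_S = 3.0

def track_verdict(samples, now):
    """Evaluate drift only over the last ~1.5 s so cumulative ghost
    jitter cannot add up to a confirmation, and kill tracks that show
    no progress for ~3 s. samples: list of (timestamp, x, y).
    Returns 'confirm', 'kill', or 'pending'. (Hypothetical sketch.)"""
    recent = [(t, x, y) for t, x, y in samples if now - t <= ROLLING_WINDOW_S]
    if len(recent) >= 2:
        (t0, x0, y0), (t1, x1, y1) = recent[0], recent[-1]
        drift = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    else:
        drift = 0.0
    if drift >= 40.0:          # same 40 px bar, but windowed
        return "confirm"
    old_enough = samples and now - samples[0][0] >= DORMANCY_KILL_S
    if old_enough and drift < 6.0:  # dormant: no recent progress at all
        return "kill"
    return "pending"
```

The key property versus the broken full-path median comparison: a ghost that lives forever accumulates zero drift inside any single 1.5 s window, so it never confirms and eventually hits the dormancy kill.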

### How to reset tomorrow

- `git log --oneline sorthive ^main` lists all ghost-elimination
  commits. Rollback candidates: 706a24d, a213794 (whitelist core +
  propagation). The Vulkan + MOG2 + admission-cap + ch2-disable fixes
  (6f2c307, 9bbf3e8, a636f9d) are keepers.
- Full DB + blob purge + supervisor restart is already scripted
  inline in this session's notes.
Documentation baseline for the runtime rearchitecture on sorthive.
All runtime work after this commit targets the new architecture:
  - runtime-architecture.html           (canonical visual vision)
  - docs/lab/runtime-rebuild-design.md  (engineering companion)

The LEGACY map (runtime-current-state-map.md) captures the pre-rebuild state
as migration reference only.

Added:
  docs/lab/runtime-current-state-map.md  — IST-state, LEGACY-banner, file:line cites
  docs/lab/runtime-rebuild-design.md     — 4-layer x 5-column design, contracts,
                                           7-step C4<->Distributor handshake,
                                           strategy plugins, 6-phase migration

Changed:
  docs/lab/software-architecture-decisions/index.md  — reference the two canonical docs

Removed:
  docs/lab/software-architecture-decisions/vision-camera-runtime-refactor.md
    (superseded; broader rebuild replaces narrower VisionManager-split proposal)
  REVIEW-2026-04-16.md (obsolete one-off)
First phase of the runtime rearchitecture per docs/lab/runtime-rebuild-design.md.
Contracts-only; no algorithm ports, no runtime implementations.

Added under software/sorter/backend/rt/:

  contracts/  Feed, Zone (Rect/Polygon/Polar), Detector, Tracker, Filter,
              FilterChain, Classifier, Calibration, Runtime (ABC),
              AdmissionStrategy, EjectionTimingStrategy, RulesEngine,
              EventBus + StrategyRegistry with 8 register_* decorators
              (detector/tracker/filter/classifier/calibration/admission/
              ejection_timing/rules_engine)
  config/     Pydantic v2 schema (SorterConfig with feeds/pipelines/
              runtimes/classification/distribution) + TOML+SQLite loader
              with deep-merge override
  events/     InProcessEventBus — bounded queue (maxsize=2048), dispatcher
              thread, drop-oldest on overflow, fnmatch glob topics,
              drain() for synchronous tests. 9 typed topic constants
  context.py  RuntimeContext DI container (replaces shared_state.py globals)
  __init__.py build_runtime() stub — NotImplementedError until phase 2+
  tests/      8 tests (registry round-trip, config validation, event bus
              subscribe/publish/glob/unsubscribe); all green under
              uv run pytest rt/tests/

Placeholder __init__.py for later phases:
  perception/ (detectors, trackers, filters, classifiers, calibration)
  runtimes/ (+ _states/)  rules/  classification/  coupling/
  hardware/  irl/

1243 LoC total; every file under 250 LoC. No imports from or edits to
the existing backend/ tree.
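The event-bus semantics listed above (bounded queue, drop-oldest on overflow, fnmatch glob topics, drain() for synchronous tests) can be miniaturized like this. `MiniEventBus` is an illustrative stand-in, not the rt/ code; the real InProcessEventBus additionally runs a dispatcher thread.

```python
import fnmatch
from collections import deque

class MiniEventBus:
    """Synchronous miniature of the described InProcessEventBus:
    bounded queue with drop-oldest on overflow and fnmatch glob topic
    matching. drain() plays the role of the synchronous-test hook."""

    def __init__(self, maxsize=2048):
        self._queue = deque(maxlen=maxsize)  # deque drops oldest when full
        self._subs = []                      # list of (pattern, callback)

    def subscribe(self, pattern, callback):
        self._subs.append((pattern, callback))

    def publish(self, topic, payload):
        self._queue.append((topic, payload))

    def drain(self):
        """Deliver all queued events to matching subscribers."""
        while self._queue:
            topic, payload = self._queue.popleft()
            for pattern, cb in self._subs:
                if fnmatch.fnmatch(topic, pattern):
                    cb(topic, payload)
```

Glob subscription (`"perception.*"`) is what lets later phases add typed topics without touching existing subscribers, and drop-oldest keeps a stalled consumer from back-pressuring the perception loop.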
…+ filters)

First perception stack ported into the rt/ contract frame. Phase 2a of 6,
self-contained in rt/perception/ — no main.py integration, no shadow mode,
no touching of the old backend/ tree.

Added under software/sorter/backend/rt/perception/:

  feeds.py          CameraFeed adapter — pulls frames from legacy
                    backend.vision.camera_service (explicit temporary bridge
                    until rt/hardware/ lands). Monotonic frame_seq per feed
  zones.py          build_zone(ZoneConfig) factory — Rect/Polygon/Polar
  detectors/mog2.py Mog2Detector port (registered "mog2"). Rect+Polygon
                    masks ok; Polar raises NotImplementedError stub
  trackers/polar.py PolarTracker port (registered "polar"). Polar Kalman +
                    Hungarian matching with cartesian fallback. Whitelist
                    confirmation (angular >=5 deg OR centroid drift >=40px)
  filters/size.py   SizeFilter (registered "size")
  filters/ghost.py  GhostFilter (registered "ghost") — pulls confirmed_real
                    gating out of the tracker into an explicit filter step
  pipeline.py       PerceptionPipeline (detect -> track -> filter) +
                    build_pipeline_from_config factory that wires
                    PolarZone geometry into the polar tracker
  pipeline_runner.py PerceptionRunner — daemon thread per feed, periodic
                    pipeline execution, duplicate frame_seq skip,
                    error-threshold circuit-breaker, EventBus publish of
                    PERCEPTION_TRACKS + HARDWARE_ERROR

Subpackage __init__.py files updated with side-effect imports so
`import rt.perception` registers all strategies.

Tests (21 new, 29 total green):
  test_mog2_detector.py    synthetic frames + MOG2 warmup + detection
  test_polar_tracker.py    polar + cartesian fallback, confirmation gating
  test_filters.py          size passes/blocks, ghost filters unconfirmed
  test_pipeline.py         end-to-end detect -> track -> filter
  test_pipeline_runner.py  start/stop lifecycle, duplicate-seq skip,
                           circuit-breaker on repeated failures

Verified: uv run pytest rt/tests/ -v  -> 29 passed, 0 failed, 0.96s.

Known limits (all explicit in code):
  - PolarZone masking in Mog2Detector is a NotImplementedError stub
  - PolarTracker does not carry OSNet appearance embeddings or history
    buffer (scope-excluded; those land with handoff in a later phase)
  - global_id == track_id for now; PieceHandoffManager is separate

Phase 2b hooks ready: PerceptionRunner lifecycle handles, EventBus
publishing, non-blocking latest_tracks() accessor, build_pipeline_from_config
factory.
Live-hardware finding: the C4 transport move is configured for up to
24 000 µsteps/s but only reaches ~3% of that in practice — 53 µsteps
per pulse is far too short to clear the acceleration ramp. The motor
sits in a short triangular profile with an abrupt stop at the end,
and pieces on the carousel slide because the stop's inertia outpaces
friction.

New PROFILE_TRANSPORT_PULSED splits a single transport_move into
sub-pulses of configurable size with explicit settle pauses in
between. Each sub-pulse is still a small triangular move, but the
pause lets the piece re-grip the carousel before the next kick.
Matches the manual pattern Marc observed working better during hand
testing.

Opt in via RT_C4_TRANSPORT_PROFILE=pulsed; sub-pulse size and settle
duration are env-tunable (RT_C4_TRANSPORT_SUB_PULSE_DEG,
RT_C4_TRANSPORT_SETTLE_MS) so we can A/B without a rebuild. Default
profile stays "transport" — no behaviour change unless opted in.
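The sub-pulse planning described above can be sketched as a pure function using the documented defaults (2° sub-pulses, 120 ms settle); `plan_pulsed_move` is an illustrative name, not the shipped profile code.

```python
RT_C4_TRANSPORT_SUB_PULSE_DEG = 2.0   # default sub-pulse size (env-tunable)
RT_C4_TRANSPORT_SETTLE_MS = 120       # default settle pause (env-tunable)

def plan_pulsed_move(total_deg,
                     sub_pulse_deg=RT_C4_TRANSPORT_SUB_PULSE_DEG,
                     settle_ms=RT_C4_TRANSPORT_SETTLE_MS):
    """Split one transport move into (move_deg, settle_ms) steps so the
    piece can re-grip the carousel between kicks. The final step settles
    for 0 ms since the move is complete. (Illustrative planner.)"""
    steps = []
    remaining = total_deg
    while remaining > 1e-9:
        pulse = min(sub_pulse_deg, remaining)
        remaining -= pulse
        steps.append((pulse, settle_ms if remaining > 1e-9 else 0))
    return steps
```

Each sub-pulse is still a short triangular profile, but the interleaved settle pauses replace the single abrupt stop whose inertia was outpacing friction.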
Marc reported the Recent Pieces list was "wuseln" (German: milling about) —
pieces flickering in and out, reordering on nearly every frame. Measured
during live sorting:
19 enters / 18 leaves / constant reorders over 15 s. After this change,
the same 15 s window shows 0 / 0 / 0.

Two separate instability sources addressed:

- **Reorder source.** `RecentObjects.upcoming` was sorted by
  `exitDistanceDeg` — the piece's current angle relative to the drop
  point. Every small angle update from the tracker swapped adjacent
  rows. Switched to a stable FIFO sort on
  `first_carousel_seen_ts`: newest at top, oldest (next to drop) at
  the bottom right above the distributed divider. Order changes only
  when pieces actually enter or leave.

- **Enter/leave source.** `MachineManager.handleKnownObject` ran each
  incoming event through `shouldKeepRecentObject` (== the display
  filter) and **removed** existing entries that no longer matched.
  A piece rotating past the drop zone flipped
  `classification_channel_zone_state` from `active` to `superseded`
  and got evicted from storage. The upcoming `$derived` then emitted
  "leave" + "enter" when it cycled back. Fixed by always
  updating-in-place on subsequent events: the display filter decides
  what to render, the storage layer just keeps the history.
  First-insertion eviction still uses the filter.
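A language-neutral Python sketch of the two fixes (the widget itself is Svelte; apart from `first_carousel_seen_ts` and `global_id`, all names here are illustrative):

```python
def upcoming_order(pieces):
    """Stable FIFO ordering: sort on the admission timestamp, not the
    live exit distance, so the order only changes when a piece actually
    enters or leaves. Newest first."""
    return sorted(pieces, key=lambda p: p["first_carousel_seen_ts"], reverse=True)


def handle_known_object(storage, display_filter, event):
    """Storage keeps the history; the display filter gates only the
    *first* insertion (and rendering elsewhere), never eviction of
    entries the storage already holds."""
    gid = event["global_id"]
    if gid in storage:
        storage[gid].update(event)   # always update in place
    elif display_filter(event):
        storage[gid] = dict(event)   # filter applies to first insertion only
```

A piece whose zone state flips from `active` to `superseded` now stays in storage (merely hidden by the render-side filter), so it no longer produces a "leave" + "enter" pair when it cycles back.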

Also dropped the 15 s "same-gid recently distributed" dedup in the
upcoming list — it was a workaround for the pre-BoTSORT tracker
splitting one physical piece across many global_ids, and now just
hides active pieces for 15 s whenever their gid happens to match a
just-distributed entry.

Live audit: the widget was showing nonsense because every dossier
admission on C4 bumped the counters, and with BoTSORT keeping a
stable tracked_global_id across carousel rotations the same physical
piece can produce dozens of dossiers. Live sample before fix:
pieces_seen=1092, classified=797, distributed=248 — against roughly
17 actual physical pieces in the carousel. Reading "87 ppm feed
rate" when only 17 pieces have ever been fed is actively misleading.

Backend: RuntimeStatsCollector.snapshot() now also reports
- unique_pieces_seen
- unique_pieces_classified
- unique_pieces_distributed
by folding ``tracked_global_id`` into sets. Dossier-level counters
(pieces_seen / classified / distributed) stay for the Totals footer so
the attempt-level numbers remain visible.
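A minimal sketch of the set-folding; only `tracked_global_id` semantics and the snapshot keys come from the change above, the method names are invented for illustration.

```python
class UniquePieceCounters:
    """Fold tracked_global_id into sets so a re-circulating piece on C4
    counts once, while the dossier-level counter keeps every attempt
    for the Totals footer."""

    def __init__(self):
        self._seen, self._classified, self._distributed = set(), set(), set()
        self._dossiers_seen = 0

    def record_admission(self, gid):
        self._dossiers_seen += 1  # attempt-level: every dossier counts
        self._seen.add(gid)       # unique: the set deduplicates

    def record_classified(self, gid):
        self._classified.add(gid)

    def record_distributed(self, gid):
        self._distributed.add(gid)

    def snapshot(self):
        return {
            "pieces_seen": self._dossiers_seen,
            "unique_pieces_seen": len(self._seen),
            "unique_pieces_classified": len(self._classified),
            "unique_pieces_distributed": len(self._distributed),
        }
```

The same gid admitted three times yields `pieces_seen=3` but `unique_pieces_seen=1`, which is exactly the gap between 1092 dossiers and ~17 physical pieces.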

Widget:
- Feed rate uses unique_pieces_seen / running_time_s.
- Distributed / min uses unique_pieces_distributed / running_time_s
  instead of the dossier-based throughput.overall_ppm.
- Multi-drop denominator is finished classifications, not pieces_seen
  (pieces_seen was inflated by admissions of pieces still on C4).
- New "Unique pieces" tile: "4 of 17 seen" — the honest answer to
  "how many pieces has the sorter actually handled?".
- Dropped the broken "C4 active ppm" tile.
  ``channel_throughput.classification_channel.active_ppm`` is
  permanently None because
  ``observeChannelExit`` was never wired into the rt graph after the
  cutover; fixing that is a separate task.
- Fixed a dead-code lookup on ``outcomes.classified?.active_ppm``
  (the real key is ``classified_success``).

Totals footer still shows the raw dossier counts so operators can see
the inflation if they want to debug it, but the headline tiles no
longer double-count re-circulating pieces.

Follow-up to the unique-piece-count fix. The widget was still showing
nonsense rates right after a backend restart: unique_pieces_seen is
cumulative across sessions (backed by the piece-dossier DB) but
running_time_s resets to zero each start, so
``cumulative_count * 60 / running_time_s`` inflated wildly for the
first few minutes and then slowly decayed.

Backend:
- ``throughput.recent_ppm`` — pieces distributed per minute observed
  over a rolling 5-minute window, computed from the actual wall-clock
  span of the ``distributed_at`` timestamps. Session-independent.
- ``throughput.feed_recent_ppm`` — same idea keyed off the
  ``first_carousel_seen_ts`` of each unique tracked_global_id, so the
  feed rate reflects the real cadence at which new physical pieces
  enter C4 rather than a ratio against a just-reset clock.
- Snapshot lazily syncs ``_is_running`` from the rt_handle's
  ``started`` / ``paused`` flags so running_time_s ticks even when
  the command-queue ``setLifecycleState`` path races or drops the
  event (observed live: rt reported not-paused while the collector
  still claimed "initializing").

Widget: ``dist_rate_ppm`` and ``feed_rate_ppm`` prefer the new
recent-window values, falling through to the cumulative-over-time
numbers only when no recent events exist.
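A sketch of both mechanisms, assuming only the shapes described above (lists of `distributed_at` / `first_carousel_seen_ts` timestamps, an rt handle with `started` / `paused` flags); class and function names are illustrative, and using event intervals over the observed span is one reasonable way to compute the windowed rate.

```python
import time

def recent_ppm(event_ts, window_s=300.0, now=None):
    """Rolling-window rate: events per minute over the last `window_s`
    seconds, derived from the wall-clock span of the retained
    timestamps, so it is independent of when the session started."""
    now = time.time() if now is None else now
    recent = sorted(t for t in event_ts if now - t <= window_s)
    if len(recent) < 2:
        return None  # not enough recent events for a rate
    span = recent[-1] - recent[0]
    return (len(recent) - 1) * 60.0 / span if span > 0 else None


class RunningTimeSketch:
    """Lazily re-derive running-ness from the rt handle's own flags so
    running_time_s keeps ticking even when a setLifecycleState command
    races or drops."""

    def __init__(self):
        self._is_running = False
        self._since = None
        self._accum = 0.0

    def _sync(self, running, now):
        if running and not self._is_running:
            self._since = now                 # started (or resumed) now
        elif not running and self._is_running:
            self._accum += now - self._since  # bank the finished run span
        self._is_running = running

    def running_time_s(self, rt_handle, now=None):
        now = time.time() if now is None else now
        self._sync(rt_handle.started and not rt_handle.paused, now)
        return self._accum + (now - self._since if self._is_running else 0.0)
```

The widget fallback then reads naturally: use `recent_ppm(...)` when it returns a value, otherwise fall through to the cumulative count divided by running time.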

Now that BoTSORT+ReID is the production primary tracker on every feed,
painting both the primary boxes (solid green) and the shadow boxes
(dashed magenta) on the same stream is just visual noise. Operators
need the live boxes from the tracker that's actually making sorting
decisions — the shadow comparison is debug info that belongs on
``/api/rt/status`` (still wired there) or in offline benchmark runs,
not on the camera feed.

RuntimeAnnotationProvider returns only RuntimeTrackOverlay +
RuntimeGhostOverlay by default. The shadow overlay class stays in the
codebase so ad-hoc debugging can re-add it, just not on the default
provider chain. Tests updated to match.

Live observation: with BoTSORT as the production tracker, tracks with
hit_count well into the thousands were sitting at confirmed_real=False
because the rotation-window verdict only fires when c2/c3/c4 happens
to publish a PERCEPTION_ROTATION while the tracker has accumulated 6+
samples in that window. The verdict gate was a useful ghost filter
when motion-only trackers regularly birthed false positives on
apparatus pixels — with appearance-aware association it just hides
the live tracker output operators want to see.

RuntimeTrackOverlay now draws the ID label for every non-ghost track.
Box colour still reflects the verdict (green = confirmed, dim grey =
pending) so the visual signal is preserved, but the ID is always
visible. Ghosts continue to be filtered out and rendered separately
by RuntimeGhostOverlay.
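A sketch of the verdict-to-style rule described above; the field names and RGB values are assumptions, only the confirmed/pending/ghost split comes from the text.

```python
def track_overlay_style(track):
    """Every non-ghost track gets its ID label; only the box colour
    encodes the rotation-window verdict."""
    if track.get("is_ghost"):
        return None  # ghosts are rendered separately by the ghost overlay
    confirmed = track.get("confirmed_real", False)
    colour = (0, 200, 0) if confirmed else (128, 128, 128)  # green vs dim grey
    return {"label": f"#{track['track_id']}", "colour": colour}
```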