
RT runtime cutover, generic purge strategy, and tracked-pieces rebuild#129

Open
mneuhaus wants to merge 222 commits into main from sorthive

Conversation


@mneuhaus mneuhaus commented Apr 23, 2026

Summary

This is the current sorthive branch delta against main. It carries the runtime rebuild from shim-heavy legacy wiring onto the new RT runtime, a generic cross-channel purge architecture, a fully persisted tracked-pieces view with per-segment debug crops, a broad service-extraction sweep, the Hive integration with Gemini-backed condition labeling, and a BoTSORT + OSNet ReID primary tracker that replaces motion-only association.

Architectural direction is codified in [lab/sorter-architecture-principles]({{ '/lab/sorter-architecture-principles/' | relative_url }}) — the docs tree has been realigned so the principles doc is the active guide.

Recent additions (on top of the base cutover work)

BoTSORT + OSNet ReID primary tracker

  • rt/perception/trackers/boxmot_reid.py wraps boxmot's BoTSORT with OSNet ReID and emits appearance_embedding on each Track. Weights (osnet_x0_25_msmt17, ~3 MB) auto-download into blob/reid_models/ on first use.
  • Primary tracker key is now botsort_reid (override via RT_PRIMARY_TRACKER_KEY); shadow is turntable_groundplane. Motion-only trackers stay registered as benchmarks.
  • Pure-geometry helpers factored into rt/perception/trackers/_geometry.py so the three legacy adapters share one implementation instead of three near-identical copies.
  • Intra-channel re-acquisition (including the Carousel "same-piece reappearing after occlusion" case) is now the tracker's job — previously papered over by ad-hoc logic in the transit stitcher.
  • TrackTransitRegistry still owns the cross-channel C3 → C4 hand-off but now carries the source track's embedding and gates claim() by cosine similarity (default 0.55). Catches the documented failure mode where a red slope's track got merged with an unrelated white-piece track. Missing embeddings never block, so pure-motion fleets keep working.
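The embedding-gated claim() described above can be sketched as follows. This is a minimal illustration under assumptions: `may_claim` and its signature are hypothetical names, only the 0.55 threshold and the "missing embeddings never block" rule come from the PR text.

```python
import math

SIMILARITY_THRESHOLD = 0.55  # default gate stated in the PR description

def cosine_similarity(a, b):
    """Plain cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)

def may_claim(source_embedding, candidate_embedding,
              threshold=SIMILARITY_THRESHOLD):
    """Gate a cross-channel claim by appearance similarity.

    A missing embedding on either side never blocks the claim, so
    pure-motion fleets keep working, matching the rule above.
    """
    if source_embedding is None or candidate_embedding is None:
        return True
    return cosine_similarity(source_embedding, candidate_embedding) >= threshold
```

With this shape, the red-slope-vs-white-piece merge is rejected because the two OSNet embeddings point in different directions, while a re-appearing identical piece passes easily.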

Hive integration

  • Hive linking + per-target queue management UI, live queue polling, retry/purge controls
  • Queue page filters by sample type (gemini boxes / condition / classification / other) and ships a unified Backfill dialog for upload backfills and condition-teacher runs
  • Hive training reports surface on each Hive model; balanced-dataset tooling lands alongside
  • Obsolete Hive register endpoint removed

Gemini condition teacher

  • condition_teacher.py adapter scans persisted piece crops and asks Gemini for composition (single_part / compound_part / multi_part / …) and condition (clean_ok / minor_wear / dirty / damaged / trash_candidate), emitting a condition sample type
  • sample_payloads.build_sample_payload produces a first-class condition analysis block
  • Hive uploader classifies each queued sample (teacher_detection / condition / classification / other) so the queue can filter and backfill by type
  • Background auto-teacher thread drives it on recent piece crops with a per-piece crop-count ceiling; Queue UI surfaces its live state
  • Hive frontend renders a SampleConditionCard on sample detail + review pages

Runtime + channels

  • Runtime track transit stitching across channels (with registry bootstrap fix and dossier-stitching audit)
  • Runtime ghost gating softened; confirmed tracker identities stay real
  • C4 piece-crop capture made more aggressive; C-channel sample capture/upload QA improved
  • Direct motion controls + named motion profiles for sample transport (C1–C4), continuous-motion ramping tuned
  • C4 hardware command backlog avoidance + unjam watchdog; handoff callbacks replaced by HandoffPort
  • Sorter runtime sidebar simplified; Carousel relabeled as Classification Channel; tracked-piece detail page redesigned around a verdict band

Extractions + sweep

  • Camera-settings service out of cameras router (and out of sidebar)
  • Stepper HTTP calls moved into a service; sorting-profile reads collapsed; app-settings persistence behind a helper
  • Empty frontend helper wrappers trimmed; generated local-state DBs ignored

Base cutover (what opened this PR)

Runtime cutover (RT)

  • RT contracts, registry, event bus, perception pipeline, and runtime handle as the primary runtime path
  • Channel runtimes for C1/C2/C3/C4 plus downstream distribution and classifier wiring
  • Detection endpoints and registry callers ported off the legacy path; detection shim deleted
  • MJPEG pipeline removed — WebSocket is the only stream
  • Runtime startup restored, detection previews and current-frame testing fixed after restarts

Generic purge architecture

  • New PurgePort contract and PurgeStrategy abstraction
  • C2, C3, and C4 bound onto the generic strategy (was previously C4-only, ad-hoc)
  • Bootstrap coordinator wires strategies uniformly; C234 coordinator extracted into a maintenance service
  • Startup purge forces owned-sweep when no exit_track exists
  • Orchestrator caps upstream capacity by downstream headroom (true backpressure)
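The PurgePort/PurgeStrategy split above separates "how a channel physically expels pieces" from "when to purge". A rough sketch of that shape, with all class and method names hypothetical except the owned-sweep rule quoted from the bullet list:

```python
from abc import ABC, abstractmethod

class PurgePort(ABC):
    """Hardware-facing side: how a channel physically expels pieces.
    (Hypothetical shape; the real contract lives in the rt/ tree.)"""

    @abstractmethod
    def sweep(self) -> None: ...

class PurgeStrategy(ABC):
    """Policy side: decides when and how a channel purges."""

    @abstractmethod
    def purge(self, port: PurgePort, has_exit_track: bool) -> str: ...

class OwnedSweepOnStartup(PurgeStrategy):
    """Mirrors the rule 'startup purge forces owned-sweep when no
    exit_track exists'."""

    def purge(self, port: PurgePort, has_exit_track: bool) -> str:
        if not has_exit_track:
            port.sweep()
            return "owned-sweep"
        return "skip"
```

Binding C2, C3, and C4 onto one strategy type is what lets the bootstrap coordinator wire them uniformly instead of keeping the previous C4-only ad-hoc path.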

Tracked pieces rebuild

  • Single /api/tracked/pieces endpoint feeds list + detail + modal (no more parallel code paths)
  • SegmentRecorder persists per-track path points, channel geometry, and wedge crops to SQLite + disk
  • Detail page renders track path and wedge composites directly from DB
  • Wedges sized dynamically from bbox (radial + angular) and gated for non-overlap — no more fixed 15° sectors
  • List view preview shows the latest captured wedge crop, not the intake snapshot
  • 4-char base36 track labels — tracker overlay and tracked-pieces UI speak the same language
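A 4-char base36 label as used above can be produced like this. Illustrative only: the exact encoding (zero-padding, wrap-around) is an assumption, not taken from the branch.

```python
def track_label(global_id: int) -> str:
    """Render a numeric track id as a fixed-width 4-char base36 label,
    so tracker overlay and tracked-pieces UI speak the same language.
    (Sketch; the shipped scheme may differ in padding or wrap rules.)"""
    digits = "0123456789abcdefghijklmnopqrstuvwxyz"
    n = global_id % (36 ** 4)  # wrap so the label is always 4 chars
    out = ""
    for _ in range(4):
        out = digits[n % 36] + out
        n //= 36
    return out
```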

Service extractions + ports (principle: composition wires, services coordinate)

  • SorterLifecyclePort replaces hardware callable-globals
  • rt/projections package with piece_dossier subscriber
  • Detection config read/save, polygon save, runtime_stats access, camera annotation introspection → dedicated services
  • rt/hardware/channel_callables.py, rt/config/channels.py, perception runner builder → own modules
  • blob_manager forwarder layer dissolved

Dead-code sweep

  • vision_manager + controller_ref compat stubs and branches removed from cameras/steppers/training/sorting-profile/detection-config/camera_preview_hub
  • runtime_variables module + endpoint path removed (hidden config store violating principle 4)
  • Dead debug endpoints wired to legacy stubs deleted
  • Empty detection wrappers removed (and the anti-abstraction rule codified)
  • Dead imports, empty re-export __init__.py files, dead local variables (ruff F841) swept
  • Direct gc_ref.xxx field reads removed from routers

Bug fixes surfaced by live-debug workflow

  • Detector zone filter tightened, confirmed_real gate dropped in C2/C3 (enables ring visibility)
  • Per-channel polygon resolution and crop scaling fixed
  • Stepper-init lambda loop-var binding + storage-layer hardware save NameError + layer-sync setattr-with-constant

Docs

  • Lab index points at sorter-architecture-principles
  • docs/sorter/architecture.md no longer carries the stale pre-RT component map; it's a short pointer into the principles doc
  • Old runtime-rebuild design / current-state map / audit HTML removed (~4200 LOC gone)

Validation

  • rt test suite green (363 passed) with the new test_boxmot_reid_tracker + embedding-aware test_track_transit + C3 embedding forwarding test
  • Condition-teacher + hive-uploader test suites green
  • Hive upload round-trip test covering condition_sample payload passing

Live hardware run (2026-04-24 evening, ~5 min total)

metric                            value
pieces_seen                       501
classified                        216
unknown                           214
tracker_id_switch_suspect_total   0
distributed                       0 (distributor bottleneck — separate bug)
  • BoTSORT adapter loaded cleanly on all three feeds (tracker=botsort_reid); ReID weights auto-downloaded to blob/reid_models/osnet_x0_25_msmt17.pt on first backend boot
  • 512-d OSNet appearance embeddings confirmed flowing to each live Track via the extended /api/rt/tracks/{feed_id} surface
  • Identity check on 150 archived dossiers: every unique tracked_global_id resolved to exactly one classified part_id/color combo — the Police-vs-Slope class of identity swap did not reproduce. BoTSORT keeps the same gid across full carousel rotations of the same physical piece, which is the intended behaviour and matches operator expectations.
  • Side-finds from this run (committed in the same branch):
    • runtime-stats widget (bottom-right UI) was reporting empty state_machines after the RT cutover — nothing in the rt graph was calling RuntimeStatsCollector.observeStateTransition. Fixed by adding a state_observer callback on BaseRuntime and wiring it in bootstrap; widget now shows current state + transition counts + per-state time shares for c1/c2/c3/c4/distributor.
  • Addressed in follow-up commits on this branch:
    • Distributor backoff (82980aa0): HandoffPort gets an available_slots() probe, each dossier has a 250 ms retry cooldown after a busy rejection. C4 stops hammering the distributor at tick rate — the 2.6% acceptance rate was pure bookkeeping noise from handoff_request being called repeatedly while the distributor worked through a single piece.
    • Pulse-settle-pulse C4 transport profile (fde8c8c): new PROFILE_TRANSPORT_PULSED splits a transport move into sub-pulses with explicit settle pauses between them, so pieces on the carousel can re-grip between kicks instead of sliding under the abrupt stops. Opt-in via RT_C4_TRANSPORT_PROFILE=pulsed; sub-pulse size (RT_C4_TRANSPORT_SUB_PULSE_DEG, default 2°) and settle duration (RT_C4_TRANSPORT_SETTLE_MS, default 120 ms) are env-tunable so Marc can A/B without a rebuild. Default profile stays transport — no behaviour change unless opted in.
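The distributor backoff in 82980aa0 combines the `available_slots()` probe with a per-dossier 250 ms cooldown. A small sketch of that throttling idea, where `HandoffThrottle` and its method names are hypothetical:

```python
import time

RETRY_COOLDOWN_S = 0.250  # per-dossier cooldown after a busy rejection

class HandoffThrottle:
    """Stops C4 from hammering the distributor at tick rate: a dossier
    rejected as busy may not retry for RETRY_COOLDOWN_S, and requests
    are skipped entirely when the slots probe reports no headroom.
    (Sketch of the logic around HandoffPort.available_slots().)"""

    def __init__(self, now=time.monotonic):
        self._now = now
        self._next_try = {}  # dossier_id -> earliest retry timestamp

    def may_request(self, dossier_id, available_slots: int) -> bool:
        if available_slots <= 0:
            return False  # probe says downstream is full, don't even try
        return self._now() >= self._next_try.get(dossier_id, 0.0)

    def on_busy_rejection(self, dossier_id):
        self._next_try[dossier_id] = self._now() + RETRY_COOLDOWN_S
```

This explains why the earlier 2.6% acceptance rate was bookkeeping noise: without the cooldown, every tick re-issued a handoff_request for the same dossier while the distributor worked through one piece.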

Notes

  • Intentionally a large integration PR — the full branch delta from main (200+ commits)
  • Architectural principles are the durable handle for future iteration; this PR is the point where the runtime is architecturally coherent enough to keep shipping from

mneuhaus added 30 commits April 21, 2026 18:27
Wumms snapshot of ~65 files from ongoing throughput/UX iteration:

Backend:
- classification_channel cleanup (running/recognition/ejecting), ghost
  handling, ignored regions, tracker exemption for pending-drop
- local_state: ghost ignore memory + dossier groundwork
- piece_transport: motion-sync metrics, distribution positioning fix
- vision: overlay scaling for 4K feeds, stream overlay cleanups
  (helper IN/white-red DROP removed), camera_feed/service tweaks,
  tracking history + polar_tracker refinements
- server/api + detection router: known-objects endpoints, dossier
  read paths
- new: role_aliases, overlays/scaling, tests/test_distribution_positioning

Frontend:
- RecentObjects + TrackPathComposite + tracked/[uuid] restored from
  APFS snapshot, burst filmstrip, KnownObject-by-uuid fallback
- recent-pieces.ts extracted (dedup key logic)
- settings sections updates, CameraFeed crop rotation gate
- stores/sortingProfile + manager.svelte.ts wiring

gitignore:
- exclude stray tmp-*.png / dashboard-*.png / sorter-dashboard-*.png
  debug screenshots from repo root

Safety-net commit before the unified piece-dossier refactor (plan at
~/.claude/plans/lass-uns-gerade-einmal-wiggly-parrot.md). Intent is
recoverability, not clean history — the next commits split this work
by subsystem.
- new piece_segments table persisting per-segment polar track data,
  sector snapshots, and recognition results per physical piece
- indexes for (piece_uuid, sequence) and (session_id, role, first_seen_ts)
- helpers remember_piece_segment / list_piece_segments /
  clear_piece_segments_for_session
- additive only: existing piece_dossiers untouched
- phase 3 will wire this table from the vision manager

Phase 2 of unified piece-dossier plan.
- piece_transport.registerIncomingPiece now cascades tracked_global_id
  through active lookup, DB dossier lookup, explicit piece_uuid, and
  finally fresh uuid — preventing double-register on tracker glitches
- new resumeExistingPiece() rehydrates a dossier from SQLite into
  active pieces
- new KnownObject.from_dossier() classmethod reciprocal to event serialiser
- classification_channel running.py _recoverExistingTrackedPieces uses
  resumeExistingPiece on DB-hit; _registerNewIntakePiece guards against
  double-register at call-site too
- tests for idempotent register + dossier rehydration

Phase 1 of unified piece-dossier plan.
- blob_manager: piece_crops_dir + write_piece_crop, best-effort on OSError
- PieceHistoryBuffer: on_segment_archived callback slot
- vision_manager: wires archive callback -> resolves piece_uuid via
  piece_transport, creates stub dossier on miss, writes sector_snapshot
  wedge/piece JPEGs to blob/piece_crops/<uuid>/seg<n>/, then
  remember_piece_segment() with paths instead of b64
- api.py: GET /api/piece-crops/{uuid}/seg{n}/{kind}/{idx}.jpg
  FileResponse with path-traversal guard and long cache
- piece_transport: bindStubPieceUuid helper
- tests for roundtrip, stub-dossier creation, and error tolerance
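The path-traversal guard on the piece-crop endpoint can be illustrated like this. Assumptions flagged: `safe_crop_path` is a hypothetical helper name, and the real endpoint returns a FastAPI FileResponse rather than a bare path.

```python
from pathlib import Path

def safe_crop_path(base_dir: str, uuid: str, seg: int, kind: str, idx: int) -> Path:
    """Resolve a blob/piece_crops path and refuse anything that escapes
    base_dir after resolution. (Sketch of the guard idea only.)"""
    base = Path(base_dir).resolve()
    candidate = (base / uuid / f"seg{seg}" / kind / f"{idx}.jpg").resolve()
    # Path.is_relative_to (Python 3.9+) rejects ../ traversal post-resolve
    if not candidate.is_relative_to(base):
        raise ValueError("path traversal rejected")
    return candidate
```

Resolving before comparing is the important part: a uuid like `../../etc` normalizes out of the base directory and is rejected, while legitimate uuids stay inside it.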

Phase 3 of unified piece-dossier plan.
- _LiveTrack.piece_uuid assigned when C3 track reaches 4 stable hits
- stub dossier written to SQLite via remember_piece_dossier on first
  bind, so downstream lookups find the record from the moment the
  piece becomes reliably tracked
- PendingHandoff and handoff.register_track propagate piece_uuid
  across C3 -> Carousel transition
- TrackAngularExtent surfaces piece_uuid to classification channel
- running.py passes piece_uuid through registerIncomingPiece so the
  idempotent cascade (phase 1) reuses the same UUID on C4 intake
- tests for binding threshold, handoff preservation, and register
  idempotency with explicit piece_uuid

Phase 4 of unified piece-dossier plan.
- local_state.build_piece_detail_payload merges dossier + segments
- GET /api/tracked/pieces/{uuid} now reads DB first, falls back to
  runtime LRU, returns 404 only when truly unknown — kills the
  spurious "Track Not Found" on still-known pieces
- response adds track_detail.{segments, live} block; live tracker
  details merged when the piece is still active
- GET /api/tracked/pieces items carry has_track_segments flag
- bulk segment-count helper keeps list endpoint O(1) per row
- tests for persisted-after-restart, live-merge, and fallback paths

Phase 5 of unified piece-dossier plan.
- recent-pieces.ts key logic: piece_uuid first, tracked_global_id
  fallback, recentPhysicalKeyOrNull() returns null when neither is
  available (drop the item instead of inventing keys)
- RecentObjects + tracked/+page dedupe skip null-key items
- tracked/[uuid] reads track_detail.segments from /api/tracked/pieces
  response, only hits /api/feeder/tracking/history when
  track_detail.live === true
- TrackPathComposite prefers jpeg_path (-> /api/piece-crops/...) over
  jpeg_b64 for snapshot rendering
- removed "Track Not Found" label; replaced with neutral loading/error
  states that reflect the DB-first reality

Phase 6 of unified piece-dossier plan.
- cursor-pointer on camera tile buttons (USB + network) so hover
  signals clickability, disabled:cursor-not-allowed while saving
- reassign-confirm modal raised above camera picker so the secondary
  dialog actually surfaces instead of being stacked under
- camera stream auto-refreshes after assign by propagating
  feedRevision into the background feed component (key or query param)
- preview-unavailable fallback on tiles whose MJPEG stream never
  produces a frame (was showing bare broken-image alt)
- tile MJPEG status callback uses direct property assignment
  (tileStreamStatus[idx] = status) instead of Object.spread — the
  spread cloned the whole record on every frame for 6 parallel
  streams, saturating Svelte's reactivity graph and freezing the
  renderer before saveCameraRole's fetch could resolve
- close camera picker unconditionally after save rather than gating
  on cameraError; an error now shows as an inline alert, and the modal
  no longer stays open silently when the backend has in fact accepted
  the assignment
- new CameraPreviewHub broadcaster: one VideoCapture per device,
  fan-out to N subscribers via asyncio.Queue at ~10fps preview rate
- new /ws/camera-preview/{index} websocket endpoint — clients
  receive raw jpeg bytes per message
- refuses subscription when device is owned by vision_manager
  (primary role feeds), to avoid device-handle conflicts
- new wsJpegStream svelte action mirroring the mjpeg action API
  (status callback, reconnect, destroy)
- ZoneSection modal tiles switch to wsJpegStream; the main
  zone feed and dashboard feed remain on MJPEG
- legacy GET /api/cameras/stream/{index} kept as fallback

Eliminates the 6-connection-per-host HTTP/1.1 pool exhaustion
that froze the picker whenever saveCameraRole's POST couldn't
acquire a socket.
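The fan-out core of such a preview hub can be sketched with per-subscriber queues. A minimal illustration under assumptions: `PreviewHub` and its method names are hypothetical, and only the one-producer/N-subscriber/asyncio.Queue shape comes from the commit message.

```python
import asyncio

class PreviewHub:
    """One frame producer per device, fanned out to N subscribers via
    per-subscriber asyncio.Queue. Drop-oldest keeps slow clients from
    stalling the producer or accumulating latency. (Sketch only.)"""

    def __init__(self, maxsize: int = 2):
        self._subs: set = set()
        self._maxsize = maxsize

    def subscribe(self) -> asyncio.Queue:
        q = asyncio.Queue(maxsize=self._maxsize)
        self._subs.add(q)
        return q

    def unsubscribe(self, q: asyncio.Queue) -> None:
        self._subs.discard(q)

    def publish(self, jpeg: bytes) -> None:
        for q in self._subs:
            if q.full():           # drop the oldest frame for this client
                q.get_nowait()
            q.put_nowait(jpeg)
```

Each websocket handler then just awaits `q.get()` and sends the raw JPEG bytes, so one VideoCapture serves any number of preview tiles without per-tile device handles.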
Clean-slate before the dossier refactor left a 1 GB
local_state.sqlite.backup-<ts> file behind as a safety net.
Keep those around locally but don't commit them.
motion_confirmed was a sticky latch — once a track briefly displaced
past the 18 px birth threshold, the stagnant-false-track filter was
disarmed for that track forever. Apparatus ghosts that wiggled during
autoexposure settle at startup latched True and then blocked the
dropzone indefinitely: feeder pipeline stalled on wait_chX_dropzone_clear
with no self-recovery path.

Add a role-agnostic recent-cartesian-stationary helper and short-circuit
the motion_confirmed gate: when a latched track has stayed within
RECENT_STATIONARY_MAX_DISPLACEMENT_PX (6 px) over RECENT_STATIONARY_MIN_COVERAGE_S
(1.8 s) of a 2.5 s window, allow the stagnant filter to fire. Legit
pieces pausing briefly mid-travel (sub-second) are unaffected because
their recent window still contains the anchor from pre-pause motion.
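The recent-cartesian-stationary check above can be reconstructed roughly as follows, using the thresholds quoted in the commit (6 px, 1.8 s coverage, 2.5 s window); the function name and sample format are assumptions.

```python
RECENT_STATIONARY_MAX_DISPLACEMENT_PX = 6
RECENT_STATIONARY_MIN_COVERAGE_S = 1.8
RECENT_WINDOW_S = 2.5

def recently_stationary(samples, now):
    """samples: list of (timestamp, x, y) track observations.

    True when the last 2.5 s of observations cover at least 1.8 s and
    stay within a 6 px spread -- the condition that re-arms the stagnant
    filter for a latched track. (Illustrative reconstruction.)"""
    recent = [(t, x, y) for t, x, y in samples if now - t <= RECENT_WINDOW_S]
    if not recent:
        return False
    coverage = recent[-1][0] - recent[0][0]
    if coverage < RECENT_STATIONARY_MIN_COVERAGE_S:
        return False
    xs = [x for _, x, _ in recent]
    ys = [y for _, _, y in recent]
    spread = max(max(xs) - min(xs), max(ys) - min(ys))
    return spread <= RECENT_STATIONARY_MAX_DISPLACEMENT_PX
```

The coverage requirement is what protects legit pieces pausing sub-second mid-travel: their recent window still contains pre-pause motion, so the spread check fails and the stagnant filter stays disarmed.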
… dynamic

Feeder roles on hive:* or gemini_sam never consume the MOG2 detector
output, yet a FeederAnalysisThread was still running per channel at
33Hz with BGR->Lab conversion + MOG2 background subtraction. Gate the
thread at init-time and on runtime algorithm-switch so it only runs
when static mog2 is actually selected.
…achinery

Fundamentally inverts the feeder-tracker gate: a track is no longer "real
until proven ghost" — it's now "unknown until proven real". Every tracked
piece starts with ``confirmed_real=False`` and flips sticky-True only
after demonstrating monotonic polar-angular progress ≥ 5° or centroid
drift ≥ 40 px across a ≥ 6-sample window. Apparatus ghosts (screws,
reflections, guides) physically cannot clear that bar — their jitter is
fixed-position, not monotonic.

Dossier mint + segment archival gate on ``confirmed_real`` instead of
the Phase-7 displacement thresholds. Everything removed in a single
tombstone:

 * stagnant_false_track filter (all flavours: carousel polar, motion-
   confirmed recovery, pending-drop protection);
 * persistent_static_ghost_regions infrastructure (load/save/prune/
   suppress + role wiring + state_entries rows with a migration that
   purges any legacy rows on backend boot);
 * motion_confirmed latch + MIN_ANGULAR/PIECE_UUID_DISPLACEMENT gates
   in _maybe_bind_piece_uuid;
 * ANGULAR_SPAN < 3° motion-gate in _archive_segment_to_dossier_impl
   (DB-lookup-before-mint path stays).

_archive_segment_to_dossier_impl now consults the live tracker for
``confirmed_real``; if the originating track is already dead, falls
back to segment_sector_angular_span_deg ≥ 5° as a safe-but-lax check.
TrackedPiece exposes ``confirmed_real`` so downstream consumers can
filter on it in the follow-up propagation commit.
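The "unknown until proven real" gate can be sketched from the thresholds stated above (monotonic angular progress ≥ 5°, or centroid drift ≥ 40 px, across a ≥ 6-sample window); the function name and argument shapes are illustrative.

```python
MIN_ANGULAR_PROGRESS_DEG = 5.0
MIN_CENTROID_DRIFT_PX = 40.0
MIN_SAMPLES = 6

def evaluate_confirmed_real(angles_deg, centroids):
    """Confirm a track only after monotonic polar-angular progress of
    at least 5 degrees, or centroid drift of at least 40 px, over at
    least 6 samples. Fixed-position ghost jitter (screws, reflections,
    guides) cannot clear either bar. (Illustrative sketch.)"""
    if len(angles_deg) < MIN_SAMPLES:
        return False
    monotonic = all(b >= a for a, b in zip(angles_deg, angles_deg[1:]))
    angular_progress = angles_deg[-1] - angles_deg[0]
    if monotonic and angular_progress >= MIN_ANGULAR_PROGRESS_DEG:
        return True
    (x0, y0), (x1, y1) = centroids[0], centroids[-1]
    drift = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    return drift >= MIN_CENTROID_DRIFT_PX
```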
…e, handoff

Downstream consumers of the polar tracker now all gate on the new
``confirmed_real`` whitelist flag:

 * ``_channelDetectionsFromTracks`` filters tracks before they reach
   ``analyzeFeederChannels`` — this is the single chokepoint that drives
   ch2/ch3 dropzone occupancy, clump/massage decisions, and exit wiggle.
   Apparatus ghosts never populate that path anymore, so the feeder
   stops stalling behind them.
 * ``subsystems/feeder/feeding.py`` classification-channel admission:
   ``classification_channel_track_count`` counts only confirmed-real
   tracks (prevents a carousel apparatus ghost from pinning ch3).
 * ``subsystems/channels/c1_bulk.py`` c2 saturation: the bulk feeder's
   max-ch2-pieces cap counts only confirmed tracks (prevents a c2 guide
   ghost from halting bulk feed forever).
 * ``vision/overlays/tracker.py`` live overlay: unconfirmed tracks
   render in dim grey with no id chip; confirmed tracks keep their
   green/amber/magenta colour coding + ``#xxxx`` pill.

Handoff manager kept as-is. The existing ghost-reject gate on
``PendingHandoff.last_displacement_px`` already filters stationary
upstream pendings, and a real piece near C3-exit can handoff before it
has covered the full 5° whitelist arc — requiring confirmed_real at
notify_track_death would break that edge-case.
… driver

Two emergency back-pressure gates so the machine doesn't pile pieces onto
C4 while upstream filtering stabilises:

- admission.py: MAX_CLASSIFICATION_CHANNEL_DETECTION_CAP=3 enforces a
  hard admission block on C3 once raw YOLO sees 3+ bboxes on the carousel,
  independent of transport/zone state. Protects the pipeline when the
  whitelist hasn't confirmed tracks yet but C4 is physically full.

- ch2_separation.py: CH2_SEPARATION_ENABLED=False kill-switch on the
  slip-stick driver. The CW/CCW rocking was firing even for 2 pieces on
  C2 with no real cluster; hard-disable until the cluster gate is
  tightened.
Snapshot of where tonight's ghost-elimination rework left the repo.
**Pipeline is NOT working end-to-end.** Captured as WIP so we can reset
with a clear head tomorrow.

### What works

- Vulkan-wired ncnn inference (6f2c307) — YOLO off CPU.
- MOG2 thread skipped when detection algorithm is dynamic (9bbf3e8) —
  CPU drops from ~600% to ~17% in idle pipeline.
- Hard cap of 3 raw C4 detections blocks C3 from piling the carousel
  (a636f9d).
- Ch2 slip-stick separation driver hard-disabled via kill-switch flag
  (a636f9d).

### What is broken

- **Whitelist refactor (706a24d + a213794) is too lax.** The
  `_evaluate_confirmed_real` check uses full-path median(first-5) vs
  median(last-5) centroid drift, which grows monotonically for ghost
  tracks that live forever now that the stagnant-false-track filter
  was removed. In a 45 s window on a physically clean C4 the pipeline
  still mints ~27 confirmed dossiers and ~24 segments. Expected fix:
  rolling window (~1.5 s) + dormancy kill when a track shows no
  progress for ~3 s.
- **CPU regression to ~350 %.** MOG2 was not the only hog; whitelist
  adds per-frame path-evaluation overhead and trackers never die
  naturally now, so active-track count grows.
- **dist = 0, multi_drop_fail = 3 out of 5 pieces seen** — classify
  success 0 %. Either cluster arrivals at C4 or downstream classify
  pipeline stalled. Root cause not isolated tonight.
- Startup-purge (`classification_channel/idle.py`) did not fire on
  reboot — C4 had to be emptied by hand. Either the Idle state's
  `transport.activePieces()` early-skip catches too much, or the
  classification state machine was stuck in a previous idle snapshot.
  Needs diagnosis.
- Runtime-stats snapshot appeared stale for minutes at a time across
  restarts — `main_to_server_queue` + broadcast pipe may have wedged.

### Open workstreams for next session

1. Rolling-window + dormancy fix for `_evaluate_confirmed_real`
   (polar_tracker.py) so unconfirmed ghost tracks die and never
   confirm via cumulative drift.
2. Investigate CPU regression post-whitelist — profile which calls
   dominate (suspect: per-frame path comparisons + embedding EMA for
   ever-living tracks).
3. Classify 0 % success: trace one C4 piece through
   `classification_channel` state machine to see where crops / hive
   recognition drop out.
4. Feeder over-eager on C2: cluster-gate needs tightening, not just
   the slip-stick disable.
5. Runtime-stats staleness — audit the main-loop broadcast path.
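Workstream 1's rolling-window + dormancy idea could look roughly like this. Everything here is a hypothetical sketch of the *proposed* fix, not shipped code; only the ~1.5 s window, ~3 s dormancy, and the 40 px/6 px thresholds are taken from the notes above.

```python
ROLLING_WINDOW_S = 1.5
DORMANCY_KILL_S = 3.0

def track_verdict(samples, now):
    """Evaluate drift only over the last ~1.5 s so cumulative ghost
    jitter cannot add up to a confirmation, and kill tracks that show
    no progress for ~3 s. samples: list of (timestamp, x, y).
    Returns 'confirm', 'kill', or 'pending'. (Hypothetical sketch.)"""
    recent = [(t, x, y) for t, x, y in samples if now - t <= ROLLING_WINDOW_S]
    if len(recent) >= 2:
        (t0, x0, y0), (t1, x1, y1) = recent[0], recent[-1]
        drift = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    else:
        drift = 0.0
    if drift >= 40.0:          # same 40 px bar, but windowed
        return "confirm"
    old_enough = samples and now - samples[0][0] >= DORMANCY_KILL_S
    if old_enough and drift < 6.0:  # dormant: no recent progress at all
        return "kill"
    return "pending"
```

The key property versus the broken full-path median comparison: a ghost that lives forever accumulates zero drift inside any single 1.5 s window, so it never confirms and eventually hits the dormancy kill.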

### How to reset tomorrow

- `git log --oneline sorthive ^main` lists all ghost-elimination
  commits. Rollback candidates: 706a24d, a213794 (whitelist core +
  propagation). The Vulkan + MOG2 + admission-cap + ch2-disable fixes
  (6f2c307, 9bbf3e8, a636f9d) are keepers.
- Full DB + blob purge + supervisor restart is already scripted
  inline in this session's notes.
Documentation baseline for the runtime rearchitecture on sorthive.
All runtime work after this commit targets the new architecture:
  - runtime-architecture.html           (canonical visual vision)
  - docs/lab/runtime-rebuild-design.md  (engineering companion)

The LEGACY map (runtime-current-state-map.md) captures the pre-rebuild state
as migration reference only.

Added:
  docs/lab/runtime-current-state-map.md  — IST-state, LEGACY-banner, file:line cites
  docs/lab/runtime-rebuild-design.md     — 4-layer x 5-column design, contracts,
                                           7-step C4<->Distributor handshake,
                                           strategy plugins, 6-phase migration

Changed:
  docs/lab/software-architecture-decisions/index.md  — reference the two canonical docs

Removed:
  docs/lab/software-architecture-decisions/vision-camera-runtime-refactor.md
    (superseded; broader rebuild replaces narrower VisionManager-split proposal)
  REVIEW-2026-04-16.md (obsolete one-off)
First phase of the runtime rearchitecture per docs/lab/runtime-rebuild-design.md.
Contracts-only; no algorithm ports, no runtime implementations.

Added under software/sorter/backend/rt/:

  contracts/  Feed, Zone (Rect/Polygon/Polar), Detector, Tracker, Filter,
              FilterChain, Classifier, Calibration, Runtime (ABC),
              AdmissionStrategy, EjectionTimingStrategy, RulesEngine,
              EventBus + StrategyRegistry with 8 register_* decorators
              (detector/tracker/filter/classifier/calibration/admission/
              ejection_timing/rules_engine)
  config/     Pydantic v2 schema (SorterConfig with feeds/pipelines/
              runtimes/classification/distribution) + TOML+SQLite loader
              with deep-merge override
  events/     InProcessEventBus — bounded queue (maxsize=2048), dispatcher
              thread, drop-oldest on overflow, fnmatch glob topics,
              drain() for synchronous tests. 9 typed topic constants
  context.py  RuntimeContext DI container (replaces shared_state.py globals)
  __init__.py build_runtime() stub — NotImplementedError until phase 2+
  tests/      8 tests (registry round-trip, config validation, event bus
              subscribe/publish/glob/unsubscribe); all green under
              uv run pytest rt/tests/

Placeholder __init__.py for later phases:
  perception/ (detectors, trackers, filters, classifiers, calibration)
  runtimes/ (+ _states/)  rules/  classification/  coupling/
  hardware/  irl/

1243 LoC total; every file under 250 LoC. No imports from or edits to
the existing backend/ tree.
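The event-bus semantics listed above (bounded queue, drop-oldest on overflow, fnmatch glob topics, drain() for synchronous tests) can be miniaturized like this. `MiniEventBus` is an illustrative stand-in, not the rt/ code; the real InProcessEventBus additionally runs a dispatcher thread.

```python
import fnmatch
from collections import deque

class MiniEventBus:
    """Synchronous miniature of the described InProcessEventBus:
    bounded queue with drop-oldest on overflow and fnmatch glob topic
    matching. drain() plays the role of the synchronous-test hook."""

    def __init__(self, maxsize=2048):
        self._queue = deque(maxlen=maxsize)  # deque drops oldest when full
        self._subs = []                      # list of (pattern, callback)

    def subscribe(self, pattern, callback):
        self._subs.append((pattern, callback))

    def publish(self, topic, payload):
        self._queue.append((topic, payload))

    def drain(self):
        """Deliver all queued events to matching subscribers."""
        while self._queue:
            topic, payload = self._queue.popleft()
            for pattern, cb in self._subs:
                if fnmatch.fnmatch(topic, pattern):
                    cb(topic, payload)
```

Glob subscription (`"perception.*"`) is what lets later phases add typed topics without touching existing subscribers, and drop-oldest keeps a stalled consumer from back-pressuring the perception loop.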
…+ filters)

First perception stack ported into the rt/ contract frame. Phase 2a of 6,
self-contained in rt/perception/ — no main.py integration, no shadow mode,
no touching of the old backend/ tree.

Added under software/sorter/backend/rt/perception/:

  feeds.py          CameraFeed adapter — pulls frames from legacy
                    backend.vision.camera_service (explicit temporary bridge
                    until rt/hardware/ lands). Monotonic frame_seq per feed
  zones.py          build_zone(ZoneConfig) factory — Rect/Polygon/Polar
  detectors/mog2.py Mog2Detector port (registered "mog2"). Rect+Polygon
                    masks ok; Polar raises NotImplementedError stub
  trackers/polar.py PolarTracker port (registered "polar"). Polar Kalman +
                    Hungarian matching with cartesian fallback. Whitelist
                    confirmation (angular >=5 deg OR centroid drift >=40px)
  filters/size.py   SizeFilter (registered "size")
  filters/ghost.py  GhostFilter (registered "ghost") — pulls confirmed_real
                    gating out of the tracker into an explicit filter step
  pipeline.py       PerceptionPipeline (detect -> track -> filter) +
                    build_pipeline_from_config factory that wires
                    PolarZone geometry into the polar tracker
  pipeline_runner.py PerceptionRunner — daemon thread per feed, periodic
                    pipeline execution, duplicate frame_seq skip,
                    error-threshold circuit-breaker, EventBus publish of
                    PERCEPTION_TRACKS + HARDWARE_ERROR

Subpackage __init__.py files updated with side-effect imports so
`import rt.perception` registers all strategies.

Tests (21 new, 29 total green):
  test_mog2_detector.py    synthetic frames + MOG2 warmup + detection
  test_polar_tracker.py    polar + cartesian fallback, confirmation gating
  test_filters.py          size passes/blocks, ghost filters unconfirmed
  test_pipeline.py         end-to-end detect -> track -> filter
  test_pipeline_runner.py  start/stop lifecycle, duplicate-seq skip,
                           circuit-breaker on repeated failures

Verified: uv run pytest rt/tests/ -v  -> 29 passed, 0 failed, 0.96s.

Known limits (all explicit in code):
  - PolarZone masking in Mog2Detector is a NotImplementedError stub
  - PolarTracker does not carry OSNet appearance embeddings or history
    buffer (scope-excluded; those land with handoff in a later phase)
  - global_id == track_id for now; PieceHandoffManager is separate

Phase 2b hooks ready: PerceptionRunner lifecycle handles, EventBus
publishing, non-blocking latest_tracks() accessor, build_pipeline_from_config
factory.
Live-hardware finding: the C4 transport move is configured for up to
24 000 µsteps/s but only reaches ~3% of that in practice — 53 µsteps
per pulse is far too short to clear the acceleration ramp. The motor
sits in a short triangular profile with an abrupt stop at the end,
and pieces on the carousel slide because the stop's inertia outpaces
friction.

New PROFILE_TRANSPORT_PULSED splits a single transport_move into
sub-pulses of configurable size with explicit settle pauses in
between. Each sub-pulse is still a small triangular move, but the
pause lets the piece re-grip the carousel before the next kick.
Matches the manual pattern Marc observed working better during hand
testing.

Opt in via RT_C4_TRANSPORT_PROFILE=pulsed; sub-pulse size and settle
duration are env-tunable (RT_C4_TRANSPORT_SUB_PULSE_DEG,
RT_C4_TRANSPORT_SETTLE_MS) so we can A/B without a rebuild. Default
profile stays "transport" — no behaviour change unless opted in.
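The sub-pulse planning described above can be sketched as a pure function using the documented defaults (2° sub-pulses, 120 ms settle); `plan_pulsed_move` is an illustrative name, not the shipped profile code.

```python
RT_C4_TRANSPORT_SUB_PULSE_DEG = 2.0   # default sub-pulse size (env-tunable)
RT_C4_TRANSPORT_SETTLE_MS = 120       # default settle pause (env-tunable)

def plan_pulsed_move(total_deg,
                     sub_pulse_deg=RT_C4_TRANSPORT_SUB_PULSE_DEG,
                     settle_ms=RT_C4_TRANSPORT_SETTLE_MS):
    """Split one transport move into (move_deg, settle_ms) steps so the
    piece can re-grip the carousel between kicks. The final step settles
    for 0 ms since the move is complete. (Illustrative planner.)"""
    steps = []
    remaining = total_deg
    while remaining > 1e-9:
        pulse = min(sub_pulse_deg, remaining)
        remaining -= pulse
        steps.append((pulse, settle_ms if remaining > 1e-9 else 0))
    return steps
```

Each sub-pulse is still a short triangular profile, but the interleaved settle pauses replace the single abrupt stop whose inertia was outpacing friction.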
Marc reported the Recent Pieces list was "wuseln" (German: milling about) —
pieces flickering in and out, reordering on nearly every frame. Measured
during live sorting:
19 enters / 18 leaves / constant reorders over 15 s. After this change,
the same 15 s window shows 0 / 0 / 0.

Two separate instability sources addressed:

- **Reorder source.** `RecentObjects.upcoming` was sorted by
  `exitDistanceDeg` — the piece's current angle relative to the drop
  point. Every small angle update from the tracker swapped adjacent
  rows. Switched to a stable FIFO sort on
  `first_carousel_seen_ts`: newest at top, oldest (next to drop) at
  the bottom right above the distributed divider. Order changes only
  when pieces actually enter or leave.

- **Enter/leave source.** `MachineManager.handleKnownObject` ran each
  incoming event through `shouldKeepRecentObject` (== the display
  filter) and **removed** existing entries that no longer matched.
  A piece rotating past the drop zone flipped
  `classification_channel_zone_state` from `active` to `superseded`
  and got evicted from storage. The upcoming `$derived` then emitted
  "leave" + "enter" when it cycled back. Fixed by always
  updating-in-place on subsequent events: the display filter decides
  what to render, the storage layer just keeps the history.
  First-insertion eviction still uses the filter.
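A language-neutral Python sketch of the two fixes (the widget itself is Svelte; apart from `first_carousel_seen_ts` and `global_id`, all names here are illustrative):

```python
def upcoming_order(pieces):
    """Stable FIFO ordering: sort on the admission timestamp, not the
    live exit distance, so the order only changes when a piece actually
    enters or leaves. Newest first."""
    return sorted(pieces, key=lambda p: p["first_carousel_seen_ts"], reverse=True)


def handle_known_object(storage, display_filter, event):
    """Storage keeps the history; the display filter gates only the
    *first* insertion (and rendering elsewhere), never eviction of
    entries the storage already holds."""
    gid = event["global_id"]
    if gid in storage:
        storage[gid].update(event)   # always update in place
    elif display_filter(event):
        storage[gid] = dict(event)   # filter applies to first insertion only
```

A piece whose zone state flips from `active` to `superseded` now stays in storage (merely hidden by the render-side filter), so it no longer produces a "leave" + "enter" pair when it cycles back.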

Also dropped the 15 s "same-gid recently distributed" dedup in the
upcoming list — it was a workaround for the pre-BoTSORT tracker
splitting one physical piece across many global_ids, and now just
hides active pieces for 15 s whenever their gid happens to match a
just-distributed entry.

Live audit: the widget was showing nonsense because every dossier
admission on C4 bumped the counters, and with BoTSORT keeping a
stable tracked_global_id across carousel rotations the same physical
piece can produce dozens of dossiers. Live sample before fix:
pieces_seen=1092, classified=797, distributed=248 — against roughly
17 actual physical pieces in the carousel. Reading "87 ppm feed
rate" when only 17 pieces have ever been fed is actively misleading.

Backend: RuntimeStatsCollector.snapshot() now also reports
- unique_pieces_seen
- unique_pieces_classified
- unique_pieces_distributed
by folding ``tracked_global_id`` into sets. Dossier-level counters
(pieces_seen / classified / distributed) stay for the Totals footer so
the attempt-level numbers remain visible.
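A minimal sketch of the set-folding; only `tracked_global_id` semantics and the snapshot keys come from the change above, the method names are invented for illustration.

```python
class UniquePieceCounters:
    """Fold tracked_global_id into sets so a re-circulating piece on C4
    counts once, while the dossier-level counter keeps every attempt
    for the Totals footer."""

    def __init__(self):
        self._seen, self._classified, self._distributed = set(), set(), set()
        self._dossiers_seen = 0

    def record_admission(self, gid):
        self._dossiers_seen += 1  # attempt-level: every dossier counts
        self._seen.add(gid)       # unique: the set deduplicates

    def record_classified(self, gid):
        self._classified.add(gid)

    def record_distributed(self, gid):
        self._distributed.add(gid)

    def snapshot(self):
        return {
            "pieces_seen": self._dossiers_seen,
            "unique_pieces_seen": len(self._seen),
            "unique_pieces_classified": len(self._classified),
            "unique_pieces_distributed": len(self._distributed),
        }
```

The same gid admitted three times yields `pieces_seen=3` but `unique_pieces_seen=1`, which is exactly the gap between 1092 dossiers and ~17 physical pieces.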

Widget:
- Feed rate uses unique_pieces_seen / running_time_s.
- Distributed / min uses unique_pieces_distributed / running_time_s
  instead of the dossier-based throughput.overall_ppm.
- Multi-drop denominator is finished classifications, not pieces_seen
  (pieces_seen was inflated by admissions of pieces still on C4).
- New "Unique pieces" tile: "4 of 17 seen" — the honest answer to
  "how many pieces has the sorter actually handled?".
- Dropped the broken "C4 active ppm" tile.
  ``channel_throughput.classification_channel.active_ppm`` is
  permanently None because
  ``observeChannelExit`` was never wired into the rt graph after the
  cutover; fixing that is a separate task.
- Fixed a dead-code lookup on ``outcomes.classified?.active_ppm``
  (the real key is ``classified_success``).

Totals footer still shows the raw dossier counts so operators can see
the inflation if they want to debug it, but the headline tiles no
longer double-count re-circulating pieces.

Follow-up to the unique-piece-count fix. The widget was still showing
nonsense rates right after a backend restart: unique_pieces_seen is
cumulative across sessions (backed by the piece-dossier DB) but
running_time_s resets to zero each start, so
``cumulative_count * 60 / running_time_s`` inflated wildly for the
first few minutes and then slowly decayed.

Backend:
- ``throughput.recent_ppm`` — pieces distributed per minute observed
  over a rolling 5-minute window, computed from the actual wall-clock
  span of the ``distributed_at`` timestamps. Session-independent.
- ``throughput.feed_recent_ppm`` — same idea keyed off the
  ``first_carousel_seen_ts`` of each unique tracked_global_id, so the
  feed rate reflects the real cadence at which new physical pieces
  enter C4 rather than a ratio against a just-reset clock.
- Snapshot lazily syncs ``_is_running`` from the rt_handle's
  ``started`` / ``paused`` flags so running_time_s ticks even when
  the command-queue ``setLifecycleState`` path races or drops the
  event (observed live: rt reported not-paused while the collector
  still claimed "initializing").

Widget: ``dist_rate_ppm`` and ``feed_rate_ppm`` prefer the new
recent-window values, falling through to the cumulative-over-time
numbers only when no recent events exist.
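A sketch of both mechanisms, assuming only the shapes described above (lists of `distributed_at` / `first_carousel_seen_ts` timestamps, an rt handle with `started` / `paused` flags); class and function names are illustrative, and using event intervals over the observed span is one reasonable way to compute the windowed rate.

```python
import time

def recent_ppm(event_ts, window_s=300.0, now=None):
    """Rolling-window rate: events per minute over the last `window_s`
    seconds, derived from the wall-clock span of the retained
    timestamps, so it is independent of when the session started."""
    now = time.time() if now is None else now
    recent = sorted(t for t in event_ts if now - t <= window_s)
    if len(recent) < 2:
        return None  # not enough recent events for a rate
    span = recent[-1] - recent[0]
    return (len(recent) - 1) * 60.0 / span if span > 0 else None


class RunningTimeSketch:
    """Lazily re-derive running-ness from the rt handle's own flags so
    running_time_s keeps ticking even when a setLifecycleState command
    races or drops."""

    def __init__(self):
        self._is_running = False
        self._since = None
        self._accum = 0.0

    def _sync(self, running, now):
        if running and not self._is_running:
            self._since = now                 # started (or resumed) now
        elif not running and self._is_running:
            self._accum += now - self._since  # bank the finished run span
        self._is_running = running

    def running_time_s(self, rt_handle, now=None):
        now = time.time() if now is None else now
        self._sync(rt_handle.started and not rt_handle.paused, now)
        return self._accum + (now - self._since if self._is_running else 0.0)
```

The widget fallback then reads naturally: use `recent_ppm(...)` when it returns a value, otherwise fall through to the cumulative count divided by running time.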

Now that BoTSORT+ReID is the production primary tracker on every feed,
painting both the primary boxes (solid green) and the shadow boxes
(dashed magenta) on the same stream is just visual noise. Operators
need the live boxes from the tracker that's actually making sorting
decisions — the shadow comparison is debug info that belongs on
``/api/rt/status`` (still wired there) or in offline benchmark runs,
not on the camera feed.

RuntimeAnnotationProvider returns only RuntimeTrackOverlay +
RuntimeGhostOverlay by default. The shadow overlay class stays in the
codebase so ad-hoc debugging can re-add it, just not on the default
provider chain. Tests updated to match.

Live observation: with BoTSORT as the production tracker, tracks with
hit_count well into the thousands were sitting at confirmed_real=False
because the rotation-window verdict only fires when c2/c3/c4 happens
to publish a PERCEPTION_ROTATION while the tracker has accumulated 6+
samples in that window. The verdict gate was a useful ghost filter
when motion-only trackers regularly birthed false positives on
apparatus pixels — with appearance-aware association it just hides
the live tracker output operators want to see.

RuntimeTrackOverlay now draws the ID label for every non-ghost track.
Box colour still reflects the verdict (green = confirmed, dim grey =
pending) so the visual signal is preserved, but the ID is always
visible. Ghosts continue to be filtered out and rendered separately
by RuntimeGhostOverlay.
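A sketch of the verdict-to-style rule described above; the field names and RGB values are assumptions, only the confirmed/pending/ghost split comes from the text.

```python
def track_overlay_style(track):
    """Every non-ghost track gets its ID label; only the box colour
    encodes the rotation-window verdict."""
    if track.get("is_ghost"):
        return None  # ghosts are rendered separately by the ghost overlay
    confirmed = track.get("confirmed_real", False)
    colour = (0, 200, 0) if confirmed else (128, 128, 128)  # green vs dim grey
    return {"label": f"#{track['track_id']}", "colour": colour}
```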