Tracking Algorithm Deep Dive¶

This page documents the tracking implementation as it exists in the codebase today.

Primary modules:

hydra_suite.core.tracking.worker
hydra_suite.core.filters.kalman
hydra_suite.core.assigners.hungarian
hydra_suite.core.post.processing
hydra_suite.core.identity.runtime_api

Design Goals¶

The current implementation optimizes for four things at the same time:

stable online tracking during long videos,
reproducible reruns through detection caching,
conservative identity handling in difficult scenes,
and practical support for richer cues such as pose-derived direction.

The design is therefore not a single elegant algorithm block. It is a staged pipeline with explicit checkpoints.

Runtime Topology¶

TrackingWorker is the top-level orchestrator. It owns the run lifecycle, frame iteration, detector construction, Kalman manager, assignment logic, visualization, caching, pose precompute, and final signal emission.

The main control branches are:

forward live detection,
forward cached replay,
backward cached replay,
preview mode,
and YOLO two-phase mode with batched prepass.

Data Contracts¶

Track state¶

Each Kalman slot stores a state vector:

state = [x, y, theta, vx, vy]

with:

x, y: center position in resized-frame coordinates,
theta: orientation in radians,
vx, vy: velocity components in pixels per frame.

Measurement state¶

Online measurement updates operate in:

measurement = [x, y, theta]

Auxiliary detection attributes¶

Association also consumes parallel per-detection arrays:

area,
aspect ratio,
confidence,
OBB corners,
DetectionID,
pose-derived heading,
normalized pose keypoints,
pose visibility,
and crop quality heuristics.

Detection Layer¶

Background subtraction path¶

The background-subtraction branch:

converts the frame to grayscale,
applies brightness, contrast, gamma, and optional lighting stabilization,
updates the background model,
generates a foreground mask,
applies ROI masking,
optionally applies conservative split morphology,
and extracts measurement candidates.

This branch is lightweight and self-contained, but it depends strongly on scene stability.

YOLO OBB path¶

The YOLO path:

runs object detection on the BGR frame,
preserves raw detections,
generates deterministic DetectionID values from absolute frame index,
applies filtering after inference,
and optionally writes raw detections to cache for later replay.
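One way to make a DetectionID deterministic is to derive it purely from the absolute frame index and the detection's position within that frame. The sketch below illustrates that idea only; the stride constant and both function names are hypothetical and are not taken from the codebase.

```python
# Hypothetical upper bound on detections per frame; not a value from the codebase.
DETECTION_STRIDE = 10_000


def make_detection_id(frame_index, detection_index, stride=DETECTION_STRIDE):
    """Combine absolute frame index and per-frame detection index into one integer."""
    if not 0 <= detection_index < stride:
        raise ValueError("detection_index must be in [0, stride)")
    return frame_index * stride + detection_index


def split_detection_id(detection_id, stride=DETECTION_STRIDE):
    """Recover (frame_index, detection_index) from an ID built above."""
    return divmod(detection_id, stride)


example_id = make_detection_id(frame_index=12, detection_index=3)
```

Because the ID depends only on the frame index and detection order, a cached replay of the same frame range reproduces the same IDs without re-running inference.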

Applying filtering after inference, rather than inside it, is important because it decouples:

expensive inference,
ROI and size filtering,
downstream pose extraction,
and tracking experiments.

Detection cache semantics¶

The detection cache is central to the current architecture.

It is used for:

backward tracking,
replay without re-running inference,
pose-property precompute,
and reproducible tuning.

The worker validates that the cache:

exists,
matches the requested frame range,
and has a compatible format.

Backward mode refuses to run without a compatible cache.
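The three checks above can be sketched as a single validation function. This is an illustration under assumptions: the metadata keys (`start_frame`, `end_frame`, `format_version`), the exact-match rule for the frame range, and the function name are all hypothetical.

```python
def cache_problems(cache_meta, start_frame, end_frame, supported_versions):
    """Return a list of reasons the cache cannot be used (empty list means usable)."""
    if cache_meta is None:
        return ["cache does not exist"]
    problems = []
    cached_range = (cache_meta.get("start_frame"), cache_meta.get("end_frame"))
    if cached_range != (start_frame, end_frame):
        problems.append("frame range mismatch")
    if cache_meta.get("format_version") not in supported_versions:
        problems.append("incompatible format")
    return problems


# Hypothetical metadata for a cache covering frames 0..999 in format version 2.
meta = {"start_frame": 0, "end_frame": 999, "format_version": 2}
ok = cache_problems(meta, 0, 999, supported_versions={2})
bad = cache_problems(meta, 0, 499, supported_versions={3})
```

A backward run would proceed only when the returned list is empty.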

Kalman Filter Implementation¶

The filter manager is vectorized across all target slots.

State transition¶

The transition matrix is:

[1 0 0 damp 0   ]
[0 1 0 0    damp]
[0 0 1 0    0   ]
[0 0 0 damp 0   ]
[0 0 0 0    damp]

with damp = KALMAN_DAMPING.

That means:

position is advanced using damped velocity,
velocity persists but decays,
and orientation is modeled as a carried state rather than a velocity-driven state.
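As a minimal sketch of the prediction step this matrix implies (plain Python for a single track, not the vectorized code in hydra_suite.core.filters.kalman; the function names and the damping value are assumptions for illustration):

```python
def transition_matrix(damp):
    """Build the 5x5 transition matrix for state [x, y, theta, vx, vy]."""
    return [
        [1.0, 0.0, 0.0, damp, 0.0],
        [0.0, 1.0, 0.0, 0.0, damp],
        [0.0, 0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, damp, 0.0],
        [0.0, 0.0, 0.0, 0.0, damp],
    ]


def predict_state(state, damp):
    """Apply one prediction step: new_state = F @ state."""
    F = transition_matrix(damp)
    return [sum(f * s for f, s in zip(row, state)) for row in F]


# Hypothetical track at (100, 50), heading 0.3 rad, moving 4 px/frame along x.
state = [100.0, 50.0, 0.3, 4.0, 0.0]
predicted = predict_state(state, damp=0.5)
```

With damp = 0.5 the position advances by half the stored velocity, the velocity itself halves, and theta is carried over unchanged.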

Process noise¶

The process-noise model is anisotropic. Longitudinal and lateral velocity noise are rotated into the current heading frame before they are injected into the covariance.

Operationally, this means the tracker can assume:

more uncertainty along the direction of travel,
less uncertainty sideways,
and a more biologically plausible motion envelope than isotropic noise would provide.
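The rotation of longitudinal and lateral noise into the heading frame can be sketched for the 2x2 velocity block as Q = R diag(sigma_long^2, sigma_lat^2) R^T, where R is the rotation by theta. The function and parameter names below are assumptions, and the sketch covers only the velocity block rather than the full covariance.

```python
import math


def velocity_process_noise(theta, sigma_long, sigma_lat):
    """Return the 2x2 velocity noise block in image axes for a given heading.

    sigma_long acts along the heading, sigma_lat acts perpendicular to it.
    """
    c, s = math.cos(theta), math.sin(theta)
    q_long, q_lat = sigma_long ** 2, sigma_lat ** 2
    q_xx = c * c * q_long + s * s * q_lat
    q_yy = s * s * q_long + c * c * q_lat
    q_xy = c * s * (q_long - q_lat)
    return [[q_xx, q_xy], [q_xy, q_yy]]


# Heading along +x: the larger variance lands on the vx entry.
q_along_x = velocity_process_noise(0.0, sigma_long=2.0, sigma_lat=1.0)
# Heading along +y: the larger variance moves to the vy entry.
q_along_y = velocity_process_noise(math.pi / 2, sigma_long=2.0, sigma_lat=1.0)
```

When sigma_long equals sigma_lat the off-diagonal term vanishes and the block reduces to ordinary isotropic noise.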

Correction¶

Measurement correction uses:

circular innovation logic for theta,
Joseph-form covariance update for numerical stability,
and a post-correction speed clamp based on REFERENCE_BODY_SIZE.
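Two of these pieces are easy to illustrate in isolation: the wrapped theta innovation and the speed clamp. The sketch below is not the filter's code; the function names are assumptions, and the way the speed limit is derived from REFERENCE_BODY_SIZE (a hypothetical multiplier) is illustrative only. The Joseph-form covariance update is not shown.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def theta_innovation(measured_theta, predicted_theta):
    """Shortest signed angular difference between measurement and prediction."""
    return wrap_angle(measured_theta - predicted_theta)


def clamp_speed(vx, vy, max_speed):
    """Scale the velocity vector down so its magnitude does not exceed max_speed."""
    speed = math.hypot(vx, vy)
    if speed <= max_speed or speed == 0.0:
        return vx, vy
    scale = max_speed / speed
    return vx * scale, vy * scale


# A measurement just past 0 against a prediction just below 2*pi is a small step.
innovation = theta_innovation(0.1, 2.0 * math.pi - 0.1)

# Hypothetical limit: half a body length per frame.
reference_body_size = 10.0
clamped = clamp_speed(6.0, 8.0, max_speed=0.5 * reference_body_size)
```

Without the wrap, the same measurement would produce an innovation of roughly -6.08 rad and drag theta the long way around the circle.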

Age-dependent motion restraint¶

Young tracks are intentionally conservative. Before KALMAN_MATURITY_AGE, velocity is attenuated toward zero according to KALMAN_INITIAL_VELOCITY_RETENTION.

This reduces the chance that a newly initialized slot immediately predicts itself into a nearby wrong animal.
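The exact attenuation schedule is not described here, so the following sketch assumes a linear ramp from KALMAN_INITIAL_VELOCITY_RETENTION at age zero up to full retention at KALMAN_MATURITY_AGE. The ramp shape and the function name are assumptions.

```python
def velocity_retention(age, maturity_age, initial_retention):
    """Fraction of velocity kept for a track of the given age (in frames).

    Assumed shape: linear ramp from initial_retention (age 0) to 1.0 (maturity).
    """
    if maturity_age <= 0 or age >= maturity_age:
        return 1.0
    fraction = max(age, 0) / maturity_age
    return initial_retention + (1.0 - initial_retention) * fraction


# Hypothetical settings: tracks mature after 10 frames and start at 20% retention.
new_track = velocity_retention(0, maturity_age=10, initial_retention=0.2)
mature_track = velocity_retention(10, maturity_age=10, initial_retention=0.2)
```

Multiplying vx and vy by this factor after each update keeps a freshly spawned slot nearly stationary until it has accumulated some history.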

Orientation Handling¶

The orientation logic deserves separate attention because it affects both assignment and visualization.

Axis ambiguity collapse¶

OBB orientation is treated as a body axis unless pose provides a directed heading. The worker compares theta and theta + pi against the last reliable orientation and chooses whichever is closer.

This avoids unhelpful 180-degree oscillations.
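A minimal sketch of the collapse rule, assuming angles in radians and hypothetical function names:

```python
import math

TWO_PI = 2.0 * math.pi


def angular_distance(a, b):
    """Unsigned shortest distance between two directed angles, in [0, pi]."""
    d = (a - b) % TWO_PI
    return min(d, TWO_PI - d)


def collapse_axis(theta, last_theta):
    """Pick theta or theta + pi, whichever is closer to the last reliable orientation."""
    candidates = (theta % TWO_PI, (theta + math.pi) % TWO_PI)
    return min(candidates, key=lambda c: angular_distance(c, last_theta))


# The detector reports 0.1 rad, but the track was last facing about 3.2 rad,
# so the flipped candidate (0.1 + pi) is the consistent choice.
resolved = collapse_axis(0.1, last_theta=3.2)
```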

Pose-derived heading¶

If pose extraction is enabled and both configured keypoint groups are visible enough:

the worker computes a directed posterior-to-anterior heading,
normalizes it into [0, 2*pi),
and marks the detection as directed.

If pose does not provide a valid directed heading, the fallback path returns to axis-based orientation collapse.
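The directed heading can be sketched as the angle of the vector from the posterior group to the anterior group. The sketch assumes each keypoint group has already been reduced to one (x, y) point and one visibility score; the function name and the 0.5 visibility threshold are hypothetical.

```python
import math


def pose_heading(posterior_xy, anterior_xy, posterior_vis, anterior_vis,
                 min_visibility=0.5):
    """Return a directed heading in [0, 2*pi), or None if pose is not usable."""
    if posterior_vis < min_visibility or anterior_vis < min_visibility:
        return None  # caller falls back to axis-based orientation collapse
    dx = anterior_xy[0] - posterior_xy[0]
    dy = anterior_xy[1] - posterior_xy[1]
    if dx == 0.0 and dy == 0.0:
        return None  # degenerate pose carries no direction
    return math.atan2(dy, dx) % (2.0 * math.pi)


facing_right = pose_heading((0.0, 0.0), (1.0, 0.0), 0.9, 0.9)
facing_up_in_image = pose_heading((0.0, 0.0), (0.0, -1.0), 0.9, 0.9)
occluded = pose_heading((0.0, 0.0), (1.0, 0.0), 0.9, 0.1)
```

A None result is the signal to use the body-axis path instead, so the detection is never marked as directed on weak evidence.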

Motion-conditioned smoothing¶

After correction, orientation is further smoothed:

low-speed tracks keep their previous orientation unless the measured change is small enough to accept,
and high-speed tracks can flip by 180 degrees if the motion direction indicates the heading is reversed.

This is a pragmatic mix of body-axis and motion-direction reasoning.
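The two rules can be sketched as one function. Every threshold here (the speed boundary, the accepted low-speed change, and the 90-degree reversal test) is an assumption chosen for illustration, as are the function names.

```python
import math

TWO_PI = 2.0 * math.pi


def angular_distance(a, b):
    """Unsigned shortest distance between two directed angles, in [0, pi]."""
    d = (a - b) % TWO_PI
    return min(d, TWO_PI - d)


def smooth_orientation(theta, last_theta, vx, vy,
                       speed_threshold, max_low_speed_change):
    """Apply motion-conditioned smoothing to a corrected orientation."""
    speed = math.hypot(vx, vy)
    if speed < speed_threshold:
        # Slow track: accept only small changes, otherwise keep history.
        if angular_distance(theta, last_theta) <= max_low_speed_change:
            return theta
        return last_theta
    # Fast track: flip if the heading points against the direction of motion.
    motion_direction = math.atan2(vy, vx) % TWO_PI
    if angular_distance(theta, motion_direction) > math.pi / 2.0:
        return (theta + math.pi) % TWO_PI
    return theta


slow_small = smooth_orientation(0.2, 0.1, 0.0, 0.0, 1.0, 0.5)
slow_large = smooth_orientation(2.0, 0.1, 0.0, 0.0, 1.0, 0.5)
fast_reversed = smooth_orientation(math.pi, math.pi, 5.0, 0.0, 1.0, 0.5)
```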

Association Stack¶

The cost matrix is implemented in TrackAssigner.

Baseline cost¶

The base cost per track-detection pair is:

cost =
  W_POSITION * position_distance
  + W_ORIENTATION * orientation_difference
  + W_AREA * area_difference
  + W_ASPECT * aspect_difference

Position distance can be either:

Euclidean distance, or
Mahalanobis distance using the predicted innovation covariance.

Orientation difference is circular and respects the directed-vs-axis distinction.
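A sketch of the weighted sum with Euclidean position distance follows. It is not the TrackAssigner code: the dictionary layout, the use of plain absolute differences for area and aspect, and the function names are assumptions. The directed flag shows how the axis case folds away the 180-degree ambiguity.

```python
import math


def orientation_difference(track_theta, det_theta, directed):
    """Circular difference: in [0, pi] if directed, in [0, pi/2] for a body axis."""
    d = abs(track_theta - det_theta) % (2.0 * math.pi)
    d = min(d, 2.0 * math.pi - d)
    if not directed:
        d = min(d, math.pi - d)  # theta and theta + pi describe the same axis
    return d


def pair_cost(track, det, weights, directed=False):
    """Weighted sum of position, orientation, area, and aspect differences."""
    position = math.hypot(track["x"] - det["x"], track["y"] - det["y"])
    orientation = orientation_difference(track["theta"], det["theta"], directed)
    area = abs(track["area"] - det["area"])
    aspect = abs(track["aspect"] - det["aspect"])
    return (weights["position"] * position
            + weights["orientation"] * orientation
            + weights["area"] * area
            + weights["aspect"] * aspect)


track = {"x": 0.0, "y": 0.0, "theta": 0.0, "area": 100.0, "aspect": 2.0}
det = {"x": 3.0, "y": 4.0, "theta": math.pi, "area": 110.0, "aspect": 2.5}
weights = {"position": 1.0, "orientation": 1.0, "area": 0.1, "aspect": 2.0}

axis_cost = pair_cost(track, det, weights, directed=False)
directed_cost = pair_cost(track, det, weights, directed=True)
```

In this example the detection is rotated by exactly pi, so the axis-based cost carries no orientation penalty while the directed cost pays the full pi.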

Stage-1 candidate gate¶

When advanced association data is available, a coarse candidate gate runs before full cost scoring. It rejects pairs that exceed a local motion envelope derived from:

global culling threshold,
per-track uncertainty,
per-track average step size,
maximum allowed area ratio,
and maximum allowed aspect-ratio change.

This keeps the expensive stage focused on plausible candidates.

Pose rejection¶

If normalized pose keypoints are available, the assigner can compute a paired pose distance between:

the current detection,
and the track's stored pose prototype.

If visibility is high enough and the pose distance exceeds POSE_REJECTION_THRESHOLD, the candidate is vetoed even if motion looks acceptable.

This is a strong identity-protection mechanism when pose is reliable.
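The veto logic can be sketched as a paired distance over keypoints that are visible on both sides, followed by a threshold test. The mean-Euclidean metric, the visibility cutoffs, and the function names are assumptions; only POSE_REJECTION_THRESHOLD is named in this page.

```python
import math


def paired_pose_distance(det_kpts, proto_kpts, det_vis, proto_vis, min_vis=0.5):
    """Mean distance over keypoints visible in both poses.

    Returns (distance, shared_fraction); distance is None if nothing is shared.
    """
    dists = [
        math.dist(d, p)
        for d, p, dv, pv in zip(det_kpts, proto_kpts, det_vis, proto_vis)
        if dv >= min_vis and pv >= min_vis
    ]
    if not dists:
        return None, 0.0
    return sum(dists) / len(dists), len(dists) / len(det_kpts)


def pose_veto(distance, shared_fraction, rejection_threshold,
              min_shared_fraction=0.5):
    """Veto only when pose is reliable enough and clearly disagrees."""
    if distance is None or shared_fraction < min_shared_fraction:
        return False
    return distance > rejection_threshold


detection = [(0.0, 0.0), (1.0, 0.0)]
prototype = [(0.0, 0.0), (1.0, 1.0)]
distance, shared = paired_pose_distance(detection, prototype, [1.0, 1.0], [1.0, 1.0])
vetoed = pose_veto(distance, shared, rejection_threshold=0.3)
```

Returning False when too few keypoints are shared keeps an unreliable pose from blocking a match that motion alone supports.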

Assignment phases¶

Assignment is not a single one-shot matching step. It runs in phases.

Established tracks¶

Tracks with tracking_continuity >= CONTINUITY_THRESHOLD are handled first.

Options:

Hungarian global assignment,
or greedy assignment when throughput matters more.

Unstable tracks¶

Lower-continuity tracks are filled greedily from the remaining detections.
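Greedy filling can be sketched as "take the cheapest remaining pair first". The function name and the `max_cost` gate are assumptions, and the cost matrix is a plain list of rows (tracks) by columns (detections).

```python
def greedy_assign(cost, max_cost):
    """Assign tracks to detections by repeatedly taking the cheapest free pair.

    Returns a dict mapping track index to detection index.
    """
    pairs = sorted(
        (c, t, d)
        for t, row in enumerate(cost)
        for d, c in enumerate(row)
        if c <= max_cost
    )
    matches, used_detections = {}, set()
    for c, t, d in pairs:
        if t in matches or d in used_detections:
            continue
        matches[t] = d
        used_detections.add(d)
    return matches


cost = [
    [1.0, 5.0],
    [2.0, 9.0],
]
matches = greedy_assign(cost, max_cost=6.0)
```

In this example greedy matching takes the pair (track 0, detection 0) first and then leaves track 1 unmatched, whereas a global assignment under the same gate would pair track 0 with detection 1 and track 1 with detection 0 for a total cost of 7.0. That trade-off is why the established-track phase offers both options.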

Lost-track respawn¶

Free detections can be assigned to lost slots only if they lie at least MIN_RESPAWN_DISTANCE away from the predictions of all non-lost tracks.

This is a controlled reuse policy, not a free-for-all.
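The distance check can be sketched in a few lines; the function name and argument layout are hypothetical.

```python
import math


def can_respawn(detection_xy, active_predictions, min_respawn_distance):
    """True if the detection is far enough from every non-lost track prediction."""
    return all(
        math.dist(detection_xy, predicted_xy) >= min_respawn_distance
        for predicted_xy in active_predictions
    )


active = [(10.0, 10.0), (50.0, 50.0)]
near_existing = can_respawn((12.0, 10.0), active, min_respawn_distance=5.0)
in_open_space = can_respawn((100.0, 100.0), active, min_respawn_distance=5.0)
```

A detection that sits next to a live prediction is more likely a duplicate of that animal than a reappearance of a lost one, so it is not allowed to claim a lost slot.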

Track Memory Beyond Kalman State¶

The worker keeps more than just the Kalman filter state.

Per-track memory includes:

orientation_last,
last_shape_info,
track_pose_prototypes,
track_avg_step,
continuity count,
missed-frame count,
local CSV row count,
and recent positions for speed estimation.

This extra memory is what lets the tracker remain practical in crowded scenes without inflating the Kalman state unnecessarily.

Pose-Enhanced Tracking Path¶

Pose extraction is not only a downstream analysis feature. In the current implementation it also feeds back into tracking.

Precompute stage¶

If pose extraction is enabled in YOLO OBB mode, the worker can precompute pose properties from cached detections before online tracking begins.

That precompute produces a deterministic individual-properties cache keyed by:

video,
frame range,
detection hash,
filter settings hash,
and extractor hash.
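A deterministic key over these five components could be built by hashing a canonical serialization. The sketch uses SHA-256 over sorted-key JSON; the field names, the choice of hash, and the function name are assumptions rather than the cache's actual format.

```python
import hashlib
import json


def properties_cache_key(video_id, start_frame, end_frame,
                         detection_hash, filter_hash, extractor_hash):
    """Hash the five key components into one stable hex digest."""
    payload = json.dumps(
        {
            "video": video_id,
            "frame_range": [start_frame, end_frame],
            "detection_hash": detection_hash,
            "filter_hash": filter_hash,
            "extractor_hash": extractor_hash,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key_a = properties_cache_key("clip01.mp4", 0, 999, "d1", "f1", "e1")
key_b = properties_cache_key("clip01.mp4", 0, 999, "d1", "f1", "e2")
```

Changing any single component, such as the extractor hash above, yields a different key, so stale pose properties are never silently reused after a settings change.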

Runtime use¶

During frame processing the worker can recover, per detection:

pose keypoints,
pose visibility,
normalized pose prototype,
and pose-derived heading.

Those values then influence:

orientation override,
association vetoing,
track prototype updates,
relinking,
and optional export.

Forward and Backward Passes¶

Backward mode is not a special detector. It is a special playback mode over the same cached detections.

Important properties:

it reuses the requested frame range,
it can skip frame reads entirely for speed,
it writes a second trajectory hypothesis,
and orientation handling includes a backward fallback correction for non-pose-directed cases.

The purpose is not to generate a prettier trajectory. The purpose is to offer a second causal interpretation for later consensus.

Post-Processing Pipeline¶

The main post-processing functions are in hydra_suite.core.post.processing.

Cleaning and break detection¶

The cleaning stage can:

remove short fragments,
split on excessive absolute velocity,
split on abnormal velocity z-scores,
and split across long occlusion runs.

These breakers exist because online assignment is allowed to be imperfect if later consistency checks can safely cut bad sections.
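A simplified sketch of three of these breakers (absolute velocity, long gaps, and short-fragment removal) is shown below. It assumes a trajectory is a list of (frame, x, y) tuples with strictly increasing frame numbers, treats a gap between observations as the occlusion run, and omits the z-score breaker. All names and thresholds are hypothetical.

```python
import math


def split_trajectory(points, max_velocity, max_gap, min_length):
    """Cut a trajectory at implausible jumps or long gaps, then drop short pieces."""
    segments, current = [], []
    for point in points:
        if current:
            previous = current[-1]
            gap = point[0] - previous[0]  # frames elapsed, assumed >= 1
            speed = math.dist(point[1:], previous[1:]) / gap
            if gap > max_gap or speed > max_velocity:
                segments.append(current)
                current = []
        current.append(point)
    if current:
        segments.append(current)
    return [segment for segment in segments if len(segment) >= min_length]


points = [
    (0, 0.0, 0.0), (1, 1.0, 0.0), (2, 2.0, 0.0),  # smooth motion
    (3, 50.0, 0.0), (4, 51.0, 0.0),               # 48 px jump in one frame
    (10, 52.0, 0.0),                              # isolated point after a 6-frame gap
]
segments = split_trajectory(points, max_velocity=5.0, max_gap=3, min_length=2)
```

The jump at frame 3 starts a new segment, and the single point at frame 10 is discarded as a fragment that is too short to keep.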

Conservative forward/backward resolution¶

resolve_trajectories does not blindly average two passes.

It:

converts trajectory inputs into DataFrames,
removes trivially bad fragments,
finds forward/backward overlap candidates,
keeps only pairs with enough agreeing frames,
performs conservative segment-level merging,
removes redundant fragments,
merges overlapping agreeing fragments,
and stitches nearby fragments across short gaps.

The result is intentionally fragmentation-tolerant and identity-protective.
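The "enough agreeing frames" test can be sketched by comparing positions on the frames both passes cover. The sketch assumes each trajectory is a dict from frame number to (x, y); the function names and the `min_agreeing_frames` parameter are hypothetical, while the distance tolerance corresponds to AGREEMENT_DISTANCE.

```python
import math


def agreeing_frames(forward, backward, agreement_distance):
    """Frames present in both passes whose positions lie within the tolerance."""
    shared = sorted(forward.keys() & backward.keys())
    return [
        frame for frame in shared
        if math.dist(forward[frame], backward[frame]) <= agreement_distance
    ]


def is_merge_candidate(forward, backward, agreement_distance, min_agreeing_frames):
    """Keep a forward/backward pair only if enough frames agree."""
    agreed = agreeing_frames(forward, backward, agreement_distance)
    return len(agreed) >= min_agreeing_frames


forward = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (3.0, 0.0)}
backward = {1: (1.0, 0.5), 2: (2.0, 0.5), 3: (3.0, 9.0), 4: (4.0, 0.0)}
agreed = agreeing_frames(forward, backward, agreement_distance=1.0)
```

Frame 3 is shared but disagrees by 9 px, so only frames 1 and 2 count toward the merge decision.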

Motion-and-pose relinking¶

relink_trajectories_with_pose summarizes each fragment by:

start and end frames,
start and end position,
start and end heading,
short-window velocity,
and optional pose prototypes at both ends.

Candidate relinks are accepted only if:

gap length is within MAX_OCCLUSION_GAP,
predicted motion reaches the next fragment within an allowed distance,
heading is compatible when motion is informative,
and pose distance is below RELINK_POSE_MAX_DISTANCE when both sides have usable pose.
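The four acceptance conditions can be sketched as one predicate over two fragment summaries. The dictionary keys, the constant-velocity prediction, the `min_speed` test for "motion is informative", and the mean-keypoint pose metric are all assumptions made for illustration.

```python
import math


def can_relink(earlier, later, max_gap, max_distance, max_heading_diff,
               min_speed, pose_max_distance):
    """Decide whether `later` may be appended to `earlier`."""
    gap = later["start_frame"] - earlier["end_frame"]
    if gap <= 0 or gap > max_gap:
        return False
    vx, vy = earlier["velocity"]
    predicted = (earlier["end_xy"][0] + vx * gap, earlier["end_xy"][1] + vy * gap)
    if math.dist(predicted, later["start_xy"]) > max_distance:
        return False
    if math.hypot(vx, vy) >= min_speed:  # heading is only checked when moving
        diff = abs(earlier["end_heading"] - later["start_heading"]) % (2.0 * math.pi)
        if min(diff, 2.0 * math.pi - diff) > max_heading_diff:
            return False
    pose_a, pose_b = earlier.get("end_pose"), later.get("start_pose")
    if pose_a is not None and pose_b is not None:
        pose_dist = sum(math.dist(a, b) for a, b in zip(pose_a, pose_b)) / len(pose_a)
        if pose_dist > pose_max_distance:
            return False
    return True


earlier = {"end_frame": 10, "end_xy": (0.0, 0.0), "velocity": (2.0, 0.0),
           "end_heading": 0.0}
later = {"start_frame": 13, "start_xy": (6.5, 0.0), "start_heading": 0.1}
limits = dict(max_distance=2.0, max_heading_diff=0.5, min_speed=1.0,
              pose_max_distance=0.3)
accepted = can_relink(earlier, later, max_gap=5, **limits)
too_long = can_relink(earlier, later, max_gap=2, **limits)
```

Each condition can only reject, so a relink is accepted when nothing contradicts it rather than when one cue strongly supports it.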

Interpolation¶

interpolate_trajectories reindexes each trajectory onto a complete frame range, fills missing state labels as occluded, and interpolates:

X,
Y,
and Theta.

Theta interpolation uses circular logic, which prevents wrap-around artifacts near 0 and 2*pi.
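A minimal sketch of circular interpolation between two headings, with a hypothetical function name: the difference is wrapped to the shortest signed arc before it is scaled.

```python
import math

TWO_PI = 2.0 * math.pi


def interpolate_theta(theta_start, theta_end, fraction):
    """Interpolate along the shortest arc; result is reduced modulo 2*pi."""
    delta = (theta_end - theta_start + math.pi) % TWO_PI - math.pi
    return (theta_start + fraction * delta) % TWO_PI


# Headings just below 2*pi and just above 0 are only 0.2 rad apart.
midpoint = interpolate_theta(TWO_PI - 0.1, 0.1, 0.5)
quarter = interpolate_theta(0.0, 1.0, 0.25)
```

Plain linear interpolation of the same two values would give a midpoint of pi, which points the animal in the opposite direction for the duration of the gap.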

Performance Model¶

The implementation includes several explicit performance levers:

batched YOLO prepass,
detection cache reuse,
optional frame prefetching,
optional KD-tree candidate pruning,
optional greedy assignment,
Numba kernels for cost computation and post-processing inner loops,
and visualization-free cached replay.

For larger target counts, enabling the spatial optimization (KD-tree candidate pruning) is usually worth its overhead.

Parameter Surfaces That Matter Most¶

| Parameter | Role in current implementation | Failure if too small | Failure if too large |
| --- | --- | --- | --- |
| `REFERENCE_BODY_SIZE` | scales motion, velocity, and geometric heuristics | tracker becomes too tight | tracker becomes overly permissive |
| `MAX_DISTANCE_THRESHOLD` | hard assignment acceptance ceiling | fragmentation | swaps and duplicates |
| `LOST_THRESHOLD_FRAMES` | occlusion tolerance | premature track death | stale tracks survive too long |
| `KALMAN_DAMPING` | motion persistence | jerky short-term prediction | overshoot after stops |
| `POSE_REJECTION_THRESHOLD` | pose veto strictness | true matches rejected | pose provides little protection |
| `AGREEMENT_DISTANCE` | forward/backward merge tolerance | under-merged outputs | over-merged outputs |
| `MAX_OCCLUSION_GAP` | relinking and splitting window | missed recoveries | speculative reconnects |

Source-of-Truth Files¶

If you need to audit behavior, these files are the primary reference points:

src/hydra_suite/core/tracking/worker.py
src/hydra_suite/core/filters/kalman.py
src/hydra_suite/core/assigners/hungarian.py
src/hydra_suite/core/post/processing.py
src/hydra_suite/core/identity/runtime_api.py
tests/test_tracking_pipeline_synthetic.py
tests/test_post_tracklet_relinking.py