Runtime Integration Guide¶

This guide defines the runtime contract for end-to-end integration of:

New detection models
New pose models
New identity/individual-analysis methods (classifiers, embeddings, contrastive features, tag readers)

Design Goal¶

All compute-heavy methods must be controlled by one canonical runtime setting:

compute_runtime

No feature should require users to configure a separate runtime selector.

Source of Truth¶

Runtime support and translation logic are centralized in:

src/hydra_suite/core/runtime/compute_runtime.py
src/hydra_suite/utils/gpu_utils.py

Core public helpers:

CANONICAL_RUNTIMES
allowed_runtimes_for_pipelines(...)
infer_compute_runtime_from_legacy(...)
derive_detection_runtime_settings(...)
derive_pose_runtime_settings(...)

Canonical Runtime Values¶

cpu
mps
cuda
rocm
onnx_cpu
onnx_cuda
onnx_rocm
tensorrt

Integration Checklist (Required)¶

1) Define a pipeline key¶

Add a stable pipeline name and use it in runtime gating.

Current examples:

yolo_obb_detection
yolo_pose
sleap_pose

For future additions, use names like:

appearance_embedding
contrastive_embedding
apriltag_classifier
colortag_classifier

2) Add capability rules¶

Update _pipeline_supports_runtime(...) in compute_runtime.py so the new pipeline explicitly defines supported runtimes.

Rules must be strict:

If unsupported, return False.
Do not silently remap unsupported runtime to a different backend.

3) Add runtime translation¶

If the pipeline consumes legacy backend knobs, add mapping from compute_runtime to backend settings.

Examples already used:

Detection: yolo_device, enable_onnx_runtime, enable_tensorrt
Pose: pose_runtime_flavor, pose_sleap_device

4) Wire UI intersection gating¶

Ensure the UI includes the new pipeline in the runtime context set.

MAT pattern:

Gather enabled pipeline set.
Call allowed_runtimes_for_pipelines(...).
Populate runtime dropdown from the intersection.

PoseKit uses the same pattern for active prediction backend scope.

5) Implement runtime lifecycle¶

If the integration has long-lived resources (service/subprocess/session), lifecycle must be run-scoped:

Initialize once per run.
Warmup once.
Close on complete/error/cancel.

Use existing runtime manager/service patterns where possible.

6) Export artifacts automatically¶

If ONNX/TensorRT export is needed:

Generate artifacts automatically.
Store artifacts adjacent to model paths.
Save runtime metadata signature for freshness checks.
Never require a manual export path for normal operation.

7) Keep cache keys runtime-correct¶

Any cached output that depends on runtime/model/export shape must include those inputs in cache identity.

For new features:

Include model fingerprint and runtime flavor in cache signatures.
Include feature-specific shaping params (for example max instances, embedding dimension, preprocessing mode).

8) Lock controls during compute¶

UI controls that could invalidate active runtime sessions must be disabled while jobs are running.

This prevents mid-run backend switches and thread crashes.

9) Add tests (minimum bar)¶

Add/extend tests for:

Capability matrix and intersection gating.
Runtime translation determinism from compute_runtime.
Migration from legacy config values.
Lifecycle correctness (startup/teardown on success and failure).
Artifact auto-export + freshness behavior.
Failure fallback behavior with explicit logging.

End-to-End Acceptance Criteria¶

A new model/method integration is complete only when:

It appears in runtime gating with explicit support rules.
It runs from canonical compute_runtime without extra runtime selectors.
Its ONNX/TensorRT artifacts are auto-managed (if applicable).
Caches remain valid and runtime/model-aware.
MAT and PoseKit behavior is consistent where the feature exists.

Common Anti-Patterns (Do Not Add)¶

Hidden runtime remapping (onnx_* requested, CPU used without notice).
Feature-specific runtime dropdowns when global runtime is available.
Manual exported-model-path requirements for standard workflows.
Runtime checks scattered across GUI/business logic without shared resolver usage.