ML Problem Framing

1) Understanding the Business Objective
Objective: deliver safer, more reliable driver-assistance for cars and trucks by increasing perception robustness and reducing intervention events—without exploding cloud costs or iteration time.
Stakeholder alignment:
Product: safety KPIs first, low-latency UX (real-time alerts, <100–150 ms p95 from sensor ingest to decision).
Operations/Program: predictable rollouts, gated by safety thresholds and audits.
Data/ML: fast iteration loops, reproducible experiments, clear failure mining for edge cases.
MLOps/Infra: scalable pipelines, cost controls, observability, and security-by-default.
Compliance/Legal: privacy, traceability, explainability commensurate with safety review.
Explainability baseline: agree that perception models must support post-hoc explanations for incident review (e.g., SHAP for tabular/CAN features, saliency/Grad-CAM for vision). Define reference baselines (e.g., median-frame conditions) to anchor explanations and audits.
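A minimal sketch of the vision-side explanation path, assuming a PyTorch classification-style head; the model interface and target class index are illustrative, not the production explainability tooling:

```python
# Gradient-saliency sketch for incident review (assumes a PyTorch vision model
# that returns per-class logits; model and target_class are placeholders).
import torch

def saliency_map(model: torch.nn.Module, frame: torch.Tensor, target_class: int) -> torch.Tensor:
    """Return a per-pixel |d score / d input| heatmap for one frame of shape (C, H, W)."""
    model.eval()
    x = frame.unsqueeze(0).requires_grad_(True)   # add batch dim, track gradients
    score = model(x)[0, target_class]             # scalar logit for the class under review
    score.backward()                              # gradients w.r.t. the input pixels
    return x.grad.abs().max(dim=1)[0].squeeze(0)  # collapse channels -> (H, W) heatmap
```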
2) Is Machine Learning the Right Approach? (Use-Case Evaluation)
Why ML here: perception across cameras/lidar/radar is a high-dimensional pattern recognition problem; rules alone are brittle. Failure modes evolve (weather, construction, vehicle styles), requiring continual learning.
When ML wins (our case):
Complex, non-linear patterns (multi-sensor fusion).
Scale (millions of frames; long operational horizon).
Evolving environment (drift) → retraining required.
Guardrails & baselines:
Start with robust non-ML heuristics for sanity checks (e.g., speed gates, sensor drop detection).
Gate ML predictions with safety predicates and shadow mode before activation.
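A minimal sketch of the gating idea, assuming per-frame sensor-health and speed signals are available; the predicates, thresholds, and logging hook are illustrative, not the production safety spec:

```python
# Illustrative safety gate: heuristics can veto or "shadow" an ML alert before
# it reaches the driver. Field names and thresholds are placeholders.
from dataclasses import dataclass

@dataclass
class FrameContext:
    ego_speed_mps: float     # from CAN
    camera_ok: bool          # sensor health checks
    radar_ok: bool
    ml_alert: bool           # raw model output
    ml_confidence: float

def log_shadow_decision(ctx: FrameContext, decision: bool) -> None:
    # Placeholder: in practice this would go to fleet telemetry.
    print(f"[shadow] alert={decision} conf={ctx.ml_confidence:.2f}")

def gated_alert(ctx: FrameContext, shadow_mode: bool = True) -> bool:
    """Return True only if the ML alert passes all safety predicates."""
    sensors_healthy = ctx.camera_ok and ctx.radar_ok
    speed_gate = ctx.ego_speed_mps > 2.0          # ignore near-standstill noise
    confident = ctx.ml_confidence >= 0.7
    decision = ctx.ml_alert and sensors_healthy and speed_gate and confident
    if shadow_mode:
        # Log the would-be decision for offline comparison; never act on it.
        log_shadow_decision(ctx, decision)
        return False
    return decision
```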
Data Engine requirement: design a closed loop to collect → curate → label → train → deploy → monitor → retrain, with fleet triggers and targeted data mining to shorten time-to-improvement.
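One way the fleet-trigger side of that loop could be expressed; the signal names and thresholds are assumptions for illustration:

```python
# Sketch of fleet-trigger rules that decide which drive clips enter the
# curate -> label -> train loop. Signal names and thresholds are illustrative.
from typing import Callable, Dict

Signals = Dict[str, float]   # per-clip aggregated signals from the vehicle

TRIGGERS: Dict[str, Callable[[Signals], bool]] = {
    "harsh_brake":       lambda s: s.get("max_decel_mps2", 0.0) > 4.0,
    "driver_takeover":   lambda s: s.get("takeover_count", 0.0) >= 1,
    "low_ml_confidence": lambda s: s.get("min_confidence", 1.0) < 0.4,
    "sensor_dropout":    lambda s: s.get("dropped_frames", 0.0) > 10,
}

def fired_triggers(signals: Signals) -> list[str]:
    """Return the names of all triggers that fire for one clip."""
    return [name for name, rule in TRIGGERS.items() if rule(signals)]

# Example: this clip would be queued for curation and (auto-)labeling.
print(fired_triggers({"max_decel_mps2": 5.1, "min_confidence": 0.35}))
# -> ['harsh_brake', 'low_ml_confidence']
```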
3) Defining the ML Problem
Product ideal outcome: fewer risky situations and fewer driver interventions at the same or better comfort level.
Model goals (decomposed):
Perception (primary): multi-task detection & segmentation (vehicles, pedestrians, cyclists, lanes, drivable area), depth/BEV occupancy, and tracking.
Event understanding (secondary): cut-ins, harsh braking ahead, lane closures, stationary hazards.
Confidence & uncertainty: calibrated scores feeding planners/alerts.
Inputs (multi-modal):
Camera (multi-view video), Radar/Lidar (where available), IMU/GNSS/CAN.
Context features: weather/time, road class, speed limits (if available).
Outputs & task types:
Object detection/instance segmentation (multi-class, multiple instances per frame).
BEV occupancy/semantic map (dense prediction).
Time-to-collision / proximity risk (regression), event flags (classification).
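A compact sketch of how these outputs could be typed at the perception/planning boundary; the field names and shapes are assumptions, not a finalized interface:

```python
# Illustrative output schema for the multi-task perception stack.
# Field names and shapes are assumptions, not a finalized interface.
from dataclasses import dataclass, field
import numpy as np

@dataclass
class PerceptionOutput:
    boxes: np.ndarray           # (N, 4) detected boxes in image or BEV coordinates
    classes: np.ndarray         # (N,) class ids (vehicle, pedestrian, cyclist, ...)
    scores: np.ndarray          # (N,) calibrated confidences in [0, 1]
    instance_masks: np.ndarray  # (N, H, W) instance segmentation masks
    bev_occupancy: np.ndarray   # (H_bev, W_bev) dense occupancy / semantic map
    ttc_seconds: np.ndarray     # (N,) time-to-collision regression per object
    event_flags: dict = field(default_factory=dict)  # e.g. {"cut_in": True}
```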
Key issues to address:
Long-tail & class imbalance: rare but high-impact scenarios (night rain, occlusions, construction workers directing traffic).
Domain shift/drift: seasonal/weather/geographic variation.
Label scarcity: use auto-labeling, weak supervision, and targeted human QA.
Throughput & latency: train at scale; serve under tight budgets and on constrained edge targets where applicable.
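Because long-tail progress is judged on named scenario slices, a minimal per-slice recall sketch follows; the slice tags and sample format are illustrative:

```python
# Per-slice recall sketch for long-tail monitoring. Each eval sample carries
# scenario tags (e.g. "night", "rain"); tag names are illustrative.
from collections import defaultdict

def per_slice_recall(samples: list[dict]) -> dict[str, float]:
    """samples: [{"tags": ["night", "rain"], "tp": 3, "fn": 1}, ...]"""
    tp, fn = defaultdict(int), defaultdict(int)
    for s in samples:
        for tag in s["tags"]:
            tp[tag] += s["tp"]
            fn[tag] += s["fn"]
    return {tag: tp[tag] / (tp[tag] + fn[tag]) for tag in tp if tp[tag] + fn[tag] > 0}

print(per_slice_recall([
    {"tags": ["night", "rain"], "tp": 8, "fn": 4},
    {"tags": ["night"], "tp": 9, "fn": 1},
]))  # -> {'night': 0.77..., 'rain': 0.66...}
```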
4) Assessing Feasibility & Risks
Data readiness
Raw sensor coverage is sufficient, but rare events are under-represented → plan trigger-based mining and similarity search.
Labeling cost is high → use a hybrid strategy (auto-labeling plus human verification on hard slices).
Technical constraints
Latency: real-time perception budget (<30–60 ms per frame for core heads; overall pipeline <100–150 ms p95).
Memory/compute: fit models to deployment targets (quantization/distillation where needed).
Robustness: enforce sensor health checks; degrade gracefully to fewer modalities if a sensor drops.
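A small sketch of the graceful-degradation rule: select the fusion path from whichever modalities pass health checks. The modality sets and fallback order are assumptions:

```python
# Graceful degradation sketch: pick a fusion mode from healthy sensors only.
# The modality sets and fallback order are illustrative assumptions.

FALLBACK_ORDER = [
    ({"camera", "lidar", "radar"}, "full_fusion"),
    ({"camera", "radar"},          "camera_radar"),
    ({"camera"},                   "camera_only_degraded"),
]

def select_mode(healthy: set[str]) -> str:
    for required, mode in FALLBACK_ORDER:
        if required <= healthy:         # all required sensors are healthy
            return mode
    return "safe_stop_request"          # no viable perception path

print(select_mode({"camera", "radar"}))  # -> 'camera_radar'
print(select_mode({"radar"}))            # -> 'safe_stop_request'
```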
Operational risk
Safety gating (shadow mode, A/B rollouts at small traffic percentages, fast rollback).
Strict experiment traceability (datasets, code, hyperparams) with Weights & Biases (WandB) for runs/artifacts, and Git/DVC (or equivalent) for data/model lineage.
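A minimal Weights & Biases logging sketch for run/artifact traceability; the project name, config values, and file path are placeholders:

```python
# Minimal experiment-traceability sketch with Weights & Biases.
# Project name, config values, and file paths are placeholders.
import wandb

run = wandb.init(
    project="adas-perception",
    config={"backbone": "resnet50", "lr": 1e-4, "dataset_version": "v3-hard-slices"},
    tags=["shadow-candidate"],
)

for epoch in range(3):
    wandb.log({"epoch": epoch, "val/mAP_hard": 0.41 + 0.01 * epoch})

# Version the evaluation set used for this run so results stay reproducible.
artifact = wandb.Artifact("hard-slice-eval", type="dataset")
artifact.add_file("eval/night_rain_pedestrians.json")  # placeholder path
run.log_artifact(artifact)
run.finish()
```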
Cost & ROI
Control training/inference cost with spot/managed schedules, data pruning, tiered storage, and on-demand auto-labeling only for “high-value” clips.
Ethics & compliance
PII handling (face/license-plate blurring where required), audit trails, and incident-review packs with explanations.
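For the PII step, a small redaction sketch that blurs detected face/plate regions with OpenCV; the upstream detector that produces the boxes is out of scope, and the kernel size is an arbitrary choice:

```python
# PII redaction sketch: Gaussian-blur detected face / license-plate regions.
# Box coordinates are assumed to come from an upstream detector (not shown).
import cv2
import numpy as np

def blur_regions(frame: np.ndarray, boxes: list[tuple[int, int, int, int]]) -> np.ndarray:
    """boxes: (x1, y1, x2, y2) in pixel coordinates."""
    out = frame.copy()
    for x1, y1, x2, y2 in boxes:
        roi = out[y1:y2, x1:x2]
        if roi.size:  # skip empty or degenerate boxes
            # Kernel must be odd; 31x31 is an illustrative blur strength.
            out[y1:y2, x1:x2] = cv2.GaussianBlur(roi, (31, 31), 0)
    return out
```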
5) Defining Success Metrics (Business, Model, Operational)
| Type | Metric | Definition | How to Measure | Target (initial) |
|---|---|---|---|---|
| Business | Intervention rate ↓ | Driver takeovers per 1000 km in assisted modes | Fleet telemetry and event logs, normalized by km and conditions | −15–20% vs. baseline cohort |
| Business | Safety-critical events ↓ | Near-miss / harsh-brake alerts per 1000 km | On-vehicle triggers (brake pressure, decel, proximity) with post-hoc validation | −15–22% |
| Business | Feature reliability ↑ | % sessions without faulted ADAS disengagement | Session analytics, error codes, watchdog | +10–15% |
| Model (Perception) | Primary mAP / mIoU ↑ | Detection mAP@IoU, seg mIoU on curated “hard” sets | Benchmarks by scenario slices (night/rain/occlusion/construction) | +5–8 pts on hard slices |
| Model (Risk) | Calibration (ECE) ↓ | Expected Calibration Error for confidence outputs | Reliability diagrams on offline eval & shadow data | < 3–5% ECE |
| Model (Long-tail) | Slice recall ↑ | Recall on rare scenarios (e.g., night-rain pedestrians) | Per-slice eval sets maintained in W&B Artifacts | +10–15 pts |
| Operational | TTMU ↓ | Time from failure discovery → safe model in prod | Ticket timestamps and deploy tags | 8–10 weeks → 2–3 weeks |
| Operational | Pipeline SLA | Ingestion → curation → label turnaround time per drive | Orchestrator metrics, queue latencies | < 24 h |
| Operational | Serving latency | End-to-end p95 inference latency | Traces (cloud + edge), p50/p95/p99 | < 100–150 ms p95 |
| Operational | Cost efficiency | $/1000 km processed; $/training experiment | Cost dashboards; W&B sweep cost per win | −20–30% vs. baseline quarter |
| Operational | Observability coverage | % models/pipelines with alerts, dashboards, SLOs | Runbooks + monitoring inventory | > 85% coverage |
Notes on measurement
Business metrics are evaluated on held-out test routes and staggered rollouts to control for route/weather mix.
Model metrics are tracked in WandB (projects for perception/risk; Artifacts for datasets, models, and eval sets).
Operational metrics come from pipeline orchestration, tracing, and cost dashboards; tie each notable change to a W&B run/deployment tag for auditability.
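The calibration target in the table above uses Expected Calibration Error; a minimal binned-ECE sketch follows (the 15-bin choice is an assumption):

```python
# Binned Expected Calibration Error (ECE) sketch for the calibration metric.
# The bin count is an illustrative choice.
import numpy as np

def expected_calibration_error(confidences: np.ndarray, correct: np.ndarray, n_bins: int = 15) -> float:
    """confidences in [0, 1]; correct is 0/1 per prediction."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap   # weight by the fraction of samples in this bin
    return float(ece)
```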
Decision summary: ML is necessary and appropriate for the perception and event-understanding layers, provided it is embedded in a closed-loop data engine with strong safety gating, targeted long-tail mining, reproducible experimentation (WandB), and rigorous operational controls.