ML Problem Framing

1) Understanding the Business Objective

Objective: deliver safer, more reliable driver assistance for cars and trucks by increasing perception robustness and reducing intervention events, without inflating cloud costs or slowing iteration.

Stakeholder alignment:

  • Product: safety KPIs first, low-latency UX (real-time alerts, <100–150 ms p95 from sensor ingest to decision).

  • Operations/Program: predictable rollouts, gated by safety thresholds and audits.

  • Data/ML: fast iteration loops, reproducible experiments, clear failure mining for edge cases.

  • MLOps/Infra: scalable pipelines, cost controls, observability, and security-by-default.

  • Compliance/Legal: privacy, traceability, explainability commensurate with safety review.

Explainability baseline: agree that perception models must support post-hoc explanations for incident review (e.g., SHAP for tabular/CAN features, saliency/Grad-CAM for vision). Define reference baselines (e.g., median-frame conditions) to anchor explanations and audits.
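
The sketch below illustrates the kind of Grad-CAM saliency mentioned above, using a stand-in torchvision backbone and a random frame; the model, target layer, and input shape are assumptions for illustration, not the production perception stack. For tabular/CAN features, the analogous step would be SHAP values computed against the agreed reference baseline.

```python
# Minimal Grad-CAM sketch for vision explanations in incident review.
# Backbone, target layer, and input are placeholders (assumptions), not the deployed model.
import torch
import torch.nn.functional as F
import torchvision

model = torchvision.models.resnet18(weights=None)    # stand-in for the perception backbone
model.eval()
target_layer = model.layer4                          # assumed "last conv block" of the backbone

activations, gradients = {}, {}
target_layer.register_forward_hook(lambda m, i, o: activations.update(value=o.detach()))
target_layer.register_full_backward_hook(lambda m, gi, go: gradients.update(value=go[0].detach()))

frame = torch.randn(1, 3, 224, 224)                  # placeholder camera frame
scores = model(frame)
scores[0, scores.argmax()].backward()                # gradient of the top-scoring class

weights = gradients["value"].mean(dim=(2, 3), keepdim=True)   # global-average-pooled gradients
cam = F.relu((weights * activations["value"]).sum(dim=1))     # weighted activation map
cam = F.interpolate(cam.unsqueeze(1), size=frame.shape[-2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)      # normalize to [0, 1] for overlays
```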


2) Is Machine Learning the Right Approach? (Use-Case Evaluation)

Why ML here: perception across cameras/lidar/radar is a high-dimensional pattern recognition problem; rules alone are brittle. Failure modes evolve (weather, construction, vehicle styles), requiring continual learning.

When ML wins (our case):

  • Complex, non-linear patterns (multi-sensor fusion).

  • Scale (millions of frames; long operational horizon).

  • Evolving environment (drift) → retraining required.

Guardrails & baselines:

  • Start with robust non-ML heuristics for sanity checks (e.g., speed gates, sensor drop detection).

  • Gate ML predictions with safety predicates and shadow mode before activation; a minimal gating sketch follows this list.
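
A minimal sketch of the gating idea, assuming illustrative field names and thresholds (speed_mps, sensor_ok, shadow_mode, the 0.6 confidence floor); production predicates and limits would come from the safety review.

```python
# Hedged sketch of a safety gate wrapping model outputs; all thresholds are illustrative.
from dataclasses import dataclass

@dataclass
class FrameContext:
    speed_mps: float      # ego speed from CAN
    sensor_ok: bool       # output of sensor health / drop detection
    shadow_mode: bool     # True: log predictions only, never act on them

def gate_prediction(pred_confidence: float, ctx: FrameContext,
                    min_conf: float = 0.6, max_speed_mps: float = 38.0) -> str:
    """Decide what to do with a single perception/event prediction."""
    if not ctx.sensor_ok:
        return "suppress"            # degrade gracefully when a sensor drops
    if ctx.speed_mps > max_speed_mps:
        return "suppress"            # non-ML speed gate as a sanity check
    if pred_confidence < min_conf:
        return "suppress"            # below the calibrated confidence floor
    return "log_only" if ctx.shadow_mode else "act"

# In shadow mode, "log_only" decisions are compared against driver behavior before activation.
print(gate_prediction(0.82, FrameContext(speed_mps=22.0, sensor_ok=True, shadow_mode=True)))
```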

Data Engine requirement: design a closed loop to collect → curate → label → train → deploy → monitor → retrain, with fleet triggers and targeted data mining to shorten time-to-improvement.
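
As a concrete illustration of the fleet-trigger part of that loop, a minimal sketch follows; signal names, thresholds, and clip lengths are assumptions for the example, not a production trigger catalog.

```python
# Illustrative fleet triggers for the closed-loop data engine.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Trigger:
    name: str
    condition: Callable[[Dict[str, float]], bool]   # evaluated on per-frame telemetry
    clip_seconds: float = 10.0                      # context to upload around each hit

TRIGGERS: List[Trigger] = [
    Trigger("harsh_brake", lambda s: s.get("decel_mps2", 0.0) > 4.0),
    Trigger("driver_takeover", lambda s: s.get("takeover", 0.0) > 0.5),
    Trigger("low_confidence_detection", lambda s: s.get("min_det_conf", 1.0) < 0.3),
]

def fired_triggers(signals: Dict[str, float]) -> List[str]:
    """Names of triggers that fire for one telemetry sample."""
    return [t.name for t in TRIGGERS if t.condition(signals)]

# Clips flagged here flow into curation -> labeling -> training, closing the loop.
print(fired_triggers({"decel_mps2": 5.1, "min_det_conf": 0.2}))
```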


3) Defining the ML Problem

Ideal product outcome: fewer risky situations and fewer driver interventions at the same or better comfort level.

Model goals (decomposed):

  • Perception (primary): multi-task detection & segmentation (vehicles, pedestrians, cyclists, lanes, drivable area), depth/BEV occupancy, and tracking.

  • Event understanding (secondary): cut-ins, harsh braking ahead, lane closures, stationary hazards.

  • Confidence & uncertainty: calibrated scores feeding planners/alerts.

Inputs (multi-modal):

  • Camera (multi-view video), Radar/Lidar (where available), IMU/GNSS/CAN.

  • Context features: weather/time, road class, speed limits (if available).

Outputs & task types:

  • Object detection/instance segmentation (multilabel, multi-class).

  • BEV occupancy/semantic map (dense prediction).

  • Time-to-collision / proximity risk (regression) and event flags (classification); a minimal output-schema sketch follows this list.
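
A minimal sketch of the per-frame output schema implied by the list above; field names, shapes, and the 200×200 BEV grid are assumptions for illustration, not the deployed interface.

```python
# Hedged sketch of per-frame perception outputs (detections, BEV occupancy, risk, events).
from dataclasses import dataclass, field
from typing import Dict, List
import numpy as np

@dataclass
class Detection:
    box_xyxy: np.ndarray      # (4,) box in image or BEV coordinates
    class_id: int             # vehicle / pedestrian / cyclist / ...
    score: float              # calibrated confidence
    track_id: int = -1        # -1 until associated by the tracker

@dataclass
class FrameOutputs:
    detections: List[Detection] = field(default_factory=list)
    bev_occupancy: np.ndarray = field(default_factory=lambda: np.zeros((200, 200), dtype=np.float32))
    time_to_collision_s: float = float("inf")                     # regression head
    event_flags: Dict[str, float] = field(default_factory=dict)   # e.g., {"cut_in": 0.1}

out = FrameOutputs()
out.detections.append(Detection(np.array([100, 80, 180, 160]), class_id=0, score=0.91))
```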

Key issues to address:

  • Long-tail & class imbalance: rare but high-impact scenarios (night rain, occlusions, construction workers directing traffic); a sampling sketch follows this list.

  • Domain shift/drift: seasonal/weather/geographic variation.

  • Label scarcity: use auto-labeling, weak supervision, and targeted human QA.

  • Throughput & latency: train at scale; serve under tight budgets and on constrained edge targets where applicable.
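
For the long-tail item above, one common mitigation is inverse-frequency sampling of rare scenario slices during training; the sketch below assumes synthetic slice labels and counts purely for illustration.

```python
# Hedged sketch: upweight rare scenario slices with a weighted sampler (toy counts).
import torch
from torch.utils.data import WeightedRandomSampler

# One coarse scenario label per training clip, e.g. 0=clear-day, 1=night-rain, 2=construction.
slice_labels = torch.tensor([0] * 9000 + [1] * 600 + [2] * 400)

counts = torch.bincount(slice_labels).float()
per_sample_weight = 1.0 / counts[slice_labels]       # inverse-frequency weights

sampler = WeightedRandomSampler(
    weights=per_sample_weight,
    num_samples=len(slice_labels),
    replacement=True,
)
# Pass `sampler=sampler` to the DataLoader; per-slice eval sets (Section 5) then check that
# rare-slice gains do not come at the cost of common-case regressions.
```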


4) Assessing Feasibility & Risks

Data readiness

  • Sufficient raw sensor coverage, but rare events are under-represented → plan trigger-based mining and similarity search (a minimal search sketch follows this list).

  • Labeling cost high → hybrid strategy (auto-label + human verification on hard slices).
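
A minimal similarity-search sketch for the mining strategy above, assuming clip embeddings already exist (random placeholders here); in practice they would come from the perception backbone or a dedicated embedding model.

```python
# Hedged sketch: find fleet clips most similar to a known failure clip via cosine similarity.
import numpy as np

rng = np.random.default_rng(0)
fleet_embeddings = rng.normal(size=(100_000, 256)).astype(np.float32)   # one vector per stored clip
query = rng.normal(size=(256,)).astype(np.float32)                      # embedding of a failure clip

def top_k_similar(query_vec: np.ndarray, bank: np.ndarray, k: int = 50) -> np.ndarray:
    """Indices of the k clips most cosine-similar to the query."""
    bank_n = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    q_n = query_vec / np.linalg.norm(query_vec)
    return np.argsort(-(bank_n @ q_n))[:k]

candidates = top_k_similar(query, fleet_embeddings)   # route these clips to auto-label + human QA
```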

Technical constraints

  • Latency: real-time perception budget (<30–60 ms per frame for core heads; overall pipeline <100–150 ms p95).

  • Memory/compute: fit models to deployment targets (quantization/distillation where needed; see the sketch after this list).

  • Robustness: enforce sensor health checks; degrade gracefully to fewer modalities if a sensor drops.
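
As one example of fitting models to constrained targets, the sketch below applies post-training dynamic quantization to a toy stand-in model; the real workflow would also re-check accuracy and latency on the hard slices before release.

```python
# Hedged sketch: dynamic int8 quantization of Linear layers for CPU/edge-style targets.
import torch
import torch.nn as nn

model = nn.Sequential(               # toy stand-in for a perception head
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 10),
)
model.eval()

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8   # quantize Linear weights to int8
)

x = torch.randn(1, 256)
print(quantized(x).shape)            # same interface; smaller and faster on supported CPUs
```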

Operational risk

  • Safety gating (shadow, A/B with small percentages, rollback).

  • Strict experiment traceability (datasets, code, hyperparameters) with Weights & Biases (W&B) for runs/artifacts, and Git/DVC (or equivalent) for data/model lineage; a minimal logging sketch follows.
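
A minimal W&B logging sketch for the traceability requirement above, assuming a configured W&B account; the project name, config values, and artifact/file names are placeholders.

```python
# Hedged sketch: tie runs, metrics, and dataset/model versions together in W&B.
import wandb

run = wandb.init(
    project="adas-perception",                      # placeholder project
    config={"lr": 1e-4, "backbone": "resnet50", "dataset": "hard-slices-v3"},
    tags=["shadow-candidate"],
)

wandb.log({"val/mAP_hard": 0.412, "val/ECE": 0.041})   # metrics per eval step

artifact = wandb.Artifact("night-rain-eval", type="dataset")
artifact.add_file("eval_manifest.json")                # assumed local manifest file
run.log_artifact(artifact)                             # versioned lineage for audits

run.finish()
```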

Cost & ROI

  • Control training/inference cost with spot/managed schedules, data pruning, tiered storage, and on-demand auto-labeling only for “high-value” clips.

Ethics & compliance

  • PII handling (blurring faces/plates where required), audit trails, incident-review packs with explanations; a minimal blurring sketch follows.
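
A minimal blurring sketch for the PII requirement above; the frame and box coordinates are placeholders, and the boxes are assumed to come from an upstream face/plate detector.

```python
# Hedged sketch: redact detected faces/plates before clips enter the training store.
import cv2
import numpy as np

frame = np.zeros((720, 1280, 3), dtype=np.uint8)    # placeholder camera frame
pii_boxes = [(600, 300, 700, 380)]                  # (x1, y1, x2, y2) from the PII detector

def blur_regions(img: np.ndarray, boxes, kernel=(51, 51)) -> np.ndarray:
    out = img.copy()
    for x1, y1, x2, y2 in boxes:
        roi = out[y1:y2, x1:x2]
        out[y1:y2, x1:x2] = cv2.GaussianBlur(roi, kernel, 0)   # kernel sizes must be odd
    return out

redacted = blur_regions(frame, pii_boxes)
```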


5) Defining Success Metrics (Business, Model, Operational)

| Type | Metric | Definition | How to Measure | Target (initial) |
| --- | --- | --- | --- | --- |
| Business | Intervention rate ↓ | Driver takeovers per 1000 km in assisted modes | Fleet telemetry and event logs, normalized by km and conditions | −15–20% vs. baseline cohort |
| Business | Safety-critical events ↓ | Near-miss / harsh-brake alerts per 1000 km | On-vehicle triggers (brake pressure, decel, proximity) with post-hoc validation | −15–22% |
| Business | Feature reliability ↑ | % of sessions without faulted ADAS disengagement | Session analytics, error codes, watchdog | +10–15% |
| Model (Perception) | Primary mAP / mIoU ↑ | Detection mAP@IoU and segmentation mIoU on curated “hard” sets | Benchmarks by scenario slice (night/rain/occlusion/construction) | +5–8 pts on hard slices |
| Model (Risk) | Calibration (ECE) ↓ | Expected Calibration Error of confidence outputs | Reliability diagrams on offline eval & shadow data | < 3–5% ECE |
| Model (Long-tail) | Slice recall ↑ | Recall on rare scenarios (e.g., night-rain pedestrians) | Per-slice eval sets maintained in W&B Artifacts | +10–15 pts |
| Operational | Time-to-model-update (TTMU) ↓ | Time from failure discovery to a safe model in production | Ticket timestamps and deploy tags | 8–10 weeks → 2–3 weeks |
| Operational | Pipeline SLA | Ingestion → curation → label turnaround time per drive | Orchestrator metrics, queue latencies | < 24 h |
| Operational | Serving latency | End-to-end p95 inference latency | Traces (cloud + edge), p50/p95/p99 | < 100–150 ms p95 |
| Operational | Cost efficiency | $/1000 km processed; $/training experiment | Cost dashboards; W&B sweep cost per win | −20–30% vs. baseline quarter |
| Operational | Observability coverage | % of models/pipelines with alerts, dashboards, SLOs | Runbooks + monitoring inventory | > 85% coverage |
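
As a concrete reference for the calibration row above, a minimal ECE computation sketch follows; the confidences and labels are synthetic, whereas in practice they come from offline eval and shadow-mode logs.

```python
# Hedged sketch: Expected Calibration Error over equal-width confidence bins (toy data).
import numpy as np

def expected_calibration_error(confidences: np.ndarray, correct: np.ndarray, n_bins: int = 15) -> float:
    """Weighted average gap between mean confidence and accuracy per bin."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(confidences[mask].mean() - correct[mask].mean())
    return float(ece)

rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=10_000)
correct = (rng.uniform(size=10_000) < conf).astype(float)   # roughly calibrated toy labels
print(expected_calibration_error(conf, correct))            # small value for this toy case
```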

Notes on measurement

  • Business metrics are evaluated on held-out test routes and staggered rollouts to control for route/weather mix.

  • Model metrics are tracked in W&B (projects for perception/risk; Artifacts for datasets, models, and eval sets).

  • Operational metrics come from pipeline orchestration, tracing, and cost dashboards; tie each notable change to a W&B run/deployment tag for auditability.


Decision summary: ML is necessary and appropriate for the perception and event-understanding layers, provided it is embedded in a closed-loop data engine with strong safety gating, targeted long-tail mining, reproducible experimentation (W&B), and rigorous operational controls.