# ML Problem Framing

### 1) Understanding the Business Objective

**Objective:** deliver safer, more reliable driver-assistance for cars and trucks by increasing perception robustness and reducing intervention events, without exploding cloud costs or iteration time.

**Stakeholder alignment:**

* **Product:** safety KPIs first, low-latency UX (real-time alerts, <100–150 ms p95 from sensor ingest to decision).
* **Operations/Program:** predictable rollouts, gated by safety thresholds and audits.
* **Data/ML:** fast iteration loops, reproducible experiments, clear failure mining for edge cases.
* **MLOps/Infra:** scalable pipelines, cost controls, observability, and security-by-default.
* **Compliance/Legal:** privacy, traceability, explainability commensurate with safety review.

**Explainability baseline:** agree that perception models must support *post-hoc* explanations for incident review (e.g., SHAP for tabular/CAN features, saliency/Grad-CAM for vision). Define **reference baselines** (e.g., median-frame conditions) to anchor explanations and audits.

---

### 2) Is Machine Learning the Right Approach? (Use-Case Evaluation)

**Why ML here:** perception across cameras/lidar/radar is a high-dimensional pattern recognition problem; rules alone are brittle. Failure modes evolve (weather, construction, vehicle styles), requiring **continual learning**.

**When ML wins (our case):**

* Complex, non-linear patterns (multi-sensor fusion).
* Scale (millions of frames; long operational horizon).
* Evolving environment (drift) → retraining required.

**Guardrails & baselines:**

* Start with robust non-ML heuristics for sanity checks (e.g., speed gates, sensor drop detection).
* Gate ML predictions with safety predicates and **shadow mode** before activation.

**Data Engine requirement:** design a closed loop to **collect → curate → label → train → deploy → monitor → retrain**, with fleet triggers and targeted data mining to shorten time-to-improvement.

---

### 3) Defining the ML Problem

**Product ideal outcome:** fewer risky situations and fewer driver interventions at the same or better comfort level.

**Model goals (decomposed):**

* **Perception (primary):** multi-task detection & segmentation (vehicles, pedestrians, cyclists, lanes, drivable area), depth/BEV occupancy, and tracking.
* **Event understanding (secondary):** cut-ins, harsh braking ahead, lane closures, stationary hazards.
* **Confidence & uncertainty:** calibrated scores feeding planners/alerts.

**Inputs (multi-modal):**

* **Camera** (multi-view video), **Radar/Lidar** (where available), **IMU/GNSS/CAN**.
* **Context features:** weather/time, road class, speed limits (if available).

**Outputs & task types** (a schema sketch follows at the end of this section):

* **Object detection/instance segmentation** (multilabel, multi-class).
* **BEV occupancy/semantic map** (dense prediction).
* **Time-to-collision / proximity risk** (regression), **event flags** (classification).

**Key issues to address:**

* **Long-tail & class imbalance:** rare but high-impact scenarios (night rain, occlusions, construction workers directing traffic).
* **Domain shift/drift:** seasonal/weather/geographic variation.
* **Label scarcity:** use auto-labeling, weak supervision, and targeted human QA.
* **Throughput & latency:** train at scale; serve under tight budgets and on constrained edge targets where applicable.
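To make the output contract concrete for planners, alerting, and the evaluation slices below, here is a minimal schema sketch in Python. The class and field names (e.g., `PerceptionOutput`, `time_to_collision_s`) are illustrative assumptions, not a settled interface:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

import numpy as np


@dataclass
class DetectedObject:
    """One tracked object from the detection/instance-segmentation heads (illustrative)."""
    track_id: int
    class_name: str             # e.g., "vehicle", "pedestrian", "cyclist"
    box_3d: np.ndarray          # (7,) x, y, z, length, width, height, yaw in the ego frame
    confidence: float           # calibrated score in [0, 1]
    time_to_collision_s: float  # regression head; use math.inf if no projected conflict


@dataclass
class PerceptionOutput:
    """Per-frame, multi-task output consumed by planning/alerts (illustrative)."""
    timestamp_ns: int
    objects: List[DetectedObject] = field(default_factory=list)
    bev_occupancy: Optional[np.ndarray] = None                    # (H, W) BEV occupancy/semantic grid
    event_flags: Dict[str, float] = field(default_factory=dict)   # e.g., {"cut_in": 0.82}
    sensor_health: Dict[str, bool] = field(default_factory=dict)  # e.g., {"lidar": False} → degrade
```

Keeping calibrated confidences and per-sensor health flags explicit in the output is what allows the safety predicates and shadow-mode gating from section 2 to consume model output directly.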
---

### 4) Assessing Feasibility & Risks

**Data readiness**

* Sufficient raw sensor coverage; rare events under-represented → plan **trigger-based mining** and similarity search.
* Labeling cost high → **hybrid strategy** (auto-label + human verification on hard slices).

**Technical constraints**

* **Latency:** real-time perception budget (<30–60 ms per frame for core heads; overall pipeline <100–150 ms p95).
* **Memory/compute:** fit models to deployment targets (quantization/distillation where needed).
* **Robustness:** enforce sensor health checks; degrade gracefully to fewer modalities if a sensor drops.

**Operational risk**

* Safety gating (shadow mode, A/B with small percentages, rollback).
* Strict experiment traceability (datasets, code, hyperparams) with **Weights & Biases (WandB)** for runs/artifacts, and Git/DVC (or equivalent) for data/model lineage.

**Cost & ROI**

* Control training/inference cost with spot/managed schedules, data pruning, tiered storage, and on-demand auto-labeling only for "high-value" clips.

**Ethics & compliance**

* PII handling (face/plate blurring where required), audit trails, incident-review packs with explanations.

---

### 5) Defining Success Metrics (Business, Model, Operational)

| Type | Metric | Definition | How to Measure | Target (initial) |
| --- | --- | --- | --- | --- |
| **Business** | **Intervention rate ↓** | Driver takeovers per 1000 km in assisted modes | Fleet telemetry and event logs, normalized by km and conditions | **−15–20%** vs. baseline cohort |
| **Business** | **Safety-critical events ↓** | Near-miss / harsh-brake alerts per 1000 km | On-vehicle triggers (brake pressure, deceleration, proximity) with post-hoc validation | **−15–22%** |
| **Business** | **Feature reliability ↑** | % of sessions without a faulted ADAS disengagement | Session analytics, error codes, watchdog | **+10–15%** |
| **Model (Perception)** | **Primary mAP / mIoU ↑** | Detection mAP@IoU, segmentation mIoU on curated "hard" sets | Benchmarks by scenario slice (night/rain/occlusion/construction) | **+5–8 pts** on hard slices |
| **Model (Risk)** | **Calibration (ECE) ↓** | Expected Calibration Error of confidence outputs | Reliability diagrams on offline eval & shadow data (see the sketch after the notes below) | **< 3–5%** ECE |
| **Model (Long-tail)** | **Slice recall ↑** | Recall on rare scenarios (e.g., night-rain pedestrians) | Per-slice eval sets maintained in W&B Artifacts | **+10–15 pts** |
| **Operational** | **TTMU ↓** | Time-to-model-update: from failure discovery to a safe model in prod | Ticket timestamps and deploy tags | **8–10 w → 2–3 w** |
| **Operational** | **Pipeline SLA** | Ingestion → curation → label turnaround time per drive | Orchestrator metrics, queue latencies | **< 24 h** |
| **Operational** | **Serving latency** | End-to-end p95 inference latency | Traces (cloud + edge), p50/p95/p99 | **< 100–150 ms p95** |
| **Operational** | **Cost efficiency** | $/1000 km processed; $/training experiment | Cost dashboards; W&B sweep cost per win | **−20–30%** vs. baseline quarter |
| **Operational** | **Observability coverage** | % of models/pipelines with alerts, dashboards, SLOs | Runbooks + monitoring inventory | **> 85%** coverage |

**Notes on measurement**

* **Business metrics** are evaluated on held-out test routes and staggered rollouts to control for route/weather mix.
* **Model metrics** are tracked in **WandB** (projects for perception/risk; Artifacts for datasets, models, and eval sets).
* **Operational metrics** come from pipeline orchestration, tracing, and cost dashboards; tie each notable change to a W&B run/deployment tag for auditability.
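To pin down how the calibration row above would be measured consistently across offline eval and shadow data, here is a minimal ECE sketch over equal-width confidence bins; the inputs (`confidences`, `correct`) and the bin count are illustrative assumptions:

```python
import numpy as np


def expected_calibration_error(confidences: np.ndarray,
                               correct: np.ndarray,
                               n_bins: int = 15) -> float:
    """ECE over equal-width confidence bins.

    confidences: predicted confidence per detection/event, in [0, 1]
    correct:     1.0 if the prediction matched ground truth, else 0.0
    """
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        gap = abs(correct[mask].mean() - confidences[mask].mean())
        ece += mask.mean() * gap  # weight each bin's |accuracy - confidence| gap by its sample share
    return ece


# Toy check: well-calibrated scores should land near the <3–5% target above.
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=10_000)
hits = (rng.uniform(size=10_000) < conf).astype(float)  # accuracy tracks confidence by construction
print(f"ECE: {expected_calibration_error(conf, hits):.3f}")
```

In practice this would be computed per scenario slice and logged next to the reliability diagram in the corresponding W&B run.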
---

**Decision summary:** ML is necessary and appropriate for the perception and event-understanding layers, provided it is embedded in a **closed-loop data engine** with strong safety gating, targeted long-tail mining, reproducible experimentation (**WandB**), and rigorous operational controls.
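As a closing illustration of the traceability requirement (every notable change tied to a W&B run and deployment tag), here is a minimal sketch using the W&B Python client; the project, tags, config values, metric names, and file paths are hypothetical, not an agreed convention:

```python
import wandb

# Hypothetical evaluation results for one candidate model on the "hard" slices.
slice_metrics = {
    "eval/map_hard_slices": 0.61,
    "eval/ece": 0.034,
    "eval/recall_night_rain_pedestrians": 0.72,
}

run = wandb.init(
    project="perception-eval",                   # illustrative project name
    job_type="slice-eval",
    tags=["candidate-2024-w18", "shadow-mode"],  # deploy/rollout tags for auditability
    config={"model_ckpt": "ckpt_1234", "eval_set": "hard-slices-v3"},
)

run.log(slice_metrics)

# Version the eval report as an Artifact so the exact data behind a deployment
# decision can be retrieved later during incident review.
artifact = wandb.Artifact("hard-slices-eval-report", type="evaluation")
artifact.add_file("eval_report.json")  # assumed to exist locally
run.log_artifact(artifact)

run.finish()
```

Referencing the resulting run ID and tags from deployment tickets is what makes the TTMU and audit-trail metrics above attributable to specific experiments.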