Scene Understanding & Data Mining¶
¶
Below is a clear, implementation-ready expansion of Workflows 6–12: Scene Understanding & Data Mining. Each workflow follows the same structure so your team can lift it straight into design docs and tickets.
6) Scene Detection & Triggers¶
Trigger
New drive finishes Foundations (Bronze → Silver), or a telemetry trigger arrives (e.g., ABS, disengagement, harsh brake).
Inputs
Time-synced Silver assets: camera keyframes/clips, LiDAR sweeps/packets, CAN/IMU time series.
Precomputed detections from Foundations (e.g., coarse lanes/objects).
Map tiles/metadata (road class/intersection).
Shadow vs prod model outputs (if available).
Step-by-step (with guardrails)
Segment the drive into candidate scenes
Adaptive segmentation on change-points: speed/accel deltas, heading/yaw rate, road topology transitions, stop→go, intersection proximity.
Merge/split logic: enforce scene-length bounds (e.g., 3–60 s), bridge micro-gaps (<500 ms).
Detect & enrich
Lightweight TensorRT/TorchScript models for 2D objects, lane edges, traffic-light state; ego-event heuristics from CAN/IMU (cut-in, hard brake, tailgating).
Optional OOD/uncertainty probes (entropy, ODIN) and prod vs shadow disagreement hooks.
Score interestingness
score = α·rarity + β·uncertainty + γ·disagreement + δ·diversity_margin
Rarity from rolling histograms per slice (weather/time/geo); see the scoring sketch after this list.
Validate
Great Expectations: schema, timestamp monotonicity, frame-rate bounds, sensor coverage per scene.
Sanity plots sampled to S3; spot-check top-N scenes by score.
Emit triggers
Push trigger_flags (e.g., disengagement=true, ood=true) to EventBridge/SQS for downstream miners.
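A minimal sketch of the scoring step referenced above, assuming rarity, uncertainty, disagreement, and diversity_margin have already been computed per scene and scaled to [0, 1]; the weights and the SceneFeatures container are illustrative, not a fixed contract:

```python
from dataclasses import dataclass

# Illustrative weights; tune per fleet and deployment.
ALPHA, BETA, GAMMA, DELTA = 0.4, 0.3, 0.2, 0.1

@dataclass
class SceneFeatures:
    rarity: float            # from rolling histograms per slice, scaled to [0, 1]
    uncertainty: float       # e.g., mean predictive entropy, scaled to [0, 1]
    disagreement: float      # prod vs shadow divergence, scaled to [0, 1]
    diversity_margin: float  # distance to nearest already-selected scene, scaled to [0, 1]

def interestingness(f: SceneFeatures) -> float:
    """score = α·rarity + β·uncertainty + γ·disagreement + δ·diversity_margin"""
    return (ALPHA * f.rarity
            + BETA * f.uncertainty
            + GAMMA * f.disagreement
            + DELTA * f.diversity_margin)

# Example: a rare, high-uncertainty scene rises to the top of the trigger queue.
print(interestingness(SceneFeatures(0.9, 0.8, 0.3, 0.5)))   # 0.71
```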
Outputs
scene_segments.parquet: fields {scene_id, drive_id, vehicle_id, start_ts, end_ts, clip_uri[], tags[], trigger_flags[], score}
scene_events.parquet (per-event rows with attributes, e.g., {type, ts, value, confidence}).
Storage & Indexing
S3 Silver (Parquet, partitioned by dt/vehicle_id).
Glue/Athena external tables for analytics.
DynamoDB: per-scene manifest (fast lookups).
OpenSearch: scene docs for keyword/facet search.
Core tooling
Airflow DAG → EMR Spark or AWS Batch containers; Great Expectations; OpenSearch; DynamoDB; Weights & Biases (runs + artifacts).
7) Vector Index (Similarity Search)¶
Trigger
Batch completion of #6 (new scenes available).
Inputs
Scene keyframes (images), clip thumbnails, LiDAR BEV features, scene tags/notes.
Step-by-step (with guardrails)
Embed
Images: CLIP ViT-B/32 (or ResNet-50) embeddings per keyframe; optionally average over clip.
LiDAR: BEV encoder or pooled PointNet++ features.
Text: sentence embeddings (tags/notes).
Concatenate modalities, or keep modality-specific indices (recommended).
Normalize & reduce
L2-normalize; optional PCA→256D; whitening for dense IVF/PQ.
Index build
FAISS: IVF-PQ (nlist, m, nbits tuned to latency; see the sketch after this list), or
OpenSearch k-NN: HNSW (M, ef_construction), per-scene document with vector field.
Validate
Retrieval smoke tests (query-by-example must return near-duplicates).
Offline mAP@K and duplicate recall; log to W&B.
Evidently drift on embedding stats (mean/var, cov trace); alert on large shift.
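A minimal FAISS sketch for the index build referenced above, assuming embeddings are already PCA-reduced to 256-D and saved as embeddings_pca256.npy (a hypothetical file name); the nlist/m/nbits values are placeholders to tune against latency and recall targets:

```python
import faiss
import numpy as np

# Assumed input: float32 embeddings, already PCA-reduced to 256-D
# (embeddings_pca256.npy is a hypothetical file name).
vectors = np.load("embeddings_pca256.npy").astype("float32")
faiss.normalize_L2(vectors)              # after normalization, L2 distance tracks cosine

d = vectors.shape[1]                     # 256
nlist, m, nbits = 1024, 32, 8            # coarse cells, PQ sub-quantizers, bits per code
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

index.train(vectors)                     # needs enough vectors to learn centroids/codebooks
index.add(vectors)
faiss.write_index(index, "embeddings.faiss")

# Retrieval smoke test: query-by-example should surface the query's own id (0) near the top.
index.nprobe = 16
distances, ids = index.search(vectors[:1], k=5)
print(ids[0])
```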
Outputs
embeddings.parquet (uri, scene_id, modality, vector[]).
FAISS artifacts (embeddings.faiss, pca.npy) or OpenSearch k-NN index.
Storage & Indexing
S3 Silver (vectors + FAISS); Glue for vectors; OpenSearch for real-time k-NN search.
Core tooling
SageMaker Processing/EMR for batch compute; FAISS/OpenSearch k-NN; Evidently; W&B.
8) Scenario Mining (Programmatic / Query UI)¶
Trigger
On-demand engineer queries, scheduled gap-analysis, or post-deployment error mining.
Inputs
Scene catalog (#6), vector index (#7), telemetry triggers, map/weather joins.
Step-by-step (with guardrails)
Unified Query (GraphQL via AWS AppSync)
Combine: structured filters (time range, weather, road type), OpenSearch facets/text, and vector similarity (k-NN).
Example filter: weather in [rain, snow] ∧ time_of_day=dusk ∧ tags.contains('cyclist') ∧ kNN(image_vec, q) < τ.
Long-tail mining
Rarity scoring vs fleet distribution; enforce diversity (min pairwise distance); slice coverage constraints (ensure each critical slice ≥ target).
Budgeted selection under storage/labeling limits (see the greedy selection sketch after this list).
Validate
Deduplicate via perceptual hash & embedding distance.
Balance report by slice; human UI spot QA (sampled thumbnails).
Materialize
Emit dataset spec and clip lists; version with DVC and W&B Artifacts.
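A sketch of the budgeted, diversity-constrained selection referenced above, assuming candidate URIs, their L2-normalized embeddings, and rarity scores are already aligned arrays; budget and min_dist are illustrative knobs:

```python
import numpy as np

def select_diverse(candidate_uris: list[str],
                   embeddings: np.ndarray,   # (N, D), L2-normalized, aligned with candidate_uris
                   scores: np.ndarray,       # (N,), higher = rarer / more interesting
                   budget: int = 5000,
                   min_dist: float = 0.35) -> list[str]:
    """Greedy budgeted selection: walk candidates by descending score and keep one
    only if it is at least min_dist (Euclidean) from everything already selected."""
    selected: list[int] = []
    for i in np.argsort(-scores):
        if len(selected) >= budget:
            break
        if selected:
            dists = np.linalg.norm(embeddings[selected] - embeddings[i], axis=1)
            if dists.min() < min_dist:
                continue                      # too similar to an already-selected scene
        selected.append(int(i))
    return [candidate_uris[i] for i in selected]
```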
Outputs
dataset_spec.yaml (filters, slices, version pins), curated clip URI lists for train/val/test.
Storage & Indexing
S3 Gold curation/...; DVC tags/locks; saved queries in DynamoDB; discoverability in OpenSearch.
Core tooling
AppSync (GraphQL), OpenSearch, Athena, DVC, internal React curation UI.
9) Auto-Labeling (Offboard)¶
Trigger
New curated set from #8 (or nightly bulk run).
Inputs
Curated clips; camera/LiDAR calibration; map layers (speed limits, lanes); optional prior labels.
Step-by-step (with guardrails)
Run offboard labelers (GPU)
2D detection/segmentation (e.g., YOLOX / Mask R-CNN), 3D detection (CenterPoint/Pillar), multi-camera fusion (BEVFusion-style), tracking (ByteTrack/DeepSORT), lane topology (LaneATT or lane graph extractor).
Temporal smoothing (Kalman/IMM); identity stitching across cameras.
Confidence gating
Keep high-confidence pseudo-labels; route uncertain/rare classes to #10.
Calibrate thresholds per slice (e.g., night/rain); see the gating sketch after this list.
Self-consistency checks
Cross-view re-projection residuals, track continuity, kinematics plausibility (speed vs displacement), lane adherence.
Validate
Label schema conformance; IoU and AP deltas against a held-out human-labeled subset; per-class coverage/imbalance report; ECE for calibration.
Gate deployment of labels on quality thresholds; log to W&B.
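A sketch of the per-slice confidence gate referenced above, assuming auto-labels are available as a pandas DataFrame with slice and score columns; the threshold values are hypothetical and would come from calibration against a held-out human-labeled subset:

```python
import pandas as pd

# Hypothetical per-slice thresholds calibrated on a held-out human-labeled subset.
SLICE_THRESHOLDS = {"day_clear": 0.60, "night_clear": 0.70, "night_rain": 0.80}
DEFAULT_THRESHOLD = 0.75

def gate_pseudo_labels(labels: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split auto-labels into accepted pseudo-labels and rows routed to human QA (#10).

    Expects at least the columns ['slice', 'score'].
    """
    thresholds = labels["slice"].map(SLICE_THRESHOLDS).fillna(DEFAULT_THRESHOLD)
    accepted = labels[labels["score"] >= thresholds]
    to_review = labels[labels["score"] < thresholds]
    return accepted, to_review
```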
Outputs
labels_auto/ (COCO/Waymo-style JSON), tracks, lane vectors/graphs, scene graphs; quality_report.json.
Storage & Indexing
S3 Gold; summary tables in Glue/Athena; W&B Artifacts (dataset + model provenance).
Core tooling
EKS/Batch GPU jobs; PyTorch/TensorRT; multi-view fusion; W&B for metrics/artifacts.
10) Human QA (HITL)¶
Trigger
Low-confidence/uncertain slices from #9; periodic audit sampling.
Inputs
Auto-labels + media; labeling ontology (versioned); guidelines & golden tasks.
Step-by-step (with guardrails)
Priority queue
Order by uncertainty, rarity, business priority; enforce per-slice quotas.
Annotate/verify
Labelers in Labelbox or SageMaker Ground Truth; consensus labeling for critical classes; adjudication by senior reviewers.
Quality control
Blind overlap to compute IAA (κ/α); golden tasks; geometric linting (box aspect, mask holes).
Validate
Promote only if precision@accept ≥ target and IAA ≥ threshold (see the IAA sketch after this list); otherwise route feedback to guidelines or auto-labeler thresholds.
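A sketch of the IAA gate referenced above, assuming two annotators' labels for the same blind-overlap items arrive as aligned lists; scikit-learn's cohen_kappa_score covers the pairwise κ case, and the 0.75 threshold is illustrative (Krippendorff's α would need a dedicated package):

```python
from sklearn.metrics import cohen_kappa_score

KAPPA_THRESHOLD = 0.75   # illustrative promotion gate

def iaa_gate(labels_a: list[str], labels_b: list[str]) -> bool:
    """Pairwise inter-annotator agreement on blind-overlap items."""
    kappa = cohen_kappa_score(labels_a, labels_b)
    print(f"Cohen's kappa = {kappa:.3f}")
    return kappa >= KAPPA_THRESHOLD

# Example: promote the batch only if agreement clears the threshold.
ok = iaa_gate(["pedestrian", "cyclist", "pedestrian", "car"],
              ["pedestrian", "cyclist", "car", "car"])
```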
Outputs
labels_human/ (final truths), diffs vs auto-labels, QA reports, updated ontology version.
Storage & Indexing
S3 Gold human labels; tool-native label DB for audit; DVC dataset tags.
Core tooling
Labelbox/Ground Truth, reviewer dashboard, webhooks → S3; W&B lineage links.
11) Golden / Slice Builder¶
Trigger
After #9–#10 converge; prior to training cycles or benchmark refresh.
Inputs
Labeled pools (auto + human), scenario specs from #8, slice definitions (weather/time/geo/object).
Step-by-step (with guardrails)
Assemble & balance
Stratified sampling to meet per-slice minima; handle class imbalance (reweighting or oversampling rare classes).
Respect temporal/geographic boundaries to avoid leakage.
Freeze & version
Emit *.manifest with absolute URIs + checksums; produce Datasheet for Datasets and populate the data section of the Model Card; register in W&B Artifacts + tag in DVC.
Validate
Leakage checks: no overlapping scene_id/drive_id across splits; near-duplicate screening via embedding distance (see the leakage-check sketch after this list).
Baseline eval on the Golden validation set; per-slice metrics recorded; gate if there are regressions vs the last baseline.
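A sketch of the leakage check referenced above, assuming each split manifest is readable as Parquet with a scene_id column; the URIs are placeholders, and drive-level checks work the same way after joining drive_id from the scene catalog (#6):

```python
import pandas as pd

def check_split_leakage(train_uri: str, val_uri: str, test_uri: str) -> None:
    """Fail fast if any scene_id appears in more than one split manifest."""
    uris = {"train": train_uri, "val": val_uri, "test": test_uri}
    splits = {name: set(pd.read_parquet(uri)["scene_id"]) for name, uri in uris.items()}
    for a in splits:
        for b in splits:
            if a < b:
                overlap = splits[a] & splits[b]
                assert not overlap, f"scene_id leakage between {a} and {b}: {sorted(overlap)[:5]}"
```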
Outputs
golden_train/val/test.manifest (with hashes), slices.yaml, datasheet.md, model_card.md (data section).
Storage & Indexing
S3 Gold; DVC & semantic tags; Glue/Athena tables for audits; W&B artifact registry.
Core tooling
DVC, Athena, Great Expectations (row-level rules), W&B Artifacts.
12) Offline Mining (Continuous Error/Drift Discovery)¶
Trigger
Nightly/weekly schedule; after evals; whenever new prod telemetry/shadow logs land.
Inputs
Production predictions & telemetry, shadow logs, monitoring slices, last Golden manifest.
Step-by-step (with guardrails)
Aggregate errors
Join predictions with ground truth (where available) or proxy outcomes; compute FP/FN by slice; maintain error leaderboards.
Drift & OOD
Evidently: PSI/KS on feature/score distributions vs reference; OOD scores (Mahalanobis/energy).
Mine candidates
Seed with top error exemplars → nearest neighbors via vector index (#7) → cluster (HDBSCAN/DBSCAN) to discover themes (see the clustering sketch after this list).
Exclude items present in last training set (check against manifests).
Validate
Novelty (embedding distance vs training), utility (expected error coverage gain); deduplicate; spot-check sample.
Emit next round specs
Produce next_specs.yaml + candidate lists for #8; log run in W&B.
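A sketch of the mining expansion and clustering referenced above, assuming the FAISS index and embeddings.parquet from #7 share row order and that top error exemplars have been exported to a hypothetical top_error_exemplars.npy; hdbscan and min_cluster_size=10 are illustrative choices (scikit-learn's DBSCAN slots in the same way):

```python
import faiss
import hdbscan
import numpy as np
import pandas as pd

# Assumed artifacts from #7: the FAISS index and embeddings.parquet share row order.
index = faiss.read_index("embeddings.faiss")
emb = pd.read_parquet("embeddings.parquet")               # uri, scene_id, modality, vector
vectors = np.vstack(emb["vector"].to_numpy()).astype("float32")

# Seed with top error exemplars (hypothetical export) and expand via nearest neighbors.
seeds = np.load("top_error_exemplars.npy").astype("float32")
faiss.normalize_L2(seeds)
_, neighbor_ids = index.search(seeds, k=50)
candidate_ids = np.unique(neighbor_ids.ravel())
candidate_ids = candidate_ids[candidate_ids >= 0]          # drop FAISS "not found" slots

# Cluster the expanded candidates to surface recurring failure themes (-1 = noise).
clusterer = hdbscan.HDBSCAN(min_cluster_size=10, metric="euclidean")
themes = clusterer.fit_predict(vectors[candidate_ids])
mined = emb.iloc[candidate_ids].assign(theme=themes)
mined.to_parquet("mined_candidates.parquet")
```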
Outputs
error_buckets/ (clustered examples with tags), mined_candidates.parquet, next_specs.yaml.
Storage & Indexing
S3 Silver/Gold; notes to OpenSearch; error dashboards in Athena/QuickSight.
Core tooling
Airflow schedule; Evidently; OpenSearch/FAISS; Athena/QuickSight; W&B.
Output Schemas¶
scene_segments.parquet
scene_id:str, drive_id:str, vehicle_id:str, start_ts:ts, end_ts:ts, clip_uri:array<str>, tags:array<str>, trigger_flags:array<str>, score:float
embeddings.parquet
uri:str, scene_id:str, modality:str, vector:array<float32>, dt:date
dataset_spec.yaml (excerpt)
name: cyclists_dusk_rain_v3
slices:
  - name: dusk_rain_cyclists
    filters:
      weather: [rain]
      time_of_day: [dusk]
      tags: ["cyclist"]
    knn_seed_uri: "s3://.../seed.jpg"
    target_count: 5000
excludes:
  manifests: ["s3://.../golden_train.manifest"]
labels_auto/ (COCO-style excerpt)
{ "images":[{"id":123,"file_name":"...","scene_id":"S1", "ts":"..."}], "annotations":[{"image_id":123,"category_id":1,"bbox":[x,y,w,h],"score":0.92}], "categories":[{"id":1,"name":"pedestrian"}] }
golden_train.manifest
uri:str, checksum:str, scene_id:str, slice:str, label_uri:str
quality_report.json (auto-labels)
{"map@50":0.61,"ece":0.07,"iou_median":0.73, "by_class":{"pedestrian":{"ap50":0.58,"n":12400},"cyclist":{"ap50":0.55,"n":4200}}}
Validation Strategy Embedded in These Workflows¶
Schema & quality gates (Great Expectations) run at scene generation and dataset assembly to prevent corrupt data from propagating (see the sketch at the end of this section).
Statistical drift checks (Evidently) on embedding distributions and slice composition before building the Golden set.
Retrieval sanity on vector indices (near-duplicate recall, mAP@K) ensures the mining UX returns useful neighbors.
Label quality checks via IAA, golden tasks, and spot checks on auto-labels maintain training-data integrity.
Leakage checks (scene overlap across splits) in #11 safeguard evaluation integrity.
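A minimal Great Expectations sketch for the scene-table gate referenced above, assuming the classic pandas-backed API (newer Great Expectations releases use a different entry point) and the scene_segments schema defined earlier; the specific expectations shown are examples, not the full suite:

```python
import great_expectations as ge
import pandas as pd

# Classic pandas-backed API (pre-1.0); newer releases use a different entry point.
scenes = ge.from_pandas(pd.read_parquet("scene_segments.parquet"))

scenes.expect_column_values_to_not_be_null("scene_id")
scenes.expect_column_values_to_be_unique("scene_id")
scenes.expect_column_values_to_be_between("score", min_value=0.0, max_value=1.0)
scenes.expect_column_pair_values_A_to_be_greater_than_B("end_ts", "start_ts")  # timestamp sanity

result = scenes.validate()
if not result.success:
    raise RuntimeError("scene_segments.parquet failed schema/quality gates")
```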