# Implementation Plan

### Section 3.7: Detailed Implementation Plan (The Master Prep List)

This section presents a high-level mapping of how the "Trending Now" project development will unfold alongside the study guide chapters.

* **Chapter 4 (The Market Run – Data Sourcing, Discovery & Understanding):**
  * *Project:*
    * Finalize and document data sources (APIs, scraping targets for movies/shows and reviews).
    * Implement initial scraping scripts (e.g., using Python with BeautifulSoup/Requests) for a sample set of data.
    * Perform Exploratory Data Analysis (EDA) on the scraped samples using Pandas and visualization libraries in a notebook.
    * Assess initial data quality and identify common issues (missing fields, inconsistent formats).
    * Create initial Data Cards or documentation for the chosen data sources.
    * Set up basic S3 buckets for raw data storage.
* **Chapter 5 (Mise en Place – Data Engineering for Reliable ML Pipelines):**
  * *Project:*
    * Develop Python scripts for cleaning and preprocessing the scraped movie/review data (handle missing values, standardize text).
    * Store processed data in Parquet format in S3.
    * Initialize DVC for versioning the processed datasets.
    * Design the schema for the processed data.
    * Implement initial data validation checks (e.g., for expected fields and data types) as Python functions.
    * Conceptualize the Airflow DAG for the Data Ingestion Pipeline (Scrape -> Clean -> Store -> Version).
* **Chapter 6 (Perfecting Flavor Profiles – Feature Engineering and Feature Stores):**
  * *Project:*
    * Develop feature extraction logic for plot summaries and reviews (e.g., TF-IDF using Scikit-learn, with a placeholder for BERT embeddings).
    * Create functions for generating these features.
    * Discuss how these features would be defined and managed in Feast (conceptual design).
    * Identify potential issues with feature consistency between the batch training path and a future online inference path.
* **Chapter 7 (The Experimental Kitchen – Model Development & Iteration):**
  * *Project:*
    * Set up Weights & Biases (W&B) for experiment tracking.
    * Train baseline models (e.g., keyword-based or simple logistic regression on TF-IDF for genre classification).
    * Develop and train the XGBoost model for genre classification.
    * Develop scripts to fine-tune a pre-trained BERT model (e.g., from Hugging Face Transformers) for genre classification on a sample of the data.
    * Track all experiments (parameters, metrics, code versions) in W&B.
    * Perform hyperparameter tuning for XGBoost and/or BERT.
* **Chapter 8 (Standardizing the Signature Dish – Building Scalable Training Pipelines):**
  * *Project:*
    * Refactor the XGBoost/BERT training scripts into modular, production-ready Python code.
    * Design and implement the Airflow DAG for the Model Training Pipeline (Data Loading -> Feature Engineering -> Training -> Evaluation -> Registration).
    * Integrate W&B for tracking automated training runs.
    * Set up a CI workflow (GitHub Actions) for testing the training pipeline code.
* **Chapter 9 (The Head Chef's Approval – Rigorous Offline Model Evaluation & Validation):**
  * *Project:*
    * Implement comprehensive model evaluation steps within the training pipeline DAG (calculating Macro F1 and per-genre Precision/Recall).
    * Perform slice-based evaluation on important data segments (e.g., movies vs. TV shows).
    * Register the validated XGBoost/BERT model versions and their metrics in the W&B Model Registry.
    * Create a Model Card for the best-performing educational model.
* **Chapter 10 (Grand Opening – Model Deployment Strategies & Serving Infrastructure):**
  * *Project:*
    * Develop the FastAPI backend service with endpoints for:
      * Serving genres from the trained XGBoost/BERT model (educational path).
      * Integrating with the chosen LLM API to get genre, summary, score, and tags.
    * Package the FastAPI application with Docker.
    * Write Terraform scripts to define and deploy the FastAPI service to AWS App Runner (for both Staging and Prod environments).
    * Set up CI/CD using GitHub Actions to build and deploy the FastAPI service.
* **Chapter 11 (Listening to the Diners – Production Monitoring & Observability for ML Systems):**
  * *Project:*
    * Set up AWS CloudWatch for monitoring the App Runner service (FastAPI).
    * Implement structured logging within the FastAPI application.
    * Design a conceptual process for using EvidentlyAI/WhyLogs to generate drift reports on LLM input/output and store them in S3.
    * Set up basic Grafana dashboards (conceptual) to visualize key operational and LLM output metrics.
    * Configure CloudWatch Alarms based on critical FastAPI metrics or the presence of drift reports.
* **Chapter 12 (Refining the Menu – Continual Learning & Production Testing for Model Evolution):**
  * *Project:*
    * Define triggers for retraining the educational XGBoost/BERT model (e.g., based on new data from the Data Ingestion Pipeline).
    * Update the Airflow Training Pipeline DAG to handle retraining logic.
    * Design conceptual A/B testing (e.g., using App Runner's traffic splitting) to compare a new version of the XGBoost/BERT model or a new LLM prompt.
* **Chapter 13 (Running a World-Class Establishment – Governance, Ethics & The Human Element in MLOps):**
  * *Project:*
    * Review the "Trending Now" project's MLOps setup for governance (auditability of pipeline runs via Airflow & W&B, data lineage via DVC).
    * Discuss ethical considerations for the "Trending Now" app: potential biases in scraped genre data, fairness in how "vibes" or scores are generated by the LLM, and user data privacy related to review content.
    * Reflect on the team collaboration aspects if this were a multi-person project.
    * Consider potential UX improvements for presenting LLM-generated insights responsibly.

---
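To make the plan above more concrete, the Chapter 5 item "implement initial data validation checks ... as Python functions" could be sketched as a plain function over a pandas DataFrame. The schema below (`title`, `release_year`, `plot_summary`, `genres`) is an illustrative stand-in, not the project's real processed-data schema:

```python
import pandas as pd

# Hypothetical schema for processed movie/review records (illustrative only).
EXPECTED_SCHEMA = {
    "title": "object",
    "release_year": "int64",
    "plot_summary": "object",
    "genres": "object",  # list of genre strings per record
}

def validate_batch(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable validation errors (empty list = pass)."""
    errors = []
    # Check that every expected field is present with the expected dtype.
    for col, dtype in EXPECTED_SCHEMA.items():
        if col not in df.columns:
            errors.append(f"missing column: {col}")
        elif str(df[col].dtype) != dtype:
            errors.append(f"{col}: expected {dtype}, got {df[col].dtype}")
    # Reject batches with null titles, since titles key the downstream joins.
    if "title" in df.columns and df["title"].isna().any():
        errors.append("null values in 'title'")
    return errors

batch = pd.DataFrame({
    "title": ["Dune"],
    "release_year": [2021],
    "plot_summary": ["A noble family becomes embroiled in a war..."],
    "genres": [["sci-fi", "adventure"]],
})
print(validate_batch(batch))  # an empty list: the batch passes
```

In the pipeline, a non-empty error list would fail the Airflow task before the Store/Version steps run.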
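The Chapter 7 baseline (simple logistic regression on TF-IDF for genre classification) might look like the following Scikit-learn pipeline. The plot summaries and genre labels are toy data invented for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data standing in for the scraped plot summaries.
plots = [
    "A detective hunts a serial killer through a rain-soaked city.",
    "Two friends open a bakery and chaos ensues.",
    "An astronaut is stranded on a distant planet.",
    "A stand-up comedian bombs on stage and finds love.",
]
genres = ["thriller", "comedy", "sci-fi", "comedy"]

# TF-IDF turns each summary into a sparse term-weight vector;
# the classifier then learns per-genre weights over those terms.
baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
baseline.fit(plots, genres)

# Most likely "comedy", given the vocabulary overlap with the comedy examples.
print(baseline.predict(["A comedian opens a bakery with friends."]))
```

Because vectorizer and classifier live in one pipeline object, the same artifact can later be logged to W&B and loaded unchanged by the serving path, which helps with the train/serve feature-consistency concern raised in Chapter 6.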
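The Chapter 9 slice-based evaluation step (Macro F1 overall and per segment, e.g., movies vs. TV shows) can also be sketched with Scikit-learn. Labels and slice values here are toy data:

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy predictions for the genre task; labels are illustrative.
y_true = np.array(["drama", "comedy", "drama", "horror", "comedy", "drama"])
y_pred = np.array(["drama", "comedy", "horror", "horror", "drama", "drama"])
# Slice membership: whether each record is a movie or a TV show.
content_type = np.array(["movie", "movie", "tv", "tv", "movie", "tv"])

def evaluate_slices(y_true, y_pred, slices):
    """Compute Macro F1 overall and for each data slice."""
    report = {"overall": f1_score(y_true, y_pred, average="macro")}
    for name in np.unique(slices):
        mask = slices == name
        report[name] = f1_score(y_true[mask], y_pred[mask], average="macro")
    return report

print(evaluate_slices(y_true, y_pred, content_type))
```

In the training pipeline DAG, this report would be logged to W&B alongside the run, and a large gap between slices (e.g., strong on movies, weak on TV shows) would block model registration even if the overall Macro F1 looks acceptable.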