Project Management for MLOps

Appendix B: Managing the ML Kitchen – Project Management for MLOps

B.1 Introduction: From Recipe Ideas to a Coordinated Culinary Service

  • Just as a restaurant needs more than chefs and ingredients (it needs managers, schedules, order systems, and ways to track progress), an ML project needs more than just data scientists and code.

  • This appendix provides an overview of common project management methodologies and tools used in the tech industry, illustrating how complex projects like our “Trending Now” application are organized, tracked, and delivered.

  • While our study guide focuses on the MLOps technical lifecycle, understanding these management frameworks is crucial for real-world success.

B.2 Why Project Management in MLOps?

  • Handling Complexity: MLOps projects are multi-faceted, involving diverse tasks, skills, and dependencies.

  • Iterative Nature: ML is experimental; project plans need to adapt to new learnings.

  • Stakeholder Alignment: Keeping product, business, and technical teams aligned.

  • Visibility & Tracking: Ensuring everyone knows what’s being worked on, progress, and blockers.

  • Resource Allocation: Managing time, budget, and personnel effectively.

B.3 Core Project Management Methodologies

  • B.3.1 Waterfall (Briefly Mention as a Contrast)

    • Linear, sequential approach.

    • Often ill-suited for the experimental nature of ML.

  • B.3.2 Agile Principles (The Flexible Kitchen Philosophy)

    • Core values: Individuals and interactions over processes and tools, working software over comprehensive documentation, customer collaboration over contract negotiation, responding to change over following a plan.

    • Iterative development, incremental delivery.

    • Emphasis on continuous feedback and adaptation.

  • B.3.3 Scrum Framework (A Popular Agile Implementation)

    • Core Concepts:

      • Sprints: Short, time-boxed iterations (e.g., 1-4 weeks) where a usable increment of the product is built.

      • Product Backlog: A prioritized list of all desired features/work for the product.

      • Sprint Backlog: A subset of Product Backlog items selected for a specific Sprint.

      • Daily Scrum (Stand-up): Short daily meeting to sync progress, identify blockers.

      • Sprint Review: Demo of the work completed during the Sprint.

      • Sprint Retrospective: Team reflection on what went well, what to improve.

    • Scrum Roles (The Key Kitchen Staff):

      • Product Owner (The Head Chef/Menu Designer): Represents the customer/business. Defines and prioritizes the Product Backlog. Maximizes the value of the product.

      • Scrum Master (The Kitchen Manager/Facilitator): Ensures the team adheres to Scrum principles. Removes impediments. Facilitates Scrum events.

      • Development Team (The Cooks & Prep Staff): Cross-functional group that builds the product increment (e.g., Data Scientists, ML Engineers, Backend/Frontend Devs, MLOps Engineers). Self-organizing.

  • B.3.4 Kanban (Visualizing the Workflow)

    • Focuses on visualizing work, limiting Work In Progress (WIP), and maximizing flow.

    • Uses a Kanban Board with columns representing stages of work (e.g., To Do, In Progress, In Review, Done).

    • Work items (tasks/tickets) move across the board.

    • Emphasis on continuous flow rather than fixed sprints (though can be used with Scrum).

B.4 Common Project Management Tools (The Kitchen’s Order & Tracking System)

  • B.4.1 Issue Trackers (e.g., Jira, Trello, Asana, GitHub Issues)

    • Purpose: To create, assign, track, and manage work items (Epics, User Stories, Tasks, Bugs).

    • Jira: Very popular, powerful, highly customizable. Supports Scrum and Kanban. Can create different issue types, workflows, and reports.

    • Trello/Asana: Simpler, often more visual, good for Kanban-style boards.

    • GitHub Issues: Integrated with code repositories, good for development-focused tasks.

  • B.4.2 Kanban Boards (Digital or Physical)

    • Visual representation of the workflow, often part of issue tracking tools.

  • B.4.3 Wiki/Documentation Tools (e.g., Confluence, Notion, Google Docs, GitHub Wiki)

    • For storing PRDs, technical designs, meeting notes, retrospectives, user guides.

  • B.4.4 Communication Tools (e.g., Slack, Microsoft Teams)

    • For daily communication, quick updates, and team discussions.

B.5 Applying Project Management to the “Trending Now” MLOps Project (Conceptual)

This section will illustrate how we could manage the development of our “Trending Now” app if it were a real team project.

  • B.5.1 Team Roles & Personas (Conceptual for “Trending Now”)

    • Product Owner (Hypothetical): Defines features for the “Trending Now” app, prioritizes what gets built (e.g., “LLM Summaries” before “Vibe Search”).

    • Scrum Master (Hypothetical): Facilitates planning, ensures the “team” stays on track.

    • Development Team (The “Trending Now” Guide Crafters - Us!):

      • MLOps Lead/Developer(s): Responsible for designing and explaining the MLOps pipelines, backend API.

      • Frontend Developer(s) (Conceptual): Responsible for the D3.js visualization and UI (though guide won’t detail this).

      • (If we had SMEs): Domain experts for movie/TV show data, review analysis.

  • B.5.2 Structuring Work: Epics and User Stories (Examples for “Trending Now”)

    • Epic: “Implement LLM-Powered Content Enrichment”

      • User Story 1: “As a user, I want to see a concise summary of reviews for a movie so that I can quickly decide if I’m interested.” (Leads to LLM summarization task).

      • User Story 2: “As a user, I want to see a vibe score for a movie so that I can gauge its overall sentiment from reviews.” (Leads to LLM score generation task).

      • User Story 3: “As a user, I want to see vibe tags for a movie so that I can understand its themes/mood quickly.” (Leads to LLM tag generation task).

    • Epic: “Develop Data Ingestion Pipeline”

      • User Story: “As an MLOps system, I need to scrape new movie releases daily so that content is up-to-date.”

  • B.5.3 Sprint Planning (Example Sprints for “Trending Now”)

    • This would be a conceptual breakdown of how the project could be divided into sprints. Each sprint would deliver a functional increment of the MLOps system or application.

    • Sprint 1: Foundational Setup

      • Task: Finalize PRD, User Flow, Tech Stack (Content of our Chapter 3).

      • Task: Set up Git repository, basic project structure.

      • Task: Initial data source exploration & sample scraping script.

    • Sprint 2: Data Ingestion Pipeline V1

      • Task: Implement robust scraping for movies & reviews.

      • Task: Implement data cleaning and storage in S3/Parquet.

      • Task: Set up DVC for data versioning.

      • Task: Create initial Airflow DAG for data ingestion.

    • Sprint X: LLM Inference Pipeline

      • Task: Develop FastAPI endpoint for LLM genre classification.

      • Task: Integrate LLM for review summarization.

      • Task: Implement LLM for vibe score generation.

      • Task: Implement LLM for vibe tag generation.

      • Task: Store enriched data for backend.

    • Sprint Y: Frontend Visualization - Basic Bubbles

      • Task: Basic HTML/CSS structure.

      • Task: Fetch data from FastAPI.

      • Task: Initial D3.js bubble chart (default view, size by score).

    • (And so on, mapping to the chapter implementation plan)

  • B.5.4 Kanban Board Example for “Trending Now”

    • A visual representation of a Kanban board with columns:

      • Project Backlog (Derived from PRD & Implementation Plan)

      • Sprint Backlog (Tasks for the current conceptual sprint)

      • In Progress

      • In Review (Code review, QA)

      • Ready for Staging Deployment

      • In Staging

      • Ready for Prod Deployment

      • In Production / Done

    • Example tickets/cards on the board: “Develop scraping script for Rotten Tomatoes,” “Implement LLM summarization endpoint,” “Create Terraform for App Runner Staging.”

B.6 Challenges in Applying Agile/Scrum to ML Projects

  • Uncertainty of Research/Experimentation: Difficult to estimate story points or commit to sprint goals for ML model development tasks where outcomes are unknown.

  • Defining “Done”: When is an ML model “done”? Accuracy can always be improved.

  • Long Training Times: Can conflict with short sprint cycles.

  • Solution: Flexible sprint planning, focus on delivering learning or validated hypotheses as sprint outcomes for research tasks, time-boxing experiments. Separate tracks for research vs. engineering if needed.

B.7 Conclusion: Orchestrating the Culinary Symphony

  • While the technical aspects of MLOps are vital, effective project management is the conductor that ensures all parts of the “ML Kitchen” work in harmony to deliver value.

  • Agile methodologies like Scrum, coupled with tools like Jira and Kanban, provide frameworks for managing the iterative, complex, and collaborative nature of MLOps projects.

  • By understanding these principles, MLOps leads and teams can better plan, execute, and adapt, ultimately increasing the chances of creating successful and impactful ML-powered applications.