### Section 3.4: Environment Strategy (Dev, Staging, Prod) (Organizing Our Kitchen Stations)

We adopt a standard three-environment strategy to ensure stability and quality.

* **3.4.1 Purpose and Configuration Goals for Each Environment** (a minimal configuration-resolution sketch appears at the end of this section)
    * **Development (Dev):**
        * *Purpose:* For developers to write and test code locally or in personal cloud workspaces. The focus is iteration speed and individual productivity.
        * *Configuration:* Local machines (with Docker for consistency) or cloud-based IDEs (GitHub Codespaces, AWS Cloud9, SageMaker Studio). Access to *sampled, anonymized, or synthetic data* only. Minimal resources. Uses `feature` branches.
    * **Staging (Pre-Production):**
        * *Purpose:* To test code changes in an environment that *mirrors production* before deploying to live users. The focus is integration, end-to-end testing, and performance validation.
        * *Configuration:* Dedicated AWS account. Infrastructure managed by Terraform as an identical or scaled-down version of Prod. Deploys from the `main` branch after PR merge. Uses *staging-specific data sources* (e.g., a separate S3 bucket with a larger, more realistic dataset than Dev, but not live Prod data). Runs full integration tests and load tests.
    * **Production (Prod):**
        * *Purpose:* To serve live user traffic. The focus is stability, reliability, performance, and security.
        * *Configuration:* Dedicated AWS account. Infrastructure managed by Terraform. Deploys from the `main` branch after successful Staging validation and manual approval. Uses *live production data sources*. Comprehensive monitoring and alerting.
* **3.4.2 Data Access Strategy and Permissions Across Environments**
    * **Dev:** Read-only access to specific, small, and potentially anonymized/synthetic datasets (e.g., a sample of the S3 data). No access to production databases or sensitive user data.
    * **Staging:** Read-only access to dedicated staging data sources that mimic production data structure and volume but are not live production data. This might be a regularly refreshed, sanitized snapshot of production data or a large, curated test dataset (see the snapshot-refresh sketch below).
    * **Prod:**
        * *Data Ingestion Pipeline:* Read access to raw data sources (scraping targets, APIs). Write access to its S3 processed-data bucket.
        * *Training Pipeline:* Read access to processed data in S3 (Prod). Write access to the Model Registry (W&B) and artifact stores.
        * *Inference Pipeline (LLM path):* Read access to processed data in S3 (Prod). Write access to the enriched data store for the FastAPI backend.
        * *FastAPI Backend:* Read access to its enriched data store. No direct write access to core data pipelines; it writes only to its own logs.
        * *IAM Roles:* Define specific IAM roles for each pipeline/service within each environment to enforce least privilege (see the IAM sketch below).
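To make the per-environment configuration in 3.4.1 concrete, here is a minimal sketch of how a service might resolve its data sources from an environment variable. The `APP_ENV` convention and all bucket names are illustrative assumptions, not values defined by this document.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvConfig:
    """Resolved settings for one environment."""
    name: str
    data_bucket: str    # S3 bucket this environment is allowed to read
    sampled_data: bool  # whether the data is a sampled/synthetic subset


# Hypothetical bucket names; in practice these would come from
# Terraform outputs or a parameter store rather than being hard-coded.
_CONFIGS = {
    "dev": EnvConfig("dev", "my-project-dev-sample-data", sampled_data=True),
    "staging": EnvConfig("staging", "my-project-staging-data", sampled_data=False),
    "prod": EnvConfig("prod", "my-project-prod-processed-data", sampled_data=False),
}


def load_config() -> EnvConfig:
    """Pick the config for the current environment (APP_ENV, defaulting to dev)."""
    env = os.environ.get("APP_ENV", "dev")
    try:
        return _CONFIGS[env]
    except KeyError:
        raise ValueError(f"Unknown environment: {env!r}") from None
```

Keeping the mapping in one place means a misconfigured service fails fast on an unknown environment instead of silently reading the wrong bucket.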
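The "regularly refreshed, sanitized snapshot" mentioned for Staging could be produced by a scheduled job along these lines. The bucket names are placeholders, and the `sanitize` step is a stub whose real logic depends on the data schema.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket names.
PROD_BUCKET = "my-project-prod-processed-data"
STAGING_BUCKET = "my-project-staging-data"


def sanitize(record: bytes) -> bytes:
    """Placeholder: mask or strip any sensitive fields before the copy."""
    return record  # real masking logic depends on the data schema


def refresh_staging_snapshot(prefix: str = "datasets/") -> None:
    """Copy a sanitized snapshot of prod data into the staging bucket."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=PROD_BUCKET, Prefix=prefix):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=PROD_BUCKET, Key=obj["Key"])["Body"].read()
            s3.put_object(Bucket=STAGING_BUCKET, Key=obj["Key"], Body=sanitize(body))
```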
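Finally, a sketch of the least-privilege idea from 3.4.2: a role for the Prod training pipeline that can only read the processed-data bucket. The role name, bucket ARN, and SageMaker trust principal are assumptions for illustration, and in practice this policy would live in Terraform with the rest of the infrastructure; boto3 is used here only to keep the example in Python.

```python
import json
import boto3

iam = boto3.client("iam")

# Hypothetical names/ARNs: substitute the real Terraform-managed values.
ROLE_NAME = "prod-training-pipeline"
PROCESSED_DATA_ARN = "arn:aws:s3:::my-project-prod-processed-data"

# Trust policy, assuming training jobs run on SageMaker.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "sagemaker.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Permissions: read-only on the processed-data bucket, nothing else.
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [PROCESSED_DATA_ARN, f"{PROCESSED_DATA_ARN}/*"],
    }],
}

iam.create_role(
    RoleName=ROLE_NAME,
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
iam.put_role_policy(
    RoleName=ROLE_NAME,
    PolicyName="read-processed-data",
    PolicyDocument=json.dumps(read_only_policy),
)
```

---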