Config Management¶

¶

Section 3.3: Configuration and Secrets Management Strategy (Securing Recipes & Special Ingredients)**¶

3.3.1 Why Robust Config & Secrets Management is Crucial in MLOps
- Security (protecting API keys, database credentials).
- Reproducibility (tracking exact configurations used for runs).
- Environment Management (different settings for Dev, Staging, Prod).
- Collaboration (sharing non-sensitive configs safely).
3.3.2 Types of Configurations in an ML Project
- Data Sources: Paths to S3 buckets, database connection strings (excluding credentials).
- Feature Engineering Parameters: Binning strategies, embedding dimensions, list of features to use.
- Model Training Hyperparameters: Learning rate, batch size, number of epochs, model architecture details.
- Pipeline Parameters: Instance types for jobs, resource allocations, trigger schedules.
- Infrastructure Settings: VPC IDs, subnet IDs, security group IDs (managed by IaC but might be referenced).
- API Endpoints: URLs for external services (e.g., LLM provider).
- Secrets: Database passwords, API keys (LLM, Cloud provider services), private certificates.
3.3.3 Common Approaches to Configuration Management
- Configuration Files (e.g., YAML, JSON, TOML, INI):
  - Pros: Human-readable, easy to edit, commonly supported by libraries, good for version control (Git).
  - Cons: Can become unwieldy for complex projects, managing environment-specific overrides needs a strategy.
  - Strategy: Use base config files and environment-specific override files (e.g., config_base.yaml, config_staging.yaml, config_prod.yaml). Load base then merge environment-specific.
- Environment Variables:
  - Pros: Standard way to pass configs in containerized environments (Docker, Kubernetes) and CI/CD systems. Easy to set dynamically.
  - Cons: Not ideal for complex/nested structures. Managing many variables can be cumbersome. Less auditable directly within the application codebase if not explicitly loaded from a file.
- Dedicated Config Management Tools (e.g., HashiCorp Consul, AWS AppConfig):
  - Pros: Centralized management, dynamic updates without redeployment, versioning, access control.
  - Cons: Adds another tool to the stack, can be overkill for simpler projects.
3.3.4 Best Practices for Managing Secrets
- NEVER commit secrets directly to Git.
- Use .env files for LOCAL development ONLY, and ensure .env is in .gitignore.
- Secrets Management Services (The Secure Ingredient Lockbox):
  - Cloud-Native: AWS Secrets Manager, Google Secret Manager, Azure Key Vault.
  - Third-Party: HashiCorp Vault.
  - How they work: Store secrets encrypted. Applications/Pipelines fetch secrets at runtime using IAM roles/service accounts with appropriate permissions.
- Injecting Secrets into Pipelines/Applications:
  - CI/CD systems (e.g., GitHub Actions Secrets) can securely inject secrets as environment variables into build/deployment steps.
  - Orchestrators (e.g., Airflow Connections, Kubernetes Secrets) can manage secrets for pipeline tasks.
  - Applications (e.g., FastAPI service) fetch from secrets manager at startup or per request (with caching).
3.3.5 Configuration and Secrets Strategy for “Trending Now”
- Non-Sensitive Configurations:
  - Use YAML files stored in the mlops/config/ directory.
  - Example: config_base.yaml for common settings.
  - config_dev.yaml, config_staging.yaml, config_prod.yaml for environment-specific overrides (e.g., S3 bucket names, Airflow connection IDs, LLM model choice).
  - These will be version-controlled with Git.
  - Pipelines and applications will load the appropriate config based on an environment variable (e.g., APP_ENV=staging).
- Secrets Management:
  - Local Development (Dev): Use a .env file (added to .gitignore) to store API keys (LLM provider, AWS keys for local DVC/S3 interaction if needed). The application/scripts will load from this .env file if APP_ENV=dev.
  - Staging & Production:
    - LLM API Key: Store in AWS Secrets Manager.
    - (If needed) Database credentials for Airflow metadata DB (if self-hosted on EC2): Store in AWS Secrets Manager.
    - AWS Service credentials (for S3, App Runner, etc.): Handled via IAM Roles attached to the EC2 instances (for Airflow workers/scheduler) or App Runner service. This is the preferred method for AWS service-to-service communication.
    - GitHub Actions Secrets: Used to store AWS credentials needed for Terraform to deploy infrastructure and for Airflow/App Runner to pull from ECR if using private images.
- Loading Configs in the Application/Pipelines:
  - Python scripts (in Airflow tasks, FastAPI) will use a helper function to:
    1. Load config_base.yaml.
    2. Identify current environment (from APP_ENV environment variable).
    3. Load and merge the corresponding config_<env>.yaml.
    4. If APP_ENV=dev, load secrets from .env.
    5. For other environments, fetch necessary secrets from AWS Secrets Manager using boto3 and the IAM role associated with the execution context.