Config Management¶
¶
Section 3.3: Configuration and Secrets Management Strategy (Securing Recipes & Special Ingredients)**¶
3.3.1 Why Robust Config & Secrets Management is Crucial in MLOps
Security (protecting API keys, database credentials).
Reproducibility (tracking exact configurations used for runs).
Environment Management (different settings for Dev, Staging, Prod).
Collaboration (sharing non-sensitive configs safely).
3.3.2 Types of Configurations in an ML Project
Data Sources: Paths to S3 buckets, database connection strings (excluding credentials).
Feature Engineering Parameters: Binning strategies, embedding dimensions, list of features to use.
Model Training Hyperparameters: Learning rate, batch size, number of epochs, model architecture details.
Pipeline Parameters: Instance types for jobs, resource allocations, trigger schedules.
Infrastructure Settings: VPC IDs, subnet IDs, security group IDs (managed by IaC but might be referenced).
API Endpoints: URLs for external services (e.g., LLM provider).
Secrets: Database passwords, API keys (LLM, Cloud provider services), private certificates.
3.3.3 Common Approaches to Configuration Management
Configuration Files (e.g., YAML, JSON, TOML, INI):
Pros: Human-readable, easy to edit, commonly supported by libraries, good for version control (Git).
Cons: Can become unwieldy for complex projects, managing environment-specific overrides needs a strategy.
Strategy: Use base config files and environment-specific override files (e.g.,
config_base.yaml
,config_staging.yaml
,config_prod.yaml
). Load base then merge environment-specific.
Environment Variables:
Pros: Standard way to pass configs in containerized environments (Docker, Kubernetes) and CI/CD systems. Easy to set dynamically.
Cons: Not ideal for complex/nested structures. Managing many variables can be cumbersome. Less auditable directly within the application codebase if not explicitly loaded from a file.
Dedicated Config Management Tools (e.g., HashiCorp Consul, AWS AppConfig):
Pros: Centralized management, dynamic updates without redeployment, versioning, access control.
Cons: Adds another tool to the stack, can be overkill for simpler projects.
3.3.4 Best Practices for Managing Secrets
NEVER commit secrets directly to Git.
Use
.env
files for LOCAL development ONLY, and ensure.env
is in.gitignore
.Secrets Management Services (The Secure Ingredient Lockbox):
Cloud-Native: AWS Secrets Manager, Google Secret Manager, Azure Key Vault.
Third-Party: HashiCorp Vault.
How they work: Store secrets encrypted. Applications/Pipelines fetch secrets at runtime using IAM roles/service accounts with appropriate permissions.
Injecting Secrets into Pipelines/Applications:
CI/CD systems (e.g., GitHub Actions Secrets) can securely inject secrets as environment variables into build/deployment steps.
Orchestrators (e.g., Airflow Connections, Kubernetes Secrets) can manage secrets for pipeline tasks.
Applications (e.g., FastAPI service) fetch from secrets manager at startup or per request (with caching).
3.3.5 Configuration and Secrets Strategy for “Trending Now”
Non-Sensitive Configurations:
Use YAML files stored in the
mlops/config/
directory.Example:
config_base.yaml
for common settings.config_dev.yaml
,config_staging.yaml
,config_prod.yaml
for environment-specific overrides (e.g., S3 bucket names, Airflow connection IDs, LLM model choice).These will be version-controlled with Git.
Pipelines and applications will load the appropriate config based on an environment variable (e.g.,
APP_ENV=staging
).
Secrets Management:
Local Development (Dev): Use a
.env
file (added to.gitignore
) to store API keys (LLM provider, AWS keys for local DVC/S3 interaction if needed). The application/scripts will load from this.env
file ifAPP_ENV=dev
.Staging & Production:
LLM API Key: Store in AWS Secrets Manager.
(If needed) Database credentials for Airflow metadata DB (if self-hosted on EC2): Store in AWS Secrets Manager.
AWS Service credentials (for S3, App Runner, etc.): Handled via IAM Roles attached to the EC2 instances (for Airflow workers/scheduler) or App Runner service. This is the preferred method for AWS service-to-service communication.
GitHub Actions Secrets: Used to store AWS credentials needed for Terraform to deploy infrastructure and for Airflow/App Runner to pull from ECR if using private images.
Loading Configs in the Application/Pipelines:
Python scripts (in Airflow tasks, FastAPI) will use a helper function to:
Load
config_base.yaml
.Identify current environment (from
APP_ENV
environment variable).Load and merge the corresponding
config_<env>.yaml
.If
APP_ENV=dev
, load secrets from.env
.For other environments, fetch necessary secrets from AWS Secrets Manager using
boto3
and the IAM role associated with the execution context.