Energy Demand Forecasting in Time Series IoT Data


TL;DR: ML-Powered Energy Demand Forecasting for Smart Buildings

  • Challenge: The company needed to forecast building-level energy demand 24-72 hours in advance to provide valuable data to energy suppliers for grid balancing and to enable a new resident-facing feature for optimizing solar energy self-consumption. The system had to be accurate, reliable, and capable of handling complex factors like weather and holidays.

  • My Role & Solution: I led the development and operationalization of the forecasting system, serving as the Data Scientist and MLOps Engineer. My key contributions were:

    • Strategic Approach: I designed a model development strategy that began with strong baselines (like SARIMAX and Prophet) and iteratively progressed to a high-performance XGBoost model. I established that forecasting at the building-level was the most effective initial approach, balancing accuracy with computational feasibility.

    • Feature Engineering: I engineered a rich feature set crucial for forecast accuracy. This involved creating time-series features (lags, rolling windows), incorporating calendar events (holidays), and building a robust pipeline to process and align future weather forecast data with historical consumption patterns.

    • MLOps Infrastructure: Using Terraform, I built a fully automated, serverless MLOps pipeline on AWS. The architecture included a distinct training workflow orchestrated by Step Functions for monthly model retraining, and a separate daily inference workflow that generates and stores forecasts.

    • Production Lifecycle Management: I implemented the end-to-end system, including a containerized (Docker/ECR) forecasting model, versioning and governance via SageMaker Model Registry, and a CI/CD pipeline in Bitbucket for automated testing and deployment. The solution included a scalable serving layer using Amazon Timestream, API Gateway, and Lambda to deliver forecasts to both B2B and B2C consumers.

  • Impact: The forecasting system became a key data product for the company, unlocking new value for both business partners and residents.

    • Achieved a <10% Mean Absolute Percentage Error (MAPE) on 24-hour ahead building-level forecasts, providing reliable data to energy suppliers.

    • Enabled the launch of the “Smart Energy Advisor” feature in the resident app, leading to a 15% increase in user engagement with energy management tools. This, in turn, drove a measured ~10% increase in the building’s solar self-consumption rate by empowering residents to align appliance usage with peak solar generation periods.

  • System Architecture: I designed and implemented the complete AWS solution, from data ingestion to API serving, ensuring scalability and automation.

Introduction

Purpose

This document provides detailed technical information about the Machine Learning (ML) based Energy Demand Forecasting (EDF) system. It serves as a guide for developers, MLOps engineers, and operations teams.

Business Goal

The primary goals of the EDF system are to:

  • Provide accurate short-term (e.g., 24-72 hours) aggregated energy demand forecasts at the building level to external energy suppliers and Distribution System Operators (DSOs) for improved grid balancing, energy procurement, and network management.

  • Empower residents by providing insights into predicted building energy consumption versus predicted solar generation, enabling them to optimize appliance usage for increased solar self-consumption and potential cost savings.

Scope

This documentation details the end-to-end pipelines for data processing, model training, model evaluation, model registration, batch inference (forecasting), forecast storage, and the conceptual API serving layer. It assumes the existence of a separate data ingestion pipeline providing the necessary raw data feeds.

Key Technologies

  • Cloud Platform: AWS (Amazon Web Services)

  • Data Lake: Amazon S3

  • Data Processing/ETL: AWS Glue (PySpark), AWS Lambda (Python), SageMaker Processing Jobs (PySpark/Python)

  • Feature Management: Shared Code Libraries (Primary), Amazon SageMaker Feature Store (Optional for historical features)

  • Model Training: Amazon SageMaker Training Jobs (using custom Docker containers with Prophet, XGBoost, etc.)

  • Model Forecasting: Amazon SageMaker Processing Jobs (running forecast generation scripts)

  • Model Registry: Amazon SageMaker Model Registry

  • Orchestration: AWS Step Functions

  • Scheduling: Amazon EventBridge Scheduler

  • Forecast Storage: Amazon Timestream (Primary Example), Amazon RDS, Amazon DynamoDB

  • API Layer: Amazon API Gateway, AWS Lambda

  • Infrastructure as Code: Terraform

  • CI/CD: Bitbucket Pipelines

  • Containerization: Docker

  • Core Libraries: PySpark, Pandas, Scikit-learn, Boto3, PyYAML, Prophet, XGBoost, AWS Data Wrangler.

Table of Contents

  1. Introduction

  2. Discovery and Scoping

  3. System Architecture

  4. Configuration Management

  5. Infrastructure as Code (Terraform)

  6. CI/CD Pipeline (Bitbucket)

  7. Deployment & Execution

  8. Monitoring & Alerting

  9. Troubleshooting Guide

  10. Security Considerations

  11. Roadmap & Future Enhancements

  12. Appendices

Discovery and Scoping

Use Case Evaluation

Product Strategies

Features

Product Requirements Document

Development Stages

System Architecture

Overall Data Flow

The EDF system utilizes distinct, automated pipelines for training forecasting models and generating daily forecasts. It relies on processed historical data and external weather forecasts. Forecasts are stored in a time-series database and made available via APIs.

  1. Raw Data: Aggregated Consumption, Solar Generation, Historical Weather, Future Weather Forecasts, Calendar/Topology data land in S3 Raw Zone.

  2. Processed Data: Ingestion pipeline processes historical data into processed_edf_data in S3 Processed Zone / Glue Catalog.

  3. Features (Training): Training Feature Engineering step reads processed_edf_data, calculates historical time-series features (lags, windows, time components), splits train/eval sets, and writes them to S3.

  4. Features (Inference): Inference Feature Engineering step reads recent processed_edf_data and future raw weather forecasts, calculates features for the forecast horizon using shared logic, and writes them to S3.

  5. Model Artifacts: Training jobs save serialized forecast models (Prophet JSON, XGBoost UBJ) to S3.

  6. Evaluation Reports: Evaluation jobs save forecast metrics (MAPE, RMSE) to S3.

  7. Model Packages: Approved forecast models are registered in the EDF Model Package Group.

  8. Raw Forecasts: Inference forecast generation step writes raw forecast outputs (timestamp, building, yhat, yhat_lower, yhat_upper) to S3.

  9. Stored Forecasts: Load Forecasts step reads raw forecasts from S3 and writes them into the target database (e.g., Timestream).

  10. Served Forecasts: API Gateway and Lambda query the forecast database to serve B2B and B2C clients.

Forecasting Pipeline Description:

  1. Ingestion: Aggregated Consumption, Solar Generation, Topology/Calendar, and crucially, Forecasted Weather data are ingested into the Raw S3 Zone.

  2. Processing: A daily Glue ETL job aggregates data to the building level (if needed), joins it with weather forecasts and calendar info, and engineers features (lags, time features, weather interactions). Processed features are stored in S3 (and potentially SageMaker Feature Store for easier reuse/serving). The Glue Data Catalog is updated.

  3. Model Training: A Step Functions workflow orchestrates training, similar to the Anomaly Detection (AD) system, using appropriate forecasting models (ARIMA, Prophet, Gradient Boosting, etc.). Models are registered.

  4. Batch Inference: A daily Step Functions workflow prepares features for the forecast horizon (using actual latest data and future weather forecasts), runs SageMaker Batch Transform, and stores the resulting forecasts (e.g., hourly demand for next 72h) in S3.

  5. Results Storage: Raw forecasts are stored in S3. A subsequent job loads the forecasts into a database optimized for time-series queries or fast lookups (DynamoDB, RDS, or potentially Amazon Timestream).

  6. Serving:

    • B2B: A dedicated API Gateway endpoint uses Lambda to query the Forecast Database, perform necessary aggregation/anonymization, and return forecasts to authorized suppliers/DSOs.

    • B2C: The existing App/Tablet backend queries the Forecast Database (likely via another API Gateway endpoint) to get simplified data for resident visualization (comparing building demand vs. solar).

    • Analysts: Use Athena for ad-hoc analysis.

Training Workflow

Summary: Triggered by Schedule/Manual -> Validates Schema -> Engineers Historical Features (Train/Eval Sets) -> Trains Model (Prophet/XGBoost) -> Evaluates Model (MAPE, RMSE) -> Conditionally Registers Model (Pending Approval).

Step Function State Machine

  1. State: ValidateSchema (Optional)

    • Service: SM Processing / Glue Shell

    • Action: Validates schema of processed_edf_data.

  2. State: FeatureEngineeringTrainingEDF

    • Service: SM Processing Job (Spark) / Glue ETL

    • Action: Reads processed_edf_data for training range. Creates time features, lags (consumption, weather), rolling windows. Splits into train/eval feature sets (Parquet) written to S3.

  3. State: ModelTrainingEDF

    • Service: SM Training Job

    • Action: Reads training features from S3. Instantiates/fits chosen model (Prophet/XGBoost) based on input strategy and hyperparameters. Saves model artifact (prophet_model.json / xgboost_model.ubj + model_metadata.json) within model.tar.gz to S3.

  4. State: ModelEvaluationEDF

    • Service: SM Processing Job (Python + libs)

    • Action: Loads model artifact. Reads evaluation features from S3. Generates forecasts for eval period. Calculates MAPE, RMSE, MAE against actuals (from eval features). Writes evaluation_report_edf.json to S3 (see the metric sketch after this list).

  5. State: CheckEvaluationEDF (Choice)

    • Action: Compares metrics (e.g., MAPE, RMSE) against thresholds from config.

  6. State: RegisterModelEDF

    • Service: Lambda

    • Action: Creates Model Package in EDF group (PendingManualApproval), embedding metadata and evaluation results.

  7. Terminal States: WorkflowSucceeded, EvaluationFailedEDF, WorkflowFailed.
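For concreteness, here is a minimal sketch of the metric computation performed by ModelEvaluationEDF, assuming the evaluation feature set carries the actuals in consumption_kwh and the model output in yhat (the production script adds S3 I/O and error handling):

```python
import numpy as np
import pandas as pd

def compute_forecast_metrics(eval_df: pd.DataFrame) -> dict:
    """Compare model forecasts (yhat) against actuals from the eval feature set."""
    y_true = eval_df["consumption_kwh"].to_numpy(dtype=float)
    y_pred = eval_df["yhat"].to_numpy(dtype=float)
    nonzero = y_true != 0  # guard against division by zero in MAPE
    mape = float(np.mean(np.abs((y_true[nonzero] - y_pred[nonzero]) / y_true[nonzero])) * 100)
    rmse = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
    mae = float(np.mean(np.abs(y_true - y_pred)))
    return {"mape": mape, "rmse": rmse, "mae": mae}

# The resulting dict is serialized to evaluation_report_edf.json on S3 and
# compared against edf_evaluation.metrics_thresholds in the Choice state.
```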

Inference Workflow

Summary: Triggered Daily by Scheduler -> Gets Latest Approved Model -> Creates SM Model Resource -> Engineers Inference Features -> Generates Forecasts (SM Processing) -> Loads Forecasts into DB (Timestream/RDS).

  1. State: GetInferenceDate (Optional Lambda / Step Functions Context)

    • Action: Determines target forecast start date (e.g., “today” or “tomorrow” relative to execution time) and forecast end date based on horizon.

  2. State: GetApprovedEDFModelPackage

    • Service: Lambda

    • Action: Gets latest Approved Model Package ARN from EDF group.

  3. State: CreateEDFSageMakerModel

    • Service: Lambda

    • Action: Creates SM Model resource from the approved package ARN.

  4. State: FeatureEngineeringInferenceEDF

    • Service: SM Processing Job (Spark) / Glue ETL

    • Action: Reads recent historical processed_edf_data (for lags) AND future raw weather-forecast data. Creates feature set covering the forecast horizon (including future dates and weather). Writes inference features (Parquet) to S3.

  5. State: GenerateForecastsEDF

    • Service: SM Processing Job (Python + libs)

    • Action: Loads model artifact (mounted via SM Model resource). Reads inference features from S3. Calls model’s predict method to generate forecasts (yhat, yhat_lower, yhat_upper). Writes forecast results (Parquet/CSV) to S3.

  6. State: LoadForecastsToDB

    • Service: Lambda / Glue Python Shell

    • Action: Reads forecast results from S3. Formats data for target DB (Timestream example). Writes records to the database using batch operations; see the batch-write sketch after this list.

  7. Terminal States: WorkflowSucceeded, WorkflowFailed.
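A condensed sketch of the LoadForecastsToDB step for the Timestream case. The database/table names and multi-measure record layout are illustrative (the real values come from edf_config.yaml), and the actual Lambda first reads the Parquet forecast output from S3:

```python
import boto3

# Illustrative names; the deployed values come from edf_config.yaml.
DB_NAME = "EDFDatabase"
TABLE_NAME = "BuildingDemandForecasts"

ts_write = boto3.client("timestream-write")

def load_forecasts(rows):
    """rows: dicts with building_id, timestamp (epoch ms), yhat, yhat_lower, yhat_upper."""
    records = [
        {
            "Dimensions": [{"Name": "building_id", "Value": r["building_id"]}],
            "MeasureName": "demand_forecast",
            "MeasureValueType": "MULTI",
            "MeasureValues": [
                {"Name": name, "Value": str(r[name]), "Type": "DOUBLE"}
                for name in ("yhat", "yhat_lower", "yhat_upper")
            ],
            "Time": str(r["timestamp"]),  # epoch milliseconds (default TimeUnit)
        }
        for r in rows
    ]
    # Timestream accepts at most 100 records per WriteRecords call.
    for i in range(0, len(records), 100):
        ts_write.write_records(
            DatabaseName=DB_NAME, TableName=TABLE_NAME, Records=records[i : i + 100]
        )
```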

Forecast Serving

  • B2B API: API Gateway endpoint proxying to a Lambda function. Lambda queries the Forecast DB (e.g., Timestream) based on requested building_id and time_range. Requires authentication/authorization (e.g., API Keys, Cognito Authorizer); a query sketch follows below.

  • B2C API: Separate API Gateway endpoint/Lambda. Queries Forecast DB, potentially performs simple comparison with Solar forecast (if available), returns simplified data structure for UI visualization. Requires app user authentication.
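As an illustration of the B2B path, here is a minimal serving Lambda that queries the Timestream layout written above. Input validation, pagination, and the auth context are omitted, and the names are assumptions:

```python
import json
import boto3

ts_query = boto3.client("timestream-query")

def handler(event, context):
    p = event["queryStringParameters"]
    building_id, start, end = p["building_id"], p["start"], p["end"]
    # NOTE: sanitize these in production; Timestream queries are plain strings.
    query = f"""
        SELECT time, yhat, yhat_lower, yhat_upper
        FROM "EDFDatabase"."BuildingDemandForecasts"
        WHERE building_id = '{building_id}'
          AND time BETWEEN from_iso8601_timestamp('{start}')
                       AND from_iso8601_timestamp('{end}')
        ORDER BY time ASC
    """
    result = ts_query.query(QueryString=query)
    columns = [c["Name"] for c in result["ColumnInfo"]]
    rows = [
        dict(zip(columns, (d.get("ScalarValue") for d in r["Data"])))
        for r in result["Rows"]
    ]
    return {"statusCode": 200, "body": json.dumps(rows)}
```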

Model Development & Iteration

Models for Time Series Forecasting

Configuration Management

  • Uses config/edf_config.yaml version-controlled in Git.

  • File uploaded to S3 by CI/CD.

  • Scripts load config from S3 URI passed via environment variable/argument.

  • Includes sections for feature_engineering, training (with nested model hyperparameters), evaluation (thresholds), inference (schedule, instance types, DB config).

  • Secrets managed via AWS Secrets Manager / SSM Parameter Store.
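A minimal sketch of that loading pattern (the CONFIG_S3_URI variable name is an assumption; any mechanism that passes the S3 URI works):

```python
import os

import boto3
import yaml

def load_config() -> dict:
    """Fetch and parse the pipeline config from the S3 URI passed by the orchestrator."""
    uri = os.environ["CONFIG_S3_URI"]  # e.g. s3://<scripts-bucket>/config/edf_config.yaml
    bucket, key = uri.removeprefix("s3://").split("/", 1)
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    return yaml.safe_load(body)

config = load_config()
max_mape = config["edf_evaluation"]["metrics_thresholds"]["max_mape"]
```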

Infrastructure as Code (Terraform)

  • Manages all AWS resources for EDF pipelines.

  • Stacks:

    • training_edf: ECR Repo (edf-training-container), Model Package Group (EDFBuildingDemandForecaster), Lambda (RegisterEDFModelFunction), Step Function (EDFTrainingWorkflow), associated IAM roles. Optionally Feature Group (edf-building-features). Requires outputs from ingestion.

    • inference_edf: Timestream DB/Table (or RDS/DDB), Lambdas (GetModel, CreateModel - potentially reused; LoadForecasts), Step Function (EDFInferenceWorkflow), EventBridge Scheduler, API Gateway Endpoints/Lambdas (for serving), associated IAM roles. Requires outputs from ingestion and training_edf.

  • State: Remote backend (S3/DynamoDB) configured.

CI/CD Pipeline (Bitbucket)

  • Defined in bitbucket-pipelines.yml.

  • CI (Branches/PRs): Lints, runs ALL unit tests, builds EDF container, pushes to ECR (edf-training-container repo), validates ALL Terraform stacks.

  • EDF Training CD (custom:deploy-and-test-edf-training): Manual trigger. Deploys training_edf stack. Runs EDF training integration tests.

  • EDF Inference CD (custom:deploy-and-test-edf-inference): Manual trigger. Deploys inference_edf stack. Runs EDF inference integration tests (requires approved model).

Deployment & Execution

7.1 Prerequisites: Base AWS setup, Terraform, Docker, Python, Git, Bitbucket configuration, and a deployed ingestion stack.

7.2 Initial Deployment: Deploy the Terraform stacks (training_edf, inference_edf) after ingestion. Build and push the EDF container. Upload the initial edf_config.yaml. Ensure processed EDF data exists.

7.3 Running Training: Trigger the EDFTrainingWorkflow Step Function (scheduled or manual) with an appropriate input JSON (date range, model strategy, hyperparameters, image URI), for example:
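A sketch of a manual trigger via boto3; the ARN, account ID, and input keys are illustrative and depend on the deployed state machine definition:

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

# Hypothetical execution input; keys mirror what the training states consume.
execution_input = {
    "data_start_date": "2024-01-01",
    "data_end_date": "2024-09-30",
    "model_strategy": "XGBoost",  # or "Prophet"
    "hyperparameters": {"xgb_eta": 0.1, "xgb_max_depth": 5},
    "training_image_uri": "<account>.dkr.ecr.eu-central-1.amazonaws.com/edf-training-container:latest",
}

sfn.start_execution(
    stateMachineArn="arn:aws:states:eu-central-1:<account>:stateMachine:EDFTrainingWorkflow",
    input=json.dumps(execution_input),
)
```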

7.4 Running Inference: The EDFInferenceWorkflow runs automatically via EventBridge Scheduler. Ensure prerequisite data (processed history, weather forecasts) is available daily.

7.5 Model Approval: Manually review PendingManualApproval packages in the EDFBuildingDemandForecaster group and promote them to Approved based on evaluation metrics. This can be done in the SageMaker console or programmatically.
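A boto3 sketch of the programmatic path (the approval description is an example):

```python
import boto3

sm = boto3.client("sagemaker")

# Newest pending package in the EDF model package group.
pending = sm.list_model_packages(
    ModelPackageGroupName="EDFBuildingDemandForecaster",
    ModelApprovalStatus="PendingManualApproval",
    SortBy="CreationTime",
    SortOrder="Descending",
)["ModelPackageSummaryList"]

if pending:
    sm.update_model_package(
        ModelPackageArn=pending[0]["ModelPackageArn"],
        ModelApprovalStatus="Approved",
        ApprovalDescription="MAPE/RMSE within thresholds on latest evaluation",
    )
```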

Monitoring & Alerting

  • CloudWatch Logs: Monitor logs for EDF Step Functions, Lambdas, SageMaker Processing Jobs.

  • CloudWatch Metrics: Monitor SFN ExecutionsFailed, Lambda Errors/Duration, Processing Job CPU/Memory, Timestream/RDS metrics (if applicable), API Gateway Latency/4XX/5XX Errors.

  • Forecast Accuracy Tracking (CRITICAL): Implement a separate process (e.g., a scheduled Lambda/Glue job) to periodically compare stored forecasts against actual consumption data (loaded later) and calculate ongoing MAPE/RMSE. Log these metrics to CloudWatch or a dedicated monitoring dashboard; a sketch of such a job follows this list.

  • Data Pipeline Monitoring: Monitor success/failure of ingestion jobs providing raw data and the process_edf_data Glue job.

  • Weather Forecast API: Monitor availability and error rates of the external weather forecast API.

  • CloudWatch Alarms: Set alarms on: SFN Failures, Lambda Errors, Forecast Accuracy Degradation (MAPE/RMSE exceeding threshold), Weather API Fetch Failures, Target DB Write Errors.
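A minimal sketch of the accuracy-tracking job referenced above. Fetching the stored forecasts and the later-arriving actuals is elided, and the metric namespace and dimension names are assumptions:

```python
import boto3
import numpy as np

cloudwatch = boto3.client("cloudwatch")

def publish_accuracy(y_true: np.ndarray, y_pred: np.ndarray, building_id: str) -> None:
    """Emit rolling forecast-accuracy metrics so alarms can detect degradation."""
    nonzero = y_true != 0  # guard MAPE against division by zero
    mape = float(np.mean(np.abs((y_true[nonzero] - y_pred[nonzero]) / y_true[nonzero])) * 100)
    rmse = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
    cloudwatch.put_metric_data(
        Namespace="EDF/ForecastAccuracy",
        MetricData=[
            {"MetricName": "MAPE", "Value": mape, "Unit": "Percent",
             "Dimensions": [{"Name": "BuildingId", "Value": building_id}]},
            {"MetricName": "RMSE", "Value": rmse, "Unit": "None",
             "Dimensions": [{"Name": "BuildingId", "Value": building_id}]},
        ],
    )
```

A CloudWatch alarm on the MAPE metric then implements the "Forecast Accuracy Degradation" alert listed above.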

Estimated Monthly Costs

Assumptions:

  • Environment: AWS eu-central-1 (Frankfurt) region.

  • Buildings: 120.

  • Data Volume: Building-level aggregation results in smaller processed data sizes compared to the raw sensor data for AD.

    • Daily Processed EDF Data: ~10 MB.

    • Total Monthly Processed EDF Data: 10 MB/day * 30 days = ~300 MB.

  • Model Training Frequency: 1 time per month.

  • Batch Inference Frequency: 30 times per month (daily).

  • API Serving Layer:

    • B2B API (Suppliers): Low volume, high value. Assume 10,000 requests/month.

    • B2C API (Residents): Higher volume. Assume 3,000 apartments * 1 request/day * 30 days = 90,000 requests/month.

    • Total API Requests: ~100,000 per month.

| Pipeline Component | AWS Service(s) | Detailed Cost Calculation & Rationale | Estimated Cost (USD) |
|---|---|---|---|
| Data Ingestion & Processing | AWS Glue, AWS Lambda, S3 | AWS Glue ETL (processing EDF data): daily job to aggregate raw data and join with weather/calendar; less data than AD raw processing. 1 job/day × 30 days × 0.15 hr/job × 4 DPUs × ~$0.44/DPU-hr = ~$7.92.<br>AWS Lambda (raw data ingestion): ~5 functions fetching weather forecasts, historical weather, etc.; ~100k invocations/month at avg 10 s / 256 MB, well within the Lambda free tier = ~$0.00.<br>S3 (PUT requests): lower volume than AD due to aggregation; ~1,000 PUT/day × 30 days × ~$0.005/1k req = ~$0.15 | $8 - $15 |
| Feature Engineering | SageMaker Processing | Priced per instance-second; covers feature engineering for both the monthly training and daily inference pipelines.<br>Training (monthly): 1 run × 1.0 hr × 1 ml.m5.large × ~$0.11/hr = ~$0.11.<br>Inference (daily): 30 runs × 0.25 hr × 1 ml.m5.large × ~$0.11/hr = ~$0.83 | $1 - $3 |
| Model Training | SageMaker Training, Step Functions | SageMaker Training Jobs: priced per instance-second, assuming monthly retraining on a standard instance. 1 run × 1.5 hr × 1 ml.m5.large × ~$0.11/hr = ~$0.17.<br>Step Functions: ~10 transitions/run × 1 run/month = 10 transitions, well within free tier = ~$0.00 | $1 - $2 |
| Model Inference | SageMaker Processing, Step Functions | SageMaker Processing Job (daily forecast generation; a Processing Job is used for flexibility): 30 runs × 0.5 hr × 1 ml.m5.large × ~$0.11/hr = ~$1.65.<br>Step Functions: ~8 transitions/run × 30 runs/month = 240 transitions, well within free tier = ~$0.00 | $2 - $4 |
| Forecast Storage & Serving | API Gateway, AWS Lambda, Amazon Timestream | API Gateway: 100,000 requests/month, well within the 1M-request REST API free tier = ~$0.00.<br>AWS Lambda (serving): 100k invocations × ~150 ms × 256 MB, well within free tier = ~$0.00.<br>Timestream ingestion: 30 days × 120 bldgs × 72 hr forecast × ~1 KB/record ≈ 260 MB/month, within the 1 GB free tier = ~$0.00.<br>Timestream memory store: a full month at ~1 GB × ~$0.036/GB-hr × 720 hrs = ~$25.92 would dominate; reducing memory retention to 1 day brings this to ~$3.60.<br>Timestream magnetic store: ~3 GB/year, ~0.3 GB × ~$0.03/GB-month = ~$0.01.<br>Queries: 100k requests × ~10 MB scanned/query (estimate) ≈ 1 TB; $0.01/GB × 1024 GB = ~$10.24 | $15 - $25 |
| Storage & Logging | S3, ECR, CloudWatch | S3: ~380 GB total (raw, processed, features, artifacts) × ~$0.023/GB-month = ~$8.74, plus ~$2 for GET/LIST requests.<br>ECR: ~10 GB storage for EDF container images = ~$1.00.<br>CloudWatch: ~10 GB log ingestion from all jobs and Lambdas × ~$0.50/GB = ~$5.00 | $15 - $25 |
| Total Estimated Monthly Cost | - | - | $42 - $74 |

This detailed breakdown reveals that the highest operational cost for the forecasting system is the Forecast Storage & Serving layer, specifically the Timestream memory store and query costs. The batch ML compute costs for feature engineering, training, and inference remain very low. This highlights the importance of optimizing the database configuration (e.g., memory vs. magnetic retention) and query patterns for the serving APIs to manage costs effectively.

Challenges and Learnings

Troubleshooting Guide

  1. SFN Failures: Check execution history for failed state, input/output, error message.

  2. Job Failures (Processing/Training): Check CloudWatch Logs for the specific job run. Look for Python errors, resource exhaustion, S3 access issues.

  3. Lambda Failures: Check CloudWatch Logs. Verify IAM permissions, input payload structure, environment variables, timeouts, memory limits. Check DLQ if configured.

  4. Forecast Accuracy Issues:

    • Verify quality/availability of input weather forecasts.

    • Check feature engineering logic for errors or skew vs. training.

    • Analyze residuals from the model evaluation step.

    • Check if model drift has occurred (compare recent performance to registry metrics). Trigger retraining if needed.

    • Ensure correct model version was loaded by inference pipeline.

  5. Data Loading Issues (Timestream/RDS): Check LoadForecastsToDB Lambda logs for database connection errors, write throttling, data type mismatches, constraint violations. Check DB metrics.

  6. API Serving Issues: Check API Gateway logs and metrics. Check serving Lambda logs. Verify DB connectivity and query performance.

Security Considerations

  • Apply IAM least privilege to all roles.

  • Encrypt data at rest (S3, Timestream/RDS, EBS) and in transit (TLS).

  • Use Secrets Manager for any API keys (e.g., weather provider).

  • Secure API Gateway endpoints (Authentication - Cognito/IAM/API Keys, Authorization, Throttling).

  • Perform regular vulnerability scans on the EDF container image.

  • Consider VPC deployment and endpoints for enhanced network security.

Roadmap & Future Enhancements

  • Implement probabilistic forecasting (prediction intervals).

  • Incorporate more granular data (e.g., appliance recognition, improved occupancy detection) if available.

  • Explore more advanced forecasting models (LSTM, TFT) and benchmark rigorously.

  • Implement automated retraining triggers based on monitored forecast accuracy drift.

  • Develop more sophisticated XAI for forecasting (feature importance).

  • Add A/B testing framework for forecast models.

  • Integrate forecasts with building control systems (e.g., HVAC pre-cooling based on forecast).

Appendices

Configuration File Example

# config/edf_config.yaml

# --- General Settings ---
project_name: "ml"
aws_region: "eu-central-1"
# env_suffix will likely be passed dynamically or set per deployment environment

# --- Data Paths (Templates - Execution specific paths often constructed) ---
# Base paths defined here, execution IDs/dates appended by scripts/workflows
s3_processed_edf_path: "s3://{processed_bucket}/processed_edf_data/"
s3_raw_weather_fcst_path: "s3://{raw_bucket}/edf-inputs/weather-forecast/"
s3_raw_calendar_path: "s3://{raw_bucket}/edf-inputs/calendar-topology/"
s3_feature_output_base: "s3://{processed_bucket}/features/edf/{workflow_type}/{sfn_name}/{exec_id}/" # workflow_type=training/inference
s3_model_artifact_base: "s3://{processed_bucket}/model-artifacts/{sfn_name}/{exec_id}/"
s3_eval_report_base: "s3://{processed_bucket}/evaluation-output/{sfn_name}/{exec_id}/"
s3_forecast_output_base: "s3://{processed_bucket}/forecast-output/{sfn_name}/{exec_id}/"

# --- AWS Resource Names (Base names - suffix added in Terraform locals) ---
scripts_bucket_base: "ml-glue-scripts" # Base name for script bucket
processed_bucket_base: "ml-processed-data" # Base name for processed bucket
raw_bucket_base: "ml-raw-data" # Base name for raw bucket
edf_feature_group_name_base: "edf-building-features"
ecr_repo_name_edf_base: "edf-training-container"
edf_model_package_group_name_base: "EDFBuildingDemandForecaster"
lambda_register_edf_func_base: "RegisterEDFModelFunction"
lambda_load_forecasts_func_base: "LoadEDFResultsLambda"
lambda_get_model_func_base: "GetApprovedModelLambda" # Shared Lambda
lambda_create_sm_model_func_base: "CreateSageMakerModelLambda" # Shared Lambda
edf_training_sfn_base: "EDFTrainingWorkflow"
edf_inference_sfn_base: "EDFInferenceWorkflow"
edf_scheduler_base: "DailyEDFInferenceTrigger"
forecast_db_base: "EDFDatabase" # Timestream DB base name
forecast_table_name: "BuildingDemandForecasts" # Timestream table name

# --- Feature Engineering (Common & EDF Specific) ---
common_feature_eng:
  lookback_days_default: 14 # Default days history needed

edf_feature_eng:
  target_column: "consumption_kwh"
  timestamp_column: "timestamp_hour"
  building_id_column: "building_id"
  time_features: # Features derived from timestamp
    - "hour_of_day"
    - "day_of_week"
    - "day_of_month"
    - "month_of_year"
    - "is_weekend" # Example custom flag
  lag_features: # Lag values in hours
    consumption_kwh: [24, 48, 168] # 1d, 2d, 1wk
    temperature_c: [24, 48, 168]
    solar_irradiance_ghi: [24]
  rolling_window_features: # Window size in hours, aggregations
    consumption_kwh:
      windows: [3, 24, 168]
      aggs: ["avg", "stddev", "min", "max"]
    temperature_c:
      windows: [24]
      aggs: ["avg"]
  imputation_value: 0.0 # Value used for fillna after lags/windows

# --- Training Workflow ---
edf_training:
  default_strategy: "Prophet" # Model strategy to use if not specified
  instance_type: "ml.m5.xlarge" # Larger instance for potentially heavier training
  instance_count: 1
  max_runtime_seconds: 7200 # 2 hours
  # Base hyperparameters (can be overridden by execution input)
  hyperparameters:
    Prophet:
      prophet_changepoint_prior_scale: 0.05
      prophet_seasonality_prior_scale: 10.0
      prophet_holidays_prior_scale: 10.0
      prophet_daily_seasonality: True
      prophet_weekly_seasonality: True
      prophet_yearly_seasonality: 'auto'
      # prophet_regressors: ["temperature_c", "is_holiday_flag"] # Example if using regressors
    XGBoost:
      xgb_eta: 0.1
      xgb_max_depth: 5
      xgb_num_boost_round: 150
      xgb_subsample: 0.7
      xgb_colsample_bytree: 0.7
      # feature_columns must align with feature_engineering output for XGBoost
      feature_columns_string: "temperature_c,solar_irradiance_ghi,humidity,is_holiday_flag,hour_of_day,day_of_week,day_of_month,month_of_year,consumption_lag_24h,consumption_lag_168h,consumption_roll_avg_24h"

# --- Evaluation Workflow ---
edf_evaluation:
  instance_type: "ml.m5.large"
  instance_count: 1
  # Metrics thresholds for the 'CheckEvaluation' choice state
  metrics_thresholds:
    max_mape: 20.0 # Example: Fail if MAPE > 20%
    max_rmse: 5.0  # Example: Fail if RMSE > 5 kWh (adjust based on typical consumption)
  # Optional: Path to historical labelled data for backtesting
  # historical_labels_path: "s3://..."

# --- Inference Workflow ---
edf_inference:
  scheduler_expression: "cron(0 5 * * ? *)" # 5 AM UTC Daily
  scheduler_timezone: "UTC"
  forecast_horizon_hours: 72
  # Processing job instance types (can override training defaults if needed)
  feature_eng_instance_type: "ml.m5.large"
  feature_eng_instance_count: 1
  forecast_gen_instance_type: "ml.m5.large" # Needs forecasting libs installed
  forecast_gen_instance_count: 1
  # Target DB Config
  forecast_db_type: "TIMESTREAM" # TIMESTREAM | RDS | DYNAMODB
  # Lambda Config
  load_forecasts_lambda_memory: 512 # MB
  load_forecasts_lambda_timeout: 300 # seconds

# --- Common Lambda Config ---
# Assuming shared Lambdas from AD are used
lambda_shared:
   get_model_memory: 128
   get_model_timeout: 30
   create_sm_model_memory: 128
   create_sm_model_timeout: 60
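To illustrate how the edf_feature_eng section drives the shared feature code, here is a condensed PySpark sketch of the lag and rolling-window computation. Paths are placeholders, and row-based lags stand in for hour offsets, which assumes a contiguous hourly grid per building:

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("edf-feature-eng-sketch").getOrCreate()
df = spark.read.parquet("s3://<processed-bucket>/processed_edf_data/")

w = Window.partitionBy("building_id").orderBy("timestamp_hour")

# Time features from edf_feature_eng.time_features
df = (
    df.withColumn("hour_of_day", F.hour("timestamp_hour"))
      .withColumn("day_of_week", F.dayofweek("timestamp_hour"))
      .withColumn("is_weekend", F.dayofweek("timestamp_hour").isin(1, 7).cast("int"))
)

# Lag features (in hours) from edf_feature_eng.lag_features
for lag in (24, 48, 168):
    df = df.withColumn(f"consumption_lag_{lag}h", F.lag("consumption_kwh", lag).over(w))

# Rolling windows from edf_feature_eng.rolling_window_features
for hours in (3, 24, 168):
    rw = w.rowsBetween(-(hours - 1), 0)
    df = df.withColumn(f"consumption_roll_avg_{hours}h", F.avg("consumption_kwh").over(rw))

df = df.fillna(0.0)  # imputation_value from the config
```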

Data Schemas

This appendix provides the formal schema definitions for the primary data entities used across the Anomaly Detection and Energy Demand Forecasting workflows.

1. Raw Meter Data

This represents the logical structure of data as it arrives from the central database into the S3 Raw Zone for processing. It’s often in a semi-structured format like JSON Lines.

| Field Name | Data Type | Description |
|---|---|---|
| timestamp_str | String | ISO 8601 formatted timestamp (e.g., "2024-10-27T10:30:05Z") of when the readings were recorded by the tablet. |
| building_id | String | Unique identifier for the building (e.g., "bldg_A123"). |
| apartment_id | String | Unique identifier for the apartment (e.g., "apt_404"). |
| readings | Array[Object] | An array of sensor reading objects from the apartment. |
| readings.sensor_type | String | The type of measurement (e.g., heating_energy_kwh, room_temp_c). |
| readings.value | Double/Int | The numerical value of the sensor reading. |
| readings.room_name | String | (Optional) The specific room for the reading, if applicable (e.g., "living_room"). |

Example (JSON Lines):

{"timestamp_str": "2024-10-27T10:30:00Z", "building_id": "bldg_A123", "apartment_id": "apt_404", "readings": [{"sensor_type": "heating_energy_kwh", "value": 15432.7}, {"sensor_type": "hot_water_litres", "value": 89541.2}, {"sensor_type": "room_temp_c", "room_name": "living_room", "value": 21.5}, {"sensor_type": "setpoint_temp_c", "room_name": "living_room", "value": 22.0}]}

2. Processed Meter Data (for Anomaly Detection)

This is the output of the initial Ingestion Glue ETL job, stored in the S3 Processed Zone. It’s a flattened, structured table optimized for analytical queries and as the source for feature engineering.

Format: Apache Parquet. Partitioned by: year, month, day.

| Column Name | Data Type | Description |
|---|---|---|
| apartment_id | String | Unique identifier for the apartment. |
| building_id | String | Unique identifier for the building. |
| event_ts | Timestamp | The timestamp of the reading, cast to a proper timestamp type. |
| heating_energy_kwh | Double | The cumulative heating energy consumption in kilowatt-hours. |
| hot_water_litres | Double | The cumulative hot water consumption in litres. |
| room_temp_c | Double | The measured temperature in Celsius for a specific room (or average). |
| setpoint_temp_c | Double | The user-defined target temperature in Celsius for a specific room. |
| outdoor_temp_c | Double | The outdoor temperature at the time of the reading, joined from weather data. |
| year | Integer | Partition key: year derived from event_ts. |
| month | Integer | Partition key: month derived from event_ts. |
| day | Integer | Partition key: day derived from event_ts. |
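To make the flattening concrete, here is a minimal PySpark sketch of how the ingestion job might explode the raw readings array into this table. Paths are placeholders, and multiple rooms per sensor type are collapsed with first() for brevity:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("ad-ingest-sketch").getOrCreate()
raw = spark.read.json("s3://<raw-bucket>/meter-data/")  # JSON Lines as shown above

# One row per individual sensor reading.
flat = (
    raw.withColumn("event_ts", F.to_timestamp("timestamp_str"))
       .withColumn("reading", F.explode("readings"))
       .select(
           "building_id", "apartment_id", "event_ts",
           F.col("reading.sensor_type").alias("sensor_type"),
           F.col("reading.value").alias("value"),
       )
)

# Pivot sensor types into columns matching the processed schema.
processed = (
    flat.groupBy("building_id", "apartment_id", "event_ts")
        .pivot("sensor_type",
               ["heating_energy_kwh", "hot_water_litres", "room_temp_c", "setpoint_temp_c"])
        .agg(F.first("value"))
        .withColumn("year", F.year("event_ts"))
        .withColumn("month", F.month("event_ts"))
        .withColumn("day", F.dayofmonth("event_ts"))
)

processed.write.mode("append").partitionBy("year", "month", "day") \
    .parquet("s3://<processed-bucket>/processed_meter_data/")
```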


3. Weather Data

3.1 Raw Weather Forecast Data (from API)

This is the raw JSON structure ingested from the external weather forecast API into the S3 Raw Zone.

| Field Name | Data Type | Description |
|---|---|---|
| latitude | Double | Latitude of the forecast location. |
| longitude | Double | Longitude of the forecast location. |
| generationtime_ms | Double | Time taken by the API to generate the forecast. |
| utc_offset_seconds | Integer | UTC offset for the location. |
| hourly | Object | An object containing arrays of hourly forecast values. |
| hourly.time | Array[String] | Array of ISO 8601 timestamps for the forecast horizon. |
| hourly.temperature_2m | Array[Double] | Array of corresponding forecasted temperatures (°C). |
| hourly.cloudcover | Array[Integer] | Array of corresponding forecasted cloud cover (%). |
| hourly.shortwave_radiation | Array[Double] | Array of corresponding forecasted solar irradiance (W/m²). |

3.2 Processed Weather Data (Joined in processed_edf_data)

This represents the clean, hourly weather data after being processed and joined to the consumption data.

| Column Name | Data Type | Description |
|---|---|---|
| building_id | String | Unique identifier for the building the weather corresponds to. |
| timestamp_hour | Timestamp | The specific hour for which the weather data is valid, truncated. |
| temperature_c | Double | The average temperature in Celsius for that hour. |
| humidity | Double | The average relative humidity (%) for that hour. |
| solar_irradiance_ghi | Double | The average Global Horizontal Irradiance (W/m²) for that hour. |
| is_holiday_flag | Integer | A binary flag (1 or 0) indicating if the date is a public holiday. |


4. Feature Store Features (Anomaly Detection)

This defines the schema of the ad-apartment-features Feature Group in SageMaker Feature Store. These are the inputs to the AD model.

| Feature Name | Data Type | Description |
|---|---|---|
| apartment_record_id | String | Record identifier: unique ID for the record (e.g., [apartment_id]_[date]). |
| event_time | Fractional | Event time: timestamp when the features were computed/ingested. |
| event_date | String | The specific date (YYYY-MM-DD) these daily features correspond to. |
| building_id | String | Identifier for the building. |
| avg_temp_diff | Fractional | The average difference between the setpoint and actual room temperature for the day. |
| daily_energy_kwh | Fractional | The total heating energy consumed on that day. |
| hdd | Fractional | Heating Degree Days, a measure of how cold the day was. |
| energy_lag_1d | Fractional | The value of daily_energy_kwh from the previous day. |
| energy_roll_avg_7d | Fractional | The 7-day rolling average of daily_energy_kwh. |
| temp_diff_roll_std_3d | Fractional | The 3-day rolling standard deviation of avg_temp_diff. |


5. Alert Table Schema (DynamoDB)

This defines the structure of the ad-alerts table in Amazon DynamoDB, where the inference pipeline stores actionable alerts.

Table Name: hometech-ml-ad-alerts-[env]

Attribute Name

Data Type

Key Type / Index

Description

AlertID

String (S)

Partition Key (PK)

Unique identifier for the alert. Format: [ApartmentID]#[EventDate].

ApartmentID

String (S)

GSI-1 Partition Key

The unique identifier of the apartment that triggered the alert.

BuildingID

String (S)

GSI-2 Partition Key

The unique identifier of the building.

EventDate

String (S)

-

The date (YYYY-MM-DD) for which the anomaly was detected.

AlertTimestamp

String (S)

GSI-2 Sort Key

ISO 8601 timestamp of when the alert was created by the pipeline.

AnomalyScore

Number (N)

-

The raw numerical score from the ML model. Higher means more anomalous.

Threshold

Number (N)

-

The score threshold that was breached to create this alert.

Status

String (S)

GSI-1 Sort Key

The current state of the alert. Values: Unseen, Investigating, Resolved-True Positive, Resolved-False Positive.

ModelVersion

String (S)

-

Version of the model package that generated the score (for lineage).

FeedbackNotes

String (S)

-

(Optional) Notes entered by the maintenance technician during review.

Global Secondary Indexes (GSIs):

  • GSI-1 (ApartmentStatusIndex): Allows efficiently querying for alerts of a specific Status within a given ApartmentID.

    • Partition Key: ApartmentID

    • Sort Key: Status

  • GSI-2 (BuildingAlertsIndex): Allows efficiently querying for all alerts in a BuildingID, sorted by time.

    • Partition Key: BuildingID

    • Sort Key: AlertTimestamp
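For reference, here is how these indexes could be queried with boto3 (the table name suffix and key values are examples):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("hometech-ml-ad-alerts-dev")

# GSI-1: all Unseen alerts for one apartment.
unseen = table.query(
    IndexName="ApartmentStatusIndex",
    KeyConditionExpression=Key("ApartmentID").eq("apt_404") & Key("Status").eq("Unseen"),
)["Items"]

# GSI-2: all alerts for a building in October 2024, sorted by time.
building_alerts = table.query(
    IndexName="BuildingAlertsIndex",
    KeyConditionExpression=Key("BuildingID").eq("bldg_A123")
    & Key("AlertTimestamp").between("2024-10-01T00:00:00Z", "2024-10-31T23:59:59Z"),
)["Items"]
```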