Feast Components

Feast is a modular system composed of several key components that work together to provide a comprehensive feature store solution.
1. Feast Registry: The Central Catalog of Feature Definitions
Definition: A central catalog that stores all applied Feast object definitions (e.g., FeatureView, Entity, DataSource, FeatureService, Permission) and their related metadata.
Purpose:
Enables discovery, search, and collaboration on features.
Serves as the “source of truth” for what features exist and how they are defined.
Accessed by various Feast components (SDK, Feature Server) to understand the feature landscape.
Implementations:
File-based (default): Stores the registry as a serialized Protocol Buffer file on a local filesystem or cloud object store (GCS, S3).
SQL-based: Uses a SQL database as the backend for the registry.
Updating the Registry:
Typically done via the feast apply command, often integrated into CI/CD pipelines.
Best practice is to store feature definitions in a version-controlled repository (e.g., Git) and sync changes to the registry automatically.
Multiple registries can exist, often corresponding to different environments (dev, staging, prod), with stricter write access for production registries.
Accessing the Registry:
Programmatically: Define RepoConfig with RegistryConfig in Python code when instantiating FeatureStore.

```python
from feast import FeatureStore, RepoConfig
from feast.repo_config import RegistryConfig

repo_config = RepoConfig(
    registry=RegistryConfig(path="gs://your-bucket/registry.pb"),
    project="my_project",
    provider="gcp",
    # ... other store configs
)
store = FeatureStore(config=repo_config)
```
Via feature_store.yaml: Specify the registry path in the feature_store.yaml file.

```yaml
project: my_project
provider: gcp
registry: s3://your-bucket/registry.pb
# ... other store configs
```
Then:

```python
store = FeatureStore(repo_path=".")
```
2. Offline Store: The Home for Historical Feature Data
Definition: An interface (OfflineStore) for Feast to interact with historical, time-series feature data. This data typically resides in your existing data warehouses or data lakes.
Purpose:
Building Training Datasets: Feast queries the offline store to generate point-in-time correct datasets for model training using get_historical_features() (see the sketch at the end of this section).
Materializing Features: Serves as the source from which features are loaded into the Online Store.
How it Works: Feast doesn’t own the storage for the offline data but defines how to query it. It leverages the compute engine of the underlying system (e.g., BigQuery’s query engine, Spark).
Implementations: BigQueryOfflineStore, SnowflakeOfflineStore, RedshiftOfflineStore, FileOfflineStore (for local Parquet/CSV files), etc.
Configuration: Defined in feature_store.yaml. Only one offline store can be active at a time.
Compatibility: Not all offline stores are compatible with all data source types (e.g., BigQueryOfflineStore cannot directly query local files defined as a FileSource).
Writing to Offline Store: Feast can be configured to log served features or directly push features (via Push Sources) to the offline store, enabling a feedback loop or archiving.
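A minimal sketch of building a training dataset from the offline store. The entity column (driver_id) and feature references (driver_hourly_stats:*) are illustrative assumptions, not names defined in this document:

```python
from datetime import datetime

import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Entity dataframe: which entity keys and event timestamps to join features onto.
entity_df = pd.DataFrame(
    {
        "driver_id": [1001, 1002],
        "event_timestamp": [datetime(2024, 1, 1), datetime(2024, 1, 2)],
    }
)

# Point-in-time correct join, executed by the offline store's own compute engine.
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
).to_df()
```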
3. Online Store: Low-Latency Access to Fresh Features
Definition: A database optimized for low-latency reads, storing only the latest feature values for each entity key.
Purpose: To serve features quickly for real-time model inference.
Data Population:
Materialization: The feast materialize or feast materialize-incremental commands load data from the Offline Store into the Online Store.
Push Sources: Features can be written directly to the online store from streaming sources or other real-time processes using the push API (see the sketch below).
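A minimal sketch of pushing freshly computed values, assuming a hypothetical push source named driver_stats_push_source and illustrative columns:

```python
from datetime import datetime

import pandas as pd
from feast import FeatureStore
from feast.data_source import PushMode

store = FeatureStore(repo_path=".")

# Fresh values computed by a streaming job or online service (columns are illustrative).
event_df = pd.DataFrame(
    {
        "driver_id": [1001],
        "event_timestamp": [datetime.utcnow()],
        "conv_rate": [0.85],
    }
)

store.push(
    push_source_name="driver_stats_push_source",
    df=event_df,
    to=PushMode.ONLINE,  # or PushMode.ONLINE_AND_OFFLINE to also archive to the offline store
)
```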
Schema: The storage schema in the online store mirrors the feature definitions but only holds the most recent value per entity. No historical values are kept.
Implementations: Redis, DynamoDB, Datastore, SQLite (for local testing), etc.
Configuration: Defined in feature_store.yaml.
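Reading the latest values back through the SDK is a thin wrapper over the online store. A minimal sketch, reusing the hypothetical feature references from above:

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Returns only the most recent value per entity key, straight from the online store.
feature_vector = store.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    entity_rows=[{"driver_id": 1001}],
).to_dict()
print(feature_vector)
```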
4. Feature Server: The API Gateway for Online Features
Definition: A REST API server (built with FastAPI) that provides low-latency access to features from the Online Store and handles feature pushes.
Motivation:
Simplifies real-time feature retrieval for client applications (e.g., model serving systems).
Provides endpoints for pushing data, ensuring freshness.
Standardizes interaction via HTTP/JSON.
Designed for scalability and secure communication (TLS).
Architecture: A stateless service that interacts with the Online Store (for data) and the Registry (for metadata/definitions).
Key Endpoints:
/get-online-features: Retrieves the latest feature values for specified entities (example below).
/push: Pushes feature data to the online and/or offline store.
/materialize, /materialize-incremental: Trigger materialization jobs.
/retrieve-online-documents: [Alpha] Supports vector similarity search for RAG.
/docs: OpenAPI documentation for the server.
Deployment: Can be run locally (feast serve), via Docker, or on Kubernetes (Helm charts provided).
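A minimal sketch of calling the /get-online-features endpoint of a feature server started locally with feast serve; the default local port 6566 and the feature references are assumptions:

```python
import requests

response = requests.post(
    "http://localhost:6566/get-online-features",
    json={
        "features": [
            "driver_hourly_stats:conv_rate",
            "driver_hourly_stats:avg_daily_trips",
        ],
        "entities": {"driver_id": [1001, 1002]},
    },
)
print(response.json())
```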
5. Batch Materialization Engine: Moving Data from Offline to Online
Definition: The component responsible for executing the process of loading data from the Offline Store into the Online Store (materialization).
Purpose: Abstracts the underlying technology used for materialization.
Implementations:
LocalMaterializationEngine (default): Runs materialization as a local, serialized process within the Feast client's environment. Suitable for smaller datasets or simpler setups.
LambdaMaterializationEngine (AWS): Delegates materialization tasks to AWS Lambda functions for better scalability.
Custom Engines: Users can create their own engines (e.g., using Spark, Flink, or other distributed processing frameworks) if the built-in ones aren't sufficient.
Configuration: Specified in feature_store.yaml.
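Materialization is typically triggered from the CLI (feast materialize, feast materialize-incremental) or the SDK; the configured engine then carries out the actual data movement. A minimal SDK sketch:

```python
from datetime import datetime, timedelta

from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Load all feature values from the chosen time window into the online store.
store.materialize(
    start_date=datetime.utcnow() - timedelta(days=1),
    end_date=datetime.utcnow(),
)

# Or load only what has changed since the last materialization run.
store.materialize_incremental(end_date=datetime.utcnow())
```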
6. Provider: Bundling Components for Specific Environments
Definition: An implementation of a feature store that orchestrates a specific set of components (offline store, online store, materialization engine, etc.) tailored for a particular environment (e.g., GCP, AWS, local).
Purpose: Simplifies setup by providing sensible default configurations for common cloud stacks while allowing overrides.
Built-in Providers:
local: Uses the local file system for the offline store, SQLite for the online store, and local materialization.
gcp: Defaults to BigQuery (offline) and Datastore (online).
aws: Defaults to Redshift (offline) and DynamoDB (online).
Customization: Users can select a provider (e.g., gcp) but then override specific components (e.g., use Redis as the online store with the gcp provider), as sketched below.
Custom Providers: Users can create fully custom providers to integrate with bespoke infrastructure.
Configuration: Specified in feature_store.yaml via the provider key.
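A minimal sketch of such an override, assuming the Redis online store plugin is installed; the bucket, project name, and connection string are illustrative:

```python
from feast import FeatureStore, RepoConfig
from feast.repo_config import RegistryConfig

repo_config = RepoConfig(
    project="my_project",
    provider="gcp",  # gcp defaults: BigQuery offline store, Datastore online store
    registry=RegistryConfig(path="gs://your-bucket/registry.pb"),
    # Override just the online store while keeping the rest of the provider defaults.
    online_store={"type": "redis", "connection_string": "localhost:6379"},
)
store = FeatureStore(config=repo_config)
```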
7. OpenTelemetry Integration: Observability for Your Feature Store
Definition: Provides comprehensive monitoring and observability for Feast deployments by integrating with the OpenTelemetry standard.
Motivation: Critical for production ML systems to track performance, gain operational insights, troubleshoot issues, and optimize resources.
Architecture:
OpenTelemetry Collector: Deployed alongside Feast (often in Kubernetes) to receive, process, and export telemetry data (metrics, traces, logs).
Instrumentation: Feast (Python) can be auto-instrumented by OpenTelemetry to collect data.
Exporters: Components within the Collector that send data to monitoring backends (e.g., Prometheus, Jaeger, etc.).
Key Features:
Automated Python instrumentation.
Collection of key metrics: CPU/memory usage, request latencies, feature retrieval stats (e.g., feast_feature_server_memory_usage, feast_feature_server_cpu_usage).
Flexible configuration for data collection and export.
Integration with Prometheus for metrics visualization.
Setup (Typical Kubernetes):
Deploy Prometheus Operator.
Deploy the OpenTelemetry Operator (after installing cert-manager).
Configure an OpenTelemetry Collector instance (often via an OpenTelemetryCollector CRD in Kubernetes).
Add OpenTelemetry instrumentation annotations to Feast server deployments (e.g., instrumentation.opentelemetry.io/inject-python: "true").
Configure environment variables for the Feast server to point to the collector (e.g., OTEL_EXPORTER_OTLP_ENDPOINT).
Enable metrics in the Feast Helm chart (metrics.enabled=true) and provide the collector endpoint.
Deploy the necessary manifests, such as Instrumentation, ServiceMonitors, and RBAC rules for the OpenTelemetry components.