Feast Components

Feast is a modular system composed of several key components that work together to provide a comprehensive feature store solution.

1. Feast Registry: The Central Catalog of Feature Definitions

  • Definition: A central catalog that stores all applied Feast object definitions (e.g., FeatureView, Entity, DataSource, FeatureService, Permission) and their related metadata.

  • Purpose:

    • Enables discovery, search, and collaboration on features.

    • Serves as the “source of truth” for what features exist and how they are defined.

    • Accessed by various Feast components (SDK, Feature Server) to understand the feature landscape.

  • Implementations:

    • File-based (default): Stores the registry as a serialized Protocol Buffer file on a local filesystem or cloud object store (GCS, S3).

    • SQL-based: Uses a SQL database as the backend for the registry.

  • Updating the Registry:

    • Typically done via the feast apply command, often integrated into CI/CD pipelines.

    • Best practice is to store feature definitions in a version-controlled repository (e.g., Git) and sync changes to the registry automatically.

    • Multiple registries can exist, often corresponding to different environments (dev, staging, prod), with stricter write access for production registries.
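
    • Example: A minimal sketch of the kind of definition module that feast apply registers; the entity, source path, and feature names here are illustrative, not prescribed by Feast:

      # example_repo.py -- definitions that `feast apply` writes to the registry
      from datetime import timedelta

      from feast import Entity, FeatureView, Field, FileSource
      from feast.types import Float32, Int64

      driver = Entity(name="driver", join_keys=["driver_id"])

      driver_stats_source = FileSource(
          name="driver_stats_source",
          path="data/driver_stats.parquet",
          timestamp_field="event_timestamp",
      )

      driver_hourly_stats = FeatureView(
          name="driver_hourly_stats",
          entities=[driver],
          ttl=timedelta(days=1),
          schema=[
              Field(name="conv_rate", dtype=Float32),
              Field(name="avg_daily_trips", dtype=Int64),
          ],
          source=driver_stats_source,
      )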

  • Accessing the Registry:

    • Programmatically: Define RepoConfig with RegistryConfig in Python code when instantiating FeatureStore.

      from feast import FeatureStore, RepoConfig
      from feast.repo_config import RegistryConfig
      
      repo_config = RepoConfig(
          registry=RegistryConfig(path="gs://your-bucket/registry.pb"),
          project="my_project",
          provider="gcp",
          # ... other store configs
      )
      store = FeatureStore(config=repo_config)
      
    • Via feature_store.yaml: Specify the registry path in the feature_store.yaml file.

      project: my_project
      provider: gcp
      registry: gs://your-bucket/registry.pb
      # ... other store configs
      

      Then: store = FeatureStore(repo_path=".")
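
  • Example: Once a FeatureStore is instantiated, the registry can be browsed programmatically. A minimal sketch (the objects returned depend on what has been applied to your registry):

      # Discover what has been registered in the catalog
      for fv in store.list_feature_views():
          print(fv.name, [f.name for f in fv.features])

      for entity in store.list_entities():
          print(entity.name)

      for fs in store.list_feature_services():
          print(fs.name)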

2. Offline Store: The Home for Historical Feature Data

  • Definition: An interface (OfflineStore) for Feast to interact with historical, time-series feature data. This data typically resides in your existing data warehouses or data lakes.

  • Purpose:

    1. Building Training Datasets: Feast queries the offline store to generate point-in-time correct datasets for model training using get_historical_features() (see the sketch at the end of this section).

    2. Materializing Features: Serves as the source from which features are loaded into the Online Store.

  • How it Works: Feast doesn’t own the storage for the offline data but defines how to query it. It leverages the compute engine of the underlying system (e.g., BigQuery’s query engine, Spark).

  • Implementations: BigQueryOfflineStore, SnowflakeOfflineStore, RedshiftOfflineStore, FileOfflineStore (for local Parquet/CSV files), etc.

  • Configuration: Defined in feature_store.yaml. Only one offline store can be active at a time.

  • Compatibility: Not all offline stores are compatible with all data source types (e.g., BigQueryOfflineStore cannot directly query local files defined as a FileSource).

  • Writing to Offline Store: Feast can be configured to log served features or directly push features (via Push Sources) to the offline store, enabling a feedback loop or archiving.
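
  • Example: A minimal sketch of building a training dataset via get_historical_features(). The entity dataframe columns and the driver_hourly_stats feature view are assumed to exist in your project; the point-in-time join runs on whichever offline store is configured:

      import pandas as pd
      from feast import FeatureStore

      store = FeatureStore(repo_path=".")

      # Entity keys plus event timestamps drive the point-in-time correct join
      entity_df = pd.DataFrame({
          "driver_id": [1001, 1002],
          "event_timestamp": pd.to_datetime(["2024-01-01", "2024-01-02"]),
      })

      training_df = store.get_historical_features(
          entity_df=entity_df,
          features=[
              "driver_hourly_stats:conv_rate",
              "driver_hourly_stats:avg_daily_trips",
          ],
      ).to_df()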

3. Online Store: Low-Latency Access to Fresh Features

  • Definition: A database optimized for low-latency reads, storing only the latest feature values for each entity key.

  • Purpose: To serve features quickly for real-time model inference.

  • Data Population:

    1. Materialization: The feast materialize or feast materialize-incremental commands load data from the Offline Store into the Online Store.

    2. Push Sources: Features can be written directly to the online store from streaming sources or other real-time processes using the push API.

  • Schema: The storage schema in the online store mirrors the feature definitions but only holds the most recent value per entity. No historical values are kept.

  • Implementations: Redis, DynamoDB, Datastore, SQLite (for local testing), etc.

  • Configuration: Defined in feature_store.yaml.
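
  • Example: A minimal sketch of populating and reading the online store from the Python SDK; the feature reference and entity key are assumptions based on your own definitions:

      from datetime import datetime
      from feast import FeatureStore

      store = FeatureStore(repo_path=".")

      # Materialize the latest values from the offline store into the online store
      store.materialize_incremental(end_date=datetime.utcnow())

      # Low-latency read of the most recent values for a single entity key
      online_features = store.get_online_features(
          features=["driver_hourly_stats:conv_rate"],
          entity_rows=[{"driver_id": 1001}],
      ).to_dict()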

4. Feature Server: The API Gateway for Online Features

  • Definition: A REST API server (built with FastAPI) that provides low-latency access to features from the Online Store and handles feature pushes.

  • Motivation:

    • Simplifies real-time feature retrieval for client applications (e.g., model serving systems).

    • Provides endpoints for pushing data, ensuring freshness.

    • Standardizes interaction via HTTP/JSON.

    • Designed for scalability and secure communication (TLS).

  • Architecture: A stateless service that interacts with the Online Store (for data) and the Registry (for metadata/definitions).

  • Key Endpoints:

    • /get-online-features: Retrieves latest feature values for specified entities.

    • /push: Pushes feature data to the online and/or offline store.

    • /materialize, /materialize-incremental: Trigger materialization jobs.

    • /retrieve-online-documents: [Alpha] Supports vector similarity search for RAG.

    • /docs: OpenAPI documentation for the server.

  • Deployment: Can be run locally (feast serve), via Docker, or on Kubernetes (Helm charts provided).
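
  • Example: A minimal sketch of querying a locally running feature server (started with feast serve, which listens on port 6566 by default); the feature references and entity values are assumptions:

      import requests

      response = requests.post(
          "http://localhost:6566/get-online-features",
          json={
              "features": [
                  "driver_hourly_stats:conv_rate",
                  "driver_hourly_stats:avg_daily_trips",
              ],
              "entities": {"driver_id": [1001, 1002]},
          },
      )
      print(response.json())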

5. Batch Materialization Engine: Moving Data from Offline to Online

  • Definition: The component responsible for executing the process of loading data from the Offline Store into the Online Store (materialization).

  • Purpose: Abstracts the underlying technology used for materialization.

  • Implementations:

    • LocalMaterializationEngine (default): Runs materialization as a local, serialized process within the Feast client’s environment. Suitable for smaller datasets or simpler setups.

    • LambdaMaterializationEngine (AWS): Delegates materialization tasks to AWS Lambda functions for better scalability.

    • Custom Engines: Users can create their own engines (e.g., using Spark, Flink, or other distributed processing frameworks) if built-in ones aren’t sufficient.

  • Configuration: Specified in feature_store.yaml.
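
  • Example: A sketch of selecting an engine in feature_store.yaml; the local engine is the default and needs no extra parameters, while the comment points at the AWS alternative described above:

      project: my_project
      provider: aws
      batch_engine:
        type: local   # default; an AWS setup could delegate to the Lambda-based engine instead
      # ... offline/online store configs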

6. Provider: Bundling Components for Specific Environments

  • Definition: An implementation of a feature store that orchestrates a specific set of components (offline store, online store, materialization engine, etc.) tailored for a particular environment (e.g., GCP, AWS, local).

  • Purpose: Simplifies setup by providing sensible default configurations for common cloud stacks while allowing overrides.

  • Built-in Providers:

    • local: Uses local file system for offline store, SQLite for online store, and local materialization.

    • gcp: Defaults to BigQuery (offline) and Datastore (online).

    • aws: Defaults to Redshift (offline, with S3 for staging) and DynamoDB (online).

  • Customization: Users can select a provider (e.g., gcp) but then override specific components (e.g., use Redis as the online store with the gcp provider; see the sketch at the end of this section).

  • Custom Providers: Users can create fully custom providers to integrate with bespoke infrastructure.

  • Configuration: Specified in feature_store.yaml via the provider key.
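
  • Example: A sketch of a feature_store.yaml that keeps the gcp provider defaults but overrides the online store with Redis, as described above; the connection string is a placeholder:

      project: my_project
      provider: gcp
      registry: gs://your-bucket/registry.pb
      online_store:
        type: redis
        connection_string: <redis-host:6379>
      # offline store falls back to the provider default (BigQuery)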

7. Authorization Manager (AuthManager): Enforcing Permissions

  • Definition: A pluggable component within Feast servers (Feature Server, Registry Server) responsible for handling authorization.

  • Functionality:

    1. Extracts user identity/credentials (e.g., a token) from incoming client requests.

    2. Validates these credentials against an authentication server (Feast itself does not perform authentication; it relies on external systems).

    3. Retrieves user details (like roles) from the validated token or auth server.

    4. Injects these user details into Feast’s Permission framework to enforce access control policies.

  • Supported Implementations (Out-of-the-box):

    • OIDC (OpenID Connect):

      • Client retrieves a JWT token from an OIDC server and sends it as a Bearer Token.

      • Feast server validates the token with the OIDC server and extracts roles (case-sensitive match with Permission roles) and preferred_username from the token.

      • Requires OIDC server to be configured to expose roles in the access token and verify signature/expiry.

    • Kubernetes RBAC:

      • Client uses its Kubernetes service account token as the Bearer Token.

      • Feast server (running in K8s) queries Kubernetes RBAC resources (Roles, RoleBindings) to determine the client’s associated roles.

      • Requires specific K8s RBAC setup: Feast Permission roles must match K8s Role names in the Feast service’s namespace.

  • Configuration: Defined in the auth section of feature_store.yaml (see the sketches at the end of this section).

    • type: no_auth (default if auth section is missing): No authorization is applied.

    • type: oidc: Requires client_id and auth_discovery_url; clients additionally supply username, password, and client_secret.

    • type: kubernetes: No extra parameters usually needed if running within K8s.

  • Consistency: Client and server configurations must be consistent for token injection and validation to work.
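
  • Example: Sketches of the auth block for the two built-in managers, using only the parameters listed above; the OIDC values are placeholders for your own identity provider:

      # Server-side OIDC configuration
      auth:
        type: oidc
        client_id: <client id registered with the OIDC server>
        auth_discovery_url: <https://your-oidc-server/.well-known/openid-configuration>

      # Kubernetes RBAC configuration
      auth:
        type: kubernetes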

8. OpenTelemetry Integration: Observability for Your Feature Store

  • Definition: Provides comprehensive monitoring and observability for Feast deployments by integrating with the OpenTelemetry standard.

  • Motivation: Critical for production ML systems to track performance, gain operational insights, troubleshoot issues, and optimize resources.

  • Architecture:

    • OpenTelemetry Collector: Deployed alongside Feast (often in Kubernetes) to receive, process, and export telemetry data (metrics, traces, logs).

    • Instrumentation: Feast (Python) can be auto-instrumented by OpenTelemetry to collect data.

    • Exporters: Components within the Collector that send data to monitoring backends (e.g., Prometheus, Jaeger, etc.).

  • Key Features:

    • Automated Python instrumentation.

    • Collection of key metrics: CPU/memory usage, request latencies, feature retrieval stats (e.g., feast_feature_server_memory_usage, feast_feature_server_cpu_usage).

    • Flexible configuration for data collection and export.

    • Integration with Prometheus for metrics visualization.

  • Setup (Typical Kubernetes):

    1. Deploy Prometheus Operator.

    2. Deploy OpenTelemetry Operator (after installing cert-manager).

    3. Configure an OpenTelemetry Collector instance (often via an OpenTelemetryCollector custom resource in K8s).

    4. Add OpenTelemetry instrumentation annotations to Feast server deployments (e.g., instrumentation.opentelemetry.io/inject-python: "true").

    5. Configure environment variables for the Feast server to point to the collector (e.g., OTEL_EXPORTER_OTLP_ENDPOINT).

    6. Enable metrics in Feast Helm chart (metrics.enabled=true) and provide collector endpoint.

    7. Deploy necessary manifests like Instrumentation, ServiceMonitors, and RBAC rules for OpenTelemetry components.
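
  • Example: A sketch of the pod-template fragment for a Feast server Deployment wired to a collector; the annotation and environment variable come from the steps above, while the collector address is a placeholder for your cluster:

      # Fragment of the feature server Deployment's pod template
      template:
        metadata:
          annotations:
            instrumentation.opentelemetry.io/inject-python: "true"
        spec:
          containers:
            - name: feature-server
              env:
                - name: OTEL_EXPORTER_OTLP_ENDPOINT
                  value: <http://your-otel-collector:4318>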