1. Feature Store Components

Minh Pham / April 03, 2024

Key Components

Five key components of a Feature Store

FS Simplified Architecture

1. Feature Engineering (Transformations)

  • The feature engineering phase should offer multiple transformation techniques, such as selecting, filtering, aggregating, and otherwise manipulating raw data, to turn it into reusable features for ML models
  • Different data sources present unique challenges:
    • Streaming sources: Dealing with continuous data ingestion and processing
    • Batch sources: Dealing with large volumes of static data, ingested on a regular schedule or on demand
    • Variety in data formats and locations: CSV files, Parquet files, S3 buckets…
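
The select, filter, and aggregate steps can be sketched in plain Python. This is only a minimal illustration under stated assumptions: the `raw_events` records, their field names, and the `build_user_features` helper are hypothetical and stand in for whatever a real batch source would deliver.

```python
from collections import defaultdict

# Hypothetical raw batch records; the field names are assumptions.
raw_events = [
    {"user_id": "u1", "amount": 20.0, "status": "ok", "country": "VN"},
    {"user_id": "u1", "amount": 35.0, "status": "ok", "country": "VN"},
    {"user_id": "u2", "amount": 50.0, "status": "failed", "country": "JP"},
    {"user_id": "u2", "amount": 10.0, "status": "ok", "country": "JP"},
]


def build_user_features(events):
    """Select, filter, and aggregate raw events into per-user features."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for event in events:
        # Filter: keep only successful transactions.
        if event["status"] != "ok":
            continue
        # Select: only user_id and amount are used; other columns are ignored.
        totals[event["user_id"]] += event["amount"]
        counts[event["user_id"]] += 1
    # Aggregate: one reusable feature row per user.
    return {
        user_id: {
            "txn_count": counts[user_id],
            "txn_amount_avg": totals[user_id] / counts[user_id],
        }
        for user_id in totals
    }


features = build_user_features(raw_events)
```

A streaming source would likely apply the same logic incrementally, per event or per window, rather than over a complete list.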

2. Feature Storage

  • The feature storage layer can be seen as a dual-database system. On one side, the offline store holds historical data with a focus on columnar retrieval. On the other, the online store provides row-oriented retrieval focused on low-latency lookups.
  • The offline store is the backbone for training and batch predictions. It is often time-based and is appended to rather than rewritten. It must be cost-efficient for storing large amounts of data.
  • The online store holds only the latest feature vector for each entity of a feature set. It is engineered for speed and responsiveness, and is needed in real-time use cases such as fraud detection or real-time personalization on digital platforms.
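
The difference between the two stores can be sketched with in-memory structures. This is an assumption-laden toy, not how a production store is built: `offline_store`, `online_store`, and `write_features` are hypothetical names, and a real system would use a columnar warehouse and a key-value database instead of a list and a dict.

```python
from datetime import datetime

offline_store = []  # append-only history of timestamped feature rows
online_store = {}   # entity id -> latest feature vector only


def write_features(entity_id, event_time, values):
    """Append a row to the offline store and refresh the online store."""
    row = {"entity_id": entity_id, "event_time": event_time, **values}
    # Offline: history is appended to, never rewritten.
    offline_store.append(row)
    # Online: keep the row only if it is at least as recent as the stored one.
    current = online_store.get(entity_id)
    if current is None or event_time >= current["event_time"]:
        online_store[entity_id] = row


write_features("u1", datetime(2024, 4, 1), {"txn_count": 2})
write_features("u1", datetime(2024, 4, 2), {"txn_count": 5})
write_features("u2", datetime(2024, 4, 2), {"txn_count": 1})

# Training reads the full history; real-time inference reads one key.
history_u1 = [r for r in offline_store if r["entity_id"] == "u1"]
latest_u1 = online_store["u1"]
```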

3. Feature Registry

  • This can be seen as a centralized repository for all features within the feature store
  • Stores feature metadata (feature set, type, definition…)
  • The feature registry must also manage access control (who may access a given set of features, and which type of access is granted)
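
A registry entry and its access check might look like the sketch below. The `FeatureMetadata` and `FeatureRegistry` classes, the role names, and the role-to-actions permission model are all hypothetical choices made for illustration; the article does not prescribe a schema.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureMetadata:
    name: str
    feature_set: str
    dtype: str
    definition: str
    # Assumed access model: role name -> set of allowed actions.
    permissions: dict = field(default_factory=dict)


class FeatureRegistry:
    """Toy in-memory registry keyed by (feature_set, name)."""

    def __init__(self):
        self._features = {}

    def register(self, meta):
        self._features[(meta.feature_set, meta.name)] = meta

    def describe(self, feature_set, name, role, action="read"):
        meta = self._features[(feature_set, name)]
        # Access control: the role must hold the requested action.
        if action not in meta.permissions.get(role, set()):
            raise PermissionError(f"{role} may not {action} {feature_set}.{name}")
        return meta


registry = FeatureRegistry()
registry.register(FeatureMetadata(
    name="txn_amount_avg",
    feature_set="user_transactions",
    dtype="float",
    definition="Mean amount of successful transactions per user",
    permissions={"data_scientist": {"read"}, "feature_owner": {"read", "write"}},
))

meta = registry.describe("user_transactions", "txn_amount_avg", role="data_scientist")
```

Requesting `action="write"` with the `data_scientist` role would raise `PermissionError` in this sketch.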

4. Feature Serving

  • Enables DS/AI engineers and real-time models to interact with the feature storage, either retrieving historical features for training or fetching the latest feature vector for a specific entity
  • The feature serving client relies on an API or SDK to perform a specific action (like fetching features)
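
The two access patterns can be sketched as a small client. `FeatureServingClient` and its method names are hypothetical and do not correspond to any particular SDK; the integer `event_time` values and the in-memory inputs are assumptions that keep the example self-contained.

```python
class FeatureServingClient:
    """Hypothetical client; method names are illustrative, not a real SDK."""

    def __init__(self, offline_rows, online_vectors):
        self._offline_rows = offline_rows      # list of timestamped rows
        self._online_vectors = online_vectors  # entity id -> latest vector

    def get_historical_features(self, feature_names, start, end):
        """Return rows with start <= event_time < end, for training."""
        return [
            {
                "entity_id": row["entity_id"],
                "event_time": row["event_time"],
                **{name: row[name] for name in feature_names},
            }
            for row in self._offline_rows
            if start <= row["event_time"] < end
        ]

    def get_online_features(self, feature_names, entity_id):
        """Return the latest feature vector for one entity, for inference."""
        vector = self._online_vectors[entity_id]
        return {name: vector[name] for name in feature_names}


offline_rows = [
    {"entity_id": "u1", "event_time": 1, "txn_count": 2},
    {"entity_id": "u1", "event_time": 2, "txn_count": 5},
    {"entity_id": "u2", "event_time": 2, "txn_count": 1},
]
online_vectors = {"u1": {"txn_count": 5}, "u2": {"txn_count": 1}}

client = FeatureServingClient(offline_rows, online_vectors)
training_rows = client.get_historical_features(["txn_count"], start=1, end=2)
live_vector = client.get_online_features(["txn_count"], entity_id="u1")
```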

5. Feature Monitoring

This step ensures ongoing quality by detecting changes in the feature data. It covers the following aspects:

  • Data quality: Ensuring data anomalies stay within defined error limits
  • Data drift: Changes in the statistical distribution of the data over time
  • Serving performance: Throughput (for example, requests per second) and serving latency
  • Training-serving skew: Whether the conditions during model training stay consistent with those during real-time serving
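
Two of these checks, data quality and drift, can be sketched with the standard library. The thresholds, the sample values, and the choice of a mean-shift statistic are assumptions made for illustration; real monitoring would likely use richer tests. Comparing training values with serving values, as done here, is also one simple way to look for training-serving skew.

```python
from statistics import mean, pstdev


def null_rate(values):
    """Data quality: fraction of missing values."""
    return sum(v is None for v in values) / len(values)


def mean_shift(reference, current):
    """Drift signal: shift of the current mean, in reference std-dev units."""
    spread = pstdev(reference)
    if spread == 0:
        return 0.0 if mean(current) == mean(reference) else float("inf")
    return abs(mean(current) - mean(reference)) / spread


# Hypothetical values of one feature at training time and at serving time.
training_values = [10.0, 12.0, 11.0, 9.0, 13.0]
serving_values = [18.0, 20.0, None, 19.0, 21.0]

MAX_NULL_RATE = 0.25   # assumed error limit
MAX_MEAN_SHIFT = 3.0   # assumed drift limit

observed = [v for v in serving_values if v is not None]
report = {
    "null_rate_ok": null_rate(serving_values) <= MAX_NULL_RATE,
    "drift_ok": mean_shift(training_values, observed) <= MAX_MEAN_SHIFT,
}
```

In this sample the null rate stays within its limit while the serving mean has moved well beyond the drift limit, so only the drift check fails.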

© 2024 Minh Pham. Inspired by Takuya Matsuyama.