Hopsworks is a modular MLOps platform with:

- a feature store (also available standalone)
- a model registry and model serving based on KServe
- a vector database based on OpenSearch
- a data science and data engineering platform

## Standalone Feature Store

Hopsworks was the first open-source feature store and the first enterprise feature store for ML. You can use Hopsworks as a standalone feature store with the HSFS API.
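
As an illustration, here is a minimal sketch of writing and reading features with the HSFS Python client; the host, project, API key, feature group name, and DataFrame are placeholders, and the exact calls may differ between client versions.

```python
import hsfs
import pandas as pd

# Connect to the feature store (host/project/API key are placeholders).
connection = hsfs.connection(
    host="my-hopsworks-host",
    project="my_project",
    api_key_value="MY_API_KEY",
)
fs = connection.get_feature_store()

# Write a DataFrame as a versioned feature group.
df = pd.DataFrame({"customer_id": [1, 2], "avg_basket_value": [42.0, 13.5]})
fg = fs.get_or_create_feature_group(
    name="customer_features",
    version=1,
    primary_key=["customer_id"],
    online_enabled=True,
)
fg.insert(df)

# Read the features back, e.g. to build training data.
training_df = fg.read()
```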

## Model Management

Hopsworks includes support for model management, with model deployments served by the KServe framework and a model registry designed for KServe. Hopsworks logs all inference requests to Kafka to enable monitoring of deployed models, and exposes model metrics via Prometheus and Grafana.
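
A rough sketch of how registering and deploying a model might look with the Hopsworks Python client follows; the project name, API key, model name, metrics, and saved model directory are placeholders, and the exact API may differ between releases.

```python
import hopsworks

# Log in and get the project's model registry (project/API key are placeholders).
project = hopsworks.login(project="my_project", api_key_value="MY_API_KEY")
mr = project.get_model_registry()

# Register a trained model; "model_dir" is assumed to contain the saved artifacts.
model = mr.python.create_model(
    name="fraud_classifier",
    metrics={"accuracy": 0.92},
)
model.save("model_dir")

# Deploy the registered model on KServe and send a test request.
deployment = model.deploy(name="fraudclassifier")
deployment.start()
prediction = deployment.predict({"instances": [[0.1, 0.2, 0.3]]})
```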

## Vector DB

Hopsworks provides a vector database (or embedding store) based on OpenSearch kNN (FAISS and nmslib). Hopsworks Vector DB includes out-of-the-box support for authentication, access control, filtering, backup-and-restore, and horizontal scalability. The Hopsworks Feature Store and Vector DB are often used together to build scalable recommender systems, such as retrieval-and-ranking architectures for real-time recommendations.
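
Hopsworks exposes the embedding store through its own APIs; the sketch below instead uses the generic opensearch-py client, only to illustrate the kind of kNN index and query that the underlying OpenSearch engine serves. The index name, field names, dimensions, and connection details are placeholders.

```python
from opensearchpy import OpenSearch

# Placeholder connection to an OpenSearch cluster with the k-NN plugin enabled.
client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Create an index with a knn_vector field for the embeddings.
client.indices.create(
    index="item_embeddings",
    body={
        "settings": {"index.knn": True},
        "mappings": {
            "properties": {
                "embedding": {"type": "knn_vector", "dimension": 4},
                "item_id": {"type": "keyword"},
            }
        },
    },
)

# Index one embedding and run an approximate nearest-neighbour query.
client.index(
    index="item_embeddings",
    body={"item_id": "item-1", "embedding": [0.1, 0.4, 0.3, 0.9]},
    refresh=True,
)
results = client.search(
    index="item_embeddings",
    body={"size": 5, "query": {"knn": {"embedding": {"vector": [0.1, 0.4, 0.3, 0.8], "k": 5}}}},
)
```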

## Governance

Hopsworks provides a data-mesh architecture for managing ML assets and teams, with multi-tenant projects. Much like a GitHub repository, a project is a sandbox containing team members, data, and ML assets. In Hopsworks, all ML assets (features, models, training data) are versioned, taggable, lineage-tracked, and support free-text search. Data can also be securely shared between projects.

## Data Science Platform

You can develop feature engineering, model training, and inference pipelines in Hopsworks. The platform supports:

- version control integration (GitHub, GitLab, Bitbucket)
- Jupyter notebooks
- a shared distributed file system
- per-project, modular Python environments for managing Python dependencies without writing Dockerfiles
- jobs (Python, Spark, Flink)
- workflow orchestration with Airflow
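
As a rough illustration of the orchestration side, the sketch below wires feature-engineering, training, and batch-inference steps into an Airflow DAG. The Python callables are placeholders standing in for Hopsworks jobs, not the platform's own operators.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


# Placeholder callables standing in for Hopsworks jobs.
def run_feature_pipeline():
    print("compute and insert features into the feature store")


def run_training_pipeline():
    print("train a model and register it in the model registry")


def run_batch_inference():
    print("score new data with the deployed model")


with DAG(
    dag_id="ml_pipelines",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    features = PythonOperator(task_id="feature_pipeline", python_callable=run_feature_pipeline)
    training = PythonOperator(task_id="training_pipeline", python_callable=run_training_pipeline)
    inference = PythonOperator(task_id="batch_inference", python_callable=run_batch_inference)

    # Feature engineering runs first, then training, then batch inference.
    features >> training >> inference
```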