|
| 1 | +# Hopsworks Client |
| 2 | + |
| 3 | +<p align="center"> |
| 4 | + <a href="https://community.hopsworks.ai"><img |
| 5 | + src="https://img.shields.io/discourse/users?label=Hopsworks%20Community&server=https%3A%2F%2Fcommunity.hopsworks.ai" |
| 6 | + alt="Hopsworks Community" |
| 7 | + /></a> |
| 8 | + <a href="https://docs.hopsworks.ai"><img |
| 9 | + src="https://img.shields.io/badge/docs-HOPSWORKS-orange" |
| 10 | + alt="Hopsworks Documentation" |
| 11 | + /></a> |
| 12 | + <a><img |
| 13 | + src="https://img.shields.io/badge/python-3.8+-blue" |
| 14 | + alt="python" |
| 15 | + /></a> |
| 16 | + <a href="https://pypi.org/project/hopsworks/"><img |
| 17 | + src="https://img.shields.io/pypi/v/hopsworks?color=blue" |
| 18 | + alt="PyPiStatus" |
| 19 | + /></a> |
| 20 | + <a href="https://pepy.tech/project/hopsworks/month"><img |
| 21 | + src="https://pepy.tech/badge/hopsworks/month" |
| 22 | + alt="Downloads" |
| 23 | + /></a> |
| 24 | + <a href="https://github.com/astral-sh/ruff"><img |
| 25 | + src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json" |
| 26 | + alt="Ruff" |
| 27 | + /></a> |
| 28 | + <a><img |
| 29 | + src="https://img.shields.io/pypi/l/hopsworks?color=green" |
| 30 | + alt="License" |
| 31 | + /></a> |
| 32 | +</p> |
| 33 | + |
| 34 | +*hopsworks* is the Python API for interacting with a Hopsworks cluster. Don't have a Hopsworks cluster just yet? Register an account on [Hopsworks Serverless](https://app.hopsworks.ai/) and get started for free. Once connected to your project, you can: |
| 35 | + - Insert dataframes into the online or offline store, create training datasets, or *serve real-time* feature vectors in the Feature Store via the [Feature Store API](https://github.com/logicalclocks/feature-store-api). Already have data somewhere that you want to import? Check out our [Storage Connectors](https://docs.hopsworks.ai/latest/user_guides/fs/storage_connector/) documentation. |
| 36 | + - Register ML models in the model registry and *deploy* them with model serving via the [Machine Learning API](https://github.com/logicalclocks/machine-learning-api). |
| 37 | + - Manage environments, executions, Kafka topics and more once you deploy your own Hopsworks cluster, either on-prem or in the cloud. Hopsworks is open-source and has its own [Community Edition](https://github.com/logicalclocks/hopsworks). |
| 38 | + |
| 39 | +Our [tutorials](https://github.com/logicalclocks/hopsworks-tutorials) cover a wide range of use cases and examples of what *you* can build using Hopsworks. |
| 40 | + |
| 41 | +## Getting Started On Hopsworks |
| 42 | + |
| 43 | +Once you have created a project on [Hopsworks Serverless](https://app.hopsworks.ai) and generated a new [API key](https://docs.hopsworks.ai/latest/user_guides/projects/api_key/create_api_key/), use your favourite virtualenv and package manager to install the library: |
| 44 | + |
| 45 | +```bash |
| 46 | +pip install hopsworks |
| 47 | +``` |
| 48 | + |
| 49 | +Fire up a notebook and connect to your project. You will be prompted to enter your newly created API key: |
| 50 | +```python |
| 51 | +import hopsworks |
| 52 | + |
| 53 | +project = hopsworks.login() |
| 54 | +``` |
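
For scripts and scheduled jobs there is no prompt to answer, so the key has to be supplied another way. The sketch below assumes the key is stored in an environment variable and passed through the `api_key_value` argument of `hopsworks.login()`. The helper `login_kwargs_from_env`, the function `connect`, and the variable name `HOPSWORKS_API_KEY` are illustrative choices for this example, not something the text above prescribes.

```python
import os


def login_kwargs_from_env(env=None, var_name="HOPSWORKS_API_KEY"):
    """Build keyword arguments for hopsworks.login() from the environment.

    Hypothetical helper: returns {} when the variable is unset or blank,
    so that hopsworks.login() falls back to its interactive prompt.
    """
    env = os.environ if env is None else env
    api_key = env.get(var_name, "").strip()
    return {"api_key_value": api_key} if api_key else {}


def connect():
    # Imported lazily so the helper above works without the library installed.
    import hopsworks

    return hopsworks.login(**login_kwargs_from_env())
```

Calling `project = connect()` then behaves like the interactive login whenever the variable is missing.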
| 55 | + |
| 56 | +Access the Feature Store of your project to use as a central repository for your feature data. Use *your* favourite data engineering library (pandas, polars, Spark, etc.) to insert data into the Feature Store, create training datasets or serve real-time feature vectors. Want to predict the likelihood of e-scooter accidents in real-time? Here's how you can do it: |
| 57 | + |
| 58 | +```python |
| 59 | +fs = project.get_feature_store() |
| 60 | + |
| 61 | +# Write to Feature Groups |
| 62 | +bike_ride_fg = fs.get_or_create_feature_group( |
| 63 | +    name="bike_rides", |
| 64 | +    version=1, |
| 65 | +    primary_key=["ride_id"], |
| 66 | +    event_time="activation_time", |
| 67 | +    online_enabled=True, |
| 68 | +) |
| 69 | + |
| 70 | +bike_ride_fg.insert(bike_rides_df) |
| 71 | + |
| 72 | +# Read from Feature Views |
| 73 | +profile_fg = fs.get_feature_group("user_profile", version=1) |
| 74 | + |
| 75 | +bike_ride_fv = fs.get_or_create_feature_view( |
| 76 | +    name="bike_rides_view", |
| 77 | +    version=1, |
| 78 | +    query=bike_ride_fg.select_except(["ride_id"]).join(profile_fg.select(["age", "has_license"]), on="user_id") |
| 79 | +) |
| 80 | + |
| 81 | +bike_rides_Q1_2021_df = bike_ride_fv.get_batch_data( |
| 82 | +    start_time="2021-01-01", |
| 83 | +    end_time="2021-03-31" |
| 84 | +) |
| 85 | + |
| 86 | +# Create a training dataset |
| 87 | +version, job = bike_ride_fv.create_train_test_split( |
| 88 | +    test_size=0.2, |
| 89 | +    description='Description of a dataset', |
| 90 | +    # you can have different data formats such as csv, tsv, tfrecord, parquet and others |
| 91 | +    data_format='csv' |
| 92 | +) |
| 93 | + |
| 94 | +# Predict the probability of accident in real-time using new data + context data |
| 95 | +bike_ride_fv.init_serving() |
| 96 | + |
| 97 | +while True: |
| 98 | +    # poll_ride_queue() and model are placeholders for your own ride source and trained model |
| 99 | +    new_ride_vector = poll_ride_queue() |
| 100 | +    feature_vector = bike_ride_fv.get_feature_vector( |
| 101 | +        {"user_id": new_ride_vector["user_id"]}, |
| 102 | +        passed_features=new_ride_vector |
| 103 | +    ) |
| 104 | +    accident_probability = model.predict(feature_vector) |
| 104 | +``` |
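
The serving loop above calls `poll_ride_queue()` without defining it. As a minimal sketch, assuming new rides arrive as dictionaries on an in-process `queue.Queue`, a stand-in could look like the following. The names `ride_queue` and `drain_rides` and the timeout values are hypothetical. In a real deployment the rides would more likely come from a message broker.

```python
import queue

# Hypothetical in-process queue standing in for the real source of new rides.
ride_queue = queue.Queue()


def poll_ride_queue(timeout=1.0):
    """Return the next ride as a dict, or None if nothing arrives in time."""
    try:
        return ride_queue.get(timeout=timeout)
    except queue.Empty:
        return None


def drain_rides(handle_ride, timeout=0.1):
    """Call handle_ride for each queued ride until the queue stays empty.

    Returns the number of rides handled, which makes the loop easy to test.
    """
    handled = 0
    while (ride := poll_ride_queue(timeout)) is not None:
        handle_ride(ride)
        handled += 1
    return handled
```

Because `poll_ride_queue` returns `None` on a timeout, a long-running loop should skip the feature lookup for that iteration instead of indexing into the result.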
| 105 | + |
| 106 | +Or you can use the Machine Learning API to register models and deploy them for serving: |
| 107 | +```python |
| 108 | +mr = project.get_model_registry() |
| 109 | +# or |
| 110 | +ms = project.get_model_serving() |
| 111 | +``` |
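
As a rough sketch of what registering and deploying might look like, the function below uploads a model directory to the registry and creates a deployment for it. The function name `register_and_deploy`, the default model name, and the metrics are placeholders. The calls `mr.python.create_model`, `model.save`, and `model.deploy` are assumed to behave as described in the Machine Learning API documentation, so check them against the version you have installed.

```python
def register_and_deploy(project, model_dir, name="accident_model", metrics=None):
    """Register the files in model_dir as a new model version and deploy it.

    `project` is the object returned by hopsworks.login(). The default name
    and the metrics dictionary are placeholders for this example.
    """
    mr = project.get_model_registry()

    # Create the model metadata, then upload the artifacts in model_dir.
    model = mr.python.create_model(
        name=name,
        metrics=metrics or {},
        description="Predicts the probability of an e-scooter accident",
    )
    model.save(model_dir)

    # Create a deployment for the registered model. Starting it is assumed
    # to be a separate step (deployment.start()).
    deployment = model.deploy()
    return model, deployment
```

A call such as `register_and_deploy(project, "model_dir", metrics={"auc": 0.9})` returns both objects so the caller can decide when to start serving.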
| 112 | + |
| 113 | +## Tutorials |
| 114 | + |
| 115 | +Need more inspiration or want to learn more about the Hopsworks platform? Check out our [tutorials](https://github.com/logicalclocks/hopsworks-tutorials). |
| 116 | + |
| 117 | +## Documentation |
| 118 | + |
| 119 | +Documentation is available at [Hopsworks Documentation](https://docs.hopsworks.ai/). |
| 120 | + |
| 121 | +## Issues |
| 122 | + |
| 123 | +For general questions about the usage of Hopsworks and the Feature Store, please open a topic on [Hopsworks Community](https://community.hopsworks.ai/). |
| 124 | + |
| 125 | +Please report any issues using the [GitHub issue tracker](https://github.com/logicalclocks/hopsworks-api/issues). |
| 126 | + |
| 127 | +## Contributing |
| 128 | + |
| 129 | +If you would like to contribute to this library, please see the [Contribution Guidelines](CONTRIBUTING.md). |
| 130 | + |