Commit 247eb72

Alexandru Ormenisan authored and committed
fixes
1 parent b4976ec commit 247eb72

3 files changed, +94 -53 lines changed

docs/user_guides/fs/provenance/provenance.md

+1 -53
@@ -2,7 +2,7 @@
22

33
## Introduction
44

5-
Hopsworks feature store allows users to track provenance (lineage) between storage connectors, feature groups, feature views, training datasets and models. Tracking lineage allows users to determine where/if a feature group is being used. You can track if feature groups are being used to create additional (derived) feature groups or feature views.
5+
Hopsworks feature store allows users to track provenance (lineage) between storage connectors, feature groups, feature views, training datasets and models. Tracking lineage allows users to determine where/if a feature group is being used. You can track if feature groups are being used to create additional (derived) feature groups or feature views, or to train models.
66

77
You can interact with the provenance graph using the UI and the APIs.
88

@@ -262,55 +262,3 @@ In the feature view overview UI you can explore the provenance graph of the feat
262262
<figcaption>Feature view provenance graph</figcaption>
263263
</figure>
264264
</p>
265-
266-
## Step 3: Model lineage
267-
268-
The relationship between feature views and models is captured automatically when you create a model. You can inspect the relationship between feature views and models using the APIs or the UI.
269-
=== "Python"
270-
271-
```python
272-
lineage = model.get_feature_view_provenance()
273-
274-
# List all accessible parent feature views
275-
lineage.accessible
276-
277-
# List all deleted parent feature views
278-
lineage.deleted
279-
280-
# List all the inaccessible parent feature views
281-
lineage.inaccessible
282-
```
283-
284-
You can also retrieve the training dataset provenance object.
285-
=== "Python"
286-
287-
```python
288-
lineage = model.get_training_dataset_provenance()
289-
290-
# List all accessible parent training datasets
291-
lineage.accessible
292-
293-
# List all deleted parent training datasets
294-
lineage.deleted
295-
296-
# List all the inaccessible parent training datasets
297-
lineage.inaccessible
298-
```
299-
300-
You can also retrieve directly the parent feature view object, without the need to extract them from the provenance links object
301-
=== "Python"
302-
303-
```python
304-
feature_view = model.get_feature_view()
305-
```
306-
This utility method also has the options to initialize the required components for batch or online retrieval of feature vectors.
307-
=== "Python"
308-
309-
```python
310-
model.get_feature_view(init: bool = True, online: Optional[bool]: None)
311-
```
312-
313-
By default, the base init for feature vector retrieval is enabled. In case you have a workflow that requires more particular options, you can disable this base init by setting the `init` to `false`.
314-
The method detects if it is running within a deployment and will initialize the feature vector retrieval for the serving.
315-
If the `online` argument is provided and `true` it will initialize for online feature vector retrieval.
316-
If the `online` argument is provided and `false` it will initialize the feature vector retrieval for batch scoring.
@@ -0,0 +1,92 @@
1+
# Provenance
2+
3+
## Introduction
4+
5+
Hopsworks feature store allows users to track provenance (lineage) between storage connectors, feature groups, feature views, training datasets and models. Tracking lineage allows users to determine where/if a feature group is being used. You can track if feature groups are being used to create additional (derived) feature groups or feature views, or to train models.
6+
7+
You can interact with the provenance graph using the UI and the APIs.
8+
9+
## Model provenance
10+
11+
The relationship between feature views and models is captured when you create a model. If you do not provide at least the feature view object to the constructor, provenance will not capture this relationship, and you will not be able to navigate from the model to the feature view it used, or from the feature view to the models created from it.
12+
13+
You can provide the feature view object and have the training dataset version be inferred.
14+
=== "Python"
15+
```python
16+
# this object will be provided to the model constructor
17+
feature_view = hsfs.get_feature_view(...)
18+
19+
# when calling this method, the training dataset version is cached in the feature view and is implicitly provided to the model constructor
20+
X_train, X_test, y_train, y_test = feature_view.train_test_split(...)
21+
22+
# provide the feature_view object in the model constructor
23+
hsml.model_registry.ModelRegistry.python.create_model(..., feature_view = feature_view)
24+
```
25+
26+
You can also explicitly provide the training dataset version.
27+
=== "Python"
28+
```python
29+
# this object will be provided to the model constructor
30+
feature_view = hsfs.get_feature_view(...)
31+
32+
# this training dataset version will be provided to the model constructor
33+
X_train, X_test, y_train, y_test = feature_view.get_train_test_split(training_dataset_version=1)
34+
35+
# provide the feature_view object and the same training dataset version in the model constructor
36+
hsml.model_registry.ModelRegistry.python.create_model(..., feature_view = feature_view, training_dataset_version = 1)
37+
```
38+
39+
Once the relation is stored in the provenance graph, you can navigate the graph from model to feature view and the other way around.
40+
41+
Users can call the provenance method, which returns a Link object containing the parent feature view in one of the `accessible`, `deleted` or `inaccessible` lists.
42+
* If the user has access to both the model and the feature view (including through shared feature stores), the feature view will be present in the `accessible` list.
43+
* If the user had access to the feature view at some point through a shared feature store and used it to generate the model, but access to the shared feature store was later restricted, the relation is still maintained in the provenance. The user then has access only to limited metadata for the feature view, and the provenance method will return it in the `inaccessible` list.
44+
* If the feature view was deleted after the model was created, the provenance retains the relation with a minimal amount of metadata for the feature view, and the provenance method will return the feature view in the `deleted` list.
45+
46+
=== "Python"
47+
```python
48+
lineage = model.get_feature_view_provenance()
49+
50+
# List accessible parent feature view
51+
lineage.accessible
52+
53+
# List deleted parent feature view
54+
lineage.deleted
55+
56+
# List inaccessible parent feature view
57+
lineage.inaccessible
58+
```
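The three lists described above can be checked in one pass. The following is a minimal sketch, not part of the library: `Links` is a hypothetical stand-in for the object returned by `model.get_feature_view_provenance()`, and `parent_status` is an illustrative helper that assumes only the `accessible`, `deleted` and `inaccessible` attributes shown in this guide.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class Links:
    """Hypothetical stand-in for the provenance links object (not the real class)."""
    accessible: List[Any] = field(default_factory=list)
    deleted: List[Any] = field(default_factory=list)
    inaccessible: List[Any] = field(default_factory=list)


def parent_status(links: Any) -> Tuple[str, Optional[Any]]:
    """Return (status, parent) for the first parent found, or ("missing", None)."""
    for status in ("accessible", "inaccessible", "deleted"):
        parents = getattr(links, status)
        if parents:
            return status, parents[0]
    # No relation was captured, e.g. no feature view was given at model creation.
    return "missing", None


# Placeholder data; with a real model you would pass
# model.get_feature_view_provenance() instead of a Links instance.
status, parent = parent_status(Links(deleted=["feature_view_placeholder"]))
print(status, parent)
```

Only the `accessible` case gives you a fully usable feature view; in the other two cases the entry carries limited metadata, as described in the list above.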
59+
60+
You can also retrieve the training dataset provenance object.
61+
=== "Python"
62+
63+
```python
64+
lineage = model.get_training_dataset_provenance()
65+
66+
# List accessible parent training dataset
67+
lineage.accessible
68+
69+
# List deleted parent training dataset
70+
lineage.deleted
71+
72+
# List inaccessible parent training dataset
73+
lineage.inaccessible
74+
```
75+
76+
You can also retrieve the parent feature view object directly, without extracting it from the provenance links object.
77+
=== "Python"
78+
79+
```python
80+
feature_view = model.get_feature_view()
81+
```
82+
This utility method can also initialize the required components for batch or online retrieval of feature vectors.
83+
=== "Python"
84+
85+
```python
86+
model.get_feature_view(init: bool = True, online: Optional[bool] = None)
87+
```
88+
89+
By default, the base init for feature vector retrieval is enabled. If your workflow requires more specific options, you can disable this base init by setting `init` to `False`.
90+
The method detects whether it is running within a deployment and, if so, initializes feature vector retrieval for serving.
91+
If the `online` argument is provided and is `True`, it initializes online feature vector retrieval.
92+
If the `online` argument is provided and is `False`, it initializes feature vector retrieval for batch scoring.
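The rules above can be read as a small decision table. The sketch below is illustrative only and is not the library's implementation: `resolve_retrieval_mode` and its `in_deployment` flag are hypothetical names, and letting an explicit `online` value take precedence over deployment detection is an assumption. The case where `online` is omitted outside a deployment is not described in this guide, so the sketch reports it as `"unspecified"`.

```python
from typing import Optional


def resolve_retrieval_mode(
    init: bool = True,
    online: Optional[bool] = None,
    in_deployment: bool = False,
) -> Optional[str]:
    """Illustrative decision table only; not the library's implementation."""
    if not init:
        # Base init disabled: the caller configures retrieval itself.
        return None
    if online is True:
        return "online"
    if online is False:
        return "batch"
    # `online` was not provided: the guide only describes the deployment case.
    return "serving" if in_deployment else "unspecified"


print(resolve_retrieval_mode(online=False))
print(resolve_retrieval_mode(in_deployment=True))
```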

mkdocs.yml

+1
@@ -195,6 +195,7 @@ nav:
195195
- API Protocol: user_guides/mlops/serving/api-protocol.md
196196
- Troubleshooting: user_guides/mlops/serving/troubleshooting.md
197197
- Vector Database: user_guides/mlops/vector_database/index.md
198+
- Provenance: user_guides/mlops/provenance/provenance.md
198199
- Migration:
199200
- 3.X to 4.0: user_guides/migration/40_migration.md
200201
- Setup and Administration:
