docs/user_guides/fs/feature_group/data_types.md
+69-13
@@ -120,7 +120,7 @@ When a feature is being used as a primary key, certain types are not allowed.
Examples of such types are *FLOAT*, *DOUBLE*, *TEXT* and *BLOB*.
Additionally, the size of the sum of the primary key online data types storage requirements **should not exceed 4KB**.

-#### Online restrictions for row size
+#### Online restrictions for row size

The online feature store supports **up to 500 columns** and all column types combined **should not exceed 30000 Bytes**.
The byte size of each column is determined by its data type and calculated as follows:
@@ -145,18 +145,74 @@ The byte size of each column is determined by its data type and calculated as fo
#### Pre-insert schema validation for online feature groups
-
-The input dataframe can be validated for schema as per the valid online schema data types before online ingestion. The most important checks are mentioned below along with possible corrective actions. It is enabled by setting the keyword argument `validation_options={'run_validation':True}` in the `insert()` API of feature groups.
-
-| Error type | Requirement | Suggested corrections |
-|------------|-------------|-----------------------|
-| Primary key contains null values | Primary key columns must not contain any null values. For composite keys, all primary key columns are checked for nulls. | Remove the null rows from dataframe. OR impute the null values as applicable. |
-| Primary key column is missing | The dataframe to be inserted must contain all the features defined in the primary key as per the feature group schema. | Add all the primary key columns in the dataframe. |
-| Event time column is missing | The dataframe to be inserted must contain an event time column if it was specified in the schema while feature group creation. | Add the event time column in the dataframe. |
-| String length exceeded | The character length of a string row exceeds the maximum length specified in feature online schema. However, if the feature group is not created and if no explicit schema was provided during feature group creation, then the length will be auto-increased to the maximum length found in a string column. This is handled during the first data ingestion and no user action is needed in this case. **Note:** The maximum row size in bytes should be less than 30000. | Trim the string values to fit within maximum set during feature group creation. OR remove the invalid rows. If the lengths are very long consider changing the feature schema to **TEXT** or **BLOB.** |
-
+For online-enabled feature groups, the dataframe to be ingested must adhere to the online schema definitions, and the input dataframe is validated against them.
+The validation is enabled by passing `validation_options={'run_validation': True}` when calling `insert()`.
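As a minimal sketch of enabling this check, the helper below forwards a dataframe to a feature group's `insert()` with `validation_options={'run_validation': True}`. The helper name `insert_with_validation` is hypothetical and not part of the library; `feature_group` is assumed to be an online-enabled feature group object obtained elsewhere.

```python
def insert_with_validation(feature_group, df):
    """Insert `df` with pre-insert schema validation switched on.

    `feature_group` is assumed to be an online-enabled feature group
    object exposing `insert()`; this wrapper is a hypothetical helper.
    """
    return feature_group.insert(df, validation_options={"run_validation": True})
```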
+The most important validation checks and error messages are listed below, along with possible corrective actions.
+
+1. Primary key contains null values
+
+    - **Rule** Primary key columns must not contain any null values.
+    - **Example correction** Drop the rows containing null primary keys. Alternatively, find the null values and assign them a unique value as per the preferred strategy for data imputation.
+
+    === "Pandas"
+        ```python
+        # Assuming 'id' is the primary key column
+        df = df.dropna(subset=['id'])
+        # For composite keys
+        df = df.dropna(subset=['id1', 'id2'])
+        ```
+
+2. Primary key column missing
+
+    - **Rule** The dataframe to be inserted must contain all the columns defined as primary key(s) in the feature group.
+    - **Example correction** Add all the primary key columns in the dataframe.
+
+    === "Pandas"
+        ```python
+        # Add missing primary key column
+        df['id'] = some_value
+        # If the primary key is auto-incrementing
+        df['id'] = range(1, len(df) + 1)
+        ```
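Before applying either correction, it can help to see which primary key columns are missing or contain nulls. The following is a rough local pre-check in pandas, not the library's own validation; the function name `primary_key_problems` and the column names are illustrative.

```python
import pandas as pd


def primary_key_problems(df, primary_keys):
    """Return (missing_columns, null_counts) for the given primary key columns."""
    # Primary key columns that are absent from the dataframe
    missing = [col for col in primary_keys if col not in df.columns]
    # Number of null values in each primary key column that is present
    null_counts = {
        col: int(df[col].isna().sum()) for col in primary_keys if col in df.columns
    }
    return missing, null_counts


# Hypothetical dataframe: 'id2' is absent and 'id1' holds one null
df = pd.DataFrame({"id1": [1, None, 3], "value": [0.1, 0.2, 0.3]})
missing, null_counts = primary_key_problems(df, ["id1", "id2"])
```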
+
+3. String length exceeded
+
+    - **Rule** The character length of a string should be within the maximum length allowed by the online schema type of the feature. If the feature group has not been created yet and no explicit feature schema was provided, the limit will be auto-increased to the maximum length found in a string column in the dataframe.
+    - **Example correction**
+        Trim the string values to fit within the maximum limit set during feature group creation.
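A small pandas sketch of the trimming step, assuming a hypothetical limit `MAX_LEN` and a hypothetical string column `text_col` (neither name comes from the feature group itself):

```python
import pandas as pd

MAX_LEN = 10  # hypothetical limit, e.g. for an online type of VARCHAR(10)

df = pd.DataFrame(
    {"id": [1, 2], "text_col": ["short", "a value that is far too long"]}
)

# Flag the rows whose string length exceeds the limit
too_long = df["text_col"].str.len() > MAX_LEN

# Trim every value to at most MAX_LEN characters
df["text_col"] = df["text_col"].str.slice(0, MAX_LEN)
```

Trimming discards data, so inspecting the rows flagged in `too_long` first may be preferable to truncating blindly.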
+        The total row size should stay below 30 KB as per the [row size restrictions](#online-restrictions-for-row-size). In such cases it is possible to define the feature as **TEXT** or **BLOB**.
+        Below is an example of explicitly defining the online type of a string column as TEXT.
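One way such a definition might look is sketched here. It assumes the `hsfs.feature.Feature` class accepts `name`, `type` and `online_type` arguments, and it takes the class as a parameter (`feature_cls`, a hypothetical name) so that the snippet runs without a Hopsworks installation. The function name and column names are illustrative.

```python
def text_schema(feature_cls):
    """Build an explicit schema in which `text_col` is stored online as TEXT.

    `feature_cls` is assumed to behave like `hsfs.feature.Feature`,
    i.e. to accept `name`, `type` and `online_type` keyword arguments.
    """
    return [
        feature_cls(name="id", type="bigint", online_type="bigint"),
        # Offline type stays a string; only the online type is overridden
        feature_cls(name="text_col", type="string", online_type="text"),
    ]
```

With the library installed, the list would presumably be built as `text_schema(Feature)` after `from hsfs.feature import Feature` and passed as the `features` argument when creating the feature group.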