Commit 683b084

Set train-test-split shuffle=False as default and remove stratification
1 parent fb3d408 commit 683b084
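
For orientation, the effect of an unshuffled split on time-ordered data can be sketched as below. This is a minimal pure-Python stand-in, not scikit-learn's implementation: `chronological_split` is a hypothetical helper, and rounding the test count up is an assumption made for the sketch.

```python
import math
from typing import List, Sequence, Tuple


def chronological_split(
    rows: Sequence[int], test_size: float = 0.1
) -> Tuple[List[int], List[int]]:
    """Split time-ordered rows without shuffling (hypothetical helper).

    The oldest rows become the training set and the most recent rows the
    test set, so the test data is never older than the training data.
    """
    n_test = math.ceil(len(rows) * test_size)  # assumption: round up
    n_train = len(rows) - n_test
    return list(rows[:n_train]), list(rows[n_train:])


candles = list(range(8))  # stand-in for 8 chronologically ordered candles
train, test = chronological_split(candles, test_size=0.25)
print(train)  # [0, 1, 2, 3, 4, 5]
print(test)   # [6, 7]
```

With shuffling disabled, the test set is simply the tail of the series, which is the usual choice when rows are ordered in time.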

3 files changed: +4 -28 lines changed

docs/freqai-parameter-table.md (+1 -2)

@@ -27,8 +27,7 @@ Mandatory parameters are marked as **Required** and have to be set in one of the
 | `weight_factor` | Weight training data points according to their recency (see details [here](freqai-feature-engineering.md#weighting-features-for-temporal-importance)). <br> **Datatype:** Positive float (typically < 1).
 | `indicator_max_period_candles` | **No longer used (#7325)**. Replaced by `startup_candle_count` which is set in the [strategy](freqai-configuration.md#building-a-freqai-strategy). `startup_candle_count` is timeframe independent and defines the maximum *period* used in `populate_any_indicators()` for indicator creation. `FreqAI` uses this parameter together with the maximum timeframe in `include_time_frames` to calculate how many data points to download such that the first data point does not include a NaN <br> **Datatype:** Positive integer.
 | `indicator_periods_candles` | Time periods to calculate indicators for. The indicators are added to the base indicator dataset. <br> **Datatype:** List of positive integers.
-| `stratify_training_data` | Split the feature set into training and testing datasets. For example, `stratify_training_data: 2` would set every 2nd data point into a separate dataset to be pulled from during training/testing. See details about how it works [here](freqai-running.md#data-stratification-for-training-and-testing-the-model). <br> **Datatype:** Positive integer.
-| `principal_component_analysis` | Automatically reduce the dimensionality of the data set using Principal Component Analysis. See details about how it works [here](#reducing-data-dimensionality-with-principal-component-analysis) <br> **Datatype:** Boolean. defaults to `false`.
+| `principal_component_analysis` | Automatically reduce the dimensionality of the data set using Principal Component Analysis. See details about how it works [here](#reducing-data-dimensionality-with-principal-component-analysis) <br> **Datatype:** Boolean. defaults to `False`.
 | `plot_feature_importances` | Create a feature importance plot for each model for the top/bottom `plot_feature_importances` number of features.<br> **Datatype:** Integer, defaults to `0`.
 | `DI_threshold` | Activates the use of the Dissimilarity Index for outlier detection when set to > 0. See details about how it works [here](freqai-feature-engineering.md#identifying-outliers-with-the-dissimilarity-index-di). <br> **Datatype:** Positive float (typically < 1).
 | `use_SVM_to_remove_outliers` | Train a support vector machine to detect and remove outliers from the training dataset, as well as from incoming data points. See details about how it works [here](freqai-feature-engineering.md#identifying-outliers-using-a-support-vector-machine-svm). <br> **Datatype:** Boolean.

docs/freqai-running.md (-17)

@@ -105,23 +105,6 @@ During dry/live mode, FreqAI trains each coin pair sequentially (on separate thr
 
 In the presented example config, the user will only allow predictions on models that are less than 1/2 hours old.
 
-## Data stratification for training and testing the model
-
-You can stratify (group) the training/testing data using:
-
-```json
-    "freqai": {
-        "feature_parameters" : {
-            "stratify_training_data": 3
-        }
-    }
-```
-
-This will split the data chronologically so that every Xth data point is used to test the model after training. In the example above, the user is asking for every third data point in the dataframe to be used for
-testing; the other points are used for training.
-
-The test data is used to evaluate the performance of the model after training. If the test score is high, the model is able to capture the behavior of the data well. If the test score is low, either the model does not capture the complexity of the data, the test data is significantly different from the train data, or a different type of model should be used.
-
 ## Controlling the model learning process
 
 Model training parameters are unique to the selected machine learning library. FreqAI allows you to set any parameter for any library using the `model_training_parameters` dictionary in the config. The example config (found in `config_examples/config_freqai.example.json`) shows some of the example parameters associated with `Catboost` and `LightGBM`, but you can add any parameters available in those libraries or any other machine learning library you choose to implement.
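
To make the removed `stratify_training_data` option concrete, the sketch below rebuilds the marker array that the deleted block in `freqtrade/freqai/data_kitchen.py` computed. `stratification_markers` is a hypothetical helper written for illustration; it returns plain integers in a list, whereas the removed code filled a NumPy array of zeros.

```python
from typing import List, Optional


def stratification_markers(n_rows: int, every: int) -> Optional[List[int]]:
    """Rebuild the marker array from the removed block (hypothetical helper).

    Index 0 is never marked because the removed loop started at 1.
    """
    if every <= 0:
        return None  # the removed code used stratification = None here
    return [1 if i > 0 and i % every == 0 else 0 for i in range(n_rows)]


print(stratification_markers(7, 3))  # [0, 0, 0, 1, 0, 0, 1]
```

The removed code passed this array as `stratify=` to the split call. If that call is scikit-learn's `train_test_split`, as the commit title suggests, two points are worth noting: `stratify` keeps the proportion of each marker value similar in both splits rather than sending every marked row to the test set, and scikit-learn rejects `stratify` when `shuffle=False`, which would explain why the option is dropped together with the new default.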

freqtrade/freqai/data_kitchen.py (+3 -9)

@@ -134,20 +134,14 @@ def make_train_test_datasets(
         """
         feat_dict = self.freqai_config["feature_parameters"]
 
+        shuffle = self.freqai_config.get('data_split_parameters', {}).get('shuffle', False)
+
         weights: npt.ArrayLike
         if feat_dict.get("weight_factor", 0) > 0:
             weights = self.set_weights_higher_recent(len(filtered_dataframe))
         else:
             weights = np.ones(len(filtered_dataframe))
 
-        if feat_dict.get("stratify_training_data", 0) > 0:
-            stratification = np.zeros(len(filtered_dataframe))
-            for i in range(1, len(stratification)):
-                if i % feat_dict.get("stratify_training_data", 0) == 0:
-                    stratification[i] = 1
-        else:
-            stratification = None
-
         if self.freqai_config.get('data_split_parameters', {}).get('test_size', 0.1) != 0:
             (
                 train_features,
@@ -160,7 +154,7 @@ def make_train_test_datasets(
                 filtered_dataframe[: filtered_dataframe.shape[0]],
                 labels,
                 weights,
-                stratify=stratification,
+                shuffle=shuffle,
                 **self.config["freqai"]["data_split_parameters"],
             )
         else:
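
The sketch below illustrates the new default lookup and one thing to watch for in the call as shown: `shuffle` is passed explicitly and `data_split_parameters` is unpacked into the same call. `resolve_shuffle` and `fake_split` are hypothetical stand-ins, and the sketch assumes `self.freqai_config` and `self.config["freqai"]` refer to the same dictionary. Whether the committed code guards against the duplicate key elsewhere is not visible in this diff.

```python
from typing import Any, Dict


def resolve_shuffle(freqai_config: Dict[str, Any]) -> bool:
    """Mirror the lookup added in the diff: default to False when unset."""
    return freqai_config.get("data_split_parameters", {}).get("shuffle", False)


def fake_split(*arrays: Any, shuffle: bool = True, test_size: float = 0.1) -> str:
    """Hypothetical stand-in for the real split function."""
    return f"shuffle={shuffle}, test_size={test_size}"


# No "shuffle" key in the config: the explicit keyword is used and is False.
cfg = {"data_split_parameters": {"test_size": 0.25}}
print(fake_split([1, 2, 3], shuffle=resolve_shuffle(cfg), **cfg["data_split_parameters"]))

# "shuffle" set in the config: the same keyword arrives twice.
cfg_with_shuffle = {"data_split_parameters": {"test_size": 0.25, "shuffle": True}}
try:
    fake_split(
        [1, 2, 3],
        shuffle=resolve_shuffle(cfg_with_shuffle),
        **cfg_with_shuffle["data_split_parameters"],
    )
    collided = False
except TypeError:
    # Python rejects a keyword supplied both explicitly and via ** unpacking.
    collided = True
print(collided)  # True
```

One way to avoid the collision, under the same assumptions, is to write the default into the parameters dictionary when the key is missing and pass only the unpacked dictionary.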
