
Commit 96b5849

Merge pull request #559 from mindsdb/staging
Release 1.3.0
2 parents 824862c + 31232c3


70 files changed: +2002, -1219 lines

.github/ISSUE_TEMPLATE/bug_report.md (+3, -1)

@@ -1,7 +1,7 @@
 ---
 name: Bug report
 about: Create a report to help us improve
-labels:
+labels: Bug
 ---

 ## Your Environment
@@ -13,3 +13,5 @@ labels:


 ## How can we replicate it?
+* What dataset did you use (link to it please)
+* What was the code you ran

.github/ISSUE_TEMPLATE/question.md (+5, new file)

@@ -0,0 +1,5 @@
+---
+name: Question
+about: Ask a question
+labels: question
+---

.github/ISSUE_TEMPLATE/suggestion.md (+8, new file)

@@ -0,0 +1,8 @@
+---
+name: Suggestion
+about: Suggest a feature, improvement, doc change, etc.
+labels: enhancement
+---
+
+
+

CONTRIBUTING.md (+34, -9)

@@ -10,26 +10,51 @@ We love to receive contributions from the community and hear your opinions! We w
 * Submit a bug fix
 * Propose new features
 * Test Lightwood
+* Solve an issue

 # Code contributions
 In general, we follow the "fork-and-pull" Git workflow.

 1. Fork the Lightwood repository
-2. Clone the repository
-3. Make changes and commit them
-4. Push your local branch to your fork
-5. Submit a Pull request so that we can review your changes
-6. Write a commit message
-7. Make sure that the CI tests are GREEN
+2. Check out the `staging` branch; this is the development version that gets released weekly
+3. Make changes and commit them
+4. Make sure that the CI tests pass
+5. Submit a Pull Request from your repo to the `staging` branch of mindsdb/lightwood so that we can review your changes

-> NOTE: Be sure to merge the latest from "upstream" before making a pull request!
+> You will need to sign a CLA for the code, since Lightwood is under a GPL license
+> Be sure to merge the latest from `staging` before making a pull request!
+> You can run the test suite locally with `flake8 .` to check style and `python -m unittest discover tests` to run the automated tests. Passing locally doesn't guarantee passing remotely, since we run on multiple environments, but it should work in most cases.

 # Feature and Bug reports
 We use GitHub issues to track bugs and features. Report them by opening a [new issue](https://github.com/mindsdb/lightwood/issues/new/choose) and filling out all of the required inputs.

 # Code review process
 The Pull Request reviews are done on a regular basis.
-Please, make sure you respond to our feedback/questions.
+
+If your change has a chance of affecting performance, we will run our private benchmark suite to validate it.
+
+Please make sure you respond to our feedback and questions.

 # Community
-If you have additional questions or you want to chat with MindsDB core team, you can join our community [![Discourse posts](https://img.shields.io/discourse/posts?server=https%3A%2F%2Fcommunity.mindsdb.com%2F)](https://community.mindsdb.com/). To get updates on MindsDB’s latest announcements, releases, and events, [sign up for our newsletter](https://mindsdb.us20.list-manage.com/subscribe/post?u=5174706490c4f461e54869879&id=242786942a).
+If you have additional questions or you want to chat with the MindsDB core team, you can join our community Slack.
+
+# Setting up a dev environment
+
+- Clone lightwood
+- `cd lightwood && pip install -r requirements.txt`
+- Add it to your python path (e.g. by adding `export PYTHONPATH="/where/you/cloned/lightwood:$PYTHONPATH"` as a new line at the end of your `~/.bashrc` file)
+- Check that the unit tests are passing by going into the directory where you cloned lightwood and running `python -m unittest discover tests`
+
+> If `python` defaults to python2.x in your environment, use `python3` and `pip3` instead
+
+## Setting up a vscode environment
+
+Currently, the preferred environment for working with Lightwood is vscode, a very popular Python IDE. Any IDE should work, though; while we don't have guides for them, you can use the following as a template.
+
+* Install and enable setting sync using your GitHub account (if you use multiple machines)
+* Install pylance (for types) and make sure to disable pyright
+* Go to `Python > Lint: Enabled` and disable everything *but* flake8
+* Set `python.linting.flake8Path` to the full path to flake8 (`which flake8`)
+* Set `Python › Formatting: Provider` to autopep8
+* Add `--global-config=<path_to>/lightwood/.flake8` and `--experimental` to `Python › Formatting: Autopep8 Args`
+* Install Live Share and Live Share Whiteboard
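
As a quick sanity check for the dev setup described in the new section, a minimal sketch, assuming the clone was added to `PYTHONPATH` as instructed (only `__version__` and the `lightwood.__about__` module are confirmed by this commit; the printed path is illustrative):

```python
# Verify that Python picks up the cloned copy of lightwood,
# and that the version matches this release.
import lightwood
from lightwood.__about__ import __version__

print(lightwood.__file__)  # should point inside the directory where you cloned lightwood
print(__version__)         # '1.3.0' for this release
```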

dev/README.md (-11)

This file was deleted.

dev/requirements.txt (-2)

This file was deleted.

lightwood/__about__.py (+1, -1)

@@ -1,6 +1,6 @@
 __title__ = 'lightwood'
 __package_name__ = 'lightwood'
-__version__ = '1.2.0'
+__version__ = '1.3.0'
 __description__ = "Lightwood is a toolkit for automatic machine learning model building"
 __email__ = "community@mindsdb.com"
 __author__ = 'MindsDB Inc'

lightwood/analysis/__init__.py (+10, -2)

@@ -1,4 +1,12 @@
-from lightwood.analysis.model_analyzer import model_analyzer
+# Base
+from lightwood.analysis.analyze import model_analyzer
 from lightwood.analysis.explain import explain

-__all__ = ['model_analyzer', 'explain']
+# Blocks
+from lightwood.analysis.base import BaseAnalysisBlock
+from lightwood.analysis.nc.calibrate import ICP
+from lightwood.analysis.helpers.acc_stats import AccStats
+from lightwood.analysis.helpers.feature_importance import GlobalFeatureImportance
+
+
+__all__ = ['model_analyzer', 'explain', 'ICP', 'AccStats', 'GlobalFeatureImportance', 'BaseAnalysisBlock']
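
With these re-exports, downstream code can pull the entry points and the default blocks straight from the package. A minimal sketch, assuming lightwood 1.3.0 is installed (only the import names below are confirmed by the diff; constructor arguments for the blocks are deliberately not shown):

```python
# The analysis entry points and default blocks are now public API (see __all__ above).
from lightwood.analysis import (
    model_analyzer,
    explain,
    BaseAnalysisBlock,
    ICP,
    AccStats,
    GlobalFeatureImportance,
)

# e.g. confirm the default blocks share the common block interface
for block_cls in (ICP, AccStats, GlobalFeatureImportance):
    print(block_cls.__name__, issubclass(block_cls, BaseAnalysisBlock))
```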

lightwood/analysis/analyze.py (+96, new file)

@@ -0,0 +1,96 @@
+from typing import Dict, List, Tuple, Optional
+
+from lightwood.api import dtype
+from lightwood.ensemble import BaseEnsemble
+from lightwood.analysis.base import BaseAnalysisBlock
+from lightwood.data.encoded_ds import EncodedDs
+from lightwood.encoder.text.pretrained import PretrainedLangEncoder
+from lightwood.api.types import ModelAnalysis, StatisticalAnalysis, TimeseriesSettings
+
+
+def model_analyzer(
+        predictor: BaseEnsemble,
+        data: EncodedDs,
+        train_data: EncodedDs,
+        stats_info: StatisticalAnalysis,
+        target: str,
+        ts_cfg: TimeseriesSettings,
+        dtype_dict: Dict[str, str],
+        accuracy_functions,
+        analysis_blocks: Optional[List[BaseAnalysisBlock]] = []
+) -> Tuple[ModelAnalysis, Dict[str, object]]:
+    """
+    Analyses the model on a validation subset to evaluate accuracy, estimate feature importance and generate a
+    calibration model to estimate confidence in future predictions.
+
+    Additionally, any user-specified analysis blocks (see class `BaseAnalysisBlock`) are also called here.
+
+    :return:
+    runtime_analyzer: This dictionary object gets populated in a sequential fashion with data generated from
+    any `.analyze()` block call. This dictionary object is stored in the predictor itself, and used when
+    calling the `.explain()` method of all analysis blocks when generating predictions.
+
+    model_analysis: `ModelAnalysis` object that contains core analysis metrics, not necessarily needed when predicting.
+    """
+
+    runtime_analyzer = {}
+    data_type = dtype_dict[target]
+
+    # retrieve encoded data representations
+    encoded_train_data = train_data
+    encoded_val_data = data
+    data = encoded_val_data.data_frame
+    input_cols = list([col for col in data.columns if col != target])
+
+    # predictive task
+    is_numerical = data_type in (dtype.integer, dtype.float, dtype.array, dtype.tsarray, dtype.quantity)
+    is_classification = data_type in (dtype.categorical, dtype.binary)
+    is_multi_ts = ts_cfg.is_timeseries and ts_cfg.nr_predictions > 1
+    has_pretrained_text_enc = any([isinstance(enc, PretrainedLangEncoder)
+                                   for enc in encoded_train_data.encoders.values()])
+
+    # raw predictions for validation dataset
+    normal_predictions = predictor(encoded_val_data) if not is_classification else predictor(encoded_val_data,
+                                                                                             predict_proba=True)
+    normal_predictions = normal_predictions.set_index(data.index)
+
+    # ------------------------- #
+    # Run analysis blocks, both core and user-defined
+    # ------------------------- #
+    kwargs = {
+        'predictor': predictor,
+        'target': target,
+        'input_cols': input_cols,
+        'dtype_dict': dtype_dict,
+        'normal_predictions': normal_predictions,
+        'data': data,
+        'train_data': train_data,
+        'encoded_val_data': encoded_val_data,
+        'is_classification': is_classification,
+        'is_numerical': is_numerical,
+        'is_multi_ts': is_multi_ts,
+        'stats_info': stats_info,
+        'ts_cfg': ts_cfg,
+        'accuracy_functions': accuracy_functions,
+        'has_pretrained_text_enc': has_pretrained_text_enc
+    }
+
+    for block in analysis_blocks:
+        runtime_analyzer = block.analyze(runtime_analyzer, **kwargs)
+
+    # ------------------------- #
+    # Populate ModelAnalysis object
+    # ------------------------- #
+    model_analysis = ModelAnalysis(
+        accuracies=runtime_analyzer['score_dict'],
+        accuracy_histogram=runtime_analyzer['acc_histogram'],
+        accuracy_samples=runtime_analyzer['acc_samples'],
+        train_sample_size=len(encoded_train_data),
+        test_sample_size=len(encoded_val_data),
+        confusion_matrix=runtime_analyzer['cm'],
+        column_importances=runtime_analyzer['column_importances'],
+        histograms=stats_info.histograms,
+        dtypes=dtype_dict
+    )
+
+    return model_analysis, runtime_analyzer
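
The loop over `analysis_blocks` is the heart of this file: each block receives the dictionary the previous block returned and hands back an enriched copy. A self-contained sketch of that data flow, using two hypothetical toy blocks in place of the real ones:

```python
from typing import Dict


class ToyScoreBlock:
    """Hypothetical block: stores a score, as a real block might after using kwargs['data']."""
    def analyze(self, info: Dict[str, object], **kwargs) -> Dict[str, object]:
        info['score'] = 0.92
        return info


class ToyReportBlock:
    """Hypothetical block: later blocks can read what earlier blocks stored."""
    def analyze(self, info: Dict[str, object], **kwargs) -> Dict[str, object]:
        info['report'] = f"model scored {info['score']:.2f}"
        return info


# Same chaining pattern as model_analyzer above
runtime_analyzer: Dict[str, object] = {}
for block in [ToyScoreBlock(), ToyReportBlock()]:
    runtime_analyzer = block.analyze(runtime_analyzer)

print(runtime_analyzer)  # {'score': 0.92, 'report': 'model scored 0.92'}
```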

lightwood/analysis/base.py (+46, new file)

@@ -0,0 +1,46 @@
+from typing import Tuple, Dict, Optional
+
+import pandas as pd
+from lightwood.helpers.log import log
+
+
+class BaseAnalysisBlock:
+    """Class to be inherited by any analysis/explainer block."""
+    def __init__(self,
+                 deps: Optional[Tuple] = ()
+                 ):
+
+        self.dependencies = deps  # can be parallelized when there are no dependencies @TODO enforce
+
+    def analyze(self, info: Dict[str, object], **kwargs) -> Dict[str, object]:
+        """
+        This method should be called once during the analysis phase, or not called at all.
+        It computes any information that the block may either output to the model analysis object,
+        or use at inference time when `.explain()` is called (in this case, make sure all needed
+        objects are added to the runtime analyzer so that `.explain()` can access them).
+
+        :param info: Dictionary where any new information or objects are added. The next analysis block will use
+        the output of the previous block as a starting point.
+        :param kwargs: Dictionary with named variables from either the core analysis or the rest of the prediction
+        pipeline.
+        """
+        log.info(f"{self.__class__.__name__}.analyze() has not been implemented, no modifications will be done to the model analysis.")  # noqa
+        return info
+
+    def explain(self,
+                row_insights: pd.DataFrame,
+                global_insights: Dict[str, object], **kwargs) -> Tuple[pd.DataFrame, Dict[str, object]]:
+        """
+        This method should be called once during the explaining phase at inference time, or not called at all.
+        Additional explanations can be at an instance level (row-wise) or global.
+        For the former, return a data frame with any new insights. For the latter, a dictionary is required.
+
+        :param row_insights: dataframe with previously computed row-level explanations.
+        :param global_insights: dict() with any explanations that concern all predicted instances or the model itself.
+
+        :returns:
+        - row_insights: modified input dataframe with any new row insights added here.
+        - global_insights: dict() with any explanations that concern all predicted instances or the model itself.
+        """
+        log.info(f"{self.__class__.__name__}.explain() has not been implemented, no modifications will be done to the data insights.")  # noqa
+        return row_insights, global_insights
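
To make the contract concrete, a minimal sketch of a custom block, assuming lightwood 1.3.0 is installed. The class and method signatures come from the diff above, and the `normal_predictions`, `data`, and `target` kwargs are confirmed by analyze.py; the block name, the `mean_abs_error` key, the `prediction` column name, and the `note` key are hypothetical:

```python
from typing import Dict, Tuple

import pandas as pd

from lightwood.analysis import BaseAnalysisBlock


class MeanErrorBlock(BaseAnalysisBlock):
    """Hypothetical block that stores the mean absolute error on the validation set."""

    def analyze(self, info: Dict[str, object], **kwargs) -> Dict[str, object]:
        # 'normal_predictions', 'data' and 'target' are among the kwargs that
        # model_analyzer passes to every block (see analyze.py above).
        preds = kwargs['normal_predictions']['prediction']  # column name is an assumption
        truths = kwargs['data'][kwargs['target']]
        info['mean_abs_error'] = float((preds - truths).abs().mean())
        return info

    def explain(self,
                row_insights: pd.DataFrame,
                global_insights: Dict[str, object], **kwargs) -> Tuple[pd.DataFrame, Dict[str, object]]:
        # Example of a global insight; a real block would read back whatever
        # `.analyze()` stored in the runtime analyzer.
        global_insights['note'] = 'mean_abs_error was computed during analysis'
        return row_insights, global_insights
```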
