bug: cannot convert `y` to numpy on kaggle notebook in sklearn pipeline #149

jitingxu1 · 2024-09-04T22:37:51Z

In this competition, the `y` column cannot be converted to a NumPy array.

~~I could run this on my local machine, but not on kaggle notebook.~~

~~**I could reproduce this on my local machine.**~~

local env

Python version: 3.12.4 | packaged by Anaconda, Inc. | (main, Jun 18 2024, 10:07:17) [Clang 14.0.6 ]
scikit-learn version: 1.5.1
skorch version: 1.0.0
torch version: 2.4.0
ibis-framework version: 9.3.0

kaggle env

Python version: 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
scikit-learn version: 1.2.2
skorch version: 1.0.0
torch version: 2.4.0+cpu
ibis-framework version: 9.3.0
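
For comparing environments like the two above, a listing of this kind can be generated with the standard library. This is a minimal sketch, assuming the PyPI distribution names below match the installed packages; `environment_report` is a hypothetical helper name, not part of any of these libraries.

```python
import sys
from importlib import metadata

# Distribution names as published on PyPI (assumed to match the install).
PACKAGES = ("scikit-learn", "skorch", "torch", "ibis-framework")


def environment_report(packages=PACKAGES):
    """Map 'Python' and each package name to its installed version string."""
    report = {"Python": sys.version.replace("\n", " ")}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Keep the row so a missing package is visible in the listing.
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for name, version in environment_report().items():
        print(f"{name} version: {version}")
```

Running the same script in both environments makes version differences, such as the scikit-learn versions listed above, easy to spot.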

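import torch.nn as nn
import torch.optim as optim
from sklearn.pipeline import Pipeline
from skorch import NeuralNetClassifier
from skorch.callbacks import EarlyStopping, LRScheduler

# `MyModel`, `recipe`, `X_train`, and `y_train` are defined elsewhere in the notebook.
# Wrap the PyTorch model with skorch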
net = NeuralNetClassifier(
    MyModel,
    module__input_dim=635,  # Specify the input dimension
    max_epochs=1,
    lr=0.001,
    batch_size=32,
    optimizer=optim.Adam,
    criterion=nn.BCELoss,
    iterator_train__shuffle=True,
    callbacks=[
        EarlyStopping(monitor='valid_loss', patience=25, load_best=True),  # Early stopping
        LRScheduler(policy='ReduceLROnPlateau', monitor='valid_loss', factor=0.1, patience=25, min_lr=1e-6)
    ],
    verbose=1
)

# Define the sklearn pipeline with preprocessing and PyTorch model
pipeline = Pipeline([
    ('ibisml-prep', recipe),  # Preprocessing step in IbisML
    ('model', net)  # The PyTorch model wrapped as NeuralNetClassifier via skorch
])

pipeline.fit(X_train, y_train)

log

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[19], line 1
----> 1 pipeline.fit(X_train, y_train)

File /opt/conda/lib/python3.10/site-packages/sklearn/pipeline.py:405, in Pipeline.fit(self, X, y, **fit_params)
    403     if self._final_estimator != "passthrough":
    404         fit_params_last_step = fit_params_steps[self.steps[-1][0]]
--> 405         self._final_estimator.fit(Xt, y, **fit_params_last_step)
    407 return self

File /opt/conda/lib/python3.10/site-packages/skorch/classifier.py:165, in NeuralNetClassifier.fit(self, X, y, **fit_params)
    154 """See ``NeuralNet.fit``.
    155 
    156 In contrast to ``NeuralNet.fit``, ``y`` is non-optional to
   (...)
    160 
    161 """
    162 # pylint: disable=useless-super-delegation
    163 # this is actually a pylint bug:
    164 # https://github.com/PyCQA/pylint/issues/1085
--> 165 return super(NeuralNetClassifier, self).fit(X, y, **fit_params)

File /opt/conda/lib/python3.10/site-packages/skorch/net.py:1319, in NeuralNet.fit(self, X, y, **fit_params)
   1316 if not self.warm_start or not self.initialized_:
   1317     self.initialize()
-> 1319 self.partial_fit(X, y, **fit_params)
   1320 return self

File /opt/conda/lib/python3.10/site-packages/skorch/net.py:1278, in NeuralNet.partial_fit(self, X, y, classes, **fit_params)
   1276 self.notify('on_train_begin', X=X, y=y)
   1277 try:
-> 1278     self.fit_loop(X, y, **fit_params)
   1279 except KeyboardInterrupt:
   1280     pass

File /opt/conda/lib/python3.10/site-packages/skorch/net.py:1172, in NeuralNet.fit_loop(self, X, y, epochs, **fit_params)
   1136 def fit_loop(self, X, y=None, epochs=None, **fit_params):
   1137     """The proper fit loop.
   1138 
   1139     Contains the logic of what actually happens during the fit
   (...)
   1170 
   1171     """
-> 1172     self.check_data(X, y)
   1173     self.check_training_readiness()
   1174     epochs = epochs if epochs is not None else self.max_epochs

File /opt/conda/lib/python3.10/site-packages/skorch/classifier.py:141, in NeuralNetClassifier.check_data(self, X, y)
    137         pass
    139 if y is not None:
    140     # pylint: disable=attribute-defined-outside-init
--> 141     self.classes_inferred_ = np.unique(to_numpy(y))

File /opt/conda/lib/python3.10/site-packages/skorch/utils.py:152, in to_numpy(X)
    149     return np.asarray(X)
    151 if not is_torch_data_type(X):
--> 152     raise TypeError("Cannot convert this data type to a numpy array.")

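The last frame shows skorch's `to_numpy` rejecting the type of `y`. One possible workaround, assuming `y_train` is a lazy column that exposes a `to_pandas()` method (the issue does not show how `y_train` was built), is to materialize it before it reaches the final estimator. In this sketch, `to_numpy_labels`, `LazyColumn`, and `EagerSeries` are hypothetical names; the two classes are stand-ins so the example runs without a backend.

```python
import numpy as np


def to_numpy_labels(y):
    """Best-effort conversion of a label container to a NumPy array."""
    # Assumption: lazy columns expose ``to_pandas()``.
    if hasattr(y, "to_pandas"):
        y = y.to_pandas()
    # pandas Series and similar eager containers expose ``to_numpy()``.
    if hasattr(y, "to_numpy"):
        y = y.to_numpy()
    return np.asarray(y)


class EagerSeries:
    """Hypothetical stand-in for an in-memory series."""

    def __init__(self, values):
        self._values = list(values)

    def to_numpy(self):
        return np.array(self._values)


class LazyColumn:
    """Hypothetical stand-in for a lazy column expression."""

    def __init__(self, values):
        self._values = list(values)

    def to_pandas(self):
        return EagerSeries(self._values)


labels = to_numpy_labels(LazyColumn([0, 1, 1, 0]))
classes = np.unique(labels)  # the same call skorch's check_data makes on y
```

Whether the preprocessing step also accepts a NumPy `y` is not confirmed here, so treat this as a starting point for debugging rather than a fix.
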

zy662 · 2025-04-11T04:42:38Z

Use the following code to fit the model:

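import ibis
import ibis.expr.datatypes as dt
import ibis_ml as ml
from sklearn.pipeline import Pipeline

# `flight_data` (the input table) and `net` (the skorch model) are defined elsewhere.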
# Create data frames for the two sets:
train_data, test_data = ml.train_test_split(
    flight_data,
    unique_key=["carrier", "flight", "date"],
    # Put 3/4 of the data into the training set
    test_size=0.25,
    num_buckets=4,
    # Fix the random numbers by setting the seed
    # This enables the analysis to be reproducible when random numbers are used
    random_seed=222,
)
X_train = train_data.drop("arr_delay")
y_train = train_data.arr_delay.cast(dt.int64)

X_test = test_data.drop("arr_delay")
y_test = test_data.arr_delay.cast(dt.int64)

last_mile_preprocessing = ml.Recipe(
    ml.ExpandDate("date", components=["dow", "month"]),
    ml.Drop("date"),
    ml.TargetEncode(ml.nominal()),
    ml.DropZeroVariance(ml.everything()),
    ml.MutateAt("dep_time", ibis._.hour() * 60 + ibis._.minute()),
    ml.MutateAt(ml.timestamp(), ibis._.epoch_seconds()),
    # By default, PyTorch requires that the type of `X` is `np.float32`.
    # https://discuss.pytorch.org/t/mat1-and-mat2-must-have-the-same-dtype-but-got-double-and-float/197555/2
    ml.Cast(ml.numeric(), "float32"),
)
# train preprocessing recipe using training dataset
last_mile_preprocessing.fit(X_train, y_train)

# transform train and test dataset using IbisML recipe
X_train_transformed = last_mile_preprocessing.transform(X_train)
X_test_transformed = last_mile_preprocessing.transform(X_test)

pipe = Pipeline([("flights_rec", last_mile_preprocessing), ("net", net)])
pipe.fit(X_train_transformed, y_train)
pipe.score(X_test_transformed, y_test)

reference: https://ibis-project.org/posts/ibisml/
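
The `ml.Cast(ml.numeric(), "float32")` step is there because NumPy and pandas default to 64-bit floats, while PyTorch layers default to 32-bit weights. Below is a minimal NumPy-only sketch of the same idea for data that is already in memory; `as_float32` is a hypothetical helper name and is not part of IbisML or skorch.

```python
import numpy as np


def as_float32(X):
    """Return numeric array-like ``X`` as a float32 NumPy array."""
    X = np.asarray(X)
    is_real_number = np.issubdtype(X.dtype, np.integer) or np.issubdtype(
        X.dtype, np.floating
    )
    if not is_real_number:
        raise TypeError(f"expected integer or float data, got dtype {X.dtype}")
    # copy=False avoids a copy when the data is already float32.
    return X.astype(np.float32, copy=False)


features = as_float32([[1.5, 2.0], [3.0, 4.25]])  # built as float64, then cast
```

Doing the cast inside the recipe, as in the code above, keeps it in the backend; the NumPy version only applies after the data has been materialized.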

github-project-automation bot added this to Ibis planning and roadmap Sep 4, 2024

github-project-automation bot moved this to backlog in Ibis planning and roadmap Sep 4, 2024

jitingxu1 changed the title ~~bug:~~ bug: cannot convert y to numpy on kaggle notebook in sklearn pipeline Sep 4, 2024
