
Fix unnecessary ValueError in PairPlot: Caused by unrelated duplicated columns not used in vars #3840


Open: leakyH wants to merge 4 commits into master


leakyH commented Apr 6, 2025

A simple demo:

import numpy as np
import pandas as pd
import seaborn as sns

rs = np.random.RandomState(0)  # seed chosen arbitrarily for this demo
df = pd.DataFrame(dict(x=rs.normal(size=60),
                       y=rs.randint(0, 4, size=60),
                       z=rs.gamma(3, size=60),
                       z2=rs.gamma(6, size=60)))
df_with_dupe = df.copy()
# Duplicate names can appear by mistake, or because z/z2 are not of interest
df_with_dupe.columns = ["x", "y", "z", "z"]
sns.pairplot(df_with_dupe, vars=["x", "y"])  # raises ValueError

The Traceback:

> Traceback (most recent call last):
>   File "/data1/home/----/plotReferenceMap.py", line 153, in <module>
>     sns.pairplot(df_merge,
>   File "/data1/home/--/lib/python3.9/site-packages/seaborn/axisgrid.py", line 2119, in pairplot
>     grid = PairGrid(data, vars=vars, x_vars=x_vars, y_vars=y_vars, hue=hue,
>   File "/data1/home/--/lib/python3.9/site-packages/seaborn/axisgrid.py", line 1251, in __init__
>     numeric_cols = self._find_numeric_cols(data)
>   File "/data1/home/--/lib/python3.9/site-packages/seaborn/axisgrid.py", line 1674, in _find_numeric_cols
>     if variable_type(data[col]) == "numeric":
>   File "/data1/home/--/lib/python3.9/site-packages/seaborn/_base.py", line 1498, in variable_type
>     vector = pd.Series(vector)
>   File "/data1/home/--/lib/python3.9/site-packages/pandas/core/series.py", line 367, in __init__
>     if is_empty_data(data) and dtype is None:
>   File "/data1/home/--/lib/python3.9/site-packages/pandas/core/construction.py", line 818, in is_empty_data
>     is_simple_empty = is_list_like_without_dtype and not data
>   File "/data1/home/--/lib/python3.9/site-packages/pandas/core/generic.py", line 1527, in __nonzero__
>     raise ValueError(
> ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
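The last frames of the traceback can be reproduced with pandas alone. The sketch below is illustrative and not part of the PR; the small frame and its column names are made up. It shows that selecting a duplicated label returns a DataFrame rather than a Series, and that evaluating a DataFrame as a boolean raises the same ValueError.

```python
import numpy as np
import pandas as pd

rs = np.random.RandomState(0)  # arbitrary seed, for illustration only
df = pd.DataFrame(rs.normal(size=(5, 3)), columns=["x", "z", "z"])

unique_selection = df["x"]      # unique label: a Series
duplicate_selection = df["z"]   # duplicated label: a two-column DataFrame

print(type(unique_selection).__name__)
print(type(duplicate_selection).__name__, duplicate_selection.shape)

message = None
try:
    # Truth-testing a DataFrame is what ultimately fails in the traceback
    bool(duplicate_selection)
except ValueError as err:
    message = str(err)
    print(message)
```

Because `_find_numeric_cols` loops over every column of `data`, the duplicated label is hit even when it is not listed in `vars`.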

The error is raised inside self._find_numeric_cols(data), a call that is unnecessary when vars is provided. This PR skips that call when vars is given, and also handles some related scenarios:

  1. No duplicates in df.columns, but duplicates in vars: emit a simple warning. This case only produces unexpected figures; it does not crash.
  2. Duplicates in df.columns, and one of the duplicated names is included in vars: raise a ValueError in the PairGrid class that names the duplicated columns involved.
  3. Duplicates in df.columns, and vars is not provided: raise a ValueError in the PairGrid class that names all duplicated columns.
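The checks described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the code in this PR: `check_pair_vars` is a hypothetical helper name, the messages are invented, and returning all columns when `vars` is omitted is a simplification (PairGrid would go on to select numeric columns).

```python
import warnings

import pandas as pd


def check_pair_vars(data, vars=None):
    """Hypothetical helper (not seaborn API) mirroring the three scenarios."""
    columns = pd.Index(data.columns)
    duplicated = columns[columns.duplicated()].unique().tolist()

    if vars is None:
        # Scenario 3: every column is a candidate, so any duplicate is an error
        if duplicated:
            raise ValueError(f"Duplicate column names in data: {duplicated}")
        return list(columns)  # simplification: no numeric filtering here

    vars = list(vars)
    # Scenario 2: a requested variable refers to a duplicated column name
    clashes = [v for v in dict.fromkeys(vars) if v in duplicated]
    if clashes:
        raise ValueError(f"Variables refer to duplicated columns: {clashes}")

    # Scenario 1: vars itself repeats a name; warn but carry on
    if len(set(vars)) != len(vars):
        warnings.warn("Duplicate entries in vars", UserWarning)
    return vars


df = pd.DataFrame([[1, 2, 3, 4]], columns=["x", "y", "z", "z"])
# Duplicated "z" is not requested, so this passes without error
print(check_pair_vars(df, vars=["x", "y"]))
```

Under this sketch, the demo case (duplicated columns that are not in vars) passes silently, which is the behavior the PR aims for.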

Tests for all of these cases are included in test_axisgrid.py.

Please let me know if any other modifications are needed.
