When a column contains unused categories, the scatter plot colors and legend are wrong. Calling df["cat"].cat.remove_unused_categories() on the column resolves the issue, and I wonder whether jscatter should call it as well?
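To make "unused categories" concrete, here is a minimal pandas-only sketch. The toy series is made up for illustration and says nothing about how jscatter maps categories to colors internally; it only shows that masking values leaves the category list untouched, and that remove_unused_categories() shrinks the list and renumbers the codes.

```python
import pandas as pd

# Hypothetical toy data, not taken from the reproduction
s = pd.Series(["a", "b", "b", "c", "c", "c"], dtype="category")

# Blank out the rare category "a", as the reproduction does for rare categories
s[s == "a"] = None

print(list(s.cat.categories))  # ['a', 'b', 'c'] -> "a" is still listed
print(s.cat.codes.tolist())    # [-1, 1, 1, 2, 2, 2]

cleaned = s.cat.remove_unused_categories()
print(list(cleaned.cat.categories))  # ['b', 'c']
print(cleaned.cat.codes.tolist())    # [-1, 0, 0, 1, 1, 1]
```

Any code that pairs the integer codes with a color list sized for only the used categories (or the other way around) would be thrown off by the first form. Whether that is what happens in jscatter is my guess, not something I have verified.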
See the code to reproduce:
import pandas as pd
import jscatter
import numpy as np


def keep_largest_categories(column: pd.Series, threshold: int) -> pd.Series:
    """Keep only the categories whose value count is at least the threshold"""
    # 0. Make a copy of the column
    column = column.copy()
    # 1. Calculate the value counts for the specified column
    category_counts = column.value_counts()
    # 2. Identify categories with counts below the threshold
    low_count_categories = category_counts[category_counts < threshold].index
    # 3. Create a boolean mask to identify rows where the category is in low_count_categories
    mask = column.isin(low_count_categories)
    # 4. Use the mask to set values in the specified column to NaN
    column[mask] = None
    return column


n = 50
categories = [f"cat_{i}" for i in range(300)]
categories = pd.Categorical(categories)
df = pd.DataFrame({
    "x": np.random.rand(n),
    "y": np.random.rand(n),
    "cat": np.random.choice(categories, size=n),
})
df["cat"] = df["cat"].astype("category")
# "cat" holds values drawn from 300 possible categories
# now keep only the categories that occur at least twice
df["cat"] = keep_largest_categories(df["cat"], 2)
# if you don't remove the unused categories then the color and legend will be wrong
# df["cat"] = df["cat"].cat.remove_unused_categories()
scatter = jscatter.Scatter(
    data=df,
    x="x",
    y="y",
    color_by="cat",
    size=10,
    legend=True,
    tooltip=True,
    tooltip_properties=["cat"],
    tooltip_size="large",
    height=200,
    width=500,
    legend_size="large",
)
scatter.show()
Without df["cat"].cat.remove_unused_categories()
With df["cat"].cat.remove_unused_categories()
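As a stopgap until this is handled in the library, the cleanup can be applied to every categorical column before the DataFrame is passed to jscatter.Scatter. The sketch below is pandas-only; drop_unused_categories is a hypothetical helper name of mine, not part of jscatter, and the demo frame is made up.

```python
import pandas as pd


def drop_unused_categories(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with unused categories removed from each categorical column."""
    out = df.copy()
    for name in out.select_dtypes(include="category").columns:
        out[name] = out[name].cat.remove_unused_categories()
    return out


# Hypothetical demo frame: "a" and "c" are declared as categories but never used
demo = pd.DataFrame({
    "x": [0.1, 0.2, 0.3],
    "cat": pd.Categorical(["b", "b", None], categories=["a", "b", "c"]),
})
clean = drop_unused_categories(demo)
print(list(clean["cat"].cat.categories))  # ['b']
```

The helper works on a copy, so the caller's DataFrame keeps its original categories. If jscatter did this internally, it would presumably have the same choice to make between copying and mutating the user's data.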