refactor: miscellaneous in anonymizer.py #826
* (refactor): make methods that are supposed to be static actual staticmethods;
* (fix): inappropriate signatures for several methods;
* (refactor): naming issues;
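As a rough illustration of the first point, here is a minimal sketch of a helper whose methods need no instance state and are therefore declared as static methods. The class name `EmailMasker`, its methods, and the regular expression are hypothetical and are not taken from `pandasai/helpers/anonymizer.py`.

```python
import re


class EmailMasker:
    """Hypothetical helper for illustration; not the actual Anonymizer class."""

    # Simplified email pattern (an assumption, not a full RFC 5322 matcher).
    _EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

    @staticmethod
    def is_email(value: str) -> bool:
        # Uses no instance state, so it takes no `self`.
        return EmailMasker._EMAIL.fullmatch(value) is not None

    @staticmethod
    def mask(value: str) -> str:
        # Replace every email-like substring with a placeholder.
        return EmailMasker._EMAIL.sub("<email>", value)
```

With this shape, callers can write `EmailMasker.mask(text)` without creating an instance, and the signature no longer advertises a `self` argument that the method never uses.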
Codecov Report
All modified and coverable lines are covered by tests ✅

@@            Coverage Diff             @@
##             main     #826      +/-   ##
==========================================
+ Coverage   85.11%   85.14%   +0.02%
==========================================
  Files          88       88
  Lines        3809     3816       +7
==========================================
+ Hits         3242     3249       +7
  Misses        567      567
  # create a copy of the dataframe head
- df_head = self.head().copy()
+ df_head = df.head().copy()

  # for each column, check if it contains personal or sensitive information
When anonymizing data, ensure that the conversion of column values to strings (`str(df_head[col].iloc[0])`) is robust enough to handle non-string data types without causing unexpected behavior or errors.
Consider using vectorized operations instead of `apply` for better performance when anonymizing columns in the dataframe.
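The two suggestions above could look something like the following sketch. It assumes pandas is installed. The function name `mask_emails`, the pattern `EMAIL_PATTERN`, and the `<email>` placeholder are hypothetical choices for illustration, not the code in this PR.

```python
import pandas as pd

# Simplified email pattern (an assumption for this sketch).
EMAIL_PATTERN = r"[\w.+-]+@[\w-]+\.[\w.-]+"


def mask_emails(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the first rows of df with email-like text masked."""
    df_head = df.head().copy()
    for col in df_head.columns:
        # Cast the whole column to text so numeric columns can be scanned
        # without raising; this avoids inspecting only the first value.
        as_text = df_head[col].astype(str)
        # Vectorized check and replacement via the .str accessor, not apply.
        if as_text.str.contains(EMAIL_PATTERN, regex=True, na=False).any():
            df_head[col] = as_text.str.replace(EMAIL_PATTERN, "<email>", regex=True)
    return df_head
```

Only columns that actually contain a match are rewritten as text, so a numeric column such as an age column keeps its original values and dtype.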
@nautics889 great catch, merging!
There was some mess in anonymizer.py, much of it coming from this compound commit (see pandasai/helpers/anonymizer.py there).
This PR should make it better. I'm not sure the `Anonymizer` class even still works, by the way :)