compare.py: Drop "hash" column for average merging #256

Merged
guy-david merged 1 commit into main from users/guy-david/drop-irreducible-columns-for-mean on Jun 12, 2025

Conversation

guy-david
Contributor

@guy-david guy-david commented Jun 10, 2025

I considered asserting that all hashes are identical, or picking the hash from the run with the best metric, but neither seems useful because this property is for internal use only.

"hash" is the only non-numeric column; unless it is dropped, averaging raises an exception:

Traceback (most recent call last):
  File "/home/guyda/.venv/lib/python3.9/site-packages/pandas/core/groupby/groupby.py", line 1824, in apply
    result = self._python_apply_general(f, self._selected_obj)
  File "/home/guyda/.venv/lib/python3.9/site-packages/pandas/core/groupby/groupby.py", line 1885, in _python_apply_general
    values, mutated = self._grouper.apply_groupwise(f, data, self.axis)
  File "/home/guyda/.venv/lib/python3.9/site-packages/pandas/core/groupby/ops.py", line 919, in apply_groupwise
    res = f(group)
  File "/home/guyda/.venv/lib/python3.9/site-packages/pandas/core/frame.py", line 11700, in mean
    result = super().mean(axis, skipna, numeric_only, **kwargs)
  File "/home/guyda/.venv/lib/python3.9/site-packages/pandas/core/generic.py", line 12439, in mean
    return self._stat_function(
  File "/home/guyda/.venv/lib/python3.9/site-packages/pandas/core/generic.py", line 12396, in _stat_function
    return self._reduce(
  File "/home/guyda/.venv/lib/python3.9/site-packages/pandas/core/frame.py", line 11569, in _reduce
    res = df._mgr.reduce(blk_func)
  File "/home/guyda/.venv/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 1500, in reduce
    nbs = blk.reduce(func)
  File "/home/guyda/.venv/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 406, in reduce
    result = func(self.values)
  File "/home/guyda/.venv/lib/python3.9/site-packages/pandas/core/frame.py", line 11488, in blk_func
    return op(values, axis=axis, skipna=skipna, **kwds)
  File "/home/guyda/.venv/lib/python3.9/site-packages/pandas/core/nanops.py", line 147, in f
    result = alt(values, axis=axis, skipna=skipna, **kwds)
  File "/home/guyda/.venv/lib/python3.9/site-packages/pandas/core/nanops.py", line 404, in new_func
    result = func(values, axis=axis, skipna=skipna, mask=mask, **kwargs)
  File "/home/guyda/.venv/lib/python3.9/site-packages/pandas/core/nanops.py", line 720, in nanmean
    the_sum = _ensure_numeric(the_sum)
  File "/home/guyda/.venv/lib/python3.9/site-packages/pandas/core/nanops.py", line 1686, in _ensure_numeric
    raise TypeError(f"Could not convert {x} to numeric")
TypeError: Could not convert ['ead8a4d533ec1912ee3d86ace532e370ead8a4d533ec1912ee3d86ace532e370ead8a4d533ec1912ee3d86ace532e370ead8a4d533ec1912ee3d86ace532e370ead8a4d533ec1912ee3d86ace532e370ead8a4d533ec1912ee3d86ace532e370ead8a4d533ec1912ee3d86ace532e370ead8a4d533ec1912ee3d86ace532e370ead8a4d533ec1912ee3d86ace532e370ead8a4d533ec1912ee3d86ace532e370'] to numeric

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/guyda/third-party/test-suite/utils/compare.py", line 520, in <module>
    main()
  File "/home/guyda/third-party/test-suite/utils/compare.py", line 422, in main
    lhs_merged = lhs_d.groupby(level=1).apply(config.merge_function)
  File "/home/guyda/.venv/lib/python3.9/site-packages/pandas/core/groupby/groupby.py", line 1846, in apply
    return self._python_apply_general(f, self._obj_with_exclusions)
  File "/home/guyda/.venv/lib/python3.9/site-packages/pandas/core/groupby/groupby.py", line 1885, in _python_apply_general
    values, mutated = self._grouper.apply_groupwise(f, data, self.axis)
  File "/home/guyda/.venv/lib/python3.9/site-packages/pandas/core/groupby/ops.py", line 919, in apply_groupwise
    res = f(group)
  File "/home/guyda/.venv/lib/python3.9/site-packages/pandas/core/frame.py", line 11700, in mean
    result = super().mean(axis, skipna, numeric_only, **kwargs)
  File "/home/guyda/.venv/lib/python3.9/site-packages/pandas/core/generic.py", line 12439, in mean
    return self._stat_function(
  File "/home/guyda/.venv/lib/python3.9/site-packages/pandas/core/generic.py", line 12396, in _stat_function
    return self._reduce(
  File "/home/guyda/.venv/lib/python3.9/site-packages/pandas/core/frame.py", line 11569, in _reduce
    res = df._mgr.reduce(blk_func)
  File "/home/guyda/.venv/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 1500, in reduce
    nbs = blk.reduce(func)
  File "/home/guyda/.venv/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 406, in reduce
    result = func(self.values)
  File "/home/guyda/.venv/lib/python3.9/site-packages/pandas/core/frame.py", line 11488, in blk_func
    return op(values, axis=axis, skipna=skipna, **kwds)
  File "/home/guyda/.venv/lib/python3.9/site-packages/pandas/core/nanops.py", line 147, in f
    result = alt(values, axis=axis, skipna=skipna, **kwds)
  File "/home/guyda/.venv/lib/python3.9/site-packages/pandas/core/nanops.py", line 404, in new_func
    result = func(values, axis=axis, skipna=skipna, mask=mask, **kwargs)
  File "/home/guyda/.venv/lib/python3.9/site-packages/pandas/core/nanops.py", line 720, in nanmean
    the_sum = _ensure_numeric(the_sum)
  File "/home/guyda/.venv/lib/python3.9/site-packages/pandas/core/nanops.py", line 1686, in _ensure_numeric
    raise TypeError(f"Could not convert {x} to numeric")
TypeError: Could not convert ['ead8a4d533ec1912ee3d86ace532e370ead8a4d533ec1912ee3d86ace532e370ead8a4d533ec1912ee3d86ace532e370ead8a4d533ec1912ee3d86ace532e370ead8a4d533ec1912ee3d86ace532e370ead8a4d533ec1912ee3d86ace532e370ead8a4d533ec1912ee3d86ace532e370ead8a4d533ec1912ee3d86ace532e370ead8a4d533ec1912ee3d86ace532e370ead8a4d533ec1912ee3d86ace532e370'] to numeric

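For illustration, here is a minimal sketch of the idea, not the actual patch: drop the "hash" column before taking the per-test mean. The sample data, the `exec_time` column, the index layout, and the helper name `merge_mean` are all hypothetical; only the "hash" column name and the `groupby(level=1)` grouping come from the traceback above.

```python
import pandas as pd

# Hypothetical data: two runs of the same two tests, indexed by (run, test name).
index = pd.MultiIndex.from_tuples(
    [("run0", "a.test"), ("run0", "b.test"), ("run1", "a.test"), ("run1", "b.test")],
    names=["run", "name"],
)
data = pd.DataFrame(
    {
        "exec_time": [1.0, 2.0, 3.0, 4.0],       # numeric metric, safe to average
        "hash": ["ead8", "beef", "ead8", "beef"],  # string column, cannot be averaged
    },
    index=index,
)


def merge_mean(frame):
    """Average all runs per test after dropping the non-numeric "hash" column."""
    # errors="ignore" keeps this working for inputs that have no "hash" column.
    numeric = frame.drop(columns=["hash"], errors="ignore")
    # Level 1 of the index is the test name, as in the failing groupby call.
    return numeric.groupby(level=1).mean()


merged = merge_mean(data)
print(merged)
```

With the "hash" column left in place, the mean over each group would hit the string values and fail with the `TypeError` shown in the traceback.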
@guy-david guy-david requested a review from MatzeB June 10, 2025 13:28
Contributor

@fhahn fhahn left a comment

LGTM, thanks.

It would be nice to have a unit test, although I don't think we have any tests for compare.py at the moment.

@guy-david guy-david merged commit b6566dd into main Jun 12, 2025
1 check passed
@guy-david guy-david deleted the users/guy-david/drop-irreducible-columns-for-mean branch June 12, 2025 10:56
2 participants