Census ADRIO returning scrambled results in some cases #127

Closed
JavadocMD opened this issue Jul 5, 2024 · 1 comment · Fixed by #132

@JavadocMD
Contributor

I noticed that, after geo scopes were integrated into the Census ADRIO maker, the "demo/03-counties-GEO" vignette went from correct (left, commit f9889e5) to incorrect (right, commit 5d12231) (file history):

[Image: Screenshot_20240705_141551, vignette output at commit f9889e5 (left) and commit 5d12231 (right)]

On investigation it became clear that the centroids attribute of the Census ADRIO maker was returning its results in a "scrambled" order (counties were being assigned the centroid of a different county).

Resolving this issue involves several sub-tasks:

  • Fix this attribute,
  • Investigate the other attributes in Census to see if they're vulnerable to this same issue and fix those if any,
  • Write an integration test (as a devlog notebook) that checks, for a small number of nodes (e.g., five counties in Arizona), that the results provided by the ADRIOs are not only the correct type and shape but also contain the correct data values in the correct order.
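
The order check described in the last sub-task could take roughly the following shape. This is a minimal sketch, not the project's actual test: `assert_aligned`, the node IDs, and the coordinates are hypothetical placeholders, and a real test would compare ADRIO output against independently verified reference values.

```python
from math import isclose


def assert_aligned(node_ids, values, expected_by_id, tol=1e-6):
    """Assert that values[i] is the expected value for node_ids[i]."""
    assert len(node_ids) == len(values), "length mismatch"
    for i, (node_id, actual) in enumerate(zip(node_ids, values)):
        expected = expected_by_id[node_id]
        same = len(actual) == len(expected) and all(
            isclose(a, e, abs_tol=tol) for a, e in zip(actual, expected)
        )
        assert same, f"row {i} ({node_id}): got {actual}, expected {expected}"


# Placeholder data (made up for illustration, not real centroids).
node_ids = ["N1", "N2", "N3"]
expected_by_id = {"N1": (1.0, 10.0), "N2": (2.0, 20.0), "N3": (3.0, 30.0)}

in_order = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]
scrambled = [(3.0, 30.0), (1.0, 10.0), (2.0, 20.0)]

# Correctly ordered results pass silently.
assert_aligned(node_ids, in_order, expected_by_id)

# Scrambled results have the right type and shape but fail the value check.
try:
    assert_aligned(node_ids, scrambled, expected_by_id)
    scrambled_detected = False
except AssertionError:
    scrambled_detected = True
```

A check on type and shape alone would accept both lists; comparing each row against the value expected for that row's node ID is what catches the scrambling.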
@JavadocMD JavadocMD added the bug Something isn't working label Jul 5, 2024
@JavadocMD
Contributor Author

This code snippet demonstrates the issue; "Method 1" duplicates the approach taken by the Census ADRIO maker currently.

from IPython.display import display
from pandas import DataFrame

from epymorph.geography.us_census import CountyScope
from epymorph.geography.us_tiger import get_counties_geo

# Scope
scope = CountyScope.in_states_by_code(["AZ", "NM", "CO", "UT"], year=2020)
# Geo data
gdf = get_counties_geo(2020)
gdf.rename(columns={'GEOID': 'geoid'}, inplace=True)

# Method 1 (current approach): isin() filter keeps gdf's row order, not the scope's node order
df1_raw = gdf[gdf['geoid'].isin(scope.get_node_ids())]
df1 = DataFrame({
    'geoid': df1_raw['geoid'],
    'centroid': df1_raw['geometry'].apply(lambda x: x.centroid.coords[0]),
})
display(df1['centroid'][0:4])

# Method 2 (possible fix): left-merge gdf onto the scope's node IDs so rows follow scope order
df2_raw = DataFrame({'geoid': scope.get_node_ids()}).merge(gdf, how='left', on='geoid')
df2 = DataFrame({
    'geoid': df2_raw['geoid'],
    'centroid': df2_raw['geometry'].apply(lambda x: x.centroid.coords[0]),
})
display(df2['centroid'][0:4])

Result (Method 1 first, then Method 2):

2      (-104.41195788796497, 34.34241351170314)
22       (-106.281555561785, 38.08055275484528)
24    (-111.24450979723022, 41.632253425796634)
30    (-105.74144255453645, 32.613144676719934)
Name: centroid, dtype: object

0    (-109.48884962242164, 35.395528796753005)
1     (-109.75126313669315, 31.87963708628258)
2     (-111.77052095590304, 35.83872482945673)
3      (-110.8116368639038, 33.79970236516231)
Name: centroid, dtype: object
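
The same effect can be reproduced without epymorph or the TIGER data. The sketch below uses a made-up four-row table (the IDs and values are placeholders) to show that an `isin` filter returns rows in the source table's order, while a left merge onto the requested IDs returns them in the requested order. Selecting by label with `set_index(...).loc[...]` is shown as another option that gives the same ordering; it is not something proposed in this issue.

```python
import pandas as pd

# Toy stand-in for the geo table; rows are NOT in the order we will request.
source = pd.DataFrame({
    "geoid": ["C", "A", "D", "B"],
    "value": [30, 10, 40, 20],
})

# Hypothetical node IDs, in the order the scope would list them.
requested = ["A", "B", "C"]

# An isin() filter keeps the source table's row order: C, A, B.
filtered = source[source["geoid"].isin(requested)]

# A left merge onto the requested IDs keeps the requested order: A, B, C.
merged = pd.DataFrame({"geoid": requested}).merge(source, how="left", on="geoid")

# Alternative: index by geoid, then select the labels in the requested order.
reindexed = source.set_index("geoid").loc[requested].reset_index()

print(filtered["value"].tolist())   # [30, 10, 20]
print(merged["value"].tolist())     # [10, 20, 30]
print(reindexed["value"].tolist())  # [10, 20, 30]
```

Note that the filtered frame also keeps the source's original index labels, which matches the non-contiguous index (2, 22, 24, 30) in the Method 1 output above.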

@TJohnsonAZ TJohnsonAZ linked a pull request Jul 11, 2024 that will close this issue