A missing column is not raised as a missing value #329

Open
Swati-Dash opened this issue Jan 29, 2025 · 5 comments

Comments

@Swati-Dash

For example, in the article-4-direction-area dataset a blank permitted-development-rights field is flagged as a missing-value error, but if the column is missing entirely, nothing is flagged.

@Ben-Hodgkiss Ben-Hodgkiss moved this from Backlog to Sprint Backlog in Infrastructure Jan 29, 2025
@Ben-Hodgkiss Ben-Hodgkiss moved this from Sprint Backlog to Backlog in Infrastructure Jan 29, 2025
@Ben-Hodgkiss
Contributor

Thanks for raising this, @Swati-Dash. I've popped it in our backlog to be prioritised accordingly. I'll keep in touch about when I think we might have resource available once I've chatted with @eveleighoj on their return.

@Ben-Hodgkiss Ben-Hodgkiss moved this from Backlog to Analysis, Research & Design in Infrastructure Feb 10, 2025
@Ben-Hodgkiss
Contributor

Discussed this with @eveleighoj. Currently the column field log shows which columns exist within the data. This can be compared against the spec.

It would be difficult to raise an "issue" against this as issues are set against particular data points, and so enacting this change would result in (potentially) millions of issues being raised which may cause confusion. As a first step, we should look at what the column field log can do via a manual process, or we could bring columns that aren't mapped into the column field log with a new field that shows whether they are in/out of the data provided.

Action to discuss further with @Swati-Dash and @greg-slater once SD returns from leave.

@Swati-Dash
Author

I'm aware of the column field log and the manual process of identifying missing columns. The problem is that if an LPA has not provided a column, the dashboard doesn't flag this (it shows live/green), so they would not go and add that field. We are asking LPAs to provide data iteratively, so how do we help them see on the dashboard that columns are missing? If this can be done via a query, that's great too.

I want to understand why it will raise millions of issues, is it because this will be raised for each resource?

@eveleighoj
Contributor

In terms of how it displays on the dashboard, I think there are wider considerations and you'll need to choose the right approach. A few possible approaches (not saying any of them is correct):

  • you could look at the most recent resource and see whether the column-field log includes the column
  • you could look at all files provided for that endpoint and see if they've ever given a column
  • you could look at their entities and see if any of them have ever provided that column
  • you could write an expectation that checks if their entities have that column and fail if not and display this
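
As a rough illustration of the first two approaches, here is a minimal Python sketch. The record layout (`resource`, `column`, `field`), the `resource_dates` mapping, and the function names are all hypothetical and are not taken from the actual column-field log schema.

```python
from collections import defaultdict


def fields_by_resource(log_rows):
    """Group the mapped field names found in the log by resource."""
    fields = defaultdict(set)
    for row in log_rows:
        if row["field"]:
            fields[row["resource"]].add(row["field"])
    return fields


def missing_in_latest(log_rows, resource_dates, expected_fields):
    """Approach 1: expected fields absent from the most recent resource."""
    latest = max(resource_dates, key=resource_dates.get)
    return set(expected_fields) - fields_by_resource(log_rows).get(latest, set())


def never_provided(log_rows, expected_fields):
    """Approach 2: expected fields absent from every resource for the endpoint."""
    seen = set()
    for resource_fields in fields_by_resource(log_rows).values():
        seen |= resource_fields
    return set(expected_fields) - seen


# Hypothetical data: two resources from one endpoint (ISO dates sort as strings).
log_rows = [
    {"resource": "r1", "column": "geom", "field": "geometry"},
    {"resource": "r1", "column": "pdr", "field": "permitted-development-rights"},
    {"resource": "r2", "column": "geom", "field": "geometry"},
]
resource_dates = {"r1": "2024-06-01", "r2": "2025-01-15"}
expected = ["geometry", "permitted-development-rights", "name"]

latest_gaps = missing_in_latest(log_rows, resource_dates, expected)
ever_gaps = never_provided(log_rows, expected)
```

The two functions can disagree: here `permitted-development-rights` is missing from the latest resource but was provided once before, which is exactly the choice the dashboard would need to make.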

In terms of how issues would increase: if a resource was missing a column (and a lot of them are), then for every entry (row) in the resource you would get another issue. So if a resource had 1000 rows and 3 issues at the moment, and we added this in and it was missing a column, it would have 1003 issues. If it was missing 2 columns it would have 2003 issues. You can imagine how this gets massive.
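
The arithmetic can be written out as a tiny sketch. The function name is hypothetical, and it assumes exactly one new issue per row per missing column, as described above.

```python
def issues_after_change(rows, existing_issues, missing_columns):
    """Total issues if every row raises one issue per missing column."""
    return existing_issues + rows * missing_columns


one_missing = issues_after_change(rows=1000, existing_issues=3, missing_columns=1)
two_missing = issues_after_change(rows=1000, existing_issues=3, missing_columns=2)
```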

In terms of whether we need more information in the column-field-log, we could potentially add more; however, we would need to check how everyone is using it right now. We could:

  • Add in fields that aren't matched to columns (so field might be geometry and column would be blank)
  • Add in columns that aren't matched to fields (so column might be LPA_GEOM_POLYGON and field would be blank)

This wouldn't add too many rows but would add a few. I imagine there could be a fair chunk of value though.
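
To make the two possible additions concrete, here is a minimal Python sketch of what an extended log for one resource might look like. The function, its arguments, and the row layout are assumptions for illustration only, not the current column-field-log format.

```python
def extended_log(resource_columns, column_to_field, expected_fields):
    """Build log rows that also include unmatched columns and unmatched fields."""
    rows = []
    mapped_fields = set()
    for column in resource_columns:
        # A column with no mapping gets a blank field.
        field = column_to_field.get(column, "")
        if field:
            mapped_fields.add(field)
        rows.append({"column": column, "field": field})
    for field in expected_fields:
        # An expected field with no source column gets a blank column.
        if field not in mapped_fields:
            rows.append({"column": "", "field": field})
    return rows


# Hypothetical resource: one mapped column, one unmapped column, one missing field.
rows = extended_log(
    resource_columns=["LPA_GEOM_POLYGON", "ref"],
    column_to_field={"ref": "reference"},
    expected_fields=["reference", "geometry"],
)
```

Each resource gains at most one extra row per unmatched column or field, so the growth is bounded by the number of columns rather than the number of data rows.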

Labels
None yet
Projects
Status: Analysis, Research & Design
Development

No branches or pull requests

3 participants