Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate old regional analysis results #957

Open
wants to merge 4 commits into
base: dev
Choose a base branch
from

Conversation

trevorgerhardt
Copy link
Member

@trevorgerhardt trevorgerhardt commented Feb 26, 2025

Generate all final results for old regional analyses with a task so that we can eliminate old code which was required to handle them. See #956 for the R5 code that can be simplified.

Regional analysis results had two changes to how they were stored over the years. However, the last change was several years ago. We still handle the old style of results in multiple locations in code, but if we migrate the database entries and pre-generate all of the regional analysis results we would be able to eliminate those code paths. This will make future changes easier to make.

This should be run on staging and production before #956 is merged.

@Override
public void action(ProgressListener progressListener) throws Exception {
DBObject query = QueryBuilder.start().or(
QueryBuilder.start("travleTimePercentiles").is(null).get(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in field name: travelTimePercentiles

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch 🙏, fixed in b85e57c

@abyrd
Copy link
Member

abyrd commented Mar 5, 2025

It does seem like a good simplification to get rid of this fallback code for very old formats. Removing some of these fields means old worker versions that rely on them would break, but these are very old versions that are already effectively deprecated and shouldn't be used.

Just thinking through how this works: In recent years we generate multi-cutoff results files, from which single-cutoff files are generated in multiple formats. The multi-cutoff files are never used directly; they are only used to generate single-cutoff files on demand. For old analyses where multi-cutoff results files do not exist (or are named differently because analyses were limited to one set of destinations), this PR doesn't attempt to generate new-style results files. It just hits all code paths where multi-cutoff results would normally be used to derive single-cutoff results files, hitting the fallback code that works in the absence of new-style results files. Once that's done, it should be possible to remove that fallback code. This is assuming we've exhaustively hit all the ways the functions containing the fallback code can be invoked.

What's the planned way to run this on staging and production before #956 is merged? I guess we would perform one deployment of the patched version, observe the logs, and perform another deployment with the post-956 version soon thereafter if everything looked good?

regionalAnalysis.cutoffsMinutes = cutoffs;
regionalAnalysis.travelTimePercentiles = percentiles;
regionalAnalysis.destinationPointSetIds = destinationPointSetIds;
Persistence.regionalAnalyses.put(regionalAnalysis);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#956 mentions a MongoDB migration, and it looks like this PR is intended to perform that migration (via Java code) while generating the missing single-cutoff results files. If so we might want to mention that in #956 for future reference.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified the description of #956 to reference this PR and recommend that this one should be merged and run first.

OpportunityDataset destinations = Persistence.opportunityDatasets.get(destinationPointSetId);
for (int cutoffMinutes : cutoffs) {
for (int percentile : percentiles) {
for (FileStorageFormat format : FileStorageFormat.values()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is iterating through all 12 or so FileStorageFormat values and calling getSingleCutoffGrid with all of them. The getSingleCutoffGrid method only recognizes and handles three of those formats (GRID, PNG, and GEOTIFF) and does not write anything to the output stream for any other cases. So it looks like this might leave the output stream open and try to move an underlying empty file into storage with each of the nine other extensions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use only those three formats in b85e57c

).get();
try (DBCursor<RegionalAnalysis> cursor = Persistence.regionalAnalyses.find(query)) {
while (cursor.hasNext()) {
RegionalAnalysis regionalAnalysis = cursor.next();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good to add some logging in here so we can verify the effects of the migration when it runs. Or maybe better, enable DEBUG level logging when we run the migration to hit the log statements inside getSingleCutoffGrid.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few log statements in c2cc8f4

@trevorgerhardt
Copy link
Member Author

What's the planned way to run this on staging and production before #956 is merged? I guess we would perform one deployment of the patched version, observe the logs, and perform another deployment with the post-956 version soon thereafter if everything looked good?

As of right now, I think this approach would be best. There are changes in the dev branch that have not been deployed from PRs #953 and #954, however those changes are very minor.

If we take that approach, #956 could also remove the code in this PR.

Generate all final results for old regional analyses with a task so that we can eliminate old cold to handle them.
Not all `FileStorageFormat`s are valid single cutoff grid formats.
`getSingleCutoffGrid` can be run more than once on the same inputs. We should perform the database changes after all of the files are generated in case there is an error during one of the runs so that we could just run this again if needed.
@trevorgerhardt trevorgerhardt force-pushed the generate-old-regional-analysis-results branch from bafcfab to 353f2de Compare March 5, 2025 10:02
@trevorgerhardt trevorgerhardt enabled auto-merge (rebase) March 5, 2025 10:02
@trevorgerhardt trevorgerhardt requested a review from abyrd March 5, 2025 10:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants