Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON report aggregate tool #91

Merged
merged 5 commits into from
Mar 4, 2025
Merged

Conversation

disinvite
Copy link
Collaborator

The main reccmp-reccmp script will save the report of the comparison run to a JSON file if you use the --json command line option. The report contains data for each orig. address with the then-current entity name, accuracy score, and flags to mark a function stub and effective match. The HTML display has the same data (also JSON format), but with unified diff output for each address included.

This update adds the reccmp-denoise script to combine multiple JSON reports into a single "noise reduced" report. (Resolves #88.) Provide the list of report files you want to aggregate in the --samples option, and the output file with --output.

Our strategy to create the aggregate is this: for each orig. address in any of the sample reports, use the highest accuracy score from any report. The hierarchy is:

  • 100% match
  • Any effective match
  • Any accuracy score that is not an effective match
  • Any stub

To clarify, what we mean by "noise" is: variations in the generated assembly in in functions not directly impacted by the code change under review.

This update also adds some rigidity for importing and exporting JSON data. This has been working fine for a while, but without any set schema. The only file format in the field is version 1, and this is marked in the JSON file to protect against possible future versions.

We now have the internal ReccmpStatusReport class and some pydantic schemas to serialize and deserialize the JSON report files. We could now restructure the data into a new schema version and this framework should make the conversion stable.

reccmp-denoise also allows you to diff two different report files. This is not possible with reccmp-reccmp because you can only diff the saved report against the current comparison run. If you specify --diff and --samples in the same run, we will diff the first file in the --diff list against the newly generated aggregate report.

@jonschz
Copy link
Collaborator

jonschz commented Feb 26, 2025

Thank you for working on this. I probably won't be able to take a detailed look for a few days. Based on the description I'd say you're good to merge. Leaving it up to you if you want to wait for a full review or if you want to merge early.

@disinvite
Copy link
Collaborator Author

Before merging should we rename this script from reccmp-denoise to reccmp-report or something along those lines? There are more things we can do with these files than just aggregating them. For example: if the JSON export has diffs included, we could generate the HTML report directly from that.

@jonschz
Copy link
Collaborator

jonschz commented Mar 2, 2025

Maybe reccmp-aggregate? -report is also fine with me.

Comment on lines 56 to 72
parser.add_argument(
"--diff", type=Path, metavar="<files>", nargs="+", help="Report files to diff."
)
parser.add_argument(
"--output",
"-o",
type=Path,
metavar="<file>",
help="Where to save the aggregate file.",
)
parser.add_argument(
"--samples",
type=Path,
metavar="<files>",
nargs="+",
help="Report files to aggregate.",
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If neither of --diff and --samples is passed, then the program will exit silently.
I think this is an error condition.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! See what you think now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I'd just have checked the args namespace.

@disinvite disinvite merged commit 4350c24 into isledecomp:master Mar 4, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for diffing with multiple files
4 participants