-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JSON report aggregate tool #91
Conversation
Thank you for working on this. I probably won't be able to take a detailed look for a few days. Based on the description I'd say you're good to merge. Leaving it up to you if you want to wait for a full review or if you want to merge early. |
Before merging should we rename this script from |
Maybe |
reccmp/tools/denoise.py
Outdated
parser.add_argument( | ||
"--diff", type=Path, metavar="<files>", nargs="+", help="Report files to diff." | ||
) | ||
parser.add_argument( | ||
"--output", | ||
"-o", | ||
type=Path, | ||
metavar="<file>", | ||
help="Where to save the aggregate file.", | ||
) | ||
parser.add_argument( | ||
"--samples", | ||
type=Path, | ||
metavar="<files>", | ||
nargs="+", | ||
help="Report files to aggregate.", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If neither of --diff
and --samples
is passed, then the program will exit silently.
I think this is an error condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! See what you think now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I'd just have checked the args
namespace.
The main
reccmp-reccmp
script will save the report of the comparison run to a JSON file if you use the--json
command line option. The report contains data for each orig. address with the then-current entity name, accuracy score, and flags to mark a function stub and effective match. The HTML display has the same data (also JSON format), but with unified diff output for each address included.This update adds the
reccmp-denoise
script to combine multiple JSON reports into a single "noise reduced" report. (Resolves #88.) Provide the list of report files you want to aggregate in the--samples
option, and the output file with--output
.Our strategy to create the aggregate is this: for each orig. address in any of the sample reports, use the highest accuracy score from any report. The hierarchy is:
To clarify, what we mean by "noise" is: variations in the generated assembly in in functions not directly impacted by the code change under review.
This update also adds some rigidity for importing and exporting JSON data. This has been working fine for a while, but without any set schema. The only file format in the field is version 1, and this is marked in the JSON file to protect against possible future versions.
We now have the internal
ReccmpStatusReport
class and somepydantic
schemas to serialize and deserialize the JSON report files. We could now restructure the data into a new schema version and this framework should make the conversion stable.reccmp-denoise
also allows you to diff two different report files. This is not possible withreccmp-reccmp
because you can only diff the saved report against the current comparison run. If you specify--diff
and--samples
in the same run, we will diff the first file in the--diff
list against the newly generated aggregate report.