
Add response outputs to profile export #478

Merged: 3 commits merged into feature-genai-pa on Mar 5, 2024

Conversation

@matthewkotila (Contributor) commented Feb 28, 2024

Adds response outputs to the profile export. Here's an example new profile_export.json:

{
    "experiments": [
        {
            "experiment": {
                "mode": "request_rate",
                "value": 1.0
            },
            "requests": [
                {
                    "timestamp": 1709164981471989921,
                    "response_timestamps": [
                        1709164981512336466,
                        1709164981513018359
                    ],
                    "response_outputs": [
                        {"text_output": "<\u0000\u0000\u0000machine learning is a very important part of the business.\n\n"},
                        {"text_output": ""}
                    ]
                },
                {
                    "timestamp": 1709164982471945536,
                    "response_timestamps": [
                        1709164982496197074,
                        1709164982497025684
                    ],
                    "response_outputs": [
                        {"text_output": "=\u0000\u0000\u0000Hello! What is your name?\n\nI'm a student at the University of"},
                        {"text_output": ""}
                    ]
                }
            ],
            "window_boundaries": [
                ...
            ]
        }
    ],
    "version": "0.0.0"
}
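For downstream tooling, the new layout makes per-request analysis straightforward; here is a minimal sketch (the export is excerpted inline from the example above, with the output text elided — in practice you would `json.load` the `profile_export.json` file):

```python
import json

# Excerpt of the example profile_export.json above (timestamps in nanoseconds).
export = json.loads("""
{
  "experiments": [
    {
      "experiment": {"mode": "request_rate", "value": 1.0},
      "requests": [
        {
          "timestamp": 1709164981471989921,
          "response_timestamps": [1709164981512336466, 1709164981513018359],
          "response_outputs": [{"text_output": "..."}, {"text_output": ""}]
        }
      ]
    }
  ],
  "version": "0.0.0"
}
""")

for experiment in export["experiments"]:
    for request in experiment["requests"]:
        start = request["timestamp"]
        # Latency of each response relative to the request send time, ns -> ms.
        latencies_ms = [
            (ts - start) / 1e6 for ts in request["response_timestamps"]
        ]
        print(latencies_ms)
```

Because each request object carries its own `response_timestamps`, the 1-request-to-N-responses correlation falls out of the nesting directly.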

@rmccorm4 (Contributor) left a comment

Overall this LGTM as a short-term solution, but a few high-level comments:

  1. Having Output() support only string outputs named "text_output" doesn't feel ideal, but I understand the reasoning for doing it this way to support LLM-specific behavior right now.
    • More generically, down the line I might expect the report to include all outputs or something similar.
  2. The example report you posted looks great! However, it looks like it should also include some information to associate the responses with their requests. Are there plans to include the corresponding inputs, request_ids, etc. to correlate them later? Do we think that would be useful?
  3. Do we have any unit tests for a model that doesn't contain an output named "text_output", just to assert the expected behavior?

I expect some of the other folks may have some feedback/changes requested, so I'll save the approval for now.

NOTE: If you want to respond to any of these, please start a thread by commenting on some arbitrary part of the file and we can discuss in the thread.

@debermudez (Contributor)

@rmccorm4 the way the JSON is structured now, the responses are contained within their respective request. Originally we thought this would be sufficient to enable correlating one request with N responses. Do you think we should take a different approach going forward?

@debermudez previously approved these changes Feb 29, 2024
@debermudez dismissed their stale review February 29, 2024 02:25

should wait on @tgerdes to look at it

@debermudez (Contributor)

Removed approval so @tgerdesnv can get eyes on it too.
LGTM.

@tgerdesnv (Collaborator) left a comment

Changes requested. Let me know if you think we need to stick to your current plan in order to meet the deadline.

@nnshah1 (Contributor) commented Mar 1, 2024

I do think we want to consider moving to the https://jsonlines.org/ file format. This is probably out of scope for this PR and MVP, but JSON Lines would let us support large files more easily, without keeping everything in memory. It's become quite popular for logging for that reason.

It would also allow PA to "stream" results one record at a time to other scripts for real-time processing.

@debermudez (Contributor)

> I do think we want to consider moving to the https://jsonlines.org/ file format. This is probably out of scope for this PR and MVP, but JSON Lines would let us support large files more easily, without keeping everything in memory. It's become quite popular for logging for that reason.
>
> It would also allow PA to "stream" results one record at a time to other scripts for real-time processing.

Took a quick look at the examples you linked. Functionally I think it's a small change, but I agree it's out of scope for this MVP.
If I understand the example, you would still need to hold the complete experiment JSON object in memory (rather than the full list of experiments) until it is complete. That would cut our memory footprint down, but it makes outputting the file a bit more gnarly.

@nnshah1 (Contributor) commented Mar 1, 2024

> I do think we want to consider moving to the https://jsonlines.org/ file format. This is probably out of scope for this PR and MVP, but JSON Lines would let us support large files more easily, without keeping everything in memory. It's become quite popular for logging for that reason.
> It would also allow PA to "stream" results one record at a time to other scripts for real-time processing.
>
> Took a quick look at the examples you linked. Functionally I think it's a small change, but I agree it's out of scope for this MVP. If I understand the example, you would still need to hold the complete experiment JSON object in memory (rather than the full list of experiments) until it is complete. That would cut our memory footprint down, but it makes outputting the file a bit more gnarly.

What I'm imagining instead is a line at the top with any experiment settings, like a header, and then each line after that is a record. I should probably put a proposal together.
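The header-plus-records idea could be sketched roughly like this (a hypothetical illustration, not PA's actual writer; function names are made up): one settings line up front, then one self-contained record per line, so the writer never holds more than one record in memory.

```python
import json

def write_jsonl(path, settings, records):
    """Write a header line with experiment settings, then one record per line."""
    with open(path, "w") as f:
        f.write(json.dumps({"experiment": settings}) + "\n")
        for record in records:  # each record is serialized and released one at a time
            f.write(json.dumps(record) + "\n")

def read_jsonl(path):
    """Yield (header, record) pairs without loading the whole file."""
    with open(path) as f:
        header = json.loads(next(f))
        for line in f:
            yield header, json.loads(line)
```

A consumer script could iterate `read_jsonl(...)` and process results in real time, which is the "streaming" property mentioned above.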

@debermudez (Contributor)

> I do think we want to consider moving to the https://jsonlines.org/ file format. This is probably out of scope for this PR and MVP, but JSON Lines would let us support large files more easily, without keeping everything in memory. It's become quite popular for logging for that reason.
> It would also allow PA to "stream" results one record at a time to other scripts for real-time processing.
>
> Took a quick look at the examples you linked. Functionally I think it's a small change, but I agree it's out of scope for this MVP. If I understand the example, you would still need to hold the complete experiment JSON object in memory (rather than the full list of experiments) until it is complete. That would cut our memory footprint down, but it makes outputting the file a bit more gnarly.
>
> What I'm imagining instead is a line at the top with any experiment settings, like a header, and then each line after that is a record. I should probably put a proposal together.

Can this be verified via a JSON schema?
I want the reports locked in with schema-based verification soon.
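Such verification could start as simply as the sketch below — plain structural assertions over the fields shown in the example export, as a stand-in for a real JSON Schema (e.g. via the `jsonschema` package); the function name and checks here are illustrative, not part of the codebase:

```python
def validate_export(export: dict) -> None:
    """Minimal structural checks for the profile export shown above.

    A proper JSON Schema would replace this; field names are taken
    from the example profile_export.json.
    """
    assert isinstance(export.get("version"), str)
    assert isinstance(export.get("experiments"), list)
    for experiment in export["experiments"]:
        assert "experiment" in experiment and "requests" in experiment
        for request in experiment["requests"]:
            assert isinstance(request["timestamp"], int)
            assert all(isinstance(ts, int) for ts in request["response_timestamps"])
            assert all("text_output" in out for out in request["response_outputs"])
```

Running this in CI against a generated report would catch accidental shape changes before consumers see them.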

@matthewkotila force-pushed the matthewkotila-response-output branch from 8601412 to 2eed0dc on March 2, 2024 02:39
@tgerdesnv (Collaborator) left a comment

Yes! I like this much better.
I didn't do a fully thorough review, so please make sure to follow up on everyone else's feedback before merging.

@rmccorm4 (Contributor) left a comment

🚀

@matthewkotila merged commit ae5d5b6 into feature-genai-pa on Mar 5, 2024
3 checks passed
@matthewkotila deleted the matthewkotila-response-output branch March 5, 2024 16:23