
Add metadata to tracing #375

Merged (1 commit into konveyor:main on Sep 20, 2024)
Conversation

@devjpt23 (Contributor) commented Sep 18, 2024

Capture Number of Tokens in Request and Response
When this PR is merged, you will be able to capture metadata from the LLM responses into the trace directory when you run run_demo.py. The structure will be as follows:

── trace
      └── gpt-3.5-turbo << MODEL ID>>
          └── coolstore << APP Name >>
              ├── pom.xml << Source File Path >>
              │   └── single_group << Incident Batch Mode >>
              │       └── 1719673609.8266618 << Start of Request Time Stamp >>
              │           ├── 1 << Incident Batch Number >>
              │           │   ├── 0 << Retry Attempt  >>
              │           │   │   ├── llm_result << Contains the response from the LLM prior to us parsing >>
              │           │   │   ├── token_usage.json << New metadata file added here >>
              │           │   │   ├── prompt << The formatted prompt prior to sending to LLM >>
              │           │   │   └── prompt_vars.json << The prompt variables which are injected into the prompt template >>
              │           │   ├── params.json << Request parameters >>
              │           │   └── timing << Duration of a Successful Request >>
              └── src
                  └── main
                      ├── java
                      │   └── com
                      │       └── redhat
                      │           └── coolstore
                      │               ├── model
                      │               │   ├── InventoryEntity.java
                      │               │   │   └── single_group
                      │               │   │       └── 1719673609.827135
                      │               │   │           ├── 1
                      │               │   │           │   ├── 0
                      │               │   │           │   │   ├── llm_result
                      │               │   │           │   │   └── token_usage.json << New metadata file added here >>
                      │               │   │           │   ├── prompt
                      │               │   │           │   └── prompt_vars.json
                      │               │   │           ├── params.json
                      │               │   │           └── timing
                      │               │   ├── Order.java
                      │               │   │   └── single_group
                      │               │   │       └── 1719673609.826999
                      │               │   │           ├── 1
                      │               │   │           │   ├── 0
                      │               │   │           │   │   ├── llm_result
                      │               │   │           │   │   └── token_usage.json << New metadata file added here >>
                      │               │   │           │   ├── prompt
                      │               │   │           │   └── prompt_vars.json
                      │               │   │           ├── params.json
                      │               │   │           └── timing

This will aid you in debugging by providing detailed metadata and trace information.

Note: The metadata captured varies across different models. This variation reflects the unique characteristics and capabilities of each model, such as token usage, latency, or other performance metrics. For specific details on the metadata differences across various models, please refer to the following link: LLM Metadata Variations.
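Below is a minimal sketch of how the new metadata file could be written into the trace layout shown above. It assumes a LangChain-style response whose `response_metadata` dict may carry a `token_usage` entry; the helper name `write_token_usage` and the `trace_dir` argument are illustrative, not the exact code in this PR.

```python
import json
import os


def write_token_usage(trace_dir: str, response_metadata: dict) -> None:
    """Persist token usage next to llm_result in the retry-attempt directory.

    Providers that do not report usage simply yield an empty token_usage.json.
    """
    token_usage = response_metadata.get("token_usage", {})
    os.makedirs(trace_dir, exist_ok=True)
    with open(os.path.join(trace_dir, "token_usage.json"), "w") as f:
        json.dump(token_usage, f, indent=2)
```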

@devjpt23 marked this pull request as draft on September 18, 2024 16:36
@devjpt23 force-pushed the add-metadata-to-tracing branch from ec1a7ca to 3996321 on September 18, 2024 17:45
@jwmatthews self-requested a review on September 18, 2024 18:29
@jwmatthews (Member) commented:
@devjpt23 we need to address two process-related items for contributing to Kai:

  1. We need to 'sign' our commits through a DCO process.

  2. We need to check that the linter (trunk) succeeds when run against the code.

Also, when you are ready for this PR to be tested and reviewed, please convert it out of 'Draft'.

@devjpt23 changed the title from "Add metadata to tracing #373" to "Add metadata to tracing" on Sep 19, 2024
@devjpt23 force-pushed the add-metadata-to-tracing branch from 3996321 to 7305a83 on September 19, 2024 17:07
Signed-off-by: devjpt23 <devpatel232408@gmail.com>
@devjpt23 force-pushed the add-metadata-to-tracing branch from 7305a83 to a200846 on September 19, 2024 17:17
@devjpt23 marked this pull request as ready for review on September 19, 2024 17:22
@jwmatthews (Member) left a review comment:
Confirmed with IBM BAM that I'm seeing token usage data:

$ cat token_usage.json
{
  "prompt_tokens": 6427,
  "completion_tokens": 1244,
  "total_tokens": 7671,
  "input_token_count": 6427,
  "generated_token_count": 1244
}

For Bedrock with Claude 3.5, there is no token data; we catch the exception and the logs show a warning:

WARNING - 2024-09-20 15:05:13,937 - kai.service.kai_application.kai_application - [ kai_application.py:189 - get_incident_solutions_for_file() ] - Key does not exist in the dictionary: 'token_usage'
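For reference, a minimal sketch of the kind of defensive lookup that produces the warning above, assuming the usage metadata is a plain dict; the function name and logger below are illustrative, not the exact code in kai_application.py.

```python
import logging

log = logging.getLogger(__name__)


def extract_token_usage(response_metadata: dict) -> dict | None:
    # Some providers (e.g. Bedrock with Claude 3.5) report no usage at all,
    # so a missing key is only worth a warning, not a failure.
    try:
        return response_metadata["token_usage"]
    except KeyError as key:
        log.warning("Key does not exist in the dictionary: %s", key)
        return None
```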

@jwmatthews (Member) commented:
I think we may need to experiment with a few other providers and see if/how they report back token usage.
Planning to merge this for now as we continue to experiment with other ways of grabbing this info.

@jwmatthews merged commit ed14dbe into konveyor:main on Sep 20, 2024
5 checks passed
@devjpt23 (Contributor, Author) commented Sep 26, 2024
@jwmatthews During testing, I found that capturing the entire response metadata in demo mode generated a log file of over 4,000 lines. To optimize logging, I considered targeting specific keys within the response (see the sketch below).

devjpt23@5A-E1-06-17-C3-51:~/grok-integrate/kai/logs/trace/meta-llama/llama-3-70b-instruct/coolstore/src/main/java/com/redhat/coolstore/model/InventoryEntity.java/single_group/1727334757.1589327/1/0$ wc -l token_usage.json 
4025 token_usage.json

If we are comfortable with logging long responses, we can keep it as is; otherwise, I can easily fix that.

Please let me know your preference for the most efficient logging approach.
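A minimal sketch of what targeting specific keys could look like, assuming the full response metadata is a plain dict; the whitelist simply reuses the counters seen in the BAM example above and is illustrative rather than a final implementation.

```python
import json

# Keep only compact usage counters instead of the full response metadata,
# which can run to thousands of lines for some models.
TOKEN_USAGE_KEYS = (
    "prompt_tokens",
    "completion_tokens",
    "total_tokens",
    "input_token_count",
    "generated_token_count",
)


def filter_token_usage(full_metadata: dict) -> dict:
    return {k: full_metadata[k] for k in TOKEN_USAGE_KEYS if k in full_metadata}


# Example: the serialized result stays a handful of lines no matter how much
# extra metadata the provider returns.
sample = {"prompt_tokens": 6427, "total_tokens": 7671, "stop_reason": "end"}
print(json.dumps(filter_token_usage(sample), indent=2))
```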
