Add metadata to tracing #375
Conversation
@devjpt23 we need to address two process-related items for contributing to Kai.
Also, when you are ready for this PR to be tested and reviewed, please convert it out of 'Draft'.
Confirmed with IBM BAM that I'm seeing token usage data:
$ cat token_usage.json
{
"prompt_tokens": 6427,
"completion_tokens": 1244,
"total_tokens": 7671,
"input_token_count": 6427,
"generated_token_count": 1244
}
For Bedrock using Claude 3.5, there is no token data; we catch the exception and the logs show a warning:
WARNING - 2024-09-20 15:05:13,937 - kai.service.kai_application.kai_application - [kai_application.py:189 - get_incident_solutions_for_file()] - Key does not exist in the dictionary: 'token_usage'
I think we may need to experiment with a few other providers and see if/how they report back token usage.
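For context, a minimal sketch of the kind of defensive lookup the warning above implies. It assumes the provider response exposes its metadata as a plain dict containing a "token_usage" key (as in LangChain-style responses); the function name and signature are illustrative, not the actual Kai code.
import logging

log = logging.getLogger(__name__)

def extract_token_usage(response_metadata: dict) -> dict | None:
    try:
        return response_metadata["token_usage"]
    except KeyError as exc:
        # Some providers (e.g. Bedrock with Claude 3.5) do not populate this key,
        # so we log a warning instead of failing the request.
        log.warning("Key does not exist in the dictionary: %s", exc)
        return None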
@jwmatthews During testing, I found that capturing the entire response metadata via demo mode generated a log file of over 4,000 lines. To keep logging manageable, I considered targeting specific keys within the response.
devjpt23@5A-E1-06-17-C3-51:~/grok-integrate/kai/logs/trace/meta-llama/llama-3-70b-instruct/coolstore/src/main/java/com/redhat/coolstore/model/InventoryEntity.java/single_group/1727334757.1589327/1/0$ wc -l token_usage.json
4025 token_usage.json
If we are comfortable with logging such long responses we can keep this as is; otherwise I can easily limit the output to specific keys. Please let me know your preference for the most efficient logging approach.
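As a hypothetical illustration of the "target specific keys" idea: instead of dumping the full response metadata (4,000+ lines for some providers), keep only a small allow-list of fields before writing the trace file. The key names below are taken from the BAM example earlier in this thread and are not guaranteed to exist for every provider.
import json

WANTED_KEYS = (
    "prompt_tokens",
    "completion_tokens",
    "total_tokens",
    "input_token_count",
    "generated_token_count",
)

def write_token_usage(metadata: dict, path: str) -> None:
    # Keep only the allow-listed keys that the provider actually returned.
    filtered = {key: metadata[key] for key in WANTED_KEYS if key in metadata}
    with open(path, "w") as fp:
        json.dump(filtered, fp, indent=2)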
Capture Number of Tokens in Request and Response
When this PR is merged, you will be able to capture metadata from the responses into the trace directory when you run run_demo.py. The structure will be as follows:
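(The exact tree is not reproduced here; the layout below is inferred from the example path earlier in this thread, so the placeholder names, including the meaning of the trailing numeric directories, are assumptions.)
logs/trace/<model>/<application>/<path/to/source/file>/<batch mode>/<timestamp>/<retry>/<request>/token_usage.json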
This will aid you in debugging by providing detailed metadata and trace information.
Note: The metadata captured varies across different models. This variation reflects the unique characteristics and capabilities of each model, such as token usage, latency, or other performance metrics. For specific details on the metadata differences across various models, please refer to the following link: LLM Metadata Variations.