-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Obs AI Assistant] Add test for get_dataset_info
#213231
base: main
Are you sure you want to change the base?
Conversation
get_dataset_info
and synthtrace scenario for generating apache logsget_dataset_info
and synthtrace scenario for generating apache logs
get_dataset_info
and synthtrace scenario for generating apache logsget_dataset_info
Pinging @elastic/obs-ai-assistant (Team:Obs AI Assistant) |
Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services) |
🤖 GitHub commentsExpand to view the GitHub comments
Just comment with:
|
}; | ||
} | ||
try { | ||
const name = indexPattern === '' ? ['*', '*:*'] : `${indexPattern.split(',')}*`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dgieselaar As discussed I've added a trailing wildcard. I think this is better although this still doesn't find zookeeper logs if the user asks "please show me my zookeeper logs" and they use datastreams with default naming, eg. indices will be something like .ds-logs-zookeeper.1-default-2025.03.04-000001
and the data stream will be logs-zookeeper.1-default
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To handle that we'd need both a leading and trailing wildcard (*${indexPattern.split(',')}*
). I wouldn't mind that but you had some concerns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can:
- first execute with the raw value for
index
(let's saylogs
) - if no indices are resolved, execute it with
logs*
- if still no indices are resolved, execute it with
*logs*
we can also always add a *:...
additionally.
_resolve/index should be relatively fast so I'm not worried about performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_resolve/index should be relatively fast so I'm not worried about performance.
That's also my thinking so in that case, why not just do *logs*
up front to avoid the complexity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from code complexity it could also be confusing to users how it sometimes includes some indices depending on what other indices they have in their system, eg. if they have a datastream called zookeeper-legacy-metrics
and another logs-zookeeper.1-default
it will only include their logs datastream if they delete the legacy datastream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's also my thinking so in that case, why not just do logs up front to avoid the complexity?
Because you get stuff like backing indices. But if we can exclude those (I think we can) it's probably fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's also my thinking so in that case, why not just do logs up front to avoid the complexity?
starting with a more-specific index pattern is not about performance, it's about making sure we don't send a lot of noise to the LLM. We could also filter progressively client side if that makes things easier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because you get stuff like backing indices. But if we can exclude those (I think we can) it's probably fine.
When would that be a problem? Say, in the example with "zookeeper" can you imagine an index that includes "zookeeper" but should not be included?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That way we do not match backing indices.
💚 Build Succeeded
Metrics [docs]
History |
get_dataset_info