[Streams] Replay loghub data with synthtrace #212120
base: main
Conversation
Implementation-wise this looks pretty good to me. Some meta questions:
- Should we rely on the public loghub repo or fork it? I'm a little worried about this breaking at some point because loghub changes its layout. A fork would also make it easier to expand the data by our own means. In either case we should cite loghub and the paper somewhere appropriate (like a readme file), as required by the license
- I'm not so sure about the different speeds. I'm running via
node scripts/synthtrace.js sample_logs --live --kibana=http://localhost:5601 --target=http://localhost:9200 --liveBucketSize=1000
and the liveBucketSize is essentially not considered because it computes its own speed. Can we make it be taken into account? Different speeds for different data sets are a nice touch as they mirror reality, but I would like to control the rate of data intake (and, for example, speed everything up by a factor of 1k). Maybe that's already possible and I just don't know the right command
- I spot-checked some aspects of the refactoring and it makes sense to me, but I didn't dig through everything, and as I'm not super familiar with the code base it's likely I'm missing something
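One way to reconcile the two requests above (keep the relative rates between data sets, but let the user control overall throughput) is a single global speed factor applied on top of each generator's natural rate. A minimal sketch, with hypothetical names — `GeneratorRate`, `scaleRates`, and the sample rates are illustrative, not part of the synthtrace API:

```typescript
// Sketch (hypothetical names): scale each generator's natural rate by a
// global speed factor, preserving the relative differences between data
// sets (e.g. Android indexing much faster than Mac) while making overall
// throughput controllable from the CLI.
interface GeneratorRate {
  name: string;
  docsPerSecond: number; // natural rate inferred from the source data set
}

function scaleRates(rates: GeneratorRate[], speedFactor: number): GeneratorRate[] {
  return rates.map((r) => ({ ...r, docsPerSecond: r.docsPerSecond * speedFactor }));
}

// Illustrative natural rates for two data sets
const natural: GeneratorRate[] = [
  { name: 'Android', docsPerSecond: 200 },
  { name: 'Mac', docsPerSecond: 2 },
];

// "Speed everything up by a factor of 1k", as suggested above
const scaled = scaleRates(natural, 1000);
```

The Android/Mac ratio stays 100:1 either way; only the absolute intake rate changes.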
I'm fine with either - but maybe good to do that as a follow-up, I'm not sure what the legal ramifications are.
Yes, totally forgot about this setting, I should be able to use it. Would we use a constant indexing rate for each generator, or keep the relative rate per generator (e.g. Android indexes at a way higher rate than Macbook)?
Sounds good, then we should add a backlink to the repo and paper and follow up later.
I would prefer the latter, in practice this kind of thing happens all the time.
Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)
@flash1293 I've added
LGTM 🌟
I've been thinking about the clients in synthtrace; it makes more sense to consolidate them, as there was a lot of repeated code. Thanks for this 👏
Discussed offline:
src/platform/packages/private/kbn-journeys/services/synthtrace.ts
changes LGTM
if (responseJson.response && responseJson.response.latestVersion) {
  return responseJson.response.latestVersion as string;

if (!response.item.latestVersion) {
  throw new Error(`Failed to fetch APM package version`);
Are we not supporting Synthtrace for 7.x?
I totally accidentally removed it, but yeah, let's forget about 7.x, or do you have concerns?
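If 7.x support were to be kept instead, the two response shapes from the diff above could be handled in one resolver. A minimal sketch, assuming only the shapes visible in the diff (`response.latestVersion` and `item.latestVersion`); the function name and interface are hypothetical, not the actual synthtrace code:

```typescript
// Sketch (assumed shapes, taken from the diff above): accept both the
// older `response.latestVersion` shape and the newer `item.latestVersion`
// shape when resolving the APM package version.
interface RegistryResponse {
  response?: { latestVersion?: string };
  item?: { latestVersion?: string };
}

function getLatestApmPackageVersion(json: RegistryResponse): string {
  // Prefer the newer shape, fall back to the older one
  const version = json.item?.latestVersion ?? json.response?.latestVersion;
  if (!version) {
    throw new Error('Failed to fetch APM package version');
  }
  return version;
}
```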
💔 Build Failed
Failed CI Steps
cc @dgieselaar
Download, parse and replay loghub data with Synthtrace, for use in the Streams project. In summary:
- @kbn/sample-log-parser package which parses Loghub sample data and creates valid parsers for extracting and replacing timestamps, using the LLM
- sample_logs scenario which uses the parsed data sets to replay Loghub data continuously as if it were live data
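The timestamp extract-and-replace step described above can be sketched as follows. This is a minimal illustration, not the actual `@kbn/sample-log-parser` code: the regex, function name, and timestamp format are assumptions for a simple ISO-prefixed log line:

```typescript
// Sketch (hypothetical helper): replay an archived log line as if it were
// live by extracting its original timestamp and shifting it by a fixed
// offset (e.g. now minus the first event's time), preserving the
// inter-event spacing of the original data set.
const TIMESTAMP_RE = /^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})/;

function replayLine(line: string, offsetMs: number): string {
  const match = line.match(TIMESTAMP_RE);
  if (!match) return line; // leave unparseable lines untouched
  const originalMs = new Date(match[1] + 'Z').getTime();
  const shifted = new Date(originalMs + offsetMs).toISOString().slice(0, 19);
  return shifted + line.slice(match[1].length);
}
```

In the real parser the extraction patterns are generated per data set (with LLM assistance), since Loghub formats vary widely.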