Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lock bulk queue while processing indexing request #45
Lock bulk queue while processing indexing request #45
Changes from 6 commits
39e2199
0432e2b
eb64f44
8715a4f
fa8f445
d53ee7c
80b143c
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a library function, or built-in ruby one?
Is this one thread-safe?
Asking cause of these scary articles:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also - how does SINK_LOCK_TIMEOUT interact with retries for sink? Can it timeout before sink finishes retrying?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Timeout
is built-in ruby and it appears it wouldn't be thread-safe there 🤦🏻 sorry I never thought something built-in could be so bad, it seems everywhere recommends not to use it.There's no interaction, it's just a flat timeout.
There are two retries that would happen in sink.
mutex.synchronize
just waits for the lock to be lifted.If we remove
Timeout
I think this could be reworked to use amutex.try_lock
>mutex.unlock
block instead ofmutex.synchronize
. If the lock can't be acquired it sleeps for 1 second, up to a max of maybe 120 retries (so approx. two minutes), before throwing an error. How does that sound?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh it happens :)
I think indeed using mutex is best. A question raised in my head while reading through it - what happens if Elasticsearch is overloaded and start throttling?
So you have 10 threads that are getting page content and throwing it into single sink. Sink starts to slow down, but do threads keep extracting content? What happens if your timeout it 120 seconds, but sink took 118 seconds to unlock, will all threads waiting for it be able to write into it anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, content extraction should pause during this for all threads. The executors are idling waiting for the lock.
If the remaining executors can do write everything in 2 seconds, yes. The write is usually very fast if there's no flush required, so I think most would go through in this case, but there's always a possibility of some being dropped.
I think that's unavoidable, and as long as it's logged clearly users can do something about it (e.g. look at why Elasticsearch throttles the Crawler).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@artem-shelkovnikov I've changed things a bit so
Timeout
isn't used, I thought it made more sense to go by lock acquisition attempts instead (this also means if a thread acquires a lock late and takes a long time itself, it won't be unnecessarily terminated early).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've taken a look and haven't found anything broken with this approach. I'm also not super good with concurrent operations, so some stuff might go south regardless, this is something that's just worth testing at some point (Like ingest a large site into a super small Elasticsearch instance and observe), that can of course be done later when you feel it's the right moment!