deadlock; recursive locking #358
This is definitely a bug in how we use fluentd labels, imo. Marking as such. See the workaround with the concat dev's suggestions here:
https://github.com/fluent-plugins-nursery/fluent-plugin-concat#configuration
Going to remove all the built-in concat rules; looks like we need to send timeouts to another label?
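For context, the workaround described in the concat README linked above routes timed-out buffers to a separate label, so the timeout flush is re-emitted outside the filter chain instead of back into the same concat filter. A minimal sketch, assuming a tail.containers.** tag and an illustrative @SPLUNK_OUTPUT label (neither name is taken from the chart):

```
<filter tail.containers.**>
  @type concat
  key log
  # placeholder start-of-record pattern; the real one depends on the log format
  multiline_start_regexp /^\d{4}-\d{2}-\d{2}/
  flush_interval 5
  # on timeout, emit the buffered event to this label instead of re-emitting it
  # through the same filter, which appears to be what triggers the recursive lock
  timeout_label @SPLUNK_OUTPUT
</filter>

<match tail.containers.**>
  @type relabel
  @label @SPLUNK_OUTPUT
</match>

<label @SPLUNK_OUTPUT>
  <match **>
    # the real Splunk HEC output block would go here
    @type stdout
  </match>
</label>
```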
I was able to work around this by following the thread I posted above over on the concat repo. I started by removing all the shipped concat filters from the logging and umbrella charts.
Then moved the concat filter outside the
Because not all logs need the same concat settings (i.e. separator, etc. - in the example above my pod needs separator "\n", and some don't), I believe we need to expose more multiline settings in the Helm chart instead of rendering all concat filters with the same settings block. So, what we need to solve this:
Oh, and you'll notice I set the timeout to 2s. I don't know of a concat situation in normal ops where we need to wait that long for more lines. I actually think it should be 1s, but we should test whether events ever span more than a whole second apart.
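To make that concrete, here is a rough sketch of the kind of per-application concat filter discussed above, with the separator and a 2s flush. The match pattern and start regexp are placeholders, not values shipped with the chart:

```
<filter tail.containers.**>
  @type concat
  key log
  # narrow the pattern above to the pods that actually need these settings;
  # this app wants its continuation lines re-joined with newlines, others may not
  separator "\n"
  # placeholder pattern marking the first line of an event
  multiline_start_regexp /^\d{4}-\d{2}-\d{2}/
  # 2s as discussed; 1s may be enough if events never span more than a second
  flush_interval 2
  # keep the timestamp of the first line of the concatenated event
  use_first_timestamp true
  # route timeout flushes to the output label, as in the sketch above
  timeout_label @SPLUNK_OUTPUT
</filter>
```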
I got the same warn msg:
[warn]: #0 dump an error event: error_class=ThreadError error="deadlock; recursive locking" location="/usr/share/gems/gems/fluent-plugin-concat-2.4.0/lib/fluent/plugin/filter_concat.rb:189:in `synchronize'" tag="tail.containers.var.log.containers.kube-proxy-w2hcg_kube-system_kube-proxy-f5c846203755b66df23ed167d65b907a59d8a93893bfad14b2da804563d62a57.log" time=2020-04-03 19:12:27.633148152 +0000 record={"log"=>"I0403 19:12:27.633038 1 bounded_frequency_runner.go:221] sync-runner: ran, next possible in 0s, periodic in 30s\n", "stream"=>"stderr", "source"=>"/var/log/containers/kube-proxy-w2hcg_kube-system_kube-proxy-f5c846203755b66df23ed167d65b907a59d8a93893bfad14b2da804563d62a57.log"}
Will there be a fix for this soon?
@matthewmodestino any chance you will make an MR with this fix, since you have figured out the whole issue? We are experiencing the same issue with EKS, and I would like to avoid forking the chart.
Yeah, I'll try and get it started later today. Do you think I should make the default concat filters optional? E.g., the ability to toggle them all on or all off? I think that's what I'll try, because I don't believe they apply in all environments...
@matthewmodestino if it is possible to make it optional, then sure. Looks like not too many people experience the same issue. I am a fan of Occam's razor - "Entities should not be multiplied without necessity." ;-) BTW, thanks for looking into this.
@szymonpk The PR is in for review. Once the team has a chance to take a look, I will add another to expose the "separator" option... and will look for any others we think we need to make multiline logs pretty.
What happened:
I am pretty new to Splunk and deployed Splunk Connect for Kubernetes (1.4.0) in the cluster, and I see the below error in the agent that runs on the master server. Kube API server logs are not pushed to the Splunk server.
2020-03-23 08:31:30 +0000 [warn]: #0 dump an error event: error_class=ThreadError error="deadlock; recursive locking" location="/usr/share/gems/gems/fluent-plugin-concat-2.4.0/lib/fluent/plugin/filter_concat.rb:189:in `synchronize'" tag="tail.containers.var.log.containers.kube-apiserver-k8s1m_kube-system_kube-apiserver-f71d1b0e611b1f82d45637a2aaae75b5a0849b966bab165f8fa3078194b55b1a.log" time=2020-03-23 08:31:25.150968430 +0000 record={"log"=>"I0323 08:31:25.150846 1 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io\n", "stream"=>"stderr", "source"=>"/var/log/containers/kube-apiserver-k8s1m_kube-system_kube-apiserver-f71d1b0e611b1f82d45637a2aaae75b5a0849b966bab165f8fa3078194b55b1a.log"}
2020-03-23 08:31:30 +0000 [info]: #0 Timeout flush: tail.containers.var.log.containers.kube-apiserver-k8s1m_kube-system_kube-apiserver-f71d1b0e611b1f82d45637a2aaae75b5a0849b966bab165f8fa3078194b55b1a.log:stderr
What you expected to happen:
API server logs get pushed to the Splunk server.
Anything else we need to know?:
Application logs from other containers are being pushed to the Splunk server.
Environment:
kubectl version: 1.15.4
ruby --version: ruby 2.5.5p157
cat /etc/os-release: Ubuntu 18.04.2 LTS