
deadlock; recursive locking #358


Closed
johnr84 opened this issue Mar 23, 2020 · 9 comments
Labels: bug (Something isn't working), documentation (Docs needed)

Comments

johnr84 commented Mar 23, 2020

What happened:

I am pretty new to Splunk and deployed Splunk Connect for Kubernetes (1.4.0) in the cluster, and I see the error below in the agent that runs on the master node. Kube API server logs are not pushed to the Splunk server.

2020-03-23 08:31:30 +0000 [warn]: #0 dump an error event: error_class=ThreadError error="deadlock; recursive locking" location="/usr/share/gems/gems/fluent-plugin-concat-2.4.0/lib/fluent/plugin/filter_concat.rb:189:in `synchronize'" tag="tail.containers.var.log.containers.kube-apiserver-k8s1m_kube-system_kube-apiserver-f71d1b0e611b1f82d45637a2aaae75b5a0849b966bab165f8fa3078194b55b1a.log" time=2020-03-23 08:31:25.150968430 +0000 record={"log"=>"I0323 08:31:25.150846 1 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io\n", "stream"=>"stderr", "source"=>"/var/log/containers/kube-apiserver-k8s1m_kube-system_kube-apiserver-f71d1b0e611b1f82d45637a2aaae75b5a0849b966bab165f8fa3078194b55b1a.log"}

2020-03-23 08:31:30 +0000 [info]: #0 Timeout flush: tail.containers.var.log.containers.kube-apiserver-k8s1m_kube-system_kube-apiserver-f71d1b0e611b1f82d45637a2aaae75b5a0849b966bab165f8fa3078194b55b1a.log:stderr

What you expected to happen:

API server logs get pushed to the Splunk server.

Anything else we need to know?:

Application logs from other containers are being pushed to the Splunk server.

Environment:

  • Kubernetes version (use kubectl version): 1.15.4
  • Ruby version (use ruby --version): ruby 2.5.5p157
  • OS (e.g. cat /etc/os-release): Ubuntu 18.04.2 LTS
  • Splunk version: 8.0.1
  • SCK version: 1.4.0

matthewmodestino (Collaborator) commented Mar 25, 2020

This is definitely a bug in how we use fluentd labels, imo. Marking it as such.

See workaround with concat dev suggestions here:

fluent-plugins-nursery/fluent-plugin-concat#69 (comment)
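
My read of that thread: the shipped chart renders the concat filters inside the @SPLUNK label, and when the plugin's timeout flush fires it re-emits the buffered record back through the router into that same label, so the filter re-enters its own mutex on the same thread and Ruby raises ThreadError ("deadlock; recursive locking"). Rough sketch of the problematic shape (illustrative only, not the exact rendered config):

<label @SPLUNK>
  <filter tail.containers.**>
    @type concat
    key log
    stream_identity_key stream
    multiline_start_regexp /^\w[0-1]\d[0-3]\d/
    flush_interval 5s
    # no timeout_label here, so timeout-flushed events come straight back
    # into this same @SPLUNK label and hit this filter's mutex again
  </filter>
  # ...the rest of the @SPLUNK pipeline (metadata filters, output) follows here...
</label>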

matthewmodestino added the bug and documentation labels on Mar 25, 2020

matthewmodestino (Collaborator) commented Mar 28, 2020

https://github.com/fluent-plugins-nursery/fluent-plugin-concat#configuration

timeout_label
timeout_label (string) (optional)
The label name to handle events caused by timeout.

Going to remove all the built-in concat rules; looks like we need to send timeouts to another label?

<label @SPLUNK>
      # = filters for container logs =
      <filter tail.containers.var.log.containers.dns-controller*dns-controller*.log>
        @type concat
        key log
        timeout_label @SPLUNK
        stream_identity_key stream
        multiline_start_regexp /^\w[0-1]\d[0-3]\d/
        flush_interval 5s
        separator ""
        use_first_timestamp true
      </filter>
      <filter tail.containers.var.log.containers.kube-dns*sidecar*.log>
        @type concat
        key log
        timeout_label @SPLUNK
        stream_identity_key stream
        multiline_start_regexp /^\w[0-1]\d[0-3]\d/
        flush_interval 5s
        separator ""
        use_first_timestamp true
      </filter>
      <filter tail.containers.var.log.containers.kube-dns*dnsmasq*.log>
        @type concat
        key log
        timeout_label @SPLUNK
        stream_identity_key stream
        multiline_start_regexp /^\w[0-1]\d[0-3]\d/
        flush_interval 5s
        separator ""
        use_first_timestamp true
      </filter>
      <filter tail.containers.var.log.containers.kube-apiserver*kube-apiserver*.log>
        @type concat
        key log
        timeout_label @SPLUNK
        stream_identity_key stream
        multiline_start_regexp /^\w[0-1]\d[0-3]\d/
        flush_interval 5s
        separator ""
        use_first_timestamp true
      </filter>
      <filter tail.containers.var.log.containers.kube-controller-manager*kube-controller-manager*.log>
        @type concat
        key log
        timeout_label @SPLUNK
        stream_identity_key stream
        multiline_start_regexp /^\w[0-1]\d[0-3]\d/
        flush_interval 5s
        separator ""
        use_first_timestamp true
      </filter>
      <filter tail.containers.var.log.containers.kube-dns-autoscaler*autoscaler*.log>
        @type concat
        key log
        timeout_label @SPLUNK
        stream_identity_key stream
        multiline_start_regexp /^\w[0-1]\d[0-3]\d/
        flush_interval 5s
        separator ""
        use_first_timestamp true
      </filter>
      <filter tail.containers.var.log.containers.kube-proxy*kube-proxy*.log>
        @type concat
        key log
        timeout_label @SPLUNK
        stream_identity_key stream
        multiline_start_regexp /^\w[0-1]\d[0-3]\d/
        flush_interval 5s
        separator ""
        use_first_timestamp true
      </filter>
      <filter tail.containers.var.log.containers.kube-scheduler*kube-scheduler*.log>
        @type concat
        key log
        timeout_label @SPLUNK
        stream_identity_key stream
        multiline_start_regexp /^\w[0-1]\d[0-3]\d/
        flush_interval 5s
        separator ""
        use_first_timestamp true
      </filter>
      <filter tail.containers.var.log.containers.kube-dns*kubedns*.log>
        @type concat
        key log
        timeout_label @SPLUNK
        stream_identity_key stream
        multiline_start_regexp /^\w[0-1]\d[0-3]\d/
        flush_interval 5s
        separator ""
        use_first_timestamp true
      </filter>

matthewmodestino (Collaborator) commented Mar 29, 2020

I was able to work around this by following the thread I posted above on the concat repo.

I started by removing all the shipped concat filters from the logging and umbrella chart values.yaml (these should be optional anyway), then updated the configmap sources that we need concat for to point to a @CONCAT label (the containers and files sources currently allow you to set concat):

<source>
      @id containers.log
      @type tail
      @label @CONCAT
      tag tail.containers.*
      path {{ .Values.fluentd.path | default "/var/log/containers/*.log" }}
      {{- if .Values.fluentd.exclude_path }}
      exclude_path {{ .Values.fluentd.exclude_path | toJson }}
      {{- end }}
      pos_file /var/log/splunk-fluentd-containers.log.pos
      path_key source
      read_from_head true
      <parse>
      {{- if eq .Values.containers.logFormatType "cri" }}
        @type regexp
        expression /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
        time_format  {{ .Values.containers.logFormat | default "%Y-%m-%dT%H:%M:%S.%N%:z" }}
      {{- else if eq .Values.containers.logFormatType "json" }}
        @type json
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      {{- end }}
        time_key time
        time_type string
        localtime false
      </parse>
    </source>

Then I moved the concat filter outside the @SPLUNK label in output.conf:

 <label @CONCAT>
     # = filters for container logs =
      {{- range $name, $logDef := .Values.logs }}
      {{- if and $logDef.from.pod $logDef.multiline }}
      <filter tail.containers.var.log.containers.{{ $logDef.from.pod }}*{{ or $logDef.from.container $name }}*.log>
        @type concat
        key log
        timeout_label @SPLUNK
        stream_identity_key stream
        multiline_start_regexp {{ $logDef.multiline.firstline }}
        flush_interval {{ $logDef.multiline.flushInterval | default "2s" }}
        separator "\n"
        use_first_timestamp true
      </filter>
      {{- end }}
      {{- end }}

      <match **>
        @type relabel
        @label @SPLUNK
      </match>
    </label>
    <label @SPLUNK>
      # Enrich log with k8s metadata
      <filter tail.containers.**>
        @type kubernetes_metadata
        annotation_match [ "^splunk\.com" ]
        de_dot false
      </filter>
      <filter tail.containers.**>
        @type record_transformer
        enable_ruby
        <record>
...
...

Because not all logs need the same concat settings (e.g. separator; in the example above my pod needs separator "\n", while some don't), I believe we need to expose more multiline settings in the Helm chart instead of rendering all concat filters with the same settings block.
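
Something like this is the values.yaml shape I have in mind (the separator key is hypothetical; only firstline and flushInterval are rendered today), with output.conf picking it up per filter the same way it already does for firstline and flushInterval:

logs:
  dns-controller:
    from:
      pod: dns-controller
    multiline:
      firstline: /^\w[0-1]\d[0-3]\d/
      flushInterval: 2s
      separator: "\n"    # per-log override; other rules could keep ""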

So, what we need to solve this:

  • Correct the configmap so that source.containers.conf & source.files.conf point to a new label (e.g. @CONCAT) which contains the multiline logic in output.conf
  • Expose more multiline settings in the Helm chart, especially separator, as not all concat rules need the same treatment (see the sketch above)
  • Remove our shipped concat filters, or make them optional

matthewmodestino (Collaborator):

Oh, and you'll notice I set the timeout to 2s. I don't know of a concat situation in normal ops where we need to wait that long for more lines. I actually think it should be 1s, but we should test whether events ever span more than a whole second apart.

sfrancies commented Apr 3, 2020

I got the same warn msg:

[warn]: #0 dump an error event: error_class=ThreadError error="deadlock; recursive locking" location="/usr/share/gems/gems/fluent-plugin-concat-2.4.0/lib/fluent/plugin/filter_concat.rb:189:in `synchronize'" tag="tail.containers.var.log.containers.kube-proxy-w2hcg_kube-system_kube-proxy-f5c846203755b66df23ed167d65b907a59d8a93893bfad14b2da804563d62a57.log" time=2020-04-03 19:12:27.633148152 +0000 record={"log"=>"I0403 19:12:27.633038 1 bounded_frequency_runner.go:221] sync-runner: ran, next possible in 0s, periodic in 30s\n", "stream"=>"stderr", "source"=>"/var/log/containers/kube-proxy-w2hcg_kube-system_kube-proxy-f5c846203755b66df23ed167d65b907a59d8a93893bfad14b2da804563d62a57.log"}
2020-04-03 19:12:32 +0000 [info]: #0 Timeout flush: tail.containers.var.log.containers.kube-proxy-w2hcg_kube-system_kube-proxy-f5c846203755b66df23ed167d65b907a59d8a93893bfad14b2da804563d62a57.log:stderr

Will there be a fix for it soon?

szymonpk commented Apr 9, 2020

@matthewmodestino any chance you will make an MR with this fix, since you have figured out the whole issue? We are experiencing the same issue with EKS, and I would like to avoid forking the chart.


matthewmodestino (Collaborator) commented Apr 9, 2020

Yeah, I'll try and get it started later today. Do you think I should make the default concat filters optional? e.g. the ability to toggle them all on or all off?

I think that's what I'll try, because I don't believe they apply in all environments...
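
Roughly what I'm picturing (the flag name is just a placeholder, nothing is decided yet):

# values.yaml
defaultConcatFilters:
  enabled: false

# output.conf template: only render the shipped rules when the toggle is on
{{- if .Values.defaultConcatFilters.enabled }}
<filter tail.containers.var.log.containers.kube-apiserver*kube-apiserver*.log>
  @type concat
  key log
  timeout_label @SPLUNK
  stream_identity_key stream
  multiline_start_regexp /^\w[0-1]\d[0-3]\d/
  flush_interval 5s
  separator ""
  use_first_timestamp true
</filter>
{{- end }}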

szymonpk:

@matthewmodestino if it is possible to make it optional, then sure. Looks like not too many people experience the same issue. I am a fan of Occam's razor: "Entities should not be multiplied without necessity." ;-)

BTW, thanks for looking into this.

This was referenced Apr 10, 2020
matthewmodestino linked a pull request Apr 10, 2020 that will close this issue
matthewmodestino (Collaborator):

@szymonpk PR is in for review. Once the team has a chance to take a look, I will add another to expose the "separator" option... and will look for any others we think we need to make multiline logs pretty.
