
making multi line work? Dead lock recursive locking #111


Closed
Vince-Cercury opened this issue Mar 18, 2019 · 17 comments
Labels
bug Something isn't working

@Vince-Cercury commented Mar 18, 2019

I have some events that span multiple lines:

2019-03-18 06:06:48.859  INFO [manage-xxx-service,,,] [10.2.7.19] 1 --- [-15276-thread-1] o.a.k.clients.consumer.ConsumerConfig    : ConsumerConfig values:
        auto.commit.interval.ms = 5000
        auto.offset.reset = latest
        bootstrap.servers = [my-kafka-service:9092]
        check.crcs = true

So I added these lines:

  manage-xxx-service:
    <<: *glog
    from:
      pod: my-xxx-service
    multiline:
      firstline: /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/
    sourcetype: kube:my-xxx-service

Which produces this config:

      <filter tail.containers.var.log.containers.my-xxx-service*my-xxx-service*.log>
        @type concat
        key log
        timeout_label @SPLUNK
        stream_identity_key stream
        multiline_start_regexp /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/
        flush_interval 5s
      </filter>

I'm getting an error "deadlock; recursive locking"

2019-03-18 05:13:54 +0000 [warn]: #0 dump an error event: error_class=ThreadError error="deadlock; recursive locking" location="/usr/local/bundle/gems/fluent-plugin-concat-2.3.0/lib/fluent/plugin/filter_concat.rb:144:in `synchronize'" tag="tail.containers.var.log.containers.my-xxx-service-85855985fc-pgl6g_yyy_my-incident-service-0ee1814dcd3596c96e0bf6c0a2e65a9437cf1b282a95daf41fbd6e8933df1f8f.log" time=

What am I doing wrong?

Vince-Cercury changed the title from "making multi line work. Dead lock" to "making multi line work? Dead lock recursive locking" on Mar 18, 2019
@Vince-Cercury (Author) commented Mar 18, 2019

Note that despite the error, it works. But I don't think I should proceed until the error is gone.

@matthewmodestino (Collaborator) commented

Thanks for reporting this, Vince, and I appreciate you logging the issue on the concat repo.

fluent-plugins-nursery/fluent-plugin-concat#69

I will assist with following up with the concat devs. In the meantime, I will validate that flipping back to the 1.0.1 logging images doesn't show this error.

The 1.1.0 images bumped the fluentd and concat plugin versions, so it may just be that they don't play nice yet with fluentd 1.4.

matthewmodestino added the bug (Something isn't working) label on Mar 22, 2019
@Vince-Cercury (Author) commented

I can confirm 1.0.1 removed the errors

@matthewmodestino (Collaborator) commented

I have updated the issue on the concat repo with reproducible steps for Docker.

@chaitanyaphalak (Contributor) commented

Fixed in 1.2.0

@mebuzz commented Aug 23, 2019

I am still having this issue on Splunk Connect for Kubernetes (I am on AWS EKS). Do I update the fluentd image to the updated one, 1.2.0?

@matthewmodestino (Collaborator) commented

Yes, update to fluentd-hec image 1.1.1, OR see the linked concat issue, where there is a label workaround. It looks like the root cause is events being re-emitted to the @splunk label...
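
For reference, a rough sketch of what that kind of label workaround can look like (this is an assumption based on the description above, not the exact fix from the linked concat issue; the @SPLUNK_TIMEOUT label name is a placeholder): point timeout_label at a separate label so timed-out events are not re-emitted into the label that contains the concat filter, then relay them back into the normal pipeline.

      <filter tail.containers.var.log.containers.my-xxx-service*my-xxx-service*.log>
        @type concat
        key log
        # send timeout flushes to a separate label instead of the label
        # that holds this filter (the label name below is a placeholder)
        timeout_label @SPLUNK_TIMEOUT
        stream_identity_key stream
        multiline_start_regexp /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/
        flush_interval 5s
      </filter>

      <label @SPLUNK_TIMEOUT>
        # hand the timed-out (partially concatenated) events back to the
        # regular @SPLUNK pipeline using fluentd's built-in relabel output
        <match **>
          @type relabel
          @label @SPLUNK
        </match>
      </label>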

@mebuzz commented Aug 23, 2019

Thank you for the super quick response, @mattmodestino.
I checked the fluentd-hec image version. It is updated to 1.1.1. My values.yaml contains the following lines:

      image:
        name: splunk/fluentd-hec:1.1.1

Should I just update the version here to "1.2.0"?

@matthewmodestino (Collaborator) commented

no, there is no 1.2

https://hub.docker.com/r/splunk/fluentd-hec/tags

Can you share the log you are seeing and the config you are running?

@mebuzz commented Aug 23, 2019

You mean the pod logs? And the config of the ConfigMap or the DaemonSet?

@matthewmodestino (Collaborator) commented

Yes, I would pick a logging pod and dump its logs. At the head of the log we render the configuration the pod is running; then grab one of the errors you are seeing.
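
For example (a minimal sketch; the namespace and pod name below are placeholders):

      # list the Splunk Connect logging pods (namespace is a placeholder)
      kubectl get pods -n <namespace>

      # dump one logging pod's logs; the rendered fluentd config is printed
      # at the head of the log, followed by any runtime warnings or errors
      kubectl logs <logging-pod-name> -n <namespace>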

@mebuzz commented Aug 23, 2019

I am not seeing any errors in the logs.

@matthewmodestino (Collaborator) commented

OK, so let's back up then. Are you setting multiline rules in your ConfigMap? If so, are you seeing the deadlock error?

@mebuzz commented Aug 24, 2019

Yes, we have put this config in the ConfigMap for multiline:

      <filter tail.containers.var.log.containers.*.log>
        @type concat
        key log
        timeout_label @splunk
        stream_identity_key stream
        multiline_start_regexp /^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}.\d{3}\d/
        flush_interval 5s
      </filter>

@matthewmodestino (Collaborator) commented

And are you seeing errors? Also, do all your containers actually match that line breaker?

@mebuzz commented Aug 26, 2019

I have pushed the settings via the ConfigMap, so all containers should have these settings.
