Commit 89fe756

Merge pull request #136 from OlehPalanskyi/main
Added a retry logic and a service availability check function for high availability.
2 parents 0762b3f + 3abd9eb commit 89fe756

3 files changed: +177 -3 lines changed

README.OpenSearchInput.md

Lines changed: 92 additions & 0 deletions
@@ -24,6 +24,16 @@
2424
+ [docinfo_fields](#docinfo_fields)
2525
+ [docinfo_target](#docinfo_target)
2626
+ [docinfo](#docinfo)
27+
+ [check_connection](#check_connection)
28+
+ [retry_forever](#retry_forever)
29+
+ [retry_timeout](#retry_timeout)
30+
+ [retry_max_times](#retry_max_times)
31+
+ [retry_type](#retry_type)
32+
+ [retry_wait](#retry_wait)
33+
+ [retry_exponential_backoff_base](#retry_exponential_backoff_base)
34+
+ [retry_max_interval](#retry_max_interval)
35+
+ [retry_randomize](#retry_randomize)
36+
2737
* [Advanced Usage](#advanced-usage)
2838

2939
## Usage
@@ -274,6 +284,88 @@ This parameter specifies whether docinfo information including or not. The defau
274284
docinfo false
275285
```
276286

287+
### check_connection
288+
289+
This parameter specifies whether the plugin checks that each configured Elasticsearch or OpenSearch host is reachable; unreachable hosts are left out of the connection list. The default value is `true`.
290+
291+
```
292+
check_connection true
293+
```
294+
### retry_forever
295+
296+
If true, the plugin ignores the `retry_timeout` and `retry_max_times` options and retries forever. The default value is `true`.
297+
298+
```
299+
retry_forever true
300+
```
301+
302+
### retry_timeout
303+
304+
The maximum time (in seconds) to keep retrying a failed attempt before the plugin gives up.
305+
If the next retry would exceed this time limit, the last retry is made at exactly this time limit.
306+
The default value is `72h`.
307+
72 hours == 17 times with exponential backoff (not to change default behavior)
308+
309+
```
310+
retry_timeout 72h
311+
```
312+
313+
### retry_max_times
314+
315+
The maximum number of times to retry a failed attempt. The default value is `5`.
316+
317+
```
318+
retry_max_times 5
319+
```
320+
321+
### retry_type
322+
323+
This parameter specifies how the wait time (in seconds) between retries is determined:
324+
`exponential_backoff`: the wait time grows exponentially with each failure,
325+
`periodic`: the plugin retries at fixed intervals (configured via `retry_wait`). The default value is `exponential_backoff`.
326+
Periodic -> fixed :retry_wait
327+
Exponential backoff: k is number of retry times
328+
c: constant factor, @retry_wait
329+
b: base factor, @retry_exponential_backoff_base
330+
k: times
331+
total retry time: c + c * b^1 + (...) + c*b^k = c * (b^(k+1) - 1) / (b - 1)
332+
333+
```
334+
retry_type exponential_backoff
335+
```
336+
337+
### retry_wait
338+
339+
The number of seconds to wait before the next retry, or the constant factor of the exponential backoff. The default value is `5`.
340+
341+
```
342+
retry_wait 5
343+
```
344+
345+
### retry_exponential_backoff_base
346+
347+
The base number of the exponential backoff for retries. The default value is `2`.
348+
349+
```
350+
retry_exponential_backoff_base 2
351+
```
352+
353+
### retry_max_interval
354+
355+
The maximum interval (in seconds) for exponential backoff between retries while failing. The default value is `nil`.
356+
357+
```
358+
retry_max_interval nil
359+
```
360+
361+
### retry_randomize
362+
363+
If true, the plugin retries after a randomized interval to avoid burst retries. The default value is `false`.
364+
365+
```
366+
retry_randomize false
367+
```
368+
277369
## Advanced Usage
278370

279371
OpenSearch Input plugin and OpenSearch output plugin can combine to transfer records into another cluster.
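
Review note on the backoff arithmetic documented under `retry_type`: the sketch below is plain Ruby, not the Fluentd `retry_state` helper, and the method names `backoff_interval` and `total_wait` are made up for illustration. It assumes the k-th wait is `c * b^k`, capped by an optional maximum interval, and checks the geometric-series total against the closed form `c * (b^(k+1) - 1) / (b - 1)`.

```ruby
# Sketch only: models the documented formula, not Fluentd's implementation.
# c = retry_wait, b = retry_exponential_backoff_base, k = retry index (0-based).
def backoff_interval(c, b, k, max_interval = nil)
  interval = c * (b**k)
  # An optional cap plays the role described for retry_max_interval.
  max_interval ? [interval, max_interval].min : interval
end

# Sum of the waits for retries 0..k.
def total_wait(c, b, k, max_interval = nil)
  (0..k).sum { |i| backoff_interval(c, b, i, max_interval) }
end

c = 5 # README default for retry_wait
b = 2 # README default for retry_exponential_backoff_base

intervals = (0..4).map { |k| backoff_interval(c, b, k) }
puts intervals.inspect                 # [5, 10, 20, 40, 80]
puts total_wait(c, b, 4)               # 155
puts c * (b**5 - 1) / (b - 1)          # 155, closed form for k = 4
```

Integer inputs are used here to keep the arithmetic exact; the plugin declares the base as a float, so real values would be floats.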

lib/fluent/plugin/in_opensearch.rb

Lines changed: 78 additions & 3 deletions
@@ -29,6 +29,7 @@
2929
require 'faraday/excon'
3030
require 'fluent/log-ext'
3131
require 'fluent/plugin/input'
32+
require 'fluent/plugin_helper'
3233
require_relative 'opensearch_constants'
3334

3435
module Fluent::Plugin
@@ -39,7 +40,7 @@ class UnrecoverableRequestFailure < Fluent::UnrecoverableError; end
3940
DEFAULT_STORAGE_TYPE = 'local'
4041
METADATA = "@metadata".freeze
4142

42-
helpers :timer, :thread
43+
helpers :timer, :thread, :retry_state
4344

4445
Fluent::Plugin.register_input('opensearch', self)
4546

@@ -80,6 +81,23 @@ class UnrecoverableRequestFailure < Fluent::UnrecoverableError; end
8081
config_param :docinfo_fields, :array, :default => ['_index', '_type', '_id']
8182
config_param :docinfo_target, :string, :default => METADATA
8283
config_param :docinfo, :bool, :default => false
84+
config_param :check_connection, :bool, :default => true
85+
config_param :retry_forever, :bool, default: true, desc: 'If true, plugin will ignore retry_timeout and retry_max_times options and retry forever.'
86+
config_param :retry_timeout, :time, default: 72 * 60 * 60, desc: 'The maximum seconds to retry'
87+
# 72hours == 17 times with exponential backoff (not to change default behavior)
88+
config_param :retry_max_times, :integer, default: 5, desc: 'The maximum number of times to retry'
89+
# exponential backoff sequence will be initialized at the time of this threshold
90+
config_param :retry_type, :enum, list: [:exponential_backoff, :periodic], default: :exponential_backoff
91+
### Periodic -> fixed :retry_wait
92+
### Exponential backoff: k is number of retry times
93+
# c: constant factor, @retry_wait
94+
# b: base factor, @retry_exponential_backoff_base
95+
# k: times
96+
# total retry time: c + c * b^1 + (...) + c*b^k = c * (b^(k+1) - 1) / (b - 1)
97+
config_param :retry_wait, :time, default: 5, desc: 'Seconds to wait before next retry, or constant factor of exponential backoff.'
98+
config_param :retry_exponential_backoff_base, :float, default: 2, desc: 'The base number of exponential backoff for retries.'
99+
config_param :retry_max_interval, :time, default: nil, desc: 'The maximum interval seconds for exponential backoff between retries while failing.'
100+
config_param :retry_randomize, :bool, default: false, desc: 'If true, the plugin will retry after randomized interval not to do burst retries.'
83101

84102
include Fluent::Plugin::OpenSearchConstants
85103

@@ -92,6 +110,7 @@ def configure(conf)
92110

93111
@timestamp_parser = create_time_parser
94112
@backend_options = backend_options
113+
@retry = nil
95114

96115
raise Fluent::ConfigError, "`password` must be present if `user` is present" if @user && @password.nil?
97116

@@ -138,6 +157,15 @@ def backend_options
138157
raise Fluent::ConfigError, "You must install #{@http_backend} gem. Exception: #{ex}"
139158
end
140159

160+
def retry_state(randomize)
161+
retry_state_create(
162+
:input_retries, @retry_type, @retry_wait, @retry_timeout,
163+
forever: @retry_forever, max_steps: @retry_max_times,
164+
max_interval: @retry_max_interval, backoff_base: @retry_exponential_backoff_base,
165+
randomize: randomize
166+
)
167+
end
168+
141169
def get_escaped_userinfo(host_str)
142170
if m = host_str.match(/(?<scheme>.*)%{(?<user>.*)}:%{(?<password>.*)}(?<path>@.*)/)
143171
m["scheme"] +
@@ -176,12 +204,29 @@ def get_connection_options(con_host=nil)
176204
host.merge!(user: @user, password: @password) if !host[:user] && @user
177205
host.merge!(path: @path) if !host[:path] && @path
178206
end
179-
207+
live_hosts = @check_connection ? hosts.select { |host| reachable_host?(host) } : hosts
180208
{
181-
hosts: hosts
209+
hosts: live_hosts
182210
}
183211
end
184212

213+
def reachable_host?(host)
214+
client = OpenSearch::Client.new(
215+
host: ["#{host[:scheme]}://#{host[:host]}:#{host[:port]}"],
216+
user: host[:user],
217+
password: host[:password],
218+
reload_connections: @reload_connections,
219+
request_timeout: @request_timeout,
220+
resurrect_after: @resurrect_after,
221+
reload_on_failure: @reload_on_failure,
222+
transport_options: { ssl: { verify: @ssl_verify, ca_file: @ca_file, version: @ssl_version } }
223+
)
224+
client.ping
225+
rescue => e
226+
log.warn "Failed to connect to #{host[:scheme]}://#{host[:host]}:#{host[:port]}: #{e.message}"
227+
false
228+
end
229+
185230
def emit_error_label_event(&block)
186231
# If `emit_error_label_event` is specified as false, error event emittions are not occurred.
187232
if emit_error_label_event
@@ -292,6 +337,25 @@ def is_existing_connection(host)
292337
return true
293338
end
294339

340+
def update_retry_state(error=nil)
341+
if error
342+
unless @retry
343+
@retry = retry_state(@retry_randomize)
344+
end
345+
@retry.step
346+
#Raise error if the retry limit has been reached
347+
raise "Hit limit for retries. retry_times: #{@retry.steps}, error: #{error.message}" if @retry.limit?
348+
#Retry if the limit hasn't been reached
349+
log.warn("failed to connect or search.", retry_times: @retry.steps, next_retry_time: @retry.next_time.round, error: error.message)
350+
sleep(@retry.next_time - Time.now)
351+
else
352+
unless @retry.nil?
353+
log.debug("retry succeeded.")
354+
@retry = nil
355+
end
356+
end
357+
end
358+
295359
def run
296360
return run_slice if @num_slices <= 1
297361

@@ -302,6 +366,9 @@ def run
302366
run_slice(slice_id)
303367
end
304368
end
369+
rescue Faraday::ConnectionFailed, OpenSearch::Transport::Transport::Error => error
370+
update_retry_state(error)
371+
retry
305372
end
306373

307374
def run_slice(slice_id=nil)
@@ -321,7 +388,15 @@ def run_slice(slice_id=nil)
321388
end
322389

323390
router.emit_stream(@tag, es)
391+
clear_scroll(scroll_id)
392+
update_retry_state
393+
end
394+
395+
def clear_scroll(scroll_id)
324396
client.clear_scroll(scroll_id: scroll_id) if scroll_id
397+
rescue => e
398+
# ignore & log any clear_scroll errors
399+
log.warn("Ignoring clear_scroll exception", message: e.message, exception: e.class)
325400
end
326401

327402
def process_scroll_request(scroll_id)
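
Review note on the control flow added in `run` and `update_retry_state`: the following is a minimal, dependency-free sketch of the same rescue, step, check-limit, sleep, `retry` pattern. `SimpleRetryState` and `with_retries` are hypothetical names, and the limit rule is simplified (step count only, no timeout), so it should be read as an illustration rather than a model of Fluentd's retry state object.

```ruby
# Hypothetical stand-in for the object returned by retry_state_create;
# it only tracks the step count and computes an exponential wait.
class SimpleRetryState
  attr_reader :steps

  def initialize(wait:, base:, max_times:, forever: false)
    @wait = wait
    @base = base
    @max_times = max_times
    @forever = forever
    @steps = 0
  end

  def step
    @steps += 1
  end

  # Simplified: the limit is reached once max_times failures were counted.
  def limit?
    !@forever && @steps >= @max_times
  end

  # Wait before the next attempt: wait * base^(steps - 1).
  def next_wait
    @wait * (@base**(@steps - 1))
  end
end

# Runs the block, retrying on StandardError until it succeeds or the limit is hit.
# `sleeper` is injectable so the sketch can run without actually sleeping.
def with_retries(state, sleeper: ->(seconds) { sleep(seconds) })
  begin
    yield
  rescue StandardError => error
    state.step
    raise "Hit limit for retries. retry_times: #{state.steps}, error: #{error.message}" if state.limit?
    sleeper.call(state.next_wait)
    retry
  end
end

attempts = 0
waits = []
state = SimpleRetryState.new(wait: 5, base: 2, max_times: 5)
result = with_retries(state, sleeper: ->(s) { waits << s }) do
  attempts += 1
  raise "connection refused" if attempts < 3
  :ok
end
puts result.inspect # :ok
puts waits.inspect  # [5, 10]
```

Raising from inside the `rescue` clause is what ends the loop: the new exception is not caught by the same `begin` block, so it propagates to the caller, as it does in the diff above when `@retry.limit?` is true.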

test/plugin/test_in_opensearch.rb

Lines changed: 7 additions & 0 deletions
@@ -39,6 +39,7 @@ class OpenSearchInputTest < Test::Unit::TestCase
3939
CONFIG = %[
4040
tag raw.opensearch
4141
interval 2
42+
check_connection false
4243
]
4344

4445
def setup
@@ -190,6 +191,7 @@ def test_configure
190191
user john
191192
password doe
192193
tag raw.opensearch
194+
check_connection false
193195
}
194196
instance = driver(config).instance
195197

@@ -228,6 +230,7 @@ def test_single_host_params_and_defaults
228230
user john
229231
password doe
230232
tag raw.opensearch
233+
check_connection false
231234
}
232235
instance = driver(config).instance
233236

@@ -249,6 +252,7 @@ def test_single_host_params_and_defaults_with_escape_placeholders
249252
user %{j+hn}
250253
password %{d@e}
251254
tag raw.opensearch
255+
check_connection false
252256
}
253257
instance = driver(config).instance
254258

@@ -271,6 +275,7 @@ def test_legacy_hosts_list
271275
path /es/
272276
port 123
273277
tag raw.opensearch
278+
check_connection false
274279
}
275280
instance = driver(config).instance
276281

@@ -295,6 +300,7 @@ def test_hosts_list
295300
user default_user
296301
password default_password
297302
tag raw.opensearch
303+
check_connection false
298304
}
299305
instance = driver(config).instance
300306

@@ -323,6 +329,7 @@ def test_hosts_list_with_escape_placeholders
323329
user default_user
324330
password default_password
325331
tag raw.opensearch
332+
check_connection false
326333
}
327334
instance = driver(config).instance
328335
