[0.2] Adding ES verification step + explicit best-effort index creation during ES Sink initialization (#192) #207

Merged
navarone-feekery merged 2 commits into `0.2` from `backport/0.2/pr-192` on Feb 6, 2025


github-actions[bot]

@github-actions github-actions bot commented Feb 6, 2025

Backports the following commits to 0.2:

…ing ES Sink initialization (#192)

### Closes #53 and #172

This PR continues the work from issue #53 and also closes #172.
It adds the following steps to the initialization of the ES Sink:
- A verification step that checks whether the crawler can reach the Elasticsearch instance provided in the configs
- An explicit attempt to create the `output_index` should the index ping fail (the index ping step was added in #186)

Thus, the flow during init will be:
`verify ES connection` --> `if all good, verify the output_index` --> `if index does not exist, attempt to create the index` --> `if index creation fails, system exit`
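
The flow above can be sketched roughly as follows. This is an illustrative Python sketch, not the crawler's actual code: `FakeEsClient`, `init_sink`, and the method names `ping`, `index_exists`, and `create_index` are hypothetical stand-ins invented for this example, and the sketch assumes that an unreachable Elasticsearch instance also aborts initialization.

```python
from dataclasses import dataclass, field


@dataclass
class FakeEsClient:
    """Hypothetical stand-in for an Elasticsearch client (not a real library API)."""
    reachable: bool = True
    can_create: bool = True
    indices: set = field(default_factory=set)

    def ping(self) -> bool:
        return self.reachable

    def index_exists(self, name: str) -> bool:
        return name in self.indices

    def create_index(self, name: str) -> bool:
        # Simulates missing privileges or any other creation failure.
        if not self.can_create:
            return False
        self.indices.add(name)
        return True


def init_sink(client, output_index: str, log=print) -> None:
    """Run the init checks in order; raise SystemExit before any crawl starts."""
    # Step 1: verify the ES connection.
    if not client.ping():
        # Assumption: a failed connection check also stops initialization.
        raise SystemExit("Cannot reach Elasticsearch; aborting before crawl")
    log("Connected to Elasticsearch")

    # Step 2: verify the output index.
    if client.index_exists(output_index):
        log(f"Index [{output_index}] found")
        return

    # Step 3: best-effort index creation, logged explicitly.
    log(f"Index [{output_index}] not found; attempting to create it")
    if not client.create_index(output_index):
        # Step 4: fail out here rather than later, mid-crawl.
        raise SystemExit(f"Failed to create index [{output_index}]")
    log(f"Index [{output_index}] created")
```

Calling `init_sink(FakeEsClient(), "my-index")` walks the happy path and creates the index, while `FakeEsClient(can_create=False)` exercises the exit path. Raising `SystemExit` at this point mirrors the "system exit" step: the failure surfaces before any pages are crawled.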

Additional background:
While working on this, I discovered that _technically speaking_, the `_bulk` command that Crawler uses to upsert documents is capable of auto-creating the index if it doesn't exist. However, this depends on the user having the `auto_configure`, `create_index`, or `manage` index privilege.

Therefore, while we may not _need_ an explicit index creation attempt, it is good to have for two reasons: we can explicitly log that it happened, and it gives us a safe point to fail at should something go wrong, instead of waiting for `_bulk` to be called, at which point a crawl would have already begun.

### Checklists

#### Pre-Review Checklist
- [x] This PR does NOT contain credentials of any kind, such as API keys
or username/passwords (double check `crawler.yml.example` and
`elasticsearch.yml.example`)
- [x] This PR has a meaningful title
- [x] This PR links to all relevant GitHub issues that it fixes or partially addresses
  - If there is no GitHub issue, please create it. Each PR should have a link to an issue
- [x] This PR has a thorough description
- [x] Covered the changes with automated tests
- [x] Tested the changes locally
- [x] Added a label for each target release version (example: `v0.1.0`)
- [x] Considered corresponding documentation changes

### Related Pull Requests
#186
@navarone-feekery navarone-feekery merged commit ff8b56d into 0.2 Feb 6, 2025
2 checks passed
@navarone-feekery navarone-feekery deleted the backport/0.2/pr-192 branch February 6, 2025 10:54