Merge branch 'main' into navarone/add-faux-gem-license-headers
navarone-feekery authored May 24, 2024
2 parents 7078ee3 + fc0f854 commit 1ca60ea
Showing 2 changed files with 38 additions and 1 deletion.
35 changes: 35 additions & 0 deletions README.md
@@ -107,3 +107,38 @@ And from Docker.
```bash
$ docker exec -it crawler bin/crawler crawl config/my-crawler.yml
```

### Connecting to Elasticsearch

If you set the `output_sink` value to `elasticsearch`, Crawler will attempt to bulk index crawl results into Elasticsearch.
To facilitate this connection, Crawler needs to have either an API key or a username/password configured to access the Elasticsearch instance.
If using an API key, ensure that the API key has read and write permissions to access the index configured in `output_index`.

- See the [Elasticsearch documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-create-api-key.html) on creating API keys for more details.
- See the [elasticsearch.yml.example](config/elasticsearch.yml.example) file for all of the available Elasticsearch configuration options for Crawler.

Here is an example of creating an API key for Crawler that is scoped to a single index.
The response is a JSON object containing an `encoded` field.
The value of `encoded` is what Crawler uses as the API key in its configuration.

```bash
POST /_security/api_key
{
  "name": "my-api-key",
  "role_descriptors": {
    "my-crawler-role": {
      "cluster": ["all"],
      "indices": [
        {
          "names": ["my-crawler-index-name"],
          "privileges": ["all"]
        }
      ]
    }
  },
  "metadata": {
    "application": "my-crawler"
  }
}
```
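
As an illustration of the last step described in the README text (not part of this commit), the sketch below pulls the `encoded` value out of a create-API-key response and renders the matching `elasticsearch.api_key` line. The response is built locally with made-up `id` and `api_key` values, and the helper names (`make_sample_response`, `extract_encoded_key`, `render_config_line`) are hypothetical. It relies on the fact that Elasticsearch returns `encoded` as the base64 encoding of `id:api_key`.

```python
import base64
import json


def make_sample_response(key_id: str, secret: str) -> str:
    """Build a stand-in for the JSON body returned by POST /_security/api_key.

    The values are made up for illustration; a real response comes from Elasticsearch.
    """
    encoded = base64.b64encode(f"{key_id}:{secret}".encode("utf-8")).decode("ascii")
    return json.dumps(
        {"id": key_id, "name": "my-api-key", "api_key": secret, "encoded": encoded}
    )


def extract_encoded_key(response_body: str) -> str:
    """Return the `encoded` field, checking that it matches base64(id:api_key)."""
    data = json.loads(response_body)
    encoded = data["encoded"]
    expected = base64.b64encode(
        f"{data['id']}:{data['api_key']}".encode("utf-8")
    ).decode("ascii")
    if encoded != expected:
        raise ValueError("`encoded` does not match base64(id:api_key)")
    return encoded


def render_config_line(encoded: str) -> str:
    """Format the value for the `elasticsearch.api_key` setting."""
    return f"elasticsearch.api_key: {encoded}"


if __name__ == "__main__":
    body = make_sample_response("example-key-id", "example-secret")
    print(render_config_line(extract_encoded_key(body)))
```

The consistency check is optional; it only guards against pasting the raw `api_key` or `id` into the config where the `encoded` value is expected.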
4 changes: 3 additions & 1 deletion config/elasticsearch.yml.example
@@ -17,8 +17,10 @@
 #elasticsearch.port: 9200
 #
 #
-## The API key for Elasticsearch connection.
+## The encoded API key for Elasticsearch connection.
 ## Using `api_key` is recommended instead of `username`/`password`.
+## Ensure this API key has read and write access to the configured
+## `output_index` in the Crawler config
 #elasticsearch.api_key: 1234
 #
 #