Add API key documentation #26

Merged 1 commit on May 24, 2024
35 changes: 35 additions & 0 deletions README.md
@@ -107,3 +107,38 @@ And from Docker.
```bash
$ docker exec -it crawler bin/crawler crawl config/my-crawler.yml
```

### Connecting to Elasticsearch

If you set the `output_sink` value to `elasticsearch`, Crawler will attempt to bulk index crawl results into Elasticsearch.
To connect, Crawler needs either an API key or a username/password configured for the Elasticsearch instance.
If using an API key, ensure that it has read and write permissions on the index configured in `output_index`.

- See the [Elasticsearch documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-create-api-key.html) on managing API keys for more details
- See the [elasticsearch.yml.example](config/elasticsearch.yml.example) file for all of the Elasticsearch configuration options available to Crawler
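
Once a key exists (one way to create it is shown in the example further down), the read and write requirement can be checked against the Elasticsearch has-privileges API. The following is a minimal sketch, not part of Crawler: the function names and the `ES_URL` environment variable are assumptions made for illustration, and the check only covers the `read` and `write` index privileges.

```bash
# Sketch only: ES_URL (e.g. https://localhost:9200) is an assumed environment variable.

# Build the request body for POST /_security/user/_has_privileges.
# $1 = index name (the Crawler `output_index`)
has_privileges_body() {
  printf '{"index":[{"names":["%s"],"privileges":["read","write"]}]}' "$1"
}

# Exit 0 when the JSON response on stdin reports "has_all_requested": true.
all_privileges_granted() {
  grep -Eq '"has_all_requested"[[:space:]]*:[[:space:]]*true'
}

# Ask Elasticsearch whether the key can read and write the index.
# $1 = encoded API key, $2 = index name
check_crawler_api_key() {
  curl -sS -H "Authorization: ApiKey $1" \
    -H 'Content-Type: application/json' \
    -X POST "${ES_URL}/_security/user/_has_privileges" \
    -d "$(has_privileges_body "$2")" | all_privileges_granted
}
```

A call such as `check_crawler_api_key "$ENCODED_KEY" my-crawler-index-name && echo "key is usable"` would then report whether the key is suitable, where `ENCODED_KEY` is a hypothetical variable holding the encoded key.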

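Here is an example of creating an API key for Crawler.
The example grants `all` cluster and index privileges for simplicity, so narrow them if you need minimal permissions.
The response is a JSON object with an `encoded` field.
The value of `encoded` is what Crawler uses for `elasticsearch.api_key` in its configuration.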

```bash
POST /_security/api_key
{
  "name": "my-api-key",
  "role_descriptors": {
    "my-crawler-role": {
      "cluster": ["all"],
      "indices": [
        {
          "names": ["my-crawler-index-name"],
          "privileges": ["all"]
        }
      ]
    }
  },
  "metadata": {
    "application": "my-crawler"
  }
}
```
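
The request above is written in Kibana Dev Tools console syntax. As a rough sketch of the same call from a shell, the functions below send it with `curl` and pull the `encoded` value out of the response. The function names and the `ES_URL`, `ES_USER`, and `ES_PASSWORD` environment variables are assumptions for illustration, not part of Crawler, and the `sed` extraction assumes the value contains no escaped quotes.

```bash
# Sketch only: ES_URL, ES_USER and ES_PASSWORD are assumed environment variables.

# Build the request body for POST /_security/api_key.
# $1 = API key name, $2 = index name (the Crawler `output_index`)
api_key_request_body() {
  printf '{"name":"%s","role_descriptors":{"my-crawler-role":{"cluster":["all"],"indices":[{"names":["%s"],"privileges":["all"]}]}},"metadata":{"application":"my-crawler"}}' "$1" "$2"
}

# Print the value of the "encoded" field from the JSON response on stdin.
# Uses sed so that the sketch does not depend on jq being installed.
extract_encoded() {
  sed -n 's/.*"encoded"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Create the key and print only the encoded value.
# $1 = API key name, $2 = index name
create_crawler_api_key() {
  curl -sS -u "${ES_USER}:${ES_PASSWORD}" \
    -H 'Content-Type: application/json' \
    -X POST "${ES_URL}/_security/api_key" \
    -d "$(api_key_request_body "$1" "$2")" | extract_encoded
}
```

Running `create_crawler_api_key my-api-key my-crawler-index-name` would print the value to place in `elasticsearch.api_key`.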
4 changes: 3 additions & 1 deletion config/elasticsearch.yml.example
@@ -17,8 +17,10 @@
 #elasticsearch.port: 9200
 #
 #
-## The API key for Elasticsearch connection.
+## The encoded API key for Elasticsearch connection.
 ## Using `api_key` is recommended instead of `username`/`password`.
+## Ensure this API key has read and write access to the configured
+## `output_index` in the Crawler config
 #elasticsearch.api_key: 1234
 #
 #