Update CLI instructions and remove old CLI files #36

Merged (4 commits) on May 29, 2024
14 changes: 0 additions & 14 deletions bin/console

This file was deleted.

101 changes: 0 additions & 101 deletions bin/crawl

This file was deleted.

14 changes: 7 additions & 7 deletions docs/CONFIG.md
@@ -3,10 +3,10 @@
Configuration files live in the [config](../config) directory.
There are two kinds of configuration files:

-1. Crawler configurations (provided in CLI with `--crawler-config`)
-2. Elasticsearch configurations (provided in CLI with `--es-config`)
+1. Crawler configurations (provided as a positional argument)
+2. Elasticsearch configurations (provided as an optional argument with `--es-config`)

-There are two configuration files to allow crawl jobs to share Elasticsearch instance configuration.
+These two configuration file arguments allow crawl jobs to share Elasticsearch instance configuration.
There is no enforced pathing or naming for these files.
They are differentiated only by how they are provided to the CLI when running a crawl.

@@ -16,7 +16,7 @@ Crawler configuration files are required for all crawl jobs.
If `elasticsearch` is the output sink, the elasticsearch instance configuration can also be included in a crawler configuration file.
If the elasticsearch configuration is provided this way, it will override any configuration provided in an elasticsearch configuration file.

-These are provided in the CLI as an argument for the option `--crawl-config`.
+These are provided in the CLI as a positional argument, e.g. `bin/crawler crawl path/to/my-crawler.yml`.

## Elasticsearch configuration files

@@ -27,7 +27,7 @@ This configuration is also optional.
All of the configuration in this file can be provided in a crawler configuration file as well.
The crawler config is loaded after the Elasticsearch config, so any Elasticsearch settings in the crawler config will take priority.

-These are provided in the CLI as an argument for the option `--es-config`.
+These are provided in the CLI as a named argument for the option `--es-config`, e.g. `bin/crawler crawl path/to/my-crawler.yml --es-config=/path/to/elasticsearch.yml`

## Configuration files in Docker

@@ -46,13 +46,13 @@ The order of the opts is not important.
When performing a crawl with only a crawl config:

```shell
-$ bin/crawl --crawl-config config/my-crawler.yml
+$ bin/crawler crawl config/my-crawler.yml
```

When performing a crawl with both a crawl config and an Elasticsearch config:

```shell
-$ bin/crawl --crawl-config config/my-crawler.yml --es-config config/elasticsearch.yml
+$ bin/crawler crawl config/my-crawler.yml --es-config config/elasticsearch.yml
```
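
For anyone scripting the updated invocation, the two forms above can be folded into one small wrapper. This is only a sketch: `build_crawl_cmd` is a hypothetical helper name that is not part of this repository, and it prints the command rather than running it. It assumes nothing beyond the positional crawl config and the optional `--es-config` argument described in this change.

```shell
#!/bin/sh
# Hypothetical helper (not part of the crawler project): prints the crawl
# command for a crawler config, adding --es-config only when a second
# (Elasticsearch) config path is supplied.
build_crawl_cmd() {
  crawl_config="$1"
  es_config="${2:-}"

  # The crawler config is required for every crawl job.
  if [ -z "$crawl_config" ]; then
    echo "usage: build_crawl_cmd CRAWL_CONFIG [ES_CONFIG]" >&2
    return 1
  fi

  if [ -n "$es_config" ]; then
    echo "bin/crawler crawl $crawl_config --es-config $es_config"
  else
    echo "bin/crawler crawl $crawl_config"
  fi
}

build_crawl_cmd config/my-crawler.yml
build_crawl_cmd config/my-crawler.yml config/elasticsearch.yml
```

Printing the command instead of executing it keeps the sketch safe to try in a checkout where `bin/crawler` is not yet set up.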

## Example configurations