Clean up config docs
navarone-feekery committed Aug 29, 2024
1 parent 0c1dbbf commit 1f6fd28
Showing 4 changed files with 31 additions and 9 deletions.
20 changes: 17 additions & 3 deletions README.md
@@ -72,8 +72,6 @@ If using an API key, ensure that the API key has read and write permissions to a
```
</details>



#### Running Open Crawler from Docker

> [!IMPORTANT]
@@ -142,7 +140,23 @@ Open Crawler has a Dockerfile that can be built and run locally.

### Configuring Crawlers

See [CONFIG.md](docs/CONFIG.md) for in-depth details on Open Crawler configuration files.
Crawler has template configuration files that contain every available configuration option.

- [config/crawler.yml.example](config/crawler.yml.example)
- [config/elasticsearch.yml.example](config/elasticsearch.yml.example)

To use these files, make a copy in the same directory without the `.example` suffix:

```bash
$ cp config/crawler.yml.example config/crawler.yml
```

Then uncomment the configurations that you need by removing the leading `#` characters.
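
For example, uncommenting the `log_level` setting from `config/crawler.yml.example` changes the copied file from:

```yaml
## The log level for system logs. Defaults to `info`
#log_level: info
```

to:

```yaml
## The log level for system logs. Defaults to `info`
log_level: info
```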

Crawler can be configured using two config files: a Crawler configuration and an Elasticsearch configuration.
The Elasticsearch configuration file is optional; it exists so that users running multiple crawlers can share a single Elasticsearch configuration.
See [CONFIG.md](docs/CONFIG.md) for more details on these files.
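
For example (using placeholder file names), a crawl can be run with just a Crawler config, or with a shared Elasticsearch config passed via `--es-config`:

```shell
# Crawler configuration only
$ bin/crawler crawl config/my-crawler.yml

# Crawler configuration plus a shared Elasticsearch configuration
$ bin/crawler crawl config/my-crawler.yml --es-config config/elasticsearch.yml
```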

### Scheduling Recurring Crawl Jobs

Expand Down
5 changes: 3 additions & 2 deletions config/crawler.yml.example
@@ -79,13 +79,14 @@
#
## Authentication configurations.
## Only required if a site has some form of authentication.
#auth.domain: https://parksaustralia.gov.au
#auth.domain: https://my-auth-domain.com
#auth.type: basic
#auth.username: user
#auth.password: pass
#
## Whether document metadata from certain content types will be indexed or not.
## This does not allow binary content to be indexed from these files, only metadata.
## See docs/features/BINARY_CONTENT_EXTRACTION.md for more details.
#binary_content_extraction_enabled: true
#binary_content_extraction_mime_types:
# - application/pdf
@@ -96,7 +97,7 @@
#
## ------------------------------- Logging -------------------------------------
#
# The log level for system logs. Defaults to `info`
## The log level for system logs. Defaults to `info`
#log_level: info
#
# Whether or not event logging is enabled for output to the shell running Crawler.
5 changes: 5 additions & 0 deletions config/examples/parks-australia.yml
@@ -1,4 +1,9 @@
# This is a sample config file for crawling the parksaustralia.gov.au website, writing output to an ES index
#
# The configuration options in this example are not exhaustive. To see all possible configuration options,
# reference the config templates:
# - config/crawler.yml.example
# - config/elasticsearch.yml.example

# Domains allowed for the crawl
domains:
10 changes: 6 additions & 4 deletions docs/CONFIG.md
@@ -3,10 +3,12 @@
Configuration files live in the [config](../config) directory.
There are two kinds of configuration files:

1. Crawler configurations (provided as a positional argument)
2. Elasticsearch configurations (provided as an optional argument with `--es-config`)
- Crawler configuration - [config/crawler.yml.example](../config/crawler.yml.example)
- Elasticsearch configuration - [config/elasticsearch.yml.example](../config/elasticsearch.yml.example)

The Elasticsearch configuration file is optional; it exists so that users running multiple crawlers can share a single Elasticsearch configuration.

These two configuration file arguments allow crawl jobs to share an Elasticsearch instance configuration.
There is no enforced pathing or naming for these files.
They are differentiated only by how they are provided to the CLI when running a crawl.

@@ -49,7 +51,7 @@ When performing a crawl with only a crawl config:
$ bin/crawler crawl config/my-crawler.yml
```

When performing a crawl with only both a crawl config and an Elasticsearch config:
When performing a crawl with both a crawl config and an Elasticsearch config:

```shell
$ bin/crawler crawl config/my-crawler.yml --es-config config/elasticsearch.yml
