[0.2] Add CHANGELOG.md and upgrade to beta (#121) (#125)
# Backport

This will backport the following commits from `main` to `0.2`:
- [Add CHANGELOG.md and upgrade to beta (#121)](#121)

<!--- Backport version: 9.4.3 -->

### Questions?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport).
navarone-feekery authored Sep 3, 2024
1 parent 3c0132d commit c72e5d5
Showing 2 changed files with 22 additions and 3 deletions.
6 changes: 3 additions & 3 deletions README.md
```diff
@@ -4,11 +4,11 @@ This repository contains code for the Elastic Open Web Crawler.
 Open Crawler enables users to easily ingest web content into Elasticsearch.
 
 > [!IMPORTANT]
-> _The Open Crawler is currently in **tech-preview**_.
-Tech-preview features are subject to change and are not covered by the support SLA of generally available (GA) features.
+> _The Open Crawler is currently in **beta**_.
+Beta features are subject to change and are not covered by the support SLA of generally available (GA) features.
 Elastic plans to promote this feature to GA in a future release.
 
-_Open Crawler `v0.1` is confirmed to be compatible with Elasticsearch `v8.13.0` and above._
+_Open Crawler `v0.2` is confirmed to be compatible with Elasticsearch `v8.13.0` and above._
 
 ### User workflow
```
19 changes: 19 additions & 0 deletions docs/CHANGELOG.md
# Open Crawler Changelog

## Legend

- πŸš€ Feature
- πŸ› Bugfix
- πŸ”¨ Refactor

## `v0.2.0`

- πŸš€ Crawl jobs can now be scheduled using the CLI command `bin/crawler schedule`. See [CLI.md](./CLI.md#crawler-schedule).
- πŸš€ Crawler can now extract binary content from files. See [BINARY_CONTENT_EXTRACTION.md](./features/BINARY_CONTENT_EXTRACTION.md).
- πŸš€ Crawler will now purge outdated documents from the index at the end of the crawl. This is enabled by default. You can disable this by adding `purge_docs_enabled: false` to the crawler's yaml config file.
- πŸš€ Crawl rules can now be configured, allowing specified URLs to be allowed/denied. See [CRAWL_RULES.md](./features/CRAWL_RULES.md).
- πŸš€ Extraction rules using CSS, XPath, and URL selectors can now be applied to crawls. See [EXTRACTION_RULES.md](./features/EXTRACTION_RULES.md).
- πŸ”¨ The configuration field `content_extraction_enabled` is now `binary_content_extraction_enabled`.
- πŸ”¨ The configuration field `content_extraction_mime_types` is now `binary_content_extraction_mime_types`.
- πŸ”¨ The Elasticsearch document field `body_content` is now `body`.
- πŸ”¨ The format for config files has changed, so existing crawler configurations will not work. The new format can be referenced in the [crawler.yml.example](../config/crawler.yml.example) file; an illustrative sketch follows this list.
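
To make the configuration changes above concrete, here is a minimal sketch of a v0.2-style YAML config. Only `purge_docs_enabled`, `binary_content_extraction_enabled`, and `binary_content_extraction_mime_types` are taken from this changelog; every other key name (`domains`, `url`, `seed_urls`, `crawl_rules`, `schedule`) is an assumption for illustration, not the authoritative format — check [crawler.yml.example](../config/crawler.yml.example) for that.

```yaml
# Illustrative sketch only. Keys not named in the changelog above
# (domains, url, seed_urls, crawl_rules, schedule) are assumptions;
# the authoritative reference is config/crawler.yml.example.
domains:
  - url: https://www.example.com        # assumed: the site to crawl
    seed_urls:
      - https://www.example.com/blog    # assumed: where the crawl starts
    crawl_rules:                        # new in v0.2 (rule shape assumed; see CRAWL_RULES.md)
      - policy: deny
        type: begins
        pattern: /private

# New in v0.2: purge outdated documents after a crawl (enabled by default)
purge_docs_enabled: true

# Renamed in v0.2 from content_extraction_enabled / content_extraction_mime_types
binary_content_extraction_enabled: true
binary_content_extraction_mime_types:
  - application/pdf

# Assumed shape for the new scheduling feature (see CLI.md#crawler-schedule)
schedule:
  pattern: "0 6 * * *"                  # cron-style: every day at 06:00
```

A one-off crawl would then run with `bin/crawler crawl config/my-crawler.yml` (the `crawl` subcommand is an assumption here; only `bin/crawler schedule` is named in this changelog), and scheduled crawls with `bin/crawler schedule config/my-crawler.yml`. Note also that anything downstream reading the Elasticsearch `body_content` field must switch to `body`.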
