From c72e5d5f4484e72733adaa6cc85d50617fdf0e08 Mon Sep 17 00:00:00 2001
From: Navarone Feekery <13634519+navarone-feekery@users.noreply.github.com>
Date: Tue, 3 Sep 2024 16:27:19 +0200
Subject: [PATCH] [0.2] Add CHANGELOG.md and upgrade to beta (#121) (#125)

# Backport

This will backport the following commits from `main` to `0.2`:
- [Add CHANGELOG.md and upgrade to beta
(#121)](https://github.com/elastic/crawler/pull/121)
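
For anyone updating a `v0.1` configuration, the two field renames listed in the changelog below can be sketched as a simple key mapping. This is a minimal illustration, not part of the crawler: `rename_config_fields` is a hypothetical helper, and it assumes the renamed keys sit at the top level of an already-parsed config mapping. Because the overall config file format also changed in `v0.2`, renaming keys alone is unlikely to be a complete migration.

```python
# Field renames taken from the v0.2.0 changelog entries (old name -> new name).
RENAMED_CONFIG_FIELDS = {
    "content_extraction_enabled": "binary_content_extraction_enabled",
    "content_extraction_mime_types": "binary_content_extraction_mime_types",
}


def rename_config_fields(config: dict) -> dict:
    """Hypothetical helper: return a copy of `config` with renamed keys.

    Assumes the keys appear at the top level of the mapping; keys that
    are not in RENAMED_CONFIG_FIELDS are copied through unchanged.
    """
    return {
        RENAMED_CONFIG_FIELDS.get(key, key): value
        for key, value in config.items()
    }


# Example input using the old v0.1 field names (values are illustrative).
old_config = {
    "content_extraction_enabled": True,
    "content_extraction_mime_types": ["application/pdf"],
    "purge_docs_enabled": False,
}
new_config = rename_config_fields(old_config)
```

After the call, `new_config` carries the `binary_content_extraction_*` keys, and `purge_docs_enabled` passes through untouched. The result should still be checked against `config/crawler.yml.example`.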

<!--- Backport version: 9.4.3 -->

### Questions?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)
---
 README.md         |  6 +++---
 docs/CHANGELOG.md | 19 +++++++++++++++++++
 2 files changed, 22 insertions(+), 3 deletions(-)
 create mode 100644 docs/CHANGELOG.md

diff --git a/README.md b/README.md
index eb2294f..db5760e 100644
--- a/README.md
+++ b/README.md
@@ -4,11 +4,11 @@ This repository contains code for the Elastic Open Web Crawler.
 Open Crawler enables users to easily ingest web content into Elasticsearch.
 
 > [!IMPORTANT]
-> _The Open Crawler is currently in **tech-preview**_.
-Tech-preview features are subject to change and are not covered by the support SLA of generally available (GA) features.
+> _The Open Crawler is currently in **beta**_.
+Beta features are subject to change and are not covered by the support SLA of generally available (GA) features.
 Elastic plans to promote this feature to GA in a future release.
 
-_Open Crawler `v0.1` is confirmed to be compatible with Elasticsearch `v8.13.0` and above._
+_Open Crawler `v0.2` is confirmed to be compatible with Elasticsearch `v8.13.0` and above._
 
 ### User workflow
 
diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md
new file mode 100644
index 0000000..9b3e085
--- /dev/null
+++ b/docs/CHANGELOG.md
@@ -0,0 +1,19 @@
+# Open Crawler Changelog
+
+## Legend
+
+- 🚀 Feature
+- 🐛 Bugfix
+- 🔨 Refactor
+
+## `v0.2.0`
+
+- 🚀 Crawl jobs can now be scheduled using the CLI command `bin/crawler schedule`. See [CLI.md](./CLI.md#crawler-schedule).
+- 🚀 Crawler can now extract binary content from files. See [BINARY_CONTENT_EXTRACTION.md](./features/BINARY_CONTENT_EXTRACTION.md).
+- 🚀 Crawler now purges outdated documents from the index at the end of the crawl. This is enabled by default. To disable it, add `purge_docs_enabled: false` to the crawler's YAML config file.
+- 🚀 Crawl rules can now be configured to allow or deny specified URLs. See [CRAWL_RULES.md](./features/CRAWL_RULES.md).
+- 🚀 Extraction rules using CSS, XPath, and URL selectors can now be applied to crawls. See [EXTRACTION_RULES.md](./features/EXTRACTION_RULES.md).
+- 🔨 The configuration field `content_extraction_enabled` is now `binary_content_extraction_enabled`.
+- 🔨 The configuration field `content_extraction_mime_types` is now `binary_content_extraction_mime_types`.
+- 🔨 The Elasticsearch document field `body_content` is now `body`.
+- 🔨 The config file format has changed, so existing crawler configurations will not work. See [crawler.yml.example](../config/crawler.yml.example) for the new format.