-
Notifications
You must be signed in to change notification settings - Fork 16
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Browse files
Browse the repository at this point in the history
# Backport This will backport the following commits from `main` to `0.2`: - [Add CHANGELOG.md and upgrade to beta (#121)](#121) <!--- Backport version: 9.4.3 --> ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sqren/backport)
- Loading branch information
1 parent
3c0132d
commit c72e5d5
Showing
2 changed files
with
22 additions
and
3 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
# Open Crawler Changelog | ||
|
||
## Legend | ||
|
||
- π Feature | ||
- π Bugfix | ||
- π¨ Refactor | ||
|
||
## `v0.2.0` | ||
|
||
- π Crawl jobs can now be scheduled using the CLI command `bin/crawler schedule`. See [CLI.md](./CLI.md#crawler-schedule). | ||
- π Crawler can now extract binary content from files. See [BINARY_CONTENT_EXTRACTION.md](./features/BINARY_CONTENT_EXTRACTION.md). | ||
- π Crawler will now purge outdated documents from the index at the end of the crawl. This is enabled by default. You can disable this by adding `purge_docs_enabled: false` to the crawler's yaml config file. | ||
- π Crawl rules can now be configured, allowing specified URLs to be allowed/denied. See [CRAWL_RULES.md](./features/CRAWL_RULES.md). | ||
- π Extraction rules using CSS, XPath, and URL selectors can now be applied to crawls. See [EXTRACTION_RULES.md](./features/EXTRACTION_RULES.md). | ||
- π¨ The configuration field `content_extraction_enabled` is now `binary_content_extraction_enabled`. | ||
- π¨ The configuration field `content_extraction_mime_types` is now `binary_content_extraction_mime_types`. | ||
- π¨ The Elasticsearch document field `body_content` is now `body`. | ||
- π¨ The format for config files has changed, so existing crawler configurations will not work. The new format can be referenced in the [crawler.yml.example](../config/crawler.yml.example) file. |