Add CHANGELOG.md and upgrade to beta (#121)
- Add `CHANGELOG.md` with changes taken from `release_note` PRs
  - It's not automated yet, but it will be one day
- Change tech-preview to beta
Commit 99e9678 (1 parent: 5a56cd4)
2 changed files with 22 additions and 3 deletions.
CHANGELOG.md

```diff
@@ -0,0 +1,19 @@
+# Open Crawler Changelog
+
+## Legend
+
+- 🚀 Feature
+- 🐛 Bugfix
+- 🔨 Refactor
+
+## `v0.2.0`
+
+- 🚀 Crawl jobs can now be scheduled using the CLI command `bin/crawler schedule`. See [CLI.md](./CLI.md#crawler-schedule).
+- 🚀 Crawler can now extract binary content from files. See [BINARY_CONTENT_EXTRACTION.md](./features/BINARY_CONTENT_EXTRACTION.md).
+- 🚀 Crawler will now purge outdated documents from the index at the end of the crawl. This is enabled by default. You can disable this by adding `purge_docs_enabled: false` to the crawler's yaml config file.
+- 🚀 Crawl rules can now be configured, allowing specified URLs to be allowed/denied. See [CRAWL_RULES.md](./features/CRAWL_RULES.md).
+- 🚀 Extraction rules using CSS, XPath, and URL selectors can now be applied to crawls. See [EXTRACTION_RULES.md](./features/EXTRACTION_RULES.md).
+- 🔨 The configuration field `content_extraction_enabled` is now `binary_content_extraction_enabled`.
+- 🔨 The configuration field `content_extraction_mime_types` is now `binary_content_extraction_mime_types`.
+- 🔨 The Elasticsearch document field `body_content` is now `body`.
+- 🔨 The format for config files has changed, so existing crawler configurations will not work. The new format can be referenced in the [crawler.yml.example](../config/crawler.yml.example) file.
```
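For orientation, here is a minimal sketch of how the configuration settings named in the `v0.2.0` entries might look in a crawler YAML config file. This fragment is not part of the commit. Placing the keys at the top level of the file and the specific MIME types listed are assumptions, so treat `config/crawler.yml.example` as the authoritative reference for the new format.

```yaml
# Sketch only: assumes these keys sit at the top level of the crawler's
# YAML config file. Check config/crawler.yml.example for the real layout.

# Renamed from content_extraction_enabled in v0.2.0
binary_content_extraction_enabled: true

# Renamed from content_extraction_mime_types in v0.2.0.
# Assumption: the value is a list of MIME types; these two are illustrative.
binary_content_extraction_mime_types:
  - application/pdf
  - application/msword

# Purging outdated documents at the end of a crawl is enabled by default;
# setting this to false turns it off.
purge_docs_enabled: false
```

Because purging is on by default, the `purge_docs_enabled` line is only needed when outdated documents should be kept in the index.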