From cb5e9894b8e2bbb2a2e949aeaa436ea6caf3b33b Mon Sep 17 00:00:00 2001
From: Navarone Feekery <13634519+navarone-feekery@users.noreply.github.com>
Date: Fri, 21 Feb 2025 15:15:32 +0100
Subject: [PATCH] Update FEATURE_COMPARISON.md (#215)

Full HTML extraction is available in latest
---
 docs/FEATURE_COMPARISON.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/FEATURE_COMPARISON.md b/docs/FEATURE_COMPARISON.md
index b0cd4ee..24267be 100644
--- a/docs/FEATURE_COMPARISON.md
+++ b/docs/FEATURE_COMPARISON.md
@@ -19,7 +19,7 @@ The following table compares the features of Open Crawler to those of Elastic Cr
 | Crawler directives — `robots.txt`, sitemaps, robots meta tags, canonical URLs, nofollow links | [Yes](./features/CRAWLER_DIRECTIVES.md) | [Yes](https://www.elastic.co/guide/en/enterprise-search/current/crawler-content.html) |
 | Scheduling | [Yes](../README.md#scheduling-recurring-crawl-jobs) | [Yes](https://www.elastic.co/guide/en/enterprise-search/current/crawler-managing.html#crawler-managing-schedule) |
 | Extraction using data attributes and meta tags | No, _planned for `v0.3`_ | [Yes](https://www.elastic.co/guide/en/enterprise-search/current/crawler-content.html#crawler-content-meta-tags-content-extraction) |
-| Full HTML extraction | No, _planned for `v0.3`_ | [Yes](https://www.elastic.co/guide/en/enterprise-search/current/crawler-managing.html#crawler-managing-html-storagedocuments) |
+| Full HTML extraction | Yes | [Yes](https://www.elastic.co/guide/en/enterprise-search/current/crawler-managing.html#crawler-managing-html-storagedocuments) |
 | Event logging in Elasticsearch | No, _planned for `v0.3`_ | [Yes](https://www.elastic.co/guide/en/enterprise-search/current/crawler-view-events-logs.html) |
 | Duplicate content handling | No | [Yes](https://www.elastic.co/guide/en/enterprise-search/current/crawler-managing.html#crawler-managing-duplicate-documents) |
 | Crawl result history and metadata | No | Yes |