Commit

Merge branch 'main' into navarone/validate-cli-bugfix
navarone-feekery authored May 29, 2024
2 parents 1b01392 + 8a3b7a0 commit 0d593e4
Showing 30 changed files with 196 additions and 1 deletion.
1 change: 1 addition & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -23,6 +23,7 @@ items that may help during the review.-->
- [ ] Added a label for each target release version (example: `v0.1.0`)
- [ ] Considered corresponding documentation changes
- [ ] Contributed any configuration settings changes to the configuration reference
- [ ] Ran `make notice` if any dependencies have been added

#### Changes Requiring Extra Attention

35 changes: 35 additions & 0 deletions README.md
@@ -107,3 +107,38 @@ And from Docker.
```bash
$ docker exec -it crawler bin/crawler crawl config/my-crawler.yml
```

### Connecting to Elasticsearch

If you set the `output_sink` value to `elasticsearch`, Crawler will attempt to bulk index crawl results into Elasticsearch.
To facilitate this connection, Crawler needs to have either an API key or a username/password configured to access the Elasticsearch instance.
If using an API key, ensure that the API key has read and write permissions to access the index configured in `output_index`.

- See the [Elasticsearch documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-create-api-key.html) for details on managing API keys
- See the [elasticsearch.yml.example](config/elasticsearch.yml.example) file for all of the available Elasticsearch configurations for Crawler
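
The either/or credential requirement above can be illustrated with a minimal Ruby sketch. This is not Crawler's actual validation logic: the helper name `elasticsearch_auth_configured?` is hypothetical, and the `elasticsearch.username` and `elasticsearch.password` keys are assumed by analogy with the `elasticsearch.api_key` key shown in the example config.

```ruby
require 'yaml'

# Hypothetical check (not part of Crawler): returns true when the config
# either does not use the elasticsearch sink, or carries an API key or a
# complete username/password pair.
def elasticsearch_auth_configured?(config)
  return true unless config['output_sink'] == 'elasticsearch'

  has_api_key = !config['elasticsearch.api_key'].to_s.empty?
  # Assumed key names, mirroring `elasticsearch.api_key`.
  has_basic_auth = !config['elasticsearch.username'].to_s.empty? &&
                   !config['elasticsearch.password'].to_s.empty?

  has_api_key || has_basic_auth
end

# Made-up config values for illustration only.
config = YAML.safe_load(<<~CONFIG)
  output_sink: elasticsearch
  output_index: my-crawler-index-name
  elasticsearch.api_key: example-encoded-key
CONFIG

puts elasticsearch_auth_configured?(config)
```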

Here is an example of creating an API key with minimal permissions for Crawler.
The request returns a JSON object with an `encoded` key.
The value of `encoded` is what Crawler uses as the API key in its configuration.

```bash
POST /_security/api_key
{
  "name": "my-api-key",
  "role_descriptors": {
    "my-crawler-role": {
      "cluster": ["all"],
      "indices": [
        {
          "names": ["my-crawler-index-name"],
          "privileges": ["all"]
        }
      ]
    }
  },
  "metadata": {
    "application": "my-crawler"
  }
}
```
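
To show how the `encoded` value from that response might be consumed, here is a small Ruby sketch. It assumes the standard Elasticsearch behavior that `encoded` is the Base64 encoding of `id:api_key` and that clients send it in an `Authorization: ApiKey ...` header. The helper name `api_key_auth_header` and all sample values are hypothetical; real ids and keys come from your own cluster.

```ruby
require 'base64'
require 'json'

# Hypothetical helper: extracts `encoded` from the create-API-key response
# body and builds the Authorization header value for Elasticsearch requests.
def api_key_auth_header(response_json)
  encoded = JSON.parse(response_json).fetch('encoded')
  "ApiKey #{encoded}"
end

# Made-up sample response; the id and api_key are placeholders.
sample_id = 'example-id'
sample_key = 'example-secret'
sample_response = JSON.generate(
  'id' => sample_id,
  'name' => 'my-api-key',
  'api_key' => sample_key,
  'encoded' => Base64.strict_encode64("#{sample_id}:#{sample_key}")
)

puts api_key_auth_header(sample_response)
```

The same `encoded` string is what would go into the `elasticsearch.api_key` setting.
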
4 changes: 3 additions & 1 deletion config/elasticsearch.yml.example
@@ -17,8 +17,10 @@
 #elasticsearch.port: 9200
 #
 #
-## The API key for Elasticsearch connection.
+## The encoded API key for Elasticsearch connection.
 ## Using `api_key` is recommended instead of `username`/`password`.
+## Ensure this API key has read and write access to the configured
+## `output_index` in the Crawler config
 #elasticsearch.api_key: 1234
 #
 #
6 changes: 6 additions & 0 deletions vendor/faux/lib/faux.rb
@@ -1,3 +1,9 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the MIT License;
# see LICENSE file in the project root for details
#

require 'active_support'
require 'json'
require 'rack/mount'
6 changes: 6 additions & 0 deletions vendor/faux/lib/faux/element/atom_feed.rb
@@ -1,3 +1,9 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the MIT License;
# see LICENSE file in the project root for details
#

module Faux
  module Element
    class AtomFeed < Base
6 changes: 6 additions & 0 deletions vendor/faux/lib/faux/element/base.rb
@@ -1,3 +1,9 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the MIT License;
# see LICENSE file in the project root for details
#

module Faux
  module Element
    class Base
6 changes: 6 additions & 0 deletions vendor/faux/lib/faux/element/fixture.rb
@@ -1,3 +1,9 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the MIT License;
# see LICENSE file in the project root for details
#

module Faux
  module Element
    class Fixture < Base
6 changes: 6 additions & 0 deletions vendor/faux/lib/faux/element/page.rb
@@ -1,3 +1,9 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the MIT License;
# see LICENSE file in the project root for details
#

module Faux
  module Element
    class Page < Base
6 changes: 6 additions & 0 deletions vendor/faux/lib/faux/element/path_with_content_length.rb
@@ -1,3 +1,9 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the MIT License;
# see LICENSE file in the project root for details
#

require 'active_support/core_ext/numeric'

module Faux
6 changes: 6 additions & 0 deletions vendor/faux/lib/faux/element/robots.rb
@@ -1,3 +1,9 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the MIT License;
# see LICENSE file in the project root for details
#

module Faux
  module Element
    class Robots < Base
6 changes: 6 additions & 0 deletions vendor/faux/lib/faux/element/sitemap.rb
@@ -1,3 +1,9 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the MIT License;
# see LICENSE file in the project root for details
#

require 'stringio'
require 'nokogiri'
require 'zlib'
6 changes: 6 additions & 0 deletions vendor/faux/lib/faux/helpers/url.rb
@@ -1,3 +1,9 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the MIT License;
# see LICENSE file in the project root for details
#

module Faux
  module Helpers
    module Url
6 changes: 6 additions & 0 deletions vendor/faux/lib/faux/middleware/reporter.rb
@@ -1,3 +1,9 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the MIT License;
# see LICENSE file in the project root for details
#

module Faux
  module Middleware

5 changes: 5 additions & 0 deletions vendor/faux/lib/faux/version.rb
@@ -1,3 +1,8 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the MIT License;
# see LICENSE file in the project root for details
#
module Faux
  VERSION = '0.1.0'
end
6 changes: 6 additions & 0 deletions vendor/faux/lib/site.rb
@@ -1,3 +1,9 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the MIT License;
# see LICENSE file in the project root for details
#

# frozen_string_literal: true

require 'rack'
5 changes: 5 additions & 0 deletions vendor/faux/sites/fixture_site.rb
@@ -1,3 +1,8 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the MIT License;
# see LICENSE file in the project root for details
#
class FixtureSite < Faux::Base
  fixture '/' do
    path 'spec/fixtures/simple.html'
5 changes: 5 additions & 0 deletions vendor/faux/sites/robots_txt_respect_rules.rb
@@ -1,3 +1,8 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the MIT License;
# see LICENSE file in the project root for details
#
class RobotsTxtRespectRules < Faux::Base
  page '/' do
    body do
5 changes: 5 additions & 0 deletions vendor/faux/sites/simple_site.rb
@@ -1,3 +1,8 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the MIT License;
# see LICENSE file in the project root for details
#
class SimpleSite < Faux::Base
  page '/' do
    head { atom_to '/feed' }
5 changes: 5 additions & 0 deletions vendor/faux/sites/sitemap_pointing_to_sitemaps.rb
@@ -1,3 +1,8 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the MIT License;
# see LICENSE file in the project root for details
#
class SitemapPointingToSitemaps < Faux::Base
  robots do
    user_agent '*'
6 changes: 6 additions & 0 deletions vendor/faux/spec/faux/element/atom_feed_spec.rb
@@ -1,3 +1,9 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the MIT License;
# see LICENSE file in the project root for details
#

require 'spec_helper'

describe Faux::Element::AtomFeed do
6 changes: 6 additions & 0 deletions vendor/faux/spec/faux/element/base_spec.rb
@@ -1,3 +1,9 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the MIT License;
# see LICENSE file in the project root for details
#

require 'spec_helper'

describe Faux::Element::Base do
6 changes: 6 additions & 0 deletions vendor/faux/spec/faux/element/fixture_spec.rb
@@ -1,3 +1,9 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the MIT License;
# see LICENSE file in the project root for details
#

require 'spec_helper'

describe Faux::Element::Fixture do
6 changes: 6 additions & 0 deletions vendor/faux/spec/faux/element/page_spec.rb
@@ -1,3 +1,9 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the MIT License;
# see LICENSE file in the project root for details
#

require 'spec_helper'

describe Faux::Element::Page do
Expand Down
Original file line number Diff line number Diff line change
@@ -1,3 +1,9 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the MIT License;
# see LICENSE file in the project root for details
#

require 'spec_helper'

describe Faux::Element::PathWithContentLength do
6 changes: 6 additions & 0 deletions vendor/faux/spec/faux/element/robots_spec.rb
@@ -1,3 +1,9 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the MIT License;
# see LICENSE file in the project root for details
#

require 'spec_helper'

describe Faux::Element::Robots do
6 changes: 6 additions & 0 deletions vendor/faux/spec/faux/element/sitemap_spec.rb
@@ -1,3 +1,9 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the MIT License;
# see LICENSE file in the project root for details
#

require 'spec_helper'

describe Faux::Element::Sitemap do
6 changes: 6 additions & 0 deletions vendor/faux/spec/faux/middleware/reporter_spec.rb
@@ -1,3 +1,9 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the MIT License;
# see LICENSE file in the project root for details
#

require 'spec_helper'

describe Faux::Middleware::Reporter do
6 changes: 6 additions & 0 deletions vendor/faux/spec/faux/site_spec.rb
@@ -1,3 +1,9 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the MIT License;
# see LICENSE file in the project root for details
#

require 'spec_helper'

describe Faux::Site do
6 changes: 6 additions & 0 deletions vendor/faux/spec/faux_spec.rb
@@ -1,3 +1,9 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the MIT License;
# see LICENSE file in the project root for details
#

require 'spec_helper'

describe Faux::Base do
6 changes: 6 additions & 0 deletions vendor/faux/spec/spec_helper.rb
@@ -1,3 +1,9 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the MIT License;
# see LICENSE file in the project root for details
#

require 'bundler/setup'
require 'rspec'
require 'rack/test'