Specifically report where Cloudflare block our requests #859

Merged 2 commits on Jul 18, 2024

brucebolt
Member

Some linked websites use Cloudflare Browser Integrity Check to stop automated robots from accessing their site.

This works by responding with a 403 status code and a payload that contains some JavaScript. The JavaScript carries out some checks before reloading the page with a 200 status code.

Our crawler is unable to interact with the JavaScript, so Cloudflare block our access to the site.

To stop users thinking these links are actually broken, this change reports a specific warning (rather than an error) when we are blocked by Cloudflare.
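
As a rough illustration of the detection step, here is a minimal sketch in plain Ruby. It is not the code from this PR. The method name `cloudflare_block?` is hypothetical, and the sketch assumes that a Cloudflare challenge can be recognised by a 403 status combined with a `cf-mitigated: challenge` response header; the real check may use a different signal.

```ruby
# Hypothetical helper (not taken from this PR): decides whether a response
# looks like a Cloudflare challenge rather than a genuinely broken link.
# Assumption: challenge responses carry a `cf-mitigated: challenge` header.
def cloudflare_block?(status, headers)
  # Only a 403 is treated as a possible block.
  return false unless status == 403

  # HTTP header names are case-insensitive, so normalise before the lookup.
  normalised = headers.to_h { |name, value| [name.to_s.downcase, value.to_s.downcase] }

  normalised["cf-mitigated"] == "challenge"
end

puts cloudflare_block?(403, { "CF-Mitigated" => "challenge" }) # prints "true"
puts cloudflare_block?(403, { "Server" => "nginx" })           # prints "false"
```

Requiring both the status and the header keeps an ordinary 403 from the origin site reported as a real problem.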

Trello card

@brucebolt brucebolt force-pushed the improve-cloudflare-reports branch from 49edb18 to ce0e271 Compare July 17, 2024 11:17
In the `has errors` shared test example, we were passing in a string to
check against the errors but not actually checking it.

This meant you could add any string and the test would always pass,
provided an error existed.

This commit therefore updates the shared example to accept a parameter,
check that the matching error is returned, and update the messages to
reflect what the class actually returns to users.
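
The bug described in this commit message can be reproduced outside the test framework. The sketch below is plain Ruby rather than the real shared example: `Report`, `any_error?`, `reports_error?` and the message strings are all hypothetical stand-ins. In the test suite itself the equivalent fix would presumably be a block parameter on the shared example that the expectation then compares against.

```ruby
# Hypothetical stand-in for whatever object the checker returns.
Report = Struct.new(:errors)

# Before: the expected message is accepted but never compared, so any
# string passes as long as at least one error exists.
def any_error?(report, _expected_message)
  report.errors.any?
end

# After: the expected message must actually be among the reported errors.
def reports_error?(report, expected_message)
  report.errors.include?(expected_message)
end

report = Report.new(["Website unavailable"]) # made-up message for illustration

puts any_error?(report, "some unrelated text")     # prints "true" (the bug)
puts reports_error?(report, "some unrelated text") # prints "false"
puts reports_error?(report, "Website unavailable") # prints "true"
```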
@brucebolt brucebolt force-pushed the improve-cloudflare-reports branch from ce0e271 to ef5892b Compare July 17, 2024 11:18
@brucebolt brucebolt marked this pull request as ready for review July 17, 2024 12:07
Some linked websites use [Cloudflare Browser Integrity
Check](https://developers.cloudflare.com/waf/tools/browser-integrity-check/)
to stop automated robots from accessing their site.

This works by responding with a 403 status code and a payload that
contains some JavaScript. The JavaScript carries out some checks before
reloading the page with a 200 status code.

Our crawler is unable to interact with the JavaScript, so Cloudflare
block our access to the site.

In order to stop users thinking these links are actually broken, this
change reports a different warning (not an error) when we are blocked by
Cloudflare.
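
To show why a warning rather than an error matters for users, here is a minimal sketch of how the two might feed into a link's overall status. Everything in it is an assumption made for illustration: `CheckResult`, `classify`, the `:ok`/`:caution`/`:broken` statuses and the message text are hypothetical and not taken from this change.

```ruby
# Hypothetical result type: problems are split into errors and warnings.
CheckResult = Struct.new(:errors, :warnings) do
  # Assumed mapping: any error means broken, warnings alone mean caution.
  def status
    return :broken unless errors.empty?
    return :caution unless warnings.empty?

    :ok
  end
end

# Hypothetical classifier: a Cloudflare block is recorded as a warning,
# while any other 4xx/5xx response is recorded as an error.
def classify(status_code, blocked_by_cloudflare:)
  result = CheckResult.new([], [])

  if blocked_by_cloudflare
    result.warnings << "Blocked by Cloudflare, so this link could not be checked"
  elsif status_code >= 400
    result.errors << "Received HTTP status #{status_code}"
  end

  result
end

puts classify(403, blocked_by_cloudflare: true).status  # prints "caution"
puts classify(403, blocked_by_cloudflare: false).status # prints "broken"
puts classify(200, blocked_by_cloudflare: false).status # prints "ok"
```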
@brucebolt brucebolt force-pushed the improve-cloudflare-reports branch from ef5892b to 93073c1 Compare July 17, 2024 12:58
Contributor

@ChrisBAshton ChrisBAshton left a comment

LGTM ⭐

@brucebolt brucebolt merged commit cae2622 into main Jul 18, 2024
9 checks passed
@brucebolt brucebolt deleted the improve-cloudflare-reports branch July 18, 2024 09:17

2 participants