What is the current behavior?
When you crawl a page that returns a 403 (Forbidden) error, the crawler hangs indefinitely. It ignores all timeouts and doesn't throw any errors.
If the current behavior is a bug, please provide the steps to reproduce
If you run the current crawler from a remote server on DigitalOcean against sites that block bots, the returned 403 error does not trigger the error promise. This can be reproduced with any Best Buy URL, for example.
What is the expected behavior?
Sites that return 403 (Forbidden) errors should trigger the onError function, and the crawler should then move on to the next URL to be crawled.
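To make the expected behavior concrete, here is a minimal sketch of the control flow I have in mind. It is not the crawler's actual code: `Fetcher`, `CrawlHandlers`, `withTimeout`, `assertOk`, and `crawlAll` are hypothetical names invented for illustration, and the page fetch is injected as a plain function so the example does not depend on any particular browser or HTTP library. The idea is that a non-2xx status or a request that never settles both end up in `onError`, and the loop continues with the next URL.

```typescript
// Hypothetical types for illustration; none of these names come from the crawler itself.
type FetchResult = { status: number; body: string };
type Fetcher = (url: string) => Promise<FetchResult>;

interface CrawlHandlers {
  onSuccess: (url: string, result: FetchResult) => void;
  onError: (url: string, error: Error) => void;
}

// Reject if the wrapped promise has not settled within `ms` milliseconds,
// so a request that hangs forever cannot stall the whole crawl.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Treat any status outside 200-299 (for example 403) as a failure.
function assertOk(url: string, result: FetchResult): FetchResult {
  if (result.status < 200 || result.status >= 300) {
    throw new Error(`HTTP ${result.status} for ${url}`);
  }
  return result;
}

// Process URLs one at a time; every failure is reported and the loop moves on.
async function crawlAll(
  urls: string[],
  fetcher: Fetcher,
  handlers: CrawlHandlers,
  timeoutMs: number = 30000,
): Promise<void> {
  for (const url of urls) {
    try {
      const result = assertOk(url, await withTimeout(fetcher(url), timeoutMs));
      handlers.onSuccess(url, result);
    } catch (error) {
      handlers.onError(url, error instanceof Error ? error : new Error(String(error)));
    }
  }
}
```

With something like this in place, a blocked URL would surface as an `onError` call with a message such as `HTTP 403 for <url>`, and a request that never responds would surface as a timeout error instead of stalling the queue.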
What is the motivation / use case for changing the behavior?
If a site implements this type of blocking, it halts the entire crawl process without any notification that the URL failed.