Best Approach to Assigning User Agent and Proxy Based on Data List in Crawlee? #2801
Unanswered
ferdysopian asked this question in Q&A
I'm working on a scraping task with a list of workers, each of which has its own cookies, user agent, and proxy. I want each worker to scrape up to 5 URLs.
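The worker-to-URL assignment can be sketched as follows. This is a minimal sketch that assumes a hypothetical `WorkerConfig` shape; the field names and the `assignUrlsToWorkers` helper are illustrative and not part of Crawlee. Each request carries only a `workerId` in `userData`. The resulting plain objects use the `url`, `uniqueKey`, and `userData` fields that Crawlee request options accept, so they could be passed to `crawler.addRequests()`.

```typescript
// Hypothetical per-worker settings; field names are illustrative.
interface WorkerCookie {
  name: string;
  value: string;
  domain: string;
  path: string;
}

interface WorkerConfig {
  id: string;
  userAgent: string;
  proxyUrl: string;
  cookies: WorkerCookie[];
}

// Plain request description: a URL plus the worker it belongs to.
interface WorkerRequest {
  url: string;
  uniqueKey: string;
  userData: { workerId: string };
}

const URLS_PER_WORKER = 5;

// Hand out URLs to workers in consecutive batches of `perWorker`.
// URLs beyond workers.length * perWorker are returned as leftovers.
function assignUrlsToWorkers(
  urls: string[],
  workers: WorkerConfig[],
  perWorker = URLS_PER_WORKER,
): { requests: WorkerRequest[]; leftover: string[] } {
  const capacity = workers.length * perWorker;
  const requests: WorkerRequest[] = urls.slice(0, capacity).map((url, i) => {
    const worker = workers[Math.floor(i / perWorker)];
    return {
      url,
      // Include the worker id so the same URL in two workers' batches
      // is not deduplicated by the request queue.
      uniqueKey: `${worker.id}:${url}`,
      userData: { workerId: worker.id },
    };
  });
  return { requests, leftover: urls.slice(capacity) };
}
```

Storing only the id in `userData`, rather than the full proxy URL and cookies, keeps proxy credentials out of the persisted request queue. The full settings can then be looked up from a map built once at startup.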
Current Problem:
I can currently set the user agent and proxy globally, but I need to assign each worker's own user agent and proxy dynamically. Rather than creating a separate RequestQueue, ProxyConfiguration, Configuration, and PlaywrightCrawler for every worker, I would like to define them once and read the worker-specific settings (such as user agent and proxy) from request.userData on each request. How can I achieve this?
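The "define once, read from request.userData" idea can be sketched as plain lookup functions that map a request to its worker's settings. The registry and helper names (`buildWorkerRegistry`, `requireWorker`, `proxyUrlFor`) are hypothetical. The commented wiring at the end is an assumption: it presumes a Crawlee 3.x release in which `ProxyConfiguration`'s `newUrlFunction` receives the request in its second argument, and that `PlaywrightCrawler` honors a per-request proxy (which may require `browserPoolOptions.useIncognitoPages`). Both should be checked against the documentation for the installed version.

```typescript
// Hypothetical per-worker settings; field names are illustrative.
interface WorkerCookie {
  name: string;
  value: string;
  domain: string;
  path: string;
}

interface WorkerConfig {
  id: string;
  userAgent: string;
  proxyUrl: string;
  cookies: WorkerCookie[];
}

// Structural stand-in for Crawlee's Request: only userData is read here.
interface RequestLike {
  userData: Record<string, unknown>;
}

// Build the lookup table once, at startup.
function buildWorkerRegistry(workers: WorkerConfig[]): Map<string, WorkerConfig> {
  return new Map(workers.map((w): [string, WorkerConfig] => [w.id, w]));
}

// Resolve the worker assigned to a request, failing loudly on a missing or
// unknown workerId instead of silently falling back to a global default.
function requireWorker(registry: Map<string, WorkerConfig>, request?: RequestLike): WorkerConfig {
  const id = request?.userData.workerId;
  const worker = typeof id === "string" ? registry.get(id) : undefined;
  if (!worker) {
    throw new Error(`No worker configured for workerId=${String(id)}`);
  }
  return worker;
}

function proxyUrlFor(registry: Map<string, WorkerConfig>, request?: RequestLike): string {
  return requireWorker(registry, request).proxyUrl;
}

// Assumed wiring (not executed here; verify against your Crawlee version):
// const registry = buildWorkerRegistry(workers);
// const crawler = new PlaywrightCrawler({
//   proxyConfiguration: new ProxyConfiguration({
//     newUrlFunction: (_sessionId, options) => proxyUrlFor(registry, options?.request),
//   }),
//   browserPoolOptions: { useIncognitoPages: true },
//   preNavigationHooks: [
//     async ({ page, request }) => {
//       const worker = requireWorker(registry, request);
//       // Overrides the User-Agent header only; navigator.userAgent is unchanged.
//       await page.setExtraHTTPHeaders({ "user-agent": worker.userAgent });
//       await page.context().addCookies(worker.cookies);
//     },
//   ],
//   requestHandler: async ({ page, request }) => { /* scrape */ },
// });
```

Failing on an unknown `workerId` is a design choice: with credentials tied to workers, a request silently sent through the wrong proxy is usually harder to diagnose than an error.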
My Current Approach:
I'm looping through the list of workers, assigning 5 URLs per worker. Here’s my current example code:
Question:
Is there a better or more efficient way to assign user agents and proxies for each worker dynamically, instead of setting them globally or using the current worker task-loop approach?
Maybe something like this:
Is there a better approach to implement dynamic handling of proxies and user agents based on workers or requests? I'm open to suggestions!