feat: Camoufox-based crawler template #2842

Open: barjin wants to merge 8 commits into master

@barjin barjin commented Feb 12, 2025

Adds a Camoufox-based crawler template (camoufox-ts).

Compared to the basic playwright-ts template, camoufox-ts uses the camoufox-js package, which finds the latest matching Camoufox binary among the GitHub Releases assets, downloads it, and supplies the correct launch options for it.

The main.ts script is modified to run the downloaded binary with the correct launchOptions.
Related to #2836

@barjin barjin self-assigned this Feb 12, 2025
@github-actions github-actions bot added this to the 108th sprint - Tooling team milestone Feb 12, 2025
@github-actions github-actions bot added the t-tooling Issues with this label are in the ownership of the tooling team. label Feb 12, 2025

barjin commented Feb 12, 2025

Todo:

  • automate running npm run download-camoufox (maybe as a postinstall script?)
  • pass custom fingerprint-modifying options to Camoufox
  • maybe store binaries in a system- (or user-)wide location (~/.crawlee/binaries?)
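
A minimal sketch of the last item, assuming the binaries would live under a `.crawlee/binaries` folder in the user's home directory. The helper name `resolveBinaryDir` and its parameters are hypothetical; nothing here is part of camoufox-js or Crawlee.

```typescript
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical helper (not part of camoufox-js or Crawlee): builds the path of a
// user-wide directory in which downloaded browser binaries could be stored.
// `baseDir` defaults to the current user's home directory.
function resolveBinaryDir(baseDir: string = os.homedir(), scope: string = '.crawlee'): string {
    return path.join(baseDir, scope, 'binaries');
}

// With the default base directory this resolves to the ~/.crawlee/binaries
// location suggested in the todo item.
const binaryDir = resolveBinaryDir();
console.log(binaryDir);
```

Keeping the base directory as a parameter would let a system-wide location be swapped in later without touching the callers.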

@barjin barjin added the adhoc Ad-hoc unplanned task added during the sprint. label Feb 12, 2025

barjin commented Feb 14, 2025

Example code:

import { launchOptions } from 'camoufox-js';
import { PlaywrightCrawler } from 'crawlee';
import { firefox } from 'playwright';

const startUrls = ['https://crawlee.dev'];

const crawler = new PlaywrightCrawler({
    requestHandler: async ({ page, enqueueLinks }) => {
        await page.click('h2');
        await page.click('h3');

        await enqueueLinks();
    },
    maxConcurrency: 1,
    launchContext: {
        launcher: firefox,
        launchOptions: await launchOptions({
            headless: false,
            block_images: true,
            fonts: ['Times New Roman'],
            custom_fonts_only: true,
            humanize: true,
        }),
    },
});

await crawler.run(startUrls);

Execution:

Peek.2025-02-14.16-35.mp4

With these options, the browser loads no images, uses only one system-installed font (aside from fonts loaded directly by the page), and uses the humanizing script to move the cursor.
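
The mapping between the options in the example and the observed behaviour can be written down as an annotated object. This is only a sketch: the option names are the ones used in the example, but the `ExampleCamoufoxOptions` interface is a hypothetical shape for illustration, and the typings shipped with camoufox-js may differ.

```typescript
// Hypothetical shape for illustration only; not the camoufox-js typings.
interface ExampleCamoufoxOptions {
    headless: boolean;
    block_images: boolean;
    fonts: string[];
    custom_fonts_only: boolean;
    humanize: boolean;
}

const exampleOptions: ExampleCamoufoxOptions = {
    // Run with a visible browser window, as in the recording.
    headless: false,
    // The browser loads no images.
    block_images: true,
    // Together, these two limit the system-installed fonts to the listed one
    // (fonts loaded by the page itself are still available).
    fonts: ['Times New Roman'],
    custom_fonts_only: true,
    // The humanizing script moves the cursor.
    humanize: true,
};

console.log(exampleOptions);
```

In the example, an object like this is what gets passed to `launchOptions()` from camoufox-js before the result is handed to the crawler's `launchContext`.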

@barjin barjin requested a review from B4nan February 14, 2025 16:14