- url - The URL to start scraping from.
- maxDepth - The maximum depth to crawl down to from the start URL.
- maxPages - The maximum number of pages for the entire scrape job. (A job stops crawling when it reaches maxDepth or maxPages, whichever comes first.)
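For illustration, a job built from these three parameters might look like the sketch below; the field names follow the list above, while the exact payload shape is an assumption rather than the repo's documented API.

```js
// Hypothetical scrape-job payload: names match the parameters above,
// but the object shape is an assumption, not the repo's documented API.
const job = {
  url: 'https://example.com', // the URL to start scraping from
  maxDepth: 2,                // crawl at most 2 links deep from the start URL
  maxPages: 50                // stop after 50 pages, whichever limit is hit first
};
```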
Each scraped page is reported with the following fields:
- title - The document.title of the page.
- depth - The depth at which the page was scraped.
- url - The URL that was scraped.
- links - All hrefs of the anchor tags on the page.
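Put together, a single page record with these fields could look like the following (values are made up for illustration):

```js
// Illustrative page record built from the fields above (values are made up).
const page = {
  title: 'Example Domain',                        // document.title of the page
  depth: 1,                                       // depth at which the page was reached
  url: 'https://example.com/',                    // the URL that was scraped
  links: ['https://www.iana.org/domains/example'] // hrefs of anchor tags on the page
};
```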
To run the project locally:
- Run `git clone https://github.com/PerachBD/WebCrawler.git`
- Run `npm i && npm start`
Built with:
- NodeJS
- React
- Express
- Web Storage
- Socket.IO - Enables real-time, bidirectional, event-based communication.
- Lowdb - Small JSON database for Node, Electron and the browser. Powered by Lodash.
- node-html-parser - A very fast HTML parser that generates a simplified DOM tree, with basic element query support (see the sketch below).
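As a rough illustration of how two of these libraries could fit together, the sketch below parses a fetched page with node-html-parser into the result fields listed earlier and pushes it to clients over Socket.IO. The wiring and the `pageScraped` event name are assumptions, not the repo's actual code.

```js
// Minimal sketch of one way these libraries could fit together; this is
// an assumption about the architecture, not the repo's actual code.
const http = require('http');
const { Server } = require('socket.io');       // Socket.IO v4 style
const { parse } = require('node-html-parser');

const httpServer = http.createServer();
const io = new Server(httpServer);
httpServer.listen(3001);

// Parse raw HTML into the per-page record described earlier.
function extractPage(html, url, depth) {
  const root = parse(html);                     // simplified DOM tree
  const titleEl = root.querySelector('title');
  return {
    title: titleEl ? titleEl.text : '',
    depth,
    url,
    links: root.querySelectorAll('a')           // all anchor tags on the page
      .map(a => a.getAttribute('href'))
      .filter(Boolean)                          // drop anchors without an href
  };
}

// Hypothetical event name: push each scraped page to connected clients in real time.
const page = extractPage('<title>Example</title><a href="/next">next</a>',
                         'https://example.com', 0);
io.emit('pageScraped', page);
```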
Planned improvements:
- Save running time by reusing results across overlapping scrape jobs.
- Calculate the number of workers dynamically, based on the load and the number of scrape jobs to be performed (one possible heuristic is sketched below).
- Add options to delete, pause, and resume a scrape job.
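For the dynamic worker count, one simple heuristic would be to scale with the number of queued jobs, capped by the number of CPU cores. This is purely a sketch of one option, not an approach the repo has committed to; `workerCount` is a hypothetical helper.

```js
const os = require('os');

// Hypothetical heuristic: one worker per queued job, capped by CPU cores.
// This is an assumption about a possible design, not the repo's plan.
function workerCount(queuedJobs) {
  return Math.max(1, Math.min(queuedJobs, os.cpus().length));
}

console.log(workerCount(8)); // e.g. 8 on a machine with at least 8 cores
```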