Scraper-and-AI-powered-Summarizer v0.1
Release v0.1 - [Initial Release]
This release brings together web scraping capabilities with AI-driven summarization to help you quickly understand the content of web pages.
🌟 Highlights / Features included in this Release:
- 🚀 Web Content Scraping: Utilizes Selenium with an undetectable browser configuration to fetch web page content effectively.
- 🧠 AI-Powered Summarization: Generates concise summaries of the scraped text using AI.
  - Defaults to Microsoft's phi-4 model via the Nebius API.
  - Customizable: easily modify `summarizer.py` (the `get_response()` method) to use other models or APIs (e.g. different Hugging Face models, OpenAI's ChatGPT).
- 🌐 Language Detection: Automatically detects the language of the content for accurate summarization.
- 🔗 Robust URL Handling: Automatically encodes special characters in URLs, allowing you to scrape pages with complex addresses.
- 🍪 Cookie Handling: Includes the "I Still Don't Care About Cookies" browser extension (from OhMyGuus/I-Still-Dont-Care-About-Cookies) to help bypass cookie consent pop-ups during scraping.
- 💻 Simple CLI Usage: Easy-to-use command-line interface to specify the URL and optional output file.
- 📄 Installation: Standard Python package installation via `pip install -e .`.
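As a rough sketch of how the "Customizable" point might be put to use: the function and parameter names below are assumptions for illustration, not the actual code in `summarizer.py`, but most chat-completion APIs accept a payload of this shape, so swapping models can be as small as changing one string.

```python
# Hypothetical sketch -- the real get_response() in summarizer.py may differ.
# Shows how a chat-completion request payload could be parameterized by model id.
def build_payload(text: str, model: str = "microsoft/phi-4") -> dict:
    return {
        "model": model,  # swap in another model id here, e.g. an OpenAI model
        "messages": [
            {"role": "system", "content": "Summarize the following text concisely."},
            {"role": "user", "content": text},
        ],
    }

# Same prompt, different backend model:
payload = build_payload("Some scraped article text...", model="gpt-4o-mini")
```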
✨ Live Demo
Try the summarizer live here: https://suenkler-ai.de/summarizer
📝 Description
The web page scraper and AI-powered summarizer is a Python tool designed to:
- Scrape the main textual content from a given URL using Selenium.
- Process the text with a powerful AI language model (configurable, defaults to `phi-4`).
- Provide a concise summary of the original content.
It's useful for quickly grasping the key points of articles, blog posts, or other web pages without reading them in full.
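The three steps above can be sketched as a small pipeline. The function names here are stand-ins chosen for illustration (the tool's actual internals are not shown in this release), with stubs in place of the Selenium fetch and the AI call:

```python
# Hypothetical pipeline sketch; stub functions stand in for the external
# Selenium and AI-API calls the real tool makes.
def scrape(url: str) -> str:
    return f"<text fetched from {url}>"  # stand-in for the Selenium fetch

def detect_language(text: str) -> str:
    return "en"  # stand-in for automatic language detection

def summarize(text: str, lang: str) -> str:
    return f"[{lang}] summary of: {text[:30]}"  # stand-in for the AI model

def summarize_url(url: str) -> str:
    # The tool's three steps: scrape, detect language, summarize.
    text = scrape(url)
    lang = detect_language(text)
    return summarize(text, lang)
```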
🚀 Getting Started & Usage
To use the tool, run the `scraper` command from your terminal after installation:
# Summarize and print to console
scraper --url "https://example.com/path with spaces"
# Summarize and save to a file
scraper --url "https://example.com/search?q=special+query&lang=en" --output summary.txt
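The first example works because of the URL-handling feature noted above: special characters such as spaces are percent-encoded before the request is made. A minimal sketch of that idea, using only the standard library (`encode_url` is a hypothetical helper name, not the tool's actual function):

```python
from urllib.parse import urlsplit, urlunsplit, quote

def encode_url(url: str) -> str:
    # Hypothetical helper: percent-encode the path and query of a URL
    # while leaving the scheme and host untouched.
    parts = urlsplit(url)
    path = quote(parts.path, safe="/%")        # keep slashes and existing escapes
    query = quote(parts.query, safe="=&+%")    # keep key=value and & separators
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))

print(encode_url("https://example.com/path with spaces"))
# https://example.com/path%20with%20spaces
```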