Releases: sebsuenkler/scraper-and-summarizer

Scraper-and-AI-powered-Summarizer v0.1

17 Apr 21:32
f044dea

Release v0.1 - Initial Release

This release brings together web scraping capabilities with AI-driven summarization to help you quickly understand the content of web pages.

🌟 Highlights / Features included in this Release:

  • 🚀 Web Content Scraping: Utilizes Selenium with an undetectable browser configuration to fetch web page content effectively.
  • 🧠 AI-Powered Summarization: Generates concise summaries of the scraped text using AI.
    • Defaults to Microsoft's phi-4 model via the Nebius API.
    • Customizable: Easily modify summarizer.py (the get_response() method) to use other models or APIs (like different Hugging Face models, OpenAI's ChatGPT, etc.).
  • 🌐 Language Detection: Automatically detects the language of the content for accurate summarization.
  • 🔗 Robust URL Handling: Automatically encodes special characters in URLs, allowing you to scrape pages with complex addresses.
  • 🍪 Cookie Handling: Includes the "I Still Don't Care About Cookies" browser extension (from OhMyGuus/I-Still-Dont-Care-About-Cookies) to help bypass cookie consent pop-ups during scraping.
  • 💻 Simple CLI Usage: Easy-to-use command-line interface to specify the URL and optional output file.
  • 📄 Installation: Standard Python package installation via pip install -e . (an editable install, run from the repository root).
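
As an illustration of the URL handling listed above, special characters can be percent-encoded with Python's standard library. This is a minimal sketch, not the project's actual code: the helper name `encode_url` and the choice of which characters to leave unescaped are assumptions.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url(url: str) -> str:
    """Percent-encode unsafe characters in the path and query of a URL.

    Hypothetical helper for illustration; the tool's own implementation
    may differ.
    """
    parts = urlsplit(url)
    # Keep "/" as the path separator and "%" so that sequences which are
    # already percent-encoded are not encoded a second time.
    path = quote(parts.path, safe="/%")
    # Keep the characters that structure a query string.
    query = quote(parts.query, safe="=&+%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


print(encode_url("https://example.com/path with spaces"))
```

Leaving `%` in the safe set is a design choice: it avoids double-encoding URLs that are already partly encoded, at the cost of not escaping a literal percent sign.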

✨ Live Demo

Try the summarizer live here: https://suenkler-ai.de/summarizer

📝 Description

The web page scraper and AI-powered summarizer is a Python tool designed to:

  1. Scrape the main textual content from a given URL using Selenium.
  2. Process the text using an AI language model (configurable, defaults to phi-4).
  3. Provide a concise summary of the original content.

It's useful for quickly grasping the key points of articles, blog posts, or other web pages without reading them in full.
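
The three steps above can be sketched as a small pipeline. This is only an outline under stated assumptions: `summarize_url`, its parameters, the prompt wording, and the `max_chars` limit are hypothetical. The scraping step and the model call are passed in as plain callables, so no particular Selenium setup or API client is implied; `get_response` merely borrows the name of the method mentioned in the feature list.

```python
from typing import Callable


def summarize_url(
    url: str,
    scrape: Callable[[str], str],
    get_response: Callable[[str], str],
    max_chars: int = 8000,
) -> str:
    """Scrape a page, build a prompt, and return the model's summary.

    `scrape` and `get_response` are injected stand-ins (assumptions) for
    the Selenium-based scraper and the model call.
    """
    # Step 1: fetch the main textual content of the page.
    text = scrape(url).strip()
    if not text:
        raise ValueError(f"No text could be extracted from {url}")

    # Step 2: hand the (truncated) text to the language model.
    prompt = (
        "Summarize the following web page content concisely:\n\n"
        + text[:max_chars]
    )

    # Step 3: return the concise summary.
    return get_response(prompt)
```

Passing the two steps in as functions keeps the sketch testable without a browser or an API key, and mirrors the idea that the model behind the summarizer is swappable.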

🚀 Getting Started & Usage

To use the tool, run the scraper command from your terminal after installation:

# Summarize and print to console
scraper --url "https://example.com/path with spaces"

# Summarize and save to a file
scraper --url "https://example.com/search?q=special+query&lang=en" --output summary.txt
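
For readers curious how a command line like the one above might be wired up, here is a minimal `argparse` sketch. The `--url` and `--output` flags come from the usage examples; the function names `build_parser` and `write_summary`, the help texts, and the assumption that `--url` is required are illustrative and may not match the project's code.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the two flags shown in the usage examples."""
    parser = argparse.ArgumentParser(
        prog="scraper",
        description="Scrape a web page and summarize its content.",
    )
    # Assumed to be required, since every usage example passes it.
    parser.add_argument("--url", required=True, help="URL of the page to summarize")
    parser.add_argument(
        "--output",
        default=None,
        help="optional file to write the summary to (default: print to console)",
    )
    return parser


def write_summary(summary: str, output: Optional[str] = None) -> None:
    """Write the summary to a file if a path is given, otherwise print it."""
    if output:
        with open(output, "w", encoding="utf-8") as handle:
            handle.write(summary)
    else:
        print(summary)
```

Quoting the URL in the shell, as the examples do, matters here: without quotes, the space or the `&` would be interpreted by the shell before `argparse` ever sees the argument.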