Chat with Screen is a Python application that lets users capture screenshots and analyze them with AI vision models. It supports both KoboldCPP and Transformers backends for image analysis (the default Transformers model is Molmo-7B-O-0924). Users can then chat with the AI about the analyzed screen content. The application features a translucent overlay interface for easy interaction with the underlying desktop.
Demo video: ChatWithScreen-Demo.mp4
Features:
- Capture full-screen or selected-region screenshots
- Analyze screenshots using a KoboldCPP server or a Transformers model
- Chat-like interface for interaction with AI based on screen content
- Translucent overlay for seamless desktop integration
- Resizable overlay window
- Right-click context menu for quick actions
- Chat history memory option for contextual conversations
- Drag and Drop for images
Requirements:
- Python 3.8+
- PyQt5
- Pillow
- requests
- KoboldCPP server (for GGUF models)
- transformers (for Transformers models)
- torch (for Transformers models)
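As a quick sanity check before running the application, the sketch below reports which of the listed packages cannot be imported. The import names are the usual ones for these packages (Pillow is imported as `PIL`). The `missing_modules` helper and the split into core and Transformers-only groups are illustrative assumptions, not part of the project.

```python
import importlib.util

# Import names for the packages listed above (Pillow is imported as "PIL").
CORE_MODULES = ["PyQt5", "PIL", "requests"]
# Only needed when using the Transformers backend.
TRANSFORMERS_MODULES = ["transformers", "torch"]


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    groups = [("core", CORE_MODULES), ("Transformers backend", TRANSFORMERS_MODULES)]
    for label, names in groups:
        missing = missing_modules(names)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{label}: {status}")
```

If only the KoboldCPP backend is used, missing entries in the second group can be ignored.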
You can install Chat with Screen using either conda or pip. Choose the method that best suits your environment.
Using conda:
- Clone the repository:
    git clone https://github.com/PasiKoodaa/Chat-with-Screen
    cd Chat-with-Screen
- Create a new conda environment:
    conda create -n chat-with-screen python=3.9
- Activate the environment:
    conda activate chat-with-screen
- Install the required packages:
    pip install -r requirements.txt
- Install torch if you use Transformers models, following the instructions at:
    https://pytorch.org/get-started/locally/
Using pip:
- Clone the repository:
    git clone https://github.com/PasiKoodaa/Chat-with-Screen
    cd Chat-with-Screen
- Create a new virtual environment:
    python -m venv venv
- Activate the environment:
  - On Windows:
      venv\Scripts\activate
  - On Linux:
      source venv/bin/activate
- Install the required packages:
    pip install -r requirements.txt
- Install torch if you use Transformers models, following the instructions at:
    https://pytorch.org/get-started/locally/
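After installing torch, it can help to confirm that the package imports and whether a CUDA device is visible. This is a minimal sketch that assumes nothing about how the application itself selects a device; `describe_torch` is a hypothetical helper, not a function from the project.

```python
import importlib.util


def describe_torch():
    """Summarize the local torch installation as a short string."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch  # imported lazily so the check also works without torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    return f"torch {torch.__version__} (device: {device})"


if __name__ == "__main__":
    print(describe_torch())
```

A result reporting `cpu` means that torch works but no CUDA device was detected, so a 7B vision model may run slowly.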
Usage:
- Ensure that your KoboldCPP server is running and accessible at the URL specified in the KOBOLDCPP_URL variable in the script.
- Run the application:
    python main.py
- Use the interface to capture screenshots, analyze them, and chat with the AI about the screen content.
- Toggle the chat history memory on or off using the "Memory" button. When enabled, the AI will consider the last four question-answer pairs for context.
- Right-click on the overlay to access additional options such as selecting a region or resizing the overlay.
- Click the 'X' button in the top-right corner or use the right-click menu to close the application.
You can modify the KOBOLDCPP_URL variable in the script to point to your KoboldCPP server if it's running on a different address or port.
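Before launching the application, the configured address can be checked with a short script. The sketch below assumes that KOBOLDCPP_URL holds only the base address of the server (the value shown is an example, not necessarily the script's default) and that the server exposes the KoboldAI-style `/api/v1/model` endpoint. The helper names are hypothetical.

```python
import http.client
import json
import urllib.request

# Assumed example value; use whatever KOBOLDCPP_URL is set to in the script.
KOBOLDCPP_URL = "http://localhost:5001"


def model_endpoint(base_url):
    """Build the model-info URL from a base server address."""
    return base_url.rstrip("/") + "/api/v1/model"


def server_is_reachable(base_url, timeout=3.0):
    """Return True if the server answers the model-info request with JSON."""
    try:
        with urllib.request.urlopen(model_endpoint(base_url), timeout=timeout) as response:
            json.load(response)
            return True
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers refused connections, timeouts, and HTTP error codes;
        # ValueError covers malformed URLs and replies that are not JSON.
        return False


if __name__ == "__main__":
    state = "reachable" if server_is_reachable(KOBOLDCPP_URL) else "not reachable"
    print(f"{KOBOLDCPP_URL} is {state}")
```

If the variable in the script already includes an API path, adjust `model_endpoint` accordingly.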
The chat history memory feature allows the AI to maintain context across multiple interactions. When enabled:
- The last four question-answer pairs are stored in memory.
- These pairs are included in the context for new questions, allowing for more coherent and contextual conversations.
- You can toggle this feature on or off at any time using the "Memory" button in the interface.
- When disabled, the chat history is cleared, and each question is treated independently.
This feature is particularly useful for in-depth discussions or multi-step analyses of screen content.
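The behavior described above can be sketched with a bounded queue. This is an illustration under stated assumptions, not the application's actual code: the `ChatMemory` class, the `User:`/`Assistant:` prompt layout, and the memory being enabled at startup are all hypothetical choices.

```python
from collections import deque

MAX_PAIRS = 4  # the text above states that the last four pairs are kept


class ChatMemory:
    """Keep the most recent question-answer pairs while memory is enabled."""

    def __init__(self, max_pairs=MAX_PAIRS, enabled=True):
        self.enabled = enabled  # assumed default; the real default may differ
        self._pairs = deque(maxlen=max_pairs)  # oldest pair is dropped first

    def toggle(self):
        """Flip the memory switch; disabling it clears the stored history."""
        self.enabled = not self.enabled
        if not self.enabled:
            self._pairs.clear()
        return self.enabled

    def record(self, question, answer):
        """Store a finished exchange, but only while memory is enabled."""
        if self.enabled:
            self._pairs.append((question, answer))

    def build_prompt(self, question):
        """Prefix the new question with the stored pairs, oldest first."""
        lines = []
        for past_question, past_answer in self._pairs:
            lines.append(f"User: {past_question}")
            lines.append(f"Assistant: {past_answer}")
        lines.append(f"User: {question}")
        lines.append("Assistant:")
        return "\n".join(lines)
```

Using `deque(maxlen=4)` means the fifth recorded pair silently evicts the first, so the context sent with each new question stays bounded.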