DuomeScraper

This project scrapes Italian words (or words from any other language available on Duolingo) from the Duome website (https://duome.eu/vocabulary/en/it) using Playwright, downloads pronunciation audio with Google Text-to-Speech (gTTS), and creates Anki flashcards.
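
As a rough illustration of the URL and audio stages, the sketch below builds a vocabulary URL and saves one word's pronunciation with gTTS. The helper names (`vocabulary_url`, `audio_filename`, `save_audio`) are hypothetical and are not taken from main.py, and the `/vocabulary/<from>/<to>` URL pattern is only inferred from the Italian link above.

```python
import re
from pathlib import Path

BASE_URL = "https://duome.eu/vocabulary"


def vocabulary_url(ui_lang: str = "en", target_lang: str = "it") -> str:
    """Build a Duome vocabulary URL (pattern assumed from the Italian example)."""
    return f"{BASE_URL}/{ui_lang}/{target_lang}"


def audio_filename(word: str, lang: str) -> str:
    """Return a filesystem-safe .mp3 name for a word (hypothetical naming scheme)."""
    # Replace runs of anything that is not a letter, digit, underscore or hyphen.
    safe = re.sub(r"[^\w\-]+", "_", word.strip()).strip("_")
    return f"{lang}_{safe}.mp3"


def save_audio(word: str, lang: str, out_dir: Path) -> Path:
    """Fetch the pronunciation of `word` with gTTS and save it as an MP3 file."""
    from gtts import gTTS  # imported lazily; calling this needs network access

    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / audio_filename(word, lang)
    gTTS(text=word, lang=lang).save(str(target))
    return target
```

For example, `save_audio("buongiorno", "it", Path("audio"))` would write `audio/it_buongiorno.mp3`, assuming gTTS is installed and the network is reachable.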


🔰 How to run
  1. Download main.py & requirements.txt and put them inside a folder
  2. Create a virtual environment:
    python -m venv VEnv
        
  3. Activate virtual environment:
    • 🪟 Windows CMD:
      VEnv\Scripts\activate
              
    • 🐧 Linux:
      source VEnv/bin/activate
              
  4. Install dependencies:
    pip install -r requirements.txt
        
  5. Install the Playwright browsers (⚠️ the code uses the Microsoft Edge browser; you can change that to Chromium if you don't want to download msedge):
    playwright install && playwright install msedge
        
  6. Read through the code, since you may need to adjust some variables, then run main.py and wait for the final .apkg file to be generated
  7. Open the Anki application and import the deck:
    • On Android: open the menu at the top right and select Import ➡️ Deck package (.apkg)
    • On Desktop: File ➡️ Import... ➡️ choose the .apkg file
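
Step 5 mentions switching from Microsoft Edge to Chromium. A minimal sketch of how that switch could look with Playwright's sync API is shown below; `launch_options` and `fetch_page_html` are hypothetical helpers, and main.py may structure this differently. Passing `channel="msedge"` to `chromium.launch()` selects the installed Edge build, while omitting it falls back to the bundled Chromium.

```python
def launch_options(use_edge: bool = True, headless: bool = True) -> dict:
    """Keyword arguments for Playwright's chromium.launch()."""
    options = {"headless": headless}
    if use_edge:
        # Requires `playwright install msedge`; drop it to use bundled Chromium.
        options["channel"] = "msedge"
    return options


def fetch_page_html(url: str, use_edge: bool = True) -> str:
    """Open `url` in the chosen browser and return the rendered HTML."""
    from playwright.sync_api import sync_playwright  # imported lazily

    with sync_playwright() as p:
        browser = p.chromium.launch(**launch_options(use_edge))
        try:
            page = browser.new_page()
            page.goto(url)
            return page.content()
        finally:
            browser.close()
```

With `use_edge=False`, only `playwright install chromium` is needed before running.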

⚠️ Known (possible) issues
  • If the word elements do not all load at once, the page has to be scrolled down to retrieve every word. This has not been implemented yet, because the website currently displays all words at once (all necessary elements are visible after load).
  • Some languages, like German, don't have definitions. Accessing the definition element for such a language may raise an exception.
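
Both issues could be worked around along the following lines. This is only a sketch under stated assumptions: `scroll_until_stable` and `text_or_default` are hypothetical helpers that do not exist in main.py, and they take plain callables so they stay independent of any particular selector.

```python
from typing import Callable


def scroll_until_stable(count_items: Callable[[], int],
                        scroll_once: Callable[[], None],
                        max_rounds: int = 50) -> int:
    """Scroll until the item count stops growing; return the final count."""
    previous = count_items()
    for _ in range(max_rounds):
        scroll_once()
        current = count_items()
        if current <= previous:
            break  # nothing new appeared after the last scroll
        previous = current
    return previous


def text_or_default(read_text: Callable[[], str], default: str = "") -> str:
    """Call read_text(); fall back to `default` if it raises or returns nothing."""
    try:
        value = read_text()
    except Exception:
        # e.g. the definition element is missing for this language
        return default
    return value.strip() if value else default
```

With a Playwright `page`, the callables could be `lambda: page.locator(WORD_SELECTOR).count()` and `lambda: page.mouse.wheel(0, 2000)`, where `WORD_SELECTOR` is a placeholder for whatever selector the scraper uses; a short `page.wait_for_timeout(...)` after each scroll would probably be needed to give new elements time to load.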