CorpusAPI

CorpusAPI is an API written in Python with FastAPI. Its main goal is to return text extracted from various sources so that the text can populate the knowledge base of LLM applications that use retrieval-augmented generation (RAG).

Currently, the API can:

Extract and translate subtitles from YouTube videos.
Transcribe audio files.
Extract text from web pages.

The API is designed to be efficient and modular, so it can be integrated with other systems and scaled as needed.

🚀 Features

Extraction and translation of subtitles from YouTube videos.
Transcription of audio files.
Extraction of text from web pages.
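
To give a feel for what web page text extraction involves, here is a minimal sketch that uses only the Python standard library. It is not the code this project uses: the names `VisibleTextParser` and `extract_text` are hypothetical, and the approach (collect text nodes, skip script and style contents) is an assumption chosen for illustration.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text nodes and skips the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self):
        # One line per text node, in document order.
        return "\n".join(self._chunks)


def extract_text(html: str) -> str:
    """Hypothetical helper: return the visible text of an HTML document."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()
```

Calling `extract_text` on a downloaded page returns plain text that could then be stored for retrieval. A real scraper would likely also need to handle navigation menus, encodings, and pages rendered by JavaScript.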

🛠 Getting Started

Follow the steps below to set up and run the API on your local machine.

📋 Prerequisites

Before starting, make sure you have the following installed:

Python 3.x
Pip (Python package manager)

📥 Installation

Clone the repository

  git clone https://github.com/daniel-trindade/corpusAPI.git
  cd corpusAPI

Create a virtual environment (optional but recommended)

  python -m venv venv

  source venv/bin/activate  # Linux/Mac
  venv\Scripts\activate  # Windows

Install dependencies

  pip install -r requirements.txt

Run the API

  fastapi dev app/main.py

The API will be running at http://127.0.0.1:8000 (or another specified port).

🧪 Docs

While the API is running, you can browse its interactive documentation:

Swagger UI, which FastAPI serves by default at http://127.0.0.1:8000/docs
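
Besides the Swagger UI page, FastAPI also serves a machine-readable OpenAPI schema at `/openapi.json` by default. The sketch below lists the routes found in that schema, which is a quick way to discover the endpoints without opening a browser. The helper names `list_routes` and `fetch_schema` are hypothetical, and the base URL assumes the default host and port shown above.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_routes(schema: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs from an OpenAPI schema dict."""
    routes = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Path items may also hold non-method keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                routes.append((method.upper(), path))
    return sorted(routes)


def fetch_schema(base_url: str = "http://127.0.0.1:8000") -> dict:
    """Download the schema FastAPI serves at /openapi.json by default."""
    with urlopen(f"{base_url}/openapi.json") as response:
        return json.load(response)
```

With the API running, `list_routes(fetch_schema())` returns the method and path pairs the server currently exposes. If the app was configured with a custom `openapi_url`, adjust the path accordingly.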

🤝 Contributing

If you wish to contribute to the project, feel free to open issues or pull requests!

📜 License

This project is licensed under the MIT License.
