ragman

Retrieval-augmented generation for manual pages.

Todo

Embeddings are currently computed over fixed 512-token chunks. Ideally, CLS-token embeddings should be generated for individual statements, each with the maximum available context.
Actually load everything into a Milvus collection.
Add search functionality.
Permit using alternative models.
Add a PyTorch (or equivalent) interface for alternative models, for either preprocessing or the embedding itself.
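
For reference, the fixed 512-token chunking mentioned in the first item can be sketched roughly as follows. This is a minimal illustration, not the code in textprocessor.py: the function name `chunk_tokens` and the integer token ids are assumptions, and no particular tokenizer is implied.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence[int], size: int = 512) -> List[List[int]]:
    """Split a token sequence into consecutive chunks of at most `size` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    # Step through the sequence in strides of `size`; the last chunk may be shorter.
    return [list(tokens[i:i + size]) for i in range(0, len(tokens), size)]


# Hypothetical token ids standing in for a tokenized manual page.
page_tokens = list(range(1200))
chunks = chunk_tokens(page_tokens)
print([len(c) for c in chunks])  # [512, 512, 176]
```

Chunking this way ignores statement boundaries, which is the limitation the first item describes: a statement can be cut in half, and each chunk sees no context outside its own 512 tokens.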
