Amharic N-Gram Language Model for Auto-Complete

This project implements an N-Gram language model for Amharic, built with only Python and NumPy, to provide auto-completion. Given an input word sequence, the model uses N-Gram probabilities estimated from a training corpus to predict and suggest the most probable next words.
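The sketch below makes the prediction step concrete. It shows how an N-Gram model can count word sequences and rank next-word candidates by maximum-likelihood probability. The function names `build_ngram_counts` and `suggest`, and the `<s>`/`</s>` sentence markers, are illustrative assumptions rather than this repository's API. The input is assumed to be already tokenized.

```python
from collections import Counter, defaultdict


def build_ngram_counts(sentences, n):
    """Count how often each word follows each (n-1)-word context.

    `sentences` is a list of already-tokenized sentences (lists of strings).
    Each sentence is padded with n-1 "<s>" markers and one "</s>" marker so
    that sentence-initial words also have a full-length context.
    """
    counts = defaultdict(Counter)
    for tokens in sentences:
        padded = ["<s>"] * (n - 1) + list(tokens) + ["</s>"]
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])  # the n-1 preceding words
            counts[context][padded[i]] += 1
    return counts


def suggest(counts, prefix, n, k=3):
    """Return up to k (word, probability) pairs for the word after `prefix`.

    Probabilities are maximum-likelihood estimates,
    P(w | context) = C(context, w) / C(context), using only the last n-1
    words of the prefix. An unseen context yields an empty list.
    """
    padded = ["<s>"] * (n - 1) + list(prefix)
    context = tuple(padded[len(padded) - (n - 1):])  # () when n == 1
    followers = counts.get(context)
    if not followers:
        return []
    total = sum(followers.values())
    return [(word, c / total) for word, c in followers.most_common(k)]


if __name__ == "__main__":
    # Toy corpus for illustration only; not the project's training data.
    corpus = [["አበበ", "በሶ", "በላ"], ["ከበደ", "ውሃ", "ጠጣ"]]
    trigram_counts = build_ngram_counts(corpus, 3)
    print(suggest(trigram_counts, ["አበበ", "በሶ"], 3))  # [('በላ', 1.0)]
```

With pure maximum-likelihood estimates, any context never seen in training returns no suggestions. Smoothing, listed under Features, addresses this zero-probability problem.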

Features

N-Gram Based Predictions: Supports unigram, bigram, trigram, and higher-order n-gram models to generate context-aware suggestions.
Amharic Language Support: Handles the structure and rich morphology of Amharic text.
Tokenization: Includes an Amharic-specific tokenizer to handle words and punctuation correctly.
Smoothing Techniques: Implements smoothing methods (e.g., Laplace smoothing) so that word sequences unseen during training do not receive zero probability.
Scalable Design: Can be trained on large datasets for improved accuracy.
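Laplace (add-one) smoothing adds one to every count. As a result, an n-gram never seen in training still gets a small non-zero probability: P(w | h) = (C(h, w) + 1) / (C(h) + |V|), where |V| is the vocabulary size.

Below is a minimal sketch of that formula. It assumes the counts are stored as a dict mapping each context tuple to a `collections.Counter` of following words. This layout and the name `laplace_prob` are illustrative assumptions, not necessarily what `src/` uses. The `alpha` parameter generalizes the formula to add-alpha smoothing.

```python
from collections import Counter


def laplace_prob(counts, vocab_size, context, word, alpha=1.0):
    """Add-alpha estimate of P(word | context); alpha=1.0 is Laplace smoothing.

        P(word | context) = (C(context, word) + alpha) / (C(context) + alpha * |V|)

    `counts` maps a context tuple to a Counter of the words that followed it,
    and `vocab_size` is |V|. Unseen contexts and unseen words receive a small
    non-zero probability instead of 0.
    """
    followers = counts.get(tuple(context), Counter())
    context_total = sum(followers.values())
    return (followers[word] + alpha) / (context_total + alpha * vocab_size)


if __name__ == "__main__":
    # Toy bigram counts for illustration: context ("a",) was followed by
    # "b" twice and "c" once; the vocabulary is {"b", "c", "d", "e"}.
    counts = {("a",): Counter({"b": 2, "c": 1})}
    vocab = ["b", "c", "d", "e"]
    for w in vocab:
        # Unseen "d" and "e" each get 1/7 instead of 0.
        print(w, laplace_prob(counts, len(vocab), ["a"], w))
```

The smoothed probabilities for a context sum to 1 only when `vocab_size` counts every word that can follow it. In practice this usually means including the end-of-sentence marker and an unknown-word token in the vocabulary.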

Installation

Clone this repository:
```
git clone https://github.com/yordanoswuletaw/amharic-ngram-autocomplete.git
```

Navigate to the project directory:
```
cd amharic-ngram-autocomplete
```
Install the required dependencies:
```
pip install -r requirements.txt
```

Notebooks

Amharic Auto Complete

Repository Structure

```
├── .vscode/
│   └── settings.json            # VS Code settings for environment setup
├── .github/
│   └── workflows/
│       └── unittests.yml        # CI/CD pipeline for unit tests
├── .gitignore                   # Ignored files and folders
├── requirements.txt             # Dependencies for the project
├── README.md                    # Documentation of the repository
├── data/                        # Dataset for training, dev and testing
├── src/                         # Source code for analysis and processing
├── notebooks/
│   ├── __init__.py              # Package initialization
│   └── README.md                # Documentation for the notebooks
├── tests/
│   └── __init__.py              # Test initialization
└── scripts/
    ├── __init__.py              # Scripts package initialization
    └── README.md                # Documentation for scripts
```

Requirements

Python 3.8+
Required Python libraries (see requirements.txt)

Examples

Input: አበበ በሶ
Suggestions: በላ (completing the sentence አበበ በሶ በላ, "Abebe ate besso")
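The example input is a raw string, so it must be tokenized before the model can look up its context. Below is a minimal sketch of an Amharic-aware tokenizer, assuming simple regex rules. It is not the tokenizer shipped in this repository, and the names `tokenize` and `TOKEN_RE` are hypothetical. It treats the Ethiopic wordspace ፡ (U+1361) as whitespace. It splits Ethiopic punctuation (U+1362 to U+1368, e.g. ። and ፣) and common ASCII punctuation into separate tokens.

```python
import re

# The Ethiopic wordspace (U+1361) separates words, so treat it as a space.
ETHIOPIC_WORDSPACE = "\u1361"

# Ethiopic full stop, comma, semicolon, colon, preface colon, question mark and
# paragraph separator (U+1362 to U+1368), plus some common ASCII punctuation.
PUNCTUATION = "\u1362-\u1368!?.,;:\"'()"

# A token is either a single punctuation mark or a run of characters that are
# neither whitespace nor punctuation (an Amharic word or a number).
TOKEN_RE = re.compile(rf"[{PUNCTUATION}]|[^\s{PUNCTUATION}]+")


def tokenize(text):
    """Split raw Amharic text into word and punctuation tokens."""
    return TOKEN_RE.findall(text.replace(ETHIOPIC_WORDSPACE, " "))
```

For instance, `tokenize("አበበ በሶ")` returns `['አበበ', 'በሶ']`, and the last n-1 of these tokens form the context used to rank completions. Whitespace-and-punctuation splitting does not separate Amharic's attached affixes. A morphological segmenter would be needed for that.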

Future Work

Integrate neural language models for Amharic language tokenization.
Expand support for additional Amharic linguistic features and dialects.

Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue to suggest improvements or report bugs.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

Special thanks to:

The Amharic NLP community for providing open-source datasets.
