Ava is a custom-designed Transformer-based Causal Language Model (CLM) developed with full control over architecture, configuration, and training. It is intended for experimentation, educational use, and lightweight deployments, especially in resource-constrained environments.
The goal of Ava is to enable:
- Full-stack development of a language model from scratch
- Deep understanding of transformer internals
- Flexible fine-tuning on local or personal datasets
- Easy customization and modular expansion
Ava is suitable for researchers, developers, and hobbyists interested in building LLMs without the constraints of large-scale frameworks.
- Transformer Decoder Stack: Implements multi-head attention, feedforward networks, and residual connections.
- Rotary Positional Embeddings: Enhances context handling without fixed positional encodings (see the sketch after this list).
- Flexible Configurations: Supports model sizes from 100M to 100B parameters via `AvaConfig`.
- LoRA (Low-Rank Adaptation): Enables parameter-efficient fine-tuning.
- Quantization Support: Reduces model size for low-memory inference.
- Custom Dataset Handling: Optimized for conversational and pretraining datasets.
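To make the rotary positional embedding idea concrete, here is a minimal, self-contained PyTorch sketch of the general technique. It is an illustration only, not Ava's actual implementation; the function name and tensor shapes are assumptions.

```python
import torch

def rotary_embed(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary positional embeddings to a (seq_len, head_dim) tensor."""
    seq_len, head_dim = x.shape
    # One rotation frequency per pair of channels.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)  # (seq_len, head_dim // 2)
    cos, sin = angles.cos(), angles.sin()
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    # Rotate each channel pair by a position-dependent angle.
    out = torch.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

q = torch.randn(16, 64)       # 16 positions, head dimension 64
print(rotary_embed(q).shape)  # torch.Size([16, 64])
```

In a full attention layer the same rotation is applied to both the query and key projections before their dot product, so relative positions are encoded without a fixed positional table.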
Ava includes a complete training workflow:
- Data Preparation: Conversational data is loaded from JSON and tokenized.
- Model Configuration: Users choose a predefined size (`100m`, `1b`, `7b`, etc.) or customize one.
- Training Loop: Modular trainer with validation, checkpointing, and optional evaluation.
- Evaluation & Generation: Supports text generation with temperature, top-k, and top-p sampling (illustrated below).
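For reference, temperature, top-k, and top-p (nucleus) sampling typically combine as in the generic PyTorch sketch below. This mirrors the common technique rather than Ava's exact generation code; the function name and default values are assumptions.

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 1.0,
                      top_k: int = 50, top_p: float = 0.9) -> int:
    """Pick the next token id from a 1-D vector of vocabulary logits."""
    logits = logits / max(temperature, 1e-5)               # temperature scaling
    if top_k > 0:                                          # top-k: keep the k best tokens
        kth_best = torch.topk(logits, top_k).values[-1]
        logits = logits.masked_fill(logits < kth_best, float("-inf"))
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Top-p (nucleus): keep the smallest prefix whose total probability reaches p.
    keep = cumulative - sorted_probs <= top_p
    keep[0] = True                                         # always keep the best token
    sorted_probs = sorted_probs * keep
    sorted_probs = sorted_probs / sorted_probs.sum()
    choice = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx[choice].item()

print(sample_next_token(torch.randn(32_000)))              # random toy logits
```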
- Tokenizer: Plug in your own tokenizer (e.g., BPE for custom languages).
- LoRA Fine-Tuning: Target specific layers to update with LoRA (see the sketch after this list).
- Model Quantization: Use 8-bit weights for faster inference on CPUs.
- Streaming Support: Integrate with streamers for interactive generation.
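LoRA works by freezing the original weights and training a small low-rank correction on top of them. The following self-contained PyTorch sketch shows the general idea; the class name, rank, and scaling factor are illustrative choices, not Ava's API.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update (generic sketch)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                   # freeze the original weights
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)            # start as a no-op update
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)                                      # only the low-rank factors train
```

Because only the two low-rank factors receive gradients, the number of trainable parameters stays small compared with the frozen base layer.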
- Local AI assistants
- Chatbots for under-resourced languages
- Educational demos and research
- Small-scale AGI experiments
- Edge and offline deployments
To train Ava, users need:
- PyTorch environment with GPU (or CPU for smaller models)
- JSON-formatted dataset (one possible layout is sketched below)
- Pretrained tokenizer (or train your own)
- Training script using the provided trainer module
The model outputs checkpoints and can be resumed or evaluated at any point.
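Since the exact JSON schema is not specified here, the sketch below assumes a simple list of prompt/response pairs purely for illustration. The dataset class, field names, and the tokenizer's `encode` interface are all assumptions to adapt to your own data and tokenizer.

```python
import json
import torch
from torch.utils.data import Dataset

class ConversationDataset(Dataset):
    """Generic sketch of loading conversational JSON for causal LM training.

    Assumed file layout (adapt to your real data):
        [{"prompt": "Hi!", "response": "Hello, how can I help?"}, ...]
    """
    def __init__(self, path: str, tokenizer, max_length: int = 512):
        with open(path, encoding="utf-8") as f:
            self.examples = json.load(f)
        self.tokenizer = tokenizer    # any tokenizer exposing .encode(text) -> list[int]
        self.max_length = max_length

    def __len__(self) -> int:
        return len(self.examples)

    def __getitem__(self, idx: int) -> torch.Tensor:
        ex = self.examples[idx]
        text = f"{ex['prompt']}\n{ex['response']}"
        ids = self.tokenizer.encode(text)[: self.max_length]
        return torch.tensor(ids, dtype=torch.long)
```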
Ava is intended for educational and personal use. Contributions are welcome to enhance architecture, training stability, and downstream applications.
- Vaswani et al. (2017). Attention is All You Need.
- Hu et al. (2021). LoRA: Low-Rank Adaptation of Large Language Models.
- Press et al. (2021). Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation.
If you find Ava-LLM valuable and want to support its growth, you can donate USDT directly!
USDT (TRC20) Wallet Address:
TGNZUGsTb5PLCPVyN9Tc22QrFa1M69ZDJ5
Every donation, no matter how small, helps keep Ava independent and evolving.
Thank you for believing in open, independent AI development! 🚀
Created and maintained by Nika Kudukhashvili. Ava represents an ongoing project in building fully independent and explainable language models.
For questions or contributions, visit the project on GitHub.