Stars
HunyuanVideo-I2V: A Customizable Image-to-Video Model based on HunyuanVideo
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
HunyuanVideo: A Systematic Framework For Large Video Generation Model
A high-throughput and memory-efficient inference and serving engine for LLMs
Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline mode. Faster than ZeRO/ZeRO++/FSDP.
A prototype repo for hybrid training of pipeline parallel and distributed data parallel with comments on core code snippets. Feel free to copy code and launch discussions about the problems you hav…
jllllll / flash-attention
Forked from Dao-AILab/flash-attention
Fast and memory-efficient exact attention - Windows wheels
An OAI compatible exllamav2 API that's both lightweight and fast
Convert your videos to densepose and use it on MagicAnimate
Finetune Llama 3.3, DeepSeek-R1, Gemma 3 & Reasoning LLMs 2x faster with 70% less memory! 🦥
ChrisHayduk / qlora-multi-gpu
Forked from artidoro/qlora
QLoRA with Enhanced Multi GPU Support
Windows compile of bitsandbytes for use in text-generation-webui.
Windows Virtual Desktop, AutoHotkey, Windows 11 support, Windows Server 2022, switch desktop, move window(wintitle) to current desktop; createDesktop, PinWindow, getCount, getDesktopNumOfWindow -> …
Tools for merging pretrained large language models.
Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)
A fast inference library for running LLMs locally on modern consumer-class GPUs
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
Landmark Attention: Random-Access Infinite Context Length for Transformers QLoRA
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
serp-ai / bark-with-voice-clone
Forked from suno-ai/bark
🔊 Text-prompted Generative Audio Model - With the ability to clone voices
8-bit CUDA functions for PyTorch