A General-purpose Task-parallel Programming System using Modern C++
Updated Feb 13, 2025 - C++
Sample codes for my CUDA programming book
CUDA Core Compute Libraries
Thin, unified, C++-flavored wrappers for the CUDA APIs
TinyChatEngine: On-Device LLM Inference Library
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
Safe Rust wrapper around the CUDA toolkit
LLM notes, including model inference, transformer model structure, and LLM framework code analysis notes.
A simple GPU hash table implemented in CUDA using lock free techniques
A self-learning tutorial for CUDA High Performance Programming.
This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010
From zero to hero CUDA for accelerating maths and machine learning on GPU.
μ-Cuda: cover the last mile of CUDA. Features: IntelliSense-friendly, structured launch, and automatic CUDA graph generation and updating.
An implementation of HIP that works on CPUs, across OSes.
CUDA kernel author's tools
Accelerated General (FP32) Matrix Multiplication from scratch in CUDA
Speed up image preprocessing with CUDA when handling images or running TensorRT inference
Install CUDA on Windows 11 using WSL2