Advanced Guide to GPUs and Parallel Computing
| Feature | CPU | GPU | TPU | FPGA |
|---|---|---|---|---|
| Purpose | General | Graphics/Parallel | AI/ML | Configurable |
| Clock Speed | ⚡ High | 🔸 Medium | 🔸 Medium | 🔸 Medium |
| Cores | 🔸 Few | ⚡ Many | ⚡ Many | 🔸 Variable |
| Cache | ⚡ High | 🔸 Low | 🔸 Medium | 🔸 Low |
| Latency | ⚡ Low | 🔸 High | 🔸 Medium | ⚡ Very Low |
| Throughput | 🔸 Low | ⚡ High | ⚡ High | ⚡ Very High |
| Power Usage | 🔸 Medium | 🔸 High | 🔸 Medium | ⚡ Low |
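
These trade-offs can be inspected directly on a CUDA-capable machine. Below is a minimal sketch (assuming the CUDA toolkit and at least one NVIDIA GPU are available; not part of the original page) that queries each device's multiprocessor count, clock rate, and memory sizes with `cudaGetDeviceProperties`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);

    for (int d = 0; d < deviceCount; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);

        // Report the properties behind the table above:
        // many cores and high throughput, but modest clock and cache per core.
        printf("Device %d: %s\n", d, prop.name);
        printf("  Multiprocessors (SMs): %d\n", prop.multiProcessorCount);
        printf("  Clock rate:            %.2f GHz\n", prop.clockRate / 1e6);
        printf("  L2 cache:              %d KB\n", prop.l2CacheSize / 1024);
        printf("  Shared memory/block:   %zu KB\n", prop.sharedMemPerBlock / 1024);
        printf("  Global memory:         %.1f GB\n", prop.totalGlobalMem / 1e9);
    }
    return 0;
}
```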
From Gaming to AI Revolution
```mermaid
graph LR
    A[1990s] --> B[GeForce]
    B --> C[CUDA]
    C --> D[Tesla]
    D --> E[Modern GPUs]
```
```mermaid
graph TD
    A[Parallel Processing] --> B[Matrix Operations]
    B --> C[High Throughput]
    C --> D[Faster Training]
    A --> E[Multiple Cores]
    E --> F[Concurrent Execution]
```
```mermaid
sequenceDiagram
    participant CPU
    participant GPU
    CPU->>CPU: Allocate Memory
    CPU->>GPU: Copy Data
    GPU->>GPU: Execute Kernel
    GPU->>CPU: Return Results
```
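
The sequence above maps directly onto the CUDA runtime API. Here is a minimal sketch of that host/device round trip: allocate device memory, copy input data in, launch a kernel, and copy the result back. The `scale` kernel and the array size are illustrative assumptions, not part of the original page.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: multiply each element by 2 on the device.
__global__ void scale(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Host-side buffer (the "CPU: Allocate Memory" step).
    float* h_data = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    // Device-side buffer, then copy host -> device ("Copy Data").
    float* d_data;
    cudaMalloc(&d_data, bytes);
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);

    // Launch the kernel ("Execute Kernel").
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(d_data, n);

    // Copy device -> host ("Return Results") and clean up.
    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);
    printf("h_data[0] = %f\n", h_data[0]);  // expected: 2.0

    cudaFree(d_data);
    free(h_data);
    return 0;
}
```

Compile with `nvcc`; error checking on the CUDA calls is omitted here for brevity.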
- **Kernel**: GPU-specific functions
- **Thread/Block/Grid**: Execution hierarchy
- **GEMM**: General Matrix Multiplication operations (a minimal kernel is sketched below)
- **Host/Device**: CPU/GPU terminology
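
The sketch below ties two of these terms together: a naive GEMM kernel in which each thread computes one output element, with threads arranged in a 2D grid of 2D blocks. The square row-major matrices and the 16x16 block shape are illustrative assumptions, not part of the original page.

```cuda
#include <cuda_runtime.h>

// Naive GEMM: C = A * B for square n x n matrices in row-major order.
// Each thread computes one element of C, illustrating the
// Thread/Block/Grid hierarchy from the glossary above.
__global__ void gemm_naive(const float* A, const float* B, float* C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float acc = 0.0f;
        for (int k = 0; k < n; ++k) {
            acc += A[row * n + k] * B[k * n + col];
        }
        C[row * n + col] = acc;
    }
}

// Host-side launch (device pointers dA, dB, dC assumed already allocated
// and filled, e.g. via cudaMalloc/cudaMemcpy as in the earlier sketch).
void launch_gemm(const float* dA, const float* dB, float* dC, int n) {
    dim3 block(16, 16);                       // 256 threads per block
    dim3 grid((n + block.x - 1) / block.x,    // enough blocks to cover C
              (n + block.y - 1) / block.y);
    gemm_naive<<<grid, block>>>(dA, dB, dC, n);
}
```

Production code would normally call a tuned library routine (e.g. cuBLAS) rather than this naive kernel, but the indexing pattern is the same.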
```mermaid
graph TD
    A[Global Memory]
    A1[Shared Memory]
    A2[L2 Cache]
    A1a[Registers]
    A1b[L1 Cache]
    B[Host Memory]
    A --> A1
    A --> A2
    A1 --> A1a
    A1 --> A1b
    B --> A
```
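
Of these levels, shared memory and registers are the ones a kernel author controls directly. The sketch below (an illustrative block-level sum reduction, not part of the original page) stages data from global memory into shared memory so that the repeated reads inside the loop hit fast on-chip storage instead of global memory.

```cuda
#include <cuda_runtime.h>

// Block-level sum reduction: launched with 256-thread blocks, each block
// reduces 256 elements of `in` and writes one partial sum to `out[blockIdx.x]`.
__global__ void block_sum(const float* in, float* out, int n) {
    __shared__ float tile[256];          // fast on-chip shared memory

    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    // Stage one element per thread from global memory into shared memory.
    tile[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction entirely in shared memory and registers.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) tile[tid] += tile[tid + stride];
        __syncthreads();
    }

    // One global write per block instead of 256.
    if (tid == 0) out[blockIdx.x] = tile[0];
}
```

The host then sums the per-block partial results, or runs the kernel again on them until a single value remains.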