CUDA cheat sheet

Function type qualifiers

__device__ : executed on the device. Callable from the device only.

__global__ : declares a kernel, executed on the device. Callable from the host, or from the device (dynamic parallelism) on devices of compute capability 3.x or higher. Must have void return type.

__host__ : executed on the host. Callable from the host only (equivalent to declaring the function without any qualifiers).
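The three qualifiers can be seen side by side in a small sketch. The function names (square, twice, apply, apply_on_device) are hypothetical, and the combined __host__ __device__ form is an addition not listed above: it asks the compiler to build the function for both the host and the device.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// __device__: runs on the GPU, callable only from device code.
__device__ float square(float x) { return x * x; }

// __host__ __device__: compiled for both sides, callable from either.
__host__ __device__ float twice(float x) { return 2.0f * x; }

// __global__: a kernel. Runs on the GPU, is launched from the host, returns void.
__global__ void apply(float *out, float in) {
    *out = twice(square(in));
}

// Plain host function (implicitly __host__): launches the kernel, fetches the result.
float apply_on_device(float in) {
    float *d_out = nullptr;
    float h_out = 0.0f;
    cudaMalloc(&d_out, sizeof(float));
    apply<<<1, 1>>>(d_out, in);  // one block containing one thread
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}

int main() {
    printf("device: %f, host: %f\n", apply_on_device(3.0f), twice(9.0f));
    return 0;
}
```

Calling square() from main would be rejected by the compiler, because a __device__ function is not callable from host code.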

Built-in kernel variables

gridDim.[x,y,z] : 3-component vector (type dim3) containing the dimensions of the grid, in blocks. It is constant for the whole kernel and is set at kernel launch time. Any dimension not set explicitly defaults to 1.

blockIdx.[x,y,z] : 3-component vector (type uint3) containing the index of the current block within the grid. Its value depends on which block reads it.

blockDim.[x,y,z] : 3-component vector (type dim3) containing the dimensions of the thread block, in threads. It is set at kernel launch time. Any dimension not set explicitly defaults to 1.

threadIdx.[x,y,z] : 3-component vector (type uint3) containing the index of the current thread within its thread block. Its value depends on which thread reads it.
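A common use of these four variables is to give each thread a unique global index. The sketch below assumes a 1D launch and uses a grid-stride loop so that any array length is covered. The names fill_indices and count_mismatches are hypothetical.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread writes the global index of every element it owns into out[].
__global__ void fill_indices(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global index of this thread
    int stride = gridDim.x * blockDim.x;            // total number of threads in the grid
    for (; i < n; i += stride) {                    // grid-stride loop: works for any n
        out[i] = i;
    }
}

// Hypothetical helper: runs the kernel and counts elements with the wrong value.
int count_mismatches(int n, int blocks, int threads) {
    int *d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(int));
    fill_indices<<<blocks, threads>>>(d_out, n);

    std::vector<int> h_out(n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    int bad = 0;
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != i) ++bad;
    }
    return bad;
}

int main() {
    // 4 blocks of 64 threads = 256 threads for 1000 elements,
    // so each thread handles several elements.
    printf("mismatches: %d\n", count_mismatches(1000, 4, 64));
    return 0;
}
```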

Important Functions

Kernel Launch

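Kernel_name<<< gridsize, blocksize >>>(arg1, arg2, …); : launches the kernel on a grid of gridsize blocks, each containing blocksize threads. Both can be an int (1D) or a dim3 (up to 3D). The launch itself is written without a return type; void belongs only on the kernel's declaration.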
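As an illustration of choosing gridsize and blocksize, the sketch below covers a 2D array with 16 x 16 thread blocks, rounding the grid up so that partial blocks at the edges are included. The helper names (mark_cells, grid_for, count_marked) are hypothetical. cudaMemset is a runtime call not listed in this sheet, used here only to zero the buffer.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Kernel: every in-range thread writes 1 into its own cell of a 2D array.
__global__ void mark_cells(int *cells, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {  // the grid may overshoot the array edges
        cells[y * width + x] = 1;
    }
}

// Smallest grid of block-sized blocks that covers width x height (ceiling division).
dim3 grid_for(int width, int height, dim3 block) {
    return dim3((width + block.x - 1) / block.x,
                (height + block.y - 1) / block.y);
}

// Launches the kernel and returns how many cells were marked.
int count_marked(int width, int height) {
    int n = width * height;
    int *d_cells = nullptr;
    cudaMalloc(&d_cells, n * sizeof(int));
    cudaMemset(d_cells, 0, n * sizeof(int));

    dim3 block(16, 16);  // 256 threads per block; z defaults to 1
    dim3 grid = grid_for(width, height, block);
    mark_cells<<<grid, block>>>(d_cells, width, height);

    std::vector<int> h_cells(n);
    cudaMemcpy(h_cells.data(), d_cells, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_cells);

    int marked = 0;
    for (int v : h_cells) marked += v;
    return marked;
}

int main() {
    dim3 grid = grid_for(100, 50, dim3(16, 16));
    printf("grid = %u x %u blocks, marked = %d cells\n",
           grid.x, grid.y, count_marked(100, 50));
    return 0;
}
```

The bounds check inside the kernel matters because the rounded-up grid usually contains more threads than array elements.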

Memory Management

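cudaMalloc( void **devPtr, size_t size ); : allocates size bytes of device memory and stores the device pointer in *devPtr.

cudaFree( void *devPtr ); : frees device memory previously allocated with cudaMalloc.

cudaMemcpy( void *dst, const void *src, size_t size, enum cudaMemcpyKind kind ); : copies size bytes from src to dst, between host and device.

kind is an enum that gives the direction of the copy. The two most common values are: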

cudaMemcpyHostToDevice
cudaMemcpyDeviceToHost
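A minimal round trip through device memory, using only the three calls above, could look like the following sketch. The helper name round_trip is hypothetical. Return codes are mostly ignored here to keep the example short.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Copies src to the device and back, then reports whether the data survived intact.
bool round_trip(const std::vector<float> &src) {
    size_t bytes = src.size() * sizeof(float);
    std::vector<float> dst(src.size(), 0.0f);

    float *d_buf = nullptr;
    if (cudaMalloc(&d_buf, bytes) != cudaSuccess) {
        return false;  // allocation failed
    }

    cudaMemcpy(d_buf, src.data(), bytes, cudaMemcpyHostToDevice);  // host -> device
    cudaMemcpy(dst.data(), d_buf, bytes, cudaMemcpyDeviceToHost);  // device -> host
    cudaFree(d_buf);

    return dst == src;
}

int main() {
    std::vector<float> data = {1.0f, 2.0f, 3.0f, 4.0f};
    printf("round trip %s\n", round_trip(data) ? "ok" : "failed");
    return 0;
}
```

Note that the size argument is always in bytes, not in elements, which is why the element count is multiplied by sizeof(float).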

Error checking

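cudaError_t cudaGetLastError(void); : returns the last error from a runtime call and resets the error state to cudaSuccess. Call it right after a kernel launch to catch launch failures, since the launch itself returns nothing.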

const char* cudaGetErrorString( cudaError_t code ); : returns the description string for an error code.
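One common pattern, sketched here under the assumption that aborting on the first error is acceptable, wraps every runtime call in a checking macro. CUDA_CHECK, noop and launch_with_bad_config are hypothetical names, not part of the CUDA API. cudaDeviceSynchronize is a runtime call not listed in this sheet.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical macro: prints a readable message and exits if a runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err_));    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

__global__ void noop() {}

// Deliberately requests far more threads per block than any device supports,
// so the launch is rejected. The launch returns nothing, so the error has to
// be fetched with cudaGetLastError().
cudaError_t launch_with_bad_config() {
    noop<<<1, 100000>>>();
    return cudaGetLastError();
}

int main() {
    cudaError_t err = launch_with_bad_config();
    printf("bad launch: %s\n", cudaGetErrorString(err));

    // A valid launch: check the launch first, then wait for the kernel,
    // because errors that occur while it runs are only reported later.
    noop<<<1, 32>>>();
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());
    printf("valid launch ok\n");
    return 0;
}
```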
