Reply:
@apbose is there a way to create a hard subset of complex that we can support easily and grow from there?
Complex number handling in Torch-TensorRT
TL;DR
This RFC proposes adding complex number support to Torch-TensorRT. TensorRT does not support complex numbers natively, but rotary positional embeddings rely on complex arithmetic, so complex numbers play an important role in how these embeddings are applied.
Goal
To support the multi-GPU Llama 3 example running end to end.
Use case
With this feature we intend to demonstrate the end-to-end forward pass of a Torch-TensorRT-compiled, distributed Llama 3 model on multiple GPUs. The following illustrates how complex numbers become inputs to the Llama 3 model: the query and key vectors are viewed as complex tensors, while the frequency vectors are precomputed in polar form as complex frequencies.
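For reference, below is a condensed version of how the reference Llama implementation applies rotary embeddings through complex arithmetic (the broadcast reshaping of freqs_cis is omitted; freqs_cis is assumed to be precomputed with torch.polar and already broadcastable to the reshaped query/key):

```python
import torch

def apply_rotary_emb(xq: torch.Tensor, xk: torch.Tensor, freqs_cis: torch.Tensor):
    # View the trailing head dimension as (real, imag) pairs -> complex64 tensors.
    xq_ = torch.view_as_complex(xq.float().reshape(*xq.shape[:-1], -1, 2))
    xk_ = torch.view_as_complex(xk.float().reshape(*xk.shape[:-1], -1, 2))
    # freqs_cis holds e^{i*theta} terms (polar form); the complex multiply rotates each pair.
    xq_out = torch.view_as_real(xq_ * freqs_cis).flatten(-2)
    xk_out = torch.view_as_real(xk_ * freqs_cis).flatten(-2)
    return xq_out.type_as(xq), xk_out.type_as(xk)
```

It is this freqs_cis tensor, together with the complex views of the query and key, that ends up as a complex-valued input to the compiled graph.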
We encounter this only in the distributed examples because the model is compiled with torch.compile(distributed_model, backend="torch_tensorrt"). When the model is wrapped by aot_autograd, the distributed tensors are hoisted to graph inputs, which leads to complex inputs to the Torch-TensorRT-compiled graph (ref: pytorch/pytorch#136289).
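A minimal sketch of that compile path is shown below. The tiny stand-in module and the absence of any distributed setup are simplifications; in the actual use case the module is the tensor-parallel Llama 3 model whose complex freqs_cis buffer gets hoisted to a graph input.

```python
import torch
import torch.nn as nn
import torch_tensorrt  # noqa: F401  # importing torch_tensorrt makes the "torch_tensorrt" backend available

# Stand-in for the distributed Llama 3 module used only to illustrate the call pattern.
model = nn.Linear(16, 16).cuda().eval()

compiled = torch.compile(model, backend="torch_tensorrt")
out = compiled(torch.randn(4, 16, device="cuda"))
```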
Implementation Stages
Complex unpacking
Convert complex numbers into a tuple of their real and imaginary parts: a complex number x + iy is provided as the input pair (x, y).
This involves modifying the metadata (shape and data type) of the complex nodes, as well as the subsequent operations that consume these complex values.
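A minimal sketch of the unpacking itself, using torch.view_as_real; the comment about node metadata refers to the FX node's meta["val"] entry:

```python
import torch

x = torch.randn(2, 8, dtype=torch.complex64)  # x = a + i*b

# Reinterpret the complex tensor as real/imag pairs without a copy:
# (2, 8) complex64  ->  (2, 8, 2) float32
x_unpacked = torch.view_as_real(x)
real, imag = x_unpacked.unbind(-1)

assert x_unpacked.dtype == torch.float32 and x_unpacked.shape == (2, 8, 2)

# The lowering pass must apply the same shape/dtype change to the complex
# placeholder's metadata (e.g. node.meta["val"]) and to every op consuming it,
# so that downstream shape propagation sees the real-valued layout.
```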
Numeric truncation
In the above, complex64 should be unpacked into a pair of float32 tensors. Similarly, complex128 should also end up as a pair of float32 tensors: it unpacks to float64, which then has to be truncated to float32 using the truncate flag, since TensorRT does not support FP64.
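A small hypothetical helper illustrating the intended dtype mapping; the truncate parameter stands in for the existing truncation setting and the name is illustrative:

```python
import torch

# complex64  -> float32 pair (lossless)
# complex128 -> float64 pair, truncated to float32 when the truncate flag is set,
#               since TensorRT does not support FP64.
def unpacked_real_dtype(complex_dtype: torch.dtype, truncate: bool) -> torch.dtype:
    real_dtype = {torch.complex64: torch.float32,
                  torch.complex128: torch.float64}[complex_dtype]
    if truncate and real_dtype is torch.float64:
        real_dtype = torch.float32
    return real_dtype
```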
Function signature modification
Identify the boundary of the operations affected by the complex inputs. In the Llama 3 model this boundary is the rotary embedding operation (see the sketch below).
The signatures of these complex operations need to be modified so that there are no graph breaks and so that they also handle the complex unpacking.
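As an illustration of what the rewritten operations inside that boundary can look like, the complex multiply of the rotary embedding can be expressed with real-valued ops only. This is a sketch, not the actual pass output:

```python
import torch

def complex_mul_unpacked(a_re: torch.Tensor, a_im: torch.Tensor,
                         b_re: torch.Tensor, b_im: torch.Tensor):
    # (a_re + i*a_im) * (b_re + i*b_im) = (a_re*b_re - a_im*b_im) + i*(a_re*b_im + a_im*b_re)
    # Only elementwise real ops remain, all of which TensorRT supports.
    return a_re * b_re - a_im * b_im, a_re * b_im + a_im * b_re
```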
Unification of pre_lowering and post_lowering passes for distributed and non-distributed cases
The pre_lowering and post_lowering passes need to be uniform across both the distributed and non-distributed cases.
[Diagram]
In addition, there has to be extra handling in the Torch-TensorRT runtime. All of the above will be invoked via an API in the post-lowering passes; a sketch of what such a unified pass mechanism could look like follows.
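The names below are illustrative, not existing Torch-TensorRT APIs; the sketch only shows the idea of one shared, ordered pass list so that distributed and non-distributed compilation run through identical lowering:

```python
from typing import Callable, List
from torch.fx import GraphModule

LoweringPass = Callable[[GraphModule], GraphModule]

# A single registry used by both the distributed and non-distributed code paths.
POST_LOWERING_PASSES: List[LoweringPass] = []

def register_post_lowering_pass(p: LoweringPass) -> LoweringPass:
    POST_LOWERING_PASSES.append(p)
    return p

def run_post_lowering(gm: GraphModule) -> GraphModule:
    for p in POST_LOWERING_PASSES:
        gm = p(gm)
    return gm
```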
API changes
Detection stage
torch_tensorrt/dynamo/lowering/passes/pass_utils.py
torch_tensorrt/dynamo/utils.py
Decomposition stage
torch_tensorrt/dynamo/lowering/passes/reshape_complex_placeholder_nodes
Graph Rewrite stage
torch_tensorrt/dynamo/lowering/passes/complex_graph_rewrite
All of the above need to be called sequentially in
torch_tensorrt/dynamo/backend/backends.py; a rough sketch of this wiring is shown below.
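In this sketch the detection helper is runnable, while the two pass calls are stubs standing in for the modules listed above; all names are assumptions for illustration:

```python
import torch
from torch.fx import GraphModule, Node

# Detection stage (sketch): find placeholders carrying complex fake tensors.
def detect_complex_placeholders(gm: GraphModule) -> list[Node]:
    return [n for n in gm.graph.nodes
            if n.op == "placeholder"
            and (val := n.meta.get("val")) is not None
            and isinstance(val, torch.Tensor)
            and val.dtype.is_complex]

# Stubs standing in for the decomposition and graph-rewrite passes named above.
def reshape_complex_placeholder_nodes(gm: GraphModule, nodes: list[Node]) -> GraphModule:
    return gm

def complex_graph_rewrite(gm: GraphModule, nodes: list[Node]) -> GraphModule:
    return gm

def lower_complex(gm: GraphModule) -> GraphModule:
    complex_nodes = detect_complex_placeholders(gm)
    if complex_nodes:
        gm = reshape_complex_placeholder_nodes(gm, complex_nodes)
        gm = complex_graph_rewrite(gm, complex_nodes)
    return gm
```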
Still to be explored are the changes to the runtimes in _PythonTorchTensorRTModule.py and _TorchTensorRTModule.py, since we are modifying the inputs.
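On the runtime side, one option is a pre-processing step in the module's forward() so callers can keep passing complex tensors; the helper below is a hypothetical sketch, not the existing runtime API:

```python
import torch

def split_complex_inputs(inputs: tuple) -> list:
    # Split any complex input into contiguous (real, imag) tensors, matching the
    # unpacked signature the rewritten TensorRT graph now expects.
    flat = []
    for t in inputs:
        if isinstance(t, torch.Tensor) and t.is_complex():
            as_real = torch.view_as_real(t)
            flat.extend((as_real[..., 0].contiguous(), as_real[..., 1].contiguous()))
        else:
            flat.append(t)
    return flat
```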