[PHI][CINN] Implement int64_t version of FastDivMod #72530

lshpku · 2025-04-28T03:54:46Z

PR Category

Operator Mechanism

PR Types

Improvements

Description

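Implements an int64_t version of FastDivMod and fixes some of the problems with Reduce on large tensors.

Core change: phi/kernels/funcs/fast_divmod.h now implements a FastDivMod<int64_t> class, modeled on the existing int version.
Note: the IndexType arithmetic actually uses uint32_t and uint64_t internally, but int and int64_t are kept as the template parameters for compatibility with existing PHI usage.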
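For readers unfamiliar with the technique, here is a minimal host-side sketch of the multiply-and-shift division that a 64-bit FastDivMod typically relies on. It is not the code from fast_divmod.h: the struct name `FastDivMod64Sketch` and its members are hypothetical, it assumes a compiler that provides `unsigned __int128` (GCC or Clang), and it assumes the divisor and dividend are both in the non-negative int64_t range. On a GPU the high half of the 64x64-bit product would normally come from an intrinsic such as `__umul64hi` rather than a 128-bit type.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch, not the PR's implementation.
// Precondition: 1 <= d < 2^63 and 0 <= n < 2^63.
struct FastDivMod64Sketch {
  uint64_t divisor;
  uint64_t multiplier;
  uint32_t shift;

  explicit FastDivMod64Sketch(uint64_t d)
      : divisor(d), multiplier(1), shift(0) {
    // Smallest shift such that 2^shift >= d.
    while (shift < 63 && (uint64_t{1} << shift) < d) ++shift;
    // multiplier = floor(2^64 * (2^shift - d) / d) + 1, computed in 128 bits.
    unsigned __int128 numerator =
        static_cast<unsigned __int128>((uint64_t{1} << shift) - d) << 64;
    multiplier = static_cast<uint64_t>(numerator / d) + 1;
  }

  // Quotient n / divisor using one wide multiply, one add, and one shift.
  uint64_t Div(uint64_t n) const {
    // High 64 bits of the 128-bit product n * multiplier.
    uint64_t hi = static_cast<uint64_t>(
        (static_cast<unsigned __int128>(n) * multiplier) >> 64);
    // hi < 2^63 and n < 2^63 under the precondition, so the sum cannot wrap.
    return (hi + n) >> shift;
  }

  // Remainder n % divisor, derived from the quotient.
  uint64_t Mod(uint64_t n) const { return n - Div(n) * divisor; }
};
```

The constructor does the expensive 128-bit division once on the host; each later `Div` or `Mod` call replaces a hardware division with a multiply, an add, and a shift, which is the reason kernels precompute these objects per dimension.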

Changes to other files that use FastDivMod:

File	Previous usage	New usage
datamover_primitives.h	FastDivMod	FastDivMod<int> (the duplicated implementation was removed)
pooling.cu	FastDivMod	FastDivMod<int>
stack_and_unstack.h	GeneralDivMod<IndexT>	Replaced with FastDivMod<IndexT>
index_calculator.h	FastDivMod	Upgraded to FastDivMod<IndexType>
transpose_function.cu.h	FastDivMod	FastDivMod<int>
interpolate_function.h	FastDivMod	FastDivMod<int>

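Notes:

Most files keep their previous behavior and continue to use FastDivMod<int>. FastDivMod<int64_t> has a measurable performance cost, so each call site needs to be evaluated before deciding whether an upgrade is necessary; until then the status quo is kept.
index_calculator.h is upgraded to FastDivMod<IndexType>; this is the main problem this PR fixes.
stack_and_unstack.h already used GeneralDivMod<IndexT>, so this PR replaces it directly with FastDivMod<IndexT>. GeneralDivMod was a wrapper class that fell back to ordinary division because int64_t was not supported before; now that int64_t is supported natively, it is no longer needed.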

Examples of fixed cases:

paddle.sum(Tensor([2, 2, (1 << 30), 2], 'float16'), axis=(1, 3))      # correct before, correct now
paddle.sum(Tensor([2, 2, (1 << 30) + 1, 2], 'float16'), axis=(1, 3))  # wrong before, correct now

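Performance impact:
There is a real performance cost. Taking paddle.sum(Tensor([256, 1024, 192, 192], 'float16'), axis=(0, 2, 3)) as an example, the int version takes 28 ms and the int64_t version takes 46 ms, an increase of roughly 60%. The main cause is a problem in the IndexCalculator implementation, which needs a separate PR to fix. Kernels that call FastDivMod should therefore evaluate carefully whether they need the int64_t version.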

Pcard-85711

paddle-bot · 2025-04-28T03:54:51Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

wanghuancoder

LGTM

[PHI] Implement int64_t version of FastDivMod

bbbe8a4

lshpku force-pushed the fast-divmod-int64 branch from e4aeef6 to bbbe8a4 Compare April 28, 2025 05:45

wanghuancoder approved these changes Apr 28, 2025

zyfncg approved these changes Apr 28, 2025

lshpku merged commit 8750974 into PaddlePaddle:develop Apr 29, 2025
41 of 42 checks passed