[PHI][CINN] Implement int64_t version of FastDivMod #72530
Merged
PR Category
Operator Mechanism
PR Types
Improvements
Description
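Implements an int64_t version of FastDivMod and fixes some of the problems with Reduce on large tensors.
Core changes:
phi/kernels/funcs/fast_divmod.h
Implements the FastDivMod<int64_t> class, modeled on the existing int version.
Note: the IndexType arithmetic actually uses uint32_t and uint64_t internally, but int and int64_t are kept as the template parameters for compatibility with existing PHI usage.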
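For readers unfamiliar with the technique, the following is a minimal host-side sketch of how a 64-bit fast divmod can work. It is not the code from fast_divmod.h: the name FastDivMod64Sketch and its members are hypothetical. It assumes the standard multiply-high-and-shift scheme (a precomputed multiplier and shift replace the hardware division) and a compiler that provides unsigned __int128, as GCC and Clang do. On a CUDA device the high 64 bits of the product would typically come from the __umul64hi intrinsic instead.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration only; names are not taken from Paddle.
// Precomputes (multiplier, shift) so that n / divisor becomes a
// 64x64 -> high-64 multiply, an add, and a shift.
struct FastDivMod64Sketch {
  using u128 = unsigned __int128;  // GCC/Clang extension (assumption)

  uint64_t divisor;
  uint64_t multiplier;
  int shift;

  explicit FastDivMod64Sketch(int64_t d)
      : divisor(static_cast<uint64_t>(d)), multiplier(0), shift(0) {
    assert(d > 0);
    // shift = ceil(log2(divisor)); at most 63 because divisor < 2^63.
    for (shift = 0; shift < 64; ++shift) {
      if ((static_cast<uint64_t>(1) << shift) >= divisor) break;
    }
    // multiplier = floor(2^64 * (2^shift - divisor) / divisor) + 1.
    // The numerator is below 2^127, so it fits in 128 bits.
    u128 pow2 = static_cast<u128>(1) << shift;
    u128 m = ((pow2 - divisor) << 64) / divisor + 1;
    multiplier = static_cast<uint64_t>(m);
  }

  // Valid for 0 <= n <= INT64_MAX. The arithmetic is unsigned, which
  // mirrors the note above about uint64_t being used internally.
  int64_t Div(int64_t n) const {
    uint64_t un = static_cast<uint64_t>(n);
    // High 64 bits of un * multiplier.
    uint64_t hi =
        static_cast<uint64_t>((static_cast<u128>(un) * multiplier) >> 64);
    // hi <= un < 2^63, so hi + un cannot overflow 64 bits.
    return static_cast<int64_t>((hi + un) >> shift);
  }

  int64_t Mod(int64_t n) const {
    return n - Div(n) * static_cast<int64_t>(divisor);
  }
};
```

The restriction to non-negative int64_t inputs is what keeps the final addition from overflowing. A variant that accepted the full uint64_t range would need the slightly longer overflow-safe form of the last step.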
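Changes to other files that use FastDivMod:
Notes:
index_calculator.h
Upgraded to FastDivMod<IndexType>. This is the main problem this PR fixes.
stack_and_unstack.h
This file already used GeneralDivMod<IndexT>, so this PR replaces it directly with FastDivMod<IndexT>. GeneralDivMod was a wrapper class that fell back to ordinary division because int64_t was not previously supported; now that int64_t is supported natively, the wrapper is no longer needed.
Example of a fixed case: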
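Performance impact:
There is a real performance cost. Taking
paddle.sum(Tensor([256, 1024, 192, 192], 'float16'), axis=(0, 2, 3))
as an example, the int version takes 28 ms and the int64_t version takes 46 ms, an increase of more than 60%. (The main cause is a problem in the IndexCalculator implementation, which needs a separate PR to fix.) Kernels that call FastDivMod should therefore evaluate carefully whether they actually need the int64_t version.
Pcard-85711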
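One way to act on the performance note above is to pick the index width from the element count, so that only tensors too large for 32-bit indexing pay for the int64_t path. The sketch below is an assumption about how such a choice could look, not code from this PR: Numel, NeedsInt64Index, IndexBits and SelectIndexBits are hypothetical names, and a real kernel would instantiate FastDivMod<int> or FastDivMod<int64_t> at the point where this sketch only reports a bit width.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: element count of a shape, accumulated in 64 bits.
inline int64_t Numel(const std::vector<int64_t>& shape) {
  int64_t n = 1;
  for (int64_t d : shape) n *= d;
  return n;
}

// True when a linear index into the tensor can exceed the int32 range.
inline bool NeedsInt64Index(int64_t numel) {
  return numel > std::numeric_limits<int32_t>::max();
}

// Stand-in for a kernel templated on its index type; it only reports
// the width of the index type it was instantiated with.
template <typename IndexT>
int IndexBits() {
  return static_cast<int>(sizeof(IndexT) * 8);
}

// Chooses the narrow index type whenever it is safe to do so.
inline int SelectIndexBits(int64_t numel) {
  return NeedsInt64Index(numel) ? IndexBits<int64_t>()
                                : IndexBits<int32_t>();
}
```

For the shape in the example above, 256 * 1024 * 192 * 192 = 9,663,676,416 elements, which is beyond the int32 range, so this check would select the 64-bit path.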