Refa: Optimize pptx shape extraction to reduce content loss #6703
@zhudongwork Thanks for raising this. Could you please upload a pptx file that reproduces the problem, so that we can quickly locate the bug?
Thank you for submitting the code. It resolves the errors we had before, but its text parsing results are not as good as those of our original code. We hope you can revise your code further.
@zhudongwork @KevinHuSh
…ow#6703)

### What problem does this PR solve?

When parsing pptx files, some shapes do not contain the `shape_type` attribute, which causes the original code to throw an exception during extraction, leading to failure in content extraction. This optimization introduces handling logic for such anomalous shapes, providing a safer and more robust processing mechanism.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [x] Refactoring
- [x] Performance Improvement
- [ ] Other (please describe):
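The kind of defensive handling described above might look roughly like the sketch below. It is not the code from this PR: the helper names `safe_shape_type` and `extract_shape_text` are hypothetical, and the shapes are assumed only to be duck-typed objects that may expose `shape_type`, `shapes`, `has_text_frame` and `text_frame`, as python-pptx shapes typically do. A small stand-in class replaces a real presentation so the example runs without any third-party package.

```python
from types import SimpleNamespace


def safe_shape_type(shape):
    """Return shape.shape_type, or None when it is missing or raises.

    Hypothetical helper: the exception types caught here are an assumption.
    """
    try:
        return shape.shape_type
    except (AttributeError, NotImplementedError, ValueError):
        return None


def extract_shape_text(shape):
    """Best-effort text extraction that does not rely on shape_type."""
    # Group-like shapes expose a nested `shapes` collection; recurse into it.
    children = getattr(shape, "shapes", None)
    if children is not None:
        parts = [extract_shape_text(child) for child in children]
        return "\n".join(part for part in parts if part)
    # Otherwise fall back to the text frame, if the shape has one.
    if getattr(shape, "has_text_frame", False):
        return shape.text_frame.text.strip()
    return ""


class BrokenShape:
    """Stand-in for an anomalous shape whose shape_type access fails."""

    has_text_frame = True
    text_frame = SimpleNamespace(text="  hello  ")

    @property
    def shape_type(self):
        raise NotImplementedError("shape_type is not available")


# A group containing one anomalous shape and one shape with no text at all.
group = SimpleNamespace(shapes=[BrokenShape(), SimpleNamespace()])
recovered = extract_shape_text(group)
```

Deciding by capability (`has_text_frame`, nested `shapes`) rather than by `shape_type` alone means a shape that cannot report its type still contributes its text instead of aborting the whole slide.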
What problem does this PR solve?
When parsing pptx files, some shapes do not contain the `shape_type` attribute, which causes the original code to throw an exception during extraction, leading to failure in content extraction. This optimization introduces handling logic for such anomalous shapes, providing a safer and more robust processing mechanism.
Type of change