
feat(autoware_lidar_bevfusion): implementation of bevfusion using tensorrt #10024

Open
wants to merge 19 commits into main

Conversation

knzo25
Contributor

@knzo25 knzo25 commented Jan 27, 2025

Description

This PR introduces BEVFusion to Autoware using TensorRT.
I would like to ask reviewers to leave the "integration" into the pipeline/launchers for a later PR 🙏

Related links

Parent Issue:

  • Link

How was this PR tested?

Notes for reviewers

The onnx files can be found here: TIER IV INTERNAL LINK. The models will be uploaded to a public link as the last part of the review (we are currently facing issues regarding the best way to distribute them without affecting CI/CD and image sizes...).

Since this package introduces early fusion, it cannot be directly integrated into Autoware (the lidar-only model can). Such integration should be deferred to the next PR to avoid unnecessarily increasing the number of stakeholders on this PR.

To test the PR, I recommend using the taxi project (I will omit the launch command) and launching bevfusion separately:

ros2 launch autoware_lidar_bevfusion lidar_bevfusion.launch.xml

For now, the models must be placed in the config folder, and the modality (the default is camera-lidar) can be changed by modifying the yaml file. These are the yaml parameters needed for the lidar-only model (a sketch of the camera-lidar overrides follows the block):

/**:
  ros__parameters:
    # modality
    sensor_fusion: false
    # non-network params
    max_camera_lidar_delay: 0.12
    # plugins
    plugins_path: $(find-pkg-share autoware_lidar_bevfusion)/plugins/libautoware_tensorrt_plugins.so
    # network
    trt_precision: fp16
    cloud_capacity: 2000000
    onnx_path: "$(var model_path)/bevfusion_lidar_v2.onnx"
    engine_path: "$(var model_path)/bevfusion_lidar_v2.engine"
    # pre-process params
    densification_num_past_frames: 0
    densification_world_frame_id: map
    # post-process params
    circle_nms_dist_threshold: 0.5
    iou_nms_target_class_names: ["CAR"]
    iou_nms_search_distance_2d: 10.0
    iou_nms_threshold: 0.1
    yaw_norm_thresholds: [0.3, 0.3, 0.3, 0.3, 0.0] # refers to the class_names
    score_threshold: 0.1
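
For the camera-lidar modality, a minimal sketch of the overrides would look like the following; only the fusion flag and the model paths change, and the file names here are placeholders to be replaced with the actual distributed camera-lidar onnx/engine files:

/**:
  ros__parameters:
    # modality: enable camera-lidar early fusion
    sensor_fusion: true
    # placeholder file names, replace with the distributed camera-lidar models
    onnx_path: "$(var model_path)/bevfusion_camera_lidar.onnx"
    engine_path: "$(var model_path)/bevfusion_camera_lidar.engine"

The remaining parameters can stay as in the lidar-only example above.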

Interface changes

None.

Effects on system behavior

None.

@knzo25 knzo25 requested a review from scepter914 January 27, 2025 05:14
@knzo25 knzo25 self-assigned this Jan 27, 2025
@github-actions github-actions bot added component:perception Advanced sensor data processing and environment understanding. (auto-assigned) component:sensing Data acquisition from sensors, drivers, preprocessing. (auto-assigned) labels Jan 27, 2025

github-actions bot commented Jan 27, 2025

Thank you for contributing to the Autoware project!

🚧 If your pull request is in progress, switch it to draft mode.

Please ensure:

@xmfcx
Contributor

xmfcx commented Jan 27, 2025

@knzo25
Contributor Author

knzo25 commented Jan 27, 2025

@xmfcx
Although both use a view transform, they are different papers and modalities:
https://arxiv.org/pdf/2112.11790
https://arxiv.org/pdf/2205.13542

@knzo25
Contributor Author

knzo25 commented Feb 21, 2025

As TensorRT was upgraded and spconv was added (autowarefoundation/autoware#5794), I will be opening this PR 🎉

@knzo25 knzo25 marked this pull request as ready for review February 21, 2025 05:01
@knzo25 knzo25 added the run:build-and-test-differential Mark to enable build-and-test-differential workflow. (used-by-ci) label Feb 21, 2025
@knzo25 knzo25 requested a review from kminoda as a code owner February 25, 2025 01:26
@knzo25 knzo25 removed the request for review from kminoda February 25, 2025 01:28
@github-actions github-actions bot added the type:documentation Creating or refining documentation. (auto-assigned) label Feb 25, 2025

codecov bot commented Feb 26, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 26.26%. Comparing base (c3134c2) to head (fb1ac42).

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10024      +/-   ##
==========================================
+ Coverage   26.24%   26.26%   +0.02%     
==========================================
  Files        1378     1378              
  Lines      107445   107468      +23     
  Branches    41428    41433       +5     
==========================================
+ Hits        28194    28222      +28     
+ Misses      76433    76425       -8     
- Partials     2818     2821       +3     
Flag                Coverage Δ               *Carryforward flag
differential        3.32% <ø> (?)
differential-cuda   2.23% <ø> (?)
total               26.26% <ø> (+0.02%) ⬆️   Carried forward from e00b2af

*This pull request uses carry forward flags.


@knzo25
Contributor Author

knzo25 commented Mar 11, 2025

@amadeuszsz
As discussed internally, I pasted the links and the instructions on how to execute the model. Please let me know if you need anything 🙏

@freejumperd

@knzo25 thank you for this great work. If I may ask a quick question: in terms of BEVFusion inference through TensorRT, what's the main difference between the version under your development (https://github.com/knzo25/autoware.universe/tree/feat/bevfusion/perception/autoware_lidar_bevfusion)
and the official NVIDIA AI IOT one
(https://github.com/NVIDIA-AI-IOT/Lidar_AI_Solution/tree/master/CUDA-BEVFusion/src/bevfusion)?
And I assume there is no model modification between the Autoware version and the original MIT one? Please suggest. Thank you!

@knzo25
Contributor Author

knzo25 commented Mar 16, 2025

@freejumperd

The main difference with NVIDIA AI IOT's implementation is that they use a closed-source shared library for inference (sparse convolutions), even if it is a public binary. Another PR with that implementation may be sent later by another group of contributors, as far as I understand. A big difference in terms of development and deployment of models is that this PR directly handles models generated by our ml stack (https://github.com/tier4/AWML/tree/main/projects/BEVFusion).

With respect to the original implementation, those questions may be better suited for our ml stack rather than the inference node here. But in a few words, the lidar model more or less remains the same, although we have evaluated bigger models for offline purposes (we have also tweaked some minor things that have increased performance). The camera-lidar model was too focused on nuScenes, so there have been a few other developments as well, but nothing that big.

@freejumperd

@knzo25 thanks for the active response, knzo. So in short, just like the centerpoint inference node already published through autoware universe, the actual inference cpp and cu code has been modified and optimised by you and the rest of the community? The NVIDIA one may be suitable for the exact original model trained on the nuScenes dataset, but certainly not as efficient for an Autoware-retrained centerpoint or bevfusion, should we say? And when would you expect this bevfusion trt inference + ros2 node to be available open source?

@knzo25
Contributor Author

knzo25 commented Mar 16, 2025

@freejumperd
More information about nvidia's implementation vs. this approach was presented in a previous issue (I do not have the link now though). I would separate nvidia's inference code from the original MIT implementation though. I just meant that the original config files and some of the processing in the camera-lidar pipeline do not work well with our vehicle's config.

As for when this will be available open source: in a way, it already is, since you can use this branch under the Apache license. As for when this will be merged, it is really up to the reviewers. If you want to play a part in that, we would really appreciate it!

@freejumperd

@knzo25 thanks for explaining. I am certainly interested and plan to go through the recently published AWML pipeline and then run inference tests to check FPS etc.
One last thing if I may ask here (more relevant to the model side than trt). I was playing with the original MIT model + weights but with our own dataset and a modified data processing pipeline, without touching the model architecture (different number of cameras, different intrinsic + extrinsic 6DOF, different model of top lidar etc.), and the zero-shot performance was extremely bad, almost no detections at all. So I wonder, for the AWML version trained with the T4 + nuScenes dataset (e.g. both with the same setup of 6 cameras + 1 top lidar), if I again test with our own dataset/pipeline (e.g. 7 cameras, 2 lidars etc.) using the available pth (without touching the model), would you expect the AWML bevfusion model zoo to provide good generalization ability? Or if e.g. the sensor suites are different (number of cameras/lidars, 6DOF etc.), would retraining the model with our own dataset be a must? If so, is the best way to use the open pth then to make our vehicle sensor suite as close as possible to nuScenes / the T4 taxi?

@knzo25
Contributor Author

knzo25 commented Mar 16, 2025

@freejumperd
(Current) sensor fusion models, especially early fusion ones, are extremely dependent on the vehicle configuration. That is especially true for BEVFusion since it unprojects points from the camera to the BEV space. That being said, in my experience, for this method the image features do not have so strong an impact that there are no detections at all.
If I had to say, I would suspect an error in your pipeline (beware of intensity profiles and so on). Further support would require seeing the data and formal consulting hours (if our institutions have an agreement, we can continue this conversation on other channels).
Regarding whether retraining is a must, it really depends on how you handle the data pipeline to reduce the mismatches (to some degree this can be done), but since we are not really training a foundation model here, I would always recommend fine-tuning or retraining.

@freejumperd

@knzo25 thanks again, what's been discussed is really helpful! Let me leverage the AWML pipeline first to revise our own data pipeline. I will come back to you later for any potential collaboration on perception topics.

@knzo25
Contributor Author

knzo25 commented Mar 16, 2025

@freejumperd
Ahh, I just forgot something about the open source's repository implementation of BEVFusion. While the repository was finally open-sourced, the final pipeline for the camera-lidar model and export logic have not been updated (was developing in a branch on the private repo). I should send a PR soon-ish.

@freejumperd

@knzo25 amazing! You guys have really done some great work promoting such an advanced early fusion architecture for the open source community. Really looking forward to learning more from it. And also looking forward to any upcoming work leveraging VLMs 😉
