ObjectDetection/InstanceSegmentationTask: fix support for non-RGB images #2752
Conversation
Pull Request Overview
This PR fixes multispectral support in the ObjectDetectionTask by overriding the default transform parameters in the detector models. The changes update the initialization of FasterRCNN, FCOS, and RetinaNet with custom parameters (min_size, max_size, image_mean, and image_std) that enable multispectral inputs, and a new test is added to validate this functionality.
- Updated transform parameters for multispectral support in three detection model constructors.
- Added a new test in tests/trainers/test_detection.py to check multispectral behavior.
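As a rough sketch of the approach (parameter values and backbone choice here are illustrative, not the exact PR diff), the torchvision detector constructors accept overrides for the internal transform's behaviour:

```python
import torch
from torch import nn
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

in_channels = 4  # hypothetical 4-band (e.g. RGB + NIR) input

# Build a backbone and adapt its first conv layer to accept extra bands.
backbone = resnet_fpn_backbone(backbone_name='resnet50', weights=None)
backbone.body.conv1 = nn.Conv2d(
    in_channels, 64, kernel_size=7, stride=2, padding=3, bias=False
)

# Override the internal transform: no-op normalization, permissive resize.
model = FasterRCNN(
    backbone,
    num_classes=2,
    min_size=1,                       # don't upscale everything to >= 800
    max_size=4096,                    # don't cap inputs at 1333
    image_mean=[0.0] * in_channels,   # subtract 0 -> identity
    image_std=[1.0] * in_channels,    # divide by 1 -> identity
)
```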
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| torchgeo/trainers/detection.py | Updated model constructors to override transform parameters for multispectral data. |
| tests/trainers/test_detection.py | Added a test case to validate multispectral support with a non-RGB input channel. |
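The new test presumably looks something along these lines (a hedged sketch; the trainer arguments are assumed from torchgeo's public API, not copied from the diff):

```python
import torch
from torchgeo.trainers import ObjectDetectionTask

def test_non_rgb_images() -> None:
    # Assumed argument names; the real test lives in tests/trainers/test_detection.py.
    task = ObjectDetectionTask(
        model='faster-rcnn', backbone='resnet18', in_channels=4, num_classes=2
    )
    task.model.eval()
    images = [torch.rand(4, 224, 224)]  # 4 channels, previously rejected
    with torch.no_grad():
        predictions = task.model(images)
    assert {'boxes', 'labels', 'scores'} <= set(predictions[0])
```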
Can you also check instance segmentation?
FYI I confirmed no issues with OD on my 4-channel dataset.
OK, whilst the error is resolved, the loss for the OD models I train is always zero. Are there validated results I can reproduce? Note this might just be my datasets, which I recently updated for the new format.
This transform by default was resizing all imagery to a minimum size of 800. What transforms are you using to preprocess your imagery?
@adamjstewart fixed the instance segmentation task. It had the same issue. |
@isaaccorley can you elaborate on this, given that typically chip_size = 224?
Torchvision Faster-RCNN and MaskRCNN have a built-in resize transform. One trick that works well for object detection in remote sensing is to simply resize your small patches to be larger. This may be why you're getting poor performance.
@isaaccorley good to know! Perhaps we should document this? |
This PR basically removes this transform, so users can decide which Kornia normalize and resize transforms they want to apply themselves.
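For example (a sketch with made-up band statistics; for training-time augmentation the boxes would need to be transformed too):

```python
import kornia.augmentation as K
import torch

# Hypothetical per-band statistics for a 4-band sensor.
mean = torch.tensor([0.485, 0.456, 0.406, 0.5])
std = torch.tensor([0.229, 0.224, 0.225, 0.25])

preprocess = K.AugmentationSequential(
    K.Normalize(mean=mean, std=std),
    K.Resize((800, 800)),  # optionally upscale small chips, per the trick above
    data_keys=['input'],   # add 'bbox_xyxy' to resize boxes alongside images
)

batch = torch.rand(8, 4, 224, 224)
batch = preprocess(batch)
```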
I've ruled out issues with my dataset, and the remaining differences I see between my legacy implementation and this implementation are details such as the anchor sizes I've utilised. I suggest we merge this approach and then, as a follow-up (and pending a suitable test dataset), work on further optimisations in another PR.
LGTM, thanks for the fix!
Except for the tests...
One of the wonderful surprises of torchvision's detector models is that a `GeneralizedRCNNTransform` gets added under the hood, which defaults to ImageNet RGB mean/std normalization plus dynamic resizing in the range (800, 1333). This PR fixes this by loading pretrained weights but overriding the transform to simply subtract 0 and divide by 1 (a no-op), and changing the dynamic resize to allow a min/max input shape in the range (1, 4096).
Alternatives considered:
I attempted to simply replace `model.transform` with `nn.Identity()`, but this doesn't work because the detection models pass multiple arguments to the transform, which throws an error.

Fixes #2749