SpaceNet2 doesn't work #2683

Open
calebrob6 opened this issue Mar 27, 2025 · 4 comments
Labels
datasets Geospatial or benchmark datasets

Comments

@calebrob6
Member

Description

SpaceNet2 dataset doesn't work with real data.

Setup:

mkdir data
cd data
aws s3 cp --no-sign-request s3://spacenet-dataset/spacenet/SN2_buildings/tarballs/SN2_buildings_train_AOI_2_Vegas.tar.gz .
tar xzf SN2_buildings_train_AOI_2_Vegas.tar.gz

Steps to reproduce:

from torchgeo.datasets import SpaceNet2

ds = SpaceNet2(
    root='data/',
    split='train',
    aois=[2],
    image='RGB-PanSharpen',
)

You can also see this by running less on SN2_buildings_train_AOI_2_Vegas.tar.gz, which gives:

[Image: listing of the archive contents]

This file structure is not compatible with what the SpaceNet2 dataset is looking for.
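The archive layout can also be inspected programmatically without extracting it. The following is a minimal sketch using only the standard library `tarfile` module; the helper name `summarize_archive` and the `depth` parameter are illustrative and not part of torchgeo.

```python
import tarfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(path: str, depth: int = 2) -> Counter:
    """Count archive members under each directory prefix of the given depth.

    Hypothetical helper for illustration; not part of torchgeo.
    """
    counts: Counter = Counter()
    # "r:*" lets tarfile detect the compression (gzip for .tar.gz).
    with tarfile.open(path, "r:*") as tar:
        for member in tar:
            # PurePosixPath drops a leading "./" so prefixes compare cleanly.
            parts = PurePosixPath(member.name).parts
            counts["/".join(parts[:depth])] += 1
    return counts
```

Calling `summarize_archive("SN2_buildings_train_AOI_2_Vegas.tar.gz")` from inside `data/` and printing the keys should show which top-level directories the tarball actually contains, which can then be compared against what the dataset class searches for.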

Steps to reproduce

see above

Version

0.6.2

@calebrob6
Member Author

calebrob6 commented Mar 27, 2025

I'm pretty sure https://github.com/microsoft/torchgeo/blob/main/torchgeo/datasets/spacenet.py#L277 should just be:

product_glob = os.path.join(self.root, self.directory_glob)
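To make the proposed change concrete, here is a small sketch of how joining a root with a directory pattern and globbing behaves. The function name `find_product_dirs` and any pattern passed to it are hypothetical; the actual value of `directory_glob` in torchgeo is not shown in this thread and is not assumed here.

```python
import glob
import os


def find_product_dirs(root: str, directory_glob: str) -> list[str]:
    """Return sorted directories directly matched by root/directory_glob.

    Hypothetical helper; mirrors the one-line suggestion above, nothing more.
    """
    product_glob = os.path.join(root, directory_glob)
    # Keep only directories so stray files matching the pattern are ignored.
    return sorted(p for p in glob.glob(product_glob) if os.path.isdir(p))
```

With this form, the pattern is matched immediately under `root`, so an extra intermediate directory level in the extracted data would cause the search to return nothing.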

@adamjstewart
Collaborator

This is frustrating because I still have the data I downloaded 8 months ago when working on #2203, and it is compatible. That means the data structure changed in the last 8 months, and possibly all of our SpaceNet datasets need to be changed. I really wish AWS had some kind of stable, versioned, checksummable download option.

Feel free to submit a PR to fix these. You probably don't need to actually download anything to check the directory structure for the other versions; you can just use aws s3 ls.
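Checking the remote layout without downloading can be scripted. This is a rough sketch that assumes the AWS CLI is installed and on `PATH`; it wraps `aws s3 ls` with `--no-sign-request` (the same flag used in the setup commands above). The helper names `build_ls_command` and `list_remote` are made up for illustration.

```python
import subprocess


def build_ls_command(s3_uri: str, recursive: bool = False) -> list[str]:
    """Build an anonymous `aws s3 ls` command for the given S3 URI."""
    cmd = ["aws", "s3", "ls", "--no-sign-request", s3_uri]
    if recursive:
        cmd.append("--recursive")
    return cmd


def list_remote(s3_uri: str, recursive: bool = False) -> list[str]:
    """Run the listing and return its output lines (requires the AWS CLI)."""
    result = subprocess.run(
        build_ls_command(s3_uri, recursive),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the CLI reports failure
    )
    return result.stdout.splitlines()
```

For example, `list_remote("s3://spacenet-dataset/spacenet/SN2_buildings/tarballs/")` would list the tarballs under the prefix used in the setup step; other SpaceNet prefixes would need to be looked up rather than guessed.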

@adamjstewart adamjstewart added the datasets Geospatial or benchmark datasets label Mar 27, 2025
@calebrob6
Member Author

The .tar.gz files I show above match the checksums we list in the dataset. The already-extracted data on AWS is in a different directory structure, though (which seems to be what the dataset object expects). Separately, our _verify method creates directories in self.root regardless of whether you want to download the data and regardless of whether the dataset exists.

I'll open a PR to fix SpaceNet2, but because everything is crammed into this superclass, I'm not sure if it will break the others.
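As an illustration of the two points above (checksum matching and verification without side effects), here is a minimal sketch of a check that never creates directories. It is not torchgeo's `_verify`; the names `md5_of` and `archive_is_valid` are hypothetical, and the use of MD5 is an assumption for the example.

```python
import hashlib
import os


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_valid(root: str, filename: str, expected_md5: str) -> bool:
    """Return True only if root/filename exists and matches expected_md5.

    Read-only: nothing is created under root, even if root is missing.
    """
    path = os.path.join(root, filename)
    if not os.path.isfile(path):
        return False
    return md5_of(path) == expected_md5
```

Keeping the check read-only means any directory creation can be deferred until a download or extraction is actually requested.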

@adamjstewart
Collaborator

Also see #2366 for more SpaceNet woes.
