Organization of the datasets #2

Open · vunh opened this issue Jul 7, 2021 · 19 comments

@vunh commented Jul 7, 2021

Could you please clarify how to organize the dataset together with the metadata files (e.g "train_full_list.txt", "val_full_list.txt", "unlabel_list.txt",...)?

@greatwallet

According to the supplementary file, the CelebAMask-HQ dataset is divided into an unlabeled set (28K images) and a mask-annotated set (2K images), where the latter is further split into 1.5K images for training and 500 images for testing.

[image: dataset split table from the supplementary material]

They also conducted an ablation study on the number of annotated training images (30, 150, 1500).

However, I think some details about the dataset split still need to be explained:

  1. Is the split into unlabeled/labeled and train/test sampled randomly, uniformly, or sequentially?
  2. As mentioned in switchablenorms/CelebAMask-HQ#19 (comment), the original CelebAMask-HQ already defines a train/val/test split. Do you base your split on that?

@lidaiqing

Thank you for pointing it out, @greatwallet! I will update the data split file and the instructions on preparing the datasets soon. For your reference, the data split is 0-27999 as unlabeled data, 28000-29499 as training, and 29500-29999 as testing. Hope it does not block your research!
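
Until the official split files are uploaded, here is a minimal sketch that generates them from the ranges above. The file names come from this thread; the one-filename-per-line format and the .jpg extension are assumptions inferred from the dataloader code quoted further down (the loader joins each list entry directly onto the image directory). No range was given for val_full_list.txt, so it is omitted here.

import os

# Split from @lidaiqing's comment above:
# 0-27999 unlabeled, 28000-29499 training, 29500-29999 testing.
splits = {
    'unlabel_data/unlabel_list.txt': range(0, 28000),
    'label_data/train_full_list.txt': range(28000, 29500),
    'label_data/test_list.txt': range(29500, 30000),
}

root = 'CelebA-HQ'  # assumed dataset root, as in the tree posted below
for rel_path, indices in splits.items():
    out_path = os.path.join(root, rel_path)
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    with open(out_path, 'w') as f:
        # One image filename per line, e.g. "28000.jpg" (extension assumed).
        f.writelines('%d.jpg\n' % i for i in indices)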

@mfredriksz commented Jul 13, 2021

Has anyone been able to find the metadata files themselves ("train_full_list.txt", "val_full_list.txt", "unlabel_list.txt", etc.)?

I can't find them anywhere in the repo or in the CelebAMask directory I downloaded, and thus cannot run the scripts that require them.

@kartheekmedathati

Has anyone faced this error message?
"ImportError: cannot import name 'CelebAMaskDataset' from 'dataloader'"

@mfredriksz

@kartheekmedathati I encountered that error as well and had to change the import statement to:
from dataloader.dataset import CelebAMaskDataset

@tommy-qichang

> Thank you for pointing it out, @greatwallet! I will update the data split file and the instructions on preparing the datasets soon. For your reference, the data split is 0-27999 as unlabeled data, 28000-29499 as training, and 29500-29999 as testing. Hope it does not block your research!

Hi @lidaiqing, I was just wondering whether you have been able to upload the split file? Also, if we are using a different dataset, what is the format of the split file, so that we can run the code? Thanks.
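
For what it's worth, judging from the __getitem__ code quoted later in this thread (each list entry is passed straight to os.path.join(self.img_dir, img_idx)), the split file appears to be plain text with one image filename per line, for example:

28000.jpg
28001.jpg
28002.jpg

This format is inferred from the dataloader, not from official documentation, so treat it as an assumption.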

@AbdouMechraoui

Was anyone able to figure out the organization of the datasets? From what I gathered from the code, it seems like:

|- CelebA-HQ
  |- unlabel_data 
     |- unlabel_list.txt (list of images basename 0..27999) 
     |- image/
  |- label_data
     |- train_full_list.txt (list of images/labels basename 28000..29499)
     |- val_full_list.txt 
     |- test_list.txt (list of images/labels basename 29500..29999)
     |- image/
     |- label/

I'm getting a FileNotFoundError when running prepare_inception.py. For some reason, the loader appends label to the unlabel_data path => /CelebAMask-HQ/CelebACropped/unlabel_data/label/0_r.jpg.

@lidaiqing

Hi @AbdouMechraoui, for preparing the inception stats, the path should point to the image folder; it calculates stats over 50K images and stores them in the --output path you defined. Please let me know if you have further questions.

@AbdouMechraoui commented Dec 9, 2021

> Hi @AbdouMechraoui, for preparing the inception stats, the path should point to the image folder; it calculates stats over 50K images and stores them in the --output path you defined. Please let me know if you have further questions.

Indeed, I get that error after pointing it at the image folder. This is what I run: CUDA_VISIBLE_DEVICES=3 python prepare_inception.py --output /projects/semanticGAN_code/output_dir --dataset_name celeba-mask /projects/datasets/CelebAMask-HQ/CelebACropped
CelebACropped is the image folder; am I missing something?

It would be really helpful to have some documentation on the structure of the dataset that is fed to the model. What does the structure of the image folder look like? Is it similar to my earlier comment?

@AbdouMechraoui

> Was anyone able to figure out the organization of the datasets? From what I gathered from the code, it seems like: [...]
> I'm getting a FileNotFoundError when running prepare_inception.py. [...]

In case you set up your dataset as I mentioned earlier, also change the code in dataset.py (Ln[162]-[163] and Ln[190]-[194]) from:

def __getitem__(self, idx):
        if idx >= self.data_size:
            idx = idx % (self.data_size)
        img_idx = self.idx_list[idx]
        img_pil = Image.open(os.path.join(self.img_dir, img_idx)).convert('RGB').resize((self.resolution, self.resolution))
        mask_pil = Image.open(os.path.join(self.label_dir, img_idx)).convert('L').resize((self.resolution, self.resolution), resample=0)
        
        if self.is_label:
            if (self.phase == 'train' or self.phase == 'train-val') and self.aug:
                augmented = self.aug_t(image=np.array(img_pil), mask=np.array(mask_pil))
                aug_img_pil = Image.fromarray(augmented['image'])
                # apply pixel-wise transformation
                img_tensor = self.preprocess(aug_img_pil)

                mask_np = np.array(augmented['mask'])
                labels = self._mask_labels(mask_np)

                mask_tensor = torch.tensor(labels, dtype=torch.float)
                mask_tensor = (mask_tensor - 0.5) / 0.5

            else:
                img_tensor = self.preprocess(img_pil)
                mask_np = np.array(mask_pil)
                labels = self._mask_labels(mask_np)

                mask_tensor = torch.tensor(labels, dtype=torch.float)
                mask_tensor = (mask_tensor - 0.5) / 0.5
            
            return {
                'image': img_tensor,
                'mask': mask_tensor
            }
        else:
            img_tensor = self.unlabel_transform(img_pil)
            return {
                'image': img_tensor,
            }

To this:

def __getitem__(self, idx):
        if idx >= self.data_size:
            idx = idx % (self.data_size)
            img_idx = self.idx_list[idx]

        if self.is_label:
        # images should be read here
            img_pil = Image.open(os.path.join(self.img_dir, img_idx)).convert('RGB').resize((self.resolution, self.resolution))
            mask_pil = Image.open(os.path.join(self.label_dir, img_idx)).convert('L').resize((self.resolution, self.resolution), resample=0)
            if (self.phase == 'train' or self.phase == 'train-val') and self.aug:
                augmented = self.aug_t(image=np.array(img_pil), mask=np.array(mask_pil))
                aug_img_pil = Image.fromarray(augmented['image'])
                # apply pixel-wise transformation
                img_tensor = self.preprocess(aug_img_pil)

                mask_np = np.array(augmented['mask'])
                labels = self._mask_labels(mask_np)

                mask_tensor = torch.tensor(labels, dtype=torch.float)
                mask_tensor = (mask_tensor - 0.5) / 0.5

            else:
                img_tensor = self.preprocess(img_pil)
                mask_np = np.array(mask_pil)
                labels = self._mask_labels(mask_np)

                mask_tensor = torch.tensor(labels, dtype=torch.float)
                mask_tensor = (mask_tensor - 0.5) / 0.5
            
            return {
                'image': img_tensor,
                'mask': mask_tensor
            }
        else:
            img_pil = Image.open(os.path.join(self.img_dir, img_idx)).convert('RGB').resize((self.resolution, self.resolution))

            # Avoids 'NoneType' object is not callable error
            if self.unlabel_transform is None:
                img_tensor = self.preprocess(img_pil)                
            else:
                img_tensor = self.unlabel_transform(img_pil)

            return {
                'image': img_tensor,
            }

@linyu0219

@AbdouMechraoui Nice work!

@mohammadrezanaderi4

Could you please also clarify how you split the JSRT, ISIC, and LITS datasets into train and test sets?

@SarthakJShetty

Also, how do we merge the different labels into a single mask? Is there a script to do this? Otherwise it seems tedious to figure out which of the 19 classes belong to the 8 new classes created in this paper (and how).

@vunh (Author) commented Feb 17, 2022

> Thank you for pointing it out, @greatwallet! I will update the data split file and the instructions on preparing the datasets soon. For your reference, the data split is 0-27999 as unlabeled data, 28000-29499 as training, and 29500-29999 as testing. Hope it does not block your research!

Hi @lidaiqing, in addition to the training and testing data, do you have validation data (for val_full_list.txt)?

@whyydsforever commented Mar 12, 2022

@AbdouMechraoui I used the structure you mentioned above, but it turned out to be:

Traceback (most recent call last):
  File "semanticGAN/prepare_inception.py", line 88, in <module>
    pools, logits = extract_features(args, loader, inception, device)
  File "C:\Users\wu\.conda\envs\pytorch\lib\site-packages\torch\autograd\grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "semanticGAN/prepare_inception.py", line 39, in extract_features
    for data in pbar:
  File "C:\Users\wu\.conda\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 521, in __next__
    data = self._next_data()
  File "C:\Users\wu\.conda\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "C:\Users\wu\.conda\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 1229, in _process_data
    data.reraise()
  File "C:\Users\wu\.conda\envs\pytorch\lib\site-packages\torch\_utils.py", line 434, in reraise
    raise exception
UnboundLocalError: Caught UnboundLocalError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "C:\Users\wu\.conda\envs\pytorch\lib\site-packages\torch\utils\data\_utils\worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "C:\Users\wu\.conda\envs\pytorch\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\wu\.conda\envs\pytorch\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\wu\.conda\envs\pytorch\lib\site-packages\torch\utils\data\dataset.py", line 308, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "D:\FYP\semanticGAN_code-main\semanticGAN\dataloader\dataset.py", line 186, in __getitem__
    img_pil = Image.open(os.path.join(self.img_dir, img_idx)).convert('RGB').resize((self.resolution, self.resolution))
UnboundLocalError: local variable 'img_idx' referenced before assignment
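
The UnboundLocalError comes from the modified snippet posted above: img_idx = self.idx_list[idx] was moved inside the if idx >= self.data_size: block, so it never runs for indices that are already in range. A minimal sketch of the fix, restoring the assignment to function level as in the original code:

def __getitem__(self, idx):
        if idx >= self.data_size:
            idx = idx % (self.data_size)
        # Assign unconditionally, not only when idx wraps around;
        # otherwise img_idx is unbound for idx < self.data_size.
        img_idx = self.idx_list[idx]
        # ... rest of the method unchanged ...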

@Mona77 commented Jun 16, 2022

> Was anyone able to figure out the organization of the datasets? From what I gathered from the code, it seems like: [...]
> I'm getting a FileNotFoundError when running prepare_inception.py. [...]

@AbdouMechraoui @lidaiqing I have downloaded CelebAMask-HQ from https://github.com/switchablenorms/CelebAMask-HQ, but the label directory does not exist. There is a directory called CelebAMask-HQ-mask-anno with the following subdirectories for body parts: 0 1 10 11 12 13 14 2 3 4 5 6 7 8 9. Do you know how I can construct the labels and populate the label directory?

@ArlenCHEN

@Mona77 You need to use the g_mask.py script from the CelebAMask-HQ repo to merge the parts into complete masks. See the instructions there for more details.
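
For reference, a minimal sketch of that merging step, assuming the standard CelebAMask-HQ layout (15 subfolders of 2,000 annotations each, part files named like 00000_skin.png). The attribute order below follows my reading of g_mask.py and determines the class index written into the merged mask, so verify it against your copy of the script; the output path matches the label_data/label folder assumed earlier in this thread.

import os
import numpy as np
from PIL import Image

# Attribute order as in g_mask.py (class 0 is background,
# class k corresponds to atts[k-1]). Verify against the upstream script.
atts = ['skin', 'nose', 'eye_g', 'l_eye', 'r_eye', 'l_brow', 'r_brow',
        'l_ear', 'r_ear', 'mouth', 'u_lip', 'l_lip', 'hair', 'hat',
        'ear_r', 'neck_l', 'neck', 'cloth']

anno_dir = 'CelebAMask-HQ/CelebAMask-HQ-mask-anno'  # the 0..14 subfolders
out_dir = 'CelebA-HQ/label_data/label'              # assumed output layout
os.makedirs(out_dir, exist_ok=True)

for img_id in range(30000):
    folder = str(img_id // 2000)  # each subfolder holds 2,000 images
    mask = np.zeros((512, 512), dtype=np.uint8)  # part masks are 512x512
    for k, att in enumerate(atts, start=1):
        part_path = os.path.join(anno_dir, folder, '%05d_%s.png' % (img_id, att))
        if os.path.exists(part_path):  # not every part exists for every face
            part = np.array(Image.open(part_path).convert('L'))
            mask[part > 0] = k  # later attributes overwrite earlier ones
    Image.fromarray(mask).save(os.path.join(out_dir, '%d.png' % img_id))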

@debuluoyizhe

Could you please upload these files: "unlabel_list.txt", "train_full_list.txt", "val_full_list.txt"? I want to know their details, thanks.

@tphankr commented Jul 14, 2023

Can anyone run this code? Please share with us. Thanks.
