Identifying Mislabeled Data using the Area Under the Margin Ranking

An implementation of the AUM Ranking method from the research paper "Identifying Mislabeled Data using the Area Under the Margin Ranking".

Original paper: https://arxiv.org/pdf/2001.10528v4

This technique can be used to identify mislabeled or difficult samples in a dataset. These samples can then be relabeled or removed to improve the final performance of a model trained on the data.
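To make the idea concrete, here is a minimal pure-Python sketch of the statistic defined in the paper: a sample's margin at one epoch is its assigned-class logit minus the largest other logit, and its AUM is the average of those margins over training. The function names (margin, area_under_margin, rank_by_aum) and the toy logits are assumptions made for illustration only; they are not the API of aum_ranking.py, and the paper's threshold-sample procedure for choosing a cut-off is left out.

```python
from typing import Dict, List, Sequence


def margin(logits: Sequence[float], assigned_label: int) -> float:
    """Assigned-class logit minus the largest logit among the other classes."""
    largest_other = max(
        value for index, value in enumerate(logits) if index != assigned_label
    )
    return logits[assigned_label] - largest_other


def area_under_margin(
    logits_per_epoch: Sequence[Sequence[float]], assigned_label: int
) -> float:
    """Average one sample's margin over all recorded epochs."""
    margins = [margin(logits, assigned_label) for logits in logits_per_epoch]
    return sum(margins) / len(margins)


def rank_by_aum(aum_by_sample: Dict[str, float]) -> List[str]:
    """Sample ids sorted from lowest AUM (most suspicious) to highest."""
    return sorted(aum_by_sample, key=aum_by_sample.get)


# Hypothetical logits for two samples over three epochs (made-up numbers).
# Each entry is (logits recorded at each epoch, assigned label).
history = {
    "clean": ([[2.0, 0.5, 0.1], [3.0, 0.2, 0.0], [4.0, 0.1, -0.5]], 0),
    "suspect": ([[0.5, 1.0, 0.0], [0.2, 2.0, 0.1], [0.0, 3.0, -0.2]], 0),
}
aum_scores = {
    sample_id: area_under_margin(logits, label)
    for sample_id, (logits, label) in history.items()
}
print(rank_by_aum(aum_scores))
```

In this toy data the "suspect" sample's assigned class is consistently outscored by another class, so its AUM is negative and it is ranked first. Samples at the low end of such a ranking are the candidates for relabeling or removal.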

Project structure

identify_mislabeled_data.ipynb is an example showing how to apply AUM Ranking to identify mislabeled samples in a dataset. It outputs TensorBoard logs to runs/, which can be viewed with tensorboard --logdir runs/.
aum_ranking.py contains all the code specific to AUM Ranking.
models.py defines the ResNet-32 model used in the AUM paper.
test_aum_ranking.py contains tests for aum_ranking.py.

Setup

1. Virtual environment

Ensure you have Python installed, then create a virtual environment and activate it.

2. Install PyTorch packages

With the virtual environment activated, run

pip install -r requirements_pytorch.txt [--index-url INDEX_URL]

Specify the --index-url option only if https://pytorch.org/get-started/locally/ advises it for your setup.

3. Install remaining packages

Now run

pip install -r requirements_main.txt

to install the remaining packages.

You should now be able to run identify_mislabeled_data.ipynb.

4. (Optional) Install dev packages

If you want to be able to run the tests, then run

pip install -r requirements_dev.txt

to install pytest.

To run the tests, run the command pytest . (including the full stop).
