This is a PyTorch implementation of the paper "A simple neural network module for relational reasoning", with an interactive GUI question-answering interface.
- tqdm
- tensorboardX
- numpy
- pytorch
- matplotlib
- colorlog
- PyQt5
- cv2
- PIL
- progressbar
Sort-of-CLEVR is a simplified version of CLEVR proposed by the authors. It is composed of 10000 images with 20 questions (8 relational and 12 non-relational) per image. Each of 6 colors (blue, green, red, yellow, magenta, cyan) is assigned to a randomly chosen shape (square or circle), and the resulting objects are placed in an image.
Non-relational questions are composed of 3 subtypes:
- Shape of a certain colored object
- Horizontal location of a certain colored object: whether it is on the left or the right side of the image
- Vertical location of a certain colored object: whether it is in the upper or the lower half of the image
These questions are "non-relational" because the agent only needs to focus on a single object.
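The three non-relational subtypes can be sketched as a lookup on a single object. This is only an illustration, not the generator's actual code: the scene layout (a list of dicts with `color`, `shape`, and `center`), the 128-pixel image size, the subtype names, and the convention that y grows downward are all assumptions.

```python
# Hypothetical scene format: one dict per object (an assumption, not the generator's format).
IMAGE_SIZE = 128  # assumed image width/height in pixels


def answer_non_relational(scene, color, subtype):
    """Answer a non-relational question by looking only at the object of `color`."""
    obj = next(o for o in scene if o["color"] == color)
    x, y = obj["center"]
    if subtype == "shape":
        return obj["shape"]
    if subtype == "horizontal":
        return "left" if x < IMAGE_SIZE / 2 else "right"
    if subtype == "vertical":
        # Assumes image coordinates where y grows downward.
        return "up" if y < IMAGE_SIZE / 2 else "down"
    raise ValueError(f"unknown subtype: {subtype}")


scene = [
    {"color": "red", "shape": "circle", "center": (20, 100)},
    {"color": "blue", "shape": "square", "center": (90, 30)},
]
print(answer_non_relational(scene, "red", "shape"))
print(answer_non_relational(scene, "blue", "horizontal"))
```

Note that every answer depends on one object only; no other entry of `scene` is read.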
Relational questions are composed of 2 subtypes:
- Color of the object which is closest to a certain colored object
- Color of the object which is furthest from a certain colored object
These questions are "relational" because the agent has to consider the relations between objects.
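A minimal sketch of how such answers could be computed, assuming Euclidean distance between object centers. The scene layout (dicts with `color` and `center`) and the subtype names are hypothetical and chosen only for illustration; the generator may measure distance differently.

```python
import math


def answer_relational(scene, color, subtype):
    """Return the color of the object closest to / furthest from the object of `color`."""
    ref = next(o for o in scene if o["color"] == color)
    others = [o for o in scene if o["color"] != color]

    def distance(o):
        # Euclidean distance between centers (an assumption about the metric).
        return math.dist(o["center"], ref["center"])

    if subtype == "closest":
        return min(others, key=distance)["color"]
    if subtype == "furthest":
        return max(others, key=distance)["color"]
    raise ValueError(f"unknown subtype: {subtype}")


scene = [
    {"color": "red", "center": (10, 10)},
    {"color": "green", "center": (20, 10)},
    {"color": "blue", "center": (100, 100)},
]
print(answer_relational(scene, "red", "closest"))
print(answer_relational(scene, "red", "furthest"))
```

Unlike the non-relational case, the answer here requires comparing the reference object against every other object in the scene.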
Questions are encoded into a vector of size 13: 6 entries for a one-hot vector selecting one of the 6 colors, 2 entries for a one-hot vector marking the question as relational or non-relational, and 5 entries for a one-hot vector selecting one of the 5 question subtypes.
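The 6 + 2 + 5 layout can be sketched as follows. The ordering of the colors, of the relational flag, and of the subtypes inside the vector is an assumption made for this example, as are the subtype names; the generator may order them differently.

```python
# Assumed orderings; only the sizes (6, 2, 5) come from the description above.
COLORS = ["blue", "green", "red", "yellow", "magenta", "cyan"]
SUBTYPES = ["shape", "horizontal", "vertical", "closest", "furthest"]
RELATIONAL = {"closest", "furthest"}


def encode_question(color, subtype):
    """Encode a question as a 13-dimensional vector made of three one-hot segments."""
    vec = [0] * 13
    vec[COLORS.index(color)] = 1                        # entries 0-5: color
    vec[6 + (1 if subtype in RELATIONAL else 0)] = 1    # entries 6-7: non-relational / relational
    vec[8 + SUBTYPES.index(subtype)] = 1                # entries 8-12: question subtype
    return vec


print(encode_question("red", "closest"))
```

Every encoded question has exactly three entries set to 1, one per segment.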
Go to the directory "DataGenerator" and run the following command in a terminal:
python sort_of_clevr_generator.py
A folder "datasets/Sort-of-CLEVR_default" will be created, containing two files: data.hy and id.txt.
data.hy contains the images, questions, and answers, while id.txt contains the id corresponding to each triplet (image, question, answer).
Go to the root directory and run the following command in a terminal:
python trainer.py
Note that training an RN module requires a GPU on your local machine. In my experience, training takes approximately 30 minutes on a GTX 1060 (6GB) graphics card. The train:valid:test ratio is 75%:15%:15%. The overall test accuracy is approximately 95.933%.
The figures below give a brief view of the convergence rate. The current model is saved after every epoch, and the 80-epoch model is used as the final model.
Go to the directory "InteractiveUI" and run the following command:
python ui_main.py