***This project is still in progress***

This repository contains a PyTorch implementation of an image captioning model inspired by the Show, Attend and Tell paper (https://arxiv.org/abs/1502.03044) and the Sequence Generative Adversarial Network (SeqGAN) paper (https://arxiv.org/abs/1609.05473). The model itself was proposed in the paper "Improving Image Captioning with Conditional Generative Adversarial Nets" (https://arxiv.org/abs/1805.07112). The only difference is that I do not use evaluation metrics as feedback for the generator during adversarial training.
For readability and convenience, the three main stages of the training process are split across three Python scripts:
* train_mle.py: pretrains the generator with maximum likelihood estimation (essentially the same as the Show, Attend and Tell model)
* pretrain_discriminator.py: pretrains the discriminator, a GRU that takes the features of an image as its first input, followed by the corresponding caption (a minimal sketch of this design follows the list)
* train_pg.py: adversarial training using policy gradients, as proposed in the SeqGAN paper
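
The discriminator design described above can be pictured roughly as follows. This is a minimal, illustrative PyTorch sketch, not the actual code in pretrain_discriminator.py; the class name, argument names, and the single-layer GRU are assumptions made for the example:

```python
import torch
import torch.nn as nn

class CaptionDiscriminator(nn.Module):
    """Illustrative GRU discriminator: the image features are fed in as the
    first time step, followed by the embedded caption tokens."""

    def __init__(self, vocab_size, embed_dim, feature_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Project the image features into the same space as the word embeddings
        self.feature_proj = nn.Linear(feature_dim, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, 1)

    def forward(self, image_features, captions):
        # image_features: (batch, feature_dim); captions: (batch, seq_len) of token ids
        img = self.feature_proj(image_features).unsqueeze(1)  # (batch, 1, embed_dim)
        words = self.embedding(captions)                      # (batch, seq_len, embed_dim)
        inputs = torch.cat([img, words], dim=1)               # image first, then the caption
        _, h_n = self.gru(inputs)
        # Probability that the caption is a real (human-written) caption for this image
        return torch.sigmoid(self.classifier(h_n.squeeze(0)))
```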

The code is functional (if I find bugs, I will try to fix them immediately). For the MLE stage, I have been able to reach a BLEU-4 score of 0.197 on the Flickr8k validation set (without beam search). During adversarial training, the BLEU-4 score rose from 0.197 to 0.21; however, I am still tuning the model's hyperparameters. Research papers proposing similar models are vague about how many discriminator iterations to run for every generator iteration during adversarial training, i.e., the discriminator-to-generator iteration ratio. In practice it seems better to train the discriminator for more iterations than the generator (I have yet to find the ideal ratio); I have currently set the ratio to 10:1, based mostly on the observations in https://arxiv.org/abs/1804.00861.
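
For reference, the loop I am describing would look roughly like the following. This is a schematic sketch, not the actual train_pg.py: `generator`, `discriminator`, their optimizers, the data loader, and a `generator.sample` method that returns sampled captions with their log-probabilities are all assumed, and the Monte Carlo rollouts used in SeqGAN are replaced here by a single sentence-level reward from the discriminator for brevity:

```python
import torch

# Discriminator-to-generator iteration ratio discussed above
D_STEPS_PER_G_STEP = 10

for images, real_captions in data_loader:
    # --- Discriminator: trained for more iterations than the generator ---
    for _ in range(D_STEPS_PER_G_STEP):
        fake_captions, _ = generator.sample(images)  # sampled token ids
        real_scores = discriminator(images, real_captions)
        fake_scores = discriminator(images, fake_captions)
        disc_loss = -(torch.log(real_scores + 1e-8).mean()
                      + torch.log(1 - fake_scores + 1e-8).mean())
        disc_opt.zero_grad()
        disc_loss.backward()
        disc_opt.step()

    # --- Generator: one policy-gradient (REINFORCE) step ---
    sampled_captions, log_probs = generator.sample(images)  # log_probs: (batch, seq_len)
    rewards = discriminator(images, sampled_captions)        # (batch, 1), used as the reward
    gen_loss = -(log_probs.sum(dim=1) * rewards.detach().squeeze(-1)).mean()
    gen_opt.zero_grad()
    gen_loss.backward()
    gen_opt.step()
```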

To run the program:
* Place the data splits provided by Andrej Karpathy (http://cs.stanford.edu/people/karpathy/deepimagesent/caption_datasets.zip) in the karpathy_splits folder
* Place images from Flickr8k, Flickr30k, or MSCOCO within the images folder (make sure to place them in the correct subdirectory). For MSCOCO, place the train2014 and val2014 folders as-is into the correct image folder