How can I use the pre-trained model to generate images from specific captions?

![Screenshot from your paper](https://user-images.githubusercontent.com/41807109/84665970-10f72480-af53-11ea-9ea0-f921cdf5fe45.png)
For example, how can I generate an image that corresponds to the caption "a person skateboarding in the street with some people looking on"?