Issue Description
I am trying to incrementally train a word2vec model and analyse the training time and the vector-space differences compared with a model obtained through batch training. So far, the only example I have found relevant to this is the word2vec uptraining example, and I was wondering what the input data should be for each subsequent incremental training after the first one.
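For the vector-space part of the comparison, keep in mind that two separately trained models generally do not share a coordinate system (each starts from a random initialisation), so comparing raw coordinates between the incremental and the batch model says little. A common alternative is to compare each word's nearest-neighbour lists. The following is a minimal, library-independent sketch in plain Java, assuming the vectors of both models have already been exported into Map<String, double[]> tables (for example, by calling getWordVector(String) on each model for every vocabulary word). The class and method names are hypothetical and not part of any library.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: compares two embedding tables by top-k nearest-neighbour overlap. */
class EmbeddingComparison {

    /** Cosine similarity between two equal-length vectors (0.0 if either is all zeros). */
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        if (na == 0 || nb == 0) return 0.0; // avoid division by zero
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /** The k words most similar to `word` (excluding the word itself), by cosine similarity. */
    static List<String> nearest(Map<String, double[]> vectors, String word, int k) {
        double[] target = vectors.get(word);
        List<String> candidates = new ArrayList<>();
        for (String w : vectors.keySet()) {
            if (!w.equals(word)) candidates.add(w);
        }
        // Sort in descending order of similarity to the target word.
        candidates.sort(Comparator.comparingDouble((String w) -> cosine(target, vectors.get(w))).reversed());
        return candidates.subList(0, Math.min(k, candidates.size()));
    }

    /** Fraction of the top-k neighbours of `word` shared by both models (1.0 = identical sets). */
    static double overlapAtK(Map<String, double[]> modelA, Map<String, double[]> modelB, String word, int k) {
        Set<String> shared = new HashSet<>(nearest(modelA, word, k));
        shared.retainAll(new HashSet<>(nearest(modelB, word, k)));
        return (double) shared.size() / k;
    }
}
```

Averaging overlapAtK over a fixed sample of words gives a single number that is unaffected by the models living in different coordinate systems.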
In the Word2VecUptrainingExample, the same raw_text file is used for both the first and the second training. Am I right to say that for subsequent incremental trainings, the input data should always include the original data set plus whatever data has been newly added?
Also, is it possible to perform incremental training on paragraph vectors? I have tried using a DocumentIterator with trainWordVector set to true, but the nearestWords test shows document indices among the results.
Lastly, I find it very strange that in all my incremental trainings with a previously trained word2vec model loaded, the nearestWords test always shows the same results as the loaded model does on its own. Something is clearly missing here; please advise.
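One way to narrow this down is to check whether the incremental fit changes the word vectors at all. The idea: copy a word's vector before the incremental fit (using clone(), in case the returned array shares storage with the model), copy it again afterwards, and measure the largest per-dimension change. The sketch below is plain Java; the class name and the tolerance are hypothetical choices, not part of any library.

```java
/** Hypothetical helper: checks whether a word vector changed between two snapshots. */
class VectorDrift {

    /** Largest absolute per-dimension difference between two snapshots of the same vector. */
    static double maxAbsDiff(double[] before, double[] after) {
        if (before.length != after.length) {
            throw new IllegalArgumentException("vectors have different dimensions");
        }
        double max = 0.0;
        for (int i = 0; i < before.length; i++) {
            max = Math.max(max, Math.abs(after[i] - before[i]));
        }
        return max;
    }

    /** True if no dimension moved by more than the tolerance, i.e. the update had no visible effect. */
    static boolean unchanged(double[] before, double[] after, double tolerance) {
        return maxAbsDiff(before, after) <= tolerance;
    }
}
```

If vectors of words that appear in the new data come back unchanged, the second fit is probably not updating the weights at all. If they do change but the nearestWords lists stay the same, the updates may simply be too small to reorder the neighbours.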