Changelog

📢 Release v1.0.3

  • 🚨 The IndicProcessor class has been rewritten in Cython for faster processing. This yields a throughput gain of at least 10 lines/s.
  • A new visualize argument has been added to preprocess_batch to track processing progress with a tqdm bar.

📢 Release v1.0.2

  • The repository has been renamed to IndicTransToolkit.
  • 🚨 The custom tokenizer has been removed from the repository. To use it, revert to a previous commit (v1.0.1), although this is strongly discouraged. The official (and only) tokenizer is available on HF along with the models.

📢 Release v1.0.0

  • The PreTrainedTokenizer for IndicTrans2 is now available on HF 🎉🎉 Note that you still need the IndicProcessor to pre-process the sentences before tokenization.
  • 🚨 The custom tokenizer is deprecated in favor of the standard PreTrainedTokenizer. It remains available here for backward compatibility, but will receive no further updates or bug fixes.
  • The indic_evaluate function is now consolidated into a concrete IndicEvaluator class.
  • The data collation function for training is consolidated into a concrete IndicDataCollator class.
  • A simple batching method is now available in the IndicProcessor class.