Pre-compute Coefficients for Common Languages in CharAugmenter #10

LSanselme · 2024-01-16T19:51:59Z

Issue description

Textnoisr uses a correction coefficient to account for repeated consecutive letters in natural language.

As @felix-martel-prl said in #7 (review):

The next step would be to pre-compute this coefficient for a range of common languages
CharAugmenter(language="en") is better than CharAugmenter(natural_language_swap_correction=1.052).

It could indeed enhance readability and make the code easier to use for non-English languages.

Suggested Implementation Steps:

Identify a set of common languages for pre-computation.
Implement a mechanism to store and retrieve pre-computed coefficients.
Update the CharAugmenter module to use pre-computed coefficients when available.
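
The storage and retrieval steps above could be sketched as follows. This is only an illustration, not textnoisr's actual implementation: the registry name `PRECOMPUTED_SWAP_CORRECTIONS`, the helper `resolve_swap_correction`, and the fallback of 1.0 when nothing is specified are all assumptions. The only coefficient taken from this issue is the English value, 1.052.

```python
from typing import Optional

# Hypothetical registry of pre-computed coefficients, keyed by language code.
# Only the English value (1.052) is quoted in this issue; other languages
# would be added once their coefficients have actually been computed.
PRECOMPUTED_SWAP_CORRECTIONS: dict[str, float] = {"en": 1.052}


def resolve_swap_correction(
    language: Optional[str] = None,
    natural_language_swap_correction: Optional[float] = None,
) -> float:
    """Pick the coefficient to use (hypothetical helper).

    An explicit coefficient wins over a language lookup, so existing
    callers that pass natural_language_swap_correction are unaffected.
    """
    if natural_language_swap_correction is not None:
        return natural_language_swap_correction
    if language is None:
        # Assumption: fall back to 1.0 when neither argument is given.
        return 1.0
    try:
        return PRECOMPUTED_SWAP_CORRECTIONS[language.lower()]
    except KeyError:
        supported = ", ".join(sorted(PRECOMPUTED_SWAP_CORRECTIONS))
        raise ValueError(
            f"No pre-computed coefficient for language {language!r} "
            f"(supported: {supported}); "
            "pass natural_language_swap_correction explicitly instead."
        ) from None
```

With a helper like this, CharAugmenter could accept a `language` argument and call the helper once in its constructor. Raising an error for an unknown language, rather than silently falling back, keeps a typo such as `language="eng"` from going unnoticed.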


LSanselme added the enhancement New feature or request label Jan 16, 2024
