Hi!
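Unlike most other languages written in the Latin alphabet, Vietnamese puts spaces between syllables rather than between words[1]. Many words span several syllables (e.g. "đại học", "university"), so the current spaces morphemizer, which treats each space-separated token as a word, is unsuitable.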
[1] Fun read https://www.tandfonline.com/doi/pdf/10.1080/00437956.1963.11659787
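To illustrate the problem, here is a minimal sketch that contrasts plain space splitting with greedy forward maximum matching over syllables. It is not pyvi's approach, and the tiny lexicon is hypothetical. It only shows why a Vietnamese morphemizer has to group syllables into words instead of splitting on every space.

```python
# Hypothetical toy lexicon of multi-syllable Vietnamese words, for illustration
# only; a real segmenter such as pyvi does not work from a list like this.
LEXICON = {"đại học", "bách khoa", "hà nội", "sinh viên"}
MAX_SYLLABLES = 3  # longest word length (in syllables) to try


def split_on_spaces(text: str) -> list[str]:
    """What a space-based morphemizer does: one token per syllable."""
    return text.split()


def max_match(text: str, lexicon: set[str] = LEXICON,
              max_len: int = MAX_SYLLABLES) -> list[str]:
    """Greedy forward maximum matching over space-separated syllables."""
    syllables = text.lower().split()
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first; a single syllable always matches.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + n])
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words


if __name__ == "__main__":
    sentence = "Sinh viên đại học bách khoa Hà Nội"
    print(split_on_spaces(sentence))  # 8 syllables, not words
    print(max_match(sentence))        # 4 multi-syllable words
```

Space splitting yields eight syllables, while the matcher groups them into "sinh viên", "đại học", "bách khoa" and "hà nội". Dictionary matching breaks down on unknown words and ambiguous groupings, which is why a dedicated segmenter is preferable in practice.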
I wasn't able to find a small library that does word segmentation for Vietnamese the way Jieba does for Chinese. Bundling pyvi in-code, as has been done with Jieba, would also require bundling several large dependencies (e.g. NumPy).
So if this is merged as-is, the burden unfortunately falls on the end user to set up pyvi before Vietnamese support works. On the other hand, users who don't want it won't see it, and it won't affect their usage.
If this gets included, I'll look into packaging pyvi and its dependencies as a separate addon, as has been done for MeCab, licences permitting. That would make installation more straightforward and avoid forcing users to run the source version of Anki.