Add Vietnamese support using pyvi #113

kurtisc · 2020-04-21T22:58:43Z

Hi!

Unlike most other languages that use the Latin alphabet, Vietnamese doesn't separate words with spaces[1]; its spaces fall between syllables instead, so the current spaces morphemizer is unsuitable.

[1] Fun read https://www.tandfonline.com/doi/pdf/10.1080/00437956.1963.11659787
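To illustrate the problem, here is a small sketch. The sentence and its hand-labelled word grouping are my own illustrative assumptions, not taken from this pull request; the point is only that splitting on spaces yields syllables rather than words.

```python
# Illustrative only: the sentence and the hand-made grouping below are
# assumptions for demonstration, not data from the pull request.
sentence = "Trường đại học bách khoa Hà Nội"

# What a whitespace-based morphemizer sees: one token per syllable.
syllables = sentence.split()

# A hand-labelled word segmentation of the same sentence, where
# multi-syllable words are kept together.
words = ["Trường", "đại học", "bách khoa", "Hà Nội"]

print(len(syllables), syllables)
print(len(words), words)
```

A spaces-based morphemizer would treat each of the seven syllables as a separate morpheme, whereas a segmenter is expected to group them into fewer, multi-syllable words.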

I wasn't able to find a small library that does word segmentation for Vietnamese the way Jieba does for Chinese. Bundling pyvi in-code, as was done for Jieba, would also require bundling several larger dependencies (e.g. NumPy).

So, if merged like this, it's unfortunately a burden on the end user to get the Vietnamese support working. On the other hand, if they don't want it, it won't appear or impact their usage.
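As a rough sketch of the "only appears if installed" behaviour described above: the helper names `load_vietnamese_segmenter` and `available_morphemizers` are hypothetical and are not MorphMan's actual API, and `str.split` merely stands in for the existing spaces morphemizer. The only pyvi call used is `ViTokenizer.tokenize`, which returns a string in which the syllables of one word are joined by underscores.

```python
def load_vietnamese_segmenter():
    """Return a function str -> list[str], or None if pyvi is unavailable.

    Hypothetical helper for illustration; not part of MorphMan.
    """
    try:
        from pyvi import ViTokenizer  # optional third-party dependency
    except ImportError:
        return None

    def segment(text):
        # ViTokenizer.tokenize joins the syllables of one word with "_",
        # so split on whitespace and turn the underscores back into spaces.
        tokens = ViTokenizer.tokenize(text).split()
        return [token.replace("_", " ") for token in tokens]

    return segment


def available_morphemizers():
    """Build a name -> callable registry (hypothetical, for illustration)."""
    # "spaces" stands in for the existing space-based morphemizer.
    morphemizers = {"spaces": str.split}
    segment = load_vietnamese_segmenter()
    if segment is not None:
        # Only offered when pyvi could be imported.
        morphemizers["vietnamese"] = segment
    return morphemizers


if __name__ == "__main__":
    print(sorted(available_morphemizers()))
```

With this shape, users without pyvi see only the existing entries, and nothing else changes for them.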

If this gets included I'll look into packaging pyvi and its dependencies as a separate addon, as has been done for Mecab, licences permitting. That would make the installation more straightforward and avoid forcing use of the source version of Anki.

kurtisc · 2020-08-15T19:37:42Z

Rebased on master and confirmed working when #125 is merged.

With regard to #145: I do have a test for this morphemizer, so hopefully that fulfils @shanrauf's comment.

ianki · 2020-11-09T20:41:55Z

Would you mind rebasing again, so I can see if the tests pass? I'll submit after.

smartlitchi · 2020-11-13T10:19:49Z

I am really interested in this

sedosido · 2021-08-15T20:41:12Z

I haven’t been able to build Anki from source to import pyvi (I think because my hardware is a little old). Is there any other way I can get Vietnamese parsing to work with MorphMan?

Irio mentioned this pull request Jul 1, 2020

Support new languages #145

Open

kurtisc force-pushed the master branch from 4a792d4 to e371210 on August 15, 2020 19:17

kurtisc mentioned this pull request Aug 15, 2020

Add Vietnamese support using pyvi kurtisc/MorphMan#2

Merged

kurtisc force-pushed the master branch from e371210 to 4c9fb36 on November 13, 2020 10:44

Add Vietnamese support using pyvi

5bbb518

kurtisc force-pushed the master branch from 4c9fb36 to 5bbb518 on November 13, 2020 10:49
