Add support for spaCy. #231

rteabeault

Fixes #162

  • The previous attempt was to create an AnkiSpacy addon that acted as a package
    manager for installing spaCy and its models and notifying other addons. This
    posed several issues (mainly on Windows). See "Issues with Japanese model
    installation" (rteabeault/AnkiSpacy#7).
  • This solution is essentially a continuation of @ianki's solution in
    "Add initial support for SpaCy and SudachiPy parsers" (#193).
  • It uses a Python executable path to run spaCy and discover which models are installed.
  • It uses the same Python path to run a subprocess that listens on stdin and
    uses spaCy to parse the passed text.
  • An unfortunate side effect of this is that there is currently no good way to
    kill the subprocess after a recalc. This may not be an issue in practice, but
    in the future it may be good to have an open/close for morphemizers. These could
    be used to initialize the subprocess and then close it afterwards.
  • Created a MorphemizerRegistry that contains all registered morphemizers.
    Adding and removing morphemizers fires events. The MorphemizerComboBox listens
    to these events to keep itself updated.
  • A fake_aqt was being added to sys.modules for tests but this was done in
    all_tests.py. This meant you could not run the tests individually. Moved
    all modifications of sys.modules into fake_aqt and added some additional
    sys.modules needed for new tests.
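
The subprocess approach and the proposed open/close lifecycle from the list above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code in this PR: `WORKER_SOURCE` and `SubprocessMorphemizer` are hypothetical names, `sys.executable` stands in for the user-configured Python path, and a whitespace split stands in for the spaCy pipeline.

```python
import json
import subprocess
import sys

# Hypothetical child script: reads one JSON-encoded expression per line on
# stdin and answers with one JSON line on stdout. str.split() is only a
# placeholder for a real spaCy pipeline, which is assumed and not shown.
WORKER_SOURCE = r"""
import json, sys
for line in sys.stdin:
    expression = json.loads(line)
    tokens = expression.split()  # placeholder for spaCy tokenization
    sys.stdout.write(json.dumps(tokens) + "\n")
    sys.stdout.flush()
"""


class SubprocessMorphemizer:
    """Hypothetical wrapper with an explicit open/close lifecycle."""

    def __init__(self, python_path=sys.executable):
        self.python_path = python_path
        self.proc = None

    def open(self):
        # Start the long-lived worker once, before a recalc.
        self.proc = subprocess.Popen(
            [self.python_path, "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            encoding="utf-8",
        )

    def parse(self, expression):
        # One request line out, one reply line back.
        self.proc.stdin.write(json.dumps(expression) + "\n")
        self.proc.stdin.flush()
        return json.loads(self.proc.stdout.readline())

    def close(self):
        if self.proc is not None:
            self.proc.stdin.close()  # EOF ends the child's read loop
            self.proc.wait()
            self.proc.stdout.close()
            self.proc = None

    def __enter__(self):
        self.open()
        return self

    def __exit__(self, *exc_info):
        self.close()
```

Used as `with SubprocessMorphemizer() as m: m.parse("hello world")`, the worker is started once and shut down when the block exits, which is the open/close behaviour suggested above.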

@rteabeault
Author

@ianki, @thinkingbox12 New pull request for spaCy support. Tests pass locally for me but fail in the automated build. I will look into that shortly, but I wanted you both to take a look at this in the meantime.

@nlovell1

Sorry for the delayed reply. It seems to me that this new version is not communicating with the spaCy addon. When I went into MorphMan to change the recalc preferences, it could not see the installed models. I tried this on a fresh profile as well.

Let me know if there's anything else besides that specifically you want checked.

@rteabeault
Author

rteabeault commented Jan 12, 2021

@thinkingbox12 did you read the instructions in the readme? This does not use the spacy addon that I wrote.

@nlovell1

Nope. My fault. Will read the readme and try again tomorrow.

@rteabeault
Author

I have installed Ubuntu 18.04 and Python 3.7 and am still unable to reproduce this test failure. I will continue to investigate.

@nlovell1

Will try Ubuntu today. Got caught up with other things, sorry about the delay.

@nlovell1

nlovell1 commented Jan 17, 2021

Well, everything seems to be working fine for me on Ubuntu 20.04.1 LTS (64-bit) with Python 3.8.5.
I could download the model and link it properly in the terminal. Not sure if this has anything to do with it, but I kept the old spaCy package manager addon in the profile. I don't think it makes a difference, though, because Python obviously couldn't see the models I had previously installed through the spaCy package manager.
I could recalc properly with a few Japanese notes, and the morph count updated. Reading known.db also made sense to me. TL;DR: everything is good on my end.
Sorry again for the delay.

Russell Teabeault added 2 commits January 17, 2021 22:04
@rteabeault
Author

Tests fixed. @ianki Can you please take a look? Thanks!

@nlovell1

Any updates on merging this into the default MorphMan version?

@nlovell1

nlovell1 commented Mar 19, 2021

After a fresh install on Windows, I get this exception after a while. @rteabeault, any guesses?

Error
An error occurred. Please start Anki while holding down the shift key, which will temporarily disable the add-ons you have installed.
If the issue only occurs when add-ons are enabled, please use the Tools > Add-ons menu item to disable some add-ons and restart Anki, repeating until you discover the add-on that is causing the problem.
When you've discovered the add-on that is causing the problem, please report the issue on the add-on support site.
Debug info:
Anki 2.1.35 (84dcaa86) Python 3.8.0 Qt 5.14.2 PyQt 5.14.2
Platform: Windows 10
Flags: frz=True ao=True sv=1
Add-ons, last update check: 2021-03-18 23:12:43

Caught exception:
Traceback (most recent call last):
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\__init__.py", line 17, in onMorphManRecalc
    main.main()
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\main.py", line 573, in main
    allDb = mkAllDb(cur)
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\main.py", line 195, in mkAllDb
    ms = getMorphemes(morphemizer, fieldValue, ts)
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\morphemes.py", line 166, in getMorphemes
    ms = morphemizer.getMorphemesFromExpr(expression)
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\morphemizer.py", line 51, in getMorphemesFromExpr
    morphs = self._getMorphemesFromExpr(expression)
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\deps\spacy\morphemizer.py", line 40, in _getMorphemesFromExpr
    self.proc.stdin.flush()
OSError: [Errno 22] Invalid argument
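
One plausible reading of this traceback is that the worker process had already exited, so the parent was flushing into a pipe with nothing on the other end; on Windows that can surface as `OSError: [Errno 22]` rather than a `BrokenPipeError`. That reading is an assumption and is not confirmed in this thread. A minimal sketch of checking for a dead child before writing, using a hypothetical helper `send_line` that is not part of MorphMan:

```python
def send_line(proc, text):
    """Write one line to the child, raising a descriptive error if it died.

    `proc` is assumed to be a subprocess.Popen opened in text mode with
    stdin=subprocess.PIPE.
    """
    exit_code = proc.poll()  # None while the child is still running
    if exit_code is not None:
        raise RuntimeError(f"worker exited with code {exit_code}")
    try:
        proc.stdin.write(text + "\n")
        proc.stdin.flush()
    except OSError as err:  # BrokenPipeError is a subclass of OSError
        raise RuntimeError(f"worker pipe is unusable: {err}") from err
```

Failing with the worker's exit code makes it clearer that the problem lies in the child process rather than in the expression being parsed.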

@nlovell1

Another exception: I get either this one or the previous one. I tried reinstalling spaCy and the models many times, with no luck. Is spaCy support still being developed? I've been looking into some cool Japanese features in the meantime.

Caught exception:
Traceback (most recent call last):
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\__init__.py", line 17, in onMorphManRecalc
    main.main()
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\main.py", line 573, in main
    allDb = mkAllDb(cur)
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\main.py", line 195, in mkAllDb
    ms = getMorphemes(morphemizer, fieldValue, ts)
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\morphemes.py", line 166, in getMorphemes
    ms = morphemizer.getMorphemesFromExpr(expression)
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\morphemizer.py", line 51, in getMorphemesFromExpr
    morphs = self._getMorphemesFromExpr(expression)
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\deps\spacy\morphemizer.py", line 41, in _getMorphemesFromExpr
    morphs = json.loads(self.proc.stdout.readline())
  File "json\__init__.py", line 357, in loads
  File "json\decoder.py", line 337, in decode
  File "json\decoder.py", line 355, in raw_decode
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
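
The message `Expecting value: line 1 column 1 (char 0)` is what `json.loads` raises for an empty string, and `readline()` returns an empty string at end of file. So one possible cause, assumed here rather than confirmed, is that the worker had closed its stdout before replying. A minimal sketch that separates that case from genuinely malformed output, using a hypothetical helper `read_reply`:

```python
import json


def read_reply(proc):
    """Read one JSON line from the child, distinguishing EOF from bad JSON.

    `proc` is assumed to be a subprocess.Popen opened in text mode with
    stdout=subprocess.PIPE.
    """
    line = proc.stdout.readline()
    if line == "":
        # An empty string (not even "\n") means end of file: the child
        # closed stdout, usually because it exited.
        raise RuntimeError("worker closed its output; check its stderr")
    try:
        return json.loads(line)
    except json.JSONDecodeError as err:
        raise RuntimeError(f"worker sent non-JSON output: {line!r}") from err
```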

@nlovell1

nlovell1 commented Mar 30, 2021

@rteabeault The problem above results from an encoding problem in the terminal on Windows (with the Japanese model, specifically). My guess is that this behavior was introduced by a recent Windows feature update: the terminal effectively displays Japanese characters as the Unicode 'unknown character' glyph, so when they get passed through to SudachiPy, it fails and an exception results.

I suspect that changing the region and locale to Japan, so that the terminal supports UTF-8 and Japanese glyphs, might solve the problem, but this has not been tested yet and is probably not a practical solution for most users of this addon.
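
If the failure really is an encoding mismatch on the pipes (an assumption based on the diagnosis above, not verified here), one option that avoids changing the system locale is to force UTF-8 on both ends of the pipe. In this minimal sketch, `PYTHONIOENCODING` is a standard CPython environment variable and `encoding=` is a standard `subprocess.Popen` argument; `start_utf8_worker` and the echoing child are hypothetical stand-ins for the real spaCy worker.

```python
import os
import subprocess
import sys

# Stand-in child that echoes each line back; the real worker would call spaCy.
ECHO_SOURCE = """
import sys
for line in sys.stdin:
    sys.stdout.write(line)
    sys.stdout.flush()
"""


def start_utf8_worker(python_path=sys.executable):
    """Start the child with UTF-8 stdio regardless of the console code page."""
    # Child side: make the child's sys.stdin/sys.stdout use UTF-8.
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    return subprocess.Popen(
        [python_path, "-c", ECHO_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        encoding="utf-8",  # parent side: encode/decode the pipes as UTF-8
        env=env,
    )
```

With both sides pinned to UTF-8, Japanese text should survive the round trip independently of the terminal's locale settings.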

The most recent version of this repo works just fine on Ubuntu.

I am interested in development for spaCy 3.0, which might simplify the link process, since linking was revamped and is now considered obsolete. AFAIK some of the syntax has changed slightly, so the current code doesn't work with it.

EDIT

Oddly enough, though, on Ubuntu, when upgrading from SudachiPy 0.4.5 (which worked) to 0.4.9, I got the same exception that I did on Windows. Upgrading again on Ubuntu to 0.5.2 resolved the issue. Is this coincidental?

@RawToast

Just wondering, if spaCy provides better analysis than MeCab, then perhaps it would be better as a new add-on? I almost see MorphMan as an add-on for Japanese and not other languages (that all came later).

There's a lot in this repo, and maybe by replacing MeCab and only using spaCy, lots of code could be removed and the codebase simplified?

@ianki
Collaborator

ianki commented Jan 17, 2022

Hey guys, sorry for the long wait on this. What's the current state of this support? Should I look to merge this?

@ianki
Collaborator

ianki commented Jan 17, 2022

I was able to rebase this, and it seems to work OK after fixing handling of new lines in the expressions.
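
The newline issue is inherent to any line-per-request protocol: a raw newline inside an expression splits one request into two. How the rebase actually fixed it is not shown in this thread. One common approach, sketched here with hypothetical helper names, is to JSON-encode each expression so that embedded newlines travel as `\n` escapes.

```python
import json


def encode_request(expression):
    """Serialize an expression as exactly one line of text."""
    # json.dumps escapes newlines and, by default, non-ASCII characters,
    # so the encoded payload never contains a raw line break.
    return json.dumps(expression) + "\n"


def decode_request(line):
    """Recover the original expression, embedded newlines included."""
    return json.loads(line)
```

Because the default `ensure_ascii=True` keeps the wire format pure ASCII, this encoding also sidesteps console code-page differences between parent and child.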

@ghost

ghost commented Feb 2, 2022

Hey all...What do I have to do to merge this into my morphman installation?

@Vilhelm-Ian

> Just wondering, if spaCy provides better analysis than MeCab, then perhaps it would be better as a new add-on? I almost see MorphMan as an add-on for Japanese and not other languages (that all came later).
>
> There's a lot in this repo, and maybe by replacing MeCab and only using spaCy, lots of code could be removed and the codebase simplified?

This is not true. People use MorphMan for other languages, and there is no reason why they shouldn't benefit from spaCy.

@rteabeault rteabeault closed this by deleting the head repository Jan 30, 2023
Development

Successfully merging this pull request may close these issues.

Add spaCy support for MorphMan
5 participants