Add support for spaCy. #231
@ianki, @thinkingbox12 New pull request for spaCy support. Tests are passing locally for me but failing the automated build. I will look into that shortly but I wanted you both to take a look at this. |
Sorry for the delayed reply. It seems to me that this new version is not communicating with the spacy addon. When going into morphman to change the preferences for recalc, it could not see the installed models. I tried this on a fresh profile as well. Let me know if there's anything else besides that specifically you want checked. |
@thinkingbox12 did you read the instructions in the readme? This does not use the spacy addon that I wrote. |
Nope. My fault. Will read the readme and try again tomorrow. |
I have installed ubuntu 18.04 and python 3.7 and am still unable to reproduce this test failure. I will continue to investigate. |
Will try ubuntu today. Got caught up with other things, sorry about the delay. |
Well, everything seems to be working fine for me. Ubuntu 20.04.1 LTS 64bit etc... |
Fixes #162

- The previous attempt was to create an AnkiSpacy addon that acted as a package manager for installing spaCy and its models and notifying other addons. This posed several issues (mainly on Windows). See rteabeault/AnkiSpacy#7 (Issues with Japanese model installation).
- This solution is essentially a continuation of the @ianki solution in #193 (Add initial support for SpaCy and SudachiPy parsers).
- It uses a Python executable path to execute spaCy to discover which models are installed.
- It uses the same Python path to run a subprocess that listens on stdin and uses spaCy to parse the passed text.
- An unfortunate side effect of this is that there is currently no good way to kill the subprocess after a recalc. This may not be an issue in practice, but in the future it may be good to have an open/close for morphemizers. These could be used to initialize the subprocess and then close it afterwards.
- Created a MorphemizerRegistry that contains all registered morphemizers. Adding and removing morphemizers fires events. The MorphemizerComboBox listens to these events to keep itself updated.
- A `fake_aqt` was being added to sys.modules for tests, but this was done in all_tests.py. This meant you could not run the tests individually. Moved all modifications of sys.modules into fake_aqt and added some additional sys.modules needed for new tests.
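For anyone reviewing the subprocess approach, here is a minimal sketch of the idea, not the code in this PR. The class name `SubprocessMorphemizer`, the `WORKER` script, and the JSON-lines protocol are all assumptions made for illustration. Whitespace splitting stands in for the spaCy call so the sketch runs without spaCy installed, and `sys.executable` stands in for the configured Python path. It also shows what an explicit open/close pair could look like.

```python
import json
import subprocess
import sys

# Source of the worker that runs in the child interpreter. A real worker would
# import spaCy and load a model here; whitespace splitting is a stand-in so the
# sketch runs anywhere.
WORKER = r'''
import json, sys
for line in sys.stdin:
    text = json.loads(line)
    tokens = text.split()  # stand-in for: [t.text for t in nlp(text)]
    sys.stdout.write(json.dumps(tokens) + "\n")
    sys.stdout.flush()
'''


class SubprocessMorphemizer:
    """Hypothetical morphemizer that delegates parsing to a child process."""

    def __init__(self, python_path=sys.executable):
        self.python_path = python_path  # assumed: the user-configured Python path
        self._proc = None

    def open(self):
        # Start the worker once and keep it alive between parse() calls.
        if self._proc is None:
            self._proc = subprocess.Popen(
                [self.python_path, "-c", WORKER],
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                encoding="utf-8",
            )

    def parse(self, text):
        self.open()
        # One JSON document per line. json.dumps escapes embedded newlines and
        # non-ASCII characters, so each request stays on a single ASCII line.
        self._proc.stdin.write(json.dumps(text) + "\n")
        self._proc.stdin.flush()
        return json.loads(self._proc.stdout.readline())

    def close(self):
        # Closing stdin ends the worker's read loop, so it exits on its own.
        if self._proc is not None:
            self._proc.stdin.close()
            self._proc.wait(timeout=10)
            self._proc.stdout.close()
            self._proc = None

    def __enter__(self):
        self.open()
        return self

    def __exit__(self, *exc_info):
        self.close()
```

With something like this, a recalc could wrap its work in `with SubprocessMorphemizer() as m:` so the child process is shut down when the recalc finishes. Sending JSON per line is just one way to keep multi-line expressions from breaking a line-based protocol.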
Tests fixed. @ianki Can you please take a look? Thanks! |
Any updates on merging this into the default MorphMan version? |
I'm getting this exception some time after a fresh install on Windows. @rteabeault
|
Another exception; I get either this one or the previous one. I tried reinstalling spaCy and the models many times, with no luck. Is spaCy support still being developed? I've been looking into some cool Japanese features in the meantime.
|
@rteabeault The problem above results from a terminal encoding problem on Windows (with the Japanese model, specifically). My guess is that this behavior comes from a new Windows feature update: the terminal is effectively displaying Japanese characters as the Unicode 'unknown character' glyph, so when they get passed through to SudachiPy, it fails and an exception results. I suspect that changing the region and locale to Japan, so that the terminal supports UTF-8 and Japanese glyphs, might solve the problem, but this has not been tested yet and is probably not a practical solution for most users of this addon. The most recent version of this repo works just fine on Ubuntu. I am interested in development for spaCy 3.0, which might simplify the link process, as linking was revamped and is considered obsolete. AFAIK some of the syntax has changed slightly and doesn't currently work. EDIT: Oddly enough, on Ubuntu, when upgrading from sudachipy 0.4.5 (which worked) to 0.4.9, I got the same exception that I did on Windows. Upgrading once again on Ubuntu to 0.5.2 resolved the issue. Is this coincidental? |
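If the pipe encoding really is the culprit (which is unconfirmed), one way to check without changing the system locale is to pin the child interpreter's stdio encoding with the `PYTHONIOENCODING` environment variable. The sketch below is an assumption-laden illustration, not code from this PR: `ECHO` and `round_trip` are hypothetical names, and the child simply echoes stdin instead of calling spaCy or SudachiPy.

```python
import os
import subprocess
import sys

# Child program: echo each stdin line back to stdout unchanged.
ECHO = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line)\n"
    "    sys.stdout.flush()\n"
)


def round_trip(text, child_io_encoding):
    """Send text through a child Python whose stdio encoding is pinned."""
    # PYTHONIOENCODING sets the encoding of the child's stdin/stdout/stderr.
    env = dict(os.environ, PYTHONIOENCODING=child_io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", ECHO],
        input=text + "\n",
        capture_output=True,
        encoding="utf-8",  # encoding used on the parent side of the pipes
        env=env,
        check=True,
    )
    return result.stdout.rstrip("\n")


sample = "日本語のテキスト"

# Both ends agree on UTF-8: the text should survive the trip.
utf8_result = round_trip(sample, "utf-8")

# The child is forced onto a codec that cannot represent Japanese: the
# characters are replaced, similar to the failure described above.
lossy_result = round_trip(sample, "ascii:replace")

print(utf8_result == sample, lossy_result == sample)
```

If the first call preserves the text and the second does not, that would support the encoding theory, and setting the variable in the `env` passed to the spaCy subprocess might be a lighter fix than changing the Windows region settings.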
Just wondering, if spaCy provides better analysis than MeCab, then perhaps it would be better as a new add-on? I almost see MorphMan as an add-on for Japanese and not other languages (that all came later). There's a lot in this repo, and maybe by replacing MeCab and only using spaCy, lots of code could be removed and the codebase simplified? |
Hey guys, sorry for the long wait on this. What's the current state of this support? Should I look to merge this? |
I was able to rebase this, and it seems to work OK after fixing handling of new lines in the expressions. |
Hey all...What do I have to do to merge this into my morphman installation? |
This is not true. People use MorphMan for other languages, and there is no reason why they shouldn't benefit from spaCy.