Add spaCy support for MorphMan #162
Comments
I can't speak for the maintainers, but as a user, I would hate that. Most people use MorphMan for Japanese, so that is a lot of space for something a minority would use. Is that the install size without any models? I believe spaCy has good support for downloading models at runtime.
Yes. That is the install size without models. We could then allow users to download models via the preferences dialog. I actually have it working now to download spaCy via the preferences. It just requires some sys.path shenanigans to get it loaded after install. Having it pre-installed with MorphMan would just make it simpler.
I think spaCy support would be very nice. I'm planning to learn a new language that would benefit from it. I think one addon that supports a lot of languages is better than multiple addons that only support one language.
It occurred to me that we cannot prepackage spaCy with MorphMan. Because of its compiled binaries, the install is OS-specific. So I think the solution I will implement is: add a panel to the prefs dialog for spaCy where the user can install spaCy and any of its models. Would be great to get some feedback from the maintainers on this.
Update: I am creating a separate spaCy addon. This addon will provide a dialog that allows you to install spaCy and associated language models. Once spaCy has been installed, the addon will add it to Anki's Python sys.path so other addons can make use of the spaCy API. The addon will also fire hooks that other addons can listen to, such as when a model has been installed or removed. I am making some changes to MorphMan's morphemizer registry so that it can listen to these events and dynamically add new morphemizers that are registered by the spaCy addon. I have had this working end-to-end nicely. I am currently writing the spaCy management dialog and working out pip installs from Anki's Python (PITA). It seems that the maintainers of MorphMan are not paying much attention to this repo. Once I have everything working I can either provide instructions on installing MorphMan from my fork or add a new version of MorphMan to AnkiWeb (yuck). Ideally the maintainers would find some people interested in taking over maintenance of this repo.
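For readers following along, here is a rough sketch of the two mechanisms described in the comment above: exposing an install directory through sys.path, and notifying other addons through hooks. It is only an illustration, not the addon's actual code. The hook registry, the function names, and the hook name `spacy_model_installed` are all assumptions made for this example; a real addon would most likely use Anki's own hook system rather than a hand-rolled one.

```python
import sys
from collections import defaultdict
from pathlib import Path
from typing import Callable, DefaultDict, List

# Hypothetical stand-in for a hook system (assumption: the real addon
# would rely on Anki's hooks instead of this small registry).
_hooks: DefaultDict[str, List[Callable[..., None]]] = defaultdict(list)


def add_hook(name: str, callback: Callable[..., None]) -> None:
    """Register a callback to be called whenever hook `name` fires."""
    _hooks[name].append(callback)


def run_hook(name: str, *args) -> None:
    """Call every callback registered for hook `name`."""
    for callback in list(_hooks[name]):
        callback(*args)


def expose_packages(package_dir: Path) -> bool:
    """Append package_dir to sys.path so other addons can import from it.

    Returns True if the path was added, False if it was already present.
    """
    path = str(package_dir)
    if path in sys.path:
        return False
    # Appending (rather than prepending) avoids shadowing bundled modules.
    sys.path.append(path)
    return True


def on_model_installed(model_name: str) -> None:
    """Tell listeners that a model became available (hypothetical hook name)."""
    run_hook("spacy_model_installed", model_name)
```

Another addon would then call `add_hook("spacy_model_installed", some_callback)` at load time and react whenever a model is installed.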
I'm working on PR #193 that adds spaCy support. It will use spaCy if it's installed in your system's Python.
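A minimal sketch of the "use spaCy if it's installed" idea, using only the standard library. This is not taken from PR #193; the function names and the `"builtin"` fallback label are assumptions for illustration.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if top-level module `name` can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def pick_morphemizer_backend() -> str:
    """Prefer spaCy when present, otherwise fall back (labels are hypothetical)."""
    return "spacy" if module_available("spacy") else "builtin"
```

Checking with `find_spec` avoids paying the import cost of spaCy just to find out whether it exists.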
If you add a dialog to make installing spaCy easy, that would be great.
@ianki I have been working on the addon for managing spaCy and its models. It has taken me a bit longer than expected due to other commitments. I plan to have it released the week of Thanksgiving. In the meantime, if you have time, take a look at the linked commit above in my fork of MorphMan. It adds a MorphemizerManager that allows morphemizers to be registered at runtime via hooks. So as different models are installed/uninstalled from the spaCy addon, they will be correctly reflected in the MorphMan dialogs. The morphemizer combo box has been updated to handle this. It also adds a spaCy morphemizer. Once this is a bit more tested I will send it over as a pull request.
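To make the runtime-registration idea concrete, here is a hedged sketch of what a registry like the MorphemizerManager mentioned above might look like. Only the class name comes from the comment; every method, the listener mechanism, and the display names are assumptions, and the code in the fork may be organized quite differently.

```python
from typing import Callable, Dict, List


class MorphemizerManager:
    """Hypothetical registry whose contents can change while Anki is running."""

    def __init__(self) -> None:
        self._morphemizers: Dict[str, object] = {}
        self._listeners: List[Callable[[], None]] = []

    def add_listener(self, callback: Callable[[], None]) -> None:
        """Register a callback (e.g. a combo box refresh) run on every change."""
        self._listeners.append(callback)

    def register(self, name: str, morphemizer: object) -> None:
        """Add or replace a morphemizer, e.g. when a model is installed."""
        self._morphemizers[name] = morphemizer
        self._notify()

    def unregister(self, name: str) -> None:
        """Remove a morphemizer, e.g. when a model is uninstalled."""
        if self._morphemizers.pop(name, None) is not None:
            self._notify()

    def names(self) -> List[str]:
        """Return the registered names in a stable order for display."""
        return sorted(self._morphemizers)

    def _notify(self) -> None:
        for callback in list(self._listeners):
            callback()
```

A dialog could pass its own refresh method to `add_listener` so that the morphemizer combo box is repopulated from `names()` whenever a model is installed or removed.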
Hey @rteabeault, just pinging to ask about progress on this.
Sorry for the lack of updates @ianki. I have been pretty busy with work and holiday-related things and have not had a lot of time to work on this. However, I have some free time and plan to focus on it this weekend.
@ianki The spacy addon is functional and I have made the repository public. You should be able to clone and symlink the
But with the MorphMan integration in #221 you can start taking a look in the meantime. If you are familiar with Japanese/Chinese, I would love to get some testing there as I am not qualified. I have only tested with German. Please let me know if you have any feedback/questions/concerns. In the meantime I will finish the outstanding items to get the AnkiSpacy plugin published to AnkiWeb.
Can test Japanese and Spanish sometime within the next few days. Thank you for your work, I'm excited to see the implementation.
I'm waiting for this. It would be great if we could use it for other languages like Korean.
@khanguyenwk, unfortunately the maintainers have abandoned this project, and there's no current interest in continuing to develop spaCy support for MorphMan.
@rteabeault what's the current status on this? Were you able to weed out the issues?
I have started working on an addition to MorphMan that would add spaCy-enabled Morphemizers. There are currently trained spaCy models for zh, da, nl, en, fr, de, el, it, ja, lt, nb, pl, pt, ro, and es. As new language models are added, they would also become available to this addon. spaCy would be a great alternative to both Languages with Spaces and the existing Japanese and Chinese support included with MorphMan. Here is an example of spaCy output showing the parsed word and the base (lemmatized) form.
Expression:
Eher ziehe ich wieder zu meinen Eltern, als einen Tag länger bei dir zu wohnen.
deps
dir? That would be about 170MB.
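A word/lemma listing like the one described for the German expression in the original post could be produced along these lines. This is a sketch only: it assumes spaCy and the German model `de_core_news_sm` are installed, and the helper names `lemma_pairs` and `print_lemmas` are made up for this example. The exact lemmas depend on the model version, so no output is shown.

```python
from typing import Iterable, List, Tuple


def lemma_pairs(tokens: Iterable) -> List[Tuple[str, str]]:
    """Return (surface form, lemma) for every non-punctuation token.

    Works with any objects that expose spaCy's Token attributes
    `text`, `lemma_` and `is_punct`.
    """
    return [(t.text, t.lemma_) for t in tokens if not t.is_punct]


def print_lemmas(sentence: str, model_name: str = "de_core_news_sm") -> None:
    """Parse `sentence` with spaCy and print one word/lemma pair per line.

    Assumption: spaCy and the named model are installed; the import is
    done lazily so this module loads even when spaCy is missing.
    """
    import spacy

    nlp = spacy.load(model_name)
    for text, lemma in lemma_pairs(nlp(sentence)):
        print(f"{text}\t{lemma}")
```

Calling `print_lemmas("Eher ziehe ich wieder zu meinen Eltern, als einen Tag länger bei dir zu wohnen.")` would list each word next to its base form.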