https://universaldependencies.org/treebanks/tr_imst/index.html https://github.com/UniversalDependencies/docs/blob/pages-source/_tr/index.md
This page describes the annotation of a Turkish “GB” treebank, a treebank of grammar-book examples annotated according to the Universal Dependencies (UD) annotation scheme.
http://coltekin.github.io/gk-treebank/
Part-of-speech tags: http://coltekin.github.io/gk-treebank/pos/
Morphological features (e.g. Animacy, Aspect, Case): http://coltekin.github.io/gk-treebank/feat/
Dependencies: http://coltekin.github.io/gk-treebank/dep/
https://github.com/explosion/spaCy/tree/master/spacy/lang/tr https://github.com/UniversalDependencies/UD_Turkish-IMST
https://spacy.io/usage/training#basics https://spacy.io/usage/saving-loading#basics https://spacy.io/usage/adding-languages#tag-map
https://spacy.io/api/cli#pretrain https://spacy.io/api/cli#train
Other language examples: https://github.com/aajanki/spacy-fi https://github.com/ipipan/spacy-pl https://github.com/buriy/spacy-ru
Adding models for new languages master thread explosion/spaCy#3056
https://stackoverflow.com/questions/56779217/train-spacys-existing-pos-tagger-with-my-own-training-examples adding tag_map after loading language model
previous try: explosion/spaCy#3056 (comment)
Clone the treebank and create the output directory:
git clone https://github.com/UniversalDependencies/UD_Turkish-IMST
mkdir imst-json
Convert the train and dev sets to spaCy's JSON format:
py -m spacy convert UD_Turkish-IMST/tr_imst-ud-train.conllu imst-json
py -m spacy convert UD_Turkish-IMST/tr_imst-ud-dev.conllu imst-json
(To create) The spacy convert CLI command has an argument that specifies whether to merge the morphological features with the coarse-grained POS tags to make the fine-grained tags. If this is set differently from what the tag_map expects, you might see the error you're experiencing (see explosion/spaCy#2503).
Conversion with the morphological features merged into the tags:
py -m spacy convert UD_Turkish-IMST/tr_imst-ud-train.conllu imst-json-m --morphology
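The effect of merging morphology into the tags can be sketched in a few lines of Python. This is only an illustration, not spaCy's converter: the "__" separator and the fallback to the coarse tag when the XPOS column is "_" are assumptions about how the converter behaves, and the token line is a made-up example rather than a line from the treebank.

```python
def fine_grained_tag(conllu_line, use_morphology=False):
    """Return the fine-grained tag for one CoNLL-U token line (sketch)."""
    cols = conllu_line.rstrip("\n").split("\t")
    upos, xpos, feats = cols[3], cols[4], cols[5]
    # Fall back to the coarse tag when no language-specific tag is given.
    tag = upos if xpos == "_" else xpos
    if use_morphology:
        # Assumed separator between the tag and the FEATS column.
        tag = tag + "__" + feats
    return tag


# Hypothetical token line, tab-separated as in a .conllu file.
line = "1\tevler\tev\tNOUN\tNoun\tCase=Nom|Number=Plur|Person=3\t2\tnsubj\t_\t_"

print(fine_grained_tag(line))                       # Noun
print(fine_grained_tag(line, use_morphology=True))  # Noun__Case=Nom|Number=Plur|Person=3
```

With merging enabled, every distinct tag and feature combination becomes its own tag, so a tag_map written for the plain tags no longer matches the converted data.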
py -m spacy debug-data tr imst-json\tr_imst-ud-train.json imst-json\tr_imst-ud-dev.json
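To see which fine-grained tags a tag_map would have to cover, the tags can be collected straight from the CoNLL-U file. The following is a minimal sketch under stated assumptions: the function name is hypothetical, the lowercase "pos" key is an assumption about the tag map format, and the sample sentence is made up. Where a tag occurs with more than one coarse tag, only the first one seen is kept.

```python
import io


def draft_tag_map(conllu_file):
    """Map each fine-grained tag (XPOS column) to the first UPOS tag seen with it."""
    tag_map = {}
    for raw in conllu_file:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line between sentences, or a comment
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        tok_id, upos, xpos = cols[0], cols[3], cols[4]
        if "-" in tok_id or "." in tok_id:
            continue  # multiword token range or empty node
        tag = upos if xpos == "_" else xpos
        tag_map.setdefault(tag, {"pos": upos})
    return tag_map


# Hypothetical three-token sentence standing in for a real .conllu file.
sample = io.StringIO(
    "# text = Ev güzel.\n"
    "1\tEv\tev\tNOUN\tNoun\tCase=Nom|Number=Sing|Person=3\t2\tnsubj\t_\t_\n"
    "2\tgüzel\tgüzel\tADJ\tAdj\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\tPunc\t_\t2\tpunct\t_\t_\n"
)
for tag, entry in sorted(draft_tag_map(sample).items()):
    print(tag, entry)
```

For the real data, pass an open file instead, for example open("UD_Turkish-IMST/tr_imst-ud-train.conllu", encoding="utf8").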
Converting vector models into a spaCy model:
py -m spacy init-model tr .\vectors\tr_vectors_cc_50K_3 --vectors-name tr_cc_md.vectors --vectors-loc .\vectors_org-files\cc.tr.300.vec.gz --prune-vectors 50000
Common Crawl: cc.tr.300.vec.gz is very large, hence --prune-vectors.
Alternative (wiki): wiki.tr.vec
py -m spacy init-model tr .\vectors\tr_vectors_conll17-md --vectors-loc .\vectors\CoNLL17-w2vec\modeltest.txt --prune-vectors 100000
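If the vector file is too large even to load comfortably, it can be shortened before it is passed to init-model. The sketch below is plain truncation, which is a different technique from --prune-vectors (pruning, as documented, maps dropped words to the closest remaining vector, while truncation simply discards them). It assumes the word2vec text format with a "rows dims" header line and rows ordered by frequency; the function name is hypothetical.

```python
import io


def truncate_vec(src, dst, keep):
    """Copy the header and the first `keep` rows of a .vec text file from src to dst."""
    n_rows, n_dims = src.readline().split()
    kept = min(keep, int(n_rows))
    # Rewrite the header so the row count matches what is actually written.
    dst.write("{} {}\n".format(kept, n_dims))
    for i, line in enumerate(src):
        if i >= kept:
            break
        dst.write(line)
    return kept


# Tiny in-memory stand-in for a real vector file.
src = io.StringIO("3 2\nve 0.1 0.2\nbir 0.3 0.4\nbu 0.5 0.6\n")
dst = io.StringIO()
print(truncate_vec(src, dst, keep=2))  # 2
print(dst.getvalue())
```

For a gzipped file such as cc.tr.300.vec.gz, open the source with gzip.open(path, "rt", encoding="utf8") and the destination with an ordinary open(path, "w", encoding="utf8").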
mkdir models
[without the vector model]
py -m spacy train tr models imst-json/tr_imst-ud-train.json imst-json/tr_imst-ud-dev.json
Use this one: training that takes the tag_map (tr folder) into account
py spacy_tr.py train tr models imst-json\tr_imst-ud-train.json imst-json\tr_imst-ud-dev.json --pipeline tagger,parser
With the vector model:
py spacy_tr.py train tr models\model-cc-50K imst-json\tr_imst-ud-train.json imst-json\tr_imst-ud-dev.json --vectors vectors/tr_vectors_cc_50K_3 --meta-path meta.json
Train command details:
python -m spacy train [lang] [output_path] [train_path] [dev_path] [--base-model] [--pipeline] [--vectors] [--n-iter] [--n-early-stopping] [--n-examples] [--use-gpu] [--version] [--meta-path] [--init-tok2vec] [--parser-multitasks] [--entity-multitasks] [--gold-preproc] [--noise-level] [--orth-variant-level] [--learn-tokens] [--textcat-arch] [--textcat-multilabel] [--textcat-positive-label] [--verbose]
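Display the evaluation scores of a trained model on the test set (tr_imst-ud-test.conllu must first be converted to JSON in the same way as the train and dev sets):
py -m spacy evaluate .\models\model-cc-md\model-best\ .\imst-json\tr_imst-ud-test.json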
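Among the values reported for the parser are the unlabeled and labeled attachment scores (UAS and LAS). As a reminder of what they measure, here is a minimal sketch of the standard definitions. It is not spaCy's scorer, which may for example treat punctuation differently, and the gold and predicted analyses are made up.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) in percent for parallel lists of (head, label) pairs."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same tokens")
    if not gold:
        return 0.0, 0.0
    # UAS: the head is correct. LAS: head and dependency label are both correct.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    las_hits = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * uas_hits / len(gold), 100.0 * las_hits / len(gold)


# Hypothetical four-token sentence: one label is wrong, all heads are right.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (2, "punct")]
print(attachment_scores(gold, pred))  # (100.0, 75.0)
```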
Package the trained models:
mkdir models_packaged
py -m spacy package models/model-best models_packaged
py -m spacy package models/model-cc-md/model-best models_packaged
py -m spacy package models/model-cc-50K_2/model-best models_packaged
Build the source distribution:
cd models_packaged\tr_model0-0.0.0
python setup.py sdist
pip install models_packaged\tr_model0-0.0.0\dist\tr_model0-0.0.0.tar.gz
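The directory and archive names above follow from the model's meta.json: the package is named after its lang, name and version fields. A small sketch of that naming rule, with the field values inferred from the tr_model0-0.0.0 directory in these notes (the helper names are hypothetical):

```python
def package_name(meta):
    """Importable package name, e.g. the string passed to spacy.load()."""
    return "{}_{}".format(meta["lang"], meta["name"])


def package_dirname(meta):
    """Directory and sdist base name created when packaging the model."""
    return "{}-{}".format(package_name(meta), meta["version"])


# Values inferred from the directory name used in the commands above.
meta = {"lang": "tr", "name": "model0", "version": "0.0.0"}

print(package_name(meta))     # tr_model0
print(package_dirname(meta))  # tr_model0-0.0.0
```

After the pip install, the model should then be loadable with spacy.load("tr_model0"), assuming the installation succeeded in the same environment as spaCy.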