First release
- `MainCorpus`.
- `Paper2000Corpus`.
- `PaperRegionalCorpus`.
- `DialectCorpus`.
- `SpokenCorpus`.
- `AccentologyCorpus`.
- Marking found wordforms with `re`.
- Logging simplified.
- Exceptions catching and evaluating.
- Some methods became static.
- `__repr__` and `__str__` methods in `Example` and `Corpus`.
- Ex_type init fixed.
- Docs corrected.
- Now one can dump data to files even if they exist.
- Set classmethods.
- Clear function.
- `NotImplementedError` in `Corpus._parse_doc`.
- Setitem method.
- Restrict show param: by default, printing a Corpus shows 50 examples. One can change this param or turn the restriction off.
- Func to set stream_handlers levels.
- Func to set file_handlers levels.
- Func to set loggers levels.
- `data` property, more changes with this.
- Setter methods.
- Parsing structure corrected.
- `is_http_request_correct` and `whether_result_found` joined, number of operations reduced.
- Order of receiving pages corrected.
- Corpus init divided: `_from_file`, `_from_Corpus`.
- ParallelCorpus.
- Found wordforms are now added to a Corpus initialized from a file.
Searching with gramm params fixed – Issue #3 closed.
- Default filename contains letters and digits, len = 8.
- Corpora classes to `rnc`.
- Working with files: validating that the Corpus type in the file matches the Corpus class type.
- Requesting two or more words with one str – Issue #5 closed.
- Deleting the example by the index – Issue #6 closed.
- Method `filter` added to Corpus – Issue #9 closed.
- Distance between words set – Issue #7 closed.
- Order of texts in the ParallelCorpus – Issue #8 closed.
Other minor fixes and improvements.
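
The default filename noted above (letters and digits, length 8) could be produced along these lines. This is a minimal sketch, not the library's actual code: the helper name `random_filename` and the `.csv` extension are assumptions made for illustration.

```python
import random
import string


def random_filename(length: int = 8, ext: str = ".csv") -> str:
    """Return `length` random ASCII letters and digits followed by `ext`.

    Hypothetical helper: the name and default extension are assumptions.
    """
    alphabet = string.ascii_letters + string.digits
    stem = "".join(random.choices(alphabet, k=length))
    return stem + ext


print(random_filename())
```

Because the name is random, two dumps started without an explicit filename are unlikely to collide, though uniqueness is not guaranteed.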
- Docs changed, extended.
- Corpora were renamed.
- MultilingualParaCorpus.
- TutoringCorpus.
- Fixed: some Corpora inherited from MainCorpus didn't work.
Minor fixes and improvements.
- Examples improved and fixed.
- Docs fixed.
- Setting loggers/handlers levels.
- Some features to Example.
Minor fixes and improvements.
- Logging/creating a logger simplified.
- Additional info from the first RNC page.
- `MultimodalCorpus`.
- Docs fixed and improved.
- Way to make async request to RNC was changed.
- Python 3.7 is now required.
- Required libraries were fixed, lxml and new aiojobs were added.
- A folder for csv files and media files is no longer created by default. The folder is created when the `.dump()` or `.download()` method is called.
- Comparison operators were removed from `Corpus` objects.
- `subcorpus['en']` and `subcorpus.en`, `subcorpus['Pushkin']` and `subcorpus.Pushkin` are now available for subcorpus.
- More logging messages were added.
- Downloading media files in `MultimodalCorpus` was fixed.
- Logger is created once in `__init__` and shared by all modules.
Other minor fixes and improvements.
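
The dual access noted above (`subcorpus['en']` alongside `subcorpus.en`) can be illustrated with a small stand-alone class. This is a sketch under stated assumptions: `SubcorpusView` and its contents are hypothetical and do not reproduce the library's real implementation.

```python
class SubcorpusView:
    """Hypothetical container whose items are reachable by key or attribute."""

    def __init__(self, items: dict):
        self._items = dict(items)

    def __getitem__(self, key: str):
        return self._items[key]

    def __getattr__(self, name: str):
        # Called only when normal attribute lookup fails.
        try:
            return self.__dict__["_items"][name]
        except KeyError:
            raise AttributeError(name) from None


# Placeholder values; real subcorpus entries would look different.
view = SubcorpusView({"en": "English texts", "Pushkin": "Texts by Pushkin"})
print(view["en"] == view.en)
print(view["Pushkin"] == view.Pushkin)
```

Attribute access only works for keys that are valid Python identifiers; any other key remains reachable through the subscript form.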
- Chinese parallel corpus added.
- Quickfix: stream handler level was set to `WARNING` instead of `NOTSET`.
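
The effect of that quickfix can be shown with the standard `logging` module alone. In this minimal sketch the logger name `rnc_example` is an assumption chosen for illustration: a handler at `NOTSET` passes every record the logger accepts, while one at `WARNING` drops debug and info messages.

```python
import logging

logger = logging.getLogger("rnc_example")  # hypothetical logger name
logger.setLevel(logging.DEBUG)

stream_handler = logging.StreamHandler()
# With NOTSET the handler would emit everything the logger lets through.
stream_handler.setLevel(logging.WARNING)
logger.addHandler(stream_handler)

logger.debug("dropped by the handler")
logger.warning("written to stderr")
```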
- Docs were improved.
- Params validating was added. Issue #17 closed.
- Encoding was changed from `utf-16` to `utf-8`. Issue #16 closed.
- `findall` and `finditer` implemented. Issue #19 closed.
- Using `ujson` instead of `json`. Issue #15 closed.
- Some useless validating requests removed. Issue #21 closed.
- Logging improved.
- Use workers and queue instead of aiojobs. Issue #26 closed.
- Versions of requirements specified. Issue #27 closed.
- Email changed. Issue #28 closed.
- Other performance improvements.
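
The workers-and-queue approach mentioned above (Issue #26) follows a common `asyncio` pattern, sketched here with the standard library only. The names `fetch_page`, `worker` and `fetch_all` are hypothetical, and `fetch_page` merely stands in for a real HTTP request; the library's actual code may differ.

```python
import asyncio


async def fetch_page(page: int) -> str:
    # Stand-in for a real HTTP request (assumption for illustration).
    await asyncio.sleep(0)
    return f"page-{page}"


async def worker(queue: asyncio.Queue, results: dict) -> None:
    # Each worker pulls page numbers until it is cancelled.
    while True:
        page = await queue.get()
        try:
            results[page] = await fetch_page(page)
        finally:
            queue.task_done()


async def fetch_all(pages: int, workers_count: int = 3) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    for page in range(pages):
        queue.put_nowait(page)

    results: dict = {}
    tasks = [
        asyncio.create_task(worker(queue, results))
        for _ in range(workers_count)
    ]
    await queue.join()  # wait until every queued page is processed
    for task in tasks:
        task.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*tasks, return_exceptions=True)
    # Restore page order regardless of which worker finished first.
    return [results[page] for page in range(pages)]


print(asyncio.run(fetch_all(5)))
```

A fixed number of workers caps the number of concurrent requests, and keying the results by page number keeps the output in page order.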
- Quickfix: deepcopy removed
- Log message format changed, #34 closed.
- Custom exceptions created, #36 closed.
- ABC used, #37 closed.
- Log message added, #39 closed.
- Requirement versions updated in response to a security vulnerability.
- `Corpus.findall()`/`finditer()` fixed.
- Docs and logging improved.
- GitHub settings added: issue templates, etc.
- Parsing `MultimodalCorpus` fixed.
- Asyncio support added, some methods implemented:
  - `corp.request_examples_async()`
  - `MultimodalCorpus.dump_all_async()`
  - `MultimodalExample.dump_file_async()`
- Use poetry instead of `setup.py`; dependencies updated.
- Request validation fixed according to the new ruscorpora.ru page structure.
- Python 3.9 support added.
...
- Setting language in the parallel corpus updated, new values set.
- Kwarg `lang` added to ParallelCorpus.
- Bumpversion added.
- Dependencies updated.