Commit 4711e73

Version 2.0.5
1 parent 41efa23 commit 4711e73

3 files changed

+9
-3
lines changed

README.rst

Lines changed: 3 additions & 0 deletions
@@ -743,6 +743,9 @@ can be found in the file ``test/toktest_normal_gold_expected.txt``.
 Changelog
 ---------
 
+* Version 2.0.5: Fixed bug where single uppercase letters were erroneously
+  being recognized as abbreviations, causing prepositions such as 'Í' and 'Á'
+  at the beginning of sentences to be misunderstood in ReynirPackage
 * Version 2.0.4: Added imperfect abbreviations (*amk.*, *osfrv.*); recognized
   *klukkan hálf tvö* as a ``TOK.TIME``
 * Version 2.0.3: Fixed bug in ``detokenize()`` where abbreviations, domains

setup.py

Lines changed: 1 addition & 1 deletion
@@ -57,7 +57,7 @@ def read(*names, **kwargs):
 
 setup(
     name="tokenizer",
-    version="2.0.4",
+    version="2.0.5",
     license="MIT",
     description="A tokenizer for Icelandic text",
     long_description=u"{0}\n{1}".format(

test/test_tokenizer.py

Lines changed: 5 additions & 2 deletions
@@ -1054,11 +1054,14 @@ def test_correct_spaces():
 
 
 def test_abbrev():
-    tokens = list(t.tokenize("Ég las fréttina um IBM t.d. á Mbl."))
+    tokens = list(t.tokenize("Í dag las ég fréttina um IBM t.d. á Mbl."))
     assert tokens == [
         Tok(kind=TOK.S_BEGIN, txt=None, val=(0, None)),
-        Tok(kind=TOK.WORD, txt="Ég", val=None),
+        # We are testing that 'Í' is not an abbreviation
+        Tok(kind=TOK.WORD, txt="Í", val=None),
+        Tok(kind=TOK.WORD, txt="dag", val=None),
         Tok(kind=TOK.WORD, txt="las", val=None),
+        Tok(kind=TOK.WORD, txt="ég", val=None),
         Tok(kind=TOK.WORD, txt="fréttina", val=None),
         Tok(kind=TOK.WORD, txt="um", val=None),
         Tok(
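
The changelog entry describes the fix only at a high level: a single uppercase letter such as 'Í' or 'Á' should not be classified as an abbreviation. The snippet below is a minimal, self-contained sketch of that kind of guard. It is not the tokenizer package's actual implementation; the names `KNOWN_ABBREVIATIONS`, `is_abbreviation` and `classify` are hypothetical, and the abbreviation set is a small assumed sample drawn from the examples mentioned in this commit.

```python
# Hypothetical illustration only; this is NOT the tokenizer package's code.
# Assumed sample of abbreviations, taken from examples named in the changelog
# and the test sentence.
KNOWN_ABBREVIATIONS = {"t.d.", "amk.", "osfrv."}


def is_abbreviation(word):
    """Return True if `word` should be treated as an abbreviation."""
    # Guard: a lone uppercase letter ('Í', 'Á', ...) is treated as an
    # ordinary word, e.g. a capitalized preposition at the start of a sentence.
    if len(word) == 1 and word.isupper():
        return False
    return word in KNOWN_ABBREVIATIONS


def classify(sentence):
    """Label each whitespace-separated word as 'ABBREV' or 'WORD'."""
    return [
        (word, "ABBREV" if is_abbreviation(word) else "WORD")
        for word in sentence.split()
    ]


if __name__ == "__main__":
    for word, label in classify("Í dag las ég fréttina um IBM t.d. á Mbl."):
        print(word, label)
```

In this sketch 'Í' is labelled as a plain word while 't.d.' is labelled as an abbreviation, which mirrors what the updated `test_abbrev()` checks for the first token. The real tokenizer produces `Tok` tuples and handles many more cases than this whitespace split does.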
