Add Korean TN support for cardinal numbers #280
Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>
for more information, see https://pre-commit.ci
# 1-99 reading
read_1_to_99 = pynini.union(read_1, read_10_to_19, read_20_to_99).optimize()
read_100_to_999 = (NEMO_DIGIT**3) @ graph_hundred_component
is this different from line 66?
removed this line
graph_thousand = thousands @ graph_thousand_component

# 1-99 reading
read_1_to_99 = pynini.union(read_1, read_10_to_19, read_20_to_99).optimize()
is this different from line 44?
removed this line as well.
graph_10_to_19 = graph_teen
graph_20_to_99 = graph_ty

graph_all = pynini.union(
let's use a more descriptive name for this variable
I updated the variable names and used a new logic for the creation part.
# 1-99 reading
read_1_to_99 = pynini.union(read_1, read_10_to_19, read_20_to_99).optimize()
read_100_to_999 = (NEMO_DIGIT**3) @ graph_hundred_component
read_1000_to_9999 = (NEMO_DIGIT**4) @ graph_thousand_component
is this different from line 78?
removed this line
…l.py Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>
Superseded by #285. Closing this PR.
What does this PR do?
Adds support for Korean cardinal number text normalization (TN), including:
Notes
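For readers unfamiliar with the task, here is a rough plain-Python sketch of what verbalizing a cardinal number in Sino-Korean looks like. It is not the pynini grammar from this PR. The names read_group and read_cardinal are hypothetical, and several conventions are assumptions: 일 is dropped before 십, 백, 천 and before a bare 만, zero is read as 영, and the output is written without spaces.

```python
# Illustrative only: a plain-Python reading of Sino-Korean cardinals.
# The PR itself builds pynini FSTs; nothing here mirrors its actual code.

DIGITS = ["", "일", "이", "삼", "사", "오", "육", "칠", "팔", "구"]
SMALL_UNITS = ["", "십", "백", "천"]  # place values inside a 4-digit group
LARGE_UNITS = ["", "만", "억", "조"]  # one unit per 4-digit group


def read_group(n: int) -> str:
    """Read 0..9999. Assumption: '일' is dropped before 십/백/천."""
    if not 0 <= n <= 9999:
        raise ValueError("group must be in 0..9999")
    parts = []
    for power in (3, 2, 1, 0):
        digit = (n // 10**power) % 10
        if digit == 0:
            continue
        if digit == 1 and power > 0:
            parts.append(SMALL_UNITS[power])  # 10 -> 십, not 일십
        else:
            parts.append(DIGITS[digit] + SMALL_UNITS[power])
    return "".join(parts)


def read_cardinal(n: int) -> str:
    """Read 0 <= n < 10**16 by splitting into 4-digit groups."""
    if n == 0:
        return "영"  # assumption: zero is read as 영
    if not 0 < n < 10**16:
        raise ValueError("supported range is 0 <= n < 10**16")
    groups = []
    index = 0
    while n > 0:
        n, group = divmod(n, 10000)
        if group:
            text = read_group(group)
            if index == 1 and group == 1:
                text = ""  # assumption: 10000 is read 만, not 일만
            groups.append(text + LARGE_UNITS[index])
        index += 1
    return "".join(reversed(groups))


if __name__ == "__main__":
    for value in (7, 11, 21, 100, 1234, 10000, 12345, 100000000):
        print(value, read_cardinal(value))
```

The grouping by four digits follows the way Korean names large numbers (만, 억, 조), which is why the diff above defines separate readings for 1-99, 100-999 and 1000-9999 before combining them.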
Before your PR is "Ready for review"
Pre checks:
- Have you signed your commits? Use git commit -s to sign.
- Do all tests pass? Run pytest or (if your machine does not have a GPU) pytest --cpu from the root folder (given you marked your test cases accordingly with @pytest.mark.run_only_on('CPU')).
- Have you run bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...?
- Have you added test cases for both pytest and Sparrowhawk?
- Have you added __init__.py for every folder and subfolder, including the data folder which has .TSV files?
- Have you added the license header Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. to all newly added Python files?
- Where applicable, does the header also include Copyright 2015 and onwards Google, Inc.?
- Have you removed import guards (try import: ... except: ...) if not already done?

PR Type:

If you haven't finished some of the above items, you can still open a "Draft" PR.