Commit 57e6109

add semicolon punctuation
1 parent 07b12ca commit 57e6109

1 file changed: +1 -1 lines changed

Diff for: src/datatrove/pipeline/filters/fineweb_quality_filter.py

@@ -28,7 +28,7 @@ def __init__(
     def filter(self, doc) -> bool | tuple[bool, str]:
         from nltk import word_tokenize
 
-        stop_chars = (".", "'", '"', "!", "?")
+        stop_chars = (".", "'", '"', "!", "?", ";")
 
         lines = doc.text.split("\n")
         ratio = sum(1 for line in lines if line.endswith(stop_chars)) / len(lines)
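
To illustrate what this change affects, here is a minimal sketch of the ratio computation taken out of the filter. The helper name `terminal_punct_ratio`, the two constant names, and the sample text are hypothetical and not part of datatrove; only the `stop_chars` tuples and the ratio expression come from the diff.

```python
# Hypothetical standalone helper mirroring the ratio expression in the diff.
OLD_STOP_CHARS = (".", "'", '"', "!", "?")
NEW_STOP_CHARS = (".", "'", '"', "!", "?", ";")


def terminal_punct_ratio(text: str, stop_chars: tuple[str, ...]) -> float:
    """Return the fraction of lines that end with one of stop_chars."""
    lines = text.split("\n")
    # str.endswith accepts a tuple and is True if any of the suffixes matches.
    return sum(1 for line in lines if line.endswith(stop_chars)) / len(lines)


# Made-up sample: two of the four lines end with a semicolon.
sample = "First clause;\nSecond sentence.\nno punctuation here\nint x = 1;"

print(terminal_punct_ratio(sample, OLD_STOP_CHARS))  # 0.25
print(terminal_punct_ratio(sample, NEW_STOP_CHARS))  # 0.75
```

With the semicolon added, lines ending in `;` count as properly terminated, so the computed ratio for text with many such lines goes up.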
