Commit e49fea1

make metadata structure parquet compatible
1 parent 09ddae1 commit e49fea1

1 file changed, +1 -1 lines changed

Diff for: src/datatrove/pipeline/filters/oscar_filter.py

@@ -50,7 +50,7 @@ def filter(self, doc: Document) -> bool | tuple[bool, str]:
         Returns:
             is_filter
         """
-        if doc.metadata['oscar_quality_warnings']:
+        if doc.metadata['oscar_quality_warnings'] and len(doc.metadata['oscar_quality_warnings']) > 0:
             return False, 'oscar_quality_warning'
         if doc.metadata['harmful_pp'] and doc.metadata['harmful_pp'] < self.min_harmful_ppl:
             return False, 'kenlm_min_harmful_ppl'
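
A minimal sketch of how the updated condition evaluates, assuming the `oscar_quality_warnings` field may arrive as `None`, an empty list, or a non-empty list. The helper `has_quality_warnings` and the sample values are hypothetical and are not part of datatrove; only the shape of the check is taken from the diff.

```python
from typing import Any, Mapping


def has_quality_warnings(metadata: Mapping[str, Any]) -> bool:
    """Hypothetical helper mirroring the updated condition in the diff."""
    warnings = metadata["oscar_quality_warnings"]
    # Same shape as the new check: the value must be truthy and have a
    # length greater than zero. `and` short-circuits, so len() is never
    # called on None.
    return bool(warnings and len(warnings) > 0)


# Illustrative metadata shapes (assumed, not taken from the commit).
samples = {
    "missing value": {"oscar_quality_warnings": None},
    "empty list": {"oscar_quality_warnings": []},
    "one warning": {"oscar_quality_warnings": ["tiny"]},
}

results = {name: has_quality_warnings(meta) for name, meta in samples.items()}
print(results)
```

Under these assumptions, only the non-empty list causes the document to be rejected with `'oscar_quality_warning'`; `None` and an empty list both pass through to the next check.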
