Open
Description
I found this issue when analysing the output for the page "Diffraction" (ID: 8603).
In the "Patterns" section there are three bullet points:
- The angular spacing of the features...
...
These bullet points are ignored and not included in the final cleaned text. I think this is because of the asterisk that starts each bullet line in the wikitext.
To replicate:
I extracted the page with extractPage, then created a new file containing the single page from its output. Then I ran WikiExtractor on that file.
python -m wikiextractor.extractPage --id 8603 enwiki-latest-pages-articles-multistream.xml.bz2
python -m wikiextractor.WikiExtractor page_8603.xml --json -o teste
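To make the symptom easier to check, here is a minimal sketch (not part of wikiextractor) that lists which bullet lines from a page's wikitext are absent from the cleaned text. The function name `missing_bullets` and the sample strings are made up for illustration. It assumes bullet lines start with one or more asterisks and uses a plain substring check, so bullets containing links or other markup would need their markup stripped first.

```python
import re


def missing_bullets(wikitext: str, cleaned: str) -> list[str]:
    """Return bullet items ('*' lines) from wikitext whose text is not in cleaned.

    Hypothetical helper for this report; it does not call wikiextractor.
    """
    missing = []
    for line in wikitext.splitlines():
        # A wikitext bullet line starts with one or more asterisks.
        match = re.match(r"^\*+\s*(.+)$", line)
        if not match:
            continue
        item = match.group(1).strip()
        # Plain substring check: only reliable for bullets without markup.
        if item not in cleaned:
            missing.append(item)
    return missing


# Made-up sample input, not the real content of page 8603.
sample_wikitext = (
    "==Patterns==\n"
    "Some intro text.\n"
    "* first bullet item\n"
    "* second bullet item\n"
)
sample_cleaned = "Patterns\nSome intro text.\n"

print(missing_bullets(sample_wikitext, sample_cleaned))
```

For the real page, the two arguments would be the raw wikitext from page_8603.xml and the cleaned text read from the files under the output directory; how those are loaded is left out here.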