xml2csv is a Python module that transforms standards from semi-structured to structured data. It provides a set of classes to parse XML that uses the ISO Standards Tag Set (ISO STS) and/or the NISO Standards Tag Suite (NISO STS). The results are written to CSV.
The API documentation and additional information are available via data.nen.nl.
The CSV output includes:
- committees
- ICS codes
- dates, e.g. review or withdrawal
- references
- metadata
- terms and definitions
- titles, e.g. NL and EN
- sections
- equations
To use a processor:
- Create an instance of a processor class (for example IcsProcessor), passing a reader object and a writer object as parameters to the constructor.
- Call the process method on that instance.
from csv import DictWriter

from xml2csv import IcsProcessor

# Open the XML input and append to the CSV output; both files are closed on exit.
with open('input.xml', 'r', encoding='utf-8') as reader, \
        open('output.csv', 'a', encoding='utf-8', newline='') as out:
    writer = DictWriter(out, delimiter=',', lineterminator='\n', fieldnames=IcsProcessor.fieldnames)
    p = IcsProcessor(reader, writer)
    p.process()
To implement your own parser:
- Create a subclass of the Processor class.
- Override the converter method.
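The two steps above can be sketched as follows. This is only an illustration: the Processor class defined here is a stand-in so that the example runs without xml2csv installed, and the real base class may use a different converter signature. TitleProcessor and titles_to_csv are hypothetical names, not part of the module.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace that ElementTree uses for the built-in xml: prefix (as in xml:lang).
XML_NS = '{http://www.w3.org/XML/1998/namespace}'


class Processor:
    """Stand-in for xml2csv's Processor; the real base class may differ."""

    fieldnames = []

    def __init__(self, reader, writer):
        self.reader = reader
        self.writer = writer

    def converter(self, root):
        """Yield one dict per CSV row; subclasses override this (assumed contract)."""
        raise NotImplementedError

    def process(self):
        # Parse the whole document, then write every converted row.
        root = ET.parse(self.reader).getroot()
        for row in self.converter(root):
            self.writer.writerow(row)


class TitleProcessor(Processor):
    """Hypothetical subclass that emits one row per <title> element."""

    fieldnames = ['lang', 'title']

    def converter(self, root):
        for element in root.iter('title'):
            yield {
                'lang': element.get(XML_NS + 'lang', ''),
                'title': (element.text or '').strip(),
            }


def titles_to_csv(xml_text):
    """Run TitleProcessor over an XML string and return the CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=TitleProcessor.fieldnames, lineterminator='\n')
    TitleProcessor(io.StringIO(xml_text), writer).process()
    return out.getvalue()
```

Keeping the extraction logic in converter means the subclass only decides which rows to emit, while reading and writing stay in the base class.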
How to install and run the project locally:
- Clone the repository.
- Copy the XML documents to the /data/xml directory. Note that this directory already contains a sample document (NISO-STS-Standard-1-0.XML).
- Run main.py, which defines a pipeline (a list of processors).
- The output is written to the /data/csv directory as a set of CSV files.
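A pipeline of this kind could look roughly like the sketch below. It is not the project's main.py: run_pipeline and LineCountProcessor are hypothetical names, and the sketch assumes only that each processor class exposes fieldnames, takes a reader and a writer in its constructor, and has a process method, as in the usage example.

```python
import csv
from pathlib import Path


def run_pipeline(processors, xml_dir, csv_dir):
    """Apply every processor class to every XML file in xml_dir.

    Each processor writes to its own CSV file in csv_dir, named after the class.
    Returns the list of CSV paths that were written.
    """
    xml_dir, csv_dir = Path(xml_dir), Path(csv_dir)
    csv_dir.mkdir(parents=True, exist_ok=True)
    # Match the extension case-insensitively so that *.XML files are included.
    xml_paths = sorted(p for p in xml_dir.iterdir() if p.suffix.lower() == '.xml')
    written = []
    for processor_cls in processors:
        target = csv_dir / f'{processor_cls.__name__}.csv'
        with open(target, 'w', encoding='utf-8', newline='') as out:
            writer = csv.DictWriter(
                out, fieldnames=processor_cls.fieldnames, lineterminator='\n')
            writer.writeheader()
            for xml_path in xml_paths:
                with open(xml_path, 'r', encoding='utf-8') as reader:
                    processor_cls(reader, writer).process()
        written.append(target)
    return written


class LineCountProcessor:
    """Stand-in processor so the sketch runs without xml2csv installed."""

    fieldnames = ['lines']

    def __init__(self, reader, writer):
        self.reader = reader
        self.writer = writer

    def process(self):
        # Write a single row holding the number of lines in the input file.
        self.writer.writerow({'lines': sum(1 for _ in self.reader)})
```

With the real module, the list passed to run_pipeline would hold processor classes such as IcsProcessor, and the directories would be data/xml and data/csv.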
License: GNU GPLv3