Programmatic curation of Glottography datasets
Install via pip from PyPI:
pip install pyglottography
Note
We use GDAL's ogr2ogr command to convert between GeoJSON and GeoPackage formats. Thus, some functionality of pyglottography requires a working GDAL installation.
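The kind of conversion this refers to can be sketched by calling ogr2ogr from Python via subprocess. This is a minimal illustration, not pyglottography's own code: the helper names ogr2ogr_command and geojson_to_gpkg are hypothetical, and the only assumption about GDAL is the standard "ogr2ogr -f GPKG <dst> <src>" invocation.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def ogr2ogr_command(src: Path, dst: Path) -> List[str]:
    """Build an ogr2ogr call converting src into a GeoPackage at dst (hypothetical helper)."""
    # -f GPKG selects GDAL's GeoPackage driver for the output file;
    # ogr2ogr takes the destination before the source.
    return ["ogr2ogr", "-f", "GPKG", str(dst), str(src)]


def geojson_to_gpkg(src: Path, dst: Path) -> None:
    """Run the conversion, failing early if ogr2ogr is not on the PATH."""
    if shutil.which("ogr2ogr") is None:
        raise RuntimeError("ogr2ogr not found; a working GDAL installation is required")
    # check=True raises CalledProcessError if ogr2ogr exits with an error.
    subprocess.run(ogr2ogr_command(src, dst), check=True)
```

For example, geojson_to_gpkg(Path("raw/dataset.geojson"), Path("dataset.gpkg")) would write a GeoPackage next to the project, provided GDAL is installed.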
pyglottography provides a cldfbench project template, which can be used with the cldfbench new command:
cldfbench new --template glottography
The cldfbench workflow uses data in a project's raw directory, enriched with information from the etc directory, to create a CLDF dataset in the cldf directory. By default, pyglottography expects input data as follows:
- Geo-data, i.e. shapes for languoid areas, is expected in a GeoJSON file raw/dataset.geojson. Each feature in this GeoJSON file should have a unique value for the id property.
- Metadata about the shapes is expected in a CSV file etc/features.csv. This file must have an id column with values corresponding to the feature ids in the geo-data.
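A quick consistency check of this layout can be sketched with the standard library alone. The function check_inputs is hypothetical (pyglottography may validate its inputs differently); it assumes the id lives in each feature's properties object, as described above, and compares ids as strings because CSV values are always strings.

```python
import csv
import json
from collections import Counter
from pathlib import Path
from typing import List


def check_inputs(geojson_path: Path, csv_path: Path) -> List[str]:
    """Report problems with raw/dataset.geojson and etc/features.csv (hypothetical helper)."""
    problems = []
    with geojson_path.open(encoding="utf-8") as f:
        features = json.load(f)["features"]
    # Each feature should carry a unique "id" in its properties object.
    geo_ids = []
    for feature in features:
        fid = (feature.get("properties") or {}).get("id")
        if fid is None:
            problems.append("feature without id property")
        else:
            geo_ids.append(str(fid))
    for fid, count in Counter(geo_ids).items():
        if count > 1:
            problems.append(f"duplicate feature id: {fid}")
    # The metadata CSV must have an "id" column matching the feature ids.
    with csv_path.open(encoding="utf-8", newline="") as f:
        meta_ids = {row["id"] for row in csv.DictReader(f)}
    for fid in sorted(set(geo_ids) - meta_ids):
        problems.append(f"no metadata for feature id: {fid}")
    for fid in sorted(meta_ids - set(geo_ids)):
        problems.append(f"metadata for unknown feature id: {fid}")
    return problems
```

Calling check_inputs(Path("raw/dataset.geojson"), Path("etc/features.csv")) from the project directory returns an empty list when both files line up.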
While metadata could be read entirely from the properties object of features in the GeoJSON file, pyglottography looks up the metadata in a separate file to allow for more transparent curation.
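To illustrate the idea (not pyglottography's actual implementation), the lookup can be sketched as a join on id: the GeoJSON contributes only the geometry and the id, while all other properties come from etc/features.csv. The function name load_features_with_metadata is hypothetical, and every feature id is assumed to have a matching CSV row.

```python
import csv
import json
from pathlib import Path
from typing import Dict, List


def load_features_with_metadata(geojson_path: Path, csv_path: Path) -> List[dict]:
    """Attach etc/features.csv rows to GeoJSON features by id (hypothetical helper)."""
    with csv_path.open(encoding="utf-8", newline="") as f:
        metadata: Dict[str, dict] = {row["id"]: row for row in csv.DictReader(f)}
    with geojson_path.open(encoding="utf-8") as f:
        features = json.load(f)["features"]
    for feature in features:
        props = feature.get("properties") or {}
        # Only the id is taken from the GeoJSON; the remaining properties are
        # replaced by the curated CSV row (a KeyError signals missing metadata).
        feature["properties"] = dict(metadata[str(props["id"])])
    return features
```

Because stale values in the GeoJSON properties are ignored, corrections only ever need to be made in the CSV file.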
Since the Glottolog language catalog is released in a new version about twice a year, it must be possible to recreate a Glottography dataset with updated Glottocodes. With the raw data setup implemented in pyglottography, this only requires changes in etc/features.csv, which can easily be tracked with version control software such as git.
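Such an update can be sketched as a small rewrite of etc/features.csv. Both the column name glottocode and the replacements mapping (superseded Glottocode to current Glottocode) are assumptions for illustration; where that mapping comes from is left open, and update_glottocodes is a hypothetical helper, not part of pyglottography.

```python
import csv
from pathlib import Path
from typing import Dict


def update_glottocodes(csv_path: Path, replacements: Dict[str, str],
                       column: str = "glottocode") -> int:
    """Rewrite superseded Glottocodes in etc/features.csv; return the number of changed rows.

    The column name and the replacements mapping are assumptions for illustration.
    """
    with csv_path.open(encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)
    changed = 0
    for row in rows:
        new = replacements.get(row[column])
        if new is not None and new != row[column]:
            row[column] = new
            changed += 1
    # Keep the original column order and write Unix line endings (an assumption
    # about the file) so that a git diff shows only the rows whose values changed.
    with csv_path.open("w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
    return changed
```

After running it, git diff etc/features.csv lists exactly the features whose Glottocodes were updated.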
The CLDF dataset is then created by running cldfbench's makecldf command on the dataset module:
cldfbench makecldf cldfbench_<dsid>.py