Commit

finalize for lexibank release
Mattis List committed Jan 13, 2025
1 parent 3a4dc05 commit 45e5851
Showing 11 changed files with 26 additions and 60 deletions.
3 changes: 3 additions & 0 deletions README.md
@@ -1,5 +1,7 @@
# Lexibank Analysed

+[![CLDF validation](https://github.com/lexibank/lexibank-study//workflows/CLDF-validation/badge.svg)](https://github.com/lexibank/lexibank-study//actions?query=workflow%3ACLDF-validation)
+
## How to cite

If you use these data please cite
@@ -27,6 +29,7 @@ The core-sets are defined by using the following criteria:
## Statistics


+[![CLDF validation](https://github.com/lexibank/lexibank-study//workflows/CLDF-validation/badge.svg)](https://github.com/lexibank/lexibank-study//actions?query=workflow%3ACLDF-validation)
![Glottolog: 100%](https://img.shields.io/badge/Glottolog-100%25-brightgreen.svg "Glottolog: 100%")
![Concepticon: 100%](https://img.shields.io/badge/Concepticon-100%25-brightgreen.svg "Concepticon: 100%")
![Source: 100%](https://img.shields.io/badge/Source-100%25-brightgreen.svg "Source: 100%")
8 changes: 4 additions & 4 deletions cldf/README.md
@@ -20,7 +20,7 @@ property | value
[dc:identifier](http://purl.org/dc/terms/identifier) | https://lexibank.clld.org
[dc:license](http://purl.org/dc/terms/license) | https://creativecommons.org/licenses/by/4.0/
[dcat:accessURL](http://www.w3.org/ns/dcat#accessURL) | https://github.com/lexibank/lexibank-study/
-[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="https://github.com/lexibank/lexibank-study//tree/451c2aa">lexibank/lexibank-study/ v1.0-61-g451c2aa</a></li><li><a href="git@github.com:glottolog/glottolog/tree/v5.1">Glottolog v5.1</a></li><li><a href="git@github.com:concepticon/concepticon-data/tree/v3.3.0">Concepticon v3.3.0</a></li><li><a href="https://github.com/cldf-clts/clts/tree/v2.3.0">CLTS v2.3.0</a></li></ol>
+[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="https://github.com/lexibank/lexibank-study//tree/3a4dc05">lexibank/lexibank-study/ v1.0-77-g3a4dc05</a></li><li><a href="git@github.com:glottolog/glottolog/tree/v5.1">Glottolog v5.1</a></li><li><a href="git@github.com:concepticon/concepticon-data/tree/v3.3.0">Concepticon v3.3.0</a></li><li><a href="https://github.com/cldf-clts/clts/tree/v2.3.0">CLTS v2.3.0</a></li></ol>
[prov:wasGeneratedBy](http://www.w3.org/ns/prov#wasGeneratedBy) | <ol><li><strong>lingpy-rcParams</strong>: <a href="./lingpy-rcParams.json">lingpy-rcParams.json</a></li><li><strong>python</strong>: 3.12.7</li><li><strong>python-packages</strong>: <a href="./requirements.txt">requirements.txt</a></li></ol>
[rdf:ID](http://www.w3.org/1999/02/22-rdf-syntax-ns#ID) | lexibank-analysed
[rdf:type](http://www.w3.org/1999/02/22-rdf-syntax-ns#type) | http://www.w3.org/ns/dcat#Distribution
@@ -121,7 +121,7 @@ property | value
[dc:identifier](http://purl.org/dc/terms/identifier) | https://lexibank.clld.org
[dc:license](http://purl.org/dc/terms/license) | https://creativecommons.org/licenses/by/4.0/
[dcat:accessURL](http://www.w3.org/ns/dcat#accessURL) | https://github.com/lexibank/lexibank-study/
-[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="https://github.com/lexibank/lexibank-study//tree/451c2aa">lexibank/lexibank-study/ v1.0-61-g451c2aa</a></li><li><a href="git@github.com:glottolog/glottolog/tree/v5.1">Glottolog v5.1</a></li><li><a href="git@github.com:concepticon/concepticon-data/tree/v3.3.0">Concepticon v3.3.0</a></li><li><a href="https://github.com/cldf-clts/clts/tree/v2.3.0">CLTS v2.3.0</a></li></ol>
+[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="https://github.com/lexibank/lexibank-study//tree/3a4dc05">lexibank/lexibank-study/ v1.0-77-g3a4dc05</a></li><li><a href="git@github.com:glottolog/glottolog/tree/v5.1">Glottolog v5.1</a></li><li><a href="git@github.com:concepticon/concepticon-data/tree/v3.3.0">Concepticon v3.3.0</a></li><li><a href="https://github.com/cldf-clts/clts/tree/v2.3.0">CLTS v2.3.0</a></li></ol>
[prov:wasGeneratedBy](http://www.w3.org/ns/prov#wasGeneratedBy) | <ol><li><strong>python</strong>: 3.12.7</li><li><strong>python-packages</strong>: <a href="./requirements.txt">requirements.txt</a></li></ol>
[rdf:ID](http://www.w3.org/1999/02/22-rdf-syntax-ns#ID) | lexibank-analysed
[rdf:type](http://www.w3.org/1999/02/22-rdf-syntax-ns#type) | http://www.w3.org/ns/dcat#Distribution
@@ -268,7 +268,7 @@ property | value
[dc:identifier](http://purl.org/dc/terms/identifier) | https://lexibank.clld.org
[dc:license](http://purl.org/dc/terms/license) | https://creativecommons.org/licenses/by/4.0/
[dcat:accessURL](http://www.w3.org/ns/dcat#accessURL) | https://github.com/lexibank/lexibank-study/
-[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="https://github.com/lexibank/lexibank-study//tree/451c2aa">lexibank/lexibank-study/ v1.0-61-g451c2aa</a></li><li><a href="git@github.com:glottolog/glottolog/tree/v5.1">Glottolog v5.1</a></li><li><a href="git@github.com:concepticon/concepticon-data/tree/v3.3.0">Concepticon v3.3.0</a></li><li><a href="https://github.com/cldf-clts/clts/tree/v2.3.0">CLTS v2.3.0</a></li></ol>
+[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="https://github.com/lexibank/lexibank-study//tree/3a4dc05">lexibank/lexibank-study/ v1.0-77-g3a4dc05</a></li><li><a href="git@github.com:glottolog/glottolog/tree/v5.1">Glottolog v5.1</a></li><li><a href="git@github.com:concepticon/concepticon-data/tree/v3.3.0">Concepticon v3.3.0</a></li><li><a href="https://github.com/cldf-clts/clts/tree/v2.3.0">CLTS v2.3.0</a></li></ol>
[prov:wasGeneratedBy](http://www.w3.org/ns/prov#wasGeneratedBy) | <ol><li><strong>python</strong>: 3.12.7</li><li><strong>python-packages</strong>: <a href="./requirements.txt">requirements.txt</a></li></ol>
[rdf:ID](http://www.w3.org/1999/02/22-rdf-syntax-ns#ID) | lexibank-analysed
[rdf:type](http://www.w3.org/1999/02/22-rdf-syntax-ns#type) | http://www.w3.org/ns/dcat#Distribution
@@ -415,7 +415,7 @@ property | value
[dc:identifier](http://purl.org/dc/terms/identifier) | https://lexibank.clld.org
[dc:license](http://purl.org/dc/terms/license) | https://creativecommons.org/licenses/by/4.0/
[dcat:accessURL](http://www.w3.org/ns/dcat#accessURL) | https://github.com/lexibank/lexibank-study/
-[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="https://github.com/lexibank/lexibank-study//tree/451c2aa">lexibank/lexibank-study/ v1.0-61-g451c2aa</a></li><li><a href="git@github.com:glottolog/glottolog/tree/v5.1">Glottolog v5.1</a></li><li><a href="git@github.com:concepticon/concepticon-data/tree/v3.3.0">Concepticon v3.3.0</a></li><li><a href="https://github.com/cldf-clts/clts/tree/v2.3.0">CLTS v2.3.0</a></li></ol>
+[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="https://github.com/lexibank/lexibank-study//tree/3a4dc05">lexibank/lexibank-study/ v1.0-77-g3a4dc05</a></li><li><a href="git@github.com:glottolog/glottolog/tree/v5.1">Glottolog v5.1</a></li><li><a href="git@github.com:concepticon/concepticon-data/tree/v3.3.0">Concepticon v3.3.0</a></li><li><a href="https://github.com/cldf-clts/clts/tree/v2.3.0">CLTS v2.3.0</a></li></ol>
[prov:wasGeneratedBy](http://www.w3.org/ns/prov#wasGeneratedBy) | <ol><li><strong>python</strong>: 3.12.7</li><li><strong>python-packages</strong>: <a href="./requirements.txt">requirements.txt</a></li></ol>
[rdf:ID](http://www.w3.org/1999/02/22-rdf-syntax-ns#ID) | lexibank-analysed
[rdf:type](http://www.w3.org/1999/02/22-rdf-syntax-ns#type) | http://www.w3.org/ns/dcat#Distribution
Binary file modified cldf/forms.csv.zip
Binary file not shown.
2 changes: 1 addition & 1 deletion cldf/lexicon-metadata.json
@@ -19,7 +19,7 @@
{
"rdf:about": "https://github.com/lexibank/lexibank-study/",
"rdf:type": "prov:Entity",
-"dc:created": "v1.0-61-g451c2aa",
+"dc:created": "v1.0-77-g3a4dc05",
"dc:title": "Repository"
},
{
4 changes: 2 additions & 2 deletions cldf/lingpy-rcParams.json
@@ -64,7 +64,7 @@
10,
10
],
-"filename": "lingpy-2025-01-10",
+"filename": "lingpy-2025-01-13",
"gap_symbol": "-",
"gap_weight": 0.5,
"gop": -2,
@@ -123,7 +123,7 @@
"scorer": {},
"sonar": true,
"stress": "\u02c8\u02cc'",
-"timestamp": "2025-01-10 14:12",
+"timestamp": "2025-01-13 09:40",
"tones": "\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079\u2070\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089\u20800123456789\u02e5\u02e6\u02e7\u02e8\u02e9\u02ea\u02eb-\ua708-\ua709-\ua70a-\ua70b-\ua70c-\ua70d-\ua70e-\ua70f-\ua710-\ua711-\ua712-\ua713-\ua714-\ua715-\ua716-\ua717-\ua718-\ua719-\ua71a-\ua700-\ua701-\ua702-\ua703-\ua704-\ua705-\ua706-\ua707",
"tree_calc": "neighbor",
"unique_sequences": true,
2 changes: 1 addition & 1 deletion cldf/phonemes-metadata.json
@@ -19,7 +19,7 @@
{
"rdf:about": "https://github.com/lexibank/lexibank-study/",
"rdf:type": "prov:Entity",
-"dc:created": "v1.0-61-g451c2aa",
+"dc:created": "v1.0-77-g3a4dc05",
"dc:title": "Repository"
},
{
2 changes: 1 addition & 1 deletion cldf/phonology-metadata.json
@@ -19,7 +19,7 @@
{
"rdf:about": "https://github.com/lexibank/lexibank-study/",
"rdf:type": "prov:Entity",
-"dc:created": "v1.0-61-g451c2aa",
+"dc:created": "v1.0-77-g3a4dc05",
"dc:title": "Repository"
},
{
38 changes: 2 additions & 36 deletions cldf/requirements.txt
@@ -1,88 +1,54 @@
appdirs==1.4.4
arrow==1.3.0
asttokens==3.0.0
attrs==24.3.0
babel==2.16.0
bibtexparser==2.0.0b8
bs4==0.0.2
Cartopy==0.24.1
certifi==2024.12.14
chardet==5.2.0
cldfbench==1.14.0
cldfcatalog==1.5.1
cldfviz==1.3.0
cldfzenodo==2.1.2
clldutils==3.24.0
cltoolkit==0.2.0
colorama==0.4.6
colorlog==6.9.0
commonnexus==1.9.2
csvw==3.5.1
cycler==0.12.1
decorator==5.1.1
executing==2.1.0
gitdb==4.0.12
greenlet==3.1.1
idna==3.10
iniconfig==2.0.0
ipython==8.31.0
isodate==0.7.2
jedi==0.19.2
Jinja2==3.1.5
jmespath==1.0.1
jsonschema==4.23.0
kiwisolver==1.4.8
lingpy==2.6.13
lxml==5.3.0
Markdown==3.7
MarkupSafe==3.0.2
matplotlib==3.10.0
multipledispatch==1.0.0
nameparser==1.1.3
networkx==3.4.2
newick==1.9.0
numpy==2.2.1
openpyxl==3.1.5
packaging==24.2
parso==0.8.4
pluggy==1.5.0
prompt_toolkit==3.0.48
pure_eval==0.2.3
pybtex==0.24.0
-pycldf==1.40.2
+pycldf==1.40.3
pyclts==3.2.0
pyconcepticon==3.1.0
pycountry==24.6.1
pyglottolog==3.14.0
Pygments==2.18.0
pylatexenc==2.10
pylexibank==3.5.0
pyparsing==3.2.1
pyproj==3.7.0
pytest==8.3.4
python-dateutil==2.9.0.post0
python-frontmatter==1.1.0
-rdflib==7.1.1
+rdflib==7.1.2
referencing==0.35.1
regex==2024.11.6
reportlab==4.2.5
requests==2.32.3
rfc3986==1.5.0
scipy==1.14.1
segments==2.2.1
shapely==2.0.6
six==1.17.0
smmap==5.0.2
soupsieve==2.6
SQLAlchemy==1.4.54
tabulate==0.9.0
termcolor==2.5.0
toyplot==2.0.0
toytree==2.0.5
tqdm==4.67.1
traitlets==5.14.3
uritemplate==4.1.1
urllib3==2.3.0
wcwidth==0.2.13
xlrd==2.0.1
zenodoclient==0.5.1
2 changes: 1 addition & 1 deletion cldf/wordlist-metadata.json
@@ -19,7 +19,7 @@
{
"rdf:about": "https://github.com/lexibank/lexibank-study/",
"rdf:type": "prov:Entity",
-"dc:created": "v1.0-61-g451c2aa",
+"dc:created": "v1.0-77-g3a4dc05",
"dc:title": "Repository"
},
{
9 changes: 0 additions & 9 deletions setup.py
@@ -27,20 +27,11 @@
platforms='any',
python_requires='>=3.8',
install_requires=[
'collabutils[googlesheets]',
'cldfbench>=1.7.2',
'cltoolkit>=0.1.1',
'cldfviz>=0.3.0',
'cldfzenodo',
'pylexibank',
'attrs>=18.2',
'clldutils>=3.5',
'cldfcatalog>=1.3',
'csvw>=1.6',
'pycldf',
'uritemplate',
'lingpy>=2.6.8',
'pyclts>=3.1',
'cartopy',
'pillow',
'matplotlib',
16 changes: 11 additions & 5 deletions workflow.md
@@ -1,13 +1,21 @@
# Using the Lexibank Data Repository

-Lexibank is a collection of lexical datasets provided in [CLDF](https://cldf.clld.org) formats. These CLDF datasets were compiled with the help of the `pylexibank` package, which is an extension for the [CLDFBench](https://github.com/cldf/cldfbench) package for handling CLDF datasets. Since data in the lexibank collection is maximally integrated with cross-linguistic resources that have been compiled during the past years, it is possible to make active use of the data to compute many features (lexical and phonological) automatically. In the following, we will describe the major workflow.
+Lexibank is a collection of lexical datasets provided in
+[CLDF](https://cldf.clld.org) formats. These CLDF datasets were compiled with
+the help of the `pylexibank` package, which is an extension for the
+[CLDFBench](https://github.com/cldf/cldfbench) package for handling CLDF
+datasets. Since data in the lexibank collection is maximally integrated with
+cross-linguistic resources that have been compiled during the past years, it is
+possible to make active use of the data to compute many features (lexical and
+phonological) automatically. In the following, we will describe the major
+workflow.

## 1 Lexibank Collection

The lexibank collection consists of mainly two types of datasets:

1. CLDF datasets linked to Concepticon and Glottolog with consistent lexeme forms which have a
-sufficient size in terms of concepts covered. This collection is called `ClicsCore` collection, since
+sufficient size in terms of concepts covered. This collection is called the `ClicsCore` collection, since
it fulfills the criteria to be included in the [CLICS](https://clics.clld.org) database. The collection
can be used to compute various lexical features for individual language varieties.
2. CLDF datasets linked to Concepticon and Glottolog with lexeme forms which are transcribed in the BIPA
@@ -17,9 +25,7 @@ The lexibank collection consists of mainly two types of datasets:

The decision about which datasets are assigned to which collection is currently carried out by the board of lexibank editors, who estimate how well each of the datasets qualifies for the inclusion in either or both collections. The decisions are available in the form of a spreadsheet, shared along with this repository (see [etc/lexibank.tsv](etc/lexibank.tsv)).

-The authoritative spreadsheet itself is curated on the [nextcloud server of MPI-EVA](https://share.eva.mpg.de/index.php/s/dqmqQn567P4PKie).
-For now, however, we experience problems with the nextcloud server and therefore edit the spreadsheet on
-[GoogleSheets](https://docs.google.com/spreadsheets/d/1x8c_fuWkUYpDKedn2mNkKFxpwtHCFAOBUeRT8Mihy3M/edit?usp=sharing).
+Upon each new release of Lexibank, the spreadsheet is updated, individual datasets in CLDF are shared on Zenodo, and published with a dedicated version also on GitHub, in order to document their status clearly.
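
Note (not part of the commit): the spreadsheet in `etc/lexibank.tsv` records which dataset belongs to which collection, so it could in principle be queried with a few lines of Python. The sketch below is only an illustration under stated assumptions: the column names `Dataset` and `SecondCollection`, the `x` membership marker, and the sample rows are all hypothetical, and the real header of the file should be checked before adapting it.

```python
import csv
import io


def datasets_in_collection(tsv_text, collection, id_column="Dataset", marker="x"):
    """Return the IDs of all datasets flagged for `collection`.

    The column names and the "x" marker are assumptions made for this
    sketch; the actual layout of etc/lexibank.tsv may differ.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row[id_column]
        for row in reader
        # Treat a missing or empty cell as "not in this collection".
        if (row.get(collection) or "").strip().lower() == marker
    ]


# Hypothetical sample that mimics the assumed layout.
SAMPLE = (
    "Dataset\tClicsCore\tSecondCollection\n"
    "alpha\tx\t\n"
    "beta\tx\tx\n"
    "gamma\t\tx\n"
)

clics_core = datasets_in_collection(SAMPLE, "ClicsCore")
second = datasets_in_collection(SAMPLE, "SecondCollection")
```

To run this against the repository, the text of `etc/lexibank.tsv` would be read from disk and passed in place of `SAMPLE`, with the column names adjusted to the real header.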


## 2 Lexibank Workflow