Lexibank Analysed

How to cite

If you use these data, please cite both

the original source:

Blum, Frederic; Barrientos, Carlos; Englisch, Johannes; Forkel, Robert; Gray, Russell D.; Greenhill, Simon J.; Rzymski, Christoph and List, Johann-Mattis (2025): Lexibank²: Precomputed Features for Large-Scale Lexical Data [Dataset, Version 2.0]. Leipzig: Max Planck Institute for Evolutionary Anthropology.

and the derived dataset, using the DOI of the particular released version you used.

This dataset is licensed under CC-BY-4.0.

The core sets are defined using the following criteria:

Varieties: 4,745 (linked to 2,791 different Glottocodes)
Concepts: 3,204 (linked to 3,204 different Concepticon concept sets)
Lexemes: 1,663,640
Sources: 128
Synonymy: 1.09
Invalid lexemes: 0
Tokens: 9,286,763
Segments: 2,380 (0 BIPA errors, 0 CLTS sound class errors, 2,371 CLTS modified)
Inventory size (avg): 39.52
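The figures above are precomputed. As a rough illustration only, the sketch below shows how two of them could be derived. It assumes (our reading, not something this README states) that "Synonymy" is the average number of forms per concept per variety, and that "Inventory size (avg)" is the average number of distinct segments per variety. The toy data, the (variety, concept, segments) triple layout, and the function names are hypothetical. They are not the pylexibank implementation.

```python
from collections import defaultdict

# Hypothetical toy data: (variety, concept, space-separated segments).
# In the real dataset, rows like these would come from the CLDF FormTable;
# this layout is an illustrative assumption.
forms = [
    ("varA", "HAND", "m a n o"),
    ("varA", "HAND", "p a l m a"),   # a synonym for HAND in varA
    ("varA", "WATER", "a g w a"),
    ("varB", "HAND", "h a n d"),
    ("varB", "WATER", "w a t a"),
]


def synonymy(rows):
    """Average number of forms per (variety, concept) pair (assumed definition)."""
    counts = defaultdict(int)
    for variety, concept, _ in rows:
        counts[(variety, concept)] += 1
    return sum(counts.values()) / len(counts)


def avg_inventory_size(rows):
    """Average number of distinct segments per variety (assumed definition)."""
    inventories = defaultdict(set)
    for variety, _, segments in rows:
        inventories[variety].update(segments.split())
    return sum(len(inv) for inv in inventories.values()) / len(inventories)


if __name__ == "__main__":
    print(synonymy(forms))            # 5 forms over 4 pairs
    print(avg_inventory_size(forms))  # varA has 8 segments, varB has 6
```

On this toy input, `synonymy` returns 1.25 and `avg_inventory_size` returns 7.0. A value close to 1, such as the 1.09 reported above, would then mean that most concepts are expressed by a single form per variety.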

Name	GitHub user	Description	Role
Frederic Blum	@FredericBlum	maintainer	Author
Carlos Barrientos	@MuffinLinwist	maintainer	Author
Johannes Englisch	@johenglisch	maintainer	Author
Robert Forkel	@xrotwang	maintainer	Author
Russell D. Gray		maintainer	Author
Simon J. Greenhill	@simongreenhill	maintainer	Author
Christoph Rzymski	@chrzyki	maintainer	Author
Johann-Mattis List	@LinguList	maintainer	Author

The following CLDF datasets are available in the cldf directory: