If you use these data, please cite:
- the original source
Blum, Frederic; Barrientos, Carlos; Englisch, Johannes; Forkel, Robert; Gray, Russell D.; Greenhill, Simon J.; Rzymski, Christoph and List, Johann-Mattis (2025): Lexibank²: Precomputed Features for Large-Scale Lexical Data [Dataset, Version 2.0]. Leipzig: Max Planck Institute for Evolutionary Anthropology.
- the derived dataset using the DOI of the particular released version you were using
This dataset is licensed under a CC-BY-4.0 license
Available online at https://lexibank.clld.org
The core-sets are defined by the following criteria:
- Varieties: 4,745 (linked to 2,791 different Glottocodes)
- Concepts: 3,204 (linked to 3,204 different Concepticon concept sets)
- Lexemes: 1,663,640
- Sources: 128
- Synonymy: 1.09
- Invalid lexemes: 0
- Tokens: 9,286,763
- Segments: 2,380 (0 BIPA errors, 0 CLTS sound class errors, 2,371 CLTS modified)
- Inventory size (avg): 39.52
- Languages linked to bookkeeping languoids in Glottolog:
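Most of the figures above are aggregate counts over the dataset's CLDF FormTable. The sketch below shows how counts of this kind could be reproduced with the Python standard library. It assumes the CLDF Wordlist default column names (`Language_ID`, `Parameter_ID`, `Segments`) and space-separated segments. It also assumes the following interpretations, none of which this README confirms: "Tokens" is the total number of segments across all forms, "Synonymy" is the mean number of forms per (variety, concept) pair, and "Inventory size (avg)" is the mean number of distinct segments per variety. The sample rows and the `wordlist_stats` function are hypothetical illustrations, not part of Lexibank's tooling.

```python
import csv
import io
from collections import defaultdict

# Hypothetical miniature FormTable. Column names assume the CLDF Wordlist
# defaults; the Segments column is assumed to be space-separated.
SAMPLE_FORMS = """ID,Language_ID,Parameter_ID,Form,Segments
1,lang1,HAND,mano,m a n o
2,lang1,HAND,palma,p a l m a
3,lang1,WATER,agua,a g u a
4,lang2,HAND,hand,h a n d
5,lang2,WATER,water,w a t ə r
"""


def wordlist_stats(rows):
    """Compute README-style summary counts from FormTable rows (hypothetical helper)."""
    varieties = set()
    concepts = set()
    lexemes = 0
    tokens = 0
    segment_types = set()
    forms_per_pair = defaultdict(int)  # (variety, concept) -> number of forms
    inventory = defaultdict(set)       # variety -> set of distinct segments

    for row in rows:
        lang, concept = row["Language_ID"], row["Parameter_ID"]
        segments = row["Segments"].split()
        varieties.add(lang)
        concepts.add(concept)
        lexemes += 1
        tokens += len(segments)            # assumed meaning of "Tokens"
        segment_types.update(segments)     # distinct segment types overall
        forms_per_pair[(lang, concept)] += 1
        inventory[lang].update(segments)

    return {
        "varieties": len(varieties),
        "concepts": len(concepts),
        "lexemes": lexemes,
        "tokens": tokens,
        "segments": len(segment_types),
        # Assumed definition: mean number of forms per (variety, concept) pair.
        "synonymy": round(sum(forms_per_pair.values()) / len(forms_per_pair), 2),
        # Assumed definition: mean number of distinct segments per variety.
        "inventory_avg": round(
            sum(len(s) for s in inventory.values()) / len(inventory), 2
        ),
    }


stats = wordlist_stats(csv.DictReader(io.StringIO(SAMPLE_FORMS)))
print(stats)
```

For the sample rows this yields 2 varieties, 2 concepts, 5 lexemes, 22 tokens, 14 segment types, a synonymy of 1.25 and an average inventory of 8.0. To run it on real data, one could pass `csv.DictReader(open("cldf/forms.csv", encoding="utf-8"))`, assuming the FormTable is stored at that path. The `pycldf` library provides a more robust, metadata-aware reader for CLDF datasets.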
Name | GitHub user | Description | Role |
---|---|---|---|
Frederic Blum | @FredericBlum | maintainer | Author |
Carlos Barrientos | @MuffinLinwist | maintainer | Author |
Johannes Englisch | @johenglisch | maintainer | Author |
Robert Forkel | @xrotwang | maintainer | Author |
Russell D. Gray | | maintainer | Author |
Simon J. Greenhill | @simongreenhill | maintainer | Author |
Christoph Rzymski | @chrzyki | maintainer | Author |
Johann-Mattis List | @LinguList | maintainer | Author |
The following CLDF datasets are available in `cldf`: