Lexibank Analysed

CLDF validation

How to cite

If you use these data, please cite

  • the original source

    Blum, Frederic; Barrientos, Carlos; Englisch, Johannes; Forkel, Robert; Gray, Russell D.; Greenhill, Simon J.; Rzymski, Christoph and List, Johann-Mattis (2025): Lexibank²: Precomputed Features for Large-Scale Lexical Data [Dataset, Version 2.0]. Leipzig: Max Planck Institute for Evolutionary Anthropology.

  • the derived dataset using the DOI of the particular released version you were using

Description

This dataset is licensed under a CC-BY-4.0 license

Available online at https://lexibank.clld.org

Notes

Core Sets

The core-sets are defined by using the following criteria:

Statistics

Glottolog: 100%, Concepticon: 100%, Source: 100%, BIPA: 100%, CLTS SoundClass: 100%

  • Varieties: 4,745 (linked to 2,791 different Glottocodes)
  • Concepts: 3,204 (linked to 3,204 different Concepticon concept sets)
  • Lexemes: 1,663,640
  • Sources: 128
  • Synonymy: 1.09
  • Invalid lexemes: 0
  • Tokens: 9,286,763
  • Segments: 2,380 (0 BIPA errors, 0 CLTS sound class errors, 2,371 CLTS modified)
  • Inventory size (avg): 39.52
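
The counts above can be combined into a few rough per-unit ratios. The following is a minimal sketch, not part of the dataset tooling: the helper `derived_ratios` and the dictionary keys are hypothetical names chosen for illustration, and the last ratio assumes that "Tokens" counts segment tokens summed over all lexemes.

```python
# Figures copied from the Statistics list above.
STATS = {
    "varieties": 4_745,
    "glottocodes": 2_791,
    "lexemes": 1_663_640,
    "tokens": 9_286_763,
}


def derived_ratios(stats):
    """Return ratios derived from the reported counts (hypothetical helper)."""
    return {
        # Average number of varieties sharing one Glottocode.
        "varieties_per_glottocode": stats["varieties"] / stats["glottocodes"],
        # Average number of lexemes recorded per variety.
        "lexemes_per_variety": stats["lexemes"] / stats["varieties"],
        # Assumption: "Tokens" counts segment tokens over all lexemes,
        # so this approximates the average lexeme length in segments.
        "tokens_per_lexeme": stats["tokens"] / stats["lexemes"],
    }


if __name__ == "__main__":
    for name, value in derived_ratios(STATS).items():
        print(f"{name}: {value:.2f}")
```

These ratios are only averages over the whole collection; individual varieties and source datasets may differ considerably.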

Possible Improvements:

Contributors

| Name | GitHub user | Description | Role |
| --- | --- | --- | --- |
| Frederic Blum | @FredericBlum | maintainer | Author |
| Carlos Barrientos | @MuffinLinwist | maintainer | Author |
| Johannes Englisch | @johenglisch | maintainer | Author |
| Robert Forkel | @xrotwang | maintainer | Author |
| Russell D. Gray | | maintainer | Author |
| Simon J. Greenhill | @simongreenhill | maintainer | Author |
| Christoph Rzymski | @chrzyki | maintainer | Author |
| Johann-Mattis List | @LinguList | maintainer | Author |

CLDF Datasets

The following CLDF datasets are available in cldf: