
Commit a4987bb

mela (#1970)
* mela
* Update mela_en.yaml
* Create _mela.yaml

Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>
1 parent b536f06 commit a4987bb

12 files changed: 127 additions, 0 deletions


lm_eval/tasks/mela/README.md

Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
# MELA

### Paper

Title: [MELA: Multilingual Evaluation of Linguistic Acceptability](https://arxiv.org/abs/2311.09033)

**Abstract**: In this work, we present the largest benchmark to date on linguistic acceptability: Multilingual Evaluation of Linguistic Acceptability -- MELA, with 46K samples covering 10 languages from a diverse set of language families. We establish LLM baselines on this benchmark, and investigate cross-lingual transfer in acceptability judgements with XLM-R. In pursuit of multilingual interpretability, we conduct probing experiments with fine-tuned XLM-R to explore the process of syntax capability acquisition. Our results show that GPT-4o exhibits a strong multilingual ability, outperforming fine-tuned XLM-R, while open-source multilingual models lag behind by a noticeable gap. Cross-lingual transfer experiments show that transfer in acceptability judgment is non-trivial: 500 Icelandic fine-tuning examples lead to 23 MCC performance in a completely unrelated language -- Chinese. Results of our probing experiments indicate that training on MELA improves the performance of XLM-R on syntax-related tasks.

Homepage: https://github.com/sjtu-compling/MELA

### Citation

```
@inproceedings{zhang2023mela,
  author    = {Ziyin Zhang and
               Yikang Liu and
               Weifang Huang and
               Junyu Mao and
               Rui Wang and
               Hai Hu},
  title     = {{MELA:} Multilingual Evaluation of Linguistic Acceptability},
  booktitle = {Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), {ACL} 2024, Bangkok, Thailand},
  publisher = {Association for Computational Linguistics},
  year      = {2024},
  url       = {https://doi.org/10.48550/arXiv.2311.09033}
}
```

### Groups and Tasks

#### Groups

- `mela`: multilingual evaluation of linguistic acceptability

#### Tasks

- `mela_en`: English
- `mela_zh`: Chinese
- `mela_it`: Italian
- `mela_ru`: Russian
- `mela_de`: German
- `mela_fr`: French
- `mela_es`: Spanish
- `mela_ja`: Japanese
- `mela_ar`: Arabic
- `mela_is`: Icelandic

### Checklist

For adding novel benchmarks/datasets to the library:

- [x] Is the task an existing benchmark in the literature?
  - [x] Have you referenced the original paper that introduced the task?
  - [x] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?

If other tasks on this dataset are already supported:

- [ ] Is the "Main" variant of this task clearly denoted?
- [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
- [ ] Have you noted which, if any, published evaluation setups are matched by this variant?

lm_eval/tasks/mela/_mela.yaml

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
group: mela
task:
  - mela_en
  - mela_zh
  - mela_it
  - mela_ru
  - mela_de
  - mela_fr
  - mela_es
  - mela_ja
  - mela_ar
  - mela_is
aggregate_metric_list:
  - metric: mcc
    weight_by_size: False
metadata:
  version: 1
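
The group config above aggregates `mcc` with `weight_by_size: False`, which reads as a plain average over the per-language scores rather than one weighted by dataset size. The following is a minimal sketch of that idea, not the harness's own implementation: the helper names `mcc` and `unweighted_mean` and the confusion counts are hypothetical, chosen only for illustration.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a marginal is empty and MCC is undefined.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def unweighted_mean(scores: dict[str, float]) -> float:
    """Average per-task scores, giving every task the same weight."""
    return sum(scores.values()) / len(scores)


# Hypothetical confusion counts for two of the subtasks (not real results).
per_language = {
    "mela_en": mcc(tp=40, tn=40, fp=10, fn=10),
    "mela_zh": mcc(tp=30, tn=30, fp=20, fn=20),
}
group_score = unweighted_mean(per_language)
```

With equal weighting, a small test set such as a low-resource language counts as much toward the group score as a large one.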

lm_eval/tasks/mela/mela_ar.yaml

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
include: mela_en.yaml
task: mela_ar
dataset_name: ar
training_split: null
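
Each per-language file pulls in `mela_en.yaml` through `include` and then overrides a few keys. The sketch below illustrates that pattern under the assumption that the including file's keys replace the base file's keys in a shallow merge; `resolve` is a hypothetical helper, and the base dictionary copies only a subset of the fields from `mela_en.yaml`.

```python
# Subset of the fields defined in mela_en.yaml (the shared base config).
base = {
    "task": "mela_en",
    "dataset_path": "Geralt-Targaryen/MELA",
    "dataset_name": "en",
    "training_split": "train",
    "validation_split": "dev",
    "test_split": "test",
    "output_type": "multiple_choice",
}

# Contents of mela_ar.yaml, minus the include directive itself.
override = {
    "task": "mela_ar",
    "dataset_name": "ar",
    "training_split": None,  # YAML null: this language has no training split
}


def resolve(base_cfg: dict, override_cfg: dict) -> dict:
    """Hypothetical helper: shallow-merge override keys over the base config."""
    merged = dict(base_cfg)  # copy so the base config stays untouched
    merged.update(override_cfg)
    return merged


mela_ar = resolve(base, override)
```

Under this reading, `mela_ar` keeps the prompt, metric, and split names from the English config while changing only the task name, the dataset subset, and the training split.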

lm_eval/tasks/mela/mela_de.yaml

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
include: mela_en.yaml
task: mela_de
dataset_name: de
training_split: null

lm_eval/tasks/mela/mela_en.yaml

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
task: mela_en
dataset_path: Geralt-Targaryen/MELA
dataset_name: en
training_split: train
validation_split: dev
test_split: test
output_type: multiple_choice
doc_to_text: "Sentence: {{sentence}}\nDetermine whether this sentence is acceptable or unacceptable?\nA. Acceptable\nB. Unacceptable\nAnswer:"
doc_to_choice: ["A", "B"]
doc_to_target: "{{['B', 'A'][label]}}"
description: "Determine whether the following sentence(s) violate certain linguistic constraints. If yes, then it is \"unacceptable\"; otherwise, \"acceptable\".\n\n"
fewshot_split: dev
fewshot_config:
  sampler: first_n
metric_list:
  - metric: mcc
    higher_is_better: true
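
The `doc_to_text` and `doc_to_target` templates above can be read as two small functions over a dataset row. The sketch below mirrors them in plain Python instead of Jinja; the function names and the example rows are hypothetical, and the field names `sentence` and `label` are taken from the templates.

```python
PROMPT = (
    "Sentence: {sentence}\n"
    "Determine whether this sentence is acceptable or unacceptable?\n"
    "A. Acceptable\n"
    "B. Unacceptable\n"
    "Answer:"
)
CHOICES = ["A", "B"]  # mirrors doc_to_choice


def doc_to_text(doc: dict) -> str:
    """Render the prompt for one row, as the doc_to_text template does."""
    return PROMPT.format(sentence=doc["sentence"])


def doc_to_target(doc: dict) -> str:
    """Mirror of "{{['B', 'A'][label]}}": label 0 -> "B", label 1 -> "A"."""
    return ["B", "A"][doc["label"]]


# Hypothetical rows, not taken from the MELA dataset.
acceptable = {"sentence": "The cat sat on the mat.", "label": 1}
unacceptable = {"sentence": "Cat the mat on sat the.", "label": 0}

prompt = doc_to_text(acceptable)
gold = doc_to_target(acceptable)
```

The index trick implies that label `1` marks an acceptable sentence (choice `A`) and label `0` an unacceptable one (choice `B`).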

lm_eval/tasks/mela/mela_es.yaml

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
include: mela_en.yaml
task: mela_es
dataset_name: es
training_split: null

lm_eval/tasks/mela/mela_fr.yaml

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
include: mela_en.yaml
task: mela_fr
dataset_name: fr
training_split: null

lm_eval/tasks/mela/mela_is.yaml

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
include: mela_en.yaml
task: mela_is
dataset_name: is
training_split: null

lm_eval/tasks/mela/mela_it.yaml

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
include: mela_en.yaml
task: mela_it
dataset_name: it

lm_eval/tasks/mela/mela_ja.yaml

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
include: mela_en.yaml
task: mela_ja
dataset_name: ja
training_split: null

lm_eval/tasks/mela/mela_ru.yaml

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
include: mela_en.yaml
task: mela_ru
dataset_name: ru

lm_eval/tasks/mela/mela_zh.yaml

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
include: mela_en.yaml
task: mela_zh
dataset_name: zh
