This document provides examples of training, evaluating, and predicting with dependency parsers. All files passed as input arguments to load data are CoNLL files.
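
Since every input is a CoNLL file, it can help to see what a parser reads from one. The sketch below assumes the CoNLL-U layout suggested by the `.conllu` extension used in this document (ten tab-separated columns, `#` comment lines, blank lines between sentences). `read_conllu` is a hypothetical helper written for illustration and is not part of `run.py`.

```python
from typing import Dict, Iterable, Iterator, List


def read_conllu(lines: Iterable[str]) -> Iterator[List[Dict[str, object]]]:
    """Yield one sentence at a time as a list of token dictionaries.

    Assumes CoNLL-U columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
    """
    sentence: List[Dict[str, object]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comments (e.g. "# text = ...") carry no tokens.
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():
            # Skip multiword ranges ("1-2") and empty nodes ("1.1").
            continue
        sentence.append({
            "id": int(cols[0]),
            "form": cols[1],
            "upos": cols[3],
            "head": int(cols[6]),   # 0 marks the root
            "deprel": cols[7],
        })
    if sentence:
        yield sentence


# Illustrative three-token sentence (made up for this sketch).
sample = """# text = The dog barks
1\tThe\tthe\tDET\tDT\t_\t2\tdet\t_\t_
2\tdog\tdog\tNOUN\tNN\t_\t3\tnsubj\t_\t_
3\tbarks\tbark\tVERB\tVBZ\t_\t0\troot\t_\t_
"""

for sent in read_conllu(sample.splitlines()):
    print([(tok["form"], tok["head"], tok["deprel"]) for tok in sent])
# [('The', 2, 'det'), ('dog', 3, 'nsubj'), ('barks', 0, 'root')]
```

To read a treebank from disk, the same function can be fed an open file handle, for example `read_conllu(open("treebanks/english-ewt/dev.conllu", encoding="utf-8"))`.
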

| Identifier | Parser | Paper | Arguments |
|---|---|---|---|
| `dep-idx` | Absolute and relative indexing | Strzyz et al. (2019) | `rel` |
| `dep-pos` | PoS-tag relative indexing | Strzyz et al. (2019) | `gold` |
| `dep-bracket` | Bracketing encoding | Strzyz et al. (2020) | `k` |
| `dep-bit4` | | Gómez-Rodríguez et al. (2023) | `proj` |
| `dep-bit7` | | Gómez-Rodríguez et al. (2023) | |
| `dep-eager` | Arc-Eager system | Nivre and Fernández-González (2002) | `stack`, `buffer`, `proj` |
| `dep-biaffine` | Biaffine dependency parser | Dozat et al. (2016) | |
| `dep-hexa` | Hexa-Tagging | Amini et al. (2023) | `proj` |

- Absolute indexing (Strzyz et al., 2019) with XLNet (Yang et al., 2019) as encoder. Add the `--rel` argument as `<specific-args>` to use relative instead of absolute positions.

      python3 run.py dep-idx -p results/dep-idx-xlnet -c configs/xlnet.ini \
          train --train treebanks/english-ewt/train.conllu \
          --dev treebanks/english-ewt/dev.conllu \
          --test treebanks/english-ewt/test.conllu --num-workers 20

- PoS-based relative indexing (Strzyz et al., 2019) with BiLSTMs as encoder. Add the `--gold` argument as `<specific-args>` to use the gold PoS-tag annotations instead of predicting them.

      python3 run.py dep-pos -p results/dep-pos-bilstm -c configs/bilstm.ini \
          train --train treebanks/english-ewt/train.conllu \
          --dev treebanks/english-ewt/dev.conllu \
          --test treebanks/english-ewt/test.conllu --num-workers 20

- Bracketing encoding (Strzyz et al., 2020) with $k=2$ and XLM (Conneau et al., 2019) as encoder:

      python3 run.py dep-bracket -k 2 -p results/dep-bracket-xlm -c configs/xlm.ini \
          train --train treebanks/english-ewt/train.conllu \
          --dev treebanks/english-ewt/dev.conllu \
          --test treebanks/english-ewt/test.conllu --num-workers 20

- Arc-Eager transition-based system (Nivre and Fernández-González, 2002), where each state is represented by 1 position of the stack and 2 positions of the buffer, with XLM (Conneau et al., 2019) as encoder:

      python3 run.py dep-eager --stack 1 --buffer 2 -p results/dep-eager-xlm -c configs/xlm.ini \
          train --train treebanks/english-ewt/train.conllu \
          --dev treebanks/english-ewt/dev.conllu \
          --test treebanks/english-ewt/test.conllu --num-workers 20

- Hexa-Tagging (Amini et al., 2023) with XLNet as encoder and the `head` pseudo-projective transformation from Nivre and Nilsson (2005). The available modes are `head`, `head+path`, and `path`:

      python3 run.py dep-hexa -p results/dep-hexa-xlnet -c configs/xlnet.ini --proj head \
          train --train treebanks/english-ewt/train.conllu \
          --dev treebanks/english-ewt/dev.conllu \
          --test treebanks/english-ewt/test.conllu --num-workers 20

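
To make the first two items of the list above more concrete, here is a minimal sketch of how absolute, relative, and PoS-based relative indexing turn head positions into per-word labels, following the descriptions in Strzyz et al. (2019). It is not the code behind `dep-idx` or `dep-pos`: the function names are hypothetical, dependency relation labels are left out, and the dummy root at position 0 with tag `ROOT` is an assumption made for this example.

```python
from typing import List, Tuple

ROOT_TAG = "ROOT"  # assumed tag for the dummy root at position 0


def encode_absolute(heads: List[int]) -> List[int]:
    """Absolute indexing: the label of word i is the position of its head (0 = root)."""
    return list(heads)


def encode_relative(heads: List[int]) -> List[int]:
    """Relative indexing: the label of word i is the offset head - i."""
    return [h - i for i, h in enumerate(heads, start=1)]


def decode_relative(labels: List[int]) -> List[int]:
    """Invert encode_relative."""
    return [i + o for i, o in enumerate(labels, start=1)]


def encode_pos(heads: List[int], tags: List[str]) -> List[Tuple[int, str]]:
    """PoS-based indexing: label (o, p) says the head is the |o|-th closest
    word with tag p, to the right if o > 0 and to the left if o < 0."""
    padded = [ROOT_TAG] + tags
    labels = []
    for i, h in enumerate(heads, start=1):
        p = padded[h]
        if h > i:
            o = sum(1 for j in range(i + 1, h + 1) if padded[j] == p)
        else:
            o = -sum(1 for j in range(h, i) if padded[j] == p)
        labels.append((o, p))
    return labels


def decode_pos(labels: List[Tuple[int, str]], tags: List[str]) -> List[int]:
    """Invert encode_pos by walking left or right and counting words tagged p."""
    padded = [ROOT_TAG] + tags
    heads = []
    for i, (o, p) in enumerate(labels, start=1):
        step = 1 if o > 0 else -1
        j, remaining = i, abs(o)
        while remaining > 0:
            j += step
            if not 0 <= j < len(padded):
                raise ValueError(f"label {(o, p)} of word {i} points outside the sentence")
            if padded[j] == p:
                remaining -= 1
        heads.append(j)
    return heads


# "The dog barks loudly": The -> dog, dog -> barks, barks -> root, loudly -> barks
heads = [2, 3, 0, 3]
tags = ["DET", "NOUN", "VERB", "ADV"]
print(encode_absolute(heads))   # [2, 3, 0, 3]
print(encode_relative(heads))   # [1, 1, -3, -1]
print(encode_pos(heads, tags))  # [(1, 'NOUN'), (1, 'VERB'), (-1, 'ROOT'), (-1, 'VERB')]
```

The PoS-based labels depend on the tags of the sentence, which is why the choice between predicted and gold tags matters for `dep-pos`.
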
Now evaluate the trained parser at `results/dep-bracket-xlm/parser.pt` with the same test file:

    python3 run.py dep-bracket -p results/dep-bracket-xlm/parser.pt eval treebanks/english-ewt/test.conllu --batch-size 50

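
The metrics printed by `eval` are not listed in this document. Dependency parsers are commonly scored with unlabeled and labeled attachment scores (UAS and LAS), so the sketch below assumes those. `attachment_scores` is a hypothetical helper written for illustration, not the tool's API, and the gold and predicted arcs are made up.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head position, dependency relation) for one token


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) over token-aligned gold and predicted arcs.

    UAS counts tokens whose head is correct; LAS also requires the relation to match.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted arcs must be token-aligned")
    if not gold:
        return 0.0, 0.0
    uas_hits = sum(1 for g, p in zip(gold, pred) if g[0] == p[0])
    las_hits = sum(1 for g, p in zip(gold, pred) if g == p)
    return uas_hits / len(gold), las_hits / len(gold)


gold = [(2, "det"), (3, "nsubj"), (0, "root"), (3, "advmod")]
pred = [(2, "det"), (3, "obj"), (0, "root"), (2, "advmod")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")  # UAS=0.75 LAS=0.50
```
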
Predict with the trained parser at `results/dep-hexa-xlnet`:

    python3 run.py dep-hexa -p results/dep-hexa-xlnet/parser.pt predict \
        treebanks/english-ewt/test.conllu results/dep-hexa-xlnet/pred.conllu --batch-size 50