
Ask questions about the TAT-QA evaluation set #11

Open
Ningshiqi opened this issue Sep 20, 2024 · 0 comments
Should the evaluation set be `tatqa_dataset_dev.json` or `tatqa_dataset_test.json`? In the submission section of https://nextplusplus.github.io/TAT-DQA/, the script `python tatqa_eval.py --gold_path=:path_to_dev --pred_path=:path_to_predictions` appears to use the dev split for evaluation.

Additionally, the TABLELLM GitHub code also evaluates on the dev split. This confuses me, since it departs from the usual train/dev/test convention, where dev is used to evaluate model performance during training and the final evaluation runs on test. Yet both the TAT-QA official website and the TABLELLM paper appear to use dev as the final evaluation set.

So, should I use the dev dataset as the final test set for the model?
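One common reason a dev split ends up as the local evaluation set is that the public test split withholds gold answers (they are kept server-side for leaderboard submission), so local scoring is only possible against dev. A quick way to check is to verify whether a split file actually carries answers. The sketch below assumes a TAT-QA-like layout (a JSON list of blocks, each with a `questions` list whose items may carry an `answer` field); the field names are illustrative assumptions, not confirmed against the actual dataset schema:

```python
import json

def has_gold_answers(split_path):
    """Return True if every question in the split carries a gold answer.

    Assumes a TAT-QA-like layout: a JSON list of blocks, each with a
    "questions" list whose items may include an "answer" field.
    (Field names are illustrative, not verified against the dataset.)
    """
    with open(split_path) as f:
        data = json.load(f)
    questions = [q for block in data for q in block.get("questions", [])]
    # A split with no questions, or any question missing an answer,
    # cannot be used for local evaluation.
    return bool(questions) and all("answer" in q for q in questions)
```

If `has_gold_answers` returns False for the test file but True for the dev file, that would explain why the official eval script and downstream papers score against dev locally.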
