Should the evaluation set be tatqa_dataset_dev.json or tatqa_dataset_test.json? In the submission section of https://nextplusplus.github.io/TAT-DQA/, the script `python tatqa_eval.py --gold_path=:path_to_dev --pred_path=:path_to_predictions` appears to evaluate against the dev dataset.
Additionally, the TABLELLM GitHub code also evaluates on the dev dataset. This confuses me, because it departs from the usual definitions of train, dev, and test: dev is normally used to evaluate model performance during training, while the final evaluation should use test. Yet both the TAT-DQA official website and the TABLELLM paper appear to use dev as the final evaluation set.
So, should I use the dev dataset as the final test set for the model?
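For reference, here is the evaluation command from the submission section written out in full. The flags come from the official script; the concrete file paths are placeholders I've filled in, not paths from the official instructions:

```bash
# Score predictions against the dev split, per the TAT-DQA submission section.
# Both paths below are hypothetical; substitute your local copies.
python tatqa_eval.py \
    --gold_path=tatqa_dataset_dev.json \
    --pred_path=predictions.json
```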