Hi Team,
I was using the bigcode-evaluation-harness to evaluate Go generations on the MultiPL-E dataset and found that every evaluation produced the output ? command-line-arguments [no test files], even though status_code = 0.
On debugging further, it looks like we set self.language here instead of prompt_name['language'] in the problem dict that drives execution downstream. When the language is checked in the evaluators here, the file is written without the _test.go suffix, so go test does not detect any test files.
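To illustrate the suspected mechanism, here is a minimal sketch. It is not the harness's actual code: the SUFFIX_BY_LANGUAGE table and both helper names are hypothetical, and the mapping is only an assumption based on the behaviour described above. The one firm fact it relies on is that go test only treats files whose names end in _test.go as test files.

```python
# Hypothetical mapping from a language key to a file suffix.
# Assumption for illustration only; the real evaluator table may differ.
SUFFIX_BY_LANGUAGE = {
    "go": ".go",               # what a bare task language would select
    "go_test.go": "_test.go",  # what the dataset's per-problem language would select
}


def build_file_name(stem, language):
    """Return the name of the file the evaluator would write to disk."""
    return stem + SUFFIX_BY_LANGUAGE[language]


def go_detects_tests(file_name):
    """go test only considers files whose names end in _test.go."""
    return file_name.endswith("_test.go")


for language in ("go", "go_test.go"):
    name = build_file_name("strlen", language)
    print(f"{language}: {name} (seen by go test: {go_detects_tests(name)})")
```

If the mapping assumption holds, passing the per-problem language through to the evaluator instead of the task-level one would be enough for go test to find the tests.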
To make this easy to reproduce, I have attached a video below that evaluates a single Go generation test case (generated with DeepSeek Coder), along with the generations file used:
generations_go_example.json
[
[
"package strlen_test\n\nimport (\n \"testing\"\n \"fmt\"\n)\n\n// Return length of given string\n// >>> strlen(\"\")\n// 0\n// >>> strlen(\"abc\")\n// 3\nfunc strlen(myString string) int {\n return len(myString)\n}\n"
]
]
bigcode_go_test_file_name_issue.mp4