What is the prompt format? #91
The OpenAI endpoint will receive the user prompts and output the response, as described in the code (https://github.com/bigcode-project/bigcodebench/blob/main/bigcodebench/gen/util/openai_request.py#L29). Regarding the prompt context, please refer to this block (https://github.com/bigcode-project/bigcodebench/blob/main/bigcodebench/provider/utility.py#L43-L60).
You should be able to run the evaluation without changing the current implementation. Passing a base_url arg should be enough. If you need any other customization, please check out this doc (https://github.com/bigcode-project/bigcodebench/blob/main/ADVANCED_USAGE.md).
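Roughly, the request is an ordinary OpenAI-compatible chat-completions call with the built prompt as a single user message; a minimal sketch (the model name, sampling parameters, and base_url below are placeholders, and openai_request.py is the authoritative reference):

```python
# Minimal sketch of the request shape sent to an OpenAI-compatible endpoint.
# All concrete values are placeholders; see bigcodebench/gen/util/openai_request.py
# for the exact call and parameters.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # your endpoint

prompt = "<final user prompt for one BigCodeBench task>"  # built as discussed below

response = client.chat.completions.create(
    model="my-model",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
    temperature=0.0,
    n=1,
)
completion = response.choices[0].message.content
```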
Thank you. The block in utility.py is a great start, but I'd like to understand it in terms of the dataset columns, i.e., the column mapping from the code to the dataset (https://huggingface.co/datasets/bigcode/bigcodebench-hard/viewer/default/v0.1.0_hf?row=0&views%5B%5D=v010_hf).
I'm assuming:
task_prompt -> instruct_prompt / complete_prompt
instruction_prefix -> ?
Also, I don't quite understand why there is a "response" variable. Could you please clarify that? Is that the expected output?
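To make the mapping concrete, here is roughly what I'm picturing. This is purely my assumption (column names taken from the dataset viewer, and I don't know where instruction_prefix comes from, hence the question):

```python
# Purely my assumption of how the dataset columns might feed the prompt builder,
# posted to make the mapping question above concrete.
from datasets import load_dataset

# Config/split names taken from the dataset viewer URL above.
ds = load_dataset("bigcode/bigcodebench-hard", split="v0.1.0_hf")
row = ds[0]

split = "instruct"  # or "complete"
task_prompt = row["instruct_prompt"] if split == "instruct" else row["complete_prompt"]

# Where does instruction_prefix come from: a fixed string in the harness, or a column?
instruction_prefix = "<some fixed prefix defined by the harness?>"

user_message = f"{instruction_prefix}\n{task_prompt.strip()}\n"  # rough guess at the composition
```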
instruction_prefix -> these two variables. The "response" variable is used for profiling, the same design as in EvalPlus. It won't be used by any model APIs.
@terryyz While evaluating, I'm getting issues with the eval (0.00 pass@1), and:
Is there a way I can debug this?
The pointer for instruction_prefix has been fixed.
I think you may be running this on Windows; you are supposed to run it inside a Linux env. You may find this helpful.
@terryyz Ok, but why is it giving 0 pass@1? To test it, I made my code retrieve the HF dataset (bcb_hard), and if the input matches one of the prompts in the dataset, it replies with the last code block in the input (because it says "write code starting with...") plus the canonical_solution from the dataset. When tested manually it is fine, but it fails here.
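Roughly, the endpoint logic is something like this (a simplified sketch, not the exact code I run; the split name is taken from the dataset viewer URL):

```python
# Simplified sketch of my mock endpoint: if the incoming prompt matches a task,
# reply with the last fenced code block of the prompt plus canonical_solution.
import re
from datasets import load_dataset

ds = load_dataset("bigcode/bigcodebench-hard", split="v0.1.0_hf")  # split name from the viewer URL
rows = list(ds)

FENCE = "`" * 3  # triple backtick, kept out of string literals so the snippet quotes cleanly

def respond(user_message: str) -> str:
    row = next((r for r in rows if r["instruct_prompt"] in user_message), None)
    if row is None:
        return ""  # prompt didn't match any task
    blocks = re.findall(FENCE + r"(?:python)?\n(.*?)" + FENCE, user_message, flags=re.DOTALL)
    last_block = blocks[-1] if blocks else ""
    return f"{FENCE}python\n{last_block}\n{row['canonical_solution']}\n{FENCE}"
```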
@terryyz No module named "pytesseract", "lxml", "sklearn", etc. I'm using Linux now (a Jupyter notebook on HF Spaces) and getting a low pass rate because of all these import errors. Is my model supposed to respond with a dynamically importing solution? But when I look at the sanitized prompts for o3-mini-high, I see it never does that; it just imports normally. So do I need to import them myself before evaluating, or is the model supposed to use subprocess.run('pip install module')?
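For example, is the intended fix just to pre-install them in the eval environment before running? Something like this (the package list is only the modules from my errors, not an exhaustive set):

```python
# Just illustrating what installing the missing modules myself would look like;
# the list below is only the packages from my errors, not everything that
# BigCodeBench tasks may import.
import subprocess
import sys

# note: "sklearn" is imported as sklearn but installed as scikit-learn
for pkg in ["pytesseract", "lxml", "scikit-learn"]:
    subprocess.run([sys.executable, "-m", "pip", "install", pkg], check=True)
```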
I have an OpenAI-compatible endpoint that I'm prepping for evaluation, and I want to know what the final request being sent is, e.g.:
(I'm looking at the HF dataset https://huggingface.co/datasets/bigcode/bigcodebench-hard/viewer/default/v0.1.0_hf?row=0&views%5B%5D=v010_hf)
Basically, what format should I expect, and what should I send back?