What is the prompt format? #91

Open
blazgocompany opened this issue Apr 1, 2025 · 8 comments

blazgocompany commented Apr 1, 2025

I have an OpenAI-compatible endpoint that I'm prepping for evaluation, and I want to know what the FINAL request being sent looks like, e.g.:
(I'm looking at the HF dataset https://huggingface.co/datasets/bigcode/bigcodebench-hard/viewer/default/v0.1.0_hf?row=0&views%5B%5D=v010_hf)

[
  {
    "role": "user",
    "content": "{complete_prompt if complete else instruct_prompt}"
  },
  {
    "role": "assistant",
    "content": "Sure, I can do that! Blah Blah Blah: ```python {canonical_solution} ``` This function does this and that..."
  }
]

Basically what format should I expect and what should I send back?

terryyz (Collaborator) commented Apr 1, 2025

The OpenAI endpoint will receive the user prompts and output the response, as described in the code.

Regarding the prompt context, please refer to this block.

I have an OpenAI-compatible endpoint that I'm prepping for evaluation

You should be able to run the evaluation without changing the current implementation. Passing a base_url arg should be enough. If you need any other customization, please check out this doc.
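
For illustration only (this is not the repo's actual generation code), a single round trip against an OpenAI-compatible endpoint for the instruct split looks roughly like the sketch below, assuming the official openai Python client (>= 1.0); the base URL, API key, and model name are placeholders. Your endpoint just needs to reply with a normal assistant message containing the code; the assistant turn with canonical_solution in your example is never sent to the API.

# Minimal sketch of one request/response round trip.
# All names below are placeholders, not BigCodeBench values.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

instruct_prompt = "..."  # a task's instruct_prompt from the HF dataset

resp = client.chat.completions.create(
    model="my-model",  # placeholder model name
    messages=[{"role": "user", "content": instruct_prompt}],  # only a user turn is sent
)

# The reply is an ordinary assistant message; the solution code is
# expected to be inside it (typically in a Python code block).
print(resp.choices[0].message.content)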

blazgocompany (Author) commented Apr 1, 2025 via email

terryyz (Collaborator) commented Apr 2, 2025

Instruction_prefix -> these two variables

The "response" variable is used for profiling, the same design as inside EvalPlus. It won't be used by any model APIs.

blazgocompany (Author) commented Apr 7, 2025

@terryyz While evaluating, I'm getting issues with eval (0.00 pass@1), and:

AttributeError: module 'os' has no attribute 'killpg'

Is there a way I can debug this?

terryyz (Collaborator) commented Apr 9, 2025

The pointer to Instruction_prefix has been fixed.

Is there a way I can debug this?

I think you may be running this on Windows, whereas you are supposed to run inside a Linux env. You may find this helpful.
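
A quick sanity check (standard library only, nothing BigCodeBench-specific): os.killpg and process groups only exist on Unix, so on Windows the attribute simply isn't there.

# Run this in the environment you launch the evaluation from.
import os
import platform

print(platform.system())      # "Windows" here explains the AttributeError
print(hasattr(os, "killpg"))  # False on Windows; os.killpg is Unix-only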

blazgocompany (Author) commented

@terryyz OK, but why is it giving 0 pass@1? In fact, in order to test it, I made my code retrieve the HF dataset (bcb_hard), and if the input matches one of the prompts in the dataset, it replies with the last code block in the input (because it says "write code starting with...") plus the canonical_solution from the dataset. When tested manually it is fine, but it fails here.
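
For reference, a sketch of the lookup described above (the split name comes from the linked viewer URL and the field names are my reading of the dataset schema, so treat both as assumptions):

# Dummy "model": if the incoming message matches an instruct_prompt,
# reply with the prompt's last code block plus the canonical_solution.
import re
from datasets import load_dataset

ds = load_dataset("bigcode/bigcodebench-hard", split="v0.1.0_hf")  # split name assumed
solutions = {row["instruct_prompt"].strip(): row["canonical_solution"] for row in ds}

def respond(user_message: str) -> str:
    solution = solutions.get(user_message.strip())
    if solution is None:
        # Exact-match lookup: anything the harness adds around the raw
        # instruct_prompt (e.g. an instruction prefix) means no hit here.
        return ""
    # "Write code starting with ..." -> echo the last code block of the
    # prompt, then append the canonical solution.
    blocks = re.findall(r"```[a-z]*\n(.*?)```", user_message, flags=re.DOTALL)
    starter = blocks[-1] if blocks else ""
    return f"```python\n{starter}{solution}\n```"

Note that this kind of exact matching silently returns nothing for any prompt it does not recognize verbatim.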

terryyz (Collaborator) commented Apr 11, 2025

`AttributeError: module 'os' has no attribute 'killpg'` comes from the evaluation logic. If the call to `os.killpg` fails, you basically cannot run the evaluation correctly. You have to make sure that you are running inside a correct environment first.

blazgocompany (Author) commented

@terryyz No module named "pytesseract", "lxml", "sklearn", etc., etc.

I'm using Linux now (a Jupyter notebook on HF Spaces) and getting a low pass rate because of all these import errors. Is my model supposed to respond with a solution that imports its dependencies dynamically? But when I look at the sanitized prompts for o3-mini-high, it never does that; it just imports normally. So do I need to install these modules myself before evaluating, or is the model supposed to use subprocess.run('pip install module')?
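
For reference, a quick way to check which of these task dependencies are actually importable in the evaluation environment (the list is just the modules named above):

# Check which of the modules mentioned above are importable in this env.
import importlib.util

for mod in ["pytesseract", "lxml", "sklearn"]:
    ok = importlib.util.find_spec(mod) is not None
    print(f"{mod}: {'installed' if ok else 'MISSING'}")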
