For more context: the reason I think this may be useful comes from an experience I had with outlines' internal JSON grammar. I believe it may be related to #994, and might even warrant opening a separate issue?
I had finetuned a language model on outputs produced with `json.dumps` on Python dictionaries, and some of the JSON values contained newline characters. During inference, when I applied outlines' JSON constraints, I noticed a drop in performance, with many generations ending with something like `{"answer": "Some text here \"}`.
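To make the setup concrete: `json.dumps` serializes a literal newline character as the two-character escape `\n` (a single backslash followed by `n`), which is exactly the pattern the model would have seen during finetuning. A minimal illustration:

```python
import json

# A dict value containing an actual newline character
serialized = json.dumps({"answer": "line one\nline two"})
print(serialized)  # {"answer": "line one\nline two"}

# The serialized text contains a single backslash + 'n',
# not an actual newline character
assert "\\n" in serialized
assert "\n" not in serialized
```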
What seems to be happening is that the model is trying to mimic a pattern it saw during finetuning: a `\n` escape with a single backslash. But, as the code snippet below shows, this is invalid under the `STRING_INNER` regex in outlines_core.
```python
import json
import re

from outlines_core.fsm.json_schema import build_regex_from_schema

schema_object = '{"type": "json", "properties": {"answer": {"title": "Answer", "type": "string"}}}'
regex_str = build_regex_from_schema(schema_object)
re_pattern = re.compile(regex_str)

assert re_pattern.search(json.dumps({"answer": "This is an answer \n with a newline"})) is None
# Converting to a raw string first works (`\\n`)
assert re_pattern.search(json.dumps({"answer": r"This is an answer \n with a newline"})) is not None
```
The solution for this particular use case is to transform the Python dict values into raw strings before calling `json.dumps`, to make sure we get `\\n` in the serialized text. But this only became apparent to me after digging into the underlying regular expression guiding JSON strings in outlines. It would be awesome to have a debugging feature that could flag this sort of behavior (grammar constraints causing outputs that differ from unconstrained greedy decoding) automatically (`outlines.generate.json(..., debug=True)`?).
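A sketch of that workaround (the helper name `escape_newlines` is mine, not part of outlines): replace literal newline characters in the dict values with the two-character sequence backslash + `n` before serializing, so that `json.dumps` escapes the backslash and the output contains `\\n`:

```python
import json

def escape_newlines(d):
    # Hypothetical helper: replace actual newline characters with the
    # literal two-character sequence backslash + 'n', so json.dumps
    # escapes the backslash and the serialized text contains \\n.
    return {k: v.replace("\n", "\\n") if isinstance(v, str) else v
            for k, v in d.items()}

serialized = json.dumps(escape_newlines({"answer": "line one\nline two"}))
# The serialized JSON now contains a double backslash before the 'n'
assert "\\\\n" in serialized
```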
We could add a logits processor that checks whether we're forbidding tokens that should be allowed, or vice versa.
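A rough sketch of what such a processor could look like (pure Python and framework-agnostic; the class name and callback signature are illustrative, not an existing outlines API): compare the unconstrained greedy choice against the constrained mask, and record every step where the constraint forbids the token greedy decoding would have picked.

```python
import math

class ConstraintDebugProcessor:
    """Illustrative sketch: wraps a constraint that masks logits and flags
    steps where the unconstrained greedy token gets forbidden."""

    def __init__(self, constrain, decode):
        self.constrain = constrain  # fn: logits -> masked logits
        self.decode = decode        # fn: token id -> text
        self.flags = []

    def __call__(self, logits):
        greedy = max(range(len(logits)), key=logits.__getitem__)
        masked = self.constrain(list(logits))
        if masked[greedy] == -math.inf:
            # The constraint just forbade the token the model preferred
            self.flags.append(self.decode(greedy))
        return masked

# Toy example: the constraint forbids token 1, which greedy decoding prefers
forbid_1 = lambda lg: [(-math.inf if i == 1 else x) for i, x in enumerate(lg)]
proc = ConstraintDebugProcessor(forbid_1, decode=lambda i: f"<tok{i}>")
proc([0.1, 2.0, 0.5])
print(proc.flags)  # ['<tok1>']
```

In a real integration the wrapped constraint would be outlines' FSM-based logits processor, and the flagged tokens could be surfaced to the user when `debug=True` is set.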