
Enable detokenizing special tokens #1596


Merged (6 commits) on Aug 29, 2024

Conversation

benniekiss
Contributor

I noticed that it was not possible to detokenize special tokens, such as the EOS token, when using Phi-3(.1). This PR makes sure that the special flag can be passed to the detokenize() method of both LlamaTokenizer and LlamaHFTokenizer.
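To illustrate what a special flag on detokenize() controls, here is a minimal toy sketch. It is not the library's implementation: the vocabulary, the token ids, and the `toy_detokenize` helper are all hypothetical, and the only assumption carried over from the PR is that special tokens are rendered as text only when the flag is set.

```python
from typing import Dict, List

# Hypothetical toy vocabulary; the ids and pieces are made up for illustration.
VOCAB: Dict[int, bytes] = {0: b"<s>", 1: b"</s>", 2: b"Hello", 3: b" world"}
SPECIAL_IDS = {0, 1}  # assumed ids for BOS and EOS in this toy vocabulary


def toy_detokenize(tokens: List[int], special: bool = False) -> bytes:
    """Join token pieces, rendering special tokens only when special=True."""
    pieces = []
    for tok in tokens:
        if tok in SPECIAL_IDS and not special:
            # With special=False, special tokens contribute no text.
            continue
        pieces.append(VOCAB[tok])
    return b"".join(pieces)


ids = [0, 2, 3, 1]
plain = toy_detokenize(ids)                       # special tokens dropped
with_special = toy_detokenize(ids, special=True)  # special tokens rendered
print(plain)
print(with_special)
```

Keeping the flag off by default means existing callers see the same output as before, while callers that need the EOS text can opt in.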

I also noticed that prev_tokens was not being used in the detokenize() method of LlamaTokenizer, so this PR also adds that functionality, based on the LlamaHFTokenizer implementation.
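One common way to honor a prev_tokens argument is to decode the previous and new tokens together and then strip the text that the previous tokens produce on their own. The sketch below shows that idea under stated assumptions: `detokenize_with_context` and `toy_decode` are hypothetical names, and the toy decoder only imitates a tokenizer that drops the leading space of the first piece.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical toy pieces; each word piece carries its own leading space.
PIECES: Dict[int, str] = {1: " Hello", 2: " world"}


def toy_decode(tokens: List[int]) -> str:
    """Toy decoder that strips the leading space of the whole sequence."""
    return "".join(PIECES[t] for t in tokens).lstrip(" ")


def detokenize_with_context(
    decode: Callable[[List[int]], str],
    tokens: List[int],
    prev_tokens: Optional[List[int]] = None,
) -> str:
    """Decode tokens as a continuation of prev_tokens, if any are given."""
    if prev_tokens is None:
        return decode(tokens)
    full = decode(prev_tokens + tokens)  # text of the whole sequence
    prefix = decode(prev_tokens)         # text already emitted
    return full[len(prefix):]            # only the newly added text


alone = detokenize_with_context(toy_decode, [2])
continued = detokenize_with_context(toy_decode, [2], prev_tokens=[1])
print(repr(alone))
print(repr(continued))
```

Without context the leading space of the second word is lost; with prev_tokens it is preserved, which matters when streaming output token by token.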

@benniekiss benniekiss force-pushed the detokenize_special branch 3 times, most recently from ac6efb6 to 0e6d843 Compare July 23, 2024 14:30
@benniekiss benniekiss force-pushed the detokenize_special branch from 0e6d843 to b884fc6 Compare July 29, 2024 10:36
@benniekiss
Contributor Author

Hey @abetlen, when you get a chance, would you give this a review?

@benniekiss benniekiss force-pushed the detokenize_special branch 3 times, most recently from 0f105ba to bf241c1 Compare August 7, 2024 15:07
@benniekiss benniekiss force-pushed the detokenize_special branch 2 times, most recently from c75f724 to e3e7da0 Compare August 20, 2024 16:34
@benniekiss benniekiss force-pushed the detokenize_special branch 3 times, most recently from 39d4f0e to c4f73a2 Compare August 23, 2024 11:06
@abetlen
Owner

abetlen commented Aug 29, 2024

Hey @benniekiss, thank you for the contribution!

I reverted the changes to prev_tokens in the LlamaTokenizer and changed special to default to False everywhere to avoid any possible breaking changes. Hope this works for you!

@abetlen abetlen merged commit d981d32 into abetlen:main Aug 29, 2024
13 checks passed
@benniekiss
Contributor Author

That's perfect! Thank you for taking the time to review.
