
[Feature Request] Add option to cache model compilation for modular/max-openai-api #4031

Open
ematejska opened this issue Feb 26, 2025 · 0 comments
Labels
enhancement New feature or request max max-repo

Comments

@ematejska (Collaborator)

Originally filed by remorses in #271. With the merge of the max/mojo repos, reopening here.

What is your request?

I tried deploying modular/max-openai-api to fly.io, but the first compilation of the model takes a long time. Is it possible to cache the compiled model on disk?

What is your motivation for this change?

Add a `--model-compile-cache=/.root/model` parameter.

Any other details?

Fly.io is a serverless GPU deployment platform where machines are stopped and started frequently. Model compilation is currently too slow to make deployment practical on this kind of infrastructure.
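To illustrate the request, here is a minimal sketch of what such a compilation cache could look like. This is not MAX's actual API; `compile_fn`, the cache directory layout, and the key derivation are all hypothetical, chosen only to show the idea of keying cached artifacts on the model plus its compile options so a restarted machine can skip recompilation.

```python
import hashlib
import os
import pickle


def cache_key(model_path: str, config: dict) -> str:
    """Derive a stable key from the model identifier and compile options."""
    h = hashlib.sha256()
    h.update(model_path.encode())
    # Sort options so the key is independent of dict insertion order.
    h.update(repr(sorted(config.items())).encode())
    return h.hexdigest()


def compile_with_cache(model_path: str, config: dict, cache_dir: str, compile_fn):
    """Return a compiled model, reusing an on-disk artifact when one exists.

    `compile_fn` stands in for the (slow) real compilation step.
    """
    os.makedirs(cache_dir, exist_ok=True)
    artifact = os.path.join(cache_dir, cache_key(model_path, config) + ".bin")
    if os.path.exists(artifact):
        # Cache hit: load the previously compiled artifact instead of recompiling.
        with open(artifact, "rb") as f:
            return pickle.load(f)
    compiled = compile_fn(model_path, config)
    with open(artifact, "wb") as f:
        pickle.dump(compiled, f)
    return compiled
```

With a cache directory mounted on a persistent volume (e.g. the `/.root/model` path suggested above), only the first boot of a machine pays the compilation cost; subsequent starts load the cached artifact.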
