
[Feature Request] Add option to cache model compilation for modular/max-openai-api #4031

Open
ematejska opened this issue Feb 26, 2025 · 0 comments
Labels
enhancement New feature or request max max-repo

Comments

@ematejska (Collaborator)

Originally filed by remorses in #271. With the merge of the max/mojo repos, reopening here.

What is your request?

I tried deploying modular/max-openai-api to fly.io, but the first compilation of the model takes a long time. Is it possible to cache the compiled model on disk?

What is your motivation for this change?

Add a `--model-compile-cache=/.root/model` parameter.

Any other details?

Fly.io is a serverless GPU deployment platform where machines are stopped and started frequently. Model compilation is currently too slow to make deployment practical on this kind of infrastructure.
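To illustrate the request, here is a minimal sketch of what such a compilation cache could look like. This is not MAX's actual API; `compile_fn`, the cache directory layout, and the key derivation are all hypothetical, chosen only to show the idea of keying cached artifacts on the model plus its compile options so a restarted machine can skip recompilation.

```python
import hashlib
import os
import pickle


def cache_key(model_path: str, config: dict) -> str:
    """Derive a stable key from the model identifier and compile options."""
    h = hashlib.sha256()
    h.update(model_path.encode())
    # Sort options so the key is independent of dict insertion order.
    h.update(repr(sorted(config.items())).encode())
    return h.hexdigest()


def compile_with_cache(model_path: str, config: dict, cache_dir: str, compile_fn):
    """Return a compiled model, reusing an on-disk artifact when one exists.

    `compile_fn` stands in for the (slow) real compilation step.
    """
    os.makedirs(cache_dir, exist_ok=True)
    artifact = os.path.join(cache_dir, cache_key(model_path, config) + ".bin")
    if os.path.exists(artifact):
        # Cache hit: load the previously compiled artifact instead of recompiling.
        with open(artifact, "rb") as f:
            return pickle.load(f)
    compiled = compile_fn(model_path, config)
    with open(artifact, "wb") as f:
        pickle.dump(compiled, f)
    return compiled
```

With a cache directory mounted on a persistent volume (e.g. the `/.root/model` path suggested above), only the first boot of a machine pays the compilation cost; subsequent starts load the cached artifact.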
