Docs update for feature summary #11
base: master
Conversation
Thanks @mart-r, this summary was definitely missing. Just one thing: what you're seeing around MLflow looks like a bug and presumably doesn't match the expected behaviour. I believe the intention there was to allow running the CMS services without the MLflow server, simply using the MLflow client to track runs with local files. Note, for instance, how we set the default value in `CogStack-ModelServe/app/envs/.env` (line 12 in ce8354f).
On the other hand, the default environment variable value in the compose file is set differently. Perhaps @baixiac has some more insight here. Edit: If you set the default …
We could remove the tracking URI from the …
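For context, the split being described might look like the following env-file sketch. This is an assumption-laden illustration, since the embedded snippets aren't shown above: `MLFLOW_TRACKING_URI` is the standard MLflow client setting rather than a confirmed name from the repo, and `mlflow-ui` is taken from the resolution error quoted at the end of this thread.

```sh
# Sketch only, not the repo's actual contents.
# app/envs/.env default - lets CMS run without the MLflow server,
# tracking runs with local files via the MLflow client:
MLFLOW_TRACKING_URI=file:/tmp/mlruns

# whereas the compose file would point the client at the containerised
# tracking server instead (the port is an assumption, MLflow's default):
# MLFLOW_TRACKING_URI=http://mlflow-ui:5000
```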
Yes, the docs are quite shitty at the moment, I have to say. Currently `docker-compose.yml` assumes all services are containerised, which makes "auxiliary" a deceptive word to some degree. `docker-compose-dev.yml` may be more appropriate for the use case raised by the OP? I can see four scenarios (a sketch of the first two follows the list):
1. CMS is running on the host OS and the mlflow client logs to the local file system (e.g., `/tmp/mlruns`);
2. CMS is running on the host OS and the mlflow client logs to the remote DB;
3. CMS is running in a container and the mlflow client logs into the same container;
4. Both CMS and the mlflow DB are running in containers and there is no logging to the local file system.
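As a sketch of the two host-OS scenarios, assuming CMS honours the standard MLflow client variable `MLFLOW_TRACKING_URI` (the remote connection string below is purely hypothetical):

```sh
# Scenario 1: CMS on the host, MLflow client logs to the local file system.
export MLFLOW_TRACKING_URI=file:/tmp/mlruns

# Scenario 2: CMS on the host, MLflow client logs to a remote backend
# (hypothetical database URI; MLflow also accepts an http(s) server URL):
export MLFLOW_TRACKING_URI=postgresql://mlflow:secret@db-host:5432/mlflow
```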
New users will likely end up with scenario 1, while a full deployment requires scenario 4. The README is getting bloated, so a balance had to be struck between normal and advanced users. I'm starting to feel the README is not a good place to cover all the details and that a proper readthedocs site is needed, as some other CogStack projects have. I'll log a ticket for this.
Thanks for all the input. So the way I wanted to spin this up is entirely possible, just not well documented. Good to know. And I agree, the README does look pretty big. But that doesn't necessarily mean you have to have a separate readthedocs site. In any case, I think it would make sense to have the main README go through the most common / expected use case and link to other documentation for more detailed stuff (like mine clearly is). PS: …
Hi, looking back, the one that best aligns with your need is scenario 3, i.e., the trained model and training metrics would be stored locally within your CMS container. In a production environment, this requires proper backup and housekeeping to prevent data loss and out-of-space errors (or maybe you deliberately want your CMS containers to be short-lived?). It would be difficult, if not impossible, to leverage the tracking UI in scenario 3, unless you can somehow proxy it to read the data buried inside a folder of the container. So, to clarify your use case/workflow: do you want something like spinning up a CMS instance on demand, firing up the training, downloading the trained model and/or metrics, and finally terminating the instance to free up the resources on your machine? To move this PR forward, could creating a separate …
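For the spin-up/train/download/terminate workflow described above, one hypothetical way to recover the run data before tearing the instance down (the container name is a placeholder, and the in-container path is borrowed from scenario 1, so both are assumptions):

```sh
# Sketch for scenario 3: copy the MLflow run data out of the CMS container
# (replace 'cms' with your actual container name) before removing it.
docker cp cms:/tmp/mlruns ./mlruns-backup
docker rm -f cms
```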
While trying to run the core functionality of CMS, I felt that the documentation wasn't really clear as to which features are enabled by which compose files.
For instance, if I follow the documentation and spin up a service for a SNOMED model, I can run inference and evaluation on the model. However, if I try to run training on it, it fails because it cannot access MLflow.[^1]
When I then run the services in `docker-compose-mlflow.yml` as well, training does work successfully. As such, this PR suggests an additional summary / overview of the features / functionality and what extra services they may need.
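A minimal sketch of the extra step that made training work here, assuming a standard Docker Compose invocation (exact flags and any required env files may differ from the README):

```sh
# Start the auxiliary MLflow services so training can reach the tracking server:
docker compose -f docker-compose-mlflow.yml up -d
```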
PS: There may be prerequisites I'm not aware of. But this was just me going back and trying to spin up a service that is able to run inference and training on a model. Perhaps there are ways to do this without mlflow, but the documentation does not seem to make that clear.
[^1]: Exception when trying to train without mlflow: `Failed to resolve 'mlflow-ui'`. The full exception: …